Making large AI models cheaper, faster and more accessible
COMMITS
extensions/pybind/inference/inference.cpp
May 14, 2024
add paged-attention v2: support seq length split across thread block (#5707)
Steve Luo committed
May 10, 2024
April 30, 2024
April 25, 2024
[Inference/Kernel] Optimize paged attention: Refactor key cache layout (#5643)
Steve Luo committed
April 24, 2024