A high-throughput and memory-efficient inference and serving engine for LLMs
COMMITS: 20 in the last week
CONTRIBUTORS: 19 active
STARS: 0 total
FORKS: 0 total
TOP CONTRIBUTORS
Cyrus Leung
Wentao Ye
Mateusz Sokół
Kunshang Ji
Vadim Gimpelson
Fadi Arafeh
Terry Gao
Jee Jee Li
Matej Rojec
BadrBasowid
RECENT COMMITS
[Refactor] Remove unused utils (#38153)
Wentao Ye
DOC: Documentation pages fixes (#38125)
Mateusz Sokół
[XPU] Disable xpu graph by default (#38193)
Kunshang Ji
Relocate Encoder CUDA graph manager (#38116)
Woosuk Kwon
Various Transformers v5 fixes (#38127)
Harry Mellor
[Cohere] Enable Cohere-Transcribe (#38120)
Ekagra Ranjan