COMMITS
March 21, 2026
ruby : fix dangling pointers, memory leak, and SEGV on parallel transcription (#3715)
KITAITI Makoto committed
March 19, 2026
release : v1.8.4
Georgi Gerganov committed
March 18, 2026
ci : update workflows
Georgi Gerganov committed
benches : update
Georgi Gerganov committed
sync : ggml
Georgi Gerganov committed
March 16, 2026
ggml : bump version to 0.9.8 (ggml/1442)
Georgi Gerganov committed
ggml : restore ggml_type_sizef() to avoid major version bump (ggml/1441)
Georgi Gerganov committed
go : handle EOF correctly in model download (#3671)
Alan committed
py : replace deprecated openvino-dev with openvino>=2023.3.0 (#3678)
Aiudadadadf committed
examples : Allow max_len to be used for any output format (#3679)
Gaël James committed
server: return proper HTTP status codes for error responses (#3707)
Igor Loskutov committed
ggml : try fix arm build (#0)
Georgi Gerganov committed
talk-llama : sync llama.cpp
Georgi Gerganov committed
sync : ggml
Georgi Gerganov committed
March 17, 2026
fix: VAD time mapping timestamp drift caused by overlap samples (#3711)
lohopupa committed
March 15, 2026
ggml : extend im2col f16 (ggml/1434)
David366AI committed
common : add nvfp4 (ggml/0)
Georgi Gerganov committed
CUDA: limit number of FA stream-k CUDA blocks (llama/20586)
Johannes Gäßler committed
ggml: avoid creating CUDA context during device init (llama/20595)
Pascal committed
ggml : guard against sumq2 being 0 in IQ4_NL (llama/20460)
Bartowski committed
cuda : add RDNA4-specific MMVQ parameter table for bs=1 decode (llama/19478)
PikaPikachu committed
vulkan: use graphics queue on AMD (llama/20551)
Ruben Ortlam committed
March 14, 2026
metal : add FA specialization for HSK = 320, HSV = 256 (llama/20549)
Georgi Gerganov committed
hexagon: Q4_0 and MXFP4 repack fixes (llama/20527)
Max Krasnyansky committed
add op gated_delta_net (llama/20455)
Neo Zhang committed
ggml : add native AVX512-FP16 support for F16 operations (llama/20529)
Adrien Gallouët committed
ggml : add OpenVINO backend (llama/15307)
Zijun Yu committed
Fix data race in CUDA's "cpy" kernel (influences GGML's DUP, CONT operations). (llama/20507)
Rail Chabdarov committed