Commit Graph

  • ca3f6bbd3c cuda: cap grid.y at 65535 in non-contiguous dequantize/convert kernels (llama/19999) oobabooga 2026-03-01 02:40:22 -03:00
  • 699eaf3a10 CUDA: add CDNA3 MFMA support for flash attention MMA kernel (llama/19806) Jayant Lohia 2026-02-28 00:07:26 +05:30
  • b524b5a1f0 ggml-cpu: add repack for mxfp4 (llama/19738) Aman Gupta 2026-02-27 18:15:09 +08:00
  • 30c5194c96 ruby : null-check (#3689) KITAITI Makoto 2026-03-05 14:36:42 +09:00
  • 9453b4b9be gguf : sync (ggml/0) Georgi Gerganov 2026-02-27 12:24:59 +02:00
  • aaf8bdf3b8 scripts : sync gguf Georgi Gerganov 2026-02-27 12:24:33 +02:00
  • 84f8db71d8 talk-llama : sync llama.cpp Georgi Gerganov 2026-02-27 12:23:40 +02:00
  • 4734056067 sync : ggml Georgi Gerganov 2026-02-27 12:19:27 +02:00
  • 64f48603e6 replace the magic number 768 by max work group size to support iGPU (llama/19920) Neo Zhang 2026-02-27 09:26:07 +08:00
  • 9c1fd5cc6e ggml-zendnn: update code for latest ZenDNN API (llama/19923) Vishal Singh 2026-02-27 06:13:41 +05:30
  • 316d921c1a ggml : fix AMX and add batched support (llama/19925) Adrien Gallouët 2026-02-26 21:39:11 +01:00
  • e722ee1bf5 vulkan: fix fp16 Flash Attention on Windows AMD RDNA2 and below (llama/19921) Ruben Ortlam 2026-02-26 19:11:04 +01:00
  • f877e1b202 ggml-virtgpu: improve the reliability of the code (llama/19846) Kevin Pouget 2026-02-26 13:00:57 +01:00
  • 4cac408c60 support permuted, remove check s0/s10 (llama/19889) Neo Zhang 2026-02-26 10:27:20 +08:00
  • fb55b2654b vulkan: check for memory overlap before doing fusion (llama/19768) Jeff Bolz 2026-02-25 11:25:38 -06:00
  • 279be33a83 ggml/gguf : prevent integer overflows (llama/19856) Georgi Gerganov 2026-02-24 20:17:11 +02:00
  • 90800b5aa5 Vulkan Scalar Flash Attention Refactor (llama/19625) Ruben Ortlam 2026-02-24 08:35:48 +01:00
  • dcc877688d vulkan: fix coopmat1 without bf16 support (llama/19793) Jeff Bolz 2026-02-24 00:48:32 -06:00
  • 344eae3d22 vulkan: fix data race in mul_mat_id shader (llama/19790) Jeff Bolz 2026-02-24 00:43:12 -06:00
  • 53b571a47e hexagon: refactor all Ops to use local context struct (llama/19819) Max Krasnyansky 2026-02-23 16:32:14 -08:00
  • 06fbd9c5f2 ggml-cpu: arm64: q5_K repack gemm and gemv (and generic) implementations (dotprod) (llama/19356) Alberto Cabrera Pérez 2026-02-23 12:42:52 +00:00
  • 98915f889a Improve CUDA graph capture (llama/19754) Gaurav Garg 2026-02-21 15:09:36 +05:30
  • 0c10a15447 ggml-cpu: add RVV vec dot kernels for quantization types (llama/18784) Taimur Ahmad 2026-02-20 16:30:07 +05:00
  • 0158795ebc ggml-webgpu: Add unary op (SQR, SQRT, SIN, COS) support. (llama/19700) Masashi Yoshimura 2026-02-20 01:18:30 +09:00
  • 3f68f30907 vulkan: fix MMQ shader push constants and multi-dispatch (llama/19732) Ruben Ortlam 2026-02-19 14:59:16 +01:00
  • ade724fced CUDA: fix kernel selection logic for tile FA (llama/19686) Johannes Gäßler 2026-02-19 12:42:58 +01:00
  • cc9e5cf89d llamafile: powerpc: add FP16 MMA path for Q4/Q8 matmul (llama/19709) shalinib-ibm 2026-02-19 11:58:53 +05:30
  • 8b3a52ba87 ggml webgpu: Fix bug in dispatching large matrix-vector multiplication (llama/19535) Reese Levine 2026-02-18 16:06:29 -07:00
  • fc7a78f4d8 ggml webgpu: shader library organization (llama/19530) Reese Levine 2026-02-25 09:33:32 +02:00
  • f1da0a26f5 vulkan: split mul_mat into multiple dispatches to avoid overflow (llama/19509) Jeff Bolz 2026-02-18 01:47:10 -08:00
  • 51ce7de94c opencl: refactor expm1 and softplus (llama/19404) shaofeiqi 2026-02-17 14:47:18 -08:00
  • 6fadc749a9 opencl: optimize mean and sum_row kernels (llama/19614) shaofeiqi 2026-02-17 13:56:09 -08:00
  • 58855d08c2 ggml: ggml-cpu: force-no-lto-for-cpu-feats (llama/19609) Talha Can Havadar 2026-02-17 12:22:46 +01:00
  • cf4bd07028 cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (llama/19645) Georgi Gerganov 2026-02-17 12:31:49 +02:00
  • 5ee5748722 ggml : make ggml_is_view as API (llama/19539) Judd 2026-02-16 23:43:34 +08:00
  • 5d9d72ec12 Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer ROCm versions (llama/19591) Mario Limonciello 2026-02-16 07:46:08 -06:00
  • f8f7c1d891 ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k Kernel (llama/19132) abhijain1204fujitsu 2026-02-16 12:08:43 +05:30
  • 02a9f660b8 cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization (llama/19624) David Friehs 2026-02-15 18:08:42 +01:00
  • df2f8d3bc4 cmake : check if KleidiAI API has been fetched (llama/19640) Daniel Bevenius 2026-02-15 13:59:38 +01:00
  • 22f0861efc ggml : avoid UB in gemm ukernel (llama/19642) Georgi Gerganov 2026-02-15 14:56:35 +02:00
  • 7b5a1ebaa6 ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (llama/19399) Aaron Teo 2026-02-15 18:20:35 +08:00
  • 76f769d06f ggml-cpu: FA add GEMM microkernel (llama/19422) Aman Gupta 2026-02-15 11:09:24 +05:30
  • 7ee772ab2b cmake : fix KleidiAI install target failure with EXCLUDE_FROM_ALL (llama/19581) SamareshSingh 2026-02-14 23:22:53 -06:00
  • 4bea3cd329 ggml : bump version to 0.9.7 (ggml/1425) Georgi Gerganov 2026-02-15 22:21:04 +02:00
  • cec1dd9d12 examples : update miniaudio library to 0.11.24 (#3672) Dmitry Atamanov 2026-02-27 15:15:15 +05:00
  • 21411d81ea docs : fix duplicate word typo in VAD section (#3670) Maxime Grenu 2026-02-19 16:18:42 +01:00
  • 364c77f4ca talk-llama : sync llama.cpp Georgi Gerganov 2026-02-15 19:43:28 +02:00
  • 83f2ed19e1 sync : ggml Georgi Gerganov 2026-02-15 19:42:09 +02:00
  • 4ac70ce791 models : optimize qwen3next graph (llama/19375) Georgi Gerganov 2026-02-14 12:57:36 +02:00
  • 226e8c041c ggml : fix GGML_DEBUG with OpenMP (llama/19599) Adrien Gallouët 2026-02-14 11:22:57 +01:00
  • fbdac5119c metal : fix ACC op (llama/19427) Georgi Gerganov 2026-02-14 09:54:03 +02:00
  • cc448def01 vulkan: support L2_NORM with contiguous rows (llama/19604) Jeff Bolz 2026-02-13 21:42:04 -08:00
  • 197e9ab6eb vulkan: support GGML_OP_SET (llama/19584) Jeff Bolz 2026-02-13 21:36:38 -08:00
  • fc6bbab817 vulkan: Add vendor id for Qualcomm drivers (llama/19569) Sophon 2026-02-14 13:29:17 +08:00
  • e6476d4c12 hexagon: further optimizations and refactoring for flash attention (llama/19583) Max Krasnyansky 2026-02-13 16:27:30 -08:00
  • ec57bf407c vulkan: restore -inf check in FA shaders (llama/19582) Jeff Bolz 2026-02-13 11:35:29 -08:00
  • e8a25654b2 Fix wrong memcpy length for block_interleave == 4 (llama/19575) Alberto Cabrera Pérez 2026-02-13 12:32:14 +00:00
  • 628b545b7e fix vulkan ggml_acc only works in 3d but not 4d (llama/19426) ymcki 2026-02-13 20:31:37 +08:00
  • 58e3d5a42d CUDA: loop over ne2*ne3 in case it overflows (llama/19538) Aman Gupta 2026-02-13 17:01:40 +05:30
  • 3eb4905af1 CUDA: Do not mutate cgraph for fused ADDs (llama/19566) Oliver Simons 2026-02-13 10:37:55 +01:00
  • 0e94faa19c metal : improve concurrency (llama/19555) Georgi Gerganov 2026-02-13 07:35:57 +02:00
  • c5325e50fc metal : support GGML_OP_SET (llama/19548) Georgi Gerganov 2026-02-13 07:34:52 +02:00
  • 195af60a8b hexagon: fix typo in vtcm_needs_release (llama/19545) Shupei Fan 2026-02-13 07:07:49 +08:00
  • 9f87eeccdf opencl: add basic support for q4_1 (llama/19534) lhez 2026-02-12 14:52:37 -08:00
  • d8e3e2ef08 metal : update sum_rows kernel to support float4 (llama/19524) Georgi Gerganov 2026-02-12 11:35:28 +02:00
  • 39b5f414a3 Add a workaround for compilation with ROCWMMA_FATTN and gfx9 (llama/19461) Mario Limonciello 2026-02-12 02:38:35 -06:00
  • 304205679c hexagon: further optimization and tuning of matmul and dot kernels (llama/19407) Max Krasnyansky 2026-02-11 23:04:27 -08:00
  • 0326fd37dd opencl: add general Q6_K mm and Q4_K mv (llama/19347) lhez 2026-02-11 10:33:13 -08:00
  • f3e78985be ggml : unary ops support non-cont src0 + metal F16 unary ops (llama/19511) Georgi Gerganov 2026-02-11 18:58:43 +02:00
  • 3ffa1fd84e metal : extend l2_norm support for non-cont src0 (llama/19502) Georgi Gerganov 2026-02-11 14:53:19 +02:00
  • 09587ceb12 hexagon: Add ARGSORT, DIV, SQR, SQRT, SUM_ROWS, GEGLU (llama/19406) Max Krasnyansky 2026-02-10 23:21:12 -08:00
  • 3504358056 ggml : extend bin bcast for permuted src1 (llama/19484) Georgi Gerganov 2026-02-11 07:52:00 +02:00
  • de949fb1db metal : consolidate unary ops (llama/19490) Georgi Gerganov 2026-02-11 07:51:12 +02:00
  • 57c620b4b1 CUDA : Update CCCL-tag for 3.2 to final release from RC (llama/19486) Oliver Simons 2026-02-10 22:31:19 +01:00
  • 562255fd77 Plug memory leaks and free resources on shutdown (llama/19315) Nikhil Jain 2026-02-10 08:04:00 -08:00
  • d77265c818 ggml-cpu: arm64: q6_K repack gemm and gemv (and generic) implementations (dotprod) (llama/19360) Alberto Cabrera Pérez 2026-02-10 10:47:45 +00:00
  • b0fe2e84fa ggml : use noexcept overload for is_regular_file in backend registration (llama/19452) k4ss4n 2026-02-10 10:57:48 +01:00
  • 2de2fc9270 CANN: Remove unnecessary wrapper for gml_backend_buft_is_cann (llama/18968) Raul Torres 2026-02-10 06:19:30 +00:00
  • 6a74f56212 CANN: implement quantized MUL_MAT_ID for MoE models (llama/19228) hipudding 2026-02-10 14:18:59 +08:00
  • a36210c836 cuda : extend GGML_OP_PAD to work with non-cont src0 (llama/19429) Georgi Gerganov 2026-02-10 08:07:16 +02:00
  • 808904277e CUDA: Fix non-contig rope (llama/19338) Oliver Simons 2026-02-08 14:12:51 +01:00
  • 764482c317 ci: add vulkan docker image (#3644) Nuno 2026-02-09 11:33:06 +01:00
  • 052066c4f7 chore: Update outdated GitHub Actions versions (#3646) Pádraic Slattery 2026-02-09 11:32:46 +01:00
  • 525be69a66 cmake: Drop obsolete build-time configuration of backends (#3649) Christian Kastner 2026-02-09 11:32:18 +01:00
  • eb27fa2252 server : fix hardcoded /inference path in default HTML page (#3639) Sid Mohan 2026-02-09 00:10:13 -08:00
  • 193f7cdaaf ci : try fix mirrors (#3655) Georgi Gerganov 2026-02-09 09:59:22 +02:00
  • 4b23ff249e talk-llama : sync llama.cpp Georgi Gerganov 2026-02-07 10:39:43 +02:00
  • b0e81c1a2e sync : ggml Georgi Gerganov 2026-02-07 10:38:22 +02:00
  • 55d7cb2e93 metal : consolidate bin kernels (llama/19390) Georgi Gerganov 2026-02-07 10:35:56 +02:00
  • a9a0a51fba metal : fix event synchronization in cpy_tensor_async (llama/19402) Georgi Gerganov 2026-02-07 07:37:15 +02:00
  • 1739af663a ggml-webgpu: JIT compile binary operators and handle binding overlaps (llama/19310) Abhijit Ramesh 2026-02-06 10:33:30 -08:00
  • f2f7320817 sycl: add F16 support for GGML_OP_CEIL (llama/19306) Nechama Krashinski 2026-02-06 17:13:44 +02:00
  • cea22b3075 vulkan: For coopmat2 FA, use fp16 accumulators for the final result (llama/19376) Jeff Bolz 2026-02-06 02:15:13 -06:00
  • c1b63354bb vulkan: make FA mask/softcap enables spec constants (llama/19309) Jeff Bolz 2026-02-06 01:49:58 -06:00
  • 776cf61857 metal : skip loading all-zero mask (llama/19337) Georgi Gerganov 2026-02-06 09:25:11 +02:00
  • 2a7d5490f1 cuda : cuda graphs now compare all node params (llama/19383) Georgi Gerganov 2026-02-06 07:55:06 +02:00
  • 34d332aca5 metal : adaptive CPU/GPU interleave based on number of nodes (llama/19369) Georgi Gerganov 2026-02-05 19:07:22 +02:00
  • a567c140a3 vulkan: Preprocess FA mask to detect all-neg-inf and all-zero. (llama/19281) Jeff Bolz 2026-02-05 09:26:38 -06:00
  • 0781df2518 metal : add diag (llama/19330) Georgi Gerganov 2026-02-05 10:08:45 +02:00
  • 932def3198 vulkan: fix GPU deduplication logic. (llama/19222) Oleksandr Kuvshynov 2026-02-05 03:06:59 -05:00