Commit Graph

  • 9e429c47e1 cmake : fix ARM feature verification (llama/17170) Adrien Gallouët 2025-11-17 21:37:29 +01:00
  • bb88c2545f ggml : add missing AVX512 feature checks (llama/17270) Adrien Gallouët 2025-11-17 12:12:00 +01:00
  • 418314941e ggml : remove dirty flag from version string (ggml/1391) Daniel Bevenius 2025-11-24 12:51:50 +01:00
  • 9f5ed26e43 go : Enable VAD for Go bindings (#3563) Josh Montoya 2025-12-10 03:31:36 -08:00
  • a8f45ab11d go : reset context.n in Process() (#3503) Josh Montoya 2025-12-08 08:33:07 -08:00
  • a88b93f85f vad : fix buffer overflow in sample reduction loop (#3558) Joseph Sellers 2025-12-06 11:28:32 +00:00
  • d566358a1d tests : update VAD tests to use Silero V6.2.0 (#3534) Daniel Bevenius 2025-12-06 10:58:58 +01:00
  • 19ceec8eac examples : fix typo in vad-speech-segments command [no ci] (#3535) Daniel Bevenius 2025-11-20 13:35:11 +01:00
  • 40e788a5d1 readme : minor (#3516) gzq 2025-11-20 19:57:55 +08:00
  • 961aec7384 metal : fix compile on macos 11 (#3533) YangLe 2025-11-20 19:54:54 +08:00
  • 7d79ef9fb0 Initial plan copilot/add-duplicate-text-removal copilot-swe-agent[bot] 2025-11-18 10:37:04 +00:00
  • b12abefa9b sync : llama.cpp Georgi Gerganov 2025-11-17 16:31:08 +02:00
  • 0e5deca8e2 sync : ggml Georgi Gerganov 2025-11-17 16:26:39 +02:00
  • 661567357c metal : support I32 -> I32 copy (llama/17317) Georgi Gerganov 2025-11-17 11:52:00 +02:00
  • 74bb8a8b23 metal : faster argsort (llama/17315) Georgi Gerganov 2025-11-17 11:51:48 +02:00
  • 57c0e6f8b6 metal : add cumsum (llama/17305) Georgi Gerganov 2025-11-17 11:51:13 +02:00
  • d3f5487464 CANN: Use smart pointers to manage ACL objects (llama/17238) hipudding 2025-11-17 08:43:59 +08:00
  • 9d95d9a1ee vulkan: add LOG operation support for F32 and F16 (llama/17183) Pavels Zaicenkovs 2025-11-16 22:50:09 +01:00
  • f571655e8e vulkan: fix MMQ quantize_y condition (llama/17301) Ruben Ortlam 2025-11-16 19:38:17 +01:00
  • 9549cc1051 metal : remove obsolete asserts (llama/17295) Georgi Gerganov 2025-11-16 09:50:26 +02:00
  • a75525cad0 opencl: fix rms_norm_mul (llama/17250) lhez 2025-11-15 17:40:14 -08:00
  • c78845bfa9 opencl: add kernel to handle mat mul in attention to improve encoding speed (llama/17181) shaofeiqi 2025-11-15 17:33:10 -08:00
  • 1fd63da9f2 sycl : unify unary kernels with a generic implementation and enable wide operator support (llama/17213) shani-f 2025-11-16 01:52:42 +02:00
  • ea3ebd8b0d vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (llama/17287) Jeff Bolz 2025-11-15 12:54:23 -06:00
  • 7caea54450 vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (llama/17285) Ruben Ortlam 2025-11-15 15:18:58 +01:00
  • 4c4e663da0 vulkan: implement ABS and NEG (llama/17245) Giuseppe Scrivano 2025-11-15 12:00:29 +01:00
  • e1846fc599 vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (llama/17244) Jeff Bolz 2025-11-15 04:56:15 -06:00
  • 9614a56314 vulkan: skip all-negative-inf blocks in FA (llama/17186) Jeff Bolz 2025-11-15 03:37:25 -06:00
  • 37d4bba152 vulkan: change graph_compute to be async and enable get_tensor_async (llama/17158) Jeff Bolz 2025-11-15 02:06:41 -06:00
  • 523a6c27ea metal : support argsort for ne00 > 1024 (llama/17247) Georgi Gerganov 2025-11-14 09:36:06 +02:00
  • b4d7df3ba2 metal : make the FA extra sizes consistent (llama/17143) Georgi Gerganov 2025-11-14 09:13:34 +02:00
  • a81fbfc78e ggml-cpu: handle 3d tensors in repack mat_mul (llama/17241) Alberto Cabrera Pérez 2025-11-13 20:53:00 +00:00
  • 3e684f26c1 ggml : add ops SOFTPLUS, EXPM1, TRI, SOLVE_TRI, CUMSUM (llama/17063) Piotr Wilkin (ilintar) 2025-11-13 19:54:47 +01:00
  • e8e0004fe5 vulkan: remove shell call from vulkan-shaders-gen tool, revert file check (llama/17219) Ruben Ortlam 2025-11-13 14:51:21 +01:00
  • 210f0f860b sched : fix reserve ignoring user tensor assignments (llama/17232) Diego Devesa 2025-11-13 04:14:02 -08:00
  • 91fa5b5cac ggml-cpu : add RISC-V vector intrinsic support for silu and cvar operations (llama/17227) ixgbe 2025-11-13 20:13:32 +08:00
  • 265d326fa8 metal: accelerated conv2d (llama/17175) bagheera 2025-11-13 05:32:44 -06:00
  • 6a1d830dfd Revert "ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030)" (llama/17233) Georgi Gerganov 2025-11-13 12:59:37 +02:00
  • 6a91780c3b ggml-cpu : use template for argsort (llama/17222) Diego Devesa 2025-11-13 00:59:05 -08:00
  • 726912d1cb CANN: Add cross_entropy_loss op support (llama/16886) TecJesh 2025-11-13 09:39:51 +08:00
  • 84275fc493 CUDA: fuse rope + set_rows (llama/16884) Aman Gupta 2025-11-13 08:50:01 +08:00
  • 566c4c4469 CUDA: static assert to prevent misuse of memcpy_1 (llama/17198) Johannes Gäßler 2025-11-12 23:13:55 +01:00
  • 3810a6180b ggml : use std::sort in ggml_argsort CPU implementation (llama/17211) Georgi Gerganov 2025-11-12 20:43:38 +02:00
  • 7df8515824 ggml-cpu: handle 3d tensors in repack mat_mul (llama/17030) Alberto Cabrera Pérez 2025-11-12 12:52:19 +00:00
  • e8b66d9f94 CANN: Add L2_NORM op support (llama/16856) TecJesh 2025-11-12 15:11:42 +08:00
  • 8388350c66 fix ci crash about SSM_CONV (llama/17169) Neo Zhang Jianyu 2025-11-12 14:44:29 +08:00
  • 6748d27f55 hexagon: various Op fixes (llama/17135) Max Krasnyansky 2025-11-11 15:25:04 -08:00
  • 559091005a disable rms norm mul rope for chips with no fp16 rte (llama/17134) Eve 2025-11-11 18:53:30 +00:00
  • cd8f64d1b5 ggml-cpu : add RISC-V RVV (Zvfh) optimization for FP16 to FP32 conversion (llama/17161) ixgbe 2025-11-11 19:41:51 +08:00
  • 1cefb03571 ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (llama/16805) duduta 2025-11-11 13:33:24 +02:00
  • 3920ecce3a kleidiai: add optimized per-channel kernels for Q8_0 (llama/16993) Charles Xu 2025-11-11 12:20:31 +01:00
  • c01bf73dd1 cmake : add version to all shared object files (llama/17091) Mike Abbott 2025-11-11 04:19:50 -07:00
  • 46615d74d3 opencl: add fastdiv and use it in set_rows, ported from cuda (llama/17090) lhez 2025-11-10 15:00:13 -08:00
  • ccf525baf0 cpu: skip NOPs to avoid barriers (llama/17133) Max Krasnyansky 2025-11-10 12:44:49 -08:00
  • 40aebfe8bf metal : cap threadgroups size of set_rows (llama/17146) Georgi Gerganov 2025-11-10 21:33:35 +02:00
  • 86be60093e ggml-cpu : inspect -march and -mcpu to find the CPU (llama/16333) Adrien Gallouët 2025-11-10 20:03:36 +01:00
  • ef71d83b76 vulkan: check glslc executable string (llama/17144) Ruben Ortlam 2025-11-10 16:59:26 +01:00
  • 43f2c1ff54 vulkan: fix validation issue introduced by #16868 (llama/17145) Ruben Ortlam 2025-11-10 16:59:10 +01:00
  • bb92c79f56 metal : enable tensor API for A19 (llama/17087) Georgi Gerganov 2025-11-10 15:38:42 +02:00
  • 4fea91f06e arm64: add i8mm route with SVE ggml_vec_dot_q4_K_q8_K and ggml_vec_dot_q6_K_… (#15277) fj-y-saito 2025-11-10 22:12:59 +09:00
  • 58a97d988f cuda/vulkan : bicubic interpolation (llama/17022) Acly 2025-11-10 10:19:39 +01:00
  • 2e04e7a906 vulkan: fix memory allocations (llama/17122) Ruben Ortlam 2025-11-09 16:14:41 +01:00
  • 27f485a14c vad : Silero VAD v6.2.0 (#3524) KITAITI Makoto 2025-11-17 22:26:17 +09:00
  • d9b7613b34 ruby : VAD separately from ASR (#3518) KITAITI Makoto 2025-11-13 10:15:26 +09:00
  • a1867e0dad sync : llama.cpp Georgi Gerganov 2025-11-09 22:01:21 +02:00
  • e67dfbc51b sync : ggml Georgi Gerganov 2025-11-09 18:49:56 +02:00
  • 1993e397bb vulkan: iGPU memory reporting fix (llama/17110) Ruben Ortlam 2025-11-09 09:54:47 +01:00
  • ee8349cf10 vulkan: fix mmq out of bounds reads (llama/17108) Ruben Ortlam 2025-11-09 09:52:57 +01:00
  • db98e8c5b4 vulkan: fuse mul_mat_id + mul (llama/17095) Jeff Bolz 2025-11-09 02:48:42 -06:00
  • a4339e2ea7 metal : retain src and dst buffers during async ops (llama/17101) Georgi Gerganov 2025-11-09 08:28:51 +02:00
  • 6de3404773 vulkan: Use spec constants for conv2d s/d/p and kernel W/H (llama/16978) Jeff Bolz 2025-11-08 13:24:29 -06:00
  • 8967c9ad9b Revert "CUDA: add expert reduce kernel (ggml/16857)" (llama/17100) Aman Gupta 2025-11-08 21:05:19 +08:00
  • 522b9bce33 CUDA: skip fusion for repeating adds in bias (llama/17080) Aman Gupta 2025-11-08 16:58:05 +08:00
  • 0caa32c772 vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (llama/16636) SavicStefan 2025-11-08 09:28:22 +01:00
  • 3c975ad523 ggml: disable vxe for cross-compilation by default (llama/16966) Aleksei Nikiforov 2025-11-08 09:00:20 +01:00
  • 257ce2f5c0 vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (llama/16977) Jeff Bolz 2025-11-08 01:52:15 -06:00
  • 4eef518167 vulkan: Fix test-thread-safety crashes (llama/17024) Jeff Bolz 2025-11-08 01:39:45 -06:00
  • 358f77aca7 CUDA: fix MMQ stream-k fixup ne1 indices (llama/17089) Johannes Gäßler 2025-11-08 08:26:18 +01:00
  • 78ea6c5b67 ggml webgpu: faster matrix multiplication/matrix-vector multiplication (llama/17031) Reese Levine 2025-11-07 19:27:20 -08:00
  • 547724b0a5 CUDA: properly handle nb00=nb02 case for cpy (llama/17081) bssrdf 2025-11-07 17:41:58 -05:00
  • 11543bf446 vulkan : refactor buffer handling in vk_op_f32 (llama/16840) Acly 2025-11-07 21:08:50 +01:00
  • af8a88792f CUDA: fix should_use_mmvf for ne11 == 1 (llama/17085) Johannes Gäßler 2025-11-07 20:53:14 +01:00
  • a1746097bc Revert "ggml-cpu: detect correct cpu flags for arm64 (llama/16229) (#16239)" (llama/17084) Adrien Gallouët 2025-11-07 17:34:05 +01:00
  • 512592513c ggml-cpu: detect correct cpu flags for arm64 (ggml/16229) (llama/16239) iron 2025-11-08 00:18:14 +08:00
  • 5bce732795 ggml-cpu : optimize RVV q2_k and q3_k kernels (llama/16887) xctan 2025-11-07 00:12:45 +08:00
  • b5d6fa438f CUDA: fix crash on uneven context without FA (llama/16988) Johannes Gäßler 2025-11-06 14:05:47 +01:00
  • 32ed574370 metal : initial Metal4 tensor API support (llama/16634) Georgi Gerganov 2025-11-06 14:45:10 +02:00
  • 45588b272e sycl: add CONCAT operator support (llama/16047) YehuditE 2025-11-06 12:02:33 +02:00
  • b3324ae7d1 ggml-hexagon: graceful fallback for older socs where rpcmem_alloc2 and FASTRPC_GET_URI is unsupported (llama/16987) l3utterfly 2025-11-06 13:46:38 +08:00
  • 13cd906501 improve CUDA cpy memory bandwidth when copying transposed tensor (llama/16841) bssrdf 2025-11-05 15:55:04 -05:00
  • 558a04c9c7 vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle fusion (llama/16919) Jeff Bolz 2025-11-05 12:51:03 -06:00
  • e734b5d6ef ggml webgpu: minor set rows optimization (llama/16810) Reese Levine 2025-11-09 14:44:39 +02:00
  • 44e77ccee6 refactor: replace sprintf with snprintf for safer string handling in dump functions (llama/16913) nullname 2025-11-05 04:25:39 +08:00
  • 1672d41ab0 vulkan: remove the need for the dryrun (llama/16826) Jeff Bolz 2025-11-04 13:28:17 -06:00
  • 997fdde0c4 ggml-cpu : bicubic interpolation (llama/16891) Acly 2025-11-04 13:12:20 +01:00
  • 52e43a2fa5 Fix garbled output with REPACK at high thread counts (llama/16956) Noah 2025-11-04 05:04:59 +00:00
  • e51a2f90fe CUDA: avoid mul + bias fusion when doing fusion (llama/16935) Aman Gupta 2025-11-04 10:53:48 +08:00
  • f856023f46 opencl: support imrope (llama/16914) lhez 2025-11-03 11:47:57 -08:00
  • 82ede64cd0 ggml: CUDA: add head size 72 for flash-attn (llama/16962) theo77186 2025-11-03 14:29:11 +01:00
  • 79801188f7 ggml : LoongArch fixes (llama/16958) Jinyang He 2025-11-03 14:40:02 +08:00