Commit Graph

  • a8d02735f7 vulkan: Replace deprecated VK_EXT_validation_features (llama/17637) Masato Nakasaka 2025-12-06 14:39:42 +09:00
  • 191e5f46a2 vulkan: Fix mismatch in TOPK_MOE unit test (llama/17541) Masato Nakasaka 2025-12-06 14:23:30 +09:00
  • 64a3f573e0 vulkan: add more num_blocks instantiations in rms_norm (llama/17701) Jeff Bolz 2025-12-05 15:08:56 -06:00
  • 0484147ab2 vulkan: fix top_k bug when there are ties in the input (llama/17659) Jeff Bolz 2025-12-05 15:03:19 -06:00
  • 0b53759b29 vulkan : support conv-2d with large output size (llama/17685) Acly 2025-12-05 21:46:39 +01:00
  • 23984be4da ggml webgpu: unary op support, code refactoring, ops support (llama/17764) Reese Levine 2025-12-05 12:25:51 -08:00
  • 7e97d3b069 vulkan: enable mmvq for q2_k on NVIDIA (llama/17675) Jeff Bolz 2025-12-05 14:21:57 -06:00
  • 32ba1ec8e0 vulkan: set all memory allocations to high priority (llama/17624) Jeff Bolz 2025-12-05 14:21:04 -06:00
  • aefcd75f4f rpc : fix alloc size logic (llama/17116) Georgi Gerganov 2025-12-05 19:39:04 +02:00
  • 322903fa67 metal : add residency sets keep-alive heartbeat (llama/17766) Georgi Gerganov 2025-12-05 19:38:54 +02:00
  • 4170159dcd HIP : fix RDNA4 build (llama/17792) Johannes Gäßler 2025-12-05 13:47:52 +01:00
  • d30b744047 Q4/Q8 Tiled Gemm Optimization. (llama/16999) shalinib-ibm 2025-12-05 17:11:51 +05:30
  • 14502d6561 CUDA: fix FA VKQ accumulator overflow (llama/17746) Johannes Gäßler 2025-12-05 09:18:10 +01:00
  • e3f3c6ead1 HIP: enable WMMA-MMQ INT kernels for RDNA 3 (llama/17576) Jiacheng (Jason) Chen 2025-12-05 03:17:37 -05:00
  • 8d44d6181a Add support for CUMSUM and TRI for CUDA. (llama/17584) Piotr Wilkin (ilintar) 2025-12-04 22:19:51 +01:00
  • 8902c9d976 metal: TRI, FILL, EXPM1, SOFTPLUS (llama/16623) Gabe Goodhart 2025-12-04 10:12:19 -07:00
  • f96ebc92d2 ggml-cpu : remove asserts always evaluating to false (llama/17728) Alberto Cabrera Pérez 2025-12-04 12:16:38 +00:00
  • 194d016456 metal : use params per pipeline instance (llama/17739) Georgi Gerganov 2025-12-04 10:34:11 +02:00
  • 92e50155c9 build : move _WIN32_WINNT definition to headers (llama/17736) Adrien Gallouët 2025-12-04 07:04:02 +01:00
  • 3794a0d3b6 ggml-cpu: remove duplicate conditional check 'iid' (llama/17650) Herman Semenoff 2025-12-04 00:03:19 +03:00
  • 7adbcafb6c CUDA: generalized (mma) FA, add Volta support (llama/17505) Johannes Gäßler 2025-12-03 16:57:05 +01:00
  • 4a00f2e3a4 metal : fix data race in pipeline library (llama/17731) Georgi Gerganov 2025-12-03 14:03:40 +02:00
  • d263bdbfb6 ggml webgpu: add support for emscripten builds (llama/17184) Reese Levine 2025-12-03 01:25:34 -08:00
  • 86cb5ab93f vulkan: Reduce temporary memory usage for TOP_K (llama/17623) Jeff Bolz 2025-12-02 12:22:04 -06:00
  • fffdf679d4 cmake : add utf8 compilation options for msvc (llama/17682) xiaobing318 2025-12-03 01:50:57 +08:00
  • 16688c6d2c ggml : use svcntb() for SVE vector length detection (llama/17474) Adrien Gallouët 2025-12-02 17:21:11 +01:00
  • a64d46a529 CANN: Disable Ger operator of OUT_PROD on 310p device (llama/17563) TianHao324 2025-12-02 20:35:23 +08:00
  • 201b910743 ggml : remove redundant n_copies check when setting input/output (llama/17612) Daniel Bevenius 2025-12-02 12:52:45 +01:00
  • e2537b4af3 ggml : add fallback definition for HWCAP2_SVE2 (llama/17683) Adrien Gallouët 2025-12-02 09:41:26 +01:00
  • 4c89232b5c ggml-cuda: reorder only relevant nodes (llama/17639) Aman Gupta 2025-12-02 12:36:31 +08:00
  • 26732d28c4 enhance argsort for UT (llama/17573) Neo Zhang Jianyu 2025-12-02 08:56:46 +08:00
  • 32090930f7 metal : add FA head size 48 (llama/17619) Georgi Gerganov 2025-12-01 12:49:53 +02:00
  • 7cd3de89bf ggml : extend the GGML_SCHED_NO_REALLOC debug logic of the scheduler (llama/17617) Georgi Gerganov 2025-12-01 12:49:33 +02:00
  • 6cc2d0534f llama-graph: avoid expand_forward for fusion (llama/17633) Aman Gupta 2025-12-01 17:12:48 +08:00
  • 0defeee679 model: LFM2-VL fixes (llama/17577) Tarek Dakhran 2025-11-30 21:57:31 +01:00
  • 706647202e ggml: fix: macOS build with -DGGML_BACKEND_DL=ON (llama/17581) Gilad S. 2025-11-30 04:00:59 +02:00
  • e68ee6e281 CUDA: add stream-based concurrency (llama/16991) Aman Gupta 2025-11-30 08:17:55 +08:00
  • 2e4a7a21fa cuda : add error checking for cudaMemcpyAsync in argsort (llama/17599) Mahekk Shaikh 2025-11-29 19:16:28 -05:00
  • 2258930c2e vulkan : fix FA mask load with bounds check (coopmat2) (llama/17606) Acly 2025-11-30 01:03:21 +01:00
  • a3459484bf sycl : support allocating more than 4GB of device memory, update the doc and script (llama/17566) Neo Zhang 2025-11-29 20:59:44 +08:00
  • 28dff06555 ggml: replace hwcap with riscv_hwprobe for RVV detection (llama/17567) ixgbe 2025-11-29 20:56:31 +08:00
  • 2fcc0a3a9f Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (llama/16900) Ruben Ortlam 2025-11-29 09:37:22 +01:00
  • dbf8766ffa vulkan: improve topk perf for large k, fix overflow in unit tests (llama/17582) Jeff Bolz 2025-11-29 01:39:57 -06:00
  • 463003e76c ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (llama/17276) Diego Devesa 2025-11-28 07:33:23 -08:00
  • c372bdbb3c enable fp16/fast_fp16/bf16_mma on PH1 (llama/17551) R0CKSTAR 2025-11-28 21:08:29 +08:00
  • 90ca4e0a07 ggml-cuda: add stricter checking for fusion (llama/17568) Aman Gupta 2025-11-28 20:34:51 +08:00
  • 43441ff58a model : Qwen3 Next (llama/16095) Piotr Wilkin (ilintar) 2025-11-28 12:02:56 +01:00
  • 37e4c2ed3a CUDA: no FP16 arithmetic for vector FA kernel (llama/17558) Johannes Gäßler 2025-11-28 10:29:09 +01:00
  • 7a20963140 vulkan: Implement GGML_OP_TRI (llama/17503) Jeff Bolz 2025-11-28 03:07:29 -06:00
  • d26d1c8b85 rpc : cache and reuse compute graphs (llama/15405) Radoslav Gerganov 2025-11-28 10:33:51 +02:00
  • f92d542d4d HIP: enable mul_mat_f for RDNA4 (llama/17437) yulo 2025-11-28 15:24:30 +08:00
  • 51e842d106 SOLVE_TRI CUDA kernel for small matrices (llama/17457) Piotr Wilkin (ilintar) 2025-11-28 05:15:32 +01:00
  • 93bc8dc5a8 refactor pad_reflect_1d to make the UT case pass (llama/17204) Neo Zhang Jianyu 2025-11-28 08:50:56 +08:00
  • 3727a36c48 vulkan: Implement SOLVE_TRI (llama/17486) Jeff Bolz 2025-11-27 08:48:00 -06:00
  • e682af7886 cuda : fix UMA detection on discrete GPUs. (llama/17537) matt23654 2025-11-27 11:35:35 +00:00
  • 93f6cdb9c0 ggml-cpu: aarch64: q4_K repack gemm and gemv implementations (dotprod only) (llama/17494) Alberto Cabrera Pérez 2025-11-27 11:25:14 +00:00
  • ac92424b59 vulkan : move contiguous checks to device_supports_op (llama/17490) Acly 2025-11-27 06:54:19 +01:00
  • 310db24fca vulkan: use a fixed 1KB buffer for the add_rms_fusion opt (llama/17514) Jeff Bolz 2025-11-26 23:32:30 -06:00
  • 74ef5dd1a9 opencl: add sqr, sqrt, mean and ssm_conv (llama/17476) lhez 2025-11-26 13:29:58 -08:00
  • 3de4372465 Fix chunks being too small with small matrix sizes (llama/17526) Alberto Cabrera Pérez 2025-11-26 21:14:54 +00:00
  • c8050e5fdc vulkan: allow graph_optimize for prompt processing workloads (llama/17475) Jeff Bolz 2025-11-26 09:46:33 -06:00
  • d8b61e05f8 vulkan: Implement top-k (llama/17418) Jeff Bolz 2025-11-26 09:45:43 -06:00
  • fb31a19797 ggml-cpu : add RISC-V Zvfh impl for ggml_vec_mad_f16 (llama/17448) xctan 2025-11-26 21:33:05 +08:00
  • 8e3560c7ce ggml : fix ARM feature verification (llama/17519) Adrien Gallouët 2025-11-26 14:14:41 +01:00
  • bb7223da8a HIP: Patch failed testcase in WMMA-MMQ kernels for RDNA 4 (llama/17502) Jiacheng (Jason) Chen 2025-11-26 05:18:48 -05:00
  • f0c54d47e1 CANN: Add MROPE and IMROPE support (llama/17401) hipudding 2025-11-26 16:44:19 +08:00
  • 208450048c vulkan: Implement GGML_OP_CUMSUM (llama/17479) Jeff Bolz 2025-11-26 00:08:10 -06:00
  • 968db8bcfa ggml : add ggml_top_k (llama/17365) Georgi Gerganov 2025-11-25 15:31:43 +02:00
  • e00bb753d6 CANN: supports out_prod operator for F32 and F16 (llama/17406) TianHao324 2025-11-25 17:39:06 +08:00
  • 273e4fe7ae vulkan: Use fewer rows for scalar FA when HS is not a multiple of 16 (llama/17455) Jeff Bolz 2025-11-25 00:11:27 -06:00
  • 553d57a4e7 vulkan: more FA details in vk_perf_logger (llama/17443) Jeff Bolz 2025-11-24 15:25:24 -06:00
  • 371a21865a HIP: WMMA-MMQ kernels for RDNA 4 (llama/17156) Jiacheng (Jason) Chen 2025-11-24 14:00:10 -05:00
  • f4ede89d24 ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (llama/16739) Alberto Cabrera Pérez 2025-11-24 11:08:11 +00:00
  • faf37ffe76 ggml: add RISC-V cpu-feats (llama/17461) ixgbe 2025-11-24 19:07:14 +08:00
  • 77d874b1c3 hexagon: add support for ROPE_NEOX (llama/17458) Max Krasnyansky 2025-11-23 18:55:56 -08:00
  • 5ed0ddc458 CANN: Define cann_graph_update_required before macro (llama/17434) Raul Torres 2025-11-24 02:02:52 +00:00
  • 75cea7f8be ggml-hexagon: Initial Hexagon v68/v69 support (llama/17394) M. Mediouni 2025-11-24 01:54:49 +01:00
  • 621cb871b3 ggml-hexagon: add hex_supported_buffer for better buffer supported check (llama/17212) nullname 2025-11-24 06:26:36 +08:00
  • 61e0b7ed48 cuda : support non-contiguous i32 to i32 copy (llama/17326) Sigbjørn Skjæret 2025-11-23 11:13:34 +01:00
  • deb4958add vulkan: remove a couple unnecessary switches (llama/17419) Jeff Bolz 2025-11-22 23:29:40 -06:00
  • fc6eae781d HIP: RDNA4 tensor core support for MMF (llama/17077) yulo 2025-11-22 07:03:24 +08:00
  • 5c0e4a9cc5 opencl: refine condition for kqv mm (llama/17392) lhez 2025-11-21 14:34:48 -08:00
  • cdc1a776be vulkan: disable async for older Intel devices (llama/17369) Jeff Bolz 2025-11-21 02:58:17 -06:00
  • a009dc172c CANN: Refactor evaluate_and_capture_cann_graph (llama/17333) Raul Torres 2025-11-21 08:23:29 +00:00
  • cb3ee1b098 ggml-hexagon: fix swiglu failure at test-backend-ops (llama/17344) nullname 2025-11-21 07:45:05 +08:00
  • 46f893c2fa ggml : Fix transposed SOLVE_TRI result (llama/17323) Piotr Wilkin (ilintar) 2025-11-20 11:58:21 +01:00
  • 510805e6c1 DGX Spark: UMA support (llama/17368) Scott Fudally 2025-11-20 02:32:02 -08:00
  • 2f20938b58 ggml : remove useless and error-prone variadic macros (llama/17399) Adrien Gallouët 2025-11-20 11:18:27 +01:00
  • 51f5438089 kleidiai: fix zero-size array declaration (llama/17240) sudhiarm 2025-11-20 09:45:49 +00:00
  • 1d3a525001 ggml-cpu: add RISC-V RVV (Zvfh) optimization for FP16 vector scaling (llama/17314) ixgbe 2025-11-20 14:09:18 +08:00
  • 24b14cad87 vulkan: implement ADD1, ARANGE, FILL, SOFTPLUS, STEP, ROUND, CEIL, FLOOR, TRUNC (llama/17319) Giuseppe Scrivano 2025-11-19 17:29:45 +01:00
  • 95d0b0b0cf vulkan: support larger argsort (llama/17313) Jeff Bolz 2025-11-19 10:25:50 -06:00
  • ae8865c6e6 vulkan: Add copy_transpose shader (llama/17371) Jeff Bolz 2025-11-19 09:50:43 -06:00
  • 73d396826b cuda: fix rope fusion for gemma3 (llama/17378) Aman Gupta 2025-11-19 18:25:05 +08:00
  • 746cbed20a Fix too relaxed check on CUDA "fast copy" (can_be_transposed) condition (llama/17332) Piotr Wilkin (ilintar) 2025-11-19 10:36:33 +01:00
  • 2097a9c1bd vulkan: force full subgroups for flash attention to fix intel subgroup crash (llama/17356) Ruben Ortlam 2025-11-19 08:46:26 +01:00
  • 27c69271c5 ggml-cpu: Don't pass -mpowerpc64 when -mcpu already implies it (llama/17308) Jeremy Rand 2025-11-19 06:19:00 +00:00
  • c137d11b81 CANN: fix acl_tensor_ptr usage in ASCEND_310P ROPE (llama/17347) Chenguang Li 2025-11-18 16:41:52 +08:00
  • 24b981eff7 vulkan: support noncontig i32 copy (llama/17328) Jeff Bolz 2025-11-18 00:41:24 -06:00
  • b7dfced37f vulkan: add log RTE support to fix Nvidia CI (llama/17320) Ruben Ortlam 2025-11-17 21:37:49 +01:00