Commit Graph

  • 2f33395197 ggml-hexagon: gelu optimization (llama/18151) Shouyu 2025-12-22 13:56:52 -05:00
  • 5b0c1c1580 llamafile: add rvv support for sgemm kernels (llama/18199) Taimur Ahmad 2025-12-22 23:20:23 +05:00
  • f2fe1e5baf opencl: unpack q4_0 for adreno in get_tensor (llama/18278) lhez 2025-12-22 10:19:01 -08:00
  • dbbe6c11b5 vulkan: Extend rope fusions to allow mrope (llama/18264) Jeff Bolz 2025-12-22 11:03:13 -06:00
  • 98e59a43d1 vulkan: Implement set_tensor_async and the event interfaces (llama/18047) Jeff Bolz 2025-12-21 14:52:09 -06:00
  • b68b12f2d5 llama: fix RPC for -fit on (llama/18233) Johannes Gäßler 2025-12-21 19:33:08 +01:00
  • b893e0813a vulkan: fix im2col overflowing maxworkgroupcount (llama/18180) Jeff Bolz 2025-12-21 03:32:58 -06:00
  • f407c5e562 vulkan/cuda: fix topk_moe with exp_probs_b (llama/18071) Jeff Bolz 2025-12-21 03:27:34 -06:00
  • ad6ee3865d vulkan: support GGML_UNARY_OP_XIELU (llama/18062) Jeff Bolz 2025-12-21 03:17:58 -06:00
  • 3cd141f1a9 vulkan: in graph_optimize, try to group ADD operations (llama/18060) Jeff Bolz 2025-12-21 03:05:08 -06:00
  • 449fc7c024 Vulkan: some improvement on mul_mat_iq2_xs (llama/18031) lovedheart 2025-12-21 09:59:52 +01:00
  • 0983985f06 Added comments explaining thread block size selection logic based on row count and column size, derived from historical commit context (llama/18212) Aadeshveer Singh 2025-12-20 16:58:57 +05:30
  • 17a4cb15b8 ggml-hexagon: Implement true Q8_0 quantization on Hexagon NPU for more accurate mixed-precision matmul operations (llama/17977) Alfred 2025-12-19 12:42:28 -05:00
  • 195d8d0c65 vulkan: Add perf logger mode with concurrency (llama/17944) Jeff Bolz 2025-12-18 23:36:46 -06:00
  • fea481f412 model : add ASR support for LFM2-Audio-1.5B (conformer) (llama/18106) Xuan-Son Nguyen 2025-12-19 00:18:01 +01:00
  • 956fac433b ggml-cpu: extend support for RVV floating-point kernels (llama/17318) Taimur Ahmad 2025-12-18 19:02:09 +05:00
  • 325a9b739c remove i_major_dual (llama/18157) yulo 2025-12-18 19:50:56 +08:00
  • c3a16089e3 ggml-hexagon: swiglu_oai operation (llama/18114) Shouyu 2025-12-17 16:38:21 -05:00
  • c7ccedb5ba ggml-hexagon: gelu operation (llama/17921) Shouyu 2025-12-17 13:39:32 -05:00
  • 1f72f00542 ggml-cpu: ARM64: repack version of q8_0 (dotprod and i8mm) (llama/18096) Alberto Cabrera Pérez 2025-12-17 11:39:13 +00:00
  • 9118c05dc4 HIP: Refactor mma for RDNA and CDNA (llama/17990) yulo 2025-12-17 16:34:54 +08:00
  • 6114e69213 ruby : add Whisper::Token, fix model URI (#3575) KITAITI Makoto 2025-12-24 16:52:16 +09:00
  • 6c22e792cb talk-llama : sync llama.cpp Georgi Gerganov 2025-12-17 15:20:22 +02:00
  • 698348aadc sync : ggml Georgi Gerganov 2025-12-17 15:19:57 +02:00
  • 00108bb713 llama.android : Rewrite Android binding (w/o cpu_features dep) (llama/17413) Naco Siren 2025-12-17 00:14:47 -08:00
  • 41a95b8ba7 ggml : use WARP_SIZE/2 for argmax reduction offset (llama/18092) Aadeshveer Singh 2025-12-17 09:17:01 +05:30
  • 8dd70bdc85 ggml-hexagon: mm for mtmd (llama/17894) Shouyu 2025-12-15 13:53:56 -05:00
  • b90ec07aba metal: use shared buffers on eGPU (llama/17866) Jeremy Demeule 2025-12-15 15:14:49 +01:00
  • aaf3f39b4a llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (llama/16653) Johannes Gäßler 2025-12-15 09:24:59 +01:00
  • b5e352a52f Support gpt-oss by OPs add-id, mul_mat for mxfp4, swiglu_oai (llama/17826) Neo Zhang Jianyu 2025-12-15 10:35:15 +08:00
  • 3bb4e1e0ac vulkan: fix mul_mat_vec_iq1_s formatting (llama/18026) Ruben Ortlam 2025-12-14 14:52:46 +01:00
  • af2c8cba6f vulkan: Fix data race/hang in scalar/cm1 flash attention (llama/17887) Jeff Bolz 2025-12-14 02:00:00 -06:00
  • 7e5df2975e vulkan: improve mul_mat_vec_iq1_s speed (llama/17874) lovedheart 2025-12-14 08:47:49 +01:00
  • cdadfc3b72 vulkan: faster q6_k matmul (llama/17813) Eve 2025-12-14 07:29:37 +00:00
  • b62ef9af7a ggml : arm repack fix build (llama/0) Georgi Gerganov 2025-12-13 22:54:14 +02:00
  • b901ebe4a3 vulkan: support get_rows for i32 (llama/17941) Jeff Bolz 2025-12-13 03:12:53 -06:00
  • f33446643e vulkan: support GGML_OP_DIAG (llama/17893) Jeff Bolz 2025-12-13 03:07:49 -06:00
  • 939d3085e9 vulkan: Multi-pass softmax for large number of cols (llama/17892) Jeff Bolz 2025-12-13 03:04:29 -06:00
  • 13bb296dbf vulkan: Allow non-pow2 n_experts in topk_moe (llama/17872) Jeff Bolz 2025-12-13 01:40:04 -06:00
  • feb856d4a1 CUDA: fix overflow in MMA kernel without stream-k (llama/17939) Johannes Gäßler 2025-12-12 17:43:58 +01:00
  • db1fcd958f cann : fix ops broken by circular padding guard (llama/17825) Sigbjørn Skjæret 2025-12-12 15:49:27 +01:00
  • 2c782ec325 ggml-cpu : fix RISC-V Q4_0 repack select and RVV feature reporting (llama/17951) ixgbe 2025-12-12 22:26:03 +08:00
  • 25d99e9135 HIP: enable mmf for RDNA3 (llama/17879) yulo 2025-12-12 18:34:33 +08:00
  • e0af519a61 SOLVE_TRI extension to more dimensions (llama/17793) Piotr Wilkin (ilintar) 2025-12-11 17:20:43 +01:00
  • 3e79e73eee build: link whisper target against Threads::Threads for FreeBSD support (#3568) Russ 2025-12-17 09:13:38 +00:00
  • 2551e4ce98 server: allow custom temp directory for ffmpeg (#3564) Marcos Del Sol Vives 2025-12-13 08:37:44 +01:00
  • f0c9017a2f ggml : arm repack fix build (#0) sync-ggml-25-12-12 Georgi Gerganov 2025-12-13 08:04:09 +02:00
  • 179d8b1c9c talk-llama : sync llama.cpp Georgi Gerganov 2025-12-12 17:56:43 +02:00
  • 48cdc06e91 sync : ggml Georgi Gerganov 2025-12-12 17:55:11 +02:00
  • 72714d169c whisper : adjust to ggml changes (#0) Georgi Gerganov 2025-12-12 17:54:58 +02:00
  • 324dd21d3c cmake : set CMAKE_RUNTIME_OUTPUT_DIRECTORY for non standalone build (ggml/1394) Congcong Cai 2025-12-12 22:37:38 +08:00
  • 1da1a6865c ggml-alloc : fix reuse-parent logic for misaligned sizes (llama/17884) Georgi Gerganov 2025-12-11 14:30:10 +02:00
  • 0c88de5c69 ggml-hexagon: fix rope failure at test-backend-ops (llama/17565) nullname 2025-12-11 06:45:43 +08:00
  • a2886fba48 Fix race conditions in threadpool when dealing with dynamic/frequent n_threads changes (llama/17748) Max Krasnyansky 2025-12-10 12:32:23 -08:00
  • cd9b8c6d18 ggml : remove GGML_KQ_MASK_PAD constant (llama/17910) Georgi Gerganov 2025-12-10 20:53:16 +02:00
  • ca8ea18d06 cuda : add missing support check for xielu (llama/17895) Sigbjørn Skjæret 2025-12-10 16:16:20 +01:00
  • ea1829134f CUDA: fix unpadded strides in MMA FA kernel (llama/17891) Johannes Gäßler 2025-12-10 12:39:56 +01:00
  • c10b4f9a01 fix softmax for iGPU (llama/17838) Neo Zhang Jianyu 2025-12-10 16:59:57 +08:00
  • 307dc525bb metal: SSM kernel improvements (llama/17876) Gabe Goodhart 2025-12-09 12:30:02 -07:00
  • 2817582be2 Add DIAG for CUDA (llama/17873) Piotr Wilkin (ilintar) 2025-12-09 20:28:57 +01:00
  • 41bbc034f0 ggml : Provide macos-specific backtrace printing to avoid terminal death (llama/17869) Gabe Goodhart 2025-12-09 09:29:07 -07:00
  • b6ae0b29d1 metal : print node names for debugging (llama/17882) Georgi Gerganov 2025-12-09 15:25:49 +02:00
  • ba463fb577 ggml : allow fill node alloc inplace (llama/17870) Sigbjørn Skjæret 2025-12-09 12:23:47 +01:00
  • 79d86a5c2c CANN: add support for partial RoPE and Vision mode (llama/17543) Chenguang Li 2025-12-09 17:53:23 +08:00
  • bef1f5a57e CUDA: fix FP16 overflow in tile FA kernel (llama/17875) Johannes Gäßler 2025-12-09 09:34:02 +01:00
  • 821c2071ab cuda : add FILL op support (llama/17851) Jay Zenith 2025-12-08 05:10:12 -08:00
  • e1562e85fc cuda: optimize SOLVE_TRI using registers and FMAF (llama/17703) wsbagnsv1 2025-12-08 10:41:08 +01:00
  • c8d0ee2f9f ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support (llama/17784) ixgbe 2025-12-08 16:41:34 +08:00
  • d6d44fac69 Vulkan: improve mul_mat_vec_iq1_m (llama/16907) lovedheart 2025-12-07 18:40:42 +01:00
  • 447ef8633b sycl: add missing BF16 conversion support for Intel oneAPI (llama/17780) Law Po Ying 2025-12-07 09:18:18 +08:00
  • 898f876fe2 vulkan: perf_logger improvements (llama/17672) Jeff Bolz 2025-12-06 11:46:46 -06:00
  • ebff8f9db9 ggml-zendnn : add ZenDNN backend for AMD CPUs (llama/17690) Vishal Singh 2025-12-06 21:43:33 +05:30
  • c5e1807071 ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (llama/16985) Phylliida Dev 2025-12-06 06:07:02 -08:00
  • 94be71911f HIP: fix RDNA3 FP16/BF16 matrix multiplication (llama/17817) Johannes Gäßler 2025-12-06 13:45:36 +01:00
  • b67e3abdb2 ggml : improve error handling for search path existence checks (llama/17653) Sky 2025-12-06 19:28:16 +08:00
  • c66c71e9f4 vulkan: Use one row per workgroup for f32 mmv (llama/17711) Jeff Bolz 2025-12-06 04:12:26 -06:00
  • 875d861473 vulkan: support solve_tri with larger N/K values (llama/17781) Jeff Bolz 2025-12-06 01:56:45 -06:00
  • 41cf229d72 metal : fix build (llama/17799) Georgi Gerganov 2025-12-06 09:33:59 +02:00