Davide Eynard 4cc1a5f712 llamafile reloaded (v0.10.0) (#867)
* Update llama.cpp to a44d7712 and refresh patches

* Updated apply-patches and renames

* Removed Makefile patch, added as a removal

* Refactored patches to be applied in the llama.cpp dir

* Fixed apply-patches.sh

* Updated tool_server_server.cpp.patch

* Updated Makefile to pull llama.cpp deps

* Update llama.cpp submodule with dependency submodules

* added miniaudio.h.patch

* Fixed wrong index in miniaudio patch

* Updated patches

* Updates to server.cpp

* Added cosmocc-override.cmake

* Made patches minimal

* Removed common.h patch

* Added cosmocc 4.0.2 target

* Added readme

* Updated llama.cpp to commit a44d77126c911d105f7f800c17da21b2a5b112d1

* Updated llama.cpp to commit dbc15a79672e72e0b9c1832adddf3334f5c9229c

* Updated patches for newer llama.cpp version

* Added miniaudio

* Updated patches with common/download.cpp

* Updated patches with common/download.cpp

* Added extra deps to llama.cpp setup

* Moved to using deps from the vendors folder

* Removed miniaudio from added files

* New BUILD.mk + common/chat.cpp patch

* Updated cosmocc to 4.0.2

* Piping od with awk for better compatibility
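The `od`-into-`awk` approach mentioned above can be sketched roughly as follows. The actual pipeline used by the build scripts is not shown in this log; this is an illustrative stand-in for `xxd -i`-style byte dumping, using only POSIX tools:

```shell
# Hedged sketch: emit bytes as C-style hex initializers with od + awk,
# which are more widely available than xxd.
# -An suppresses the address column, -v disables `*` line folding,
# -tu1 prints each byte as an unsigned decimal integer.
printf 'Hi' | od -An -v -tu1 \
  | awk '{ for (i = 1; i <= NF; i++) printf "0x%02x,", $i }'
# prints "0x48,0x69," -- pasteable into a C array initializer
```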

* Renamed miniaudio patch 🤦

* Updated README_0.10.0.md

* Moved llama.cpp/common ahead of other deps

* Added COSMO to server build

* Update build/config.mk

Co-authored-by: Peter Wilson <peter@mozilla.ai>

* Update build/rules.mk

Co-authored-by: Peter Wilson <peter@mozilla.ai>

* First TUI iteration (keeping original llamafile dir for comparison)

* Add comment blocks to rule file

* Integrated llama.cpp server + TUI tool in the same llamafile tool

* Code clean

* Disable ggml logging when TUI

* Updated README

* Refactored code llamafile_new -> llamafile and simplified build

* Simplified Makefile, updated README

* Fixing uncaught exception on llama.cpp server termination when running from tui

* LLAMAFILE_INCS -> INCLUDES to fix -iquote issue

* Patching common/log to fix uncaught exception

* Updated main, removed unused import, cleaned removeArgs code

* Metal support - first iteration (only works on TUI)

* Added metal support to standalone llama.cpp

* Fixed removeArgs (again)

* Workaround for segfault at exit when TUI+metal+server

* Improved logging (now sending null callback back to metal)

* Make sure standalone llama.cpp builds metal dylib if not present

* Updated README_0.10.0.md

* Fixed typo in readme

* Improved g_interrupted_exit handling on TUI

* Moved g_interrupted_exit to cover for both sigint and newline

* Improved comments around sleep+exit in server.cpp

* Improved LLAMAFILE_TUI documentation in BUILD.mk

* Made GPU init message always appear as ephemeral

* Added back missing pictures

* Improved descriptions

* Removed redundant block

* Update llama.cpp submodule to f47edb8c19199bf0ab471f80e6a4783f0b43ef81

* Removed src_llama-hparams.cpp as not needed anymore

* Updated tools_server_server-queue patch

* Updated patch for tools/server/server.cpp

* Moved patch

* Minor updates to other patches

* Updated BUILD.mk for llama.cpp

* Updated llamafile code to use new llama.cpp main/server

* Updated llama.cpp/BUILD.mk with fixes, mtmd new files, and without main/main

* Used LLAMA_EXAMPLE_CLI for TUI params

* fix(update-llama-cpp): Use `new_build_wip` as base ref.

* Adding zipalign as a submodule (#848)

* Added third_party/zipalign as submodule

* Updated build for zipalign

* Fixes to zipalign paths in Makefile

* Fixed BUILD.mk to look for zlib.h into cosmocc's third_party/zlib

* Updated creating_llamafiles.md

* Updated makefile to also compile zipalign with make -j8

* Patching ggml to fix issues arising with multimodal models

* Added reset-repo command to Makefile

* Minor fixes to Makefile

* Added more examples to creating_llamafiles.md

* TUI support for mtmd API (#852)

* fix(update-llama-cpp): Use `new_build_wip` as base ref. (#850)

* TUI support for mtmd API - first sketch

* Improved token counting (using n_pos as in llama.cpp server)

* Removed extra logging from mtmd/clip and mtmd-helper

* Fixed parsing bug in eval_string, factored out function, added tests

---------

Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>

* Added missing tests

* Fix minja segfaulting in cosmo build (#858)

* Added tests for minja regexp bug and example patch

* Built an ad-hoc test for the cosmo build

* Avoid updating it in place, do it only when success

* Updated test to use actual patched minja code

* Add cuda support (#859)

* First attempt at cuda (still buggy, runs in TUI)

* Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer

* Added cuda dep to llama.cpp's BUILD.mk

* Fixed warnings with GGML_VERSION and GGML_COMMIT

* Added cuda_cublas script, updated others

* Rocm parallel version - refactored cuda (tinyblas) and cublas scripts

* Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT

- updated build/config.mk to retrieve GGML_VERSION from
llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git
- added Makefile targets to cublas, cuda, rocm shell scripts
- updated shell scripts to get variables from env, or fall
back to reading from CMakeLists
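The env-or-fallback pattern described above might look roughly like this. The exact `set(...)` spelling in `llama.cpp/ggml/CMakeLists.txt` is an assumption for illustration:

```shell
# Hedged sketch: take GGML_VERSION from the environment if set, else
# parse it out of a CMakeLists.txt line like `set(GGML_VERSION 0.9.4)`.
read_ggml_version() {
    cmake_file="$1"
    if [ -n "${GGML_VERSION:-}" ]; then
        printf '%s\n' "$GGML_VERSION"   # environment takes precedence
    else
        sed -n 's/^set(GGML_VERSION \([^)]*\)).*/\1/p' "$cmake_file"
    fi
}

# demo against a throwaway CMakeLists fragment (version value is made up)
tmp=$(mktemp)
printf 'set(GGML_VERSION 0.9.4)\n' > "$tmp"
unset GGML_VERSION
from_file=$(read_ggml_version "$tmp")   # parsed from the file
GGML_VERSION=1.0.0
from_env=$(read_ggml_version "$tmp")    # environment wins
rm -f "$tmp"
echo "$from_file $from_env"             # prints "0.9.4 1.0.0"
```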

* Removed debug logging

* Added comment to TinyBlas BF16->FP16 mapping

* Factored out common build code in build-functions.sh

* Minor fixes

* Added output param to build scripts

* Minor fix

* Added support for --gpu and -ngl<=0

* Compacted the three GPU calls in llamafile_has_gpu

* Minor fixes

* Fixed cuda.c to copy dso in ~/.llamafile

* Fixed BF16->FP16 issue with tinyblas

* Updated llamafile.h comments

* Add GGML version format validation

* Made logging restriction consistent with metal

* Free all function pointers in case of error

* 862 bug metal dylib compilation (#863)

* Suggested patch with -std=c++17 for cpp files

* Read GGML_VERSION and GGML_COMMIT from build/config.mk

* Fixed GGML_VERSION in BUILD.mk, added comments

* Objects cleanup if compile fails, early fail for MAX_METAL_SRCS

* Improved error message

* Add cpu optimizations (#868)

* First attempt at cuda (still buggy, runs in TUI)

* Setting free_struct for DSO's copy of ggml_backend_buft_alloc_buffer

* Added cuda dep to llama.cpp's BUILD.mk

* Fixed warnings with GGML_VERSION and GGML_COMMIT

* Added cuda_cublas script, updated others

* Rocm parallel version - refactored cuda (tinyblas) and cublas scripts

* Using ggml/CMakeLists.txt as source of truth for GGML_VERSION / GGML_COMMIT

- updated build/config.mk to retrieve GGML_VERSION from
llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git
- added Makefile targets to cublas, cuda, rocm shell scripts
- updated shell scripts to get variables from env, or fall
back to reading from CMakeLists

* Removed debug logging

* Added comment to TinyBlas BF16->FP16 mapping

* Factored out common build code in build-functions.sh

* Minor fixes

* Added output param to build scripts

* Minor fix

* Added support for --gpu and -ngl<=0

* Compacted the three GPU calls in llamafile_has_gpu

* Minor fixes

* First iteration - import tinyblas files, update build, fix sgemm

* Updated llamafile_sgemm interface to the one from modern llama.cpp

* Improved CPU ident, option to disable for testing/benchmarks

* Added tests

* Added IQK kernels for quants + test files

* Q8_0 layout bug: test to check hypothesis + fix

* Added q8 layout test to build, improved comments

* Skills first commit (v0.1.0) (#870)

* Fix timeout (#876)

* Using wait_for() instead of wait() to avoid the 72-minute timeout

* tested and not timing out anymore

* Fixed circular deps issue with .SECONDEXPANSION

* Fix mmap issues when loading bundled model (#882)

* Fix first iteration

* Improved comments in patched llama-mmap.cpp

* Simplified llama-mmap.cpp patch

* Updated ggml/src/gguf.cpp to handle opening ggufs in llamafiles

* Load gguf from .gguf, /zip/, @, .llamafile
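A rough sketch of the kind of path dispatch that bullet describes. This is purely illustrative: the real logic lives in the patched gguf/mmap C++ code, and the semantics of the `@` form are an assumption here:

```shell
# Hypothetical classifier for the model-path forms listed above.
# Names and the `@` handling are illustrative guesses, not the
# actual implementation.
classify_model_path() {
    case "$1" in
        /zip/*)      echo "embedded" ;;   # asset inside the llamafile's own zip
        *.llamafile) echo "llamafile" ;;  # another llamafile bundling a model
        *@*)         echo "member" ;;     # archive@member style reference (assumed)
        *.gguf)      echo "gguf" ;;       # plain GGUF file on disk
        *)           echo "unknown" ;;
    esac
}
classify_model_path /zip/model.gguf   # prints "embedded"
classify_model_path model.gguf        # prints "gguf"
```

Note that the `/zip/*` arm must come before `*.gguf`, since embedded assets typically keep their `.gguf` suffix.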

* Properly show think mode in TUI (#885)

* Fix first iteration

* Using llama.cpp's chat parser in TUI

* Extend the approach to all models, similarly to what llama.cpp CLI does

* Removed extra newline between initial info and chat format

* Addressing reviews (handling interrupts + better logging)

* Adding tools to use skill docs as a Claude plugin (#886)

* Refactored skill docs to be used as Claude plugin
* Added .llamafile_plugin dir with symlinks to docs
* Added symlink from docs/AGENT.md to CLAUDE.md
* Now you can install docs as a plugin with `/plugin marketplace add ./.llamafile_plugin` and `/plugin install llamafile`

* Added tools/generate_patches.sh

* Updated README_0.10.0.md

* Created llama.cpp.patches/README.md

* Minor updates to skills and patches README

* skill updated to 0.1.1 (update upstream llama.cpp instructions) (#887)

* Updated skill with llama.cpp upstream sync
* Added check_patches tool

* Update llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#888)

* Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548

Updates patches and integration code for new llama.cpp version:
- Regenerated all patches for updated upstream code
- Added common_ngram-mod.cpp.patch (adds #include <algorithm>)
- Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h)
- Added common/license.cpp stub for LICENSES symbol
- Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in)
- Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp)
- Updated chatbot.h/cpp for common_chat_syntax -> common_chat_parser_params rename
- Removed minja test from tests/BUILD.mk

* Updated license.cpp with the one generated by cmake in upstream llama.cpp
* Updated info about license.cpp in patches' README
* Remove minja from tests
* Updated refs to minja in docs

* Fix templating support for Apertus (#894)

* Fixed templating issue with Apertus
* Load the PEG parser in chatbot_main if one is provided

* Add whisper (#880)

* Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56).
* Updated patches scripts + removed old patches
* Added whisperfile + extra tools (mic2raw, mic2txt, stream, whisper-server)
* Added slurp
* Updated docs and man pages

---------

Co-authored-by: angpt <anushrigupta@gmail.com>

* Add support for legacy chat, cli, server modalities (#896)

* Add CLI, SERVER, CHAT, and combined modes
* Removed log 'path  does not exist'
* Added server routes to main.cpp
* Fixing GPU log callbacks
* Added --nothink feature for CLI
* Refactored args + cleaned FLAGS

* Enabled make cosmocc for any make version

* Updated ci.yml to work with new llamafile / zipalign

* Llamacpp 7f5ee549683d600ad41db6a295a232cdd2d8eb9f (#901)

llama.cpp update, Qwen3.5 Think Mode & CLI Improvements

llama.cpp Submodule Update

- Updated to 7f5ee549683d600ad41db6a295a232cdd2d8eb9f
- Updated associated patches (removed obsolete vendor_miniaudio_miniaudio.h.patch)

Qwen3.5 Think Mode Support

- Proper handling of think/nothink mode in both chat and CLI modes
- Uses common_chat_templates_apply() with enable_thinking parameter instead of manually constructing prompts
- Correctly parses reasoning content using PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK

System Prompt Handling

- Captures -p/--prompt value early in argument parsing (needed for combined mode where server parsing excludes -p)
- /clear command now properly resets g_pending_file_content to prevent uploaded files from persisting after clear

Code Quality

- Refactored cli_apply_chat_template() to return both prompt and parser params
- Added documentation comments for subtle pointer lifetime and argument parsing behaviors


---------

Co-authored-by: Stuart Henderson <sthen@users.noreply.github.com> (OpenBSD supported versions update)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> (llama.cpp submodule update)

* Updated Makefile to download cosmocc at end of setup

* Added --image-min-tokens to TUI chat (#905)

* Integration tests (#906)

Adds integration tests for llamafiles:

- run a pre-built llamafile as well as the plain executable with a model passed as a parameter
- test TUI (piping inputs to the process), server (sending HTTP requests), CLI (passing prompts as params), and hybrid modes
- test plain text, multimodal, and tool calling with ad-hoc prompts
- test thinking vs. no-thinking mode
- test CPU vs. GPU


* Added timeout multiplier

* Added combined marker, improved combined tests

* Added check for GPU presence

* Added meaningful temperature test

* Fixed platforms where sh is needed

* Adding retry logic to server requests
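A minimal sketch of the kind of retry loop that bullet refers to. The function name, attempt count, and delay are assumptions; only the shape of the idea is taken from the log:

```shell
# Hedged sketch: retry an HTTP request until the server answers,
# bounded by a maximum number of attempts. curl -sf fails silently
# on connection errors and on HTTP status >= 400.
retry_request() {
    url="$1"
    tries="${2:-5}"
    delay="${3:-1}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -sf "$url" >/dev/null 2>&1; then
            return 0                          # server answered
        fi
        i=$((i + 1))
        [ "$i" -lt "$tries" ] && sleep "$delay"
    done
    return 1                                  # gave up
}
```

For example, `retry_request http://127.0.0.1:8080/health 10 2` would poll a local server until it comes up (the port and endpoint path here are assumptions).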

* Add Blackwell GPU architecture support for CUDA 13.x (#907)

- Add sm_110f (Jetson Thor & family) and sm_121a (DGX Spark GB10)
  support for aarch64 platforms with CUDA 13.x
- Add sm_120f (RTX 5000 series, RTX PRO Blackwell) support for x86_64
  platforms with CUDA 13.x
- Enable --compress-mode=size for optimized binary size on Blackwell GPUs
- Detect CUDA version and host architecture at build time

Co-authored-by: wingx <wingenlit@outlook.com>
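The architecture selection described above might be wired up roughly like this. The `-gencode` spellings mirror the commit text; the function and variable names are assumptions, and the exact flags should be verified against the CUDA release actually used:

```shell
# Hedged sketch: pick Blackwell -gencode flags from the detected CUDA
# major version and host architecture, per the bullets above.
blackwell_nvcc_flags() {
    cuda_major="$1"
    host_arch="$2"
    flags=""
    if [ "$cuda_major" -ge 13 ]; then
        case "$host_arch" in
            aarch64)
                # Jetson Thor & family, DGX Spark GB10
                flags="-gencode arch=compute_110f,code=sm_110f -gencode arch=compute_121a,code=sm_121a"
                ;;
            x86_64)
                # RTX 5000 series, RTX PRO Blackwell
                flags="-gencode arch=compute_120f,code=sm_120f"
                ;;
        esac
        flags="$flags --compress-mode=size"   # smaller fatbins on Blackwell
    fi
    printf '%s\n' "$flags"
}
blackwell_nvcc_flags 13 x86_64
```

On pre-13 CUDA the function emits nothing, so older toolchains keep their existing flags untouched.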

* Fix cuda combined mode (#909)

* Implement chat in combined mode as an OpenAI client
* Implemented stop_tui in tests
* Fixed CLI tests using --verbose
* Accept non-utf8 chars in responses
* Simplify prompt for t=0
* Added patch for tools/server/server.cpp
* Server output > devnull to avoid buffer fill up with --verbose

* Add image to cli (#912)

* Added back support for --image in CLI tool
* Added tests for multimodal cli
* Added optional mmproj parameter to TUI tests too
* Addressed review comments
* Added test to check multiple markers/images on cli

* Review docs v0.10.0 (#911)

* Updated index.md

* Moar updates to index.md

* Updated quickstart.md

* Updated support + example llamafiles

* Added example files and examples + minor fixes

* Updated structure

* Removed security

* Updated source installation

* Updated README_0.10.0, now frozen doc

* Removed ref to new_build_wip in whisperfile, make setup installs cosmocc

* Apply suggestion from @dpoulopoulos

Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

* Addressed review comments

* Addressed review comments #2

---------

Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>

* Updated README.md, minor fix to docs/index.md

* Minor fixes to install llamafile binary

* Open next llama.cpp update PR to main

* Updated copyrights

* Removed old README, added 'based on' badges

* Better version handling (#913)

* Improve help (#914)

* Added per-mode help + nologo/ascii support
* If model is missing, bump to help for respective mode

* Update skill docs (#915)

* Updated skill not to use new_build_wip + improved it
* Removed stray new_build_wip reference

* Updated RELEASE.md for v0.10.0

---------

Co-authored-by: Peter Wilson <peter@mozilla.ai>
Co-authored-by: daavoo <daviddelaiglesiacastro@gmail.com>
Co-authored-by: angpt <anushrigupta@gmail.com>
Co-authored-by: wingenlit <63510314+wingenlit@users.noreply.github.com>
Co-authored-by: wingx <wingenlit@outlook.com>
Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
2026-03-19 11:13:53 +00:00


[submodule "whisper.cpp"]
path = whisper.cpp
url = https://github.com/ggerganov/whisper.cpp.git
[submodule "stable-diffusion.cpp"]
path = stable-diffusion.cpp
url = https://github.com/leejet/stable-diffusion.cpp.git
[submodule "llama.cpp"]
path = llama.cpp
url = https://github.com/ggerganov/llama.cpp.git
[submodule "third_party/zipalign"]
path = third_party/zipalign
url = https://github.com/jart/zipalign.git