Mirror of https://github.com/mozilla-ai/llamafile.git (synced 2026-03-31 05:01:59 +00:00)
* Update llama.cpp to a44d7712 and refresh patches
* Updated apply-patches and renames
* Removed Makefile patch, added as a removal
* Refactored patches to be applied in the llama.cpp dir
* Fixed apply-patches.sh
* Updated tool_server_server.cpp.patch
* Updated Makefile to pull llama.cpp deps
* Update llama.cpp submodule with dependency submodules
* Added miniaudio.h.patch
* Fixed wrong index in miniaudio patch
* Updated patches
* Updates to server.cpp
* Added cosmocc-override.cmake
* Made patches minimal
* Removed common.h patch
* Added cosmocc 4.0.2 target
* Added readme
* Updated llama.cpp to commit a44d77126c911d105f7f800c17da21b2a5b112d1
* Updated llama.cpp to commit dbc15a79672e72e0b9c1832adddf3334f5c9229c
* Updated patches for newer llama.cpp version
* Added miniaudio
* Updated patches with common/download.cpp
* Updated patches with common/download.cpp
* Added extra deps to llama.cpp setup
* Moved to using deps from the vendors folder
* Removed miniaudio from added files
* New BUILD.mk + common/chat.cpp patch
* Updated cosmocc to 4.0.2
* Piping od with awk for better compatibility
* Renamed miniaudio patch 🤦
* Updated README_0.10.0.md
* Moved llama.cpp/common ahead of other deps
* Added COSMO to server build
* Update build/config.mk (Co-authored-by: Peter Wilson <peter@mozilla.ai>)
* Update build/rules.mk (Co-authored-by: Peter Wilson <peter@mozilla.ai>)
* First TUI iteration (keeping original llamafile dir for comparison)
* Add comment blocks to rule file
* Integrated llama.cpp server + TUI tool in the same llamafile tool
* Code cleanup
* Disable ggml logging when in TUI mode
* Updated README
* Refactored code llamafile_new -> llamafile and simplified build
* Simplified Makefile, updated README
* Fixed uncaught exception on llama.cpp server termination when running from TUI
* LLAMAFILE_INCS -> INCLUDES to fix -iquote issue
* Patched common/log to fix uncaught exception
* Updated main, removed unused import, cleaned removeArgs code
* Metal support, first iteration (only works on TUI)
* Added Metal support to standalone llama.cpp
* Fixed removeArgs (again)
* Workaround for segfault at exit with TUI+Metal+server
* Improved logging (now sending null callback back to Metal)
* Make sure standalone llama.cpp builds the Metal dylib if not present
* Updated README_0.10.0.md
* Fixed typo in readme
* Improved g_interrupted_exit handling in TUI
* Moved g_interrupted_exit to cover both SIGINT and newline
* Improved comments around sleep+exit in server.cpp
* Improved LLAMAFILE_TUI documentation in BUILD.mk
* Made GPU init message always appear as ephemeral
* Added back missing pictures
* Improved descriptions
* Removed redundant block
* Update llama.cpp submodule to f47edb8c19199bf0ab471f80e6a4783f0b43ef81
* Removed src_llama-hparams.cpp as not needed anymore
* Updated tools_server_server-queue patch
* Updated patch for tools/server/server.cpp
* Moved patch
* Minor updates to other patches
* Updated BUILD.mk for llama.cpp
* Updated llamafile code to use new llama.cpp main/server
* Updated llama.cpp/BUILD.mk with fixes, new mtmd files, and without main/main
* Used LLAMA_EXAMPLE_CLI for TUI params
* fix(update-llama-cpp): Use `new_build_wip` as base ref.
* Adding zipalign as a submodule (#848)
* Added third_party/zipalign as submodule
* Updated build for zipalign
* Fixes to zipalign paths in Makefile
* Fixed BUILD.mk to look for zlib.h in cosmocc's third_party/zlib
* Updated creating_llamafiles.md
* Updated Makefile to also compile zipalign with make -j8
* Patched ggml to fix issues arising with multimodal models
* Added reset-repo command to Makefile
* Minor fixes to Makefile
* Added more examples to creating_llamafiles.md
* TUI support for mtmd API (#852)
* fix(update-llama-cpp): Use `new_build_wip` as base ref. (#850)
* TUI support for mtmd API, first sketch
* Improved token counting (using n_pos as in llama.cpp server)
* Removed extra logging from mtmd/clip and mtmd-helper
* Fixed parsing bug in eval_string, factored out a function, added tests (Co-authored-by: David de la Iglesia Castro <daviddelaiglesiacastro@gmail.com>)
* Added missing tests
* Fix minja segfaulting in cosmo build (#858)
* Added tests for minja regexp bug and example patch
* Built an ad-hoc test for the cosmo build
* Avoid updating it in place; do it only on success
* Updated test to use the actual patched minja code
* Add CUDA support (#859)
* First attempt at CUDA (still buggy, runs in TUI)
* Setting free_struct for the DSO's copy of ggml_backend_buft_alloc_buffer
* Added CUDA dep to llama.cpp's BUILD.mk
* Fixed warnings with GGML_VERSION and GGML_COMMIT
* Added cuda_cublas script, updated others
* ROCm parallel version: refactored cuda (tinyblas) and cublas scripts
* Using ggml/CMakeLists.txt as the source of truth for GGML_VERSION / GGML_COMMIT:
  - Updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git
  - Added Makefile targets to the cublas, cuda, and rocm shell scripts
  - Updated shell scripts to get variables from the environment, or fall back to reading from CMakeLists
* Removed debug logging
* Added comment on the TinyBLAS BF16->FP16 mapping
* Factored out common build code into build-functions.sh
* Minor fixes
* Added output param to build scripts
* Minor fix
* Added support for --gpu and -ngl<=0
* Compacted the three GPU calls in llamafile_has_gpu
* Minor fixes
* Fixed cuda.c to copy the DSO into ~/.llamafile
* Fixed BF16->FP16 issue with TinyBLAS
* Updated llamafile.h comments
* Add GGML version format validation
* Made logging restriction consistent with Metal
* Free all function pointers in case of error
* 862 bug metal dylib compilation (#863)
* Suggested patch with -std=c++17 for .cpp files
* Read GGML_VERSION and GGML_COMMIT from build/config.mk
* Fixed GGML_VERSION in BUILD.mk, added comments
* Object cleanup if compile fails, early fail for MAX_METAL_SRCS
* Improved error message
* Add CPU optimizations (#868)
* First attempt at CUDA (still buggy, runs in TUI)
* Setting free_struct for the DSO's copy of ggml_backend_buft_alloc_buffer
* Added CUDA dep to llama.cpp's BUILD.mk
* Fixed warnings with GGML_VERSION and GGML_COMMIT
* Added cuda_cublas script, updated others
* ROCm parallel version: refactored cuda (tinyblas) and cublas scripts
* Using ggml/CMakeLists.txt as the source of truth for GGML_VERSION / GGML_COMMIT:
  - Updated build/config.mk to retrieve GGML_VERSION from llama.cpp/ggml/CMakeLists.txt and GGML_COMMIT from git
  - Added Makefile targets to the cublas, cuda, and rocm shell scripts
  - Updated shell scripts to get variables from the environment, or fall back to reading from CMakeLists
* Removed debug logging
* Added comment on the TinyBLAS BF16->FP16 mapping
* Factored out common build code into build-functions.sh
* Minor fixes
* Added output param to build scripts
* Minor fix
* Added support for --gpu and -ngl<=0
* Compacted the three GPU calls in llamafile_has_gpu
* Minor fixes
* First iteration: import tinyblas files, update build, fix sgemm
* Updated the llamafile_sgemm interface to the one from modern llama.cpp
* Improved CPU identification, with an option to disable it for testing/benchmarks
* Added tests
* Added IQK kernels for quants + test files
* Q8_0 layout bug: test to check hypothesis + fix
* Added Q8_0 layout test to build, improved comments
* Skills first commit (v0.1.0) (#870)
* Fix timeout (#876)
* Using wait_for() instead of wait() to avoid the 72-minute timeout
* Tested and no longer timing out
* Fixed circular deps issue with .SECONDEXPANSION
* Fix mmap issues when loading a bundled model (#882)
* Fix, first iteration
* Improved comments in patched llama-mmap.cpp
* Simplified llama-mmap.cpp patch
* Updated ggml/src/gguf.cpp to handle opening GGUFs inside llamafiles
* Load GGUF from .gguf, /zip/, @, .llamafile
* Properly show think mode in TUI (#885)
* Fix, first iteration
* Using llama.cpp's chat parser in TUI
* Extend the approach to all models, similarly to what the llama.cpp CLI does
* Removed extra newline between initial info and chat format
* Addressed reviews (handling interrupts + better logging)
* Adding tools to use skill docs as a Claude plugin (#886)
* Refactored skill docs to be usable as a Claude plugin
* Added .llamafile_plugin dir with symlinks to docs
* Added symlink from docs/AGENT.md to CLAUDE.md
* Now you can install the docs as a plugin with `/plugin marketplace add ./.llamafile_plugin` and `/plugin install llamafile`
* Added tools/generate_patches.sh
* Updated README_0.10.0.md
* Created llama.cpp.patches/README.md
* Minor updates to skills and patches README
* Skill updated to 0.1.1 (update upstream llama.cpp instructions) (#887)
* Updated skill with llama.cpp upstream sync
* Added check_patches tool
* Update llama.cpp to b908baf1825b1a89afef87b09e22c32af2ca6548 (#888)
* Update llama.cpp submodule to b908baf1825b1a89afef87b09e22c32af2ca6548; updates patches and integration code for the new llama.cpp version:
  - Regenerated all patches for updated upstream code
  - Added common_ngram-mod.cpp.patch (adds #include <algorithm>)
  - Added vendor_cpp-httplib_httplib.cpp.patch (XNU futex workaround moved from .h)
  - Added common/license.cpp stub for the LICENSES symbol
  - Removed obsolete vendor_minja_minja.hpp.patch (jinja now built-in)
  - Removed obsolete vendor_cpp-httplib_httplib.h.patch (code moved to .cpp)
  - Updated chatbot.h/cpp for the common_chat_syntax -> common_chat_parser_params rename
  - Removed minja test from tests/BUILD.mk
* Updated license.cpp with the one generated by cmake in upstream llama.cpp
* Updated info about license.cpp in the patches' README
* Remove minja from tests
* Updated refs to minja in docs
* Fix templating support for Apertus (#894)
* Fixed templating issue with Apertus
* Load the PEG parser in chatbot_main if one is provided
* Add whisper (#880)
* Updated whisper.cpp submodule from v1.6.2-168 (6739eb83) to v1.8.3 (2eeeba56)
* Updated patch scripts + removed old patches
* Added whisperfile + extra tools (mic2raw, mic2txt, stream, whisper-server)
* Added slurp
* Updated docs and man pages (Co-authored-by: angpt <anushrigupta@gmail.com>)
* Add support for legacy chat, cli, server modalities (#896)
* Add CLI, SERVER, CHAT, and combined modes
* Removed log 'path does not exist'
* Added server routes to main.cpp
* Fixed GPU log callbacks
* Added --nothink feature for CLI
* Refactored args + cleaned FLAGS
* Enabled make cosmocc for any make version
* Updated ci.yml to work with new llamafile / zipalign
* Llamacpp 7f5ee549683d600ad41db6a295a232cdd2d8eb9f (#901): llama.cpp update, Qwen3.5 think mode & CLI improvements
  - llama.cpp submodule update: updated to 7f5ee549683d600ad41db6a295a232cdd2d8eb9f; updated associated patches (removed obsolete vendor_miniaudio_miniaudio.h.patch)
  - Qwen3.5 think mode support: proper handling of think/nothink mode in both chat and CLI modes; uses common_chat_templates_apply() with the enable_thinking parameter instead of manually constructing prompts; correctly parses reasoning content using the PEG parser with COMMON_REASONING_FORMAT_DEEPSEEK
  - System prompt handling: captures the -p/--prompt value early in argument parsing (needed for combined mode, where server parsing excludes -p); the /clear command now properly resets g_pending_file_content to prevent uploaded files from persisting after clear
  - Code quality: refactored cli_apply_chat_template() to return both prompt and parser params; added documentation comments for subtle pointer lifetime and argument parsing behaviors
  (Co-authored-by: Stuart Henderson <sthen@users.noreply.github.com>, OpenBSD supported versions update; Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>, llama.cpp submodule update)
* Updated Makefile to download cosmocc at the end of setup
* Added --image-min-tokens to TUI chat (#905)
* Integration tests (#906); adds integration tests for llamafiles:
  - Run a pre-built llamafile as well as the plain executable with a model passed as a parameter
  - Test TUI (piping inputs to the process), server (sending HTTP requests), CLI (passing prompts as params), and hybrid modes
  - Test plain text, multimodal, and tool calling with ad-hoc prompts
  - Test thinking vs. no-thinking mode
  - Test CPU vs. GPU
* Added timeout multiplier
* Added combined marker, improved combined tests
* Added check for GPU presence
* Added meaningful temperature test
* Fixed platforms where sh is needed
* Added retry logic to server requests
* Add Blackwell GPU architecture support for CUDA 13.x (#907):
  - Add sm_110f (Jetson Thor & family) and sm_121a (DGX Spark GB10) support for aarch64 platforms with CUDA 13.x
  - Add sm_120f (RTX 5000 series, RTX PRO Blackwell) support for x86_64 platforms with CUDA 13.x
  - Enable --compress-mode=size for optimized binary size on Blackwell GPUs
  - Detect CUDA version and host architecture at build time
  (Co-authored-by: wingx <wingenlit@outlook.com>)
* Fix CUDA combined mode (#909)
* Implement chat in combined mode as an OpenAI client
* Implemented stop_tui in tests
* Fixed CLI tests using --verbose
* Accept non-UTF-8 chars in responses
* Simplified prompt for t=0
* Added patch for tools/server/server.cpp
* Redirect server output to /dev/null to avoid the buffer filling up with --verbose
* Add image to CLI (#912)
* Added back support for --image in the CLI tool
* Added tests for multimodal CLI
* Added optional mmproj parameter to TUI tests too
* Addressed review comments
* Added test to check multiple markers/images on CLI
* Review docs v0.10.0 (#911)
* Updated index.md
* More updates to index.md
* Updated quickstart.md
* Updated support + example llamafiles
* Added example files and examples + minor fixes
* Updated structure
* Removed security
* Updated source installation
* Updated README_0.10.0, now a frozen doc
* Removed ref to new_build_wip in whisperfile; make setup installs cosmocc
* Apply suggestion from @dpoulopoulos (Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>)
* Addressed review comments
* Addressed review comments #2
* Updated README.md, minor fix to docs/index.md
* Minor fixes to installing the llamafile binary
* Open next llama.cpp update PR to main
* Updated copyrights
* Removed old README, added 'based on' badges
* Better version handling (#913)
* Improve help (#914)
* Added per-mode help + nologo/ascii support
* If the model is missing, show the help for the respective mode
* Update skill docs (#915)
* Updated skill not to use new_build_wip + improved it
* Removed stray new_build_wip reference
* Updated RELEASE.md for v0.10.0

Co-authored-by: Peter Wilson <peter@mozilla.ai>
Co-authored-by: daavoo <daviddelaiglesiacastro@gmail.com>
Co-authored-by: angpt <anushrigupta@gmail.com>
Co-authored-by: wingenlit <63510314+wingenlit@users.noreply.github.com>
Co-authored-by: wingx <wingenlit@outlook.com>
Co-authored-by: Dimitris Poulopoulos <dimitris.a.poulopoulos@gmail.com>
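One of the build fixes above reads "Piping od with awk for better compatibility". A plausible minimal sketch of that idea, assuming the goal is emitting C-style hex byte literals for embedding a file, without relying on the non-POSIX `xxd -i` (the filename is illustrative, not from the repo):

```shell
# Dump any file as comma-separated C hex byte literals using only POSIX tools.
# od -An suppresses address offsets, -v keeps repeated lines, -tx1 prints one
# hex byte per column; awk reflows the columns into 0x.. literals.
od -An -v -tx1 input.bin | awk '{for (i = 1; i <= NF; i++) printf "0x%s,", $i}'
```

Unlike `xxd`, both `od` and `awk` are specified by POSIX, so a pipeline like this behaves the same across GNU, BSD, and minimal build environments.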
13 lines · 420 B · Plaintext
[submodule "whisper.cpp"]
	path = whisper.cpp
	url = https://github.com/ggerganov/whisper.cpp.git
[submodule "stable-diffusion.cpp"]
	path = stable-diffusion.cpp
	url = https://github.com/leejet/stable-diffusion.cpp.git
[submodule "llama.cpp"]
	path = llama.cpp
	url = https://github.com/ggerganov/llama.cpp.git
[submodule "third_party/zipalign"]
	path = third_party/zipalign
	url = https://github.com/jart/zipalign.git
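Since the repository pulls whisper.cpp, stable-diffusion.cpp, llama.cpp, and zipalign in as git submodules, a fresh checkout needs them initialized before building. A typical sequence (the clone URL is the mirror source stated above; exact build steps are in the repo's own docs):

```shell
# Clone the repository and fetch every submodule declared in .gitmodules.
git clone https://github.com/mozilla-ai/llamafile.git
cd llamafile
git submodule update --init --recursive

# .gitmodules is plain git-config syntax, so entries can be queried directly:
git config -f .gitmodules --get submodule.llama.cpp.url
```

The `--recursive` flag also initializes nested submodules, which matters here because the commit log above notes that the llama.cpp submodule was updated "with dependency submodules".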