Commit Graph

  • ed4e56b6f7 corrected a few Sigbjørn Skjæret 2024-05-11 08:30:09 +02:00
  • 803e8fa1c4 updated comment Sigbjørn Skjæret 2024-05-10 23:11:43 +02:00
  • a3df77d8d2 tokenize chat format prompts before completion Sigbjørn Skjæret 2024-05-10 23:09:01 +02:00
  • 1547202b77 docs: Fix typo in README.md (#1444) Peng Yu 2024-05-10 10:35:51 -04:00
  • 7f59856fa6 fix: Enable CUDA backend for llava. Closes #1324 Andrei Betlen 2024-05-10 10:18:47 -04:00
  • 73165021bb chore: Bump version [tags: v0.2.73-metal, v0.2.73-cu124, v0.2.73-cu123, v0.2.73-cu122, v0.2.73-cu121, v0.2.73] Andrei Betlen 2024-05-10 09:44:18 -04:00
  • eafb6ec5e8 feat: Update llama.cpp Andrei Betlen 2024-05-10 08:39:55 -04:00
  • ac55d0a175 fix: Clear kv cache to avoid kv bug when image is evaluated first Andrei Betlen 2024-05-10 02:38:10 -04:00
  • 4badac3a60 chore: Bump version [tags: v0.2.72-metal, v0.2.72-cu124, v0.2.72-cu123, v0.2.72-cu122, v0.2.72-cu121, v0.2.72] Andrei Betlen 2024-05-10 00:56:19 -04:00
  • 561e880654 fix(security): Render all jinja templates in immutable sandbox (#1441) Sigbjørn Skjæret 2024-05-10 06:49:40 +02:00
  • b454f40a9a Merge pull request from GHSA-56xg-wfcc-g829 Patrick Peng 2024-05-10 12:47:56 +08:00
  • 9d053d6f73 Templates sometimes have BOS in them, remove duplicate Sigbjørn Skjæret 2024-05-09 20:04:06 +02:00
  • 5ab40e6167 feat: Support multiple chat templates - step 1 (#1396) Sigbjørn Skjæret 2024-05-09 15:49:09 +02:00
  • bf66a283e8 chore: Bump version [tags: v0.2.71-metal, v0.2.71-cu124, v0.2.71-cu123, v0.2.71-cu122, v0.2.71-cu121, v0.2.71] Andrei Betlen 2024-05-09 03:02:52 -04:00
  • 3757328b70 fix: free last image embed in llava chat handler Andrei Betlen 2024-05-08 22:16:18 -04:00
  • 77122638b4 fix: Make leading bos_token optional for image chat formats, fix nanollava system message Andrei Betlen 2024-05-08 13:12:31 -04:00
  • 2a39b99575 feat: Update llama.cpp Andrei Betlen 2024-05-08 08:42:22 -04:00
  • 9ce5cb376a chore: Bump version [tags: v0.2.70-metal, v0.2.70-cu124, v0.2.70-cu123, v0.2.70-cu122, v0.2.70-cu121, v0.2.70] Andrei Betlen 2024-05-08 02:36:42 -04:00
  • 4a7122d22f feat: fill-in-middle support (#1386) Sigbjørn Skjæret 2024-05-08 08:26:22 +02:00
  • 228949c1f7 feat: Update llama.cpp Andrei Betlen 2024-05-08 02:22:15 -04:00
  • 903b28adf5 fix: adding missing args in create_completion for functionary chat handler (#1430) Sarunas Kalade 2024-05-08 07:21:27 +01:00
  • 07966b9ba7 docs: update README.md (#1432) Ikko Eltociear Ashimine 2024-05-08 15:20:20 +09:00
  • a50d24e3a7 fix: chat_format log where auto-detected format prints None (#1434) Bruno Alvisio 2024-05-07 23:19:35 -07:00
  • 0318702cdc feat(server): Add support for setting root_path. Closes #1420 Andrei Betlen 2024-05-05 12:49:31 -04:00
  • 3666833107 feat(ci): Add docker checks and check deps more frequently (#1426) Olivier DEBAUCHE 2024-05-05 18:42:28 +02:00
  • 3e2597eac8 feat: Update llama.cpp Andrei Betlen 2024-05-05 12:12:27 -04:00
  • e0d7674e62 fix: detokenization case where first token does not start with a leading space (#1375) Noam Gat 2024-05-04 17:14:59 +03:00
  • 1f56c648c3 feat: Implement streaming for Functionary v2 + Bug fixes (#1419) Jeffrey Fong 2024-05-04 22:11:20 +08:00
  • f9b7221c8f Merge branch 'main' of github.com:abetlen/llama_cpp_python into main Andrei Betlen 2024-05-03 19:07:54 -04:00
  • 9f7a85571a fix: Use memmove to copy str_value kv_override. Closes #1417 Andrei Betlen 2024-05-03 19:07:50 -04:00
  • 0a454bebe6 feat(server): Remove temperature bounds checks for server. Closes #1384 Andrei Betlen 2024-05-03 15:23:06 -04:00
  • 2138561fab fix(server): Propagate flash_attn to model load. (#1424) Daniel Thuerck 2024-05-03 18:17:07 +02:00
  • 2117122396 chore: Bump version [tags: v0.2.69-metal, v0.2.69-cu124, v0.2.69-cu123, v0.2.69-cu122, v0.2.69-cu121, v0.2.69] Andrei Betlen 2024-05-02 12:07:09 -04:00
  • d75dea18db feat: Update llama.cpp Andrei Betlen 2024-05-02 12:00:44 -04:00
  • 31b1d95a6c feat: Add llama-3-vision-alpha chat format Andrei Betlen 2024-05-02 11:32:18 -04:00
  • 4f01c452b6 fix: Change default verbose value of verbose in image chat format handlers to True to match Llama Andrei Betlen 2024-04-30 15:50:30 -04:00
  • 946156fb6c feat: Update llama.cpp Andrei Betlen 2024-04-30 15:46:45 -04:00
  • 9286b5caac Merge branch 'main' of github.com:abetlen/llama_cpp_python into main Andrei Betlen 2024-04-30 15:45:36 -04:00
  • f116175a5a fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729 Andrei Betlen 2024-04-30 15:45:34 -04:00
  • 3226b3c5ef fix: UTF-8 handling with grammars (#1415) Jonathan Soma 2024-04-30 14:33:23 -04:00
  • 945c62c567 docs: Change all examples from interpreter style to script style. Andrei Betlen 2024-04-30 10:15:04 -04:00
  • 26478ab293 docs: Update README.md Andrei Betlen 2024-04-30 10:11:38 -04:00
  • b14dd98922 chore: Bump version [tags: v0.2.68-metal, v0.2.68-cu124, v0.2.68-cu123, v0.2.68-cu122, v0.2.68-cu121, v0.2.68] Andrei Betlen 2024-04-30 09:39:56 -04:00
  • 29b6e9a5c8 fix: wrong parameter for flash attention in pickle __getstate__ Andrei Betlen 2024-04-30 09:32:47 -04:00
  • 22d77eefd2 feat: Add option to enable flash_attn to Lllama params and ModelSettings Andrei Betlen 2024-04-30 09:29:16 -04:00
  • 8c2b24d5aa feat: Update llama.cpp Andrei Betlen 2024-04-30 09:27:55 -04:00
  • 6332527a69 fix(ci): Fix build-and-release.yaml (#1413) Olivier DEBAUCHE 2024-04-30 15:16:14 +02:00
  • c8cd8c17c6 docs: Update README to include CUDA 12.4 wheels [tags: v0.2.67-metal, v0.2.67-cu124, v0.2.67-cu123, v0.2.67-cu122, v0.2.67-cu121] Andrei Betlen 2024-04-30 03:12:46 -04:00
  • f417cce28a chore: Bump version [tags: v0.2.67] Andrei Betlen 2024-04-30 03:11:02 -04:00
  • 3489ef09d3 fix: Ensure image renders before text in chat formats regardless of message content order. Andrei Betlen 2024-04-30 03:08:46 -04:00
  • d03f15bb73 fix(ci): Fix bug in use of upload-artifact failing to merge multiple artifacts into a single release. Andrei Betlen 2024-04-30 02:58:55 -04:00
  • 26c7876ba0 chore: Bump version [tags: v0.2.66-metal, v0.2.66-cu124, v0.2.66-cu123, v0.2.66-cu122, v0.2.66-cu121, v0.2.66] Andrei Betlen 2024-04-30 01:48:40 -04:00
  • fe2da09538 feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147) Andrei 2024-04-30 01:35:38 -04:00
  • 64008aa0ee Fix typo generic-vlm-chat-format Andrei Betlen 2024-04-30 01:32:03 -04:00
  • f70326fa1c Update README Andrei Betlen 2024-04-30 01:30:42 -04:00
  • 6e4ad7246b Fix typo Andrei Betlen 2024-04-30 01:28:18 -04:00
  • efd99f136b Update README Andrei Betlen 2024-04-30 01:25:50 -04:00
  • f03326ce5a Update docs Andrei Betlen 2024-04-30 01:13:14 -04:00
  • fc5d01c321 Update README.md Andrei Betlen 2024-04-30 01:12:14 -04:00
  • 0e15835182 Logits all no longer required for multi-modal models Andrei Betlen 2024-04-30 01:02:57 -04:00
  • 0b891f4038 Re-order multimodal chat formats Andrei Betlen 2024-04-30 00:59:52 -04:00
  • dd47dda13f Remove unnecessary import Andrei Betlen 2024-04-30 00:49:50 -04:00
  • c89c6de1f0 Merge branch 'main' into generic-vlm-chat-format Andrei Betlen 2024-04-29 23:57:51 -04:00
  • 97fb860eba feat: Update llama.cpp Andrei Betlen 2024-04-29 23:34:55 -04:00
  • df2b5b5d44 chore(deps): bump actions/upload-artifact from 3 to 4 (#1412) dependabot[bot] 2024-04-29 22:53:42 -04:00
  • be43018e09 chore(deps): bump actions/configure-pages from 4 to 5 (#1411) dependabot[bot] 2024-04-29 22:53:21 -04:00
  • 32c000f3ec chore(deps): bump softprops/action-gh-release from 1 to 2 (#1408) dependabot[bot] 2024-04-29 22:52:58 -04:00
  • 03c654a3d9 ci(fix): Workflow actions updates and fix arm64 wheels not included in release (#1392) Olivier DEBAUCHE 2024-04-30 04:52:23 +02:00
  • 0c3bc4b928 fix(ci): Update generate wheel index script to include cu12.3 and cu12.4 Closes #1406 Andrei Betlen 2024-04-29 12:37:22 -04:00
  • 2355ce2227 ci: Add support for pre-built cuda 12.4.1 wheels (#1388) Olivier DEBAUCHE 2024-04-28 05:44:47 +02:00
  • a411612b38 feat: Add support for str type kv_overrides Andrei Betlen 2024-04-27 23:42:19 -04:00
  • c9b85bf098 feat: Update llama.cpp Andrei Betlen 2024-04-27 23:41:54 -04:00
  • 22c55cd103 Merge branch 'main' into generic-vlm-chat-format Andrei 2024-04-27 22:34:20 -04:00
  • 8f09d428af Add obisidian support Andrei Betlen 2024-04-27 22:29:02 -04:00
  • 8324ee0c89 Add nanollava support Andrei Betlen 2024-04-27 22:21:53 -04:00
  • 20e0967f14 Add Llava1.6 support Andrei Betlen 2024-04-27 22:14:38 -04:00
  • 0e182be9de Cache last image embed Andrei Betlen 2024-04-27 21:08:27 -04:00
  • c07db99e5b chore(deps): bump pypa/cibuildwheel from 2.16.5 to 2.17.0 (#1401) dependabot[bot] 2024-04-27 20:51:13 -04:00
  • 7074c4d256 chore(deps): bump docker/build-push-action from 4 to 5 (#1400) dependabot[bot] 2024-04-27 20:51:02 -04:00
  • 79318ba1d1 chore(deps): bump docker/login-action from 2 to 3 (#1399) dependabot[bot] 2024-04-27 20:50:50 -04:00
  • 27038db3d6 chore(deps): bump actions/cache from 3.3.2 to 4.0.2 (#1398) dependabot[bot] 2024-04-27 20:50:39 -04:00
  • 17bdfc818f chore(deps): bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.4 (#1397) dependabot[bot] 2024-04-27 20:50:28 -04:00
  • f178636e1b fix: Functionary bug fixes (#1385) Jeffrey Fong 2024-04-28 08:49:52 +08:00
  • e6bbfb863c examples: fix quantize example (#1387) iyubondyrev 2024-04-28 02:48:47 +02:00
  • c58b56123d ci: Update action versions in build-wheels-metal.yaml (#1390) Olivier DEBAUCHE 2024-04-28 02:47:49 +02:00
  • 9e7f738220 ci: Update dependabot.yml (#1391) Olivier DEBAUCHE 2024-04-28 02:47:07 +02:00
  • 94fe4bca1c Add function calling support Andrei Betlen 2024-04-27 17:32:51 -04:00
  • fd55c29a58 Update moondream prompt Andrei Betlen 2024-04-27 16:40:40 -04:00
  • 1705893ced Update moondream chat format Andrei Betlen 2024-04-27 16:38:31 -04:00
  • 7df9483f62 Update moondream chat format Andrei Betlen 2024-04-27 15:40:38 -04:00
  • 2fd41f9cce Add moondream support (wip) Andrei Betlen 2024-04-27 13:19:24 -04:00
  • 3cef09cf2d Revert chat format test Andrei Betlen 2024-04-27 12:59:56 -04:00
  • d7b28f709f Refactor llava chat format to use a jinja2 Andrei Betlen 2024-04-27 12:59:16 -04:00
  • a3c3b5df68 Add from_pretrained support to llava chat format. Andrei Betlen 2024-04-27 12:58:40 -04:00
  • b78ed72fc6 Format and improve types for llava_cpp.py Andrei Betlen 2024-04-27 12:57:35 -04:00
  • b7338a049b Merge branch 'main' into generic-vlm-chat-format Andrei 2024-04-27 12:56:18 -04:00
  • 65edc90671 chore: Bump version [tags: v0.2.65-metal, v0.2.65-cu123, v0.2.65-cu122, v0.2.65-cu121, v0.2.65] Andrei Betlen 2024-04-26 10:11:31 -04:00
  • 173ebc7878 fix: Remove duplicate pooling_type definition and add misisng n_vocab definition in bindings Andrei Betlen 2024-04-25 21:36:09 -04:00
  • f6ed21f9a2 feat: Allow for possibly non-pooled embeddings (#1380) Douglas Hanley 2024-04-25 20:32:44 -05:00
  • fcfea66857 fix: pydantic deprecation warning Andrei Betlen 2024-04-25 21:21:48 -04:00