Commit Graph

  • af78f46c3d feat: align function id with tool call response drbh 2025-03-13 19:16:47 +00:00
  • b5bac0dd2d Add comments for support of models Mohit Sharma 2025-03-21 14:11:15 +00:00
  • 50ffe00a1a Improve attn_implementation Mohit Sharma 2025-03-21 13:44:40 +00:00
  • b41faae318 cleanup comment Mohit Sharma 2025-03-21 11:28:40 +00:00
  • ac6fc70c75 Add support for other vlm Mohit Sharma 2025-03-21 11:22:12 +00:00
  • 2e60a8dd65 CI: enable server tests for backends (#3128) Baptiste Colle 2025-03-20 16:07:31 +01:00
  • d8d09f9c7b initial changes Mohit Sharma 2025-03-20 14:22:42 +00:00
  • e5503eba78 configurable termination timeout (#3126) Erik Kaunismäki 2025-03-20 14:25:56 +01:00
  • 69936732eb feat: allow model load and stub logits enable-transformers-vlm drbh 2025-03-19 19:55:19 +00:00
  • e497bc09f6 Minor fixes. (#3125) Nicolas Patry 2025-03-18 15:42:35 +01:00
  • 4d28897b4e Fix release nix workflow. v3.2.1 git_3.2.1 Nicolas Patry 2025-03-18 15:27:48 +01:00
  • f5850f4c4f Patch release 3.2.1 Nicolas Patry 2025-03-18 15:12:48 +01:00
  • 67ce543e04 Intel docker. (#3121) Nicolas Patry 2025-03-18 15:12:11 +01:00
  • 83fe45c15e Prepare for patch release. (#3124) Nicolas Patry 2025-03-18 15:11:55 +01:00
  • 11f2eec10e Publish nix docker image. (#3122) Nicolas Patry 2025-03-18 12:58:21 +01:00
  • a35fbdb925 Bug Fix: Sliding Window Attention (#3112) Mohit Sharma 2025-03-18 15:07:33 +05:30
  • 8c2c348f3c Gaudi: Sync TGI with the latest changes from the TGI-Gaudi fork (#3117) Baptiste Colle 2025-03-18 09:45:52 +01:00
  • 095775e05c launcher: correctly get the head dimension for VLMs (#3116) Daniël de Kok 2025-03-17 18:19:37 +01:00
  • e0535a13c5 increase timeouts debugging-timeouts erikkaum 2025-03-17 17:56:57 +01:00
  • febc488e0e fix: bump org name in gemma3 test drbh 2025-03-17 15:57:07 +00:00
  • 0b3e3db043 xpu 2.6 update (#3051) Wang, Yi 2025-03-17 20:48:48 +08:00
  • 2c2fc6544d fix: add pillow dependency and bump lock+requirements drbh 2025-03-14 18:17:57 +00:00
  • e5dfd41ed4 Upgrading from_env to get token from file when necessary + fix pali_gemma. Nicolas Patry 2025-03-14 17:06:36 +01:00
  • 659ce4f3fc feat: add tests for image types and remove alpha from png drbh 2025-03-14 15:33:06 +00:00
  • e5ec176bf4 fix: bump snapshots and improve exceed window test case drbh 2025-03-14 15:04:38 +00:00
  • 170a12f331 Update window size rocm flash decoding Mohit Sharma 2025-03-14 07:50:11 +00:00
  • b30cdabf68 Add window_size_left param ipex rocm Mohit Sharma 2025-03-14 07:47:45 +00:00
  • eaf18c1ccb (typo) collection link Mohit Sharma 2025-03-14 07:36:38 +00:00
  • 69e0a87dd5 (fix) flashinfer Mohit Sharma 2025-03-13 21:32:38 +00:00
  • ff82f0f84c (fix) sliding window attention Mohit Sharma 2025-03-13 19:30:39 +00:00
  • f91434e99b Make the Nix-based Docker container work on non-NixOS (#3109) Daniël de Kok 2025-03-13 14:02:45 +01:00
  • 8b91f92978 Fixing the docker build. (#3108) Nicolas Patry 2025-03-13 11:26:44 +01:00
  • 27ed848676 Release of Gaudi Backend for TGI (#3091) Baptiste Colle 2025-03-13 10:56:01 +01:00
  • 83ef364177 We need gcc during runtime to enable triton to compile kernels. (#3103) Nicolas Patry 2025-03-13 10:45:47 +01:00
  • 83b7b7bb92 Router: add gemma3-text model type (#3107) Daniël de Kok 2025-03-13 10:41:33 +01:00
  • c73ae0bd88 Update to kernels 0.2.1 (#3084) Daniël de Kok 2025-03-13 10:36:29 +01:00
  • 73ee7837b8 Update to kernels 0.2.0. use_updated_kernels Nicolas Patry 2025-03-13 10:30:07 +01:00
  • 411a28288d Release 3.2.0 v3.2.0 git_3.2.0 Nicolas Patry 2025-03-12 11:15:29 +01:00
  • d4c6faa67b Try to fix on main CI color. (#3101) origin/slind_window_fix Nicolas Patry 2025-03-12 10:12:24 +01:00
  • 4ac06ddf56 Preparing relase 3.2.0 (#3100) Nicolas Patry 2025-03-12 10:11:33 +01:00
  • f01dc9e743 Update neuron backend (#3098) David Corvoysier 2025-03-12 09:53:15 +01:00
  • 5c5528e362 Fix tool call4 (#3094) Nicolas Patry 2025-03-12 09:28:47 +01:00
  • ed46c2c414 Add gemma3 model (#3099) Mohit Sharma 2025-03-12 13:55:51 +05:30
  • f74c36fe0d Fix tool call3 (#3086) Nicolas Patry 2025-03-12 09:22:53 +01:00
  • ae4451c3da Update README.md (#3095) celsowm 2025-03-11 07:05:21 -03:00
  • b447f7e821 Fix qwen vl (#3096) Nicolas Patry 2025-03-11 11:00:41 +01:00
  • 094975c3a8 Update the llamacpp backend (#3022) Adrien Gallouët 2025-03-11 09:19:01 +01:00
  • dc5f05f8e6 Pr 3003 ci branch (#3007) drbh 2025-03-10 12:56:19 -04:00
  • 124398fa57 hotfix: qwen2 formatting (#3093) Daniël de Kok 2025-03-10 16:19:50 +01:00
  • c5ecc7a4de Small test and typing fixes (#3078) Daniël de Kok 2025-03-10 15:08:23 +01:00
  • cae0cbe87d Add modules_to_not_convert in quantized model (#3053) jiqing-feng 2025-03-10 22:03:51 +08:00
  • bbe218a4f7 Add qwen2 multi lora layers support (#3089) EachSheep 2025-03-10 19:42:59 +08:00
  • 58a65f7914 Add request parameters to OTel span for /v1/chat/completions endpoint (#3000) Alex Weston 2025-03-10 07:26:57 -04:00
  • 976eae216f Nix: the launcher needs a Python env with Torch for GPU detection (#3085) Daniël de Kok 2025-03-10 12:11:10 +01:00
  • 622908deab Fix tool call2 (#3076) Nicolas Patry 2025-03-07 19:45:57 +01:00
  • 55a6618434 Update --max-batch-total-tokens description (#3083) Alvaro Bartolome 2025-03-07 14:24:26 +01:00
  • 036d802b62 Nix: add openai to impure shell for integration tests (#3081) Daniël de Kok 2025-03-07 13:04:21 +01:00
  • e2846f76fa No root user TGI. no_root_user2 Nicolas Patry 2025-03-07 11:23:02 +01:00
  • 5a5a51217e Stop being root in the docker. no_root_user Nicolas Patry 2025-03-06 16:45:55 +01:00
  • 8e92942a18 Making tool_calls a vector. (#3075) Nicolas Patry 2025-03-05 22:32:31 +01:00
  • 3208d1cd1d Revert "Trying to reduce the logs in the case of errors." Nicolas Patry 2025-03-05 20:52:38 +01:00
  • cdf70d6a28 Trying to reduce the logs in the case of errors. Nicolas Patry 2025-03-05 20:50:43 +01:00
  • ab9dafc68f Making sure Olmo (transformers backend) works. (#3074) Nicolas Patry 2025-03-05 17:46:47 +01:00
  • 31766dad77 Force upgrade transformers version for olmo. Nicolas Patry 2025-03-05 12:17:09 +01:00
  • ec35976f82 Only add token when it is defined. (#3073) Nicolas Patry 2025-03-05 11:59:52 +01:00
  • cb42b3ad83 fix(neuron): explicitly install toolchain (#3072) David Corvoysier 2025-03-05 11:46:58 +01:00
  • c34bd9d8d9 3.1.1 Release. v3.1.1 git_3.1.1 Nicolas Patry 2025-03-04 18:11:30 +01:00
  • 491ed9e11d Patch rust release. (#3069) Nicolas Patry 2025-03-04 18:07:33 +01:00
  • 144d99c147 Fix a tiny typo in monitoring.md tutorial (#3056) Sadra Barikbin 2025-03-04 19:36:26 +03:30
  • 08bbfa16a1 Preparing for release. (#3060) Nicolas Patry 2025-03-04 16:47:10 +01:00
  • d8ff7f2623 feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. (#3061) Hugo Larcher 2025-03-04 16:43:50 +01:00
  • e88f6f6ee9 Add property-based testing for RadixAllocator (#3068) Daniël de Kok 2025-03-04 15:09:46 +01:00
  • ddf0b02240 All the assertions. tmp_invariants Nicolas Patry 2025-02-28 17:41:22 +01:00
  • fa4e9511f8 Fix two edge cases in RadixTrie::find (#3067) Daniël de Kok 2025-03-04 13:23:27 +01:00
  • a914a21899 Revert "Patch rust release." Nicolas Patry 2025-03-04 12:16:18 +00:00
  • aad9c2b0bd Patch rust release. Nicolas Patry 2025-03-04 12:14:58 +00:00
  • 1f35cc7a31 Updating patch rust release. Nicolas Patry 2025-03-04 12:13:58 +00:00
  • 683ff53fa3 Add Gaudi Backend (#3055) Baptiste Colle 2025-02-28 12:14:58 +01:00
  • f72547c9fb feat(metrics): remove ngrok mandatory feature for backendv3 crate proxy_sse_engine_state Morgan Funtowicz 2025-02-27 22:56:04 +01:00
  • 712199c769 feat(metrics): dispatch internal engine state event from queuing/batching tasks Morgan Funtowicz 2025-02-27 22:43:20 +01:00
  • 1a9c5dec76 feat(metrics): update Cargo.lock Morgan Funtowicz 2025-02-27 21:33:41 +01:00
  • efb20054aa feat: consolidate streaming and event creation logic and add tests for streaming generations pr-2954-ci-branch drbh 2025-02-27 16:12:51 +00:00
  • 8de41f63a8 feat(metrics): exposes the engine state as an endpoint Morgan Funtowicz 2025-02-27 16:58:02 +01:00
  • bb8f59632f feat(metrics): exposes queue size as tokens along with individual requests count Morgan Funtowicz 2025-02-27 14:32:51 +01:00
  • 330f2e419f feat: improve partial parsing types and add test for balancing and partial parsing drbh 2025-02-26 18:53:12 +00:00
  • a5ddc9db52 feat: refactor and simplify chat stream more, bump tests and support stream_options drbh 2025-02-25 20:55:56 +00:00
  • c4cb54c23e fix: bump integrations requirements drbh 2025-02-24 21:56:26 +00:00
  • 31a536d796 feat: refactor chat stream to remove state machine and simplfy logic drbh 2025-02-24 21:51:33 +00:00
  • a416ddbdd9 fix: adjust integration tests for openai client dep drbh 2025-02-19 11:46:10 -05:00
  • e1b6d5be4a fix: clippy cleanup drbh 2025-02-18 21:31:20 +00:00
  • 538456ba68 fix: only send function name on first stream event drbh 2025-02-18 21:13:03 +00:00
  • 68aa6b1af0 fix: bump requirements file too drbh 2025-02-17 14:54:49 +00:00
  • fd611f30c9 fix: bump integration test deps for openai drbh 2025-02-17 14:40:31 +00:00
  • 7d17d7cef7 fix: bump client tests for api changes drbh 2025-02-17 14:19:52 +00:00
  • c215c0de88 fix: bump client test expected prefill drbh 2025-02-17 13:56:59 +00:00
  • 1529a676d9 fix: remove snap with incorrect naming drbh 2025-02-17 13:41:06 +00:00
  • 40f905d00b fix: adjust stream, improve tests and add openai client test drbh 2025-02-17 13:38:49 +00:00
  • 07c20903e5 fix: ensure wrapping curly is not included drbh 2025-02-11 15:05:14 +00:00
  • dbce04e4d3 fix: adjust streaming tool response drbh 2025-02-11 14:22:03 +00:00
  • 5f030140be fix: bump openapi spec drbh 2025-02-10 15:14:00 +00:00