Commit Graph

  • 40b00275b2 Attempt to remove AWS S3 flaky cache for sccache (#2953) Funtowicz Morgan 2025-01-27 11:21:48 +01:00
  • bafbd06744 Update transformers_flash_causal_lm.py fix-tp Cyril Vallez 2025-01-24 15:06:50 +01:00
  • de83178bc3 tp monkey patch Cyril Vallez 2025-01-24 15:03:14 +01:00
  • 6cb41a80a1 Revert "Remove AWS credentials?" Nicolas Patry 2025-01-24 14:34:17 +01:00
  • d2ff68e98d Remove AWS credentials? Nicolas Patry 2025-01-24 12:18:28 +01:00
  • b70f29d729 Bypass perm issue. v3.0.2 git_v3.0.2 Nicolas Patry 2025-01-24 12:12:47 +01:00
  • e413b01eb1 Create patch release. Nicolas Patry 2025-01-24 10:50:15 +01:00
  • 02e4b9ab32 backend(vllm): plug in the tokio server and CLI Morgan Funtowicz 2025-01-24 10:41:07 +01:00
  • d9dda11726 Trying to put back the archlist (to fix the oom). (#2947) Nicolas Patry 2025-01-24 09:32:17 +01:00
  • d937eb64da Fixing cargo lock. Nicolas Patry 2025-01-23 18:54:34 +01:00
  • 18c4607d46 Transformers backend TP fix (#2945) Cyril Vallez 2025-01-23 18:09:57 +01:00
  • 29a0893b67 Tmp tp transformers (#2942) Nicolas Patry 2025-01-23 18:07:30 +01:00
  • 0a89902663 [TRTLLM] Expose finish reason (#2841) Funtowicz Morgan 2025-01-23 16:48:26 +01:00
  • 4e172028aa Add NVIDIA A40 to known cards (#2941) Nikolai Kolodziej 2025-01-23 14:19:21 +01:00
  • 6ab02931cf Set alias for max_completion_tokens in ChatRequest (#2932) Alvaro Bartolome 2025-01-23 14:18:47 +01:00
  • cc212154e0 Bump TensorRT-LLM backend dependency to v0.16.0 (#2931) Funtowicz Morgan 2025-01-23 13:54:40 +01:00
  • bd2ec03d53 backend(vllm): statically allocate LLMEngine Morgan Funtowicz 2025-01-22 22:15:33 +01:00
  • 1dd346666a Clarify FP8-Marlin use on capability 8.9 (#2940) Daniël de Kok 2025-01-22 18:18:11 +01:00
  • 1d3c9beba8 fix moe in quantization path (#2935) Wang, Yi 2025-01-22 21:36:15 +08:00
  • 6d335ca7ce Remove modifications in Lock. new_minor_version Nicolas Patry 2025-01-22 13:37:17 +01:00
  • b21d3c1e73 Upgrade the version number. Nicolas Patry 2025-01-22 12:29:50 +01:00
  • 2dfe3b3ee6 Upgrading the deps to have transformers==4.48.0 necessary (#2937) Nicolas Patry 2025-01-22 12:20:15 +01:00
  • cfd22726c9 backend(vllm): initial commit Morgan Funtowicz 2025-01-21 23:37:56 +01:00
  • 64a33c1f05 Run pre-commit run --all-files to fix CI (#2933) Alvaro Bartolome 2025-01-21 17:33:33 +01:00
  • bdb3e488e4 Trying to avoid the random timeout. (#2929) Nicolas Patry 2025-01-21 11:06:10 +01:00
  • 17367438f3 Give TensorRT-LLM a proper CI/CD 😍 (#2886) Funtowicz Morgan 2025-01-21 10:19:16 +01:00
  • b980848abf Flash Transformers modeling backend support (#2913) Cyril Vallez 2025-01-21 10:01:51 +01:00
  • 16162602c2 Add fp8 support moe models Mohit Sharma 2025-01-20 13:55:54 +00:00
  • 447a5b2f87 Fixing TRTLLM dockerfile. (#2922) Nicolas Patry 2025-01-20 11:13:46 +01:00
  • 630f198624 flashinfer: switch to plan API (#2904) Daniël de Kok 2025-01-17 18:18:02 +01:00
  • 8f6146f11a Revert "feat: improve qwen2-vl startup " (#2924) drbh 2025-01-17 12:09:05 -05:00
  • eecca27113 feat: improve qwen2-vl startup (#2802) drbh 2025-01-17 11:50:41 -05:00
  • 17192c9a0e fix: remove test debug params enable-qwen2vl-video drbh 2025-01-17 16:19:02 +00:00
  • 6e982f43a1 fix the crash of meta-llama/Llama-3.2-1B (#2918) Wang, Yi 2025-01-17 22:50:58 +08:00
  • b4187d6022 Add tgi_batch_current_size and tgi_batch_current_size as response header response-header-metrics Corentin REGAL 2025-01-17 15:48:02 +01:00
  • c20025dbf7 Add fp8 kv cache for ROCm (#2856) Mohit Sharma 2025-01-17 18:43:29 +05:30
  • de19e7e844 Moving to uv instead of poetry. (#2919) Nicolas Patry 2025-01-17 12:32:00 +01:00
  • d61f14f271 nix: update to PyTorch 2.5.1 (#2921) Daniël de Kok 2025-01-17 12:12:11 +01:00
  • 885144166f Flash decoding kernel adding and prefill-chunking and prefix caching enabling in intel cpu/xpu (#2815) Wang, Yi 2025-01-17 19:04:57 +08:00
  • bde5f9ad82 nix: update to PyTorch 2.5.1 nix/pytorch-2.5.1 Daniël de Kok 2025-01-17 06:44:21 +00:00
  • 82f6ea1b71 feat: improve star coder to support multi lora layers (#2883) drbh 2025-01-16 16:23:55 -05:00
  • 78cd756caf fix: improve video processing and update unsupported paths drbh 2025-01-16 17:20:27 +00:00
  • 5f78ec32a5 Do not convert weight scale to e4m3fnuz on CUDA (#2917) Daniël de Kok 2025-01-16 13:44:32 +01:00
  • 922cc38fbc Upgrading bitsandbytes. (#2910) Nicolas Patry 2025-01-15 20:07:21 +01:00
  • 120bd3e3bb Removing the github runner. (#2912) Nicolas Patry 2025-01-15 19:20:44 +01:00
  • 1470aec9d9 Fix typo in TPU docs (#2911) Baptiste Colle 2025-01-15 18:32:07 +01:00
  • 203cade244 Upgrading our rustc version. (#2908) Nicolas Patry 2025-01-15 17:04:03 +01:00
  • 46994b34fb 📝 add guide on using TPU with TGI in the docs (#2907) Baptiste Colle 2025-01-15 16:26:11 +01:00
  • dc9b8e9814 Fix docker run in README.md (#2861) Alvaro Bartolome 2025-01-15 16:07:10 +01:00
  • 3c7ae48f7f docs(conceptual/speculation): available links Train Medusa (#2863) Guspan Tanadi 2025-01-15 22:05:54 +07:00
  • cc8b9650bd Baichuan2-13B does not have max_position_embeddings in config (#2903) Wang, Yi 2025-01-15 22:56:52 +08:00
  • e07acc7f68 Enable FP8 Per-Tensor Scales and Integrate Marlin/MoE Kernels Repo for ROCm (#2825) Mohit Sharma 2025-01-15 11:38:58 +05:30
  • 48067e4a0d fmt baichuan2-13b Wang, Yi A 2025-01-13 17:23:28 -08:00
  • 22ed5703de Update server/text_generation_server/models/flash_causal_lm.py Wang, Yi 2025-01-14 08:58:48 +08:00
  • 880ab9c2f3 Add Flash decoding kernel ROCm (#2855) Mohit Sharma 2025-01-13 15:42:35 +05:30
  • 1660154ae6 fix crash in torch2.6 if TP=1 (#2885) Wang, Yi 2025-01-13 18:11:31 +08:00
  • 2e22164d4a Update using_guidance.md (#2901) Nicholas Broad 2025-01-13 02:09:35 -08:00
  • 5ad8c9a40b Baichuan2-13B does not have max_position_embeddings in config see https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/config.json Wang, Yi A 2025-01-12 22:47:23 -08:00
  • 83624a07be Add possible variants for A100 and H100 GPUs for auto-detecting flops (#2837) lazariv 2025-01-10 16:12:02 +01:00
  • 01067f8ba8 chore: Update jsonschema to 0.28.0 (#2870) Dmitry Dygalo 2025-01-10 06:01:54 -08:00
  • 4f7e00f4ce Update to marlin-kernels 0.3.7 (#2882) Daniël de Kok 2025-01-10 12:43:44 +01:00
  • da5ab46705 Improve vlm support (add idefics3 support) (#2437) drbh 2025-01-09 10:35:32 -05:00
  • a9c7d2e3b6 Basic flashinfer 0.2 support (#2862) Daniël de Kok 2025-01-09 16:25:00 +01:00
  • c7b2e3f100 chore: Enable blocking feature for reqwest update-jsonschema Dmitry Dygalo 2025-01-09 11:07:49 +01:00
  • afb6c728d8 update ipex xpu to fix issue in ARC770 (#2884) Wang, Yi 2025-01-09 17:11:03 +08:00
  • d37a43e581 chore: fixed some typos and attribute issues in README (#2891) Ruida Zeng 2025-01-09 03:09:23 -06:00
  • db6a9e1232 add ats support ci-update_xpu_image Wang, Yi A 2025-01-07 16:23:16 -08:00
  • b51fc1cc0f update ipex xpu to fix issue in ARC770 Wang, Yi A 2025-01-06 18:28:14 -08:00
  • b27749eba7 fix: small refactor and cleanups drbh 2025-01-03 11:01:07 -05:00
  • 840efc5f6c chore: Update jsonschema to 0.28.0 Dmitry Dygalo 2024-12-29 15:56:22 +01:00
  • dcc1194198 fix: adjust trtllm looper for video chunk enum drbh 2024-12-16 17:05:28 +00:00
  • 4f42d0c731 fix: include the video feature in cargo chef command drbh 2024-12-13 18:11:04 +00:00
  • 27f758de0a fix: make ffmpeg-next dep optional with feature drbh 2024-12-13 18:00:15 +00:00
  • b4da6ad30e fix: feature flag video and remove from non cuda dockerfiles drbh 2024-12-13 17:36:34 +00:00
  • 5322abd9f5 fix: adjust whitespace lint drbh 2024-12-13 17:05:01 +00:00
  • 91ed362e74 fix: update trtllm dockerfile after rebase drbh 2024-12-13 16:59:19 +00:00
  • bb00fb33ba fix: update lints after rebase drbh 2024-12-13 16:07:52 +00:00
  • 5c7bc91a2f fix: adjust batch_tokenized_inputs output in mllama drbh 2024-12-13 15:51:06 +00:00
  • 2ae152a188 fix: update all vlm forward args, pass shared libraries to final layer in docker and doc bump drbh 2024-12-12 22:00:02 +00:00
  • 1d6bf243eb fix: remove unnecessary cast drbh 2024-12-12 18:52:17 +00:00
  • e2b75a572f fix: resolve rebase issues and add test drbh 2024-12-12 18:31:33 +00:00
  • 71ed75a21b fix: pre commit and clippy lints drbh 2024-12-12 12:00:08 -05:00
  • db97d979fb cleanup prints Miquel Farre 2024-12-11 21:09:17 +00:00
  • 19e1c8da31 working version Miquel Farre 2024-12-11 21:08:03 +00:00
  • af77a0cadf fixing ssl issue Miquel Farre 2024-12-11 14:00:23 +00:00
  • cbf1d982ec installing ssl requirements prior to rust building stage Miquel Farre 2024-12-04 09:53:29 +00:00
  • 75ab887dda fix: copy shared libraries from builder drbh 2024-12-02 18:21:47 -05:00
  • b5b2184c0a fix: include usr lib in ld path drbh 2024-11-27 20:50:42 -05:00
  • 50b5399d9c fix: add ffmpeg to final layer of container drbh 2024-11-27 18:37:38 -05:00
  • 2dc078ad1d fix: bump deps in other dockerfiles drbh 2024-11-27 15:49:35 -05:00
  • 063104c217 Fix test devshell Daniël de Kok 2024-11-27 19:51:55 +00:00
  • 05004a6cfd Make the pure build work Daniël de Kok 2024-11-27 19:09:26 +00:00
  • 98392a7a3f Cleanup impure Nix shell Daniël de Kok 2024-11-27 18:43:57 +00:00
  • 167c6f06ab fix: include ffmpeg deps in autodocs workflow drbh 2024-11-27 11:28:30 -05:00
  • 96968a0da3 fix: add ffmpeg overlay and enable build drbh 2024-11-27 00:14:45 -05:00
  • f0c38412d1 fix: add libavdevice dep to tests workflow drbh 2024-11-26 17:29:57 -05:00
  • 4a76e8b8b4 fix: add libavfilter dep to test drbh 2024-11-26 17:26:10 -05:00
  • d5cc6707e0 fix: ensure pip is installed after installing deps in test workflow drbh 2024-11-26 17:16:50 -05:00
  • daf83a95c5 fix: adjust pkg config in test drbh 2024-11-26 17:16:00 -05:00
  • 137f3bb2ef fix: adjust dependencies and bump pip along with python drbh 2024-11-26 16:56:04 -05:00