Large Language Model Text Generation Inference
COMMITS
flake.nix
September 16, 2025
September 9, 2025
September 8, 2025
September 2, 2025
May 22, 2025
Nix: switch to hf-nix (#3240)
Daniël de Kok committed
May 21, 2025
Switch to punica-sgmv kernel from the Hub (#3236)
Daniël de Kok committed
May 15, 2025
Update to Torch 2.7.0 (#3221)
Daniël de Kok committed
March 24, 2025
Torch 2.6 (#3134)
Nicolas Patry committed
March 13, 2025
Make the Nix-based Docker container work on non-NixOS (#3109)
Daniël de Kok committed
Update to `kernels` 0.2.1 (#3084)
Daniël de Kok committed
March 10, 2025
Nix: the launcher needs a Python env with Torch for GPU detection (#3085)
Daniël de Kok committed
February 21, 2025
Use `rotary` kernel from the Hub (#3041)
Daniël de Kok committed
February 20, 2025
flashinfer 0.2.0.post1 -> post2 (#3040)
Daniël de Kok committed
February 18, 2025
Use eetq kernel from the hub (#3029)
Daniël de Kok committed
January 31, 2025
Back on nix main. (#2979)
Nicolas Patry committed
Prepare for release 3.1.0 (#2972)
Nicolas Patry committed
January 30, 2025
Add deepseekv3 (#2968)
Nicolas Patry committed
January 29, 2025
Update to moe-kernels 0.8.0 (#2966)
Daniël de Kok committed
January 27, 2025
Update to attention-kernels 0.2.0 (#2950)
Daniël de Kok committed
January 22, 2025
Upgrading the deps to have transformers==4.48.0 necessary (#2937)
Nicolas Patry committed
January 17, 2025
nix: update to PyTorch 2.5.1 (#2921)
Daniël de Kok committed
January 10, 2025
Update to marlin-kernels 0.3.7 (#2882)
Daniël de Kok committed
January 9, 2025
Basic flashinfer 0.2 support (#2862)
Daniël de Kok committed
December 3, 2024
Sync (most) server dependencies with Nix (#2782)
Daniël de Kok committed
November 19, 2024
Update to moe-kernels 0.7.0 (#2720)
Daniël de Kok committed
November 18, 2024
Add support for compressed-tensors w8a8 int checkpoints (#2745)
Daniël de Kok committed
November 17, 2024
Remove vLLM dependency for CUDA (#2751)
Daniël de Kok committed
November 14, 2024
nix: update nixpkgs (#2746)
Daniël de Kok committed
November 10, 2024
Add initial support for compressed-tensors checkpoints (#2732)
Daniël de Kok committed
November 4, 2024
nix: move to tgi-nix `main` (#2718)
Daniël de Kok committed
October 25, 2024
Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels (#2688)
Daniël de Kok committed
October 24, 2024
Add support for FP8 KV cache scales (#2628)
Daniël de Kok committed
October 22, 2024
Add `impureWithCuda` dev shell (#2677)
Daniël de Kok committed
October 8, 2024
nix: move back to the tgi-nix main branch (#2620)
Daniël de Kok committed
Add support for fused MoE Marlin for AWQ (#2616)
Daniël de Kok committed
October 4, 2024
nix: example of local package overrides during development (#2607)
Daniël de Kok committed
October 2, 2024
Mllama flash version (#2585)
Nicolas Patry committed
October 1, 2024
nix: experimental support for building a Docker container (#2470)
Daniël de Kok committed
September 30, 2024
MoE Marlin: support `desc_act` for `groupsize != -1` (#2590)
Daniël de Kok committed
Move flake back to tgi-nix `main` (#2586)
Daniël de Kok committed
Add support for GPTQ-quantized MoE models using MoE Marlin (#2557)
Daniël de Kok committed
September 27, 2024
Improve support for GPUs with capability < 8 (#2575)
Daniël de Kok committed
September 19, 2024
doc: clarify that `--quantize` is not needed for pre-quantized models (#2536)
Daniël de Kok committed
Stream options. (#2533)
Nicolas Patry committed
September 17, 2024
nix: pure Rust check/fmt/clippy/test (#2525)
Daniël de Kok committed
September 12, 2024
Add nix test. (#2513)
Nicolas Patry committed
nix: support Python tokenizer conversion in the router (#2515)
Daniël de Kok committed
September 6, 2024
nix: add pyright/ruff for proper LSP in the impure devshell (#2496)
Daniël de Kok committed
September 2, 2024
nix: improve impure devshell (#2478)
Daniël de Kok committed
August 29, 2024
nix: build Torch against MKL and various other improvements (#2469)
Daniël de Kok committed