Large Language Model Text Generation Inference
COMMITS / flake.lock
September 16, 2025
September 9, 2025
September 8, 2025
September 2, 2025
May 22, 2025
Nix: switch to hf-nix (#3240)
Daniël de Kok committed
May 21, 2025
Switch to punica-sgmv kernel from the Hub (#3236)
Daniël de Kok committed
May 15, 2025
Update to Torch 2.7.0 (#3221)
Daniël de Kok committed
April 7, 2025
Update transformers to 4.51 (#3148)
Mohit Sharma committed
April 6, 2025
Preparing for release. (#3147)
Nicolas Patry committed
March 24, 2025
Torch 2.6 (#3134)
Nicolas Patry committed
March 13, 2025
Update to `kernels` 0.2.1 (#3084)
Daniël de Kok committed
March 10, 2025
Pr 3003 ci branch (#3007)
drbh committed
February 21, 2025
Use `rotary` kernel from the Hub (#3041)
Daniël de Kok committed
February 20, 2025
flashinfer 0.2.0.post1 -> post2 (#3040)
Daniël de Kok committed
February 18, 2025
Use eetq kernel from the hub (#3029)
Daniël de Kok committed
February 10, 2025
Use kernels from the kernel hub (#2988)
Daniël de Kok committed
January 31, 2025
Back on nix main. (#2979)
Nicolas Patry committed
Prepare for release 3.1.0 (#2972)
Nicolas Patry committed
January 30, 2025
Add deepseekv3 (#2968)
Nicolas Patry committed
January 29, 2025
Update to moe-kernels 0.8.0 (#2966)
Daniël de Kok committed
January 27, 2025
Update to attention-kernels 0.2.0 (#2950)
Daniël de Kok committed
January 22, 2025
Upgrading the deps to have transformers==4.48.0 necessary (#2937)
Nicolas Patry committed
January 17, 2025
Moving to `uv` instead of `poetry`. (#2919)
Nicolas Patry committed
nix: update to PyTorch 2.5.1 (#2921)
Daniël de Kok committed
January 15, 2025
Upgrading our rustc version. (#2908)
Nicolas Patry committed
January 10, 2025
Update to marlin-kernels 0.3.7 (#2882)
Daniël de Kok committed
January 9, 2025
Basic flashinfer 0.2 support (#2862)
Daniël de Kok committed
November 22, 2024
chore: Update to marlin-kernels 0.3.6 (#2771)
Daniël de Kok committed
November 21, 2024
nix: downgrade to outlines 0.1.3 (#2768)
Daniël de Kok committed
November 20, 2024
nix: update for outlines 0.1.4 (#2764)
Daniël de Kok committed
November 19, 2024
Update to moe-kernels 0.7.0 (#2720)
Daniël de Kok committed
November 18, 2024
Add support for compressed-tensors w8a8 int checkpoints (#2745)
Daniël de Kok committed
November 17, 2024
Remove vLLM dependency for CUDA (#2751)
Daniël de Kok committed
November 14, 2024
nix: update nixpkgs (#2746)
Daniël de Kok committed
November 10, 2024
Add initial support for compressed-tensors checkpoints (#2732)
Daniël de Kok committed
November 4, 2024
nix: move to tgi-nix `main` (#2718)
Daniël de Kok committed
October 28, 2024
We can have a tokenizer anywhere. (#2527)
Nicolas Patry committed
October 25, 2024
Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels (#2688)
Daniël de Kok committed
October 24, 2024
Add support for FP8 KV cache scales (#2628)
Daniël de Kok committed
October 8, 2024
nix: move back to the tgi-nix main branch (#2620)
Daniël de Kok committed
Add support for fused MoE Marlin for AWQ (#2616)
Daniël de Kok committed
October 4, 2024
nix: example of local package overrides during development (#2607)
Daniël de Kok committed
October 2, 2024
Mllama flash version (#2585)
Nicolas Patry committed
September 30, 2024
MoE Marlin: support `desc_act` for `groupsize != -1` (#2590)
Daniël de Kok committed
Move flake back to tgi-nix `main` (#2586)
Daniël de Kok committed
Add support for GPTQ-quantized MoE models using MoE Marlin (#2557)
Daniël de Kok committed
September 27, 2024
Improve support for GPUs with capability < 8 (#2575)
Daniël de Kok committed
September 19, 2024
Update to moe-kenels 0.3.1 (#2535)
Daniël de Kok committed
Stream options. (#2533)
Nicolas Patry committed
September 16, 2024
Adding a test for FD. (#2516)
Nicolas Patry committed