Large Language Model Text Generation Inference
COMMITS / Dockerfile_intel
May 6, 2025
IPEX support FP8 kvcache/softcap/slidingwindow (#3144)
Wang, Yi committed
April 15, 2025
transformers flash llm/vlm enabling in ipex (#3152)
Wang, Yi committed
March 24, 2025
Torch 2.6 (#3134)
Nicolas Patry committed
March 18, 2025
Intel docker. (#3121)
Nicolas Patry committed
March 17, 2025
xpu 2.6 update (#3051)
Wang, Yi committed
March 4, 2025
Patch rust release. (#3069)
Nicolas Patry committed
Revert "Patch rust release."
Nicolas Patry committed
Patch rust release.
Nicolas Patry committed
February 20, 2025
update ipex and torch to 2.6 for cpu (#3039)
Wang, Yi committed
February 18, 2025
It's find in some machine. using hf_hub::api::sync::Api to download c… (#3030)
Nicolas Patry committed
February 7, 2025
Updating mllama after strftime. (#2993)
Nicolas Patry committed
February 6, 2025
Triton fix (#2995)
Wang, Yi committed
Using the "lockfile". (#2992)
Nicolas Patry committed
January 22, 2025
fix moe in quantization path (#2935)
Wang, Yi committed
January 17, 2025
Moving to `uv` instead of `poetry`. (#2919)
Nicolas Patry committed
January 15, 2025
Upgrading our rustc version. (#2908)
Nicolas Patry committed
January 9, 2025
update ipex xpu to fix issue in ARC770 (#2884)
Wang, Yi committed
December 19, 2024
change xpu lib download link (#2852)
Wang, Yi committed
December 6, 2024
use oneapi 2024 docker image directly for xpu (#2793)
Wang, Yi committed
November 26, 2024
W
November 20, 2024
Install compressed-tensors in Docker CPU builds
Daniël de Kok committed
November 18, 2024
add ipex moe implementation to support Mixtral and PhiMoe (#2707)
Wang, Yi committed
November 10, 2024
Add initial support for compressed-tensors checkpoints (#2732)
Daniël de Kok committed
October 30, 2024
W
October 16, 2024
feat: prefill chunking (#2600)
OlivierDehaene committed
October 14, 2024
update ipex to fix incorrect output of mllama in cpu (#2640)
Wang, Yi committed
October 8, 2024
Upgrade minor rust version (Fixes rust build compilation cache) (#2617)
Nicolas Patry committed
September 12, 2024
hotfix : enable intel ipex cpu and xpu in python3.11 (#2517)
Wang, Yi committed
September 11, 2024
Fix tokenization yi (#2507)
Nicolas Patry committed
September 5, 2024
hotfix: fix regression of attention api change in intel platform (#2439)
Wang, Yi committed
August 29, 2024
Lots of improvements (Still 2 allocators) (#2449)
Nicolas Patry committed
August 13, 2024
add numa to improve cpu inference perf (#2330)
Wang, Yi committed
August 9, 2024
Using HF_HOME instead of CACHE to get token read in addition to models. (#2288)
Nicolas Patry committed
July 31, 2024
Rebase TRT-llm (#2331)
Nicolas Patry committed
July 3, 2024
Fixing the dockerfile warnings. (#2173)
Nicolas Patry committed
July 2, 2024
fix FlashDecoding change's regression in intel platform (#2161)
Wang, Yi committed
June 25, 2024
Cpu tgi (#1936)
Wang, Yi committed
use xpu-smi to dump used memory (#2047)
Wang, Yi committed
June 24, 2024
Fix cargo-chef prepare (#2101)
ur4t committed
June 17, 2024
Set maximum grpc message receive size to 2GiB (#2075)
Daniël de Kok committed
June 6, 2024
Xpu gqa (#2013)
Wang, Yi committed
Internal runner ? (#2023)
Nicolas Patry committed
June 5, 2024
feat: move allocation logic to rust (#1835)
OlivierDehaene committed
June 3, 2024
reable xpu, broken by gptq and setuptool upgrade (#1988)
Wang, Yi committed
May 23, 2024
reenable xpu for tgi (#1939)
Wang, Yi committed
May 6, 2024
update xpu docker image and use public ipex whel (#1860)
Wang, Yi committed
Upgrading to rust 1.78. (#1851)
Nicolas Patry committed
April 26, 2024
add intel xpu support for TGI (#1475)
Wang, Yi committed