Commit Graph

  • b4adbf2f6e docs: add AWS (EC2/SageMaker) deployment + benchmarking guide (#3352) main Fahad Alghanim 2026-03-21 04:34:22 -07:00
  • db931fcf42 Update CodeQL workflow for security analysis Pauline Bailly-Masson 2026-01-08 15:02:49 +01:00
  • dfb3fbe78e fix(num_devices): fix num_shard/num device auto compute when NVIDIA_VISIBLE_DEVICES == "all" or "void" (#3346) v3.3.7 oOraph 2025-12-18 16:58:43 +01:00
  • 34988476cb Maintenance mode (#3345) Lysandre Debut 2025-12-11 15:29:10 +01:00
  • 52c6dddf97 maintenance mode Julien Chaumond 2025-12-11 14:49:38 +01:00
  • 55f7f7cb8b Maintenance mode (#3344) Lysandre Debut 2025-12-11 14:05:46 +01:00
  • 71d5987816 fix: avoid gaudi until harware is available compile-grammar-in-router drbh 2025-12-09 22:02:55 +00:00
  • 5a34147f53 fix: avoid flaky test for dev and update neuron deps drbh 2025-12-09 21:33:10 +00:00
  • d3d4f7a5f1 fix: bump test timeout drbh 2025-12-08 22:32:37 +00:00
  • dded0cdc92 fix: adjust typer for neuron cli drbh 2025-12-08 14:57:17 +00:00
  • 71450d3ef0 fix: removed unsused vocab population comment drbh 2025-12-08 12:48:01 +00:00
  • 68398dbb36 fix: adjust aiter install on amd docker drbh 2025-11-19 18:47:48 +00:00
  • 9f4283f190 fix: adjust flaky tests and amd dockerfile tweaks drbh 2025-11-19 16:32:27 +00:00
  • 93a821a2ed fix: version pins for amd, intel and neuron drbh 2025-11-19 02:45:26 +00:00
  • e42b430f47 fix: clippy cleanups drbh 2025-11-19 02:11:09 +00:00
  • 433abf1141 fix: bump outline crate hash and remove debug log drbh 2025-11-19 01:57:33 +00:00
  • 2e72daa279 feat: prefer latest outlines core and compile grammar in router drbh 2025-11-14 03:34:48 +00:00
  • 24ee40d143 feat: support max_image_fetch_size to limit (#3339) drbh 2025-11-18 12:29:21 -05:00
  • 85790a19a7 misc(gha): expose action cache url and runtime as secrets (#2964) Funtowicz Morgan 2025-11-17 10:50:10 +01:00
  • 27bc1271d1 fix: prefer meta-llama/Llama-2-7b-hf over deprecated repo update-flake-deps-and-logit-processor drbh 2025-11-06 19:42:45 +00:00
  • ae1fb28434 fix: adjust leftover spaces lint drbh 2025-11-06 18:21:18 +00:00
  • 00cef24e7f fix: bump flake and update grammar logit processor drbh 2025-11-06 00:55:48 +00:00
  • efb94e0d3d Patch version 3.3.6 (#3329) v3.3.6 Alvaro Moran 2025-09-17 01:15:23 +02:00
  • 5e747f4e30 Revert "feat: bump flake including transformers and huggingface_hub versions" (#3330) drbh 2025-09-16 11:32:19 -04:00
  • 1b90c508af Revert "Revert "feat: bump flake including transformers and huggingfa… (#3326) drbh 2025-09-09 10:44:25 -04:00
  • d2ad7c484e Update iframe sources for streaming demo (#3327) Eliott C. 2025-09-09 15:36:19 +02:00
  • c6071749db Fix mask passed to flashinfer (#3324) Daniël de Kok 2025-09-08 19:47:03 +02:00
  • 4f067c22c3 fix: remove azure (#3325) drbh 2025-09-08 13:41:45 -04:00
  • 9dedeb89ac Revert "feat: bump flake including transformers and huggingface_hub versions" (#3323) drbh 2025-09-08 06:17:29 -04:00
  • 5739b5b088 Add missing backslash (#3311) Phil 2025-09-06 09:50:14 +02:00
  • 49b414b5b8 Bump transformers in non-Nix as well to run CI transformers-ci Daniël de Kok 2025-09-05 11:45:25 +00:00
  • 8d029d2fc3 chore: release v3.3.5 v3.3.5 git_v3.3.5 Alvaro Moran 2025-09-02 16:58:41 +02:00
  • 356de85c29 feat: bump flake including transformers and huggingface_hub versions (#3313) drbh 2025-09-02 09:46:41 -04:00
  • 0f79162288 chore: prepare version 3.3.5 (#3314) Alvaro Moran 2025-09-02 15:35:42 +02:00
  • 06d9d88b95 Disable Cachix pushes (#3312) Daniël de Kok 2025-08-26 19:27:57 +02:00
  • 8801ba12cf Optimum neuron 0.3.0 (#3308) Alvaro Moran 2025-08-26 11:07:47 +02:00
  • d618424d50 HuggingFaceM4/Idefics3-8B-Llama3 crash fix (#3267) Wang, Yi 2025-08-21 16:04:30 +08:00
  • c5e6f9a178 Fix outline import issue (#3282) Wang, Yi 2025-08-21 15:53:04 +08:00
  • 6624fec1f9 Some gptq case could not be handled by ipex. but could be handle by triton (#3298) Wang, Yi 2025-08-19 15:37:49 +08:00
  • 5284b5c654 Multi modality fix (#3283) Wang, Yi 2025-08-19 15:36:36 +08:00
  • 6a2fa83540 XCCL for XPU (#3252) Wang, Yi 2025-08-19 06:37:27 +08:00
  • b4386b8c77 Migrate to V2 Pydantic interface (#3262) Emmanuel Ferdman 2025-08-19 00:55:21 +03:00
  • 75ebb228f4 compressed_tensors w8a8 test fixes quantization-0.1 Daniël de Kok 2025-07-18 11:05:02 +00:00
  • 24c2bff659 Gaudi gptq gidx support (#3297) Wang, Yi 2025-07-17 22:00:12 +08:00
  • bd33a23cac More grpcio shenanigans 20250708-ci-fixes Daniël de Kok 2025-07-08 14:49:23 +00:00
  • df53facda9 AMD grpcio? Daniël de Kok 2025-07-08 14:15:41 +00:00
  • a3db7edd67 Set grpcio upper bound to 1.73 (exclusive) Daniël de Kok 2025-07-08 13:55:00 +00:00
  • 5a6e09e32e Revert "protobuf < 6.0" Daniël de Kok 2025-07-08 13:53:13 +00:00
  • 48bb4b4f1e protobuf < 6.0 Daniël de Kok 2025-07-08 13:32:04 +00:00
  • bfdaf5773c Add outlines upper bound Daniël de Kok 2025-07-08 13:28:51 +00:00
  • 47d5991b25 Update quantization kernels to 0.1.2 for fixes Daniël de Kok 2025-07-08 12:48:15 +00:00
  • da47e5754b fix: cleanup unit tests improve-json-schema-field drbh 2025-07-07 17:59:35 +00:00
  • 43fd3bd7f4 fix: refactor and simplify structs and openapi drbh 2025-07-07 17:53:34 +00:00
  • b6540cea50 fix: lint and format fix-tool-call-def drbh 2025-07-07 16:14:00 +00:00
  • 71fbe88a30 fix: enable defs references in tool calls drbh 2025-07-07 14:35:04 +00:00
  • fc2405c549 [gaudi] Fix the CI test errors (#3286) Yuan Wu 2025-07-07 17:32:07 +08:00
  • ebb26f0ccd [gaudi] Deepseek v2 mla and add ep to unquantized moe (#3287) Wang, Yi 2025-07-07 17:29:39 +08:00
  • c90ac9f65a Update snapshot Daniël de Kok 2025-07-06 17:18:28 +00:00
  • a76ae953fe Update quantization kernels Daniël de Kok 2025-07-07 06:12:18 +00:00
  • 778b61c0da [gaudi] Remove unnecessary reinitialize to HeterogeneousNextTokenChooser to make sampling output correct (#3284) Wang, Yi 2025-07-03 16:03:16 +08:00
  • 3d2e7c8fce Optimum neuron 0.2.2 (#3281) David Corvoysier 2025-07-03 07:59:25 +02:00
  • f6005d6813 xpu lora support (#3232) Wang, Yi 2025-07-02 23:54:25 +08:00
  • 429dcd9c64 [gaudi] Gemma3 sliding window support (#3280) Wang, Yi 2025-07-01 16:06:01 +08:00
  • 5f70fbdc2a feat: allow json_schema in response format and add test drbh 2025-06-25 19:43:49 +00:00
  • 9f38d93051 Gaudi: add CI (#3160) Baptiste Colle 2025-06-24 18:51:09 +02:00
  • 719907410b [gaudi] Refine rope memory, do not need to keep sin/cos cache per layer (#3274) Wang, Yi 2025-06-23 17:15:39 +08:00
  • d4bd5cac79 chore: version 3.3.4 v3.3.4 git_v3.3.4 David Corvoysier 2025-06-19 09:08:38 +00:00
  • 238fbd4d50 Neuron backend fix and patch version 3.3.4 (#3273) David Corvoysier 2025-06-19 10:52:41 +02:00
  • 14ee6e7804 [gaudi] gemma3 text and vlm model intial support. need to add sliding window support later (#3270) Wang, Yi 2025-06-19 15:32:34 +08:00
  • 1754b79f10 chore: release 3.2.3 v3.3.3 git_v3.3.3 David Corvoysier 2025-06-18 12:59:29 +00:00
  • bd1bdebb47 doc: fix README (#3271) David Corvoysier 2025-06-18 12:35:36 +02:00
  • f13e28c98d [gaudi] Refine logging for Gaudi warmup (#3222) regisss 2025-06-18 04:34:00 -06:00
  • b4d17f18ff chore: prepare release 3.3.3 (#3269) David Corvoysier 2025-06-18 11:55:26 +02:00
  • 0627983c17 [Gaudi] use pad_token_id to pad input id (#3268) Wang, Yi 2025-06-17 15:07:25 +08:00
  • 3752143b39 [Gaudi] Fix the integration-test issues (#3265) Yuan Wu 2025-06-13 20:47:06 +08:00
  • ded4cb52ac [Gaudi] Enable Qwen3_moe model (#3244) Yuan Wu 2025-06-13 18:03:24 +08:00
  • a220e57f45 [gaudi] HuggingFaceM4/idefics2-8b issue fix (#3264) Wang, Yi 2025-06-13 18:00:08 +08:00
  • e07056ab3f [Gaudi] Remove optimum-habana (#3261) Yuan Wu 2025-06-13 04:35:36 +08:00
  • 25fdc5f03c [gaudi] Move the _update_cos_sin_cache into get_cos_sin (#3254) Yuan Wu 2025-06-13 04:31:11 +08:00
  • 613b8dd647 [gaudi] Vlm rebase and issue fix in benchmark test (#3263) Wang, Yi 2025-06-13 04:26:37 +08:00
  • 839477670a [gaudi] Perf optimization (#3256) Wang, Yi 2025-06-11 21:00:21 +08:00
  • 79183d1647 Bump neuron SDK version (#3260) David Corvoysier 2025-06-10 17:56:25 +02:00
  • 2204f91f32 fix: adjust llava logic and bump snaps support-granite-vision drbh 2025-06-06 14:54:10 +00:00
  • 30bdf922bd feat: improve llava next pooling for granite vision drbh 2025-06-04 13:50:39 +00:00
  • 1ff9d185d5 Remove useless packages (#3253) Yuan Wu 2025-06-03 19:42:29 +08:00
  • 8e41da951d Release 3.3.2 v3.3.2 git_3.3.2 Daniël de Kok 2025-05-30 14:19:18 +00:00
  • 249189d96e Prepare for 3.3.2 (#3249) Daniël de Kok 2025-05-30 16:16:36 +02:00
  • 6b6e30a6f6 [gaudi] Fix the Llama-4-Maverick-17B-128E crash issue (#3246) Yuan Wu 2025-05-29 17:38:44 +08:00
  • 70217ac345 [Gaudi] Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct (#3245) Yuan Wu 2025-05-29 15:58:24 +08:00
  • f14044009a fp8 compressed tensors w8a8 support for Gaudi backend (#3242) Wang, Yi 2025-05-28 20:54:20 +08:00
  • 1883a62a94 Add Qwen3 for Gaudi backend (#3229) Yuan Wu 2025-05-23 14:58:35 +08:00
  • f58d7cf50e Nix: switch to hf-nix (#3240) Daniël de Kok 2025-05-22 17:09:15 +02:00
  • f08b44ade5 Upgrade to new vllm extension ops for Gaudi backend (fix issue in exponential bucketing) (#3239) Wang, Yi 2025-05-22 21:29:16 +08:00
  • 767a65202d Release 3.3.1 v3.3.1 git_3.3.1 Daniël de Kok 2025-05-22 07:47:12 +00:00
  • 674c514d44 Prepare for 3.3.1 (#3238) Daniël de Kok 2025-05-22 09:43:55 +02:00
  • 9e7e546923 Move input_ids to hpu and remove disposal of adapter_meta (#3237) Wang, Yi 2025-05-22 15:21:31 +08:00
  • e32528792c Switch to punica-sgmv kernel from the Hub (#3236) Daniël de Kok 2025-05-21 15:44:15 +02:00
  • 43b1b07fb9 Fix the crash in default ATTENTION path for Gaudi backend (#3235) Wang, Yi 2025-05-20 20:02:32 +08:00
  • 000e313a92 Refine warmup and upgrade to synapse AI 1.21.0 (#3234) Wang, Yi 2025-05-20 16:22:43 +08:00
  • d658b5def3 Deepseek R1 for Gaudi backend (#3211) Wang, Yi 2025-05-19 22:36:39 +08:00