Commit Graph

  • 497e234d38 [EPLB] Cleanup the transfer logic for the various eplb maps (#34520) main Sage Moore 2026-03-27 02:18:46 -07:00
  • 6287e7fa20 [P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946) dtc 2026-03-27 16:26:40 +08:00
  • 84e439a9cb [CI/Build] Move nightly wheel index generation to a single post-build step (#38322) Shengqi Chen 2026-03-27 15:44:18 +08:00
  • a1746ff9ec [Doc] Clarify Helm chart location in deployment guide (#38328) Yuichiro Utsumi 2026-03-27 16:43:02 +09:00
  • 019afb37c0 Merge branch 'main' into fix-pixtral-lora fix-pixtral-lora Jee Jee Li 2026-03-27 15:41:18 +08:00
  • aee4c14689 [Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168) Flora Feng 2026-03-27 02:42:26 -04:00
  • 0ae89f18fd [Refactor] Move FusedMoE hidden_size roundup to quant_method (#34285) Bowen Bao 2026-03-26 23:38:26 -07:00
  • c2b17d71af [CI] Add xpu auto-label rule for Intel GPU/XPU PRs (#38320) wenjun liu 2026-03-27 14:22:38 +08:00
  • becaed6ec8 [CPU] Support CT W4A16 on CPU MP kernel (#38219) Li, Jiang 2026-03-27 14:15:28 +08:00
  • a8eab8f30d [Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975) Xiaoshuang Wang 2026-03-27 14:13:21 +08:00
  • 2babac0bed [frontend] dump openai responses type by alias (#38262) cjackal 2026-03-27 14:58:20 +09:00
  • 7cc302dd87 [kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853) Or Ozeri 2026-03-27 08:38:33 +03:00
  • a17a1f12dc inlined dsv3.2 woosuk/ds-exp Woosuk Kwon 2026-03-27 05:20:23 +00:00
  • 999dfc1622 [Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789) Bvicii 2026-03-26 22:17:00 -07:00
  • 03d1e1ee1f Address conflict Jee Jee Li 2026-03-27 03:39:13 +00:00
  • c0be8b1f82 [CI] Fix model swap test failures khluu/mig-small-model-swaps khluu 2026-03-26 20:18:50 -07:00
  • 6ec55b6485 [CI] Swap to smaller models for MIG slice compatibility khluu 2026-03-26 18:33:42 -07:00
  • d86060122a [CI/Build] enable Intel XPU test flow with prebuilt image (#37447) wenjun liu 2026-03-27 09:16:04 +08:00
  • f73bcb1c51 Various Transformers v5 config fixes (#38247) Harry Mellor 2026-03-26 23:06:59 +00:00
  • f0b888ffd3 tests: de-duplicate online scoring API format checks cursor/test-quality-improvements-eeea Cursor Agent 2026-03-26 22:54:25 +00:00
  • 200bef28c9 Update vllm/v1/worker/gpu/kv_connector.py wentao-skip-work-when-empty Wentao Ye 2026-03-26 18:04:12 -04:00
  • fd9820bbf9 skip kv connector empty work yewentao256 2026-03-26 21:50:28 +00:00
  • 28048bd6b0 [Bugfix] Add missing f-string prefix in xgrammar choices error message (#38162) yzong-rh 2026-03-26 17:43:03 -04:00
  • 5e87f99a09 fix khluu/mig khluu 2026-03-26 13:56:27 -07:00
  • c32e97602d [Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045) Giancarlo Delfin 2026-03-26 13:38:12 -07:00
  • 0904b6550d Fix multi-node allreduce fusion (#38136) Wei Zhao 2026-03-26 16:24:36 -04:00
  • 971b9f5595 merge khluu 2026-03-26 12:51:48 -07:00
  • f26fcdfb9e [Bugfix][ROCm] Fix lru_cache on paged_mqa_logits_module (#37547) Stig-Arne Grönroos 2026-03-26 21:01:05 +02:00
  • bc9c6fbbe6 [ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline (#38263) TJian 2026-03-27 02:47:10 +08:00
  • bff9a1c266 [ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers (#38165) Andreas Karatzas 2026-03-26 13:33:45 -05:00
  • db01535e2b [ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile (#37930) Andreas Karatzas 2026-03-26 12:44:01 -05:00
  • a4cf9b22ba [ROCM][Bugfix] Use correct stride in cp_mha_gather_cache_kernel for hybrid model (#37228) (#37228) jennyyyyzhen 2026-03-26 10:33:39 -07:00
  • 72f577bb39 remove requires_token_ids_cpu wentao-remove-redundant-prompt-copy yewentao256 2026-03-26 16:55:23 +00:00
  • 9c3ae04bfe [ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155) Andreas Karatzas 2026-03-26 11:51:18 -05:00
  • a8e48a7b85 [CI] Fix conch kernel crash on 3D input by reshaping to 2D before GEMM (#38178) Andreas Karatzas 2026-03-26 11:46:03 -05:00
  • b9dbc5c4ab [Mamba][APC] Add test case to compare apc outputs (#34977) Divakar Verma 2026-03-26 12:40:35 -04:00
  • 60af7b967b [Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm (#37283) TJian 2026-03-27 00:32:25 +08:00
  • bdc1719eb9 [ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137) Andreas Karatzas 2026-03-26 11:26:46 -05:00
  • 257c0c5b50 add pdl sm103 Roger Wang 2026-03-26 16:25:35 +00:00
  • 0aac2048bf [Bugfix] Restore CUDA graph persistent buffers for FP8 FlashMLA decode (#35175) haosdent 2026-03-27 00:13:39 +08:00
  • cb2263218e [Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886) Chuan (Richard) Li 2026-03-26 08:59:24 -07:00
  • e054f152fa [CI] Add batch invariant test for b200 (#38014) Wentao Ye 2026-03-26 11:54:54 -04:00
  • 0f5b526040 [Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232) zhang-prog 2026-03-26 23:34:49 +08:00
  • be1a85b7a2 Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169) Zhewen Li 2026-03-26 07:59:09 -07:00
  • 08709ef27b temp mypy fix luka/vllm-ir/rms-norm Luka Govedic 2026-03-26 12:52:53 +00:00
  • 244f7ce442 Rename direct_dispatch to enable_torch_wrap Luka Govedic 2026-03-25 21:36:43 +00:00
  • 4b7d84977a Fix no-grad issue Luka Govedic 2026-03-25 21:30:45 +00:00
  • a0bc250e66 Add vLLM IR tests to CI Luka Govedic 2026-03-25 19:07:45 +00:00
  • eaa3062851 Fix format, mypy still failing Luka Govedic 2026-03-26 12:52:41 +00:00
  • 2e225f7bd2 [Renderer] Consolidate factory methods (#38218) Cyrus Leung 2026-03-26 20:19:22 +08:00
  • 757eafcf37 [bug-fix] GLM OCR Patch Merger context_dim (#37962) Jared Wen 2026-03-26 20:11:21 +08:00
  • dcdc145893 [CI] Reorganize scoring tests (#38207) wang.yuqi 2026-03-26 20:07:01 +08:00
  • f2d16207c7 [ROCm][CI] Fix flaky GPTQ compile correctness test (#38161) Andreas Karatzas 2026-03-26 06:57:00 -05:00
  • 37a83007fe [ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167) Andreas Karatzas 2026-03-26 06:54:59 -05:00
  • 9fdc0f3aeb merge releases/v0.18.1 khluu 2026-03-26 02:17:52 -07:00
  • bf5eec638d [Refactor] Remove unused utils (#38153) Wentao Ye 2026-03-26 05:08:19 -04:00
  • b1cb1d3d2c DOC: Documentation pages fixes (#38125) Mateusz Sokół 2026-03-26 09:55:42 +01:00
  • 6ae8bbd0c2 [XPU] Disable xpu graph by default (#38193) Kunshang Ji 2026-03-26 16:53:45 +08:00
  • a9213c0ffe [Doc] Fix outdated reference to CUDAGraphManager (#38209) Cyrus Leung 2026-03-26 16:52:38 +08:00
  • 502c41a8f6 [Model] Use helper function to run MM processors with token inputs (where applicable) (#38018) Cyrus Leung 2026-03-26 16:44:04 +08:00
  • 05d96d7991 merge Vadim Gimpelson 2026-03-26 12:21:47 +04:00
  • 52069012fe [Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083) Vadim Gimpelson 2026-03-26 12:21:47 +04:00
  • e9f3b99061 support MIG khluu 2026-03-26 01:08:11 -07:00
  • 71161e8b63 [cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691) Fadi Arafeh 2026-03-26 07:03:31 +00:00
  • 38de822310 [Model] Add torch.compile support for InternVL vision encoder (#38049) Terry Gao 2026-03-25 23:52:29 -07:00
  • 2bfbdca23c [Bugfix] Fix benchmark_fused_collective.py (#38082) Jee Jee Li 2026-03-26 14:51:00 +08:00
  • 9a86a53a3a Merge branch 'main' into integrate-deepgemm-cmake integrate-deepgemm-cmake Michael Goin 2026-03-26 06:22:12 +01:00
  • a9bf5d228b Merge branch 'main' into vadim/qwen35-no-deppgemm vadim/qwen35-no-deppgemm Roger Wang 2026-03-25 22:14:43 -07:00
  • 2908094567 Add /v1/chat/completions/batch endpoint for batched chat completions (#38011) Matej Rojec 2026-03-26 05:13:33 +01:00
  • e6bf9f15ec [Bugfix][CI] Fix Marlin FP8 Linear Kernel for Compressed Tensors Format (#38092) BadrBasowid 2026-03-26 12:11:43 +08:00
  • 144030c84e Relocate Encoder CUDA graph manager (#38116) Woosuk Kwon 2026-03-25 20:52:12 -07:00
  • e9855c5c19 add Roger Wang 2026-03-26 03:21:46 +00:00
  • e2db2b4234 [Tool Parser][1/3] Pass tools to ToolParser constructor (#38029) Flora Feng 2026-03-25 22:29:06 -04:00
  • 87f05d6880 [Revert] Remove DeepGEMM availability check in DeepseekV32IndexerMetadataBuilder (#38076) Chauncey 2026-03-26 09:43:51 +08:00
  • 36f6aede23 [Misc] Optimized check to encapsulate both CUDA and ROCm platforms (#34549) Andreas Karatzas 2026-03-25 20:43:07 -05:00
  • 9704a5c310 Disable dual stream execution of input projection for Qwen3 (#38152) Xin Yang 2026-03-25 18:20:39 -07:00
  • 74056039b7 Fix minimax m2.5 nvfp4 kv scales weight loading (#37214) Wei Zhao 2026-03-25 20:48:06 -04:00
  • d7d51a7ee5 [Bugfix] Fix Qwen3.5-FP8 Weight Loading Error on TPU (#37348) Jacob Platin 2026-03-25 17:46:01 -07:00
  • 3c3c084240 Various Transformers v5 fixes (#38127) Harry Mellor 2026-03-26 00:10:08 +00:00
  • 58b0c78a42 [MRV2] Support expert index capture woosuk/mrv2-expert-indices Woosuk Kwon 2026-03-25 23:48:32 +00:00
  • 7b54f60db0 [Cohere] Enable Cohere-Transcribe (#38120) Ekagra Ranjan 2026-03-25 19:13:51 -04:00
  • 9116a55322 adjust expected resultand tol Vadim Gimpelson 2026-03-26 02:33:49 +04:00
  • 6cf77e4c69 Add sp_min_token_num=0 to E2E correctness tests for SP and AsyncTP copilot/add-sp-min-token-to-e2e-tests copilot-swe-agent[bot] 2026-03-25 21:52:38 +00:00
  • ebaa11ade8 Initial plan copilot-swe-agent[bot] 2026-03-25 21:49:53 +00:00
  • a0e8c74005 [ROCm]: Update rope+kvcache fusion conditions and disable custom op by default (#36716) Rohan Potdar 2026-03-25 15:58:44 -05:00
  • 70a2152830 [MultiModal] add support for numpy array embeddings (#38119) Guillaume Guy 2026-03-25 15:13:04 -05:00
  • 23d0a6db0d optimize redundant copy yewentao256 2026-03-25 19:50:46 +00:00
  • 1c794bf748 Merge branch 'main' into wentao-fix-qwen3.5-batch-invariant wentao-fix-qwen3.5-batch-invariant Wentao Ye 2026-03-25 15:03:31 -04:00
  • 0ead2311b1 update yewentao256 2026-03-25 19:03:04 +00:00
  • efc123ed74 Merge branch 'main' into luka/vllm-ir/rms-norm Luka Govedic 2026-03-25 19:02:22 +00:00
  • 978fc18bf0 [ROCm] Utilize persistent MLA kernel from AITER (#36574) Sathish Sanjeevi 2026-03-25 12:00:42 -07:00
  • 7d6917bef5 [ROCm] Fix MoE kernel test failures on gfx950 (#37833) Andreas Karatzas 2026-03-25 13:46:40 -05:00
  • e38817fadb [Core][KV Connector] Remove use of num_cached_tokens in error handling (#38096) Mark McLoughlin 2026-03-25 18:20:48 +00:00
  • 72cad44d3c [Frontend] Move APIServerProcessManager target server fn (#38115) Nick Hill 2026-03-25 11:14:41 -07:00
  • ba2f0acc2d [Misc] Reorganize inputs (#35182) Cyrus Leung 2026-03-26 01:22:54 +08:00
  • 678b3c99e8 [MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050) Yongye Zhu 2026-03-25 13:16:40 -04:00
  • bf4cc9ed2d [2/n] Migrate per_token_group_quant to torch stable ABI (#36058) mikaylagawarecki 2026-03-25 13:15:13 -04:00
  • 1ac2ef2e53 [CI/Docs] Improve aarch64/DGX Spark support for dev setup (#38057) Ben Browning 2026-03-25 12:24:42 -04:00
  • 6e37c46b35 [compile] Add some more startup tests for top models (#38046) Richard Zou 2026-03-25 12:02:22 -04:00
  • 1bf2ddd0ee [Refactor] Rename WAITING_FOR_FSM to WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR (#38048) Wentao Ye 2026-03-25 11:41:44 -04:00