A high-throughput and memory-efficient inference and serving engine for LLMs
Repository: https://code.morphllm.com/vllm-project/vllm.git

Topics: amd, blackwell, cuda, deepseek, deepseek-v3, gpt, gpt-oss, inference, kimi, llama, llm, llm-serving, model-serving, moe, openai, pytorch, qwen, qwen3, tpu, transformer
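As an illustration of what "inference and serving engine" means in practice, below is a minimal sketch of offline batch inference using vLLM's Python API. The model name is a placeholder and the exact parameters shown are assumptions for illustration, not taken from this description.

```python
# Minimal offline-inference sketch with vLLM (model name is a placeholder).
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]

# Sampling settings for generation; values here are illustrative.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model and run batched generation.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For online serving, the project also ships an OpenAI-compatible HTTP server, typically started with `vllm serve <model>` and queried with standard chat/completions requests.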