vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python
Path: vllm/benchmarks (branch: main)
Directories:
  attention_benchmarks/
  auto_tune/
  cutlass_benchmarks/
  disagg_benchmarks/
  fused_kernels/
  kernels/
  multi_turn/
  overheads/
  structured_schemas/

Files:
  backend_request_func.py                     24.5 KB
  benchmark_batch_invariance.py               13.1 KB
  benchmark_block_pool.py                      2.2 KB
  benchmark_hash.py                            3.5 KB
  benchmark_latency.py                          489 B
  benchmark_long_document_qa_throughput.py     6.3 KB
  benchmark_ngram_proposer.py                  6.4 KB
  benchmark_prefix_block_hash.py               3.2 KB
  benchmark_prefix_caching.py                 10.1 KB
  benchmark_prioritization.py                  6.5 KB
  benchmark_serving_structured_output.py      35.8 KB
  benchmark_serving.py                          483 B
  benchmark_throughput.py                       498 B
  benchmark_topk_topp.py                      15.0 KB
  benchmark_utils.py                           1.5 KB
  README.md                                    1.0 KB
  run_structured_output_benchmark.sh           3.7 KB
  sonnet.txt                                  22.2 KB