vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python
Path: vllm/benchmarks (branch: main)
Directories:
  attention_benchmarks/
  auto_tune/
  cutlass_benchmarks/
  disagg_benchmarks/
  fused_kernels/
  kernels/
  multi_turn/
  overheads/
  structured_schemas/

Files:
  backend_request_func.py                     24.5 KB
  benchmark_batch_invariance.py               13.1 KB
  benchmark_block_pool.py                      2.2 KB
  benchmark_hash.py                            3.5 KB
  benchmark_latency.py                          489 B
  benchmark_long_document_qa_throughput.py     6.3 KB
  benchmark_ngram_proposer.py                  6.4 KB
  benchmark_prefix_block_hash.py               3.2 KB
  benchmark_prefix_caching.py                 10.1 KB
  benchmark_prioritization.py                  6.5 KB
  benchmark_serving_structured_output.py      35.8 KB
  benchmark_serving.py                          483 B
  benchmark_throughput.py                       498 B
  benchmark_topk_topp.py                      15.0 KB
  benchmark_utils.py                           1.5 KB
  README.md                                    1.0 KB
  run_structured_output_benchmark.sh           3.7 KB
  sonnet.txt                                  22.2 KB