vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
vllm / csrc (branch: main)
attention/
core/
cpu/
cutlass_extensions/
libtorch_stable/
mamba/
moe/
quantization/
quickreduce/
rocm/
activation_kernels.cu        26.7 KB
cache_kernels_fused.cu       11.0 KB
cache_kernels.cu             61.3 KB
cache.h                       4.2 KB
concat_mla_q.cuh              2.2 KB
cub_helpers.h                  446 B
cuda_compat.h                 1.9 KB
cuda_utils_kernels.cu         1008 B
cuda_utils.h                  1.4 KB
cuda_vec_utils.cuh           10.8 KB
cuda_view.cu                  2.1 KB
cumem_allocator_compat.h      3.7 KB
cumem_allocator.cpp          25.6 KB
custom_all_reduce_test.cu    13.0 KB
custom_all_reduce.cu          7.0 KB
custom_all_reduce.cuh        22.3 KB
custom_quickreduce.cu         4.8 KB
dispatch_utils.h              7.7 KB
dsv3_fused_a_gemm.cu         27.0 KB
fused_qknorm_rope_kernel.cu  17.0 KB
launch_bounds_utils.h         2.2 KB
layernorm_kernels.cu         12.0 KB
layernorm_quant_kernels.cu   11.7 KB
ops.h                        18.0 KB
pos_encoding_kernels.cu       7.7 KB
sampler.cu                   26.9 KB
topk.cu                      13.4 KB
torch_bindings.cpp           33.9 KB
type_convert.cuh              6.0 KB