vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
vllm / csrc (branch: main)
attention/
core/
cpu/
cutlass_extensions/
libtorch_stable/
mamba/
moe/
quantization/
quickreduce/
rocm/
activation_kernels.cu        26.7 KB
cache_kernels_fused.cu       11.0 KB
cache_kernels.cu             61.3 KB
cache.h                       4.2 KB
concat_mla_q.cuh              2.2 KB
cub_helpers.h                  446 B
cuda_compat.h                 1.9 KB
cuda_utils_kernels.cu         1008 B
cuda_utils.h                  1.4 KB
cuda_vec_utils.cuh           10.8 KB
cuda_view.cu                  2.1 KB
cumem_allocator_compat.h      3.7 KB
cumem_allocator.cpp          25.6 KB
custom_all_reduce_test.cu    13.0 KB
custom_all_reduce.cu          7.0 KB
custom_all_reduce.cuh        22.3 KB
custom_quickreduce.cu         4.8 KB
dispatch_utils.h              7.7 KB
dsv3_fused_a_gemm.cu         27.0 KB
fused_qknorm_rope_kernel.cu  17.0 KB
launch_bounds_utils.h         2.2 KB
layernorm_kernels.cu         12.0 KB
layernorm_quant_kernels.cu   11.7 KB
ops.h                        18.0 KB
pos_encoding_kernels.cu       7.7 KB
sampler.cu                   26.9 KB
topk.cu                      13.4 KB
torch_bindings.cpp           33.9 KB
type_convert.cuh              6.0 KB