feat: add support for KV cache quantization options (#1307)
* add KV cache quantization options
  https://github.com/abetlen/llama-cpp-python/discussions/1220
  https://github.com/abetlen/llama-cpp-python/issues/1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Limour committed
f165048a69b2ad5b70d4fd2ec4adebfb411b5b47
Parent: aa9f1ae
Committed by GitHub <noreply@github.com>
on 4/1/2024, 2:19:28 PM
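The change above replaces string-based quantization options with `ggml_type` enum values for the K and V caches. A minimal sketch of how such a string-to-`ggml_type` mapping might look is below; the `GGMLType` class and `kv_cache_type_from_str` helper are hypothetical illustrations (not the library's actual API), though the numeric values mirror the `ggml_type` enum in ggml.h.

```python
from enum import IntEnum


class GGMLType(IntEnum):
    # Values mirror the ggml_type enum in ggml.h (hypothetical helper class).
    F32 = 0
    F16 = 1
    Q4_0 = 2
    Q8_0 = 8


def kv_cache_type_from_str(name: str) -> GGMLType:
    """Resolve a server/CLI string option to a ggml_type value (illustrative)."""
    try:
        return GGMLType[name.upper()]
    except KeyError:
        raise ValueError(f"unsupported KV cache type: {name}")


# A server flag pair such as cache-type-k / cache-type-v could be resolved like:
type_k = kv_cache_type_from_str("q8_0")
type_v = kv_cache_type_from_str("f16")
```

Passing enum integers instead of raw strings lets the Python layer hand the values directly to the underlying C context parameters without per-call parsing.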