feat: add support for KV cache quantization options (#1307)
* add KV cache quantization options
  https://github.com/abetlen/llama-cpp-python/discussions/1220
  https://github.com/abetlen/llama-cpp-python/issues/1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
Limour committed
f165048a69b2ad5b70d4fd2ec4adebfb411b5b47
Parent: aa9f1ae
Committed by GitHub <noreply@github.com>
on 4/1/2024, 2:19:28 PM
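The change above replaces string-based quantization options with `ggml_type` enum values for the K and V caches. A minimal sketch of how such a string-to-`ggml_type` mapping might look is below; the `GGMLType` class and `kv_cache_type_from_str` helper are hypothetical illustrations (not the library's actual API), though the numeric values mirror the `ggml_type` enum in ggml.h.

```python
from enum import IntEnum


class GGMLType(IntEnum):
    # Values mirror the ggml_type enum in ggml.h (hypothetical helper class).
    F32 = 0
    F16 = 1
    Q4_0 = 2
    Q8_0 = 8


def kv_cache_type_from_str(name: str) -> GGMLType:
    """Resolve a server/CLI string option to a ggml_type value (illustrative)."""
    try:
        return GGMLType[name.upper()]
    except KeyError:
        raise ValueError(f"unsupported KV cache type: {name}")


# A server flag pair such as cache-type-k / cache-type-v could be resolved like:
type_k = kv_cache_type_from_str("q8_0")
type_v = kv_cache_type_from_str("f16")
```

Passing enum integers instead of raw strings lets the Python layer hand the values directly to the underlying C context parameters without per-call parsing.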