# 🦙 Python Bindings for `llama.cpp`
[Documentation](https://abetlen.github.io/llama-cpp-python)
[Tests](https://github.com/abetlen/llama-cpp-python/actions/workflows/test.yaml)
[PyPI](https://pypi.org/project/llama-cpp-python/)
Simple Python bindings for **@ggerganov's** [`llama.cpp`](https://github.com/ggerganov/llama.cpp) library.
This package provides:
- Low-level access to the C API via a `ctypes` interface
- High-level Python API for text completion
- OpenAI-like API
- LangChain compatibility (see the sketch below)
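As a quick illustration of the LangChain side, the `LlamaCpp` wrapper from `langchain.llms` can drive these bindings directly. A minimal sketch, assuming `langchain` is installed and using a placeholder model path:

```python
from langchain.llms import LlamaCpp

# The model path is a placeholder; point it at any llama.cpp-compatible model.
llm = LlamaCpp(model_path="./models/7B/ggml-model.bin")
print(llm("Q: Name the planets in the solar system? A: "))
```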
Documentation is available at [https://abetlen.github.io/llama-cpp-python](https://abetlen.github.io/llama-cpp-python).
Detailed macOS Metal GPU install documentation is available at [docs/macos_install.md](docs/macos_install.md).
## Installation from PyPI (recommended)
Install from PyPI (requires a C compiler):
```bash
pip install llama-cpp-python
```
The above command will attempt to install the package and build `llama.cpp` from source.
This is the recommended installation method as it ensures that `llama.cpp` is built with the optimizations available on your system.
If you have previously installed `llama-cpp-python` through pip and want to upgrade your version or rebuild the package with different compiler options, please add the following flags to ensure that the package is rebuilt correctly:
```bash
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
```
Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports the arm64 architecture. For example:
```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```
Otherwise, the installation will build the x86 version of `llama.cpp`, which will be 10x slower on Apple Silicon (M1) Macs.
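To confirm that your interpreter itself targets arm64 before building, you can check the machine architecture Python reports:

```python
import platform

# Prints "arm64" on a native Apple Silicon build of Python; "x86_64" means
# the interpreter is running under Rosetta and the compiled llama.cpp
# library will target x86 as well.
print(platform.machine())
```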
### Installation with OpenBLAS / cuBLAS / CLBlast / Metal
`llama.cpp` supports multiple BLAS backends for faster processing.
Use the `FORCE_CMAKE=1` environment variable to force the use of `cmake` and install the pip package for the desired BLAS backend.
To install with OpenBLAS, pass the `-DLLAMA_OPENBLAS=on` CMake argument before installing:
```bash
CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```
To install with cuBLAS, pass the `-DLLAMA_CUBLAS=on` CMake argument before installing:
```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python
```
To install with CLBlast, pass the `-DLLAMA_CLBLAST=on` CMake argument before installing:
```bash
CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python
```
To install with Metal (MPS), pass the `-DLLAMA_METAL=on` CMake argument before installing:
```bash
CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install llama-cpp-python
```
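Building against cuBLAS or Metal enables GPU support, but layers are only offloaded when you ask for them at load time. A minimal sketch, assuming your installed version supports the `n_gpu_layers` parameter and using a placeholder model path:

```python
from llama_cpp import Llama

# Offload up to 32 transformer layers to the GPU; lower this number if
# the model does not fit in GPU memory.
llm = Llama(model_path="./models/7B/ggml-model.bin", n_gpu_layers=32)
```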
## High-level API
The high-level API provides a simple managed interface through the `Llama` class.
Below is a short example demonstrating how to use the high-level API to generate text:
```python
>>> from llama_cpp import Llama
>>> llm = Llama(model_path="./models/7B/ggml-model.bin")
>>> output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
>>> print(output)
{
  "id": "cmpl-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "object": "text_completion",
  "created": 1679561337,
  "model": "./models/7B/ggml-model.bin",
  "choices": [
    {
      "text": "Q: Name the planets in the solar system? A: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune and Pluto.",
      "index": 0,
      "logprobs": None,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 28,
    "total_tokens": 42
  }
}
```
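The same call can stream tokens as they are generated by passing `stream=True`, which turns the result into an iterator of OpenAI-style completion chunks. A short sketch, reusing the `llm` object from the example above:

```python
>>> for chunk in llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stream=True):
...     print(chunk["choices"][0]["text"], end="", flush=True)
```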
## Web Server
`llama-cpp-python` offers a web server which aims to act as a drop-in replacement for the OpenAI API.
This allows you to use `llama.cpp`-compatible models with any OpenAI-compatible client (language libraries, services, etc.).
To install the server package and get started:
```bash
pip install llama-cpp-python[server]
python3 -m llama_cpp.server --model models/7B/ggml-model.bin
```
Navigate to [http://localhost:8000/docs](http://localhost:8000/docs) to see the OpenAPI documentation.
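Since the server mimics the OpenAI REST interface, existing OpenAI client libraries can talk to it by overriding the base URL. A minimal sketch using the pre-1.0 `openai` Python package; the API key is a dummy value since the local server does not check it:

```python
import openai

openai.api_base = "http://localhost:8000/v1"  # point the client at the local server
openai.api_key = "sk-no-key-required"         # dummy; the local server ignores it

response = openai.Completion.create(
    model="local",  # effectively ignored by a single-model server
    prompt="Q: Name the planets in the solar system? A: ",
    max_tokens=32,
)
print(response["choices"][0]["text"])
```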
## Docker image
A Docker image is available on [GHCR](https://ghcr.io/abetlen/llama-cpp-python). To run the server:
```bash
docker run --rm -it -p 8000:8000 -v /path/to/models:/models -e MODEL=/models/ggml-model-name.bin ghcr.io/abetlen/llama-cpp-python:latest
```
## Low-level API
The low-level API is a direct [`ctypes`](https://docs.python.org/3/library/ctypes.html) binding to the C API provided by `llama.cpp`.
The entire low-level API can be found in [llama_cpp/llama_cpp.py](https://github.com/abetlen/llama-cpp-python/blob/master/llama_cpp/llama_cpp.py) and directly mirrors the C API in [llama.h](https://github.com/ggerganov/llama.cpp/blob/master/llama.h).
Below is a short example demonstrating how to use the low-level API to tokenize a prompt:
```python
>>> import llama_cpp
>>> import ctypes
>>> params = llama_cpp.llama_context_default_params()
# use bytes for char * params
>>> ctx = llama_cpp.llama_init_from_file(b"./models/7b/ggml-model.bin", params)
>>> max_tokens = params.n_ctx
# use ctypes arrays for array params
>>> tokens = (llama_cpp.llama_token * int(max_tokens))()
>>> n_tokens = llama_cpp.llama_tokenize(ctx, b"Q: Name the planets in the solar system? A: ", tokens, max_tokens, add_bos=llama_cpp.c_bool(True))
>>> llama_cpp.llama_free(ctx)
```
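To inspect what the tokenizer produced, each token id can be mapped back to its text piece while the context is still alive (i.e. before the `llama_free` call above). A short sketch, assuming your version exposes `llama_token_to_str`, which mirrors the C function of the same name and returns `bytes`:

```python
# Run this before llama_free(ctx): the context is needed to look up the vocabulary.
>>> for i in range(n_tokens):
...     print(llama_cpp.llama_token_to_str(ctx, tokens[i]).decode("utf-8"), end="")
```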
Check out the [examples folder](examples/low_level_api) for more examples of using the low-level API.
# Documentation
Documentation is available at [https://abetlen.github.io/llama-cpp-python](https://abetlen.github.io/llama-cpp-python).
If you find any issues with the documentation, please open an issue or submit a PR.
# Development
This package is under active development and I welcome any contributions.
To get started, clone the repository and install the package in development mode:
```bash
git clone --recurse-submodules git@github.com:abetlen/llama-cpp-python.git
cd llama-cpp-python

# Install with pip
pip install -e .

# if you want to use the fastapi / openapi server
pip install -e .[server]

# If you're a poetry user, installing will also include a virtual environment
poetry install --all-extras
. .venv/bin/activate

# Will need to be re-run any time vendor/llama.cpp is updated
python3 setup.py develop
```
# How does this compare to other Python bindings of `llama.cpp`?
I originally wrote this package for my own use with two goals in mind:
- Provide a simple process to install `llama.cpp` and access the full C API in `llama.h` from Python
- Provide a high-level Python API that can be used as a drop-in replacement for the OpenAI API so existing apps can be easily ported to use `llama.cpp`
Any contributions and changes to this package will be made with these goals in mind.
# License
This project is licensed under the terms of the MIT license.