Add run_inference_server.py for Running llama.cpp Built-in Server (#204)
* Update CMakeLists.txt

  Added a CMake option to compile the llama.cpp server. This update allows us to easily build and deploy the server using BitNet.

* Create run_inference_server.py

  Same as run_inference, but for use with llama.cpp's built-in server, for some extra convenience. In particular:
  - The build directory is determined based on whether the system is running on Windows or not.
  - A list of arguments (`--model`, `-m`, etc.) is created.
  - The main argument list is parsed and passed to `subprocess.run()` to execute the system command.
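The steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of run_inference_server.py: the binary name (`llama-server`), the build-output subdirectories, and the model filename are assumptions for the example.

```python
import os
import platform
import subprocess

def build_command(model: str, host: str = "127.0.0.1", port: int = 8080) -> list:
    """Assemble the argument list for the llama.cpp built-in server (sketch)."""
    # The build directory layout differs between Windows and other systems,
    # mirroring the platform check described in the commit message.
    build_dir = "build"
    if platform.system() == "Windows":
        server_path = os.path.join(build_dir, "bin", "Release", "llama-server.exe")
    else:
        server_path = os.path.join(build_dir, "bin", "llama-server")

    # Argument list (`-m` is the short form of `--model`).
    return [
        server_path,
        "-m", model,
        "--host", host,
        "--port", str(port),
    ]

def run_server(model: str, **kwargs) -> None:
    """Pass the assembled argument list to subprocess.run() to launch the server."""
    subprocess.run(build_command(model, **kwargs), check=True)
```

A caller would then invoke something like `run_server("models/model.gguf", port=8080)` and reach the server's HTTP API on the chosen host and port.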
Benjamin Wegener committed
1792346223ac13d4e26b65e3d0397b60d6ed3746
Parent: c17d1c5
Committed by GitHub <noreply@github.com>
on 5/8/2025, 8:22:12 AM