A high-throughput and memory-efficient inference and serving engine for LLMs