Migrate inference to llama_batch and llama_decode api (#795)
* Add low-level batching notebook
* fix: tokenization of special characters (#850): it should behave like llama.cpp, where most out-of-the-box usages treat special characters accordingly
* Update CHANGELOG
* Cleanup
* Fix runner label
* Update notebook
* Use llama_decode and batch api
* Support logits_all parameter

---------

Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
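For context, here is a minimal sketch of the decode path this commit migrates to, using the low-level `llama_cpp` bindings that mirror the llama.cpp C API (`llama_batch_init`, `llama_decode`, `llama_get_logits_ith`). The model path, prompt, and token buffer size are placeholders, and exact binding signatures vary between versions; this illustrates the batch/decode flow, not the commit's actual implementation:

```python
import llama_cpp

# Backend/model setup (defaults throughout; "model.gguf" is a placeholder path).
llama_cpp.llama_backend_init(False)
model = llama_cpp.llama_load_model_from_file(
    b"model.gguf", llama_cpp.llama_model_default_params()
)
ctx = llama_cpp.llama_new_context_with_model(
    model, llama_cpp.llama_context_default_params()
)

# Tokenize with special=True so special tokens are parsed the way llama.cpp
# does out of the box (the #850 fix above).
prompt = b"Hello, world"
buf = (llama_cpp.llama_token * 64)()
n_tokens = llama_cpp.llama_tokenize(model, prompt, len(prompt), buf, 64, True, True)

# Fill a single-sequence batch; request logits only for the last position.
# Setting every batch.logits[i] = True instead is what logits_all enables.
batch = llama_cpp.llama_batch_init(n_tokens, 0, 1)
batch.n_tokens = n_tokens
for i in range(n_tokens):
    batch.token[i] = buf[i]
    batch.pos[i] = i
    batch.n_seq_id[i] = 1
    batch.seq_id[i][0] = 0
    batch.logits[i] = False
batch.logits[n_tokens - 1] = True

# llama_decode replaces the older llama_eval-based inference path.
if llama_cpp.llama_decode(ctx, batch) != 0:
    raise RuntimeError("llama_decode failed")

# Pointer to the n_vocab logits for the last position.
logits = llama_cpp.llama_get_logits_ith(ctx, n_tokens - 1)

llama_cpp.llama_batch_free(batch)
llama_cpp.llama_free(ctx)
llama_cpp.llama_free_model(model)
llama_cpp.llama_backend_free()
```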
Andrei committed ab028cb878412400b41b68b68460adeea6cb6fd8
Parent: f436e0c
Committed by GitHub <noreply@github.com> on 11/3/2023, 12:13:57 AM