Migrate inference to llama_batch and llama_decode api (#795)
* Add low-level batching notebook
* fix: tokenization of special characters (#850): it should behave like llama.cpp, where most out-of-the-box usages treat special characters accordingly
* Update CHANGELOG
* Cleanup
* Fix runner label
* Update notebook
* Use llama_decode and batch api
* Support logits_all parameter

---------

Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
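For context, here is a minimal sketch of the decode path this commit migrates to, using the low-level `llama_cpp` bindings that mirror the llama.cpp C API (`llama_batch_init`, `llama_decode`, `llama_get_logits_ith`). The model path, prompt, and token buffer size are placeholders, and exact binding signatures vary between versions; this illustrates the batch/decode flow, not the commit's actual implementation:

```python
import llama_cpp

# Backend/model setup (defaults throughout; "model.gguf" is a placeholder path).
llama_cpp.llama_backend_init(False)
model = llama_cpp.llama_load_model_from_file(
    b"model.gguf", llama_cpp.llama_model_default_params()
)
ctx = llama_cpp.llama_new_context_with_model(
    model, llama_cpp.llama_context_default_params()
)

# Tokenize with special=True so special tokens are parsed the way llama.cpp
# does out of the box (the #850 fix above).
prompt = b"Hello, world"
buf = (llama_cpp.llama_token * 64)()
n_tokens = llama_cpp.llama_tokenize(model, prompt, len(prompt), buf, 64, True, True)

# Fill a single-sequence batch; request logits only for the last position.
# Setting every batch.logits[i] = True instead is what logits_all enables.
batch = llama_cpp.llama_batch_init(n_tokens, 0, 1)
batch.n_tokens = n_tokens
for i in range(n_tokens):
    batch.token[i] = buf[i]
    batch.pos[i] = i
    batch.n_seq_id[i] = 1
    batch.seq_id[i][0] = 0
    batch.logits[i] = False
batch.logits[n_tokens - 1] = True

# llama_decode replaces the older llama_eval-based inference path.
if llama_cpp.llama_decode(ctx, batch) != 0:
    raise RuntimeError("llama_decode failed")

# Pointer to the n_vocab logits for the last position.
logits = llama_cpp.llama_get_logits_ith(ctx, n_tokens - 1)

llama_cpp.llama_batch_free(batch)
llama_cpp.llama_free(ctx)
llama_cpp.llama_free_model(model)
llama_cpp.llama_backend_free()
```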
Andrei committed ab028cb878412400b41b68b68460adeea6cb6fd8
Parent: f436e0c
Committed by GitHub <noreply@github.com> on 11/3/2023, 12:13:57 AM