Mirror of https://github.com/huggingface/tokenizers.git (synced 2026-03-27 06:01:18 +00:00)
* Add a benchmark for deserializing a large added vocab (see the sketch after this list)
* Revert unrelated changes; isolate the relevant ones
* Try to normalize only once
* Small improvement
* Some updates
* Nit
* fmt
* Avoid normalized-string overhead when all we want is to add tokens to the vocab
* More attempts
* Works
* Reach parity
* Update
* Revert changes that are not actually needed
* Add a Python test
* Apply the normalizer beforehand
* Nit
* Update to a more concrete use case
* Fix build
* Style
* Reduce sample size
* `--allow unmaintained`
* Make clippy happy
* Up
* Up
* Derive impl
* Revert unrelated changes
* fmt
* Ignore
* Remove a stray file
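The benchmark in the first item can be approximated from Python through the `tokenizers` bindings. Below is a minimal sketch, not the PR's actual benchmark: the empty `BPE` model, the 100,000-token count, and the `tok_{i}` naming are arbitrary choices for illustration. It serializes a tokenizer with a large added vocabulary and times how long deserialization takes, which is the path this PR speeds up.

```python
import time

from tokenizers import Tokenizer
from tokenizers.models import BPE

# Build a tokenizer with an empty model and a large added vocabulary.
tokenizer = Tokenizer(BPE())
tokenizer.add_tokens([f"tok_{i}" for i in range(100_000)])

# Serialize to JSON, then time deserialization of the added vocab.
serialized = tokenizer.to_str()

start = time.perf_counter()
Tokenizer.from_str(serialized)
print(f"deserialized in {time.perf_counter() - start:.3f}s")
```

Running this before and after the change is a quick way to check the "parity" claim in the commit list: deserialization time should drop once added tokens are no longer re-normalized one by one.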