💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
COMMITS
/ bindings/python/scripts/convert.py
April 17, 2024
Fixing doc. (#1499)
Nicolas Patry committed
March 12, 2024
[`remove black`] And use ruff (#1436)
Arthur committed
May 20, 2021
Fix SPM conversions (#686)
Lysandre Debut committed
April 21, 2021
Revert "Fix SPM conversions"
Lysandre committed
Fix SPM conversions
Lysandre committed
February 3, 2021
Fix SentencePiece tokenizers conversion
Anthony MOI committed
September 24, 2020
Removed now wrong code in `convert.py`, fixed strange black magic.
Nicolas Patry committed
September 23, 2020
Addressing first pass of comments.
Nicolas Patry committed
September 22, 2020
Going back for `not` fuse_unk by default for BPE, but add a flag to
Nicolas Patry committed
September 18, 2020
Updating convert scripts with Replace normalizer.
Nicolas Patry committed
Fixing convert/check scripts.
Nicolas Patry committed
September 17, 2020
Moving StripAccents within normalizer for Albert +XLNet, but now crash
Nicolas Patry committed
Making convert script machine agnostic.
Nicolas Patry committed
Adding a new convert script, that will convert all python Tokenizer code
Nicolas Patry committed