Commit Graph

  • 2208ba2a2f feat(tokenizer): add early exit for left truncation feat/early_exit_right_truncation Luc Georges 2026-03-26 19:45:57 +01:00
  • a7fb4e2867 refactor(tokenizer): typo in tokenize_with_limit fn name Luc Georges 2026-03-26 19:14:44 +01:00
  • ba3b7bcbde fix(tokenizer): skip early exit for OnlySecond strategy Luc Georges 2026-03-26 19:05:33 +01:00
  • 45f431ee8c feat(tokenizer): add early exit for right truncation Luc Georges 2026-03-26 18:43:47 +01:00
  • d1ac6e9654 Final doc fix (#1989) main Arthur 2026-03-26 17:51:09 +01:00
  • 22823c7803 update (#1988) Arthur 2026-03-26 17:35:30 +01:00
  • 88f4f79f36 Fix doc builds (#1987) Arthur 2026-03-26 17:25:05 +01:00
  • bd19f4b997 Fix doc builds (#1986) Arthur 2026-03-26 16:56:50 +01:00
  • a09f44bf3b refactor: fmt feat/improve_whitespace_splitting_perf Luc Georges 2026-03-26 16:48:19 +01:00
  • 1ae7d0ed16 feat: add new faster whitespace split pretok Luc Georges 2026-03-26 16:26:46 +01:00
  • 44ccca5abd Fix doc builds (#1984) Arthur 2026-03-26 16:15:26 +01:00
  • a5a221069d Fix doc builds (#1983) Arthur 2026-03-26 16:06:08 +01:00
  • d36499e3aa Fix doc builds (#1982) Arthur 2026-03-26 15:40:54 +01:00
  • 229bdf3d7a add async decode batch (#1966) Michael Feil 2026-03-26 04:07:32 -07:00
  • 67595efe57 Bump picomatch in /tokenizers/examples/unstable_wasm/www (#1979) dependabot[bot] 2026-03-26 12:00:38 +01:00
  • bd3df8bf45 fix binding test post merge (#1981) Arthur 2026-03-26 11:55:52 +01:00
  • b432bc3769 let' s check something tryout Arthur 2026-03-26 11:26:58 +01:00
  • ec3e249115 Mark Python tests that need network access (#1872) Gordon Messmer 2026-03-26 06:19:30 -04:00
  • 9bbf770020 feat: add progress_format option for machine-readable JSON output (#1921) Andrii Podanenko 2026-03-26 12:02:46 +02:00
  • 44a84169fd Add riscv64 build, make Linux wheel build matrix more explicit (#1951) Trevor Gamblin 2026-03-26 05:46:45 -04:00
  • 36ea27b863 Bump minimatch from 3.1.3 to 3.1.5 in /bindings/node (#1956) dependabot[bot] 2026-03-26 08:08:31 +01:00
  • 7c4006413d Bump flatted from 3.3.1 to 3.4.2 in /bindings/node (#1972) dependabot[bot] 2026-03-26 08:07:25 +01:00
  • cd5c8f09e1 Bump picomatch from 2.3.1 to 2.3.2 in /bindings/node (#1980) dependabot[bot] 2026-03-26 08:07:04 +01:00
  • fbf1f1aed8 feat: Add weakref support to Tokenizer class (#1958) Shintaro Murakami 2026-03-26 06:51:37 +09:00
  • e0502118e6 Fix broken source links in documentation (#1934) Shivam 2026-03-25 10:22:53 -07:00
  • 2044dcac0b Fix identity comparison in toctree_tags.py (#1949) Luka Aladashvili 2026-03-25 21:21:34 +04:00
  • d95b505010 feat(benchmarks-only): adding longer-context llama3 benchmarks (#1971) Michael Feil 2026-03-25 10:13:55 -07:00
  • 8ec1976d19 fix ci (#1978) Arthur 2026-03-25 18:13:04 +01:00
  • 9d89977060 feat: add support for uv in Makefile (#1977) Luc Georges 2026-03-25 15:41:12 +01:00
  • cbd8cf251e Remove unnecessary to_vec() from slice() (#1964) J Berg 2026-03-24 14:15:21 +00:00
  • c4e27cfc84 Fix multithreaded concurrency test to use shared tokenizer instance (#1950) Shintaro Murakami 2026-02-28 01:54:02 +09:00
  • c370063d8e Bump minimatch from 3.1.2 to 3.1.3 in /bindings/node (#1955) dependabot[bot] 2026-02-25 12:04:21 +01:00
  • 4c2e48ab2f Update to PyO3 0.28 to automatically disable GIL (#1948) Nathan Goldbaum 2026-02-25 02:12:20 -07:00
  • c58a47475c docs: add all features (#1953) Wayne Lau 2026-02-24 17:55:05 +08:00
  • fbff6cab7e revert feature/role-to-token Arthur 2026-02-20 11:21:26 +01:00
  • 6fbda4d6a9 just make it happy for now Arthur 2026-02-19 11:20:49 +01:00
  • c01b781396 skip musli arm7 for now Arthur 2026-02-19 11:18:25 +01:00
  • e535285039 more fmt Arthur 2026-02-19 11:17:27 +01:00
  • 631af6f5af fmt bindings Arthur 2026-02-19 11:17:12 +01:00
  • 3a6ba371dc fix adding a token as not special when it was special was still special Arthur 2026-02-19 11:10:05 +01:00
  • 8f34aaf653 test more Arthur 2026-02-19 10:47:33 +01:00
  • b1aa45cf8e update Arthur 2026-02-19 10:40:28 +01:00
  • bf41f0f22d try to update auto with maturin Arthur 2026-02-19 09:23:27 +01:00
  • 54d16ef9c6 clippy happy Arthur 2026-02-19 08:58:20 +01:00
  • c1a2e2974f fix test Arthur 2026-02-18 17:30:54 +01:00
  • 94ce5da7c7 fmt Arthur 2026-02-18 17:29:45 +01:00
  • 01dd6f246e fmt Arthur 2026-02-18 15:26:10 +01:00
  • bf6d45ab7c settatr Arthur 2026-02-18 15:26:00 +01:00
  • a0b109ac5f add setattr Arthur 2026-02-18 15:24:32 +01:00
  • 2b6b91154a move func + add support for _id Arthur 2026-02-18 15:08:54 +01:00
  • f56aa2fe52 will this properly expose role-to-token? Arthur 2026-02-18 14:59:49 +01:00
  • 511224ee1b remove somet stuff Arthur 2026-02-18 14:26:32 +01:00
  • 4ee98861bb Merge branch 'main' of github.com:huggingface/tokenizers into feature/role-to-token Arthur 2026-02-18 14:15:39 +01:00
  • 50352f73a5 Add type hint, update to pyo3 0.27, add automatic type hint generator (#1928) Arthur 2026-02-11 14:26:43 +01:00
  • 6eba494a37 Add a multithreaded tokenizer test and as well as 3.14t CI (#1864) Nathan Goldbaum 2026-02-11 03:42:28 -07:00
  • d3b76e2e5d Add windows arm64 wheel build to python release (#1907) Finn Womack 2026-02-11 02:30:03 -08:00
  • 206ebb588f Fix escape HTML characters in EncodingVisualizer output (#1937) 大橋 玲音 2026-02-10 02:08:43 +09:00
  • ec119e3f31 revertt and simplify just QOL feature/unigram-unk-token Arthur 2026-02-06 11:06:36 +01:00
  • 9c8b066dcb Bump webpack in /tokenizers/examples/unstable_wasm/www (#1946) dependabot[bot] 2026-02-06 09:45:30 +01:00
  • 33917e999e fix: use IPython.display instead of deprecated IPython.core.display (#1936) 大橋 玲音 2026-02-06 17:22:22 +09:00
  • 995a477f25 Add Python 3.14 CI (#1925) Nathan Goldbaum 2026-02-06 01:21:02 -07:00
  • 0ef35f1f6d Update fancy-regex dependency to 0.17 (#1940) Ben Beasley 2026-02-06 08:09:19 +00:00
  • 2fa762e93e Add get_special_tokens and is_special_token methods feature/special-tokens-api Arthur 2026-02-05 15:02:42 +01:00
  • 9988ba5a7b Add post_process_tokens and post_process_ids methods feature/post-process-tokens Arthur 2026-02-05 14:50:32 +01:00
  • 31ef73cfca feat: add unk_token property to Unigram model Arthur 2026-02-05 09:27:49 +01:00
  • 7c781c9975 feat: add role_to_token field for special token metadata Arthur 2026-02-05 09:01:41 +01:00
  • 9c57a8505e Add missing test dependency (#1938) Alexander Lent 2026-02-02 04:57:57 -05:00
  • 7938d8c857 Bump lodash from 4.17.21 to 4.17.23 in /bindings/node (#1935) dependabot[bot] 2026-01-28 15:48:27 +01:00
  • b3889ab98a Add copy to a decostream (#1930) Arthur 2026-01-19 11:07:08 +01:00
  • b874abe731 Fix warnings: remove a print and remove some deprecation warnings (#1924) Arthur 2026-01-05 14:02:44 +01:00
  • a2fe1cc0a9 use macos-15-intel (#1923) Arthur 2026-01-05 13:28:34 +01:00
  • f383101a26 fix max build? v0.22.2 v0.22.2-rc0 Arthur 2026-01-05 11:33:42 +01:00
  • fb691515d0 Python release fix (#1905) Finn Womack 2026-01-05 02:09:46 -08:00
  • e60793ef1c Python release fix (#1905) Finn Womack 2026-01-05 02:09:46 -08:00
  • ecad3f18a3 Fix unclosed annotation span in EncodingVisualizer (#1911) 大橋 玲音 2025-12-16 19:52:44 +09:00
  • f7db48f532 dev new version Arthur 2025-12-02 14:02:13 +01:00
  • 6573f2c561 add lock Arthur 2025-12-02 13:52:55 +01:00
  • 1d9dbb82a3 push the release Arthur 2025-12-02 13:51:12 +01:00
  • 95504c0293 add .lock and v0.22.2 v0.22.2rc0 Arthur 2025-12-02 13:36:57 +01:00
  • 8604740782 update stub for typing (#1896) Arthur 2025-12-02 12:48:56 +01:00
  • a5e03bab57 Bump express in /tokenizers/examples/unstable_wasm/www (#1903) dependabot[bot] 2025-12-02 12:10:57 +01:00
  • ebbc3c8da3 bump PyO3 to 0.26 (#1901) David Hewitt 2025-12-02 10:55:55 +00:00
  • b83d7c986c DOCS: add add_prefix_space to processors.ByteLevel (#1878) Tobias Pitters 2025-11-28 18:38:49 +01:00
  • 060786018e Mark immutable pyclasses as frozen (#1861) Nathan Goldbaum 2025-11-28 08:41:25 -07:00
  • 09dafe2f44 Remove runtime stderr warning from Python bindings (#1898) Copilot 2025-11-28 13:52:00 +01:00
  • 89d271becc Initial plan copilot/remove-stderr-warning-python-bindings copilot-swe-agent[bot] 2025-11-28 11:07:57 +00:00
  • c47328ce54 fmt austinleedavis/main Arthur 2025-11-28 08:53:07 +01:00
  • 23a5c16f0c Merge branch 'main' into main Arthur 2025-11-28 08:27:59 +01:00
  • a05b60c55f [MINOR:TYPO] Update mod.rs (#1883) Christopher Akiki 2025-11-28 08:01:52 +01:00
  • bc6e6cd09a fix: used normalize_str in BaseTokenizer.normalize (#1884) Ishita Bhattacharyya 2025-11-28 12:21:28 +05:30
  • d6a4acc0d2 Update serialization (#1891) Arthur 2025-11-27 23:07:18 +01:00
  • 47e4ffebab Bump js-yaml from 3.14.1 to 3.14.2 in /bindings/node (#1892) dependabot[bot] 2025-11-27 22:33:44 +01:00
  • 904bbd0afa Bump node-forge in /tokenizers/examples/unstable_wasm/www (#1889) dependabot[bot] 2025-11-27 19:32:58 +01:00
  • 849155249b Update indicatif dependency (#1867) Gordon Messmer 2025-11-27 10:15:51 -08:00
  • 007fc767ac Add cargo-semver-checks to Rust CI workflow (#1875) Haixuan Xavier Tao 2025-10-16 11:22:48 +02:00
  • 386f3d8267 ci: add support for building Win-ARM64 wheels (#1869) MUGUNDAN 2025-10-16 14:19:18 +05:30
  • 14e1e97d4c u32 extra-tokens Arthur 2025-10-08 10:23:00 +02:00
  • ea0da14c71 current draft! Arthur 2025-10-07 16:19:41 +02:00
  • 916df54268 update dev version to 0.22.2 Arthur 2025-09-19 11:46:04 +02:00
  • afaae08883 push the minor v0.22.1 v0.22.1-rc0 Arthur 2025-09-19 11:44:39 +02:00