Commit Graph

  • 0ff2ab0f64 Fixing the stream by removing the read_index altogether. (#1716) Nicolas Patry 2025-01-09 17:41:15 +01:00
  • 862d1a346a Fix panic in DecodeStream::step due to incorrect index usage (#1699) Sungyoon Jeong 2025-01-09 21:24:04 +09:00
  • c04b97aab1 Update documentation of Rust feature (#1711) sondalex 2025-01-09 12:08:45 +01:00
  • bdfc38b78d Fix typos (#1715) tinyboxvk 2025-01-09 06:53:20 -04:00
  • 6945933829 update Split pretokenizer docstrings (#1701) Dylan-Harden3 2025-01-08 05:35:52 -06:00
  • 166edd87c8 Fixing the README. (#1714) Nicolas Patry 2025-01-08 12:31:17 +01:00
  • 3a6504d274 Upgrade to PyO3 0.23 (#1708) Nicolas Patry 2024-12-31 18:36:01 +01:00
  • 555d44c47a Add feature flag hint to README.md, fixes #1633 (#1709) sftse 2024-12-30 17:01:53 +01:00
  • 24d29f498d Update dev version and pyproject.toml (#1693) Arthur 2024-11-27 16:01:48 +01:00
  • cf102e6725 proper release v0.21.0 git_v0.21.0 Arthur Zucker 2024-11-27 14:05:58 +01:00
  • 2c750ab83c v0.21.0 Arthur Zucker 2024-11-27 13:54:49 +01:00
  • 90795fe474 version = "0.21.0-rc0" v0.21.0rc0 Arthur Zucker 2024-11-27 13:42:29 +01:00
  • 15e85b199c v0.21.0 Arthur Zucker 2024-11-27 13:37:33 +01:00
  • 1bf2a66b80 v0.20.4-dev0 Arthur Zucker 2024-11-27 10:07:49 +01:00
  • 350e5db82d v0.20.4 v0.20.4 git_v0.20.4-rc0 Arthur Zucker 2024-11-26 18:00:50 +01:00
  • 8f026799eb update release workflow v0.20.4rc0 Arthur Zucker 2024-11-26 17:44:13 +01:00
  • 6647c2006b v0.20.4rc0 Arthur Zucker 2024-11-26 17:27:48 +01:00
  • 4bc7b4cc2a Fix encode_batch and encode_batch_fast to accept ndarrays again (#1679) Dimitris Iliopoulos 2024-11-21 05:55:11 -05:00
  • eb4cc86d4e Bump cross-spawn from 6.0.5 to 6.0.6 in /bindings/node (#1687) dependabot[bot] 2024-11-25 10:04:06 +01:00
  • ac34660e44 Fix encode_batch and encode_batch_fast to accept ndarrays again (#1679) Dimitris Iliopoulos 2024-11-21 05:55:11 -05:00
  • f0c48bd89a Update README.md with install from source Arthur 2024-11-15 21:51:39 +01:00
  • a1c572e2be 0.20.3-rc0 v0.20.4-rc0 git_v0.20.3-rc0 Nicolas Patry 2024-11-15 18:09:03 +07:00
  • cc5fb01a2f Decode stream python (#1678) Nicolas Patry 2024-11-15 19:06:22 +08:00
  • 500db282a8 Adding an API for decode streaming. (#1677) Nicolas Patry 2024-11-15 13:02:38 +08:00
  • f4c9fd7f40 Testing ABI3 wheels to reduce number of wheels (#1674) Nicolas Patry 2024-11-15 13:02:22 +08:00
  • 5aa9f6cff0 Disable caching for long strings. (#1676) Nicolas Patry 2024-11-07 21:36:27 +08:00
  • c6b5c3eab7 More cache options. (#1675) Nicolas Patry 2024-11-06 18:12:09 +08:00
  • 1740bff7a6 Revert "Upgrade python versions." Nicolas Patry 2024-11-06 13:18:03 +08:00
  • b81ec467a6 Upgrade python versions. Nicolas Patry 2024-11-06 13:17:22 +08:00
  • b63262a481 update cargo plus v0.20.3 v0.20.3 v0.20.3-release Arthur Zucker 2024-11-05 18:17:11 +01:00
  • 5e97b53f0b update tag Arthur Zucker 2024-11-05 18:16:18 +01:00
  • 6af367b0db cargo lock v0.20.3rc1 Arthur Zucker 2024-11-05 17:54:50 +01:00
  • da6c367170 v0.20.3 Arthur Zucker 2024-11-05 16:26:11 +01:00
  • 57884ebaa2 [MINOR:TYPO] Fix docstrings (#1653) Christopher Akiki 2024-11-05 16:25:06 +01:00
  • 5e223ceb48 fix pylist (#1673) Arthur 2024-11-05 16:24:23 +01:00
  • 0f3a3f957e update workflow Arthur Zucker 2024-11-04 18:38:32 +01:00
  • 7c36735389 v0.20.2-dev.0 version Arthur Zucker 2024-11-04 18:36:40 +01:00
  • caa650512c v 0.20.2 v0.20.2 python-313-release Arthur Zucker 2024-11-04 18:15:59 +01:00
  • 5266b42a3d update python version install of setup python v0.20.2rc1 Arthur Zucker 2024-11-04 15:49:33 +01:00
  • 4f7ba38119 some windows cna't support 3.13 :( Arthur Zucker 2024-11-04 15:14:40 +01:00
  • 645487a0e0 cargo lock Arthur Zucker 2024-11-04 14:56:18 +01:00
  • e7799d696e oups should be a RC1 first Arthur Zucker 2024-11-04 14:48:03 +01:00
  • d2754da4e3 v0.20.2 + cargo Arthur Zucker 2024-11-04 14:34:44 +01:00
  • f6f59ac51b update python release Arthur Zucker 2024-11-02 13:06:08 +01:00
  • 6c15458868 Bump actions versions (#1669) tinyboxvk 2024-11-01 06:19:35 -03:00
  • 6ade8c2d21 PyO3 0.22 (#1665) Dimitris Iliopoulos 2024-11-01 05:17:23 -04:00
  • 41e0eaa561 Bump actions/checkout to v4 (#1667) tinyboxvk 2024-10-29 10:32:07 -03:00
  • 5512a424bf Add safety comments (#1651) Manish Goregaokar 2024-10-29 01:44:06 -07:00
  • 6ea758872d Unsound call of set_var (#1664) sftse 2024-10-25 15:44:30 +02:00
  • a8738a95d1 Arg name correction: auth_token -> token (#1621) rravenel 2024-10-24 07:32:09 -07:00
  • a56f73eded good defaults? fix-cache-issues Arthur Zucker 2024-10-21 15:45:23 +02:00
  • b2c667cfbc update Arthur Zucker 2024-10-21 15:18:43 +02:00
  • c81c34a233 for now compile but breaking Arthur Zucker 2024-10-21 15:05:26 +02:00
  • 714a3bd0c3 use sysinfo to pre-allocate a big enough cache Arthur Zucker 2024-10-21 14:53:52 +02:00
  • ab3236f640 propagate to unigram Arthur Zucker 2024-10-21 11:57:29 +02:00
  • 6e0175acc5 add a clear_cache function! Arthur Zucker 2024-10-21 11:47:36 +02:00
  • 9b77c054ef Fix off-by-one error in tokenizer::normalizer::Range::len (#1638) Ryan Landay 2024-10-14 02:40:17 -04:00
  • bce68a60cb Bump cookie and express in /tokenizers/examples/unstable_wasm/www (#1648) dependabot[bot] 2024-10-10 15:30:24 +02:00
  • 51826532d4 push new dev version Arthur Zucker 2024-10-10 12:00:16 +02:00
  • d98298a2c2 0.20.1 v0.20.1 branch_0.20.1.rc1 Arthur Zucker 2024-10-10 11:45:24 +02:00
  • de305f2170 update to ubuntu-22.04 v0.20.1rc1 Arthur Zucker 2024-10-10 11:27:06 +02:00
  • 1053470ff7 use --interpreter ${{ matrix.interpreter || '3.7 3.8 3.9 3.10 3.11 3.12 pypy3.7 pypy3.8 pypy3.9 pypy3.10' }} Arthur Zucker 2024-10-10 11:17:09 +02:00
  • f7c33eb3b2 add Cargo Arthur Zucker 2024-10-10 10:14:44 +02:00
  • eca17be37b v 0.20.1-rc1 Arthur Zucker 2024-10-10 10:14:01 +02:00
  • 81d83361d0 fix the unigram::from calls assign-token Arthur Zucker 2024-10-05 17:58:22 +02:00
  • 167ecdebfb small fixed Arthur Zucker 2024-10-05 17:56:06 +02:00
  • 0475c057dd fix added vocab tests Arthur Zucker 2024-10-05 17:17:52 +02:00
  • e8933fa5b9 potential initial solution for the annoying unigram model :) Arthur Zucker 2024-10-05 17:16:31 +02:00
  • ee7ce80e0b forgot to remove from added tokens map! Arthur Zucker 2024-10-04 15:55:43 +02:00
  • 545d7230f4 fix unwrap errors Arthur Zucker 2024-10-04 15:24:14 +02:00
  • ed34ffd334 add a small test Arthur Zucker 2024-10-04 15:00:35 +02:00
  • b5640a65cf simplify the logic Arthur Zucker 2024-10-04 14:46:42 +02:00
  • 6d48e58219 remove print Arthur Zucker 2024-07-12 10:47:38 +02:00
  • 2d4b3735e4 fix everything Arthur Zucker 2024-07-12 10:38:40 +02:00
  • 4190db7ddd pass compilation Arthur Zucker 2024-07-12 10:12:29 +02:00
  • 4794ed516f fix Arthur Zucker 2024-07-12 10:07:27 +02:00
  • b359bde47a nit Arthur Zucker 2024-07-12 09:58:06 +02:00
  • ddab901338 current update Arthur Zucker 2024-07-12 09:49:42 +02:00
  • 97e8818ecf add python bindongs as well Arthur Zucker 2024-07-12 08:52:04 +02:00
  • fc0f0656f0 allow to assign a new token Arthur Zucker 2024-07-12 08:08:14 +02:00
  • 557fde76d8 style: simplify string formatting for readability (#1632) Hamir Mahal 2024-10-04 04:11:50 -07:00
  • 3d51a1695f Fix documentation build (#1642) Arthur 2024-10-01 14:48:02 +02:00
  • 294ab86fe0 Bump webpack in /tokenizers/examples/unstable_wasm/www (#1641) dependabot[bot] 2024-10-01 14:17:23 +02:00
  • 2b97a5e49e Bump send and express in /tokenizers/examples/unstable_wasm/www (#1631) dependabot[bot] 2024-10-01 14:17:09 +02:00
  • 077678d1d1 Bump serve-static and express in /tokenizers/examples/unstable_wasm/www (#1630) dependabot[bot] 2024-10-01 14:16:53 +02:00
  • 2204066e78 Bump body-parser and express in /tokenizers/examples/unstable_wasm/www (#1629) dependabot[bot] 2024-10-01 14:16:41 +02:00
  • 3fb1371c1c [ignore_merges] Fix offsets (#1640) Arthur 2024-10-01 09:22:20 +02:00
  • b4a38c4f63 Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows (#1626) dependabot[bot] 2024-09-30 16:38:28 +02:00
  • ac4e717070 potential solution? fix-split-special Arthur Zucker 2024-09-30 14:31:47 +02:00
  • 14a07b06e4 fix filelink (#1610) 152334H 2024-08-12 05:35:33 +00:00
  • 75aef5b75b Update README.md (#1608) Arthur 2024-08-09 10:40:21 +02:00
  • a5adaace3d version 0.20.0 v0.20.0 branch_v0.20.0.rc1 Arthur Zucker 2024-08-08 18:42:29 +02:00
  • 81c471cf17 update dev version 0.20.0 Arthur Zucker 2024-08-08 18:10:55 +02:00
  • 85cc05a32f Fix CI (#1607) Nicolas Patry 2024-08-08 17:09:30 +02:00
  • a8def0739f Merge branch 'fix_release' of github.com:huggingface/tokenizers into branch_v0.20.0.rc1 v0.20.0rc1 Arthur Zucker 2024-08-08 16:57:51 +02:00
  • fe5067347e Fix CI Nicolas Patry 2024-08-08 16:55:47 +02:00
  • b253835968 push cargo Arthur Zucker 2024-08-08 16:34:33 +02:00
  • fc3bb7653c update dependencies Arthur Zucker 2024-08-08 16:33:01 +02:00
  • bfd9cdeefb Perf improvement 16% by removing offsets. (#1587) Nicolas Patry 2024-08-08 14:56:13 +02:00
  • b6d01b788a initial commit fast-regex Arthur Zucker 2024-06-20 16:01:51 +02:00