175 Commits

Author SHA1 Message Date
Arthur
d1ac6e9654 Final doc fix (#1989)
* update

* there was an error

* update
2026-03-26 17:51:09 +01:00
Arthur
22823c7803 update (#1988) 2026-03-26 17:35:30 +01:00
Arthur
88f4f79f36 Fix doc builds (#1987)
* fix binding test post merge

* update

* update

* something

* simple

* simplify

* up

* update

* remove

* update workflow

* update

* add pip

* fix
2026-03-26 17:25:05 +01:00
Arthur
bd19f4b997 Fix doc builds (#1986)
* fix binding test post merge

* update

* update

* something

* simple

* simplify

* up

* update

* remove

* update workflow

* update
2026-03-26 16:56:50 +01:00
Arthur
44ccca5abd Fix doc builds (#1984)
* fix binding test post merge

* update

* update

* something

* simple

* simplify

* up

* update

* remove

* update workflow
2026-03-26 16:15:26 +01:00
Arthur
a5a221069d Fix doc builds (#1983)
* fix binding test post merge

* update

* update

* something

* simple

* simplify

* up

* update

* remove
2026-03-26 16:06:08 +01:00
Arthur
d36499e3aa Fix doc builds (#1982)
* fix binding test post merge

* update

* update

* something

* simple

* simplify
2026-03-26 15:40:54 +01:00
Trevor Gamblin
44a84169fd Add riscv64 build, make Linux wheel build matrix more explicit (#1951)
* workflows/CI: make rustc targets more explicit

For Linux builds, distinguish between 'target' and 'arch', since the two
are not always the same (e.g. the target for ppc64le is actually
powerpc64le-unknown-linux-gnu). This allows more explicit support for
other platforms when needed.

Signed-off-by: Trevor Gamblin <tgamblin@baylibre.com>

* workflows/CI: add riscv64 build

Note that the 'target' and 'arch' values here are different - arch is
riscv64, but the actual rustc target is riscv64gc-unknown-linux-gnu,
hence the previous change.

Signed-off-by: Trevor Gamblin <tgamblin@baylibre.com>

---------

Signed-off-by: Trevor Gamblin <tgamblin@baylibre.com>
2026-03-26 10:46:45 +01:00
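The arch/target distinction described in this commit can be sketched in a few lines of Python. `RUSTC_TARGETS` and `rustc_target` are hypothetical names and do not come from the workflow. Only the ppc64le and riscv64 pairs are stated in the commit message; the fallback pattern for other architectures is an assumption.

```python
# Hypothetical lookup: wheel "arch" label -> rustc target triple.
# Only these two pairs are stated in the commit message.
RUSTC_TARGETS = {
    "ppc64le": "powerpc64le-unknown-linux-gnu",
    "riscv64": "riscv64gc-unknown-linux-gnu",
}


def rustc_target(arch: str) -> str:
    """Return the rustc target for an arch label.

    Assumption: architectures absent from the table use the arch name
    itself as the first component of the triple.
    """
    return RUSTC_TARGETS.get(arch, f"{arch}-unknown-linux-gnu")


print(rustc_target("riscv64"))
print(rustc_target("x86_64"))
```

Keeping the two values separate lets the build matrix label a wheel by arch while still passing the exact triple to rustc.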
Shivam
e0502118e6 Fix broken source links in documentation (#1934)
The documentation source links were pointing to `src/tokenizers/...` which
doesn't exist. The Python source files are located at
`bindings/python/py_src/tokenizers/...`.

Add `version_tag_suffix` parameter to documentation build workflows to
generate correct GitHub source links.

Fixes #1910
2026-03-25 18:22:53 +01:00
Arthur
8ec1976d19 fix ci (#1978)
* fix ci

* fix stubs

* nit

* exclude

* full fix

* update

* up

* revert

* workflow up

* this?

* up

* add logs, I suspect it's just maturin missing

* maturin not installed but not needed

* update

* check style after running tests since I mess up the .pyi

* nit?
2026-03-25 18:13:04 +01:00
Arthur
50352f73a5 Add type hint, update to pyo3 0.27, add automatic type hint generator (#1928)
* something that is supposed to work but my env does not allow it, seems to be uv related

* ?

* up

* nits

* let's try

* part of the update for pyo3 0.27

* more pyo3 fixes

* update

* does this help?

* help

* finally

* update stub accordingly

* export more of the submodules

* moooore

* add individual .pyi

* cleanup

* update pyo3 signatures and fix warning

* style

* update

* more updates

* style

* clippy happy

* does this help?

* fix

* fix

* ?

* what?

* add dwarwub case co

* up?

* update

* clippy and fmt

* this time it works

* remove offending one

* update

* remove shit

* remove more shit that was unwanted

* ?

* simplify a bit

* more verbose?

* more simplification

* fmt

* fix some of the typing in rust directly to please ty (but also just fix some typing.Any)

* fix script running

* fix , ignore and exclude

* style

* update

* fmt + add it to style?

* cleanup

* Simplify stub.py docstring injection

- Replace complex modifications dict with simple insertions list
- Remove nested process_function_or_method function
- Use bottom-to-top line replacement for cleaner logic
- Remove unused importlib import

* isolate stub generation into separate tools/stub-gen crate

- Move stub_generation.rs to tools/stub-gen/ as standalone crate
- Remove stub-gen feature and pyo3-introspection from main crate
- Auto-detect PYTHONHOME for uv/venv environments
- Update Makefile and README with new instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 14:26:43 +01:00
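The "bottom-to-top line replacement" mentioned in this commit can be sketched as follows. This is not the code from stub.py: `apply_insertions` and its `(index, text)` format are hypothetical. It only shows why applying insertions from the highest index down keeps the remaining indices valid.

```python
def apply_insertions(lines, insertions):
    """Insert text before the given 0-based line indices.

    `insertions` is a list of (index, text) pairs that refer to positions
    in the ORIGINAL list. Processing from the bottom up means an insert
    never shifts an index that is still waiting to be applied.
    """
    result = list(lines)
    for index, text in sorted(insertions, key=lambda pair: pair[0], reverse=True):
        result.insert(index, text)
    return result


stub = ["def encode(self): ...", "def decode(self): ..."]
docs = [(0, "# encodes text"), (1, "# decodes ids")]
print(apply_insertions(stub, docs))
```

Applying the same pairs top to bottom would require adjusting every later index by the number of lines already inserted.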
Nathan Goldbaum
6eba494a37 Add a multithreaded tokenizer test as well as 3.14t CI (#1864)
* Add multithreaded tokenizer test

* Add 3.13t CI

* update to use 3.14t

* fix ty check

* Run ruff

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2026-02-11 11:42:28 +01:00
Finn Womack
d3b76e2e5d Add windows arm64 wheel build to python release (#1907)
* Add windows arm64 support to python release workflow

* Run on fork

Updated workflow to include 'arm64-runner' branch and commented out conditions.

* fix typo

* add arm64 python install for all versions

* use python-install option

* clean up fork changes

* Update .github/workflows/python-release.yml

* revert 3.14 addition

Waiting to add in a different PR that adds all 3.14 builds at the same time

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2026-02-11 11:30:03 +01:00
Nathan Goldbaum
995a477f25 Add Python 3.14 CI (#1925) 2026-02-06 09:21:02 +01:00
Arthur
a2fe1cc0a9 use macos-15-intel (#1923) 2026-01-05 13:28:34 +01:00
Finn Womack
e60793ef1c Python release fix (#1905)
* add interpreter install, enable workflow run in fork

* add additional python versions

* Refactor python version setup for x86 windows

* try splitting interpreter into an array

* revert to hard coded list

* try using extra argument

* Fix quotes

* Clean up python install

* revert workflow conditions
2026-01-05 11:09:46 +01:00
Arthur
8604740782 update stub for typing (#1896)
* update stub for typing

* up

* add ty type checker

* update stub

* up

* some update

* add owner to stub?

* update

* no print

* uptime funk

* mm

* wtf

* fix

* fix more

* some fixes are manual but come on

* up

* # type: ignore[import]

* reduce the scope of ty for less changes

* ups

* up?
2025-12-02 12:48:56 +01:00
Arthur
d6a4acc0d2 Update serialization (#1891)
* Add benchmark for deserializing large added vocab

* revert dumb stuff, isolate changes

* try to only normalize once

* small improvement?

* some updates

* nit

* fmt

* normalized string are a fucking waste of time when you just want to add tokens to the vocab man....

* more attempts

* works

* let's fucking go, parity

* update

* hahahhahaha

* revert changes that are not actually even needed

* add a python test!

* use normalizer before come on

* nit

* update to a more concrete usecase

* fix build

* style

* reduce sample size

* --allow unmaintained

* clippy happy

* up

* up

* derive impl

* revert unrelated

* fmt

* ignore

* remove stupid file
2025-11-27 23:07:18 +01:00
Haixuan Xavier Tao
007fc767ac Add cargo-semver-checks to Rust CI workflow (#1875)
This adds semver validation to catch breaking changes before release.
The check runs on Ubuntu during CI and compares against the published crate on crates.io.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-16 11:22:48 +02:00
MUGUNDAN
386f3d8267 ci: add support for building Win-ARM64 wheels (#1869)
* ci: add support for building Win-ARM64 wheels

* ci: add support for building Win-ARM64 wheels
2025-10-16 10:49:18 +02:00
Arthur
01f8bc834c clippy (#1781)
* clippy

* fmt

* rustc?

* fix onig issue

* up

* decode stream default

* jump a release for cargo audit ...

* more clippy stuff

* clippy?

* proper style

* fmt
2025-05-27 11:30:32 +02:00
Nicolas Patry
4383a25787 Update the release builds following 0.21.1. (#1746)
* Update the release builds following 0.21.1.

* Clippy fix.
2025-03-13 13:01:41 +01:00
Arthur
c45aebd102 🚨 Support updating template processors (#1652)
* current updates

* simplify

* set_item works, but `tokenizer._tokenizer.post_processor[1].single = ["$0", "</s>"]` does not !

* fix: `normalizers` deserialization and other refactoring

* fix: `pre_tokenizer` deserialization

* feat: add `__len__` implementation for `normalizer::PySequence`

* feat: add `__setitem__` impl for `normalizers::PySequence`

* feat: add `__setitem__` impl to `pre_tokenizer::PySequence`

* feat: add `__setitem__` impl to `post_processor::PySequence`

* test: add normalizer sequence setter check

* refactor: allow unused `processors::setter` macro

* test: add `__setitem__` test for processors & pretok

* refactor: `unwrap` -> `PyException::new_err()?`

* refactor: fmt

* refactor: remove unnecessary `pub`

* feat(bindings): add missing getters & setters for pretoks

* feat(bindings): add missing getters & setters for processors

* refactor(bindings): rewrite RwLock poison error msg

* refactor: remove debug print

* feat(bindings): add description as to why custom deser is needed

* feat: make post proc sequence elements mutable

* fix(binding): serialization

---------

Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
2025-01-28 14:58:35 +01:00
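The `__len__` and `__setitem__` additions listed in this commit follow Python's standard sequence protocol. The class below is a pure-Python stand-in, not the Rust `PySequence` binding. `StepSequence` and its string contents are hypothetical and only illustrate what item assignment on a sequence looks like from the Python side.

```python
class StepSequence:
    """Toy container showing the protocol methods named in the commit."""

    def __init__(self, steps):
        self._steps = list(steps)

    def __len__(self):
        return len(self._steps)

    def __getitem__(self, index):
        return self._steps[index]

    def __setitem__(self, index, step):
        # Raises IndexError for an out-of-range index, like a list.
        self._steps[index] = step


seq = StepSequence(["lowercase", "strip"])
seq[1] = "nfkc"
print(len(seq), seq[1])
```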
Nicolas Patry
3a6504d274 Upgrade to PyO3 0.23 (#1708)
* Upgrade to PyO3 0.23

* Macos-12 deprecated?

* Clippy.

* Clippy auto elision.
2024-12-31 18:36:01 +01:00
Arthur Zucker
1bf2a66b80 v0.20.4-dev0 2024-11-27 10:07:49 +01:00
Nicolas Patry
f4c9fd7f40 Testing ABI3 wheels to reduce number of wheels (#1674)
* Testing ABI3 wheels to reduce number of wheels

* No need for py-clone anymore.

* Upgrade python versions.

* Remove those flakes.

* Promoting new CI + Fixing secret.
2024-11-15 06:02:22 +01:00
Nicolas Patry
1740bff7a6 Revert "Upgrade python versions."
This reverts commit b81ec467a6.
2024-11-06 13:18:03 +08:00
Nicolas Patry
b81ec467a6 Upgrade python versions. 2024-11-06 13:17:22 +08:00
Arthur Zucker
0f3a3f957e update workflow 2024-11-04 18:38:32 +01:00
tinyboxvk
6c15458868 Bump actions versions (#1669)
* Update docs-check.yml

Bump actions/setup-python to v5
Bump python-version to 3.12 (default on ubuntu-latest)
Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained

* Update node-release.yml

Bump actions/setup-python to v5
Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump actions/cache to v4
Bump actions/setup-node to v4
Bump actions/upload-artifact to v4
Bump actions/download-artifact to v4

* Update node.yml

Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump actions/cache to v4
Bump actions/setup-node to v4

* Update python-release-conda.yml

Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump conda-incubator/setup-miniconda to v3

* Update python-release.yml

Bump actions/setup-python to v5
Bump actions/download-artifact to v4

* Update rust-release.yml

Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump actions/cache to v4

* Update stale.yml

Bump actions/stale to v9

* Update python.yml

Bump actions/setup-python to v5
2024-11-01 10:19:35 +01:00
tinyboxvk
41e0eaa561 Bump actions/checkout to v4 (#1667)
Signed-off-by: tinyboxvk <tinyboxvk@users.noreply.github.com>
2024-10-29 14:32:07 +01:00
Arthur
3d51a1695f Fix documentation build (#1642)
* use v4

* fix ruff

* style
2024-10-01 14:48:02 +02:00
dependabot[bot]
b4a38c4f63 Bump actions/download-artifact from 3 to 4.1.7 in /.github/workflows (#1626)
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 3 to 4.1.7.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](https://github.com/actions/download-artifact/compare/v3...v4.1.7)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-09-30 16:38:28 +02:00
Nicolas Patry
85cc05a32f Fix CI (#1607) 2024-08-08 17:09:30 +02:00
Nicolas Patry
7b80359dd2 Fixing release CI strict (taken from safetensors). 2024-08-06 09:11:30 +02:00
Luc Georges
418c35c09e feat(ci): add trufflehog secrets detection (#1551)
* feat(ci): add trufflehog secrets detection

* fix(ci): remove unnecessary permissions
2024-06-10 16:10:23 +02:00
Nicolas Patry
25aee8b88c [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder (#1513)
* [BREAKING CHANGE] Ignore added_tokens (both special and not) in the
decoder

Causes issues with `ByteLevel` messing up some `AddedTokens` with some
utf-8 range used in the bytelevel mapping.

This commit tests the extent of the damage of ignoring the decoder for
those tokens.

* Format.

* Installing cargo audit.

* Minor fix.

* Fixing "bug" in node/python.

* Autoformat.

* Clippy.

* Only prefix space when there's no decoder.
2024-05-06 11:49:38 +02:00
Arthur
f2ec3b239b remove enforcement of non special when adding tokens (#1521)
* remove enforcement of non special when adding tokens

* mut no longer needed

* add a small test

* nit

* style

* audit

* ignore cargo audit's own vulnerability

* update

* revert

* remove CVE
2024-04-30 15:53:47 +02:00
Nicolas Patry
e0defa7355 Remove 3.13 (potential undefined behavior.) (#1497) 2024-04-16 15:56:47 +02:00
Arthur
accd0650b8 Update release for python3.12 windows (#1438)
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-01-19 15:56:47 +01:00
Nicolas Patry
8f9b945c75 Stale bot. (#1404) 2023-12-05 14:11:37 +01:00
Remy
985d49ae64 fix: remove useless token (#1371) 2023-10-19 14:29:01 +02:00
Nicolas Patry
aed491df8c Fixing the progressbar. (#1353)
* Fixing the progressbar.

* Upgrade deps.

* Update cargo audit

* Ssh this action.

* Fixing esaxx by using slower rust version.

* Trying the new esaxx version.

* Publish.

* Get cache again.
2023-10-05 15:33:58 +02:00
Nicolas Patry
d2010d5165 Move to maturin mimicking move for safetensors. + Rewritten node bindings. (#1331)
* Move to maturin mimicking move for `safetensors`.

* Tmp.

* Fix sdist.

* Wat?

* Clippy 1.72

* Remove if.

* Conda sed.

* Fix doc check workflow.

* Moving to maturin AND removing http + openssl mess (smoothing transition
moving to `huggingface_hub`)

* Fix dep

* Black.

* New node bindings.

* Fix docs + node cache ?

* Yarn.

* Working dir.

* Extension module.

* Put back interpreter.

* Remove cache.

* New attempt

* Multi python.

* Remove FromPretrained.

* Remove traces of `fromPretrained`.

* Drop 3.12 for windows?

* Typo.

* Put back the default feature for ignoring links during simple test.

* Fix ?

* x86_64 -> x64.

* Remove warning for windows bindings.

* Exclude aarch.

* Include/exclude.

* Put back workflows in correct states.
2023-08-28 16:24:14 +02:00
Nicolas Patry
f2952020d5 Python 38 arm (#1330) 2023-08-23 16:29:16 +02:00
Nicolas Patry
6c350d88fe Re-using scripts from safetensors. (#1328) 2023-08-23 15:37:38 +02:00
Nicolas Patry
b35d33f981 Release all at once for simplicity. (#1320) 2023-08-14 13:49:45 +02:00
Chris Ha
862046ac94 CD backports (#1318)
* CD backports

follow
huggingface/safetensors#317

* fix node bindings?

`cargo check` doesn't work on my local configuration from `tokenizers/bindings/node/native`
I don't think it will be a problem, but I have difficulty telling

* backport #315

* safetensors#317 back ports
2023-08-10 18:52:22 +02:00
Mishig
348ed70e58 [doc build] Use secrets (#1273) 2023-06-09 12:58:27 +02:00
Funtowicz Morgan
a03330607b Update all GH Actions with dependency on actions/checkout from v[1,2] to v3 to notably improve performance (retrieve only the commit being checked-out) (#1256) 2023-05-22 14:50:00 +02:00