52 Commits

Author SHA1 Message Date
Arthur
8ec1976d19 fix ci (#1978)
* fix ci

* fix stubs

* nit

* exclude

* full fix

* update

* up

* revert

* workflow up

* this?

* up

* add logs; I suspect it's just maturin missing

* maturin not installed but not needed

* update

* check style after running tests since I mess up the .pyi

* nit?
2026-03-25 18:13:04 +01:00
Arthur
50352f73a5 Add type hint, update to pyo3 0.27, add automatic type hint generator (#1928)
* something that is supposed to work but my env does not allow it, seems to be uv related

* ?

* up

* nits

* let's try

* part of the update for pyo3 0.27

* more pyo3 fixes

* update

* does this help?

* help

* finally

* update stub accordingly

* export more of the submodules

* moooore

* add individual .pyi

* cleanup

* update pyo3 signatures and fix warning

* style

* update

* more updates

* style

* clippy happy

* does this help?

* fix

* fix

* ?

* what?

* add dwarwub case co

* up?

* update

* clippy and fmt

* this time it works

* remove offending one

* update

* remove shit

* remove more shit that was unwanted

* ?

* simplify a bit

* more verbose?

* more simplification

* fmt

* fix some of the typing in rust directly to please TY (but also just fix some typing.Any)

* fix script running

* fix , ignore and exclude

* style

* update

* fmt + add it to style?

* cleanup

* Simplify stub.py docstring injection

- Replace complex modifications dict with simple insertions list
- Remove nested process_function_or_method function
- Use bottom-to-top line replacement for cleaner logic
- Remove unused importlib import

* isolate stub generation into separate tools/stub-gen crate

- Move stub_generation.rs to tools/stub-gen/ as standalone crate
- Remove stub-gen feature and pyo3-introspection from main crate
- Auto-detect PYTHONHOME for uv/venv environments
- Update Makefile and README with new instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-11 14:26:43 +01:00
Nathan Goldbaum
6eba494a37 Add a multithreaded tokenizer test as well as 3.14t CI (#1864)
* Add multithreaded tokenizer test

* Add 3.13t CI

* update to use 3.14t

* fix ty check

* Run ruff

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2026-02-11 11:42:28 +01:00
Nathan Goldbaum
995a477f25 Add Python 3.14 CI (#1925) 2026-02-06 09:21:02 +01:00
Arthur
8604740782 update stub for typing (#1896)
* update stub for typing

* up

* add ty type checker

* update stub

* up

* some update

* add owner to stub?

* update

* no print

* uptime funk

* mm

* wtf

* fix

* fix more

* some fixes are manual but come on

* up

* # type: ignore[import]

* reduce the scope of ty for less changes

* ups

* up?
2025-12-02 12:48:56 +01:00
Arthur
d6a4acc0d2 Update serialization (#1891)
* Add benchmark for deserializing large added vocab

* revert dumb stuff, isolate changes

* try to only normalize once

* small improvement?

* some updates

* nit

* fmt

* normalized string are a fucking waste of time when you just want to add tokens to the vocab man....

* more attempts

* works

* let's fucking go, parity

* update

* hahahhahaha

* revert changes that are not actually even needed

* add a python test!

* use normalizer before come on

* nit

* update to a more concrete usecase

* fix build

* style

* reduce sample size

* --allow unmaintained

* clippy happy

* up

* up

* derive impl

* revert unrelated

* fmt

* ignore

* remove stupid file
2025-11-27 23:07:18 +01:00
Nicolas Patry
4383a25787 Update the release builds following 0.21.1. (#1746)
* Update the release builds following 0.21.1.

* Clippy fix.
2025-03-13 13:01:41 +01:00
Arthur
c45aebd102 🚨 Support updating template processors (#1652)
* current updates

* simplify

* set_item works, but `tokenizer._tokenizer.post_processor[1].single = ["$0", "</s>"]` does not !

* fix: `normalizers` deserialization and other refactoring

* fix: `pre_tokenizer` deserialization

* feat: add `__len__` implementation for `normalizer::PySequence`

* feat: add `__setitem__` impl for `normalizers::PySequence`

* feat: add `__setitem__` impl to `pre_tokenizer::PySequence`

* feat: add `__setitem__` impl to `post_processor::PySequence`

* test: add normalizer sequence setter check

* refactor: allow unused `processors::setter` macro

* test: add `__setitem__` test for processors & pretok

* refactor: `unwrap` -> `PyException::new_err()?`

* refactor: fmt

* refactor: remove unnecessary `pub`

* feat(bindings): add missing getters & setters for pretoks

* feat(bindings): add missing getters & setters for processors

* refactor(bindings): rewrite RwLock poison error msg

* refactor: remove debug print

* feat(bindings): add description as to why custom deser is needed

* feat: make post proc sequence elements mutable

* fix(binding): serialization

---------

Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
2025-01-28 14:58:35 +01:00
Nicolas Patry
f4c9fd7f40 Testing ABI3 wheels to reduce number of wheels (#1674)
* Testing ABI3 wheels to reduce number of wheels

* No need for py-clone anymore.

* Upgrade python versions.

* Remove those flakes.

* Promoting new CI + Fixing secret.
2024-11-15 06:02:22 +01:00
Nicolas Patry
1740bff7a6 Revert "Upgrade python versions."
This reverts commit b81ec467a6.
2024-11-06 13:18:03 +08:00
Nicolas Patry
b81ec467a6 Upgrade python versions. 2024-11-06 13:17:22 +08:00
tinyboxvk
6c15458868 Bump actions versions (#1669)
* Update docs-check.yml

Bump actions/setup-python to v5
Bump python-version to 3.12 (default on ubuntu-latest)
Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained

* Update node-release.yml

Bump actions/setup-python to v5
Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump actions/cache to v4
Bump actions/setup-node to v4
Bump actions/upload-artifact to v4
Bump actions/download-artifact to v4

* Update node.yml

Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump actions/cache to v4
Bump actions/setup-node to v4

* Update python-release-conda.yml

Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump conda-incubator/setup-miniconda to v3

* Update python-release.yml

Bump actions/setup-python to v5
Bump actions/download-artifact to v4

* Update rust-release.yml

Switch actions-rs/toolchain to dtolnay/rust-toolchain as the former one is no longer maintained
Bump actions/cache to v4

* Update stale.yml

Bump actions/stale to v9

* Update python.yml

Bump actions/setup-python to v5
2024-11-01 10:19:35 +01:00
tinyboxvk
41e0eaa561 Bump actions/checkout to v4 (#1667)
Signed-off-by: tinyboxvk <tinyboxvk@users.noreply.github.com>
2024-10-29 14:32:07 +01:00
Nicolas Patry
25aee8b88c [BREAKING CHANGE] Ignore added_tokens (both special and not) in the decoder (#1513)
* [BREAKING CHANGE] Ignore added_tokens (both special and not) in the
decoder

Causes issues with `ByteLevel` messing up some `AddedTokens` with some
utf-8 range used in the bytelevel mapping.

This commit tests the extent of the damage of ignoring the decoder for
those tokens.

* Format.

* Installing cargo audit.

* Minor fix.

* Fixing "bug" in node/python.

* Autoformat.

* Clippy.

* Only prefix space when there's no decoder.
2024-05-06 11:49:38 +02:00
Arthur
f2ec3b239b remove enforcement of non special when adding tokens (#1521)
* remove enforcement of non special when adding tokens

* mut no longer needed

* add a small test

* nit

* style

* audit

* ignore cargo audit's own vulnerability

* update

* revert

* remove CVE
2024-04-30 15:53:47 +02:00
Nicolas Patry
aed491df8c Fixing the progressbar. (#1353)
* Fixing the progressbar.

* Upgrade deps.

* Update cargo audit

* Ssh this action.

* Fixing esaxx by using slower rust version.

* Trying the new esaxx version.

* Publish.

* Get cache again.
2023-10-05 15:33:58 +02:00
Nicolas Patry
d2010d5165 Move to maturin mimicking move for safetensors. + Rewritten node bindings. (#1331)
* Move to maturin mimicking move for `safetensors`.

* Tmp.

* Fix sdist.

* Wat?

* Clippy 1.72

* Remove if.

* Conda sed.

* Fix doc check workflow.

* Moving to maturin AND removing http + openssl mess (smoothing transition
moving to `huggingface_hub`)

* Fix dep

* Black.

* New node bindings.

* Fix docs + node cache ?

* Yarn.

* Working dir.

* Extension module.

* Put back interpreter.

* Remove cache.

* New attempt

* Multi python.

* Remove FromPretrained.

* Remove traces of `fromPretrained`.

* Drop 3.12 for windows?

* Typo.

* Put back the default feature for ignoring links during simple test.

* Fix ?

* x86_64 -> x64.

* Remove warning for windows bindings.

* Exclude aarch.

* Include/exclude.

* Put back workflows in correct states.
2023-08-28 16:24:14 +02:00
Funtowicz Morgan
a03330607b Update all GH Actions with dependency on actions/checkout from v[1,2] to v3 to notably improve performance (retrieve only the commit being checked-out) (#1256) 2023-05-22 14:50:00 +02:00
Andrew Kane
67080e163a Include license file in Rust crate (#1115)
* Include license file in Rust crate

* Ignore security warning.

* Also for python.

* Upgrading ubuntu version.

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2022-11-30 23:17:56 +01:00
Nicolas Patry
bbae829a72 Adding rust audit. (#1099)
* Adding rust audit.

* Update clap version + derive_builder (they clashed).

* Ignoring specific CVE which can be ignored

https://github.com/Azure/iot-identity-service/issues/481

* Updating python lock.

* Revert `derive-builder` update.

* Adding back help msg.
2022-11-09 12:59:36 +01:00
Nicolas Patry
4ef0afbeb6 Update old gh actions, remove deprecated doc building. (#1069) 2022-10-05 17:59:46 +02:00
Nicolas Patry
6113666624 Updating python formatting. (#1079)
* Updating python formatting.

* Forgot gh action.

* Skipping isort to prevent circular imports.

* Updating stub.

* Removing `isort` (it contradicts `stub.py`).

* Fixing weird stub black/isort disagreement.
2022-10-05 15:29:33 +02:00
h-vetinari
519cc13be0 Upgrade pyo3 to 0.16 (#956)
* Upgrade pyo3 to 0.15

Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com>

* Upgrade pyo3 to 0.16

Rebase-conflicts-fixed-by: H. Vetinari <h.vetinari@gmx.com>

* Install Python before running cargo clippy

* Fix clippy warnings

* Use `PyArray_Check` instead of downcasting to `PyArray1<u8>`

* Enable `auto-initialize` of pyo3 to fix `cargo test
--no-default-features`

* Fix some test cases

Why do they change?

* Refactor and add SAFETY comments to `PyArrayUnicode`

Replace deprecated `PyUnicode_FromUnicode` with `PyUnicode_FromKindAndData`

Co-authored-by: messense <messense@icloud.com>
2022-05-05 15:48:40 +02:00
Nicolas Patry
0eb7455fe5 Preparing 0.12 release. (#967)
* Preparing `0.12` release.

* Fix click version: https://github.com/psf/black/issues/2964
2022-03-31 11:06:33 +02:00
JC Louis
cabbecb96c add python3.10 release (#877)
* add missing python3.9 classifier

* add python3.10 release

* run tests on 3.10

* Revert "run tests on 3.10"

This reverts commit ceed64249e.
2022-01-12 09:42:13 +01:00
Nicolas Patry
b240ccb68a Updating doc with real links. (#851)
* Updating doc with real links.

* Remove cache to make it build ?
2021-12-17 17:50:24 +01:00
Nicolas Patry
c1100ec542 Clippy fixes. (#846)
* Clippy fixes.

* Drop support for Python 3.6

* Remove other 3.6

* Re-enabling caches for build (5h + seems too long and issue seems
solved)

https://github.com/actions/virtual-environments/issues/572

* `npm audit fix`.

* Fix yaml ?

* Pyarrow issue fixed: https://github.com/huggingface/datasets/pull/2268

* Installing dev libraries.

* Install python dev elsewhere ?

* Typo.

* No sudo.

* ...

* Testing the GH again.

* Maybe v2 will fix ?

* Fixing tests on MacOS Python 3.8+
2021-12-15 15:55:48 +01:00
Anthony MOI
755e5f5c1e Remove support for Python 3.5 (#714)
* Python - remove support for python 3.5

* revert ci

* revert build-wheels.sh

* Update CHANGELOG.md
2021-05-24 17:31:01 -04:00
Anthony MOI
f12be3030f Try with ubuntu 18.04 2021-03-16 12:32:06 -04:00
Anthony MOI
2c711d45ce CI - Force pyarrow<3.0.0 for now 2021-02-03 12:44:46 -05:00
Anthony MOI
91dae1de15 Doc - Add documentation for training from iterators 2021-01-12 15:51:38 -05:00
Nicolas Patry
352c92ad33 Automatically stubbing the pyi files while keeping inspecting ability (#509)
* First pass on automatic stubbing our python files.

* And now modifying all rust docs to be visible in Pyi files.

* Better assert fail message.

* Fixing github workflow.

* Removing types not exported anymore.

* Fixing `Tokenizer` signature.

* Disabling auto __init__.py.

* Re-enabling some types.

* Don't overwrite non automated __init__.py

* Automated most __init__.py

* Restubbing after rebase.

* Fixing env for tests.

* Install black in the env.

* Use PY35 target in stub.py

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
2020-11-17 15:13:00 -05:00
Nicolas Patry
a410903051 Upgrading to black 20.8b1 2020-09-24 09:27:30 -04:00
Nicolas Patry
df827d538f Adding clippy as a linter within the Python binding. (#388)
* Adding clippy as a linter within the Python binding.

* Missing clippy (dropped commit ??)
2020-09-04 09:09:02 -04:00
ropottnik
50ac90d338 testable example docs for training-serialization (#373)
* testable usage docs for training and serialization and reference in README.md

* Generate Readme from testable examples + template

* add up-to-date check for Readme with generated one

* try make pipeline fail by adding something to the lib.rs readme

* remove difference from lib.rs again to make pipeline pass

* fix black version

Co-authored-by: Simon Ertl <simon@Simons-MacBook-Pro.local>
2020-08-31 13:59:34 -04:00
Anthony MOI
ac1dc4b842 CI - Also install numpy for python tests 2020-08-21 18:39:49 -04:00
Sebastian Pütz
f6adcf0e7c Remove typetag, bump deps. 2020-08-04 15:59:33 -04:00
Anthony MOI
1f65f10a20 Disable cache on Python workflow for now 2020-08-03 19:42:02 -04:00
Sebastian Pütz
1f64761480 Cache based on Cargo.lock. 2020-08-03 10:51:57 -04:00
Sebastian Pütz
0d7c232f95 Move Python source to subdirectory.
This allows testing versions not built in-place. Otherwise
importing (or testing) in the package root fails without develop
builds.
Replace maturin with setuptools_rust since maturin fails with
proper project structure.
2020-07-25 23:40:47 +02:00
Anthony MOI
5f760df231 CI - Add build checks on macos for Python 2020-06-22 20:31:52 -04:00
Anthony MOI
f8f5cafccd Fix CI for windows 32-bit 2020-05-21 19:26:05 -04:00
Anthony MOI
b98e330418 Use older nightly on windows for now 2020-05-21 09:42:13 -04:00
Anthony MOI
33681faaf2 Python - Check it builds for windows 32 2020-04-08 16:03:24 -04:00
Anthony MOI
b03fea1d66 Python - Update workflow and Makefile with tests 2020-04-01 17:36:33 -04:00
Pierric Cistac
d3fb1d12f4 Try avoid duplicated github actions in PRs 2020-04-01 16:39:51 -04:00
Anthony MOI
449222c659 Update Python workflow to help find right nightly
Whenever a component is missing from the last nightly (here rustfmt)
this should help find the last nightly that did have it.
2020-04-01 14:16:36 -04:00
Pierric Cistac
d90593a5e8 Run github actions on pull requests
Try to fix actions not running for pull requests opened by external contributors cc @n1t0
2020-04-01 14:04:14 -04:00
Pierric Cistac
0c572097a2 fix cargo cache in ci
see https://github.com/actions/cache/issues/133#issuecomment-599102035
2020-03-18 15:35:35 -04:00
Pierric Cistac
6b3dfcfd85 fix path-ignore in workflows 2020-03-04 10:48:05 -05:00