637 Commits

Author SHA1 Message Date
Matthew Honnibal
d4bb796b5e Add least-privilege permissions to CI workflow 2026-03-27 08:49:43 +01:00
Matthew Honnibal
9d29209d04 Pass github context via stdin instead of CLI arg 2026-03-27 08:49:41 +01:00
Matthew Honnibal
4216738cf8 Pin GitHub Actions to commit SHAs for supply chain security 2026-03-26 15:48:47 +01:00
Matthew Honnibal
297938e704 Add smoke test and upgrade test to release build workflow
After wheels are built:
- smoke_test: install from wheel, download en_core_web_sm, verify entities
- upgrade_test: install previous spacy, download model, upgrade from wheel,
  verify model still loads and produces entities
2026-03-23 18:46:43 +01:00
Matthew Honnibal
f175a51e2d Fully migrate to Pydantic v2 (#13940)
Use confection v1.3 and Thinc v8.3.13, which implement custom validation logic in place of Pydantic, allowing us to properly adopt Pydantic v2 and provide full Python 3.14 support.

Our dependency tree used Pydantic v1 in unusual ways, and relied on behaviours that Pydantic v2 reformed. In the time since Pydantic v2 was released there were a few attempts to migrate over to it, but the task has been complicated by the fact that the confection library has a fairly tangled implementation and I had reduced availability for open-source work in 2024 and 2025.

Specifically, our library confection provides the extensible configuration system we use in spaCy and Thinc. The config system allows you to refer to values that will be supplied by arbitrary functions, that e.g. define some neural network model or its sublayers. The functionality in confection is complicated because we aggressively prioritised user experience in the specification, even if it required increased implementation complexity.

Confection's original implementation built a dynamic Pydantic v1 schema for function-supplied values ("promises"). We validate the schema before calling any promises, and then validate the schema again after calling all the promises and substituting in their values. The variable-interpolation system adds further difficulties to the implementation, and we have to do it all while subclassing the Python built-in configparser, which ties us to implementation choices I'd do differently if I had a clean slate.
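
To make the two-pass idea concrete, here is a minimal sketch in plain Python. It is not confection's implementation: the `@call` key, the `REGISTRY` dict and all function names are hypothetical, and the checks are simple `isinstance` tests rather than a Pydantic schema.

```python
from typing import Any, Callable, Dict

# Hypothetical registry of functions that config values may refer to.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str):
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@register("make_layer.v1")
def make_layer(width: int, depth: int) -> list:
    return [width] * depth


def check_promises(config: Dict[str, Any]) -> None:
    """Pass 1: every promise must name a registered function. Nothing is called."""
    for key, value in config.items():
        if isinstance(value, dict):
            if "@call" in value and value["@call"] not in REGISTRY:
                raise ValueError(f"{key}: unknown function {value['@call']!r}")
            check_promises({k: v for k, v in value.items() if k != "@call"})


def resolve(value: Any) -> Any:
    """Call promises bottom-up and substitute their return values."""
    if not isinstance(value, dict):
        return value
    args = {k: resolve(v) for k, v in value.items() if k != "@call"}
    if "@call" in value:
        return REGISTRY[value["@call"]](**args)
    return args


def check_resolved(config: Dict[str, Any], expected: Dict[str, type]) -> None:
    """Pass 2: validate the values that the promises actually produced."""
    for key, expected_type in expected.items():
        if not isinstance(config[key], expected_type):
            raise TypeError(f"{key}: expected {expected_type.__name__}")


config = {
    "model": {"@call": "make_layer.v1", "width": 4, "depth": 2},
    "dropout": 0.1,
}
check_promises(config)
resolved = resolve(config)
check_resolved(resolved, {"model": list, "dropout": float})
```

The first pass can only see the arguments of a promise, not its result, which is why a second check after substitution is needed.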

Here's one summary of the Pydantic v1-specific behaviours that made the migration to v2 particularly difficult for us. This particular summary was produced during a session with Claude Code Opus 4.6, so some details of it might be wrong. The full history of attempts at doing this spans different refactors separated by a few months at a time, so I don't have a full record of all the things that I struggled with.

The core problem we kept hitting: Pydantic v2 compiles validation schemas upfront and has much stricter immutability. The whole session has been a series of workarounds for this:

```
 1. Schema mutation — v1 let you mutate __fields__ in place; v2 needs model_rebuild() which loses forward ref namespaces, or create_model subclasses which don't propagate to parent schemas.
 2. model_dump vs dict — v2 converts dataclasses to dicts, breaking resolved objects. Needed a custom _model_to_dict helper.
 3. model_construct drops extras — v2 silently drops fields with extra="forbid", needed manual workarounds.
 4. Strict coercion — v2 coerces ndarray to List[Floats1d] via iteration, needed strict=True.
 5. Forward refs — Every schema with TYPE_CHECKING imports needs model_rebuild() with the right namespace, and that breaks when confection re-rebuilds later.
```

In order to adjust for behavioural differences like this, I'd refactored confection to build the different versions of the schema in multiple passes, instead of building all the representations together as we'd been doing. However this refactor itself had problems, further complicating the migration.

~I've now bitten the bullet and rolled back the refactor I'd been attempting of confection, and instead replaced the Pydantic validation with custom logic. This allows Confection to remove Pydantic as a dependency entirely.~ Update: Actually I went back and got the refactor working. All much nicer now.

I've gone to some lengths to explain this because migrating off a dependency after breaking changes can be a sensitive topic. I want to stress that the changes Pydantic made from v1 to v2 are very good, and I greatly appreciate them as a user of FastAPI in our services. It would be very bad for the ecosystem if Pydantic pinned themselves to exactly matching the behaviours they had in v1 just to avoid breaking support for the sort of thing we'd been doing. Instead, users like us who were relying on those behaviours should just find some way to adapt --- either vendor the v1 version we need, or change our behaviours, or implement an alternative. I would have liked to do this sooner but we've ultimately gone with the third option.
2026-03-23 13:45:02 +01:00
Matthew Honnibal
86f7ce303a Limit CI ruff lint to isort-only checks for now 2026-03-21 08:41:25 +01:00
Matthew Honnibal
79b5f811bf Update CI validation workflow: replace black, isort, flake8 with ruff 2026-03-21 08:39:43 +01:00
Matthew Honnibal
a534b43ced Fix wheel path name on cibuildwheel 2025-11-13 14:23:55 +01:00
Matthew Honnibal
09b4eb4ebe Use reusable gha 2025-11-10 11:53:30 +01:00
Matthew Honnibal
d01a180a7f Update build matrix for tests 2025-11-05 10:26:48 +01:00
Matthew Honnibal
5bebbf7550 Python 3.13 support (#13823)
In order to support Python 3.13, we had to migrate to Cython 3.0. This caused some tricky interaction with our Pydantic usage, because Cython 3 uses the from __future__ import annotations semantics, which causes type annotations to be saved as strings.

The end result is that we can't have Language.factory decorated functions in Cython modules anymore, as the Language.factory decorator expects to inspect the signature of the functions and build a Pydantic model. If the function is implemented in Cython, an error is raised because the type is not resolved.
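
The effect of string annotations on signature inspection can be reproduced in plain Python. This is a sketch only: the functions below are hypothetical, the annotations are written as explicit strings to mimic what a decorator would see, and spaCy's actual decorator is not shown.

```python
import inspect
import typing


# Explicit string annotations mimic annotations that are stored as strings.
def make_component(name: "str", threshold: "float" = 0.5):
    return {"name": name, "threshold": threshold}


# The annotation names a type that does not exist in this module's namespace.
def make_unresolvable(model: "TypeNotImportedHere"):
    return model


# inspect.signature reports the annotations exactly as stored: plain strings.
raw_annotations = {
    param_name: param.annotation
    for param_name, param in inspect.signature(make_component).parameters.items()
}

# typing.get_type_hints evaluates the strings against the function's globals.
resolved_annotations = typing.get_type_hints(make_component)

# Resolution fails when the name cannot be found.
try:
    typing.get_type_hints(make_unresolvable)
    resolution_error = None
except NameError as err:
    resolution_error = err
```

Any code that builds a validation model from a signature has to resolve these strings first, and the last case shows how that resolution can fail.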

To address this I've moved the factory functions into a new module, spacy.pipeline.factories. I've added __getattr__ importlib hooks to the previous locations, in case anyone was importing these functions directly. The change should have no backwards compatibility implications.
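
A module-level `__getattr__` (PEP 562) is one way to keep old import paths working after such a move. The sketch below assumes that mechanism and uses in-memory stand-in modules with hypothetical names (`demo_factories`, `demo_tagger`, `make_tagger`); it is not the code in spaCy.

```python
import importlib
import sys
import types

# Stand-in for the new module that now holds the factory functions.
new_module = types.ModuleType("demo_factories")


def make_tagger(name):
    return f"tagger:{name}"


new_module.make_tagger = make_tagger
sys.modules["demo_factories"] = new_module

# Stand-in for the old location, which no longer defines make_tagger itself.
old_module = types.ModuleType("demo_tagger")
MOVED = {"make_tagger": "demo_factories"}


def _module_getattr(name):
    # Called only when normal attribute lookup on the module fails.
    if name in MOVED:
        target = importlib.import_module(MOVED[name])
        return getattr(target, name)
    raise AttributeError(f"module 'demo_tagger' has no attribute {name!r}")


# In a real module file this would simply be `def __getattr__(name): ...`.
old_module.__getattr__ = _module_getattr
sys.modules["demo_tagger"] = old_module

# Importing from the old location still works and yields the same object.
from demo_tagger import make_tagger as legacy_make_tagger
```

Because the hook runs only for names that are missing from the old module, it adds no cost to ordinary attribute access.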

Along the way I've also refactored the registration of functions for the config. Previously these ran as import-time side-effects, using the registry decorator. Instead, I've created a new module, spacy.registrations. When the registry is accessed it calls a function ensure_populated(), which causes the registrations to occur.
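
The populate-on-first-access pattern could look roughly like this. Only the name `ensure_populated` comes from the description above; the `LazyRegistry` class, `register_all` and the registered names are hypothetical, and the real registry is more involved.

```python
from typing import Callable, Dict, List


class LazyRegistry:
    """Registry that defers all registrations until the first lookup."""

    def __init__(self, populate: Callable[["LazyRegistry"], None]):
        self._populate = populate
        self._functions: Dict[str, Callable] = {}
        self._populated = False

    def register(self, name: str, func: Callable) -> None:
        self._functions[name] = func

    def ensure_populated(self) -> None:
        if not self._populated:
            # Set the flag first so the populate function cannot recurse.
            self._populated = True
            self._populate(self)

    def get(self, name: str) -> Callable:
        self.ensure_populated()
        return self._functions[name]


populate_calls: List[str] = []


def register_all(registry: LazyRegistry) -> None:
    # All registrations live in one place instead of being import side-effects.
    populate_calls.append("populated")
    registry.register("lower.v1", str.lower)
    registry.register("upper.v1", str.upper)


registry = LazyRegistry(register_all)  # nothing is registered yet
upper = registry.get("upper.v1")       # first access triggers register_all
lower = registry.get("lower.v1")       # later accesses reuse the populated state
```

Constructing the registry stays cheap, and the cost of the registrations is paid once, on the first lookup.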

I've made a similar change to the Language.factory registrations in the new spacy.pipeline.factories module.

I want to remove these import-time side-effects so that we can speed up the loading time of the library, which can be especially painful on the CLI. I also find that I'm often working to track down the implementations of functions referenced by strings in the config. Having the registrations all happen in one place will make this easier.

With these changes I've fortunately avoided the need to migrate to Pydantic v2 properly --- we're still using the v1 compatibility shim. We might not be able to hold out forever though: Pydantic (reasonably) aren't actively supporting the v1 shims. I put a lot of work into v2 migration when investigating the 3.13 support, and it's definitely challenging. In any case, it's a relief that we don't have to do the v2 migration at the same time as the Cython 3.0/Python 3.13 support.
2025-05-22 13:47:21 +02:00
Matthew Honnibal
b3c46c315e Add support for linux-arm 2025-02-03 18:32:23 +01:00
Matthew Honnibal
be0fa812c2 Update cibuildwheel 2024-12-11 13:08:40 +01:00
Matthew Honnibal
a6317b3836 Fix allocation of non-transient strings in StringStore (#13713)
* Fix bug in memory-zone code when adding non-transient strings. The error could result in segmentation faults or other memory errors during memory zones if new labels were added to the model.
* Fix handling of new morphological labels within memory zones. Addresses the second issue reported in "Memory leak of MorphAnalysis object" (#13684).
2024-12-11 13:06:53 +01:00
Matthew Honnibal
63f1b53c1a Check test failure 2024-10-01 16:49:49 +02:00
Matthew Honnibal
924cbc9703 Fix environment variable for test 2024-10-01 16:08:06 +02:00
Matthew Honnibal
6c038aaae0 Don't disable tests on workflow changes 2024-10-01 15:32:01 +02:00
Matthew Honnibal
f0084b9143 Fix matrix in tests 2024-10-01 15:28:22 +02:00
Matthew Honnibal
ff81bfb8db Update tests 2024-10-01 13:21:10 +02:00
Matthew Honnibal
8266031454 Merge numpy version update 2024-09-14 11:08:35 +02:00
Matthew Honnibal
8dcc4b8daf Skip running tests on PRs 2024-09-14 11:07:23 +02:00
Matthew Honnibal
83b4015b36 Remove aarch 2024-09-13 12:35:50 +02:00
Matthew Honnibal
419bfaf6e7 Update cibuildwheel 2024-09-13 10:44:48 +02:00
Matthew Honnibal
1869a197c9 Try enabling macos-14 for arm builds 2024-09-11 16:06:57 +02:00
Matthew Honnibal
a8accc3396 Use cibuildwheel to build wheels (#13603)
* Add workflow files for cibuildwheel

* Add config for cibuildwheel

* Set version for experimental prerelease

* Try updating cython

* Skip 32-bit windows builds

* Revert "Try updating cython"

This reverts commit c1b794ab5c.

* Try to import cibuildwheel settings from previous setup
2024-08-20 12:15:05 +02:00
Matthew Honnibal
f78e5ce732 Disable extra CI 2024-06-21 14:32:00 +02:00
Sofie Van Landeghem
ecd85d2618 Update Typer pin and GH actions (#13471)
* update gh actions

* pin typer upperbound to 1.0.0
2024-04-29 13:28:46 +02:00
Sofie Van Landeghem
74836524e3 Bump to v5 (#13470) 2024-04-29 10:36:31 +02:00
Sofie Van Landeghem
6d6c10ab9c Fix CI (#13469)
* Remove hardcoded architecture setting

* update classifiers to include Python 3.12
2024-04-29 10:18:07 +02:00
Sofie Van Landeghem
4dc5fe5469 Renamed main branch back to v4 for now (#13395)
* Update gputests.yml

* Update slowtests.yml
2024-03-26 09:53:07 +01:00
Daniël de Kok
d84068e460 Run slow tests: v4 -> main (#13290)
* Run slow tests: v4 -> main

* Also update the branch in GPU tests
2024-01-30 13:58:28 +01:00
Ines Montani
8cfccdd2f8 Update links [ci skip] 2023-12-11 15:51:43 +01:00
Ines Montani
f78b91c03b Update links [ci skip] 2023-12-11 15:51:01 +01:00
Ines Montani
bf7c2ea99a Add merch link [ci skip] 2023-11-22 12:55:00 +01:00
Adriane Boyd
92f1d0a195 CI: Switch to stable python 3.12 and limit 3.11 runs (#13104) 2023-11-03 15:46:03 +01:00
Adriane Boyd
1b043dde3f Revert "disable tests until 3.7 models are available"
This reverts commit 991bcc111e.
2023-10-01 18:48:31 +02:00
Adriane Boyd
78504c25a5 CI: Add python 3.12.0rc2 2023-09-28 17:12:42 +02:00
Adriane Boyd
b4990395f9 Update mypy requirements 2023-09-28 17:12:42 +02:00
Adriane Boyd
ff4215f1c7 Drop support for python 3.6 (#13009)
* Drop support for python 3.6

* Update docs
2023-09-25 14:48:38 +02:00
svlandeg
79ec68f01b Merge branch 'upstream_master' into sync_develop 2023-07-19 12:08:52 +02:00
Basile Dura
b0228d8ea6 ci: add cython linter (#12694)
* chore: add cython-linter dev dependency

* fix: lexeme.pyx

* fix: morphology.pxd

* fix: tokenizer.pxd

* fix: vocab.pxd

* fix: morphology.pxd (line length)

* ci: add cython-lint

* ci: fix cython-lint call

* Fix kb/candidate.pyx.

* Fix kb/kb.pyx.

* Fix kb/kb_in_memory.pyx.

* Fix kb.

* Fix training/ partially.

* Fix training/. Ignore trailing whitespaces and too long lines.

* Fix ml/.

* Fix matcher/.

* Fix pipeline/.

* Fix tokens/.

* Fix build errors. Fix vocab.pyx.

* Fix cython-lint install and run.

* Fix lexeme.pyx, parts_of_speech.pxd, vectors.pyx. Temporarily disable cython-lint execution.

* Fix attrs.pyx, lexeme.pyx, symbols.pxd, isort issues.

* Make cython-lint install conditional. Fix tokenizer.pyx.

* Fix remaining files. Reenable cython-lint check.

* Readded parentheses.

* Fix test_build_dependencies().

* Add explanatory comment to cython-lint execution.

---------

Co-authored-by: Raphael Mitsch <r.mitsch@outlook.com>
2023-07-19 12:03:31 +02:00
Adriane Boyd
76329e1dde Revert "Temporarily skip download CLI related tests in CI"
This reverts commit 46ce66021a.
2023-07-06 12:48:06 +02:00
Daniël de Kok
e73c1a89bf CI: add isort --check to validate job (#12727) 2023-06-15 23:10:25 +01:00
Adriane Boyd
9b7a59c325 Revert "CI: Disable fail-fast (#12658)" (#12676)
This reverts commit 1f088cbf4a.
2023-05-26 10:57:02 +02:00
Adriane Boyd
1f088cbf4a CI: Disable fail-fast (#12658)
While the typing_extensions/pydantic `Literal` bugs are being sorted
out, disable fail-fast so the rest of the CI is available for
development purposes.
2023-05-23 10:48:06 +02:00
Adriane Boyd
46ce66021a Temporarily skip download CLI related tests in CI 2023-05-08 09:17:33 +02:00
Adriane Boyd
6817e3d372 CI: Only run test suite once with thinc-apple-ops for macos python 3.11 (#12436)
* CI: Only run test suite once with thinc-apple-ops for macos python 3.11

* Adjust syntax

* Try alternate syntax

* Try alternate syntax

* Try alternate syntax
2023-04-28 14:29:51 +02:00
Adriane Boyd
68da580a4c CI: Disable Azure (#12560) 2023-04-21 15:05:53 +02:00
Adriane Boyd
2fba21be63 Restrict github workflows to explosion (#12470) 2023-03-27 12:44:04 +02:00
Adriane Boyd
54c614e116 CI: Separate spacy universe validation into a separate workflow (#12440)
* Separate spacy universe validation into a separate workflow

* Fix new workflow name
2023-03-17 10:59:53 +01:00