* Fix helm release process
* Fix CI helm checks, remove version number param
* Remove rc preparation test, as the functionality it covered no longer exists
---------
Co-authored-by: Jens Scheffler <jscheffl@apache.org>
* CI: Skip newsfragment check when `skip newsfragment check` label is set
When the `skip newsfragment check` label is applied to a PR, the
newsfragment PR number check workflow is skipped entirely. The workflow
now also triggers on `labeled`/`unlabeled` events so it re-evaluates
when labels change.
* Add skip instructions to newsfragment check error and docs
Update the CI error message to inform users they can add the
'skip newsfragment check' label to bypass the check. Also document
this option in the contribution workflow guide.
* Update python version exclusion to 3.15
* Add 3.14 metadata version classifiers and related constants
* Regenerate Breeze command help screenshots
* Assorted workarounds to fix breeze image building
- constraints are skipped entirely
- greenlet pin updated
* Exclude cassandra
* Exclude amazon
* Exclude google
* CI: Only add pydantic extra to Airflow 2 migration tests
Before this fix there were two separate issues in the migration-test setup for Python 3.14:
1. The migration workflow always passes --airflow-extras pydantic.
2. For Python 3.14, get_min_airflow_version_for_python.py resolves the minimum Airflow version to 3.2.0, and apache-airflow[pydantic]==3.2.0 is not an installable requirement.
So when constraints installation fails, the fallback path tries to install an invalid spec.
* Disable DB migration tests for python 3.14
* Enforce werkzeug 3.x for python 3.14
* Increase K8s executor test timeout for Python 3.14
Python 3.14 changed the default multiprocessing start method from 'fork' to 'forkserver' on Linux. The forkserver start method is slower because each new process must import modules from scratch rather than copying the parent's address space. This makes `multiprocessing.Manager()` initialization take longer, causing the test to exceed its 10s timeout.
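The start-method change can be observed and worked around directly; a minimal sketch, assuming a Linux runner where 'fork' is still available:

```python
import multiprocessing

# Python 3.14 defaults to 'forkserver' on Linux; 3.13 and earlier used 'fork'.
default = multiprocessing.get_start_method()

# Code that depends on fork semantics (cheap startup, inherited parent state)
# can opt back in explicitly via a context instead of the global default:
ctx = multiprocessing.get_context("fork")
assert ctx.get_start_method() == "fork"
```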
* Adapt LocalExecutor tests for Python 3.14 forkserver default
Python 3.14 changed the default multiprocessing start method from
'fork' to 'forkserver' on Linux. Like 'spawn', 'forkserver' doesn't
share the parent's address space, so mock patches applied in the test
process are invisible to worker subprocesses.
- Skip tests that mock across process boundaries on non-fork methods
- Add test_executor_lazy_worker_spawning to verify that non-fork start
methods defer worker creation and skip gc.freeze
- Make test_multiple_team_executors_isolation and
test_global_executor_without_team_name assert the correct worker
count for each start method instead of assuming pre-spawning
- Remove skip from test_clean_stop_on_signal (works on all methods)
and increase timeout from 5s to 30s for forkserver overhead
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Bump dependencies to versions supporting 3.14
* Fix PROD image build failing on Python 3.14 due to excluded providers
The PROD image build installed all provider wheels regardless of Python
version compatibility. Providers like google and amazon that exclude
Python 3.14 were still passed to pip, causing resolution failures (e.g.
ray has no cp314 wheel on PyPI).
Two fixes:
- get_distribution_specs.py now reads each wheel's Requires-Python
metadata and skips incompatible wheels instead of passing them to pip.
- The requires-python specifier generation used !=3.14 which per PEP 440
only excludes 3.14.0, not 3.14.3. Changed to !=3.14.* wildcard.
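The PEP 440 distinction is easy to sanity-check with the `packaging` library (assumed available; it is the resolver library pip vendors):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# '!=3.14' excludes only the release 3.14 (zero-padded to 3.14.0)...
assert Version("3.14.0") not in SpecifierSet("!=3.14")
assert Version("3.14.3") in SpecifierSet("!=3.14")      # 3.14.3 slips through

# ...while the wildcard '!=3.14.*' excludes the whole 3.14 series:
assert Version("3.14.3") not in SpecifierSet("!=3.14.*")
assert Version("3.15.0") in SpecifierSet("!=3.14.*")
```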
* Split core test types into 2 matrix groups to avoid OOM on Python 3.14
Non-DB core tests use xdist which runs all test types in a single pytest
process. With 2059 items across 4 workers, memory accumulates until the
OOM killer strikes at ~86% completion (exit code 137).
Split core test types into 2 groups (API/Always/CLI and
Core/Other/Serialization), similar to how provider tests already use
_split_list with NUMBER_OF_LOW_DEP_SLICES. Each group gets ~1000 items,
well under the ~1770 threshold where OOM occurs.
Update selective_checks test expectations to reflect the 2-group split.
* Gracefully handle an already removed password file in fixture
The old code had a check-then-act race (if `os.path.exists` then `os.remove`), which fails when the file disappears between the check and the removal. `contextlib.suppress(FileNotFoundError)` closes the race: the removal is attempted unconditionally, and a missing file (never created in this xdist worker, or removed in the meantime) is silently ignored.
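A minimal stdlib-only sketch of the suppress-based pattern, with a hypothetical path:

```python
import contextlib
import os
import tempfile

password_path = os.path.join(tempfile.mkdtemp(), "passwords.json")  # never created

# EAFP: attempt the removal unconditionally and ignore a missing file,
# instead of the racy exists-then-remove pattern.
with contextlib.suppress(FileNotFoundError):
    os.remove(password_path)
```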
* Fix OOM and flaky tests in test_process_utils
Replace multiprocessing.Process with subprocess.Popen running minimal
inline scripts. multiprocessing.Process uses fork(), which duplicates
the entire xdist worker memory. At 95% test completion the worker has
accumulated hundreds of MBs; forking it triggers the OOM killer
(exit code 137) on Python 3.14.
subprocess.Popen starts a fresh lightweight process (~10MB) without
copying the parent's memory, avoiding the OOM entirely.
Also replace the racy ps -ax process counting in
TestKillChildProcessesByPids with psutil.pid_exists() checks on the
specific PID — the old approach was non-deterministic because unrelated
processes could start/stop between measurements.
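A stdlib-only sketch of the Popen approach; the inline sleep script is a stand-in for the real test payloads:

```python
import subprocess
import sys

# A minimal inline script runs in a fresh interpreter: no fork() of the
# bloated xdist worker, so none of its accumulated memory is duplicated.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
assert proc.poll() is None   # child is alive

proc.kill()
proc.wait()
assert proc.poll() is not None  # child has exited
```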
* Add prek hook to validate python_version markers for excluded providers
When a provider declares excluded-python-versions in provider.yaml,
every dependency string referencing that provider in pyproject.toml
must carry a matching python_version marker. Missing markers cause
excluded providers to be silently installed as transitive dependencies
(e.g. aiobotocore pulling in amazon on Python 3.14).
The new check-excluded-provider-markers hook reads exclusions from
provider.yaml and validates all dependency strings in pyproject.toml
at commit time, preventing regressions like the one fixed in the
previous commit.
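The marker semantics the hook enforces can be checked with `packaging` (assumed available); the marker here is illustrative, not a real pyproject.toml entry:

```python
from packaging.markers import Marker

# The kind of environment marker an amazon-provider dependency string
# would need when the provider excludes Python 3.14.
marker = Marker('python_version != "3.14"')

assert marker.evaluate({"python_version": "3.13"}) is True
assert marker.evaluate({"python_version": "3.14"}) is False
```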
* Update `uv.lock`
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `--provider` flag was only passed to `extract_metadata.py` but not
to `extract_parameters.py` or `extract_connections.py`. This caused
incremental builds to scan all 99 providers and 1625 modules instead
of just the requested one.
The registry workflow was building the CI image from scratch every run
(~24 min) because it lacked the BuildKit mount cache that
ci-image-build.yml provides. Inline `breeze ci-image build` with
registry cache doesn't help because Docker layer cache invalidates
on every commit when the build context changes.
Split into two jobs following the established pattern used by
ci-amd-arm.yml and update-constraints-on-push.yml:
- `build-ci-image`: calls ci-image-build.yml which handles mount cache
restore, ghcr.io login, registry cache, and image stashing
- `build-and-publish-registry`: restores the stashed image via
prepare_breeze_and_image action, then runs the rest unchanged
* Fix merge crash when incremental extract skips modules.json
extract_parameters.py with --provider intentionally skips writing
modules.json (only the targeted provider's parameters are extracted).
The merge script assumed modules.json always exists, causing a
FileNotFoundError during incremental builds.
Handle missing new_modules_path the same way missing
existing_modules_path is already handled: treat it as an empty list.
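A sketch of the fallback, with a hypothetical `load_modules` helper rather than the merge script's actual code:

```python
import json
from pathlib import Path

def load_modules(path: Path) -> list:
    """A missing modules.json means 'nothing extracted', not an error."""
    if not path.exists():
        return []
    return json.loads(path.read_text())

assert load_modules(Path("does-not-exist/modules.json")) == []
```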
* Fix /mnt not writable when loading stashed CI image
The prepare_breeze_and_image action loads the CI image from /mnt, which
requires make_mnt_writeable.sh to run first. Each job gets a fresh
runner, so the writable /mnt from the build job doesn't carry over.
* Regenerate pnpm lockfile for workspace mode
Adding `packages: ['.']` to pnpm-workspace.yaml changed how pnpm
processes overrides, causing ERR_PNPM_LOCKFILE_CONFIG_MISMATCH with
--frozen-lockfile. Regenerate the lockfile with pnpm 9 to match.
* Scope prebuild uv resolution to dev/registry project
The prebuild script ran `uv run` without --project, causing uv to
resolve the full workspace including samba → krb5 which needs
libkrb5-dev (not installed on the CI runner).
Eleventy pagination templates emit empty fallback JSON for every provider,
even when only one provider's data was extracted. A plain `aws s3 sync`
uploads those stubs and overwrites real connection/parameter data.
Changes:
- Exclude per-provider connections.json and parameters.json from the main
S3 sync during incremental builds, then selectively upload only the
target provider's API files
- Filter connections early in extract_connections.py (before the loop)
and support space-separated multi-provider IDs
- Suppress SCARF_ANALYTICS and DO_NOT_TRACK telemetry in CI
- Document the Eleventy pagination limitation in README and AGENTS.md
* Exclude all per-provider API files during incremental S3 sync
The previous exclude only covered connections.json and parameters.json,
but modules.json and versions.json for non-target providers also contain
incomplete data (no version info extracted) and would overwrite correct
data on S3. Simplify to exclude the entire api/providers/* subtree and
selectively upload only the target provider's directory.
* Also exclude provider HTML pages during incremental S3 sync
Non-target provider pages are rebuilt without connection/parameter data
(the version-specific extraction files don't exist locally). Without
this exclude, the incremental build overwrites complete HTML pages on
S3 with versions missing the connection builder section.
The providers listing page uses merged data (all providers) and must
be updated during incremental builds — especially for new providers.
AWS CLI --include after --exclude re-includes the specific file.
extract_metadata.py unconditionally creates dev/registry/logos/ even
when no logos are found (e.g. incremental build for a provider without
logos). The workflow's `if [ -d ... ]` check passes for the empty
directory, then `cp -r logos/*` fails because the glob doesn't expand.
Fix both sides:
- Workflow: check directory is non-empty before glob-based cp
- extract_metadata.py: create logo directories lazily, only when a
logo is actually found
`uv run` from the repo root resolves the full workspace, pulling in
all providers including samba → gssapi which requires libkrb5-dev
(not installed on the runner). Use `--project dev/registry` so uv
resolves deps from dev/registry/pyproject.toml (pydantic + pyyaml)
instead of the entire workspace.
Slack notifications for CI failures and missing doc inventories were
posted on every failing run regardless of whether the failure was
already reported. This adds per-branch state tracking via GitHub
Actions artifacts so notifications are only sent when the set of
failures changes or 24 hours pass (as a "still not fixed" reminder).
Recovery notifications are posted when a previously-failing run passes.
* Switch CI dependency management from constraints to uv.lock
closes: #54609
* Fix selective_checks tests for push events without upgrade
Push events no longer trigger upgrade-to-newer-dependencies unless
uv.lock or pyproject.toml files changed. Updated test expectations.
* Fix remaining selective_checks tests for push events
Update two more test cases that expected upgrade-to-newer-dependencies
to be true for PUSH events.
* Fix CI failures: include uv.lock in Docker context and handle missing constraints
- Add uv.lock to .dockerignore allowlist so uv sync --frozen works in Docker builds
- Make packaging install in install_from_docker_context_files.sh conditional on
constraints.txt existing, since the uv.lock path skips constraints download
* Fix static checks: update uv.lock and breeze docs after rebase
* Use install script with uv.lock constraints for dev dependencies in CI
Revert the entrypoint_ci.sh change from `uv sync --all-packages` back
to using the install_development_dependencies.py script. The uv sync
approach fails when provider source directories are not fully available
in the container (e.g. with selected mounts).
Instead, generate constraints from uv.lock via `uv export` and pass
them to the existing script, which installs only the needed development
dependencies via `uv pip install`.
Also add uv.lock to VOLUMES_FOR_SELECTED_MOUNTS so it is available
inside containers using the "tests and providers" mount mode.
* Warn instead of failing on missing 3rd-party doc inventories
Third-party Sphinx intersphinx inventories (e.g., Pandas) are sometimes
temporarily unavailable. Previously, any download failure terminated the
entire doc build. Now missing 3rd-party inventories produce warnings and
fall back to cached versions when available. A marker file is written for
CI to detect missing inventories and send Slack notifications on canary
builds. Publishing workflows fail by default but can opt out.
- Add --fail-on-missing-third-party-inventories flag (default: off)
- Add --clean-inventory-cache flag (--clean-build no longer deletes cache)
- Cache inventories via stash action in CI and publish workflows
- Send Slack warning on canary builds when inventories are missing
* Add documentation for inventory cache handling options
Document the new --clean-inventory-cache, --fail-on-missing-third-party-inventories,
and --ignore-missing-inventories flags in the contributing docs, Breeze developer
tasks, and release management docs.
* Skip missing third-party inventories in intersphinx mapping
When a third-party inventory file doesn't exist in the cache,
skip it from the Sphinx intersphinx_mapping instead of referencing
a non-existent file. This prevents Sphinx build errors when
third-party inventory downloads fail.
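A sketch of the filtering, with a hypothetical cache directory and an illustrative subset of projects:

```python
import tempfile
from pathlib import Path

cache = Path(tempfile.mkdtemp())  # stand-in for the inventory cache dir

THIRD_PARTY = {  # illustrative subset
    "pandas": "https://pandas.pydata.org/docs/",
    "requests": "https://requests.readthedocs.io/en/latest/",
}

# Only map inventories whose cached file actually exists on disk.
intersphinx_mapping = {
    name: (url, str(cache / f"{name}.inv"))
    for name, url in THIRD_PARTY.items()
    if (cache / f"{name}.inv").exists()
}
assert intersphinx_mapping == {}  # nothing cached yet, so nothing mapped
```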
When UPGRADE_COOLDOWN_DAYS is set, the upgrade check will not fail
if there was a recent "Upgrade important" commit within the cooldown
period. This prevents noisy CI failures when versions were recently
addressed. The CI workflow sets a 4-day cooldown matching the existing
prek autoupdate cooldown.
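A sketch of the gate, assuming UPGRADE_COOLDOWN_DAYS carries an integer day count (helper name hypothetical):

```python
import os
from datetime import datetime, timedelta, timezone

def within_cooldown(last_upgrade: datetime, now: datetime) -> bool:
    """Skip the failure if an 'Upgrade important' commit landed recently."""
    days = int(os.environ.get("UPGRADE_COOLDOWN_DAYS", "0"))
    return days > 0 and (now - last_upgrade) < timedelta(days=days)

os.environ["UPGRADE_COOLDOWN_DAYS"] = "4"
now = datetime.now(timezone.utc)
assert within_cooldown(now - timedelta(days=1), now) is True   # recent: skip
assert within_cooldown(now - timedelta(days=10), now) is False  # stale: fail
```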
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add tests for scripts and remove redundant sys.path.insert calls
- Remove 85 redundant `sys.path.insert(0, str(Path(__file__).parent.resolve()))`
calls from scripts in ci/prek/, cov/, and in_container/. Python already
adds the script's directory to sys.path when running a file directly,
making these calls unnecessary.
- Keep 6 cross-directory sys.path.insert calls that are genuinely needed
(AIRFLOW_CORE_SOURCES_PATH, AIRFLOW_ROOT, etc.).
- Add __init__.py files to scripts/ci/ and scripts/ci/prek/ to make them
proper Python packages.
- Add scripts/pyproject.toml with package discovery and pytest config.
- Add 176 tests covering: common_prek_utils (insert_documentation,
check_list_sorted, get_provider_id_from_path, ConsoleDiff, etc.),
new_session_in_provide_session, check_deprecations, unittest_testcase,
changelog_duplicates, newsfragments, checkout_no_credentials, and
check_order_dockerfile_extras.
- Add scripts tests to CI: new SCRIPTS_FILES file group in selective
checks, run-scripts-tests output, and tests-scripts job in
basic-tests.yml.
- Document scripts as a workspace distribution in CLAUDE.md.
* Add pytest as dev dependency for scripts distribution
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Use devel-common instead of pytest for scripts dev dependencies
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix xdist test collection order for newsfragment tests
Sort the VALID_CHANGE_TYPES set when passing to parametrize to ensure
deterministic test ordering across xdist workers.
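The fix in miniature; the change-type values here are illustrative, not necessarily the repo's exact set:

```python
# Set iteration order is stable within one process but varies between
# processes (string hash randomization), so xdist workers could collect
# parametrized tests in different orders. Sorting pins the order everywhere.
VALID_CHANGE_TYPES = {"significant", "feature", "improvement", "bugfix", "doc", "misc"}

params = sorted(VALID_CHANGE_TYPES)
assert params == ["bugfix", "doc", "feature", "improvement", "misc", "significant"]
```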
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update scripts/ci/prek/changelog_duplicates.py
Co-authored-by: Dev-iL <6509619+Dev-iL@users.noreply.github.com>
* Refactor scripts tests: convert setup methods to fixtures and extract constants
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Dev-iL <6509619+Dev-iL@users.noreply.github.com>
Consolidate ~25 duplicated module type definitions into
`dev/registry/registry_tools/types.py`. All extraction scripts now
import from this shared module, and a generated `types.json` feeds the
Eleventy frontend — so adding a new type means editing one Python dict
instead of ~10 files.
- Make `dev/registry` a uv workspace member with its own pyproject.toml
- Create `registry_tools/types.py` as canonical type registry
- Refactor extract_metadata, extract_parameters, extract_versions to
import from registry_tools.types instead of hardcoding
- Derive module counts from modules.json (runtime discovery) instead
of AST suffix matching — fixes Databricks operator undercount
- Generate types.json for frontend; templates and JS loop over it
- Remove stats grid from provider version page (redundant with filters)
- Add pre-commit hook to keep types.json in sync with types.py
- Add test_types.py for type registry validation
- Fix `"Base" in name` → `name.startswith("Base")` filter bug in
extract_versions.py (was dropping DatabaseOperator, etc.)
- Copy logos to registry/public/logos/ for local dev convenience
* Fix module counts on provider cards and version pages
Eleventy loads providers.json and providerVersions.js as separate data
objects — mutating provider objects in providerVersions.js doesn't
propagate to templates that read from providers.json directly.
Add moduleCountsByProvider.js data file that builds {provider_id: counts}
from modules.json. Templates now read counts from this dedicated source
instead of relying on in-place mutation.
* Merge into existing providers.json in incremental mode
When running extract_metadata.py --provider X, read existing
providers.json and merge rather than overwrite. This makes
parallel runs for different providers safe on the same filesystem.
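A sketch of the merge-then-write pattern, assuming providers.json maps provider ids to metadata (helper name and values hypothetical):

```python
import json
import tempfile
from pathlib import Path

def write_providers(path: Path, extracted: dict) -> None:
    # Merge into the existing file instead of overwriting it, so runs
    # for different providers don't clobber each other's entries.
    existing = json.loads(path.read_text()) if path.exists() else {}
    existing.update(extracted)
    path.write_text(json.dumps(existing, indent=2, sort_keys=True))

path = Path(tempfile.mkdtemp()) / "providers.json"
write_providers(path, {"amazon": {"version": "9.0.0"}})
write_providers(path, {"google": {"version": "12.0.0"}})  # a second run
assert set(json.loads(path.read_text())) == {"amazon", "google"}
```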
* Fix statsData.js to read module counts from modules.json
statsData.js was reading p.module_counts from providers.json, which no
longer carries counts. Read from modules.json directly (same pattern as
moduleCountsByProvider.js). Fixes empty Popular Providers on homepage
and zero-count stats.
* Fix breeze registry commands for suspended providers and backfill
Two fixes:
1. extract-data: Install suspended providers (e.g. apache-beam) in the
breeze container before running extraction. These providers have source
code in the repo but aren't pre-installed in the CI image, so
extract_parameters.py couldn't discover their classes at runtime.
2. backfill: Run extract_versions.py as a first step to produce
metadata.json from git tags. Without metadata.json, Eleventy skips
generating version pages — so backfilled parameters/connections data
was invisible on the site.
Adds a new breeze subcommand that extracts runtime parameters and connection
types for previously released provider versions using `uv run --with` — no
Docker or breeze CI image needed.
Also includes:
- Unit tests for all helper functions (16 tests)
- Breeze docs for the backfill command
- GitHub Actions workflow (registry-backfill.yml) that runs providers in
parallel via matrix strategy, then publishes versions.json
- Fix providerVersions.js to use runtime module_counts from modules.json
instead of AST-based counts from providers.json
Two issues:
- `tomllib` is Python 3.11+; use try/except fallback to `tomli` (same
pattern as other breeze modules)
- `TestReadProviderYamlInfo` tests used real filesystem paths that depend
on `tomllib`; replaced with `tmp_path`-based mock files
The registry build job uses static AWS credentials (access key + secret),
not OIDC, so `id-token: write` is not needed. Removing it fixes the
`workflow_call` from `publish-docs-to-s3.yml` which only grants
`contents: read` — callers cannot escalate permissions for nested jobs.
Add a new Breeze CLI command that helps maintainers efficiently triage
open PRs from non-collaborators that don't meet minimum quality criteria.
The command fetches open PRs via GitHub GraphQL API with optimized chunked
queries, runs deterministic CI checks (failures, merge conflicts, missing
test workflows), optionally runs LLM-based quality assessment, and presents
flagged PRs interactively for maintainer review with author profiles and
contribution history.
Key features:
- Optimized GraphQL queries with chunking to avoid GitHub timeout errors
- Deterministic CI failure detection with categorized fix instructions
- LLM assessment via `claude` or `codex` CLI for content quality
- Interactive review with Rich panels, clickable links, and author context
- "maintainer-accepted" label to skip PRs on future runs
- Workflow approval support for first-time contributor PRs awaiting CI runs
- Merge conflict and commits-behind detection with rebase guidance
Update dev/breeze/src/airflow_breeze/commands/pr_commands.py
Update dev/breeze/src/airflow_breeze/utils/llm_utils.py
Update contributing-docs/05_pull_requests.rst
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Eclipse's octopin was archived in Feb and will not receive any
more updates: https://github.com/eclipse-csi/octopin.
Dependabot should be good enough to do the updates for us.
Devlist Discussion: https://lists.apache.org/thread/7n4pklzcc4lxtxsy9g69ssffg9qbdyvb
A static-site provider registry for discovering and browsing Airflow providers and their modules. Deployed at `airflow.apache.org/registry/` alongside the existing docs infrastructure (S3 + CloudFront).
Staging preview: https://airflow.staged.apache.org/registry/
## Acknowledgments
Many of you know the [Astronomer Registry](https://registry.astronomer.io), which has been the go-to for discovering providers for years. Big thanks to **Astronomer** and @josh-fell for building and maintaining it. This new registry is designed to be a community-owned successor on `airflow.apache.org`, with the eventual goal of redirecting `registry.astronomer.io` traffic here once it's stable. Thanks also to @ashb for suggesting and prototyping the Eleventy-based approach.
## What it does
The registry indexes all 99 official providers and 840 modules (operators, hooks, sensors, triggers, transfers, bundles, notifiers, secrets backends, log handlers, executors) from the existing
`providers/*/provider.yaml` files and source code in this repo. No external data sources beyond PyPI download stats.
**Pages:**
- **Homepage** — search bar (Cmd+K), stats counters, featured and new providers
- **Providers listing** — filterable by lifecycle stage (stable/incubation/deprecated), category, and sort order (downloads, name, recently updated)
- **Provider detail** — module counts by type, install command with extras/version selection, dependency info, connection builder, and a tabbed module browser with category sidebar and per-module search
- **Explore by Category** — providers grouped into Cloud, Databases, Data Warehouses, Messaging, AI/ML, Data Processing, etc.
- **Statistics** — module type distribution, lifecycle breakdown, top providers by downloads and module count
- **JSON API** — `/api/providers.json`, `/api/modules.json`, per-provider endpoints for modules, parameters, and connections
**Connection Builder** — pick a connection type (e.g. `aws`, `redshift`), fill in the form fields with placeholders and sensitivity markers, and export as URI, JSON, or environment variable format. Fields are
extracted from provider.yaml connection metadata.
* Add CI workflow to validate newsfragment PR numbers
Newsfragment files follow the naming convention `{pr_number}.{type}.rst`,
but nothing currently validates that the PR number in the filename matches
the actual PR number. This has led to cases where a contributor copies a
newsfragment from another PR or makes a typo, and the mismatch goes
unnoticed until a reviewer catches it manually.
The existing `scripts/ci/prek/newsfragments.py` validation script runs as
a local pre-commit hook where the PR number is not yet known, so it cannot
perform this check. Rather than extending that script with optional CLI
args and a separate CI invocation, this adds a standalone lightweight
workflow that uses `gh pr diff --name-only` to get the list of changed
files, greps for newsfragment `.rst` files, and checks that none have a
mismatched PR number — all in a single piped command, no checkout needed.
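A hypothetical Python equivalent of the piped check (the real workflow is a shell one-liner; the regex is an assumption about the `{pr_number}.{type}.rst` convention):

```python
import re

NEWSFRAGMENT_RE = re.compile(r"newsfragments/(?P<pr>\d+)\.[a-z]+\.rst$")

def mismatched(changed_files: list, pr_number: int) -> list:
    """Return changed newsfragment files whose filename PR number is wrong."""
    return [
        f
        for f in changed_files
        if (m := NEWSFRAGMENT_RE.search(f)) and int(m.group("pr")) != pr_number
    ]

files = ["airflow-core/newsfragments/12345.significant.rst", "README.md"]
assert mismatched(files, 12345) == []
assert mismatched(files, 99999) == ["airflow-core/newsfragments/12345.significant.rst"]
```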
Notes for reviewers:
- `gh pr diff --name-only` includes deleted files. In practice, newsfragment
deletions only happen during towncrier releases on main, not in contributor
PRs, so this is not a concern for the `pull_request` trigger.
- `GH_TOKEN: ${{ github.token }}` follows the same pattern as
`milestone-tag-assistant.yml` and `backport-cli.yml` which also call `gh`
CLI directly.
- The `pull-requests: read` permission is required for `gh pr diff` to work
on fork PRs.
* fixup! Add CI workflow to validate newsfragment PR numbers
fixup! Add CI workflow to validate newsfragment PR numbers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>