mirror of
https://github.com/apache/airflow.git
synced 2026-03-26 15:28:46 +00:00
The `--provider` flag was only passed to `extract_metadata.py`, not to `extract_parameters.py` or `extract_connections.py`. As a result, incremental builds scanned all 99 providers and 1625 modules instead of just the requested one.

The registry workflow was building the CI image from scratch on every run (~24 min) because it lacked the BuildKit mount cache that ci-image-build.yml provides. Running `breeze ci-image build` inline with a registry cache doesn't help, because the Docker layer cache is invalidated on every commit when the build context changes.

Split into two jobs, following the established pattern used by ci-amd-arm.yml and update-constraints-on-push.yml:
- `build-ci-image`: calls ci-image-build.yml, which handles mount cache restore, ghcr.io login, registry cache, and image stashing
- `build-and-publish-registry`: restores the stashed image via the prepare_breeze_and_image action, then runs the remaining steps unchanged

* Fix merge crash when incremental extract skips modules.json

extract_parameters.py with --provider intentionally skips writing modules.json (only the targeted provider's parameters are extracted). The merge script assumed modules.json always exists, causing a FileNotFoundError during incremental builds.

Handle a missing new_modules_path the same way a missing existing_modules_path is already handled: treat it as an empty list.

* Fix /mnt not writable when loading stashed CI image

The prepare_breeze_and_image action loads the CI image from /mnt, which requires make_mnt_writeable.sh to run first. Each job gets a fresh runner, so the writable /mnt from the build job doesn't carry over, and the script has to run again in the job that loads the image.

* Regenerate pnpm lockfile for workspace mode

Adding `packages: ['.']` to pnpm-workspace.yaml changed how pnpm processes overrides, causing ERR_PNPM_LOCKFILE_CONFIG_MISMATCH with --frozen-lockfile. Regenerate the lockfile with pnpm 9 to match.
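A minimal Python sketch of the merge-script fix described above: whichever side is missing, a missing modules.json is read as an empty list instead of raising FileNotFoundError. The function names (`load_modules`, `merge_modules`), the assumption that modules.json holds a JSON list of objects, and the `"id"` key used to replace entries are all hypothetical; only `new_modules_path` and `existing_modules_path` come from the description.

```python
import json
from pathlib import Path


def load_modules(path: Path) -> list[dict]:
    """Return the modules list stored at `path`, or [] if the file is missing.

    Incremental extraction with --provider skips writing modules.json, so a
    missing file is treated exactly like an empty one.
    """
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def merge_modules(existing_modules_path: Path, new_modules_path: Path) -> list[dict]:
    """Merge freshly extracted modules into the existing list."""
    existing = load_modules(existing_modules_path)
    new = load_modules(new_modules_path)
    # Hypothetical merge rule: a new entry replaces any existing entry with
    # the same "id"; all other existing entries are kept in their order.
    new_ids = {m["id"] for m in new}
    return [m for m in existing if m["id"] not in new_ids] + new
```

With this shape, an incremental run that produced no new modules.json simply returns the existing data unchanged, and a first-ever run with no existing file returns only the new entries.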
* Scope prebuild uv resolution to dev/registry project

The prebuild script ran `uv run` without --project, causing uv to resolve the full workspace, including samba → krb5, which needs libkrb5-dev (not installed on the CI runner).

Eleventy pagination templates emit empty fallback JSON for every provider, even when only one provider's data was extracted. A plain `aws s3 sync` uploads those stubs and overwrites the real connection/parameter data.

Changes:
- Exclude per-provider connections.json and parameters.json from the main S3 sync during incremental builds, then selectively upload only the target provider's API files
- Filter connections early in extract_connections.py (before the loop) and support space-separated multi-provider IDs
- Suppress telemetry in CI via SCARF_ANALYTICS and DO_NOT_TRACK
- Document the Eleventy pagination limitation in README and AGENTS.md

* Exclude all per-provider API files during incremental S3 sync

The previous exclude only covered connections.json and parameters.json, but modules.json and versions.json for non-target providers also contain incomplete data (no version info extracted) and would overwrite correct data on S3. Simplify to exclude the entire api/providers/* subtree and selectively upload only the target provider's directory.

* Also exclude provider HTML pages during incremental S3 sync

Non-target provider pages are rebuilt without connection/parameter data (the version-specific extraction files don't exist locally). Without this exclude, the incremental build overwrites complete HTML pages on S3 with versions missing the connection builder section.

The providers listing page uses merged data (all providers) and must be updated during incremental builds, especially for new providers. Because AWS CLI filters that appear later in the command take precedence, an --include placed after the --exclude re-includes that specific file.
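The ordering rule in the last point can be checked with a small Python model of `aws s3 sync` filtering: every file is included by default, and the last `--exclude`/`--include` pattern that matches a path decides. This is a simplified sketch, not the AWS CLI's own code. The site layout (`providers/index.html` as the listing page, `providers/<id>/...` for provider pages), the function names, and the bucket name are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical filters for an incremental build. Patterns are relative to the
# synced directory; with fnmatch, "*" also matches "/", so whole subtrees match.
INCREMENTAL_FILTERS = [
    ("exclude", "api/providers/*"),       # stub per-provider API files
    ("exclude", "providers/*"),           # non-target provider HTML pages
    ("include", "providers/index.html"),  # re-include the merged listing page
]


def is_synced(path: str, filters: list[tuple[str, str]]) -> bool:
    """Return True if `path` would be uploaded under the given filters.

    Every file starts out included; filters are checked in command-line
    order and the last one whose pattern matches decides the outcome.
    """
    included = True
    for kind, pattern in filters:
        if fnmatch(path, pattern):
            included = kind == "include"
    return included


def incremental_sync_commands(site_dir: str, bucket: str, provider: str) -> list[list[str]]:
    """Build the filtered main sync plus the selective upload for one provider."""
    main = ["aws", "s3", "sync", site_dir, f"s3://{bucket}/"]
    for kind, pattern in INCREMENTAL_FILTERS:
        main += [f"--{kind}", pattern]
    # Second pass: upload only the target provider's API directory.
    target = f"api/providers/{provider}/"
    selective = ["aws", "s3", "sync", f"{site_dir}/{target}", f"s3://{bucket}/{target}"]
    return [main, selective]
```

With these filters, `providers/index.html` is uploaded while `providers/amazon/index.html` and everything under `api/providers/` are skipped by the main sync. Putting the `--include` before the `--exclude` would let the exclude win and skip the listing page too.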