Files
airflow/.github/workflows
Kaxil Naik 208eab43bc Fix registry incremental build processing all providers (#63769)
The `--provider` flag was only passed to `extract_metadata.py` but not
to `extract_parameters.py` or `extract_connections.py`. This caused
incremental builds to scan all 99 providers and 1625 modules instead
of just the requested one.

The registry workflow was building the CI image from scratch every run
(~24 min) because it lacked the BuildKit mount cache that
ci-image-build.yml provides. Inline `breeze ci-image build` with
registry cache doesn't help because Docker layer cache invalidates
on every commit when the build context changes.

Split into two jobs following the established pattern used by
ci-amd-arm.yml and update-constraints-on-push.yml:

- `build-ci-image`: calls ci-image-build.yml which handles mount cache
  restore, ghcr.io login, registry cache, and image stashing
- `build-and-publish-registry`: restores the stashed image via
  prepare_breeze_and_image action, then runs the rest unchanged

* Fix merge crash when incremental extract skips modules.json

extract_parameters.py with --provider intentionally skips writing
modules.json (only the targeted provider's parameters are extracted).
The merge script assumed modules.json always exists, causing a
FileNotFoundError during incremental builds.

Handle missing new_modules_path the same way missing
existing_modules_path is already handled: treat it as an empty list.

* Fix /mnt not writable when loading stashed CI image

The prepare_breeze_and_image action loads the CI image from /mnt, which
requires make_mnt_writeable.sh to run first. Each job gets a fresh
runner, so the writeable /mnt from the build job doesn't carry over.

* Regenerate pnpm lockfile for workspace mode

Adding `packages: ['.']` to pnpm-workspace.yaml changed how pnpm
processes overrides, causing ERR_PNPM_LOCKFILE_CONFIG_MISMATCH with
--frozen-lockfile. Regenerate the lockfile with pnpm 9 to match.

* Scope prebuild uv resolution to dev/registry project

The prebuild script ran `uv run` without --project, causing uv to
resolve the full workspace including samba → krb5 which needs
libkrb5-dev (not installed on the CI runner).

Eleventy pagination templates emit empty fallback JSON for every provider,
even when only one provider's data was extracted.  A plain `aws s3 sync`
uploads those stubs and overwrites real connection/parameter data.

Changes:
- Exclude per-provider connections.json and parameters.json from the main
  S3 sync during incremental builds, then selectively upload only the
  target provider's API files
- Filter connections early in extract_connections.py (before the loop)
  and support space-separated multi-provider IDs
- Suppress SCARF_ANALYTICS and DO_NOT_TRACK telemetry in CI
- Document the Eleventy pagination limitation in README and AGENTS.md

* Exclude all per-provider API files during incremental S3 sync

The previous exclude only covered connections.json and parameters.json,
but modules.json and versions.json for non-target providers also contain
incomplete data (no version info extracted) and would overwrite correct
data on S3.  Simplify to exclude the entire api/providers/* subtree and
selectively upload only the target provider's directory.

* Also exclude provider HTML pages during incremental S3 sync

Non-target provider pages are rebuilt without connection/parameter data
(the version-specific extraction files don't exist locally). Without
this exclude, the incremental build overwrites complete HTML pages on
S3 with versions missing the connection builder section.

The providers listing page uses merged data (all providers) and must
be updated during incremental builds — especially for new providers.
AWS CLI --include after --exclude re-includes the specific file.
2026-03-17 19:43:05 +00:00
..