Files
airflow/.github
Kaxil Naik 475cdae229 Registry: single source of truth for module types (#63322)
Consolidate ~25 duplicated module type definitions into
`dev/registry/registry_tools/types.py`. All extraction scripts now
import from this shared module, and a generated `types.json` feeds the
Eleventy frontend — so adding a new type means editing one Python dict
instead of ~10 files.

- Make `dev/registry` a uv workspace member with its own pyproject.toml
- Create `registry_tools/types.py` as canonical type registry
- Refactor extract_metadata, extract_parameters, extract_versions to
  import from registry_tools.types instead of hardcoding
- Derive module counts from modules.json (runtime discovery) instead
  of AST suffix matching — fixes Databricks operator undercount
- Generate types.json for frontend; templates and JS loop over it
- Remove stats grid from provider version page (redundant with filters)
- Add pre-commit hook to keep types.json in sync with types.py
- Add test_types.py for type registry validation
- Fix `"Base" in name` → `name.startswith("Base")` filter bug in
  extract_versions.py (was dropping DatabaseOperator, etc.)
- Copy logos to registry/public/logos/ for local dev convenience

* Fix module counts on provider cards and version pages

Eleventy loads providers.json and providerVersions.js as separate data
objects — mutating provider objects in providerVersions.js doesn't
propagate to templates that read from providers.json directly.

Add moduleCountsByProvider.js data file that builds {provider_id: counts}
from modules.json. Templates now read counts from this dedicated source
instead of relying on in-place mutation.

* Merge into existing providers.json in incremental mode

When running extract_metadata.py --provider X, read existing
providers.json and merge rather than overwrite. This makes
parallel runs for different providers safe on the same filesystem.

* Fix statsData.js to read module counts from modules.json

statsData.js was reading p.module_counts from providers.json, which no
longer carries counts. Read from modules.json directly (same pattern as
moduleCountsByProvider.js). Fixes empty Popular Providers on homepage
and zero-count stats.

* Fix breeze registry commands for suspended providers and backfill

Two fixes:

1. extract-data: Install suspended providers (e.g. apache-beam) in the
   breeze container before running extraction. These providers have source
   code in the repo but aren't pre-installed in the CI image, so
   extract_parameters.py couldn't discover their classes at runtime.

2. backfill: Run extract_versions.py as a first step to produce
   metadata.json from git tags. Without metadata.json, Eleventy skips
   generating version pages — so backfilled parameters/connections data
   was invisible on the site.
2026-03-11 04:15:43 +00:00
..