mirror of
https://github.com/apache/airflow.git
synced 2026-03-26 15:28:46 +00:00
Consolidate ~25 duplicated module type definitions into
`dev/registry/registry_tools/types.py`. All extraction scripts now
import from this shared module, and a generated `types.json` feeds the
Eleventy frontend — so adding a new type means editing one Python dict
instead of ~10 files.
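Below is a minimal sketch of the single-source pattern. It assumes a dict keyed by type id. The real contents and field names of `registry_tools/types.py` are not shown in this commit, so `MODULE_TYPES`, `label`, `render_types_json`, and `types_json_in_sync` are all hypothetical. Rendering the JSON deterministically (fixed key order and indentation) is what lets a pre-commit check compare the checked-in `types.json` byte-for-byte against a fresh render.

```python
import json
from pathlib import Path

# Hypothetical canonical registry: one entry per module type.
# Adding a new type means adding one key here.
MODULE_TYPES: dict[str, dict[str, str]] = {
    "operator": {"label": "Operators"},
    "hook": {"label": "Hooks"},
    "sensor": {"label": "Sensors"},
    "trigger": {"label": "Triggers"},
}


def render_types_json(types: dict[str, dict[str, str]] = MODULE_TYPES) -> str:
    """Serialize the registry deterministically so diffs stay stable."""
    # A list preserves the dict's insertion order for the frontend loops.
    entries = [{"id": type_id, **fields} for type_id, fields in types.items()]
    return json.dumps(entries, indent=2, sort_keys=True) + "\n"


def types_json_in_sync(path: Path) -> bool:
    """What a sync check could do: compare the file with a fresh render."""
    return path.exists() and path.read_text() == render_types_json()
```

A hook could then fail the commit whenever `types_json_in_sync(...)` returns `False` and prompt the author to regenerate the file.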
- Make `dev/registry` a uv workspace member with its own pyproject.toml
- Create `registry_tools/types.py` as the canonical type registry
- Refactor extract_metadata, extract_parameters, and extract_versions to
import from registry_tools.types instead of hardcoding types
- Derive module counts from modules.json (runtime discovery) instead
of AST suffix matching; this fixes the Databricks operator undercount
- Generate types.json for the frontend; templates and JS iterate over it
- Remove stats grid from provider version page (redundant with filters)
- Add a pre-commit hook to keep types.json in sync with types.py
- Add test_types.py for type registry validation
- Fix `"Base" in name` → `name.startswith("Base")` filter bug in
extract_versions.py (was dropping DatabaseOperator, etc.)
- Copy logos to registry/public/logos/ for local dev convenience
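The following snippet shows how the two filters differ on illustrative class names. `BaseOperator` and `BaseSensorOperator` are real Airflow base classes. `CloudBaseRunnerOperator` and `ExampleHook` are hypothetical, and so are the helper names, which are not the ones used in `extract_versions.py`. A substring test excludes any class whose name contains "Base" anywhere. A prefix test excludes only names that begin with it.

```python
def is_base_class_old(name: str) -> bool:
    # Buggy filter: matches "Base" anywhere in the class name.
    return "Base" in name


def is_base_class(name: str) -> bool:
    # Fixed filter: matches only names that start with "Base".
    return name.startswith("Base")


names = ["BaseOperator", "BaseSensorOperator", "CloudBaseRunnerOperator", "ExampleHook"]

old_kept = [n for n in names if not is_base_class_old(n)]
new_kept = [n for n in names if not is_base_class(n)]
print(old_kept)  # ['ExampleHook']
print(new_kept)  # ['CloudBaseRunnerOperator', 'ExampleHook']
```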
* Fix module counts on provider cards and version pages
Eleventy loads providers.json and providerVersions.js as separate data
objects, so mutating provider objects in providerVersions.js doesn't
propagate to templates that read from providers.json directly.
Add moduleCountsByProvider.js data file that builds {provider_id: counts}
from modules.json. Templates now read counts from this dedicated source
instead of relying on in-place mutation.
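Here is a sketch of the aggregation that `moduleCountsByProvider.js` performs. It is written in Python to match the extraction scripts. It assumes `modules.json` holds a `modules` list whose entries carry `provider_id` and `type` fields. That schema is a guess and is not confirmed by this commit, and `module_counts_by_provider` is a hypothetical name.

```python
import json
from collections import Counter, defaultdict


def module_counts_by_provider(modules_json: str) -> dict[str, dict[str, int]]:
    """Build {provider_id: {module_type: count}} from a modules.json payload."""
    # Assumed shape: {"modules": [{"provider_id": ..., "type": ...}, ...]}
    modules = json.loads(modules_json)["modules"]
    counts: defaultdict[str, Counter] = defaultdict(Counter)
    for module in modules:
        counts[module["provider_id"]][module["type"]] += 1
    # Convert to plain dicts so the result serializes cleanly.
    return {pid: dict(by_type) for pid, by_type in counts.items()}
```

The counts live in one dedicated structure instead of being written back onto provider objects. Every template that needs them therefore reads the same source, whichever data file it otherwise depends on.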
* Merge into existing providers.json in incremental mode
When running extract_metadata.py --provider X, read existing
providers.json and merge rather than overwrite. This makes
parallel runs for different providers safe on the same filesystem.
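The read-merge-write step can be sketched as follows. The sketch assumes `providers.json` holds a `providers` list of objects with an `id` field; the real schema may differ, and `merge_providers` is a hypothetical name. Fresh entries replace any existing entry with the same id, and all other entries are kept.

```python
import json
import os
import tempfile
from pathlib import Path


def merge_providers(path: Path, updated: list[dict]) -> list[dict]:
    """Merge freshly extracted provider entries into an existing providers.json."""
    existing: list[dict] = []
    if path.exists():
        existing = json.loads(path.read_text())["providers"]

    # Index by id: existing entries first, then overwrite with fresh ones.
    by_id = {p["id"]: p for p in existing}
    by_id.update({p["id"]: p for p in updated})
    merged = sorted(by_id.values(), key=lambda p: p["id"])

    # Write to a temp file in the same directory, then atomically replace,
    # so readers never observe a partially written file.
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump({"providers": merged}, f, indent=2)
    os.replace(tmp, path)
    return merged
```

In this sketch, the atomic replace guards against torn files. On its own, it does not serialize two writers that read the same old contents at the same moment.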
* Fix statsData.js to read module counts from modules.json
statsData.js was reading p.module_counts from providers.json, which no
longer carries counts. Read from modules.json directly (same pattern as
moduleCountsByProvider.js). Fixes empty Popular Providers on homepage
and zero-count stats.
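With the same assumed `modules.json` shape (a `modules` list with `provider_id` and `type` fields), the homepage numbers can be derived directly from the module list. This is a Python sketch of the idea and does not reproduce `statsData.js`. Ranking "popular" providers by module count is an assumption. `site_stats`, `top_n`, and the returned keys are hypothetical.

```python
import json
from collections import Counter


def site_stats(modules_json: str, top_n: int = 3) -> dict:
    """Derive totals and the most module-rich providers from modules.json."""
    modules = json.loads(modules_json)["modules"]
    per_provider = Counter(m["provider_id"] for m in modules)
    per_type = Counter(m["type"] for m in modules)
    return {
        "total_modules": len(modules),
        "by_type": dict(per_type),
        # most_common orders by count; ties keep first-seen order.
        "popular_providers": [pid for pid, _ in per_provider.most_common(top_n)],
    }
```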
* Fix breeze registry commands for suspended providers and backfill
Two fixes:
1. extract-data: Install suspended providers (e.g. apache-beam) in the
breeze container before running extraction. These providers have source
code in the repo but aren't pre-installed in the CI image, so
extract_parameters.py couldn't discover their classes at runtime.
2. backfill: Run extract_versions.py as a first step to produce
metadata.json from git tags. Without metadata.json, Eleventy skips
generating version pages — so backfilled parameters/connections data
was invisible on the site.
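One way to turn git tags into per-provider version lists is sketched below. It assumes Airflow's `providers-<name>/<version>` tag convention and skips pre-release tags such as release candidates. The function name and output shape are hypothetical, and the `metadata.json` that `extract_versions.py` actually writes may look different.

```python
import re

# Matches final-release tags such as "providers-apache-beam/5.1.0";
# tags with suffixes (e.g. "rc1") are deliberately skipped.
TAG_RE = re.compile(r"^providers-([a-z0-9-]+)/(\d+)\.(\d+)\.(\d+)$")


def versions_from_tags(tags: list[str]) -> dict[str, list[str]]:
    """Group release tags by provider, newest version first."""
    found: dict[str, list[tuple[int, ...]]] = {}
    for tag in tags:
        match = TAG_RE.match(tag.strip())
        if not match:
            continue
        provider = match.group(1)
        # Compare versions numerically so 5.10.0 sorts above 5.1.0.
        version = tuple(int(part) for part in match.groups()[1:])
        found.setdefault(provider, []).append(version)
    return {
        provider: [".".join(map(str, v)) for v in sorted(set(vs), reverse=True)]
        for provider, vs in found.items()
    }
```

The input could come from `git tag --list 'providers-*'`, split into lines.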