Devlist Discussion: https://lists.apache.org/thread/7n4pklzcc4lxtxsy9g69ssffg9qbdyvb
A static-site provider registry for discovering and browsing Airflow providers and their modules. Deployed at `airflow.apache.org/registry/` alongside the existing docs infrastructure (S3 + CloudFront).
Staging preview: https://airflow.staged.apache.org/registry/
## Acknowledgments
Many of you know the [Astronomer Registry](https://registry.astronomer.io), which has been the go-to for discovering providers for years. Big thanks to **Astronomer** and @josh-fell for building and maintaining it. This new registry is designed to be a community-owned successor on `airflow.apache.org`, with the eventual goal of redirecting `registry.astronomer.io` traffic here once it's stable. Thanks also to @ashb for suggesting and prototyping the Eleventy-based approach.
## What it does
The registry indexes all 99 official providers and 840 modules (operators, hooks, sensors, triggers, transfers, bundles, notifiers, secrets backends, log handlers, executors) from the existing
`providers/*/provider.yaml` files and source code in this repo. No external data sources beyond PyPI download stats.
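As a rough sketch of the indexing step (the dict below stands in for parsed `provider.yaml` content; field names mirror the schema loosely and are illustrative only), counting a provider's modules per type might look like:

```python
from collections import Counter

# Parsed provider.yaml content, inlined as a dict for illustration --
# the real build walks providers/*/provider.yaml. Field names are
# illustrative, not the exact schema.
SAMPLE_PROVIDER = {
    "package-name": "apache-airflow-providers-postgres",
    "operators": [
        {"integration-name": "PostgreSQL",
         "python-modules": ["airflow.providers.postgres.operators.postgres"]},
    ],
    "hooks": [
        {"integration-name": "PostgreSQL",
         "python-modules": ["airflow.providers.postgres.hooks.postgres"]},
    ],
}

MODULE_KINDS = ("operators", "hooks", "sensors", "triggers", "transfers")

def count_modules(provider: dict) -> Counter:
    """Count a provider's modules per type from parsed provider.yaml data."""
    counts: Counter = Counter()
    for kind in MODULE_KINDS:
        for integration in provider.get(kind, []):
            counts[kind] += len(integration.get("python-modules", []))
    return counts
```

Summing these counters across all providers yields the site-wide module totals shown on the statistics page.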
**Pages:**
- **Homepage** — search bar (Cmd+K), stats counters, featured and new providers
- **Providers listing** — filterable by lifecycle stage (stable/incubation/deprecated), category, and sort order (downloads, name, recently updated)
- **Provider detail** — module counts by type, install command with extras/version selection, dependency info, connection builder, and a tabbed module browser with category sidebar and per-module search
- **Explore by Category** — providers grouped into Cloud, Databases, Data Warehouses, Messaging, AI/ML, Data Processing, etc.
- **Statistics** — module type distribution, lifecycle breakdown, top providers by downloads and module count
- **JSON API** — `/api/providers.json`, `/api/modules.json`, per-provider endpoints for modules, parameters, and connections
**Connection Builder** — pick a connection type (e.g. `aws`, `redshift`), fill in the form fields with placeholders and sensitivity markers, and export as URI, JSON, or environment variable format. Fields are
extracted from provider.yaml connection metadata.
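A minimal sketch of the URI and environment-variable export (field handling here is simplified and the function names are hypothetical; the real builder derives fields and sensitivity markers from provider.yaml, while `AIRFLOW_CONN_<CONN_ID>` is the standard Airflow convention for connections in environment variables):

```python
from urllib.parse import quote

def connection_uri(conn_type, host="", login="", password="", port=None,
                   schema="", extra=None):
    """Render connection fields as an Airflow-style connection URI."""
    auth = ""
    if login or password:
        auth = f"{quote(login, safe='')}:{quote(password, safe='')}@"
    netloc = f"{auth}{host}" + (f":{port}" if port else "")
    uri = f"{conn_type}://{netloc}/{schema}"
    if extra:
        query = "&".join(f"{k}={quote(str(v), safe='')}"
                         for k, v in extra.items())
        uri += f"?{query}"
    return uri

def connection_env(conn_id, uri):
    # Airflow reads connections from AIRFLOW_CONN_<CONN_ID> env vars.
    return f"AIRFLOW_CONN_{conn_id.upper()}={uri}"
```

For example, `connection_uri("postgres", host="db.example.com", login="user", password="s3cr3t", port=5432, schema="mydb")` yields `postgres://user:s3cr3t@db.example.com:5432/mydb`, which `connection_env` wraps as an `AIRFLOW_CONN_*` assignment.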
Lineage collection happens exclusively during task execution and is only used by worker processes. Server components such as the scheduler and API server do not use it. Move the lineage module from airflow-core to the task SDK to better align with the ongoing client–server separation.
* Adding Japanese translations for UI
* Responded to comments on PR#59313: updated CODEOWNERS, added missing translations, removed extra translations, and left "Providers" as it is (in English).
* Add Thai UI translation
* Update code owners
* Add another translator
* Revise text to sound more polite
* Add missing keys and remove some
* Improve Thai translations for better clarity and understanding
* Add @potiuk as sponsor and @Srabasti to the comment
---------
Co-authored-by: blackbass64 <athibet.pi@gmail.com>
* feat(i18n): add Chinese simplified support
* Feat (i18n): Added Simplified Chinese support
* Restore the latest pyproject.toml
* Updated Simplified Chinese translation, added calendar-related translation and operator translation, and fixed some existing translation content.
* Update the CODEOWNERS file to add ownership claims for the Simplified Chinese translation directory.
* 1. Update the i18n configuration order for Simplified and Traditional Chinese.
  2. Fix the empty Simplified Chinese language code owner information.
* Adjust the location of the owner description of the Chinese Simplified and Traditional translation file to optimize the description
* Update airflow-core/src/airflow/ui/public/i18n/locales/zh-CN/assets.json
Co-authored-by: Guangyang Li <2060045+gyli@users.noreply.github.com>
* Update airflow-core/src/airflow/ui/public/i18n/locales/zh-CN/assets.json
Co-authored-by: Guangyang Li <2060045+gyli@users.noreply.github.com>
* Update the Chinese localization file, fix the translation of "Add new task to queue", and optimize the sentence structure of "DAG not yet in collection".
* Update the simplified Chinese translations of DAG to Dag and DAGs to Dags according to https://github.com/apache/airflow/pull/55099. Update some hard-to-understand translations.
* Update the Simplified Chinese translation to optimize the translation related to "login" and "backfill" to make it more in line with user habits.
* Updated the code owner of the Simplified Chinese translation, adding @gyli as a collaborator for the zh-CN language.
* Optimize the Simplified Chinese translation and unify the terms related to "data backfill" to "backfill" to improve user understanding and experience.
* Apply suggestion from @potiuk
---------
Co-authored-by: fortytwo <fortytwo@example.com>
Co-authored-by: Guangyang Li <2060045+gyli@users.noreply.github.com>
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
This change continues the work that was started in Airflow 3.0 and AIP-72 to
use Structlog in place of stdlib logging in Airflow.
The primary change here is to make LoggingMixin return a customized
structlogger; customized to maintain surface compatibility with logging.Logger
(surface meaning things like handlers and filters aren't preserved) and to
capture/redirect all stdlib logging via Structlog processors.
This is the first step in allowing all Airflow components to be able to
produce JSON logs natively.
Things of note in this implementation:
- We have a customized structlog filtering logger that has a "per-logger-tree"
level concept.
  This is implemented using a prefix trie[1] so that the level for child
  loggers can be looked up efficiently when it is configured only at the
  parent level.
- We have a custom `PercentFormatRender` class that renders non-JSON logs and
  understands stdlib-style format strings, meaning users' custom logging
  config will be respected again for the daemon components.
(Note though: this won't help with Task logs, as those are always JSON and
the UI does the rendering of those.)
- There is no longer a need for different log formats for colored and plain
  output -- color format specifiers (`%(blue)s`, `%(log_level)s` etc.) simply
  output nothing in their place when colors are disabled or not available.
- Introduce a mechanism for users to easily set the log level for individual
  loggers -- for instance, if you are debugging the scheduler, it is nice to
  be able to set the `airflow.jobs.scheduler_job_runner` logger to DEBUG while
  keeping everything else at INFO.
[1]: https://en.wikipedia.org/wiki/Trie
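The "per-logger-tree" idea above can be sketched with a small prefix trie keyed on dotted logger names. This is illustrative only; the class and method names are not the actual Airflow implementation:

```python
import logging

class LevelTrie:
    """Longest-prefix lookup of log levels keyed on dotted logger names.

    Illustrative sketch of the "per-logger-tree" idea only.
    """

    def __init__(self, default=logging.INFO):
        self.root = {"level": default, "children": {}}

    def set_level(self, name, level):
        # Walk/create trie nodes for each dotted component of the name.
        node = self.root
        for part in name.split("."):
            node = node["children"].setdefault(
                part, {"level": None, "children": {}})
        node["level"] = level

    def get_level(self, name):
        # Follow the trie as far as the name matches, remembering the
        # most specific explicitly-configured level seen along the way.
        node, level = self.root, self.root["level"]
        for part in name.split("."):
            node = node["children"].get(part)
            if node is None:
                break
            if node["level"] is not None:
                level = node["level"]
        return level
```

Setting `airflow.jobs.scheduler_job_runner` to DEBUG then makes every child logger under it resolve to DEBUG, while unrelated loggers keep the default level.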
The reason for not using caplog has gone away with the switch to structlog,
and we already override the built-in `caplog` fixture to use our structlog
version.
Pre-commit is a fantastic tool that we used heavily for years, but the tool
has generally stagnated and shows no sign of adapting to our needs. For years
we tried to convince the pre-commit maintainers that things like autocomplete
are necessary - but this was met with pretty much resistance (if not
hostility) from the maintainer. There was also no chance of them accepting
the expectations of bigger projects like ours, with a huge monorepo and not
only multiple needs but also different parts of the repo needing different
language support (golang, typescript soon) - and apparently the maintainer of
pre-commit does not think monorepos are a good thing at all. Similarly, they
did not recognize the rise of `uv`, and the only way to use `uv` with
pre-commit is to install `pre-commit-uv`, which essentially patches
pre-commit with uv support. This is not really sustainable and the tool lags
behind many of our needs.
Luckily, there is a new project in town - prek - a rewrite of pre-commit that
is 100% compatible (now), 10x faster (because Rust), uses `uv` natively, and
already supports auto-complete. It has a very friendly maintainer who is not
only supporting us but also very happily works on improving `prek` to close
all the gaps, and plans to implement (with our support and cooperation, of
course) monorepo support - which will allow us to modularise our pre-commits.
This PR switches our pre-commit support to use prek exclusively:
* the breeze static checks command is completely removed
* the custom auto-complete code in breeze is removed as well
* instructions are updated to set up prek instead of pre-commit
* CI is updated to run prek instead of pre-commit
* documentation for static checks is reviewed and the new features that
  prek enables are added
* add spanish translations + codeownership cfg
* fix typos
* nit: more natural message
* rm space
* Update .github/CODEOWNERS to add bbovenzi as codeowner
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
* improve wording and fix typo
* replace fecha wording to a standard
* more inclusive text
* welcome message more inclusive
---------
Co-authored-by: Jens Scheffler <95105677+jscheffl@users.noreply.github.com>
* Feat(i18n): Move translation files to public, use i18next-http-backend
* Adopted i18next-http-backend
* Fix(i18n): Update locales directory path to public for translation files
* Fix(i18n): Update static file path for translation locales
* Fix(i18n): Update translation config file path
* Fix(i18n): Update ESLint configuration to include jsonc-parser
* Fix(i18n): Update ESLint file patterns and add 'components' to namespaces
* Fix(i18n): Initialize i18n(en) in DagCard tests
* chore: enforce consistent array type syntax per @typescript-eslint rule
* Revert "chore: enforce consistent array type syntax per @typescript-eslint rule"
This reverts commit d218738c4e5cba091eae1f1161367116836c23ff.
* Fix(i18n): Ignore i18n locale files in ESLint TypeScript rules
* Fix(tests): Remove unused imports and clean up test setup
* Apply pre-commit formatting to openapi-gen/ files as per #51755
* Fix(eslint): Update i18n and TypeScript rules configuration
* Apply ESLint and i18nRule
* Move Arabic and French translations to public
* Fix: Update translation file paths to public directory
* Moved i18n doc and completeness checker script from public to dev/i18n
* fix(i18n): i18n policy document translation completeness check command
* tests passed
* narrow down CI
* fix host
* add drill for testing
* add docker inspect
* create a function
* minor changes
* change the inspect
* add status condition
* change host to zeros
* rewrite the pipeline
* push image and reuse
* remove tag
* rename tag
* change to locally pushed image
* add sha to image name
* add sha to image name
* add debug and login
* change token name
* change token name
* remove login
* remove login
* remove login
* move token to higher level
* copy from ci
* change to secrets
* test workflow
* test workflow
* rerun on CI login
* rerun on CI login
* logout
* revert logout
* revert login
* remove cleanup
* readd cleanup
* change docker pass
* add login to action
* move up
* change token
* one job
* one job
* remove action
* delete if condition
* rename image
* fix tar name
* fix tar name
* fix tar name
* remove status
* remove stdin
* add logs and hostname
* removed restart
* add chmod
* create entry sh
* create entry sh
* add status
* remove failure
* add user
* revert back all ci yamls
* revert back shell script
* fix tinkerpop
* change back to gremlin in the providers list
* change docs
* change docs
* add conn_name_attr
* move system test
* change to cap
* change to 1.0.0
* fix sh
* removed serializer and some changes
* fixing docs
* update provider
* change to 2.9.0 and add spell check
* ran breeze release management
* add doc strings
* add asterisk
* moved operators doc
* fixed docs
* fixed pyproject
* minor change to docs
* add conf.py
* fixed docs
* fix toml
* changed to 2.10
* remove fab from tinkerpop
* fix prov info
* add close method
* fix integration
* change gremlin host
* change gremlin host back
* remove async
* remove package
* fix pyproject
* fix test
---------
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
While no-one likes the name, we currently cannot use the name
"apache-airflow-providers-edge", and since all our tooling assumes a naming
convention matching provider packages, this is the fastest way we can get a
name that is acceptable to PyPI.
The lazy consensus decision has been made at the devlist to switch
entirely to `uv` as development tool:
link: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256
This PR implements that decision and removes a lot of baggage connected
to using `pip` in addition to `uv` to install and sync the environment.
It also introduces more consistency in the way distribution packages
are used in the airflow sources - basically switching all internal
distributions to the `pyproject.toml` approach and linking them all
together via `uv`'s workspace feature.
This enables much more streamlined development workflows, where any
part of airflow development is manageable using `uv sync` in the right
distribution - opening the way to moving more of the "sub-workflows"
from the CI image to the local virtualenv environment.
Unfortunately, such a change really cannot be done incrementally, because
any change in the project layout drags a lot of changes in the
test/CI/management scripts with it, so we have to implement one big
PR covering the move.
This PR is "safe" in terms of airflow and provider code - it does not
**really** change Airflow code (except occasional imports and type hint
changes resulting from better isolation of packages), nor should it
affect any airflow or provider code, because it does not move any of the
folders where airflow or provider code lives.
It does move the test code - in a number of "auxiliary" distributions
we have. It also moves the `docs` generation code to `devel-common`
and introduces separate conf.py files for every doc package.
What is still NOT done after that move and will be covered in the
follow-up changes:
* isolating docs-building to have a separate configuration for docs
  building per distribution - allowing doc builds to run locally
  with their own conf.py file
* moving some of the tests and checks out from breeze container
image up to the local environment (for example mypy checks) and
likely isolating them per-provider
* Constraints are still generated using `pip freeze` and automatically
managed by our custom scripts in `canary` builds - this will be
replaced later by switching to `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be
considered as a follow-up.
* the PROD image is still built with `pip` by default when using
  `PyPI` or distribution packages - but we do not support building
  the source image with `pip` - when building from sources, uv
  is forced internally to install packages. Currently we have
  no plans to change the default PROD build to use `uv`.
This is the detailed list of changes implemented in this PR:
* uv is now a mandatory prerequisite in order to develop airflow. We do
  not support installing airflow for development with `pip` - there are
  many cases where it will not work for development - including
  development dependencies and installing several distributions together.
* removed the meta-package `hatch_build.py` and replaced it with
  pre-commit automatically modifying the declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover the custom
  git and asset build hooks (renaming the file to `hatch_build.py`)
  and moved all airflow dependencies to `pyproject.toml`
* converted "loose" packages in airflow repo into distributions:
* docker-tests
* kubernetes-tests
* helm-tests
* dev (here we do not have a `src` subfolder - sources are directly
  in the distribution, which is for now inconsistent with other
  distributions).
The names of the `_tests` distribution folders have been renamed to
the `-tests` convention to make sure the imports always refer to the
base of each distribution and are not used from the content root.
* Each of the distributions (on top of the already existing airflow-core,
  task-sdk, devel-common and 90+ providers) has its own set of
  dependencies, and the top-level meta-package workspace root brings
  those distributions together, allowing all of them to be installed
  with a simple `uv sync --all-packages` command and come up with a
  consistent set of dependencies that are good for all those
  packages (yay!). This is used to build the CI image with a single
  common environment to run the tests (with some quirks due to
  constraints use, where we have to manually list all distributions
  until we switch to the `uv.lock` mechanism)
* `doc` code is moved to the `devel-common` distribution. The `doc` folder
  only keeps a README explaining where the other doc code is, the
  spelling_wordlist.txt and start_docs_server.sh. The documentation is
  generated in the `generated/generated-docs/` folder, which is entirely
  .gitignored.
* the documentation is now fully moved to:
* `airflow-core/docs` - documentation for Airflow Core
* `providers/**/docs` - documentation for Providers
* `chart/docs` - documentation for Helm Chart
* `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
* `providers-summary-docs` - documentation for provider summary page
* `versions` are no longer dynamically retrieved from `__init__.py`; all
  of them are synchronized directly into the pyproject.toml files - this
  way - except for the custom build hook - we have no dynamic components
  in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places;
  the only references to extras remain in the user documentation - we
  stop using extras for local development and switch to using
  dependency groups.
* the backtracking command was removed from breeze - we have not needed
  it since we started using `uv`
* internal commands (except constraint generation) have been moved to
`uv` from `pip`
* breeze requires `uv` to be installed and is expected to be installed
  via `uv tool install -e ./dev/breeze`
* pyproject.toml files are modified on the fly when we add a version
  suffix (`--version-suffix-for-pypi`) - only for the duration of
  building the versions with the updated suffix
* `mypy` checks are now used consistently across all the different
  distributions. For consistency (and to fix some of the issues with
  namespace packages), rather than using the "folder" approach when
  running mypy checks - even when we run mypy for a whole
  distribution - we run the check on individual files rather than on
  a folder. This adds consistency in the execution of mypy heuristics.
  Rather than using the in-container mypy script, all the logic for
  selection and the parameters passed to mypy is in the pre-commit code.
  For now we are still using the CI image to run mypy, because mypy is
  very sensitive to the versions of installed dependencies; we should
  be able to switch to running mypy locally once we have the
  `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across
  all the distributions. With `uv sync` and dependabot, those
  should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in
  `devel-common` extras - including `basic`, `doc`, `doc-gen`, and
  `all` - which will make it easier to install them on some OSes
  (`basic` is used as the default set of dependencies covering the
  most common development dependencies)
* generated/provider_dependencies.json files are no longer committed to
  the repository. They are .gitignored and generated on the fly as
  needed (breeze will generate them automatically when empty, and
  pre-commit will always regenerate them to be consistent with the
  providers' pyproject.toml files).
* `chart-utils` have been moved from `devel-common` to `helm-tests`,
  as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment
rather than creating our own `.build` environment and we use
`uv sync` to keep it in sync
* Updated `uv` version to 0.6.10
* We are using `uv sync` to perform "upgrade to newer dependencies"
  in `canary` builds and locally
* leveldb has been turned into "dependency group" and removed from
apache-airflow and apache-airflow-core extras, it is now only
available by google provider's leveldb optional extra to install
with `pip`
* Include unit-testing into CI and breeze, add distribution pieces
* Merge task-sdk and airflow-ctl test workflows so they can be extended for each non-core distro; update release management doc packages to distributions
* Remove unneeded comment
* Remove duplicate ISSUE_MATCH_IN_BODY definition, unify non-core release logic and include airflowctl release method in release_management_commands.py, create DistributionPackageBuildType for identifying dist name
* Update dev/breeze/doc/05_test_commands.rst
Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
* Fix dash problem
* Remove not used vars from ci.yml
* Update breeze selective check tests
* Update breeze selective check tests, fix typo in release_management_commands.py, fix pre-commit naming in mypy, fix dist naming
* Fix pre-commit hook, fix dist path for release_management_commands.py, fix breeze test
* add airflowctl to mypy_folder.py, include __init__.py to airflowctl, include into missing scripts for installation and release, pre-commit adjustment, files are moved to src/airflow/ctl structure to fit into generic structure, include airflow-ctl into .dockerignore,
* Remove uv workspaces for now, which were preventing the CI image from being built
* Fix airflow-ctl workspace and include devel-common again along with pytest_plugins to make breeze testing work
* Revert provider yaml workspace changes
* Remove bespoke handle of provider.toml and remove airflow-ctl from provider.toml template
* Move back distribution name to airflowctl, update CI logic to more dynamic via inputs for non-core distributions
* Fix path in mypy, remove not needed __init__.py and duplicate conftest in tests
* Remove airflow-ctl from providers test
---------
Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
With the remote commands being added to a separate distribution, we no
longer need to distinguish between local and remote commands in core -
they are all local commands!
This is a continuation of the separation of the Airflow codebase into
separate distributions. This one splits airflow into two of them:
* apache-airflow - becomes an empty, no-code meta distribution that
  only has dependencies on the apache-airflow-core and task-sdk
  distributions, plus the preinstalled provider distributions
  added in the standard "wheel" distribution. All "extras" lead
  either to "apache-airflow-core" extras or to providers - the
  dependencies and optional dependencies are calculated differently
  depending on "editable" or "standard" mode: in editable mode,
  just the provider dependencies are installed for preinstalled
  providers; in standard mode, those preinstalled providers are
  dependencies.
* the apache-airflow-core distribution contains all airflow core
  sources (previously in apache-airflow) and has no provider
  extras. Thanks to that, the apache-airflow-core distribution does
  not have any dynamically calculated dependencies.
* the apache-airflow-core distribution has a "hatch_build_airflow_core.py"
  build hook that adds a custom build target and implements custom
  cleanup, in order to compile assets as part of the build.
* During the move, the following changes were applied for consistency:
  * packages, when used in the context of distribution packages, have
    been renamed to "distributions" - including in all documentation and
    breeze commands - to avoid confusion with import packages
    (see
    https://packaging.python.org/en/latest/discussions/distribution-package-vs-import-package/)
  * all tests in `airflow-core` now follow the same convention
    where tests are in the `unit`, `system` and `integration` packages.
    No extra package has been added at the second level, because all
    the provider tests have "<PROVIDER>" there, so we just have to
    avoid naming any airflow unit."<PROVIDER>" package the same as a
    provider.
  * all tooling in CI/DEV has been updated to follow the new
    structure. We should now always build both packages when we
    are building them using `breeze`.
closes https://github.com/apache/airflow/issues/47499
`TriggerDagRunOperator` requires direct DB access to trigger DAG runs, which is not feasible under AIP-72 / Task SDK. Several approaches were considered:
1. **New Airflow API provider**
- Move `TriggerDagRunOperator` to a new provider built on top of the HTTP provider and Airflow API.
- This would allow DAG authors to configure a dedicated Airflow connection.
- However, the Public API currently requires API tokens and does not support username/password authentication, making setup cumbersome.
2. **Pass Task Token to API**
- Use Task Token from the controller DAG to authenticate API calls.
- This would allow triggering DAGs in the same deployment using the execution API.
- However, Task-identity tokens have not been implemented yet.
3. **Handle via Task Execution API (Chosen Approach)**
   - `TriggerDagRunOperator` raises a dedicated exception.
- The Task Runner catches this and invokes a new endpoint in the Task Execution API to trigger the target DAG.
Since (2) is not yet available, this PR/commit implements (3) as a temporary solution.
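A hedged sketch of approach (3) follows; all names here are hypothetical illustrations, not the actual Task SDK API:

```python
class TriggerDagRunRequest(Exception):
    """Hypothetical control-flow exception raised by the operator."""

    def __init__(self, dag_id, conf=None):
        super().__init__(dag_id)
        self.dag_id = dag_id
        self.conf = conf or {}

def run_task(task_fn, execution_api):
    """Sketch of the Task Runner: run the task and, if it raises the
    trigger request, call the Task Execution API endpoint instead of
    touching the database directly."""
    try:
        task_fn()
    except TriggerDagRunRequest as req:
        # `trigger_dag_run` stands in for the new Task Execution API
        # endpoint; the real name and signature may differ.
        execution_api.trigger_dag_run(dag_id=req.dag_id, conf=req.conf)
```

The key point is that the operator itself never needs DB access: the runner intercepts the exception and delegates the trigger to the server side.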
This is the next stage of the refactoring of airflow packages, after
moving providers to standalone distributions and separating devel-common
out as a common distribution.
The `task_sdk` folder has been renamed to `task-sdk` - this way we will
never accidentally import anything in task_sdk starting from the content
root. Some changes were needed to make it work:
* an autouse fixture was added to the pytest plugin to add `task-sdk/tests`
  to PYTHONPATH so that it becomes an import root
* all tests were moved to a `task_sdk` package inside the tests folder
* all imports for tests are now `from task_sdk`
* common tools for task_sdk have been moved to
  `devel-common/src/test_utils/task_sdk.py` in order to allow importing
  them before `task-sdk/tests` is added to the pythonpath
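The PYTHONPATH tweak from the first bullet can be sketched as a small helper (the function name and the way the repo root is passed in are hypothetical; the real change is an autouse fixture in the pytest plugin):

```python
import sys
from pathlib import Path

def add_tests_root(repo_root):
    """Prepend <repo_root>/task-sdk/tests to sys.path so that test
    imports resolve as `from task_sdk import ...` from the tests root
    rather than from the content root."""
    path = str(Path(repo_root) / "task-sdk" / "tests")
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

With the tests root on `sys.path`, `from task_sdk import ...` always resolves against `task-sdk/tests/task_sdk`, never against a stray `task_sdk` at the repository root.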