This is the next stage of refactoring of Airflow packages, after
moving providers to standalone distributions and separating devel-common
as a common distribution.
The `task_sdk` has been renamed to `task-sdk` - this way we will
never accidentally import anything in task_sdk starting from the content
root. Some changes were needed to make it work:
* an autouse fixture was added to the pytest plugin to add `task-sdk/tests`
to PYTHONPATH so tests can be imported from that root
* all tests were moved to `task_sdk` package inside the tests folder
* all imports for tests are now `from task_sdk`
* common tools for task_sdk have been moved to
`devel-common/src/test_utils/task_sdk.py` in order to allow importing
them before `task-sdk/tests` is added to PYTHONPATH
This is a provider package similar to the others, intended to be an abstraction over Apache Kafka, Amazon SQS, and Google PubSub to begin with. It can then be expanded to other messaging systems based on community adoption.
The initial goal is to provide a simple abstraction for integrating the Event Driven Scheduling coming with Airflow 3 with message notification systems such as Kafka, which are currently being used to publish data availability.
---------
Co-authored-by: vincbeck <vincbeck@amazon.com>
Co-authored-by: Vincent <97131062+vincbeck@users.noreply.github.com>
Co-authored-by: Jed Cunningham <66968678+jedcunningham@users.noreply.github.com>
This uses a similar approach to the DAG Parser -- a subprocess runs the async
Triggers (i.e. user code) and sends messages back and forth to the
supervisor/parent to perform CRUD operations on the DB.
I have also massively reworked how per-trigger logging works, to greatly
simplify it. I hope @dstandish will approve. The main simplification is
that with the switch to TaskSDK, _all_ (100%! Really) of the logs are
sent as JSON over a socket to the parent process; everything in the
subprocess logs to this output, so there is no differentiation needed in
stdlib, no custom handlers, etc. By making use of structlog's automatic
context vars we can include a trigger_id field -- if we find one, we
route the output to the right trigger-specific log file.
This is all now so much simpler with structlog in the mix.
Logging from the async process works as follows:
- stdlib logging is configured to send messages via structlog as JSON
- As part of the stdlib->structlog processing chain we include structlog bound
contextvars
- When a triggerer coro starts it binds `trigger_id` as a parameter
- When the Supervisor receives a log message (which arrives as line-delimited
JSON over a dedicated socket channel) it parses the JSON, and if it finds a
`trigger_id` key in there it redirects the record to the trigger's file log,
else prints it.
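The routing decision in that last step can be sketched with plain JSON handling (function and container names here are illustrative, not the actual Supervisor code):

```python
import json


def route_log_line(line: str, trigger_logs: dict, supervisor_out: list) -> None:
    """Parse one line-delimited JSON log record; records carrying a
    trigger_id go to that trigger's own log, everything else goes to
    the supervisor's output."""
    record = json.loads(line)
    trigger_id = record.get("trigger_id")
    if trigger_id is not None:
        trigger_logs.setdefault(trigger_id, []).append(record)
    else:
        supervisor_out.append(record)


trigger_logs, supervisor_out = {}, []
for raw in (
    '{"trigger_id": 7, "event": "trigger fired"}',
    '{"event": "heartbeat"}',
):
    route_log_line(raw, trigger_logs, supervisor_out)
print(trigger_logs[7][0]["event"], supervisor_out[0]["event"])  # trigger fired heartbeat
```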
Of note: I haven't allowed triggers to directly access or set XCom, Variables,
etc. We can add that in the future if there is demand.
There was a lot of code and references to the old ways of handling the
old structure of providers. Now that all providers have been moved, we
can remove that old code and rename the old "new_providers" references
to just "providers".
This is a set of cleanup steps (first stage) that allow us to remove
the "intermediate" providers distribution from Airflow code and replace
it fully with individual provider distributions - each already with its
own `pyproject.toml` file and (when we complete this) being completely
separate distributions from Airflow, without implicit dependencies
between unrelated distributions.
There are a number of other changes needed, but this one focuses only
on removing all references to the "umbrella" `providers` distribution
and the consequences of removing it.
Those are the changes implemented in this PR:
* There are no separate "providers" system tests - each provider has
its own system tests, and there is no common "generic" empty providers
system test
* Integration tests are moved to respective providers under the
`integration` package inside `tests` directory
* (nearly) empty __init__.py files are added in `tests` directories
of providers - this way "tests" becomes just a directory and the root
for all tests per provider, rather than a Python package on its own.
That allows using "from integration.PROVIDER import" and
"from system.PROVIDER" rather than importing them from the root of
the whole airflow project. "Nearly" because we need to
handle multiple "system", "system.apache" and other import locations.
* Removed references to "providers/" generic package which were
scheduled for removal after all providers are moved to the new
structure
* Few remaining references / links referring to old "providers/src" and
"providers/test" have been fixed.
* The "conftest.py" files in all providers are trimmed down - the code
to store ignored deprecation warnings has been moved to the
test_common pytest_plugin. That allows removing 90+ duplicated
snippets of deprecation_warnings retrieval while keeping the warnings
per-provider in the provider's distribution.
* The "moving_providers" scripts are removed. They've done their job and
are not needed any more - we keep them in history
* The __init__.py files are automatically checked and properly updated
in provider folders - in order to properly handle path extension
mechanisms
* The www tests that were using the FAB permission model are moved to the
FAB provider tests.
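The "path extension mechanisms" that those __init__.py files need to handle are the classic pkgutil-style namespace trick, which lets one package name (e.g. "system") span several provider directories. A self-contained demo under assumed throwaway paths (the directory names are made up for illustration):

```python
import os
import sys
import tempfile

# Build two separate roots that both contain a "system" package portion,
# mimicking system tests living in each provider's own tests directory.
root = tempfile.mkdtemp()
for sub, module in (("provider_a", "google"), ("provider_b", "apache")):
    pkg_dir = os.path.join(root, sub, "system")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        # pkgutil-style path extension: merge all "system" dirs found on sys.path
        f.write("__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n")
    with open(os.path.join(pkg_dir, module + ".py"), "w") as f:
        f.write(f"NAME = {module!r}\n")

sys.path[:0] = [os.path.join(root, "provider_a"), os.path.join(root, "provider_b")]

# Both portions are importable under the single "system" package name
from system import apache, google

print(google.NAME, apache.NAME)  # google apache
```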
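The trimmed conftest.py files delegate deprecation-warning bookkeeping to the shared pytest plugin. The plugin itself is Airflow-internal; this stdlib-only sketch just shows the general pattern of capturing DeprecationWarnings in one central helper instead of 90+ per-provider snippets (all names here are illustrative):

```python
import warnings


def run_collecting_deprecations(func):
    """Run func and return (result, list of DeprecationWarning messages) -
    a single central capture point replacing per-provider retrieval snippets."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", DeprecationWarning)
        result = func()
    messages = [
        str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages


def legacy_call():
    # stand-in for provider code that still hits a deprecated API
    warnings.warn("old API, use new_api() instead", DeprecationWarning)
    return 42


result, deprecations = run_collecting_deprecations(legacy_call)
print(result, deprecations)  # 42 ['old API, use new_api() instead']
```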
So far, cleaning the airflow installation only happened in canary runs,
and it caused some PRs to pass when they should have failed - for example
#45294 was green when it should have failed, because the uuid6 package was
not removed before installing the old version of Airflow.
Cleaning the airflow installation is fast with uv, so we should be
OK with running it always for compatibility tests.
* Update provider.yaml
* renaming files
* moving microsoft.azure provider files to new folder structure
* Fixing breeze unit test params
* Static checks fixes
* Remove ms azure ignore from pytest args
* Add pattern match to leave out ms azure providers
* Fix pytest args for test types
* Removed dead code
* Added pytest fixtures to conftest.py
* Added powerbi pytest fixture to conftest.py
* Fixes for failing tests
* Moved conftest.py
* Fix for failing compat and provider tests
* Move conftest.py to providers/microsoft/azure/tests/ and associated imports changes
* Path changes to read resource files correctly
* Removed microsoft from providers/tests
* fixup! Removed microsoft from providers/tests
---------
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
* Update provider schema to include excluded-python-versions
The provider schema previously supported this field, but it was dropped
during the provider migration project. Cloudant is the only provider, at
the time of writing, that makes use of this mechanism.
* Migrate Cloudant provider to new structure
* Updates from PR feedback
* Add wildcard for all patch releases of an excluded python version
If it is not already present, add a wildcard covering all patch releases
of an excluded Python version
* Fix install_airflow script to remove cloudant on 3.9
* Remove more spaces
* Remove the fix for version exclusions
Although functional version exclusion via requires-python works, many
things in our CI are not set up to support it. So go back to "soft",
non-functional exclusions. This will at least update the classifiers.
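The wildcard step above can be sketched as a tiny helper (the function name and exact expansion rule are assumptions about the intent, not the actual script):

```python
def expand_excluded_version(version: str) -> list[str]:
    """If an excluded Python version has no patch component, also add a
    wildcard covering all of its patch releases (e.g. '3.9' -> '3.9.*');
    versions that already pin a patch release are kept as-is."""
    if version.count(".") == 1:
        return [version, version + ".*"]
    return [version]


print(expand_excluded_version("3.9"))    # ['3.9', '3.9.*']
print(expand_excluded_version("3.9.1"))  # ['3.9.1']
```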
The change in #46358 moved Docker storage to another mounted directory - but
this directory and all files in it are owned by the host user. The directory
and all files inside should be owned by root in order to properly
reflect the permissions of the files when building Docker images.
The change is now simplified. Rather than passing the mount directory
as a variable through GitHub Actions, we hard-code the Docker storage
location in the cleanup_docker.sh script. We also incorporate changing
ownership and showing disk space in the same cleanup_docker.sh script,
and make sure that the script is only called in the "real" (not
composite) actions at the beginning - right after the repository is
checked out. Previously the script was also called in composite actions,
and making the repo writeable was done AFTER cleanup_docker.sh - which
would not work, as we want the /mnt directory to still be owned by the
host user while the Docker storage should still be owned by root.
* init run on google and fix mypy issues
* fix tests
* replace providers.tests.system.google with providers.google.tests.google
* add example dag's path for new structure
* fix paths in docs
* fix import path for failing test in test_dataprep_system
* fix DEFAULT_GCP_SYSTEM_TEST_PROJECT_ID import
* update google test paths
* replace providers/tests/google with providers/google/tests/provider_tests/google
* remove test_get_package_extras_for_old_providers
* fix test_folders_for_parallel_test_types test
* fix test_pytest_args_for_regular_test_types
* replace providers/tests/system/google/ with providers/google/tests/system/google
* update example_dirs in example_not_excluded_dags
* fix glob path in ExampleCoverageTest.example_paths
* fix providers_prefix
* fix serialization tests
* fix
* add filterwarnings to test_bigquery and test_kubernetes_engine modules
* add explicit info logging to test_dataflow.py
* add explicit info logging to test_datafusion.py
* add explicit info logging to failing caplog tests
* enable info logging
* add tmp.tar.gz
* refactor: Move microsoft providers (except azure) to new location
* refactor: Move test for microsoft providers to new location
* refactor: Move docs for microsoft providers to new location
* refactor: Created pyproject.toml for mssql
* refactor: Created pyproject.toml for psrp
* refactor: Created pyproject.toml for winrm
* refactor: fixed names in psrp and winrm pyproject.toml
* refactor: Added README.rst to microsoft mssql provider
* refactor: Removed CHANGELOG.rst from microsoft mssql provider
* refactor: Removed CHANGELOG.rst and added README.rst in microsoft psrp and winrm providers
* Revert "refactor: Removed CHANGELOG.rst and added README.rst in microsoft psrp and winrm providers"
This reverts commit a2037244a6.
* Revert "refactor: Removed CHANGELOG.rst from microsoft mssql provider"
This reverts commit 1263fa86e9.
* Revert "refactor: Added README.rst to microsoft mssql provider"
This reverts commit d75c9ff936.
* Revert "refactor: fixed names in psrp and winrm pyproject.toml"
This reverts commit 21807280ec.
* Revert "refactor: Created pyproject.toml for winrm"
This reverts commit b26b5888e9.
* Revert "refactor: Created pyproject.toml for psrp"
This reverts commit c815cb89f5.
* Revert "refactor: Created pyproject.toml for mssql"
This reverts commit 675f31e856.
* Revert "refactor: Move docs for microsoft providers to new location"
This reverts commit ae0dce6ad3.
* Revert "refactor: Move test for microsoft providers to new location"
This reverts commit 9a5e4f6e2e.
* Revert "refactor: Move microsoft providers (except azure) to new location"
This reverts commit c95f04168d.
* refactor: Moved microsoft mssql provider
* refactor: Moved microsoft psrp provider
* refactor: Fixed load of replace.sql for assertion in test_generate_insert_sql
* refactor: Changed name for filename in load_file method
* refactor: Reorganized imports
---------
Co-authored-by: David Blain <david.blain@infrabel.be>
* ci(github-actions): add script to check significant newsfragments
* docs(newsfragments): fix incorrect format
* docs(newsfragments): upgrade api-66.significant format
* ci(pre-commit): add executable (x) mode to significant_newsfragments_checker.py
* Triage disk space issues, DO NOT MERGE
* Move docker storage to second drive in general
* Cleanup debug and triaging stuff
* Exception of docker volume for constraints building