While there is an ongoing discussion on whether we should use
shorter, machine-readable licence specifications in our source
code headers [1], the current understanding is that:
a) The PMC can make judgment calls on when to include different
versions of the licence
b) This expectation only applies to the code we actually release
in our official releases.
This change makes a judgment call to use much shorter, SPDX-based
licence headers in some specific files:
* markdown files that are intended to be consumed by agents
(AGENTS.md, SKILLS.md, CLAUDE.md and so on)
* all the markdown .github/* files that are clearly meta-data for
GitHub and which we exclude from released sources
We also make sure all those files are excluded from the official
source releases and distribution packages we prepare.
[1] https://lists.apache.org/thread/j1tn63r2lf13v3d1tnnqff8fkcl4nx53
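To illustrate what a shorter, SPDX-based header on a markdown file could look like, here is a minimal Python sketch. The exact header text (`SPDX_HEADER`) and both helper names are assumptions made for illustration and are not taken from this change; only the `SPDX-License-Identifier` tag and the `Apache-2.0` identifier are standard SPDX.

```python
# Assumed header text - the header actually used by this change may differ.
SPDX_HEADER = "<!-- SPDX-License-Identifier: Apache-2.0 -->"


def has_spdx_header(text: str) -> bool:
    """Return True if the first non-empty line carries an SPDX identifier."""
    for line in text.splitlines():
        if line.strip():
            return "SPDX-License-Identifier:" in line
    return False


def ensure_spdx_header(text: str) -> str:
    """Prepend the short header unless the document already starts with one."""
    if has_spdx_header(text):
        return text
    return f"{SPDX_HEADER}\n\n{text}"
```

Calling `ensure_spdx_header` twice on the same text leaves it unchanged after the first call, so a helper like this could run from a pre-commit style check.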
A lazy consensus decision has been made on the devlist to switch
entirely to `uv` as the development tool:
link: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256
This PR implements that decision and removes a lot of baggage connected
to using `pip` in addition to `uv` to install and sync the environment.
It also makes the way distribution packages are used in airflow
sources more consistent - basically switching all internal
distributions to the `pyproject.toml` approach and linking them all
together via `uv`'s workspace feature.
This enables much more streamlined development workflows, where any
part of airflow development is manageable using `uv sync` in the right
distribution - opening the way to moving more of the "sub-workflows"
from the CI image to the local virtualenv environment.
Unfortunately, such a change cannot really be done incrementally, because
any change in the project layout drags with it a lot of changes
in the test/CI/management scripts, so we have to implement one big
PR covering the move.
This PR is "safe" in terms of the airflow and providers' code - it
does not **really** change Airflow code (except occasional import and
type hint changes resulting from better isolation of packages), nor
should it affect any airflow or provider code, because it does
not move any of the folders where airflow or provider code is located.
It does move the test code - in a number of "auxiliary" distributions
we have. It also moves the `docs` generation code to `devel-common`
and introduces separate conf.py files for every doc package.
What is still NOT done after that move and will be covered in the
follow-up changes:
* isolating docs-building to have separate configuration for docs
building per distribution - allowing to run doc build locally
with its own conf.py file
* moving some of the tests and checks out from breeze container
image up to the local environment (for example mypy checks) and
likely isolating them per-provider
* Constraints are still generated using `pip freeze` and automatically
managed by our custom scripts in `canary` builds - this will be
replaced later by switching to `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be
considered as a follow-up.
* PROD image is still built with `pip` by default when using
`PyPI` or distribution packages - but we do not support building
the source image with `pip` - when building from sources, uv
is forced internally to install packages. Currently we have
no plans to change default PROD building to use `uv`.
This is the detailed list of changes implemented in this PR:
* uv is now a mandatory pre-requisite to install in order to
develop airflow. We do not support installing airflow for
development with `pip` - there will be a lot of cases where
it will not work for development - including development
dependencies and installing several distributions together.
* removed meta-package `hatch_build.py` and replaced it with
pre-commit automatically modifying declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom
git and asset build hooks (renaming the file to `hatch_build.py`)
and moved all airflow dependencies to `pyproject.toml`
* converted "loose" packages in airflow repo into distributions:
* docker-tests
* kubernetes-tests
* helm-tests
* dev (here we do not have `src` subfolder - sources are directly
in the distribution, which is for-now inconsistent with other
distributions).
The `_tests` distribution folders have been renamed to
the `-tests` convention to make sure the imports always
refer to the base of each distribution and are not used from the
content root.
* Each of the distributions (on top of the already existing airflow-core,
task-sdk, devel-common and 90+ providers) has its own set of
dependencies, and the top-level meta-package workspace root brings
those distributions together, allowing to install them all together
with a simple `uv sync --all-packages` command and come up with
a consistent set of dependencies that are good for all those
packages (yay!). This is used to build the CI image with a single
common environment to run the tests (with some quirks due to
constraints use where we have to manually list all distributions
until we switch to `uv.lock` mechanism)
* `doc` code is moved to `devel-common` distribution. The `doc` folder
only keeps README informing where the other doc code is, the
spelling_wordlist.txt and start_docs_server.sh. The documentation is
generated in `generated/generated-docs/` folder which is entirely
.gitignored.
* the documentation is now fully moved to:
* `airflow-core/docs` - documentation for Airflow Core
* `providers/**/docs` - documentation for Providers
* `chart/docs` - documentation for Helm Chart
* `task-sdk/docs` - documentation for Task SDK (new format not yet published)
* `docker-stack-docs` - documentation for Docker Stack
* `providers-summary-docs` - documentation for provider summary page
* `versions` are no longer dynamically retrieved from `__init__.py` -
all of them are synchronized directly to pyproject.toml files. This
way - except for the custom build hook - we have no dynamic components
in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places,
the only references to extras remain in the user documentation - we
stop using extras for local development and switch to using
dependency groups.
* backtracking command was removed from breeze - we did not need it
since we started using `uv`
* internal commands (except constraint generation) have been moved to
`uv` from `pip`
* breeze requires `uv` to be installed and expects to be installed with
`uv tool install -e ./dev/breeze`
* pyproject.toml files are modified on the fly when we add a version
suffix (`--version-suffix-for-pypi`) - only for the
time of building the versions with the updated suffix
* `mypy` checks are now consistently used across all the different
distributions. For consistency (and to fix some of the issues
with namespace packages), rather than using the "folder" approach
when running mypy checks, we run the check on individual files
even if we run mypy for a whole distribution. That adds
consistency in execution of mypy heuristics.
Rather than using an in-container mypy script, all the logic of
selection and parameters passed to mypy is in pre-commit code.
For now we are still using the CI image to run mypy because mypy is
very sensitive to the versions of dependencies installed; we should
be able to switch to running mypy locally once we have the
`uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across
all the distributions. With `uv sync` and dependabot, those
should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in
`devel-common` extras - including `basic`, `doc`, `doc-gen`, and
`all` which will make it easier to install them for some OS-es
(basic is used as the default set, covering the most
common development dependencies)
* generated/provider_dependencies.json is not committed to the
repository any longer. It is .gitignored and generated
on-the-fly as needed (breeze will generate it automatically
when empty and pre-commit will always regenerate it to be
consistent with providers' pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common`
as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment
rather than creating our own `.build` environment and we use
`uv sync` to keep it in sync
* Updated `uv` version to 0.6.10
* We are using `uv sync` to perform "upgrade to newer dependencies"
in `canary` builds and locally
* leveldb has been turned into a "dependency group" and removed from
apache-airflow and apache-airflow-core extras; it is now only
available via the google provider's leveldb optional extra when
installing with `pip`
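The bullet above about `--version-suffix-for-pypi` says that pyproject.toml files are modified only for the duration of the build. A minimal Python sketch of that "patch, build, restore" pattern follows. It assumes a static `version = "..."` line in the file, and the names `temporary_version_suffix` and `VERSION_RE` are hypothetical - this is an illustration and not the code used by breeze.

```python
import re
from contextlib import contextmanager
from pathlib import Path

# Matches the first static `version = "..."` line (an assumption about the file layout).
VERSION_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)


@contextmanager
def temporary_version_suffix(pyproject: Path, suffix: str):
    """Append `suffix` to the version inside the block, then restore the file."""
    original = pyproject.read_text()
    match = VERSION_RE.search(original)
    if match is None:
        raise ValueError(f"No static version found in {pyproject}")
    new_version = match.group(1) + suffix
    # A callable replacement avoids any escaping surprises in the suffix.
    patched = VERSION_RE.sub(lambda _: f'version = "{new_version}"', original, count=1)
    pyproject.write_text(patched)
    try:
        yield new_version
    finally:
        # Always put the committed content back, even if the build fails.
        pyproject.write_text(original)
```

A build step would then run inside `with temporary_version_suffix(path, "rc1"):`, so the working tree is back to its committed state once the packages are built.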
This PR moves our CI documentation into the Breeze
doc folder and splits it into separate docs / chapters,
following similar changes done for Breeze docs #36936 and the
contributing docs #36969.
Following #36936 and the fact that GitHub stopped rendering big .rst
files, we also split CONTRIBUTING.rst into multiple files. It will be
much easier to follow and it will render in GitHub.
Following #36726, #36744, #36763, #36819 this PR makes the source
tarball that we release as an official ASF release
for the Helm Chart reproducible. This means that
anyone should be able to produce such a tarball using the sources
of airflow and verify that the tarball pushed to SVN by the
release manager is built from our source repositories.
We also do the same with the Helm package. It turns out that gpg signing
of the package does not modify the .tgz file - it just adds a .prov file
containing checksum and signature, so we can safely re-pack the .tar.gz
package in a reproducible way. This way we have both reproducibility and
provenance check nicely working together.
There are a few related changes in this PR:
* Bumped Helm version in our environment to use the latest one and
using the `breeze k8s setup-env` environment to run all the release
commands - this way we can be sure same helm version is used to build
the package, further making it more reproducible.
* The reproducible packaging utility we have has been refactored -
we now take the "source" archive as a parameter rather than a directory
and simply repack it in a reproducible way.
* The tool also applies group/other ownership removal on its own,
because helm package has no option to umask the generated files.
* In this change we also exclude subcharts from the exported source
tarball package as we should not include source files from postgres in
our source package.
* Both the tarball and the helm package are generated in the `dist` folder,
similarly to all our other packages.
* Documentation for releasing the packages and verifying them is updated.
* CI jobs are updated to use the new commands and generated packages are
produced as artifacts so that we can be sure the commands continue
working and produce the right output.
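The repacking described in the list above can be sketched in Python roughly as follows. This is an illustrative sketch and not the project's actual utility: the function names, the fixed timestamp of `0` and the choice to clear group/other write bits (our reading of the "umask" remark) are all assumptions.

```python
import gzip
import tarfile
from pathlib import Path

FIXED_MTIME = 0  # assumed fixed timestamp; a real tool might use SOURCE_DATE_EPOCH


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip metadata that varies between machines and builds."""
    info.mtime = FIXED_MTIME
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    # Drop group/other write bits, mimicking a 022 umask (assumption).
    info.mode &= ~0o022
    return info


def repack_reproducibly(source: Path, dest: Path) -> None:
    """Rewrite a .tar.gz archive so identical content yields identical bytes."""
    with tarfile.open(source, "r:gz") as src:
        # Sorting by name removes any dependence on the original member order.
        members = sorted(src.getmembers(), key=lambda m: m.name)
        # mtime=0 and an empty filename keep the gzip header constant.
        with open(dest, "wb") as raw, gzip.GzipFile(
            filename="", fileobj=raw, mode="wb", mtime=0
        ) as gz, tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as out:
            for member in members:
                fileobj = src.extractfile(member) if member.isfile() else None
                out.addfile(_normalize(member), fileobj)
```

Two archives with the same file content but different timestamps, owners or member order should come out byte-for-byte identical after repacking, which is what makes the checksum comparison against the released artifact meaningful.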
The BREEZE.rst document became enormous - enough for it to stop being
rendered by GitHub. This change splits it into multiple smaller
documents - each focusing on a specific aspect of Breeze, making it
possible for the user to focus only on the aspect they are
interested in.
We also add a nice index that guides the user through all the
aspects of Breeze.
This PR converts the Helm tests to use the new Python Breeze.
It has all the features of the previous Kind breeze command and more.
All the commands are now grouped under the k8s group and they are very
easy to use locally (even easier than the previous version).
The CI part is also converted and simplified - i.e. the upgrade
test is now much faster (it only tests one upgrade per job and it
runs within the original Helm/Kubernetes tests jobs, so it will
not have the cluster creation overhead).
Most importantly - this is almost the last step before we can get
rid of the old legacy breeze code, and the one that lets us get rid of
the `./breeze-legacy` script, because all functionality from the
old breeze has been moved to the Python version with this change.
This removal allows us also to remove a lot of the common library
bash code that is not used any more anywhere - even in CI.
The only change left is running regular tests in parallel.
Closes: #23085
The labelling workflow has proven to be far less useful than we
thought and some of the recent changes in selective checks made
it largely obsolete. The committers can still add the "full tests needed"
label when they think it is needed and there is no need to label
the PRs automatically for that (or any other reason).
For quite a while this workflow has basically been useless noise.