* allocating pseudo-terminal inside the python script creating
the images instead of trying to do it by docker compose run
* better diagnostics in case of error (verbosity handling)
* properly allocating the console by forcing pseudo-terminal creation
inside the container (via `enable-tty.yaml`) when the --tty option is
used with breeze shell
* upgrading prek + uv to latest versions
* a bit of refactoring of how the docker-compose files are referred to
* Console in the script also uses pseudo-terminal
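
As an illustration of the pseudo-terminal allocation mentioned above,
here is a minimal sketch (not the actual breeze code) of running a
command from a Python script with a pseudo-terminal attached. The helper
name `run_with_pty` is hypothetical, and the sketch assumes a POSIX
system where the standard-library `pty` module is available.

```python
from __future__ import annotations

import os
import pty
import subprocess
import sys


def run_with_pty(command: list[str]) -> tuple[int, bytes]:
    """Run `command` attached to a pseudo-terminal; return (exit code, output).

    Hypothetical helper for illustration only, not part of breeze.
    """
    master_fd, slave_fd = pty.openpty()
    try:
        try:
            # The child sees a real terminal on stdin, stdout and stderr.
            process = subprocess.Popen(
                command, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd
            )
        finally:
            # The parent only needs the master side of the pair.
            os.close(slave_fd)
        chunks = []
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                # Linux reports EIO once the child has closed the slave side.
                break
            if not data:
                break
            chunks.append(data)
        return process.wait(), b"".join(chunks)
    finally:
        os.close(master_fd)


if __name__ == "__main__":
    check = "import sys; print(sys.stdout.isatty())"
    print(run_with_pty([sys.executable, "-c", check]))
```

The child process reports that its stdout is a terminal, which is the
property tools need in order to enable colors and interactive output.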
- Update uv from 0.9.4 to 0.9.5
- Update ruff from 0.14.1 to 0.14.2
- Update mypy to 1.18.2
- Update Python to 3.12.12
- Update various other dependencies
Apparently the prek cache mechanism has been somewhat broken for a
while - after we split prek to monorepo. The hash files used to
determine the prek cache key were different for the save and restore
steps (the `**/` was missing in the save cache step), which means
that we always failed to restore the cache and created it from
scratch.
Also, it seems that the prek cache, when prepared, refers to the uv
version that prek pre-installs for itself in case uv is not installed
in the system, and it uses that uv version when creating the
virtual environments used by prek. We first attempted to
install prek and create the cache, and only afterwards installed uv,
which had the side effect that in some cases the installed venvs
referred to a missing python binary.
Finally - there is a bug in prek (https://github.com/j178/prek/issues/918):
the pygrep cache contains a reference to a non-existing python binary
that should be run when pygrep runs.
Also it's possible that some of the cache installed in workspace by the
github worker remained, and we did not preemptively clean the cache when
we attempted to restore it and failed.
This PR attempts to restore the cache usage in a more robust way:
* fixed cache key on save to save cache with proper name
* added uv version to cache key for prek
* always install uv in desired version before installing prek
* if we fail to get a cache hit and restore the cache, we clean up
the .cache/prek folder
* we do not look at skipped hooks when installing prek and restoring
or saving cache. There is very little saving on some hooks and
since we are preparing the cache in "build-info" now - it's better
to always use the same cache, no matter if some checks are skipped
* upgraded to prek 0.2.10 that fixed the issue with pygrep cache
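
The cache-key part of the list above can be sketched as follows. This is
a simplified model, not the actual workflow code: the function names,
the key format and the `.pre-commit-config.yaml` file name are
assumptions made for illustration. It shows why a missing `**/` makes
the save and restore keys differ, how the uv version becomes part of the
key, and the clean-up on a cache miss.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def prek_cache_key(
    root: Path,
    uv_version: str,
    pattern: str = "**/.pre-commit-config.yaml",  # assumed config file name
) -> str:
    """Build a cache key from the uv version and a hash of the matched files."""
    digest = hashlib.sha256()
    for path in sorted(root.glob(pattern)):
        # Hash both the relative path and the content of every matched file.
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return f"prek-uv{uv_version}-{digest.hexdigest()[:16]}"


def clean_cache_on_miss(cache_dir: Path, cache_hit: bool) -> None:
    """Remove leftovers from the cache folder when the restore step missed."""
    if not cache_hit:
        shutil.rmtree(cache_dir, ignore_errors=True)
```

With config files in more than one directory, the default pattern and a
pattern without the `**/` prefix hash different file sets, so a key
saved with one can never be found by a restore using the other.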
You can specify a commit hash to apply when you build documentation.
This allows regenerating a past version of the documentation by
checking out the exact version tag that was used back then and
applying the commit hash with fixes.
This might help in fixing issues like
https://github.com/apache/airflow/issues/53646
It turns out that when installing Breeze we were using
the "image" Python version and not the "default" Python version
to install breeze, and Python 3.12 and 3.11 are not installed by
default when generate-constraints runs.
This change fixes the problem and also changes the name of the
generate-constraints job to only show the Python version used.
This is a continuation of the separation of the Airflow codebase into
separate distributions. This one splits airflow into two of them:
* apache-airflow - becomes an empty, meta no-code distribution that
only has dependencies on the apache-airflow-core and task-sdk
distributions, and it has preinstalled provider distributions
added in the standard "wheel" distribution. All "extras" lead
either to "apache-airflow-core" extras or to providers. The
dependencies and optional dependencies are calculated differently
depending on "editable" or "standard" mode: in editable mode,
just provider dependencies are installed for preinstalled providers;
in standard mode, those preinstalled providers are dependencies.
* the apache-airflow-core distribution contains all airflow core
sources (previously in apache-airflow) and it has no provider
extras. Thanks to that apache-airflow distribution does not
have any dynamically calculated dependencies.
* the apache-airflow-core distribution has "hatch_build_airflow_core.py"
build hooks that add custom build target and implement custom
cleanup in order to implement compiling assets as part of the build.
* During the move, the following changes were applied for consistency:
* packages, when used in the context of distribution packages, have
been renamed to "distributions" - including all documentation and
commands in breeze - to avoid confusion with import packages
(see
https://packaging.python.org/en/latest/discussions/distribution-package-vs-import-package/)
* all tests in `airflow-core` now follow the same convention
where tests are in `unit`, `system` and `integration` packages.
No extra package has been added as a second level, because all the
provider tests have "<PROVIDER>" there, so we just have to avoid
naming an airflow unit."<PROVIDER>" package with the same name as a
provider.
* all tooling in CI/DEV has been updated to follow the new
structure. We should always build two packages now when we
are building them using `breeze`.
While this config worked to set up uv/no-uv, it has never been
really used. Instead in all places where `use-uv` parameter
is used, the option determines the default rather than config
on breeze setup level.
We are using various caches in our build and so far - due to the
way "standard" caching works - PRs from forks could not effectively
use the cache from the main Airflow repository, because caches are
not shared with other repositories. PR builds could only
use the cache effectively when they were rebased and continued running
from the same fork.
This PR improves caching strategy using "stash" action from the ASF.
Unlike `cache` - the action uses artifacts to store cache, and that
makes it possible for the stash action to use such cache uploaded from
`main` canary builds in PRs coming from the fork.
As part of this change all the places where setup-python was used
and breeze installed afterwards were reviewed and updated to use
only breeze installation action (it already installs python) and this
action has been improved to use UV caching effectively.
Overall this PR should decrease setup overhead for many jobs across
the CI workflow.
Follow-up after #45266
Using cache for breeze might cause various issues and it does not
really speed up the installation that significantly (installing
breeze takes about 20 seconds and restoring the cache and checking if
breeze is installed there takes ~8 seconds, so we are saving some 10
seconds per build).
Removing the cache will make breeze always run in a clean state and
also reduces the potential for cache-poisoning issues.
Since the cache is shared among multiple workflows and runs, that is
also a far safer option from a security point of view.
The `breeze-python-version` has been used in a number of places
where we wanted to make sure of reproducibility of prepared
artifacts (because Python 3.8 produces different tar files).
However that caused some problems with caching of breeze
environments - where some of them were using Python 3.8 and some
were using Python 3.9.
This PR fixes it by:
* making Python 3.9 default for all breeze installations in CI
* removing the breeze-python-version parameter and all its usages
from all workflows
The tests of breeze (in basic-tests.yml) are still run with the
default-python-version (which currently is Python 3.8), not with
Python 3.9 - so the risk that breeze will not work on Python
3.8 is low.
This will also synchronize itself in October, when Python 3.8
reaches end-of-life and we move to Python 3.9 as the default
version.
When building reproducible packages with Python 3.8 they are ...
not reproducible. The tarfile produces slightly different output
and packages are not binary identical.
This change forces anyone preparing reproducible packages to have
breeze installed using Python 3.9+.
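
A version guard of the kind described above might look like the sketch
below. The function name, the error type and the message are
hypothetical; only the Python 3.9 minimum comes from the text.

```python
from __future__ import annotations

import sys

# Minimum interpreter version for reproducible packages (from the text above).
MIN_REPRODUCIBLE_PYTHON = (3, 9)


def ensure_reproducible_python(version: tuple[int, int] | None = None) -> tuple[int, int]:
    """Fail early when the interpreter is too old for reproducible builds.

    Hypothetical helper: `version` defaults to the running interpreter and
    is a parameter only to make the check easy to exercise.
    """
    current = tuple(version) if version is not None else tuple(sys.version_info[:2])
    if current < MIN_REPRODUCIBLE_PYTHON:
        required = ".".join(str(part) for part in MIN_REPRODUCIBLE_PYTHON)
        found = ".".join(str(part) for part in current)
        raise RuntimeError(
            f"Reproducible packages require Python {required}+, found {found}."
        )
    return current
```

Running the check before any archive is written means an unsupported
interpreter stops the build instead of producing non-identical packages.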
In some circumstances, when breeze is installed in CI (when we
update to newer breeze version in "build-info" workflow in old
branches) breeze is not able to auto-detect sources it was installed
from.
This PR changes it by passing the sources via an environment variable.
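
The general pattern of preferring an explicitly passed location over
auto-detection can be sketched as below. This is not the breeze
implementation: the variable name `EXAMPLE_SOURCES_ROOT`, the function
name and the use of `pyproject.toml` as the detection marker are all
assumptions made for illustration.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

# Hypothetical variable name; the real one is not given in the text above.
SOURCES_ENV_VAR = "EXAMPLE_SOURCES_ROOT"


def find_sources_root(start: Path, environ: Mapping[str, str] | None = None) -> Path | None:
    """Return the sources root, preferring the environment variable."""
    env = os.environ if environ is None else environ
    explicit = env.get(SOURCES_ENV_VAR)
    if explicit:
        # An explicitly passed location wins over any auto-detection.
        return Path(explicit)
    # Fallback: walk up from `start` looking for a marker file (assumed).
    for candidate in [start, *start.parents]:
        if (candidate / "pyproject.toml").is_file():
            return candidate
    return None
```

When auto-detection cannot work, for example because the tool was
installed from a location that is no longer on disk, the environment
variable still gives a deterministic answer.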