apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Converts Dockerfiles to be standalone (#22492)

This change is one of the biggest optimizations to the Dockerfiles. It was a goal from the very beginning, but it has only been enabled by switching to buildkit and the recent release of support for the 1.4 dockerfile syntax. This syntax introduced two features:

* heredocs
* links for COPY commands

Both changes allow us to solve multiple problems:

* COPY for build scripts suffers from permission problems. Depending on the umask setting of the host, the scripts could have different group permissions and invalidate the docker cache. Inlining the scripts (automatically by pre-commit) gets rid of the problem completely.
* COPY --link allows us to optimize and parallelize builds for the source code embedded in Dockerfile.ci. This should speed up building the images locally, and it will also allow CI builds to use the cache more efficiently (when there is no source code change, the builds will use pre-cached layers from the cache more efficiently and in parallel).
* The PROD Dockerfile is now completely standalone. You do not need to have any folders or files to build the Airflow image. At the same time, the versatility and support for multiple ways of building the image (as described in https://airflow.apache.org/docs/docker-stack/build.html) is maintained. This was a goal from the very beginning of the PROD Dockerfile, but it was not easily achievable. Heredocs allow us to inline scripts that are used for the build, and the pre-commits make sure that there is one source of truth and nicely editable scripts for both the PROD and CI Dockerfiles.

The last point is really cool, because it allows our users to build custom Dockerfiles without checking out the code of Airflow. It is enough to download the latest released Dockerfile and they can easily build the image.

Overall, this change will vastly optimize build speed for both PROD and CI images in multiple scenarios.
2022-03-27 19:19:02 +02:00
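The umask problem described in the commit message can be reproduced outside of docker. The following is a minimal sketch, not part of the Airflow build scripts: `mode_under_umask` is a hypothetical helper that creates a script file under a given umask and prints its permission string. Two hosts with different umasks end up with different group bits for the same script, and since file mode is part of what docker compares for a `COPY` layer, that difference is enough to miss the cache.

```shell
#!/bin/sh
# Hypothetical helper (not from the Airflow repo): show the permissions a
# freshly created script ends up with under a given umask.
# The function body is a subshell, so the umask change does not leak out.
mode_under_umask() (
  mask="$1"
  dir="$(mktemp -d)"
  umask "$mask"
  touch "$dir/script.sh"        # created as 0666 minus the umask bits
  chmod u+x "$dir/script.sh"    # make it executable for the owner only
  ls -l "$dir/script.sh" | cut -c1-10
  rm -rf "$dir"
)

# Same file, same content, two common host umasks:
mode_under_umask 022
mode_under_umask 002
```

Inlining the script with a heredoc (`COPY <<"EOF" /script.sh`) sidesteps this, because the file no longer comes from the host filesystem at all.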
# syntax=docker/dockerfile:1.4
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# THIS DOCKERFILE IS INTENDED FOR PRODUCTION USE AND DEPLOYMENT.
# NOTE! IT IS ALPHA-QUALITY FOR NOW - WE ARE IN A PROCESS OF TESTING IT
#
#
# This is a multi-segmented image. It actually contains two images:
#
# airflow-build-image - there all airflow dependencies can be installed (and
# built - for those dependencies that require
# build essentials). Airflow is installed there with
Switch from --user to venv for PROD image and enable uv (#37796)

This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems that the `--user` installation created for us in the past, and does so in a fully backwards compatible way. This improves "production" use for end users as well as local iteration on the PROD image during tests, and also CI.

Improvements for "end user":

* users do not have to use `pip install --user` to install new packages any more, and it is not enabled by default with the PIP_USER flag.
* users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did)
* users can use `uv` to build a custom production image, which gives a 40%-50% saving in image build time compared to `pip`.
* python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images.

Improvements for development:

* when the image is built from sources (no --use-docker-context-files is specified), airflow is installed in --editable mode, which means that airflow and all providers are installed locally from airflow sources, not from packages, so both airflow and providers have the latest version inside the prod image.
* when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes, because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change.
* By default `uv` is used for local source builds for k8s tests, so even if you have to rebuild the image, it is way faster (60%-80%) while iterating with it.

CI/DEV tooling improvements:

* this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check that the image still builds with `pip`.
* we also switch to a more PEP-standard way of installing packages from the local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815

Update contributing-docs/testing/k8s_tests.rst
Update docs/docker-stack/build.rst
Update scripts/docker/install_airflow.sh
Update docs/docker-stack/changelog.rst

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
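The `package-name @ file:///FILE` form mentioned in the commit message is a direct reference requirement. As a rough sketch of how such a requirement could be derived from a wheel on disk, assuming an absolute path and a standard wheel file name: `wheel_to_direct_reference` and the example path are hypothetical and are not taken from `scripts/docker/install_airflow.sh`, which may do this differently.

```shell
#!/bin/sh
# Hypothetical helper: turn an absolute wheel path into a
# "name @ file:///path" direct reference that pip and uv both accept.
wheel_to_direct_reference() (
  path="$1"
  file="$(basename "$path")"
  # Wheel file names start with the distribution name, followed by
  # "-<version>-...". Underscores in that name map to hyphens here
  # (a simplification of full name normalization).
  name="$(printf '%s' "${file%%-*}" | tr '_' '-')"
  printf '%s @ file://%s\n' "$name" "$path"
)

# Example with an assumed path:
wheel_to_direct_reference /docker-context-files/apache_airflow-2.9.0-py3-none-any.whl
```

The resulting string can be passed to `pip install` or `uv pip install` like any other requirement.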
# ${HOME}/.local virtualenv which is also considered
# as --user folder by python when creating venv with
# --system-site-packages
#
# main - this is the actual production image that is much
# smaller because it does not contain all the build
# essentials. Instead the ${HOME}/.local folder
# is copied from the build-image - this way we have
# only result of installation and we do not need
# all the build essentials. This makes the image
# much smaller.
#
Switch to 'buildkit' to build Airflow images (#20664)

"buildkit" is a much more modern docker build mechanism. It supports multi-architecture builds, which makes it suitable for our future ARM support. It also has a nicer UI, much more sophisticated caching mechanisms, and better support for multi-segment builds.

BuildKit has been promoted to official for quite a while and it is rather stable now. Also, we can now install the BuildKit plugin for docker, which adds capabilities for building and managing cache using dedicated builders (previously BuildKit cache was managed using rather complex external tools).

This gives us an opportunity to vastly simplify our build scripts, because BuildKit has a much more robust caching mechanism than the old docker build (which forced us to pull images before using them as cache). We had a lot of complexity involved in efficient caching, but with BuildKit all that can be vastly simplified and we can get rid of:

* keeping base python images in our registry
* keeping build segments for prod image in our registry
* keeping manifest images in our registry
* deciding when to pull or pull&build image (not needed now, we can always build the image with --cache-from and buildkit will pull cached layers as needed)
* building the image when performing pre-commit (rather than that, we simply encourage users to rebuild the image via the breeze command)
* pulling the images before building
* separate 'build' cache kept in our registry (not needed any more, as buildkit allows keeping the cache for all segments of a multi-segmented build in a single cache)
* the manual spinner (the nice animated tty UI of buildkit eliminates the need for it)
* and a number of other complexities.

Depends on #20238
2022-01-18 22:59:30 +01:00
# Use the same builder frontend version for everyone
ARG AIRFLOW_EXTRAS="aiobotocore,amazon,async,celery,cncf-kubernetes,common-io,common-messaging,docker,elasticsearch,fab,ftp,git,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure,mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake,ssh,statsd,uv"
ARG ADDITIONAL_AIRFLOW_EXTRAS=""
ARG ADDITIONAL_PYTHON_DEPS=""
ARG AIRFLOW_HOME=/opt/airflow
Simplify tooling by switching completely to uv (#48223)

The lazy consensus decision has been made at the devlist to switch entirely to `uv` as the development tool: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256

This PR implements that decision and removes a lot of baggage connected to using `pip` in addition to uv to install and sync the environment. It also introduces more consistency in the way distribution packages are used in airflow sources - basically switching all internal distributions to use the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to a local virtualenv environment.

Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move.

This PR is "safe" in terms of the airflow and providers' code - it does not **really** change Airflow code (except occasional imports and type hint changes resulting from better isolation of packages), nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider code is modified. It does move the test code - in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.

What is still NOT done after that move and will be covered in follow-up changes:

* isolating docs-building to have separate configuration for docs building per distribution - allowing a doc build to run locally with its own conf.py file
* moving some of the tests and checks out from the breeze container image up to the local environment (for example mypy checks) and likely isolating them per-provider
* Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to the `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up.
* The PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`.

This is the detailed list of changes implemented in this PR:

* uv is now a mandatory pre-requisite in order to develop airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together.
* removed the meta-package `hatch_build.py` and replaced it with a pre-commit automatically modifying the declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (renaming the file to `hatch_build.py`) and moved all airflow dependencies to `pyproject.toml`
* converted "loose" packages in the airflow repo into distributions:
  * docker-tests
  * kubernetes-tests
  * helm-tests
  * dev (here we do not have a `src` subfolder - sources are directly in the distribution, which is for now inconsistent with other distributions).

  The `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports always refer to the base of each distribution and are not used from the content root.
* Each of the distributions (on top of the already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together, allowing them all to be installed together with a simple `uv sync --all-packages` command, coming up with a consistent set of dependencies that are good for all those packages (yay!). This is used to build the CI image with a single common environment to run the tests (with some quirks due to constraints use, where we have to manually list all distributions until we switch to the `uv.lock` mechanism)
* `doc` code is moved to the `devel-common` distribution. The `doc` folder only keeps a README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in the `generated/generated-docs/` folder, which is entirely .gitignored.
* the documentation is now fully moved to:
  * `airflow-core/docs` - documentation for Airflow Core
  * `providers/**/docs` - documentation for Providers
  * `chart/docs` - documentation for Helm Chart
  * `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
  * `providers-summary-docs` - documentation for provider summary page
* `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to pyproject.toml files - this way - except the custom build hook - we have no dynamic components in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places; the only references to extras remain in the user documentation - we stop using extras for local development and switch to using dependency groups.
* the backtracking command was removed from breeze - we have not needed it since we started using `uv`
* internal commands (except constraint generation) have been moved to `uv` from `pip`
* breeze requires `uv` to be installed and expects to be installed by `uv tool install -e ./dev/breeze`
* pyproject.tomls are dynamically modified when we add a version suffix dynamically (`--version-suffix-for-pypi`) - only for the time of building the versions with the updated suffix
* `mypy` checks are now consistently used across all the different distributions. For consistency (and to fix some of the issues with namespace packages), rather than using the "folder" approach when running mypy checks, even if we run mypy for a whole distribution, we run the check on individual files rather than on a folder. That adds consistency in execution of mypy heuristics. Rather than using an in-container mypy script, all the logic of selection and parameters passed to mypy is in pre-commit code. For now we are still using the CI image to run mypy, because mypy is very sensitive to the version of dependencies installed; we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` - which will make it easier to install them for some OSes (basic is used as the default set of dependencies, covering the most common development dependencies)
* generated/provider_dependencies.json is not committed to the repository any longer. It is .gitignored and generated on the fly as needed (breeze will generate it automatically when empty and pre-commit will always regenerate it to be consistent with the providers' pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common`, as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment, and we use `uv sync` to keep it in sync
* Updated `uv` version to 0.6.10
* We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally
* leveldb has been turned into a "dependency group" and removed from apache-airflow and apache-airflow-core extras; it is now only available through the google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
ARG AIRFLOW_IMAGE_TYPE="prod"
ARG AIRFLOW_UID="50000"
ARG AIRFLOW_USER_HOME_DIR=/home/airflow
# latest released version here
ARG AIRFLOW_VERSION="3.1.8"
Migrated prod builds to use python built from source (#53770)

Moved the prod build to also use the source-built python, as done by CI. This lets us iterate faster on our python versions without needing to wait for the community image of python.

This involves a number of changes:

* 3.0.5 -> 3.0.6 airflow version change
* use released official packages rather than git repo to install Python
* added wget (it was added in the original image to pull python packages, so added for compatibility)
* added those flags to python build (same as in the original build):
  * --with-ensurepip --build="$gnuArch"
  * --enable-loadable-sqlite-extension
  * --enable-option-checking=fatal
  * --enable-shared
  * --with-lto
* added cleanup of apt after installing packages
* added removal of .pyc/.test etc. files (saves 350 MB)
* added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image.
* we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the Python official image had a very old setuptools version.
* checked all "customize" scripts and made them "work"
* Updated the "version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere
* Updated Changelog and documentation to describe the new ways of building images
* removed installation with GitHub URL (it won't work easily after splitting to multiple packages) and it's not needed
* all python, pip and similar links are created in /usr/python/bin
* /usr/python/bin is always first in the PATH - before /usr/local/bin
* added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local
* removal of installed editable distributions in breeze now happens first and THEN we install when --use-airflow-version is used.
* LD_LIBRARY_PATH was not set, so the shared python libraries could not be loaded when venv was created

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
ARG BASE_IMAGE="debian:bookworm-slim"
ARG AIRFLOW_PYTHON_VERSION="3.12.13"
# PYTHON_LTO: Controls whether Python is built with Link-Time Optimization (LTO).
#
# Link-Time Optimization uses MD5 checksums during the compilation process to verify
# object files and intermediate representations. In FIPS-compliant environments, MD5
# is blocked as it's not an approved cryptographic algorithm (see FIPS 140-2/140-3).
# This can cause Python builds with LTO to fail when FIPS mode is enabled.
#
# When building FIPS-compliant images, set this to "false" to disable LTO:
# docker build --build-arg PYTHON_LTO="false" ...
#
# Default: "true" (LTO enabled for better performance)
#
# Related: https://github.com/apache/airflow/issues/58337
ARG PYTHON_LTO="true"
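How a build script might act on `PYTHON_LTO` can be sketched as follows. This is an illustration only: `python_configure_flags` is a hypothetical helper, and the actual Airflow build script may assemble its `./configure` arguments differently. The flag names themselves are standard CPython configure options.

```shell
#!/bin/sh
# Hypothetical helper: print the ./configure flags for a CPython source
# build, adding --with-lto only when LTO is requested.
python_configure_flags() (
  lto="${1:-true}"   # mirrors the PYTHON_LTO default of "true"
  flags="--enable-shared --with-ensurepip --enable-option-checking=fatal"
  if [ "$lto" = "true" ]; then
    flags="$flags --with-lto"
  fi
  printf '%s\n' "$flags"
)

# FIPS-style build: LTO disabled, so --with-lto is left out.
python_configure_flags false
```

Keeping the switch in one place means a FIPS-compliant build only needs `--build-arg PYTHON_LTO="false"` and no other changes.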
# You can swap the comments on the two AIRFLOW_PIP_VERSION args below when you want to test
# whether the version of `pip` from the specified branch (main here) works for our builds.
# Also use the `force pip` label on your PR to swap all places where we use `uv` to `pip`
ARG AIRFLOW_PIP_VERSION=26.0.1
# ARG AIRFLOW_PIP_VERSION="git+https://github.com/pypa/pip.git@main"
ARG AIRFLOW_UV_VERSION=0.10.12
ARG AIRFLOW_USE_UV="false"
ARG AIRFLOW_IMAGE_REPOSITORY="https://github.com/apache/airflow"
ARG AIRFLOW_IMAGE_README_URL="https://raw.githubusercontent.com/apache/airflow/main/docs/docker-stack/README.md"
# By default we install the latest Airflow from PyPI, so we do not need to copy sources of Airflow
# from the host. Instead we copy the Dockerfile itself to /Dockerfile in the target image,
# because this is the only file we know exists locally. This way you can build the image from PyPI with
# **just** the Dockerfile and no need for any other files from the Airflow repository.
# However, in case of breeze/development use we use the latest sources and we override
# AIRFLOW_SOURCES_FROM/TO with "." and "/opt/airflow" respectively - so that sources of Airflow (and all providers)
# are used to build the PROD image used in tests.
ARG AIRFLOW_SOURCES_FROM="Dockerfile"
ARG AIRFLOW_SOURCES_TO="/Dockerfile"
# By default latest released version of airflow is installed (when empty) but this value can be overridden
# and we can install version according to specification (For example ==2.0.2 or <3.0.0).
ARG AIRFLOW_VERSION_SPECIFICATION=""
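To make the effect of `AIRFLOW_VERSION_SPECIFICATION` concrete, here is a minimal sketch of how extras and a version specification could be combined into a single requirement string. `airflow_requirement` is a hypothetical helper for illustration, not a function from the inlined install scripts.

```shell
#!/bin/sh
# Hypothetical helper: build the requirement string passed to the installer.
#   $1 - comma-separated extras (may be empty)
#   $2 - version specification such as "==2.0.2" or "<3.0.0" (may be empty)
airflow_requirement() (
  extras="$1"
  spec="${2:-}"
  if [ -n "$extras" ]; then
    printf 'apache-airflow[%s]%s\n' "$extras" "$spec"
  else
    printf 'apache-airflow%s\n' "$spec"
  fi
)

# An empty specification leaves the version open, so the installer
# resolves to the latest release allowed by any constraints.
airflow_requirement "celery,postgres" ""
airflow_requirement "celery,postgres" "==2.0.2"
```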
# By default pip shows a progress bar, but you can disable it.
ARG PIP_PROGRESS_BAR="on"
##############################################################################################
# This is the script image where we keep all inlined bash scripts needed in other segments
##############################################################################################
FROM scratch AS scripts
##############################################################################################
# Please DO NOT modify the inlined scripts manually. The content of those files will be
Switch pre-commit to prek (#54258)

pre-commit is a fantastic tool, and we have used it heavily for years, but the tool has generally stagnated and is not showing a sign of adapting to our needs. For years we tried to convince the pre-commit maintainers that things like autocomplete are necessary - but it met with considerable resistance (if not hostility) from the maintainer. Also, there was no chance for them to accept the expectations of bigger projects like ours, where we have a huge monorepo and not only multiple needs but also different parts of the repo needing different language support (golang, typescript soon) - and apparently the maintainer of pre-commit does not think a monorepo is a good thing at all.

Similarly, they did not recognize the rise of `uv`, and the only way to use `uv` with pre-commit is to install `pre-commit-uv`, which essentially patches pre-commit with uv support. This is not really sustainable and the tool lags behind many of our needs.

Luckily, we have a new project in town - prek - a rewrite of pre-commit that is 100% compatible (now), 10x faster (because rust), uses `uv` natively, and already supports auto-complete. It has a very friendly maintainer who not only supports us but also very happily works on improving `prek` to close all the gaps, and plans to implement (with our support and cooperation, of course) monorepo support that will allow us to modularise our pre-commits.

This PR switches our pre-commit support to use prek exclusively:

* the breeze static checks command is completely removed
* the custom auto-complete code in breeze is removed as well
* instructions are updated to set up prek instead of pre-commit
* CI is updated to run prek instead of pre-commit
* documentation for static checks is reviewed and new features that prek enables are added
2025-08-17 09:00:14 +02:00
# replaced by prek automatically from the "scripts/docker/" folder.
# This is done in order to avoid problems with caching and file permissions and in order to
# make the PROD Dockerfile standalone
##############################################################################################
# The content below is automatically copied from scripts/docker/install_os_dependencies.sh
COPY <<"EOF" /install_os_dependencies.sh
#!/usr/bin/env bash
set -euo pipefail
if [[ "$#" != 1 ]]; then
Fix, cleanup and refactor adding apt dependencies when building image (#55151)

After migrating to python built from sources, it turned out that we had a weird mixture of system and source-built python in our CI images. Some installed dependencies pulled in python3 debian packages, and they were breaking the way the source-built python 3.11 interacted with the system one. Namely, debian Python 3.11 had an internal ABI incompatibility between 3.11.4 and 3.11.5 that caused the ssl build with latest 3.11.5 to fail when 3.11.5 system python had been installed while the new python was being built - because system includes interfered with the build process.

The system python was pulled in by "software-properties-common" and "gdb" - we do not need to use either of them during the build process, and user in

In order to avoid this, we now make sure that we do not install any debian dependencies that pull in any of the python3 system packages before we install Python from sources.

We also reviewed and cleaned up the way we configure the list of packages installed during image building:

* packages are reviewed and unnecessary ones removed
* packages that pull in python3 system packages are moved to additional apt dev dependencies
* additional apt dev dependencies are installed AFTER python is installed
* the packages listed in scripts are now one-per-line
* they are alphabetically sorted
* finally, LD_LIBRARY_PATH is set so that the "source-installed" python libraries are always prioritised over the system libraries if system libraries are installed to the main "/usr/lib" directories.
2025-09-01 23:16:20 +02:00
    echo
Migrated prod builds to use python built from source (#53770) Moved the prod build to also use the src built python as done by CI. This let's us iterate faster on our python versions without needing to wait for the community image of python. This involves a number of changes: * 3.0.5 -> 3.0.6 airflow version change * use released official packages rather than git repo to install Python * added wget (it was added in the original image to pull python packages so added for compatibility) added those flags to python build (same as in the * original build) * --with-ensurepip --build="$gnuArch" * --enable-loadable-sqlite-extension * --enable-option-checking=fatal * --enable-shared * --with-lto * added cleanup of apt after installing packages * added removal of .pyc/.test etc. files (saves 350 MB) * added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image. * we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the Python official image had a very old setuptools version. * checked all "customize" scripts and make them "work" * Updated the 'version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere * Updated Changelog and documentation to update new ways of building images * removed installation with GitHub URL (it won't work easily after splitting to multiple packages - not easily at least) and it's not needed * all python, pip and similar links are created in /usr/python/bin * /usr/python/bin is always first in the PATH - before /usr/local/bin * added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local * removal of installed editable distributions in breeze happens now first and THEN we install when --use-airflow-version is used. * LD_LIBRARY_PATH was not set so the shared python libraries could not be loaded when venv was created Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
    echo "ERROR! There should be 'runtime', 'ci' or 'dev' parameter passed as argument."
    echo
    exit 1
fi
AIRFLOW_PYTHON_VERSION=${AIRFLOW_PYTHON_VERSION:-3.10.18}
PYTHON_LTO=${PYTHON_LTO:-true}
GOLANG_MAJOR_MINOR_VERSION=${GOLANG_MAJOR_MINOR_VERSION:-1.24.4}
COSIGN_VERSION=${COSIGN_VERSION:-3.0.5}
if [[ "${1}" == "runtime" ]]; then
    INSTALLATION_TYPE="RUNTIME"
elif [[ "${1}" == "dev" ]]; then
    INSTALLATION_TYPE="DEV"
elif [[ "${1}" == "ci" ]]; then
    INSTALLATION_TYPE="CI"
else
    echo
    echo "ERROR! Wrong argument. Passed ${1} and it should be one of 'runtime', 'ci' or 'dev'."
    echo
    exit 1
fi
function get_dev_apt_deps() {
    if [[ "${DEV_APT_DEPS=}" == "" ]]; then
        DEV_APT_DEPS="\
apt-transport-https \
apt-utils \
build-essential \
dirmngr \
freetds-bin \
freetds-dev \
git \
graphviz \
graphviz-dev \
krb5-user \
lcov \
ldap-utils \
libbluetooth-dev \
libbz2-dev \
libc6-dev \
libdb-dev \
libev-dev \
libev4 \
libffi-dev \
libgdbm-compat-dev \
libgdbm-dev \
libgeos-dev \
libkrb5-dev \
libldap2-dev \
libleveldb-dev \
libleveldb1d \
liblzma-dev \
libncurses5-dev \
libreadline6-dev \
libsasl2-2 \
libsasl2-dev \
libsasl2-modules \
libsqlite3-dev \
libssl-dev \
libxmlsec1 \
libxmlsec1-dev \
libzstd-dev \
locales \
lsb-release \
lzma \
lzma-dev \
openssh-client \
openssl \
pkg-config \
pkgconf \
sasl2-bin \
sqlite3 \
sudo \
tk-dev \
unixodbc \
unixodbc-dev \
uuid-dev \
wget \
xz-utils \
zlib1g-dev \
"
        export DEV_APT_DEPS
    fi
}
function get_runtime_apt_deps() {
Switch our base image to use Debian bookworm (#35376) Debian bookworm (12) is the current stable version of Debian and it is on the market for more than a year so all the other dependencies should have enough time to catch up. While Debian bullseye is still supported (oldstable) it will be switching to LTS support mode (managed by volunteers) roughly in July 2024 - but we want to switch our reference images to bookworm long before that date. This PR switches our reference images to Debian Bookworm for Dockerfiles and images that will be released to Airflow 2.8.0. Similarly as with "Debian buster -> Debian bullseye" switch we will switch our reference images to bookworm and we will not be publishing images based on bullseye. However our users will still be able to build custom images using our Dockerfiles with bullseye base image until we release Airflow 2.9.0 where the bullseye support will be dropped entirely. We provide release notes and instructions on how users can build the bullseye images if they still want to do it - for example because their system level dependencies will require them to do so, but the users are advised to switch to bookworm-based images as soon as possible. The users will likely still be able to build custom Airflow images for future airflow releases (using Dockerfiles released with Airlfow 2.8), however as of Airflow 2.9, we will not release Dockerfiles with support for that and we will not verify if Airflow with default depenencies can be installed on bullseye Debian. Co-authored-by: raphaelauv <raphaelauv@gmail.com>
2023-11-06 10:32:46 +01:00
    local debian_version
    local debian_version_apt_deps
    # Get debian version without installing lsb_release
    # shellcheck disable=SC1091
    debian_version=$(. /etc/os-release; printf '%s\n' "$VERSION_CODENAME";)
    echo
    echo "DEBIAN CODENAME: ${debian_version}"
    echo
    debian_version_apt_deps="\
libffi8 \
libldap-2.5-0 \
libssl3 \
netcat-openbsd\
"
    echo
    echo "APPLIED INSTALLATION CONFIGURATION FOR DEBIAN VERSION: ${debian_version}"
    echo
    if [[ "${RUNTIME_APT_DEPS=}" == "" ]]; then
        RUNTIME_APT_DEPS="\
${debian_version_apt_deps} \
apt-transport-https \
apt-utils \
curl \
dumb-init \
freetds-bin \
git \
gnupg \
iputils-ping \
krb5-user \
ldap-utils \
libev4 \
libgeos-dev \
libsasl2-2 \
libsasl2-modules \
libxmlsec1 \
locales \
lsb-release \
openssh-client \
rsync \
sasl2-bin \
sqlite3 \
sudo \
unixodbc \
wget\
"
        export RUNTIME_APT_DEPS
    fi
}
function install_docker_cli() {
    apt-get update
    # -y keeps the install non-interactive if either package still needs to be installed
    apt-get install -y ca-certificates curl
    install -m 0755 -d /etc/apt/keyrings
    # Fetch Docker's signing key and make it readable for apt
    curl -fsSL https://download.docker.com/linux/debian/gpg -o /etc/apt/keyrings/docker.asc
    chmod a+r /etc/apt/keyrings/docker.asc
    # Register the Docker apt repository for the current architecture and Debian codename
    # shellcheck disable=SC1091
    echo \
        "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/debian \
        $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
        tee /etc/apt/sources.list.d/docker.list > /dev/null
    apt-get update
    apt-get install -y --no-install-recommends docker-ce-cli
}
function install_debian_dev_dependencies() {
    apt-get update
    apt-get install -yqq --no-install-recommends apt-utils >/dev/null 2>&1
    apt-get install -y --no-install-recommends wget curl gnupg2 lsb-release ca-certificates
    # shellcheck disable=SC2086
    export ${ADDITIONAL_DEV_APT_ENV?}
    if [[ ${DEV_APT_COMMAND} != "" ]]; then
        bash -o pipefail -o errexit -o nounset -o nolog -c "${DEV_APT_COMMAND}"
    fi
    if [[ ${ADDITIONAL_DEV_APT_COMMAND} != "" ]]; then
        bash -o pipefail -o errexit -o nounset -o nolog -c "${ADDITIONAL_DEV_APT_COMMAND}"
fi
apt-get update
Switch our base image to use Debian bookworm (#35376) Debian bookworm (12) is the current stable version of Debian and it has been on the market for more than a year, so all the other dependencies should have had enough time to catch up. While Debian bullseye is still supported (oldstable), it will be switching to LTS support mode (managed by volunteers) roughly in July 2024 - but we want to switch our reference images to bookworm long before that date. This PR switches our reference images to Debian bookworm for Dockerfiles and images that will be released with Airflow 2.8.0. Similarly to the "Debian buster -> Debian bullseye" switch, we will switch our reference images to bookworm and we will not be publishing images based on bullseye. However, our users will still be able to build custom images using our Dockerfiles with the bullseye base image until we release Airflow 2.9.0, where the bullseye support will be dropped entirely. We provide release notes and instructions on how users can build the bullseye images if they still want to do it - for example because their system level dependencies require them to do so, but the users are advised to switch to bookworm-based images as soon as possible. The users will likely still be able to build custom Airflow images for future Airflow releases (using Dockerfiles released with Airflow 2.8), however as of Airflow 2.9, we will not release Dockerfiles with support for that and we will not verify if Airflow with default dependencies can be installed on bullseye Debian. Co-authored-by: raphaelauv <raphaelauv@gmail.com>
2023-11-06 10:32:46 +01:00
local debian_version
local debian_version_apt_deps
# Get debian version without installing lsb_release
# shellcheck disable=SC1091
debian_version=$(. /etc/os-release; printf '%s\n' "$VERSION_CODENAME";)
echo
echo "DEBIAN CODENAME: ${debian_version}"
echo
# shellcheck disable=SC2086
Fix, cleanup and refactor adding apt dependencies when building image (#55151) After migrating to python built from sources, it turned out that we had a weird mixture of system and source-built python in our CI images. Some of the installed dependencies pulled in python3 debian packages, and they were breaking the way the source-built python 3.11 interacted with the system one. Namely - debian Python 3.11 had an internal ABI incompatibility between 3.11.4 and 3.11.5 that caused the ssl build with the latest 3.11.5 to fail when the 3.11.5 system python was installed while the new python was being built - because system includes interfered with the build process. The system python was pulled by "software-properties-common" and "gdb" - we do not need to use either of them during the build process. In order to avoid this, we now make sure that we do not install any debian dependencies that pull any of the python3 system packages before we install Python from sources. We also reviewed and cleaned the way we configure the list of packages installed during image building: * packages are reviewed and unnecessary ones removed * packages that are pulling python3 system packages are moved to additional apt dev dependencies * additional apt dev dependencies are installed AFTER python is installed * the packages listed in scripts are now one-per-line * they are alphabetically sorted * finally LD_LIBRARY_PATH is set so that the "source-installed" python libraries are always prioritised over the system libraries if system libraries are installed to the main "/usr/lib" directories.
2025-09-01 23:16:20 +02:00
apt-get install -y --no-install-recommends ${DEV_APT_DEPS}
}
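# A minimal sketch (not part of the upstream script) of the codename lookup used above.
# The helper name `read_os_codename` and its optional file argument are hypothetical and
# exist only to make the pattern easy to try out. Sourcing the file inside a subshell keeps
# the os-release variables out of the caller's environment.
function read_os_codename() {
    local os_release_file="${1:-/etc/os-release}"
    # shellcheck disable=SC1090
    (. "${os_release_file}"; printf '%s\n' "${VERSION_CODENAME:-}")
}
# Usage: debian_version="$(read_os_codename)"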
function install_additional_dev_dependencies() {
if [[ "${ADDITIONAL_DEV_APT_DEPS=}" != "" ]]; then
# shellcheck disable=SC2086
apt-get install -y --no-install-recommends ${ADDITIONAL_DEV_APT_DEPS}
fi
}
Migrated prod builds to use python built from source (#53770) Moved the prod build to also use the source-built python as done by CI. This lets us iterate faster on our python versions without needing to wait for the community image of python. This involves a number of changes: * 3.0.5 -> 3.0.6 airflow version change * use released official packages rather than git repo to install Python * added wget (it was added in the original image to pull python packages so added for compatibility) * added those flags to python build (same as in the original build): --with-ensurepip --build="$gnuArch", --enable-loadable-sqlite-extensions, --enable-option-checking=fatal, --enable-shared, --with-lto * added cleanup of apt after installing packages * added removal of .pyc/.test etc. files (saves 350 MB) * added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image. * we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the Python official image had a very old setuptools version. * checked all "customize" scripts and made them "work" * Updated the "version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere * Updated Changelog and documentation to describe new ways of building images * removed installation with GitHub URL (it won't work easily after splitting to multiple packages) and it's not needed * all python, pip and similar links are created in /usr/python/bin * /usr/python/bin is always first in the PATH - before /usr/local/bin * added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local * removal of installed editable distributions in breeze happens now first and THEN we install when --use-airflow-version is used. * LD_LIBRARY_PATH was not set so the shared python libraries could not be loaded when venv was created Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
function link_python() {
# Link python binaries in /usr/local/bin and /usr/python/bin, with and without the "3" suffix.
# Links in /usr/local/bin are needed for tools that expect python to be there.
# Links in /usr/python/bin are needed for tools that detect the home of the python installation, including
# lib/site-packages. /usr/python/bin should be first in PATH in order to help with the latter.
for dst in pip3 python3 python3-config; do
src="$(echo "${dst}" | tr -d 3)"
echo "Linking ${dst} in /usr/local/bin and /usr/python/bin"
ln -sv "/usr/python/bin/${dst}" "/usr/local/bin/${dst}"
for dir in /usr/local/bin /usr/python/bin; do
if [[ ! -e "${dir}/${src}" ]]; then
echo "Creating ${src} -> ${dst} link in ${dir}"
ln -sv "${dir}/${dst}" "${dir}/${src}"
fi
done
done
for dst in /usr/python/lib/*
do
src="/usr/local/lib/$(basename "${dst}")"
if [[ -e "${src}" ]]; then
rm -rf "${src}"
fi
echo "Linking ${dst} to ${src}"
ln -sv "${dst}" "${src}"
done
ldconfig
}
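# Illustrative sketch (not in the upstream script) of how the loop in link_python derives the
# unsuffixed link names. `tr -d 3` deletes every "3" character, so it is only safe for names in
# which "3" appears solely as the version suffix. The function name `unsuffixed_name` is
# hypothetical.
function unsuffixed_name() {
    printf '%s\n' "$1" | tr -d 3
}
# unsuffixed_name pip3            prints: pip
# unsuffixed_name python3-config  prints: python-config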
function install_debian_runtime_dependencies() {
apt-get update
apt-get install --no-install-recommends -yqq apt-utils >/dev/null 2>&1
apt-get install -y --no-install-recommends wget curl gnupg2 lsb-release ca-certificates
# shellcheck disable=SC2086
export ${ADDITIONAL_RUNTIME_APT_ENV?}
if [[ "${RUNTIME_APT_COMMAND}" != "" ]]; then
bash -o pipefail -o errexit -o nounset -o nolog -c "${RUNTIME_APT_COMMAND}"
fi
if [[ "${ADDITIONAL_RUNTIME_APT_COMMAND}" != "" ]]; then
bash -o pipefail -o errexit -o nounset -o nolog -c "${ADDITIONAL_RUNTIME_APT_COMMAND}"
fi
apt-get update
# shellcheck disable=SC2086
apt-get install -y --no-install-recommends ${RUNTIME_APT_DEPS} ${ADDITIONAL_RUNTIME_APT_DEPS}
apt-get autoremove -yqq --purge
apt-get clean
link_python
rm -rf /var/lib/apt/lists/* /var/log/*
}
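# A minimal sketch (not part of the upstream script) of the `export ${VAR?}` idiom used for
# ADDITIONAL_RUNTIME_APT_ENV above. The helper name `export_env_string` and the variable names
# in the usage line are hypothetical. The unquoted expansion is split on whitespace, so a string
# such as "KEY1=a KEY2=b" exports two variables. `${1?}` aborts only when the argument is unset;
# an empty string makes `export` list the exported variables instead of exporting anything.
function export_env_string() {
    # shellcheck disable=SC2086,SC2163
    export ${1?}
}
# Usage: export_env_string "SKETCH_KEY1=a SKETCH_KEY2=b"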
function install_cosign() {
local arch
arch="$(dpkg --print-architecture)"
declare -A cosign_sha256s=(
# https://github.com/sigstore/cosign/releases/download/v${COSIGN_VERSION}/cosign_checksums.txt
[amd64]="db15cc99e6e4837daabab023742aaddc3841ce57f193d11b7c3e06c8003642b2"
[arm64]="d098f3168ae4b3aa70b4ca78947329b953272b487727d1722cb3cb098a1a20ab"
)
local cosign_sha256="${cosign_sha256s[${arch}]}"
if [[ -z "${cosign_sha256}" ]]; then
echo "Unsupported architecture for cosign: ${arch}"
exit 1
fi
curl -fsSL \
"https://github.com/sigstore/cosign/releases/download/v${COSIGN_VERSION}/cosign-linux-${arch}" \
-o /tmp/cosign
echo "${cosign_sha256}  /tmp/cosign" | sha256sum --check
chmod +x /tmp/cosign
}
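# A hedged sketch (not part of the upstream script) of the verify step that install_cosign
# performs after the download. The helper name `verify_sha256` is hypothetical. `sha256sum --check`
# reads lines of the form "<hex digest>  <path>" (two spaces in text mode) and returns a non-zero
# status on a mismatch; `--status` suppresses the per-file OK/FAILED output.
function verify_sha256() {
    local expected="$1"
    local file="$2"
    echo "${expected}  ${file}" | sha256sum --check --status
}
# Usage: verify_sha256 "${cosign_sha256}" /tmp/cosign || exit 1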
function install_python() {
# If system python (3.11 in bookworm) has been installed (for example pulled in automatically by some
# dependency), we fail here, because mixing it with the newer source-built Python leads to strange
# interactions: when you create a virtualenv, part of the Python shared libraries can be taken from the
# system installation, leading to weird errors with some modules - for example ssl:
# /usr/python/lib/python3.11/lib-dynload/_ssl.cpython-311-aarch64-linux-gnu.so: undefined symbol: _PyModule_Add
if dpkg -l | grep -E '^ii +libpython' >/dev/null; then
echo
echo "ERROR! System python is installed by one of the previous steps"
echo
echo "Please make sure that no python packages are installed by default. Displaying the reason why libpython3.11 is installed:"
echo
apt-get install -yqq aptitude >/dev/null
aptitude why libpython3.11
echo
exit 1
else
echo
echo "GOOD! System python is not installed - OK"
echo
fi
wget -O python.tar.xz "https://www.python.org/ftp/python/${AIRFLOW_PYTHON_VERSION%%[a-z]*}/Python-${AIRFLOW_PYTHON_VERSION}.tar.xz"
local major_minor_version
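# Strip the last dot-separated component, for example 3.12.11 -> 3.12 (and 3.14.0rc2 -> 3.14)
major_minor_version="${AIRFLOW_PYTHON_VERSION%.*}"
local major minor
# Split the result into its parts, for example 3.12 -> major=3, minor=12
major="${major_minor_version%.*}"
minor="${major_minor_version#*.}"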
echo "Verifying Python ${AIRFLOW_PYTHON_VERSION} (${major_minor_version})"
if [[ "${major}" -gt 3 ]] || [[ "${major}" -eq 3 && "${minor}" -ge 11 ]]; then
# Sigstore verification for Python >= 3.11 (PEP 761)
declare -A sigstore_identities=(
# https://peps.python.org/pep-0664/#release-manager-and-crew
[3.11]="pablogsal@python.org"
# https://peps.python.org/pep-0693/#release-manager-and-crew
[3.12]="thomas@python.org"
# https://peps.python.org/pep-0719/#release-manager-and-crew
[3.13]="thomas@python.org"
# https://peps.python.org/pep-0745/#release-manager-and-crew
[3.14]="hugo@python.org"
)
declare -A sigstore_issuers=(
[3.11]="https://accounts.google.com"
[3.12]="https://accounts.google.com"
[3.13]="https://accounts.google.com"
[3.14]="https://github.com/login/oauth"
)
wget -O python.tar.xz.sigstore \
"https://www.python.org/ftp/python/${AIRFLOW_PYTHON_VERSION%%[a-z]*}/Python-${AIRFLOW_PYTHON_VERSION}.tar.xz.sigstore"
install_cosign
local identity="${sigstore_identities[${major_minor_version}]}"
local issuer="${sigstore_issuers[${major_minor_version}]}"
/tmp/cosign verify-blob \
--bundle python.tar.xz.sigstore \
--certificate-identity "${identity}" \
--certificate-oidc-issuer "${issuer}" \
python.tar.xz
rm -f python.tar.xz.sigstore /tmp/cosign
else
# PGP verification for Python 3.10
declare -A keys=(
# gpg: key 64E628F8D684696D: public key "Pablo Galindo Salgado <pablogsal@gmail.com>" imported
# https://peps.python.org/pep-0619/#release-manager-and-crew
[3.10]="A035C8C19219BA821ECEA86B64E628F8D684696D"
)
wget -O python.tar.xz.asc \
"https://www.python.org/ftp/python/${AIRFLOW_PYTHON_VERSION%%[a-z]*}/Python-${AIRFLOW_PYTHON_VERSION}.tar.xz.asc"
GNUPGHOME="$(mktemp -d)"; export GNUPGHOME
local gpg_key="${keys[${major_minor_version}]}"
echo "Using GPG key ${gpg_key}"
gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "${gpg_key}"
gpg --batch --verify python.tar.xz.asc python.tar.xz
gpgconf --kill all
rm -rf "${GNUPGHOME}" python.tar.xz.asc
fi
mkdir -p /usr/src/python
tar --extract --directory /usr/src/python --strip-components=1 --file python.tar.xz
rm python.tar.xz
cd /usr/src/python
arch="$(dpkg --print-architecture)"; arch="${arch##*-}"
gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)"
EXTRA_CFLAGS="$(dpkg-buildflags --get CFLAGS)"
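# Keep frame pointers in the compiled interpreter (useful for profilers and debuggers)
EXTRA_CFLAGS="${EXTRA_CFLAGS:-} -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"
LDFLAGS="$(dpkg-buildflags --get LDFLAGS)"
# Append --strip-all to the linker flags; if LDFLAGS is empty, start from a bare "-Wl"
LDFLAGS="${LDFLAGS:--Wl},--strip-all"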
# Link-Time Optimization (LTO) uses MD5 checksums for object file verification during
# compilation. In FIPS mode, MD5 is blocked as a non-approved algorithm, causing builds
# to fail. The PYTHON_LTO variable allows disabling LTO for FIPS-compliant builds.
# See: https://github.com/apache/airflow/issues/58337
local lto_option=""
if [[ "${PYTHON_LTO:-true}" == "true" ]]; then
lto_option="--with-lto"
fi
local build_log
build_log=$(mktemp)
echo "Building Python ${AIRFLOW_PYTHON_VERSION} from source..."
if ! (
./configure --enable-optimizations --prefix=/usr/python/ --with-ensurepip --build="$gnuArch" \
--enable-loadable-sqlite-extensions --enable-option-checking=fatal \
--enable-shared ${lto_option} && \
make -s -j "$(nproc)" "EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" \
"LDFLAGS=${LDFLAGS:--Wl},-rpath='\$\$ORIGIN/../lib'" python && \
make -s -j "$(nproc)" install
) > "${build_log}" 2>&1; then
echo
echo "ERROR! Python build failed. Build output:"
echo
cat "${build_log}"
rm -f "${build_log}"
exit 1
fi
rm -f "${build_log}"
cd /
rm -rf /usr/src/python
find /usr/python -depth \
\( \
\( -type d -a \( -name test -o -name tests -o -name idle_test \) \) \
-o \( -type f -a \( -name 'libpython*.a' \) \) \
\) -exec rm -rf '{}' +
link_python
}
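# The build step above keeps compiler output in a temporary log and prints it only
# when the build fails. A minimal sketch of that pattern as a reusable helper follows;
# the name run_quietly is hypothetical and is not part of the Airflow scripts.

```shell
# Hypothetical helper: run a command with stdout and stderr captured in a
# temporary log, and replay the log only if the command fails.
run_quietly() {
    local log_file
    log_file=$(mktemp)
    if ! "$@" > "${log_file}" 2>&1; then
        echo "ERROR! Command failed: $*. Output:"
        cat "${log_file}"
        rm -f "${log_file}"
        return 1
    fi
    # Success: drop the log without printing anything.
    rm -f "${log_file}"
}

# Usage: the output of a successful command is discarded.
run_quietly echo "this line never reaches the terminal"
```

# Returning 1 instead of calling exit leaves the decision to abort with the caller.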
function install_golang() {
curl "https://dl.google.com/go/go${GOLANG_MAJOR_MINOR_VERSION}.linux-$(dpkg --print-architecture).tar.gz" -o "go${GOLANG_MAJOR_MINOR_VERSION}.linux.tar.gz"
rm -rf /usr/local/go && tar -C /usr/local -xzf go"${GOLANG_MAJOR_MINOR_VERSION}".linux.tar.gz
}
function apt_clean() {
apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false
rm -rf /var/lib/apt/lists/* /var/log/*
}
if [[ "${INSTALLATION_TYPE}" == "RUNTIME" ]]; then
get_runtime_apt_deps
install_debian_runtime_dependencies
install_docker_cli
apt_clean
else
get_dev_apt_deps
install_debian_dev_dependencies
install_python
install_additional_dev_dependencies
if [[ "${INSTALLATION_TYPE}" == "CI" ]]; then
install_golang
fi
install_docker_cli
apt_clean
fi
EOF
# The content below is automatically copied from scripts/docker/install_mysql.sh
COPY <<"EOF" /install_mysql.sh
#!/usr/bin/env bash
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
set -euo pipefail
common::get_colors
declare -a packages
readonly MARIADB_LTS_VERSION="10.11"
: "${INSTALL_MYSQL_CLIENT:?Should be true or false}"
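# ':=' assigns the default; ':-' would only expand it and leave the variable unset,
# which makes the checks below fail under 'set -u'.
: "${INSTALL_MYSQL_CLIENT_TYPE:=mariadb}"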
if [[ "${INSTALL_MYSQL_CLIENT}" != "true" && "${INSTALL_MYSQL_CLIENT}" != "false" ]]; then
echo
echo "${COLOR_RED}INSTALL_MYSQL_CLIENT must be either true or false${COLOR_RESET}"
echo
exit 1
fi
if [[ "${INSTALL_MYSQL_CLIENT_TYPE}" != "mysql" && "${INSTALL_MYSQL_CLIENT_TYPE}" != "mariadb" ]]; then
echo
echo "${COLOR_RED}INSTALL_MYSQL_CLIENT_TYPE must be either mysql or mariadb${COLOR_RESET}"
echo
exit 1
fi
if [[ "${INSTALL_MYSQL_CLIENT_TYPE}" == "mysql" ]]; then
echo
echo "${COLOR_RED}The 'mysql' client type is not supported any more. Use 'mariadb' instead.${COLOR_RESET}"
echo
echo "The MySQL drivers are wrongly packaged and released by Oracle with an expiration date on their GPG keys,"
echo "which causes builds to fail after the expiration date. MariaDB client is protocol-compatible with MySQL client."
echo ""
echo "Every two years the MySQL packages fail, and the Oracle team is always surprised and struggles"
echo "with fixes and re-signing the packages, which takes a few days."
echo "See https://bugs.mysql.com/bug.php?id=113432 for more details."
echo "As a community we are not able to support this broken packaging practice from Oracle."
echo "Feel free, however, to install MySQL drivers on your own as an extension of the image."
echo
exit 1
fi
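# The input checks above rely on shell parameter expansion. A minimal sketch of how
# the ':-', ':=' and ':?' forms differ follows; DEMO_TYPE and DEMO_REQUIRED are
# hypothetical variables used only for illustration.

```shell
unset DEMO_TYPE

# ':-' substitutes a default for this one expansion but does not assign it.
: "${DEMO_TYPE:-mariadb}"
state_after_default="${DEMO_TYPE-unset}"
echo "after ':-' the variable is: ${state_after_default}"   # prints "unset"

# ':=' substitutes the default and also assigns it to the variable.
: "${DEMO_TYPE:=mariadb}"
state_after_assign="${DEMO_TYPE-unset}"
echo "after ':=' the variable is: ${state_after_assign}"    # prints "mariadb"

# ':?' aborts the shell with the given message when the variable is unset or empty.
# The subshell keeps that failure from ending this demo.
if ( unset DEMO_REQUIRED; : "${DEMO_REQUIRED:?Should be true or false}" ) 2>/dev/null; then
    required_check="passed"
else
    required_check="rejected"
fi
echo "':?' on an unset variable: ${required_check}"         # prints "rejected"
```

# Under 'set -u', only ':=' makes later plain "${VAR}" references safe when the caller
# did not export the variable.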
retry() {
local retries=3
local count=0
# Delay between attempts, in seconds
local delay=10
# Keep the exit code local so it does not leak into the global scope
local exit_code=0
until "$@"; do
exit_code=$?
count=$((count + 1))
if [[ $count -lt $retries ]]; then
echo "Command failed. Attempt $count/$retries. Retrying in ${delay}s..."
sleep $delay
else
echo "Command failed after $retries attempts."
return $exit_code
fi
done
}
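# A minimal sketch of how a retry helper like the one above behaves with a command
# that fails a few times before succeeding. retry_with and flaky are hypothetical
# names; the attempt count and delay are arguments here only so the sketch runs quickly.

```shell
# Hypothetical variant of the retry helper with configurable attempts and delay.
retry_with() {
    local retries="$1"
    local delay="$2"
    local count=0
    local exit_code=0
    shift 2
    until "$@"; do
        exit_code=$?
        count=$((count + 1))
        if [ "${count}" -lt "${retries}" ]; then
            echo "Command failed. Attempt ${count}/${retries}. Retrying in ${delay}s..."
            sleep "${delay}"
        else
            echo "Command failed after ${retries} attempts."
            return "${exit_code}"
        fi
    done
}

# Hypothetical flaky command: fails on the first two calls, succeeds on the third.
calls=0
flaky() {
    calls=$((calls + 1))
    [ "${calls}" -ge 3 ]
}

retry_with 5 0 flaky
echo "flaky succeeded after ${calls} calls"
```

# When every attempt fails, the helper returns the exit code of the last attempt, so
# the caller (or 'set -e') still sees the real failure.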
install_mariadb_client() {
# Compatible packages, Oracle MySQL -> MariaDB:
# `mysql-client` -> `mariadb-client` or `mariadb-client-compat` (11+)
# `libmysqlclientXX` (where XX is a number) -> `libmariadb3-compat`
# `libmysqlclient-dev` -> `libmariadb-dev-compat`
#
# Package names in the MariaDB repo differ from the Debian repo we used before:
# some packages carry a `-compat` suffix. Debian repo -> MariaDB repo:
# `libmariadb-dev` -> `libmariadb-dev-compat`
# `mariadb-client-core` -> `mariadb-client` or `mariadb-client-compat` (11+)
if [[ "${1}" == "dev" ]]; then
packages=("libmariadb-dev-compat" "mariadb-client")
elif [[ "${1}" == "prod" ]]; then
packages=("libmariadb3-compat" "mariadb-client")
else
echo
echo "${COLOR_RED}Specify either prod or dev${COLOR_RESET}"
echo
exit 1
fi
common::import_trusted_gpg "0xF1656F24C74CD1D8" "mariadb"
echo
echo "${COLOR_BLUE}Installing MariaDB client version ${MARIADB_LTS_VERSION}: ${1}${COLOR_RESET}"
echo "${COLOR_YELLOW}MariaDB client is protocol-compatible with MySQL client.${COLOR_RESET}"
echo
echo "deb [arch=amd64,arm64] https://archive.mariadb.org/mariadb-${MARIADB_LTS_VERSION}/repo/debian/ $(lsb_release -cs) main" > \
/etc/apt/sources.list.d/mariadb.list
# Make sure that dependencies from MariaDB repo are preferred over Debian dependencies
printf "Package: *\nPin: release o=MariaDB\nPin-Priority: 999\n" > /etc/apt/preferences.d/mariadb
retry apt-get update
retry apt-get install --no-install-recommends -y "${packages[@]}"
apt-get autoremove -yqq --purge
apt-get clean && rm -rf /var/lib/apt/lists/*
}
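# A minimal sketch for inspecting the two APT files that the function above generates,
# written to a scratch directory so that no root privileges are needed.
# write_mariadb_apt_config is a hypothetical helper, and the "bookworm" codename is an
# assumption standing in for the output of lsb_release -cs.

```shell
# Hypothetical helper: write the repository entry and the pin file below target_dir.
write_mariadb_apt_config() {
    local target_dir="$1"
    local version="$2"
    local codename="$3"
    mkdir -p "${target_dir}/sources.list.d" "${target_dir}/preferences.d"
    echo "deb [arch=amd64,arm64] https://archive.mariadb.org/mariadb-${version}/repo/debian/ ${codename} main" \
        > "${target_dir}/sources.list.d/mariadb.list"
    # Priority 999 is above APT's default of 500, so packages whose release
    # origin is MariaDB win over same-named Debian packages.
    printf 'Package: *\nPin: release o=MariaDB\nPin-Priority: 999\n' \
        > "${target_dir}/preferences.d/mariadb"
}

scratch_dir=$(mktemp -d)
write_mariadb_apt_config "${scratch_dir}" "10.11" "bookworm"
cat "${scratch_dir}/sources.list.d/mariadb.list" "${scratch_dir}/preferences.d/mariadb"
```

# On a real image, 'apt-cache policy mariadb-client' shows which repository a package
# would be installed from once the pin is in place.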
if [[ ${INSTALL_MYSQL_CLIENT:="true"} == "true" ]]; then
    install_mariadb_client "${@}"
fi
EOF
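The `${INSTALL_MYSQL_CLIENT:="true"}` test at the end of the script above relies on Bash's assign-default expansion. A minimal sketch of that idiom follows; `INSTALL_DEMO_CLIENT`, `install_demo_client`, and `run_if_enabled` are hypothetical names used only for illustration and are not part of the Airflow scripts.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a real installer such as install_mariadb_client.
install_demo_client() {
    echo "installing demo client"
}

# ${VAR:="true"} expands to the current value of VAR, or assigns "true" and
# expands to it when VAR is unset or empty, so the toggle is opt-out.
run_if_enabled() {
    if [[ ${INSTALL_DEMO_CLIENT:="true"} == "true" ]]; then
        install_demo_client
    else
        echo "skipping demo client"
    fi
}

# Unset: the default applies and the installer runs.
unset INSTALL_DEMO_CLIENT
run_if_enabled

# Explicit "false": the installer is skipped.
INSTALL_DEMO_CLIENT="false"
run_if_enabled
```

Because `:=` treats an empty value the same as an unset one, exporting the variable as an empty string does not disable the step; only a non-empty value other than `true` does.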
# The content below is automatically copied from scripts/docker/install_mssql.sh
COPY <<"EOF" /install_mssql.sh
#!/usr/bin/env bash
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
set -euo pipefail
common::get_colors
declare -a packages
: "${INSTALL_MSSQL_CLIENT:?Should be true or false}"
function install_mssql_client() {
    # Install MsSQL client from Microsoft repositories
    if [[ ${INSTALL_MSSQL_CLIENT:="true"} != "true" ]]; then
        echo
        echo "${COLOR_BLUE}Skip installing mssql client${COLOR_RESET}"
        echo
        return
    fi
    packages=("msodbcsql18")
    common::import_trusted_gpg "EB3E94ADBE1229CF" "microsoft"
    echo
    echo "${COLOR_BLUE}Installing mssql client${COLOR_RESET}"
    echo
    echo "deb [arch=amd64,arm64] https://packages.microsoft.com/debian/$(lsb_release -rs)/prod $(lsb_release -cs) main" > \
        /etc/apt/sources.list.d/mssql-release.list &&
        mkdir -p /opt/microsoft/msodbcsql18 &&
        touch /opt/microsoft/msodbcsql18/ACCEPT_EULA &&
        apt-get update -yqq &&
        apt-get upgrade -yqq &&
        apt-get -yqq install --no-install-recommends "${packages[@]}" &&
        apt-get autoremove -yqq --purge &&
        apt-get clean &&
        rm -rf /var/lib/apt/lists/*
}
install_mssql_client "${@}"
EOF
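The `: "${INSTALL_MSSQL_CLIENT:?Should be true or false}"` line in the script above is a fail-fast guard: the no-op `:` command forces the expansion, and `:?` aborts with the given message when the variable is unset or empty. A small sketch of that behaviour, assuming a hypothetical `DEMO_TOGGLE` variable and `check_toggle` helper that do not exist in the Airflow scripts:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns 0 when the value passes the guard, non-zero
# otherwise. The guard runs in a subshell because a failed ${VAR:?message}
# expansion makes a non-interactive shell exit.
check_toggle() {
    local value="${1-}"
    ( DEMO_TOGGLE="${value}"; : "${DEMO_TOGGLE:?Should be true or false}" ) 2>/dev/null
}

if check_toggle "true"; then
    echo "guard passed"
fi

if ! check_toggle ""; then
    echo "guard rejected empty value"
fi
```

Note that the guard only checks that a value is present; it does not verify that the value is literally `true` or `false`. Since `:?` rejects both unset and empty values, a later `:=` default on the same variable only takes effect if the guard is removed.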
# The content below is automatically copied from scripts/docker/install_postgres.sh
COPY <<"EOF" /install_postgres.sh
#!/usr/bin/env bash
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
set -euo pipefail
common::get_colors
declare -a packages
: "${INSTALL_POSTGRES_CLIENT:?Should be true or false}"
install_postgres_client() {
    echo
    echo "${COLOR_BLUE}Installing postgres client${COLOR_RESET}"
    echo
    if [[ "${1}" == "dev" ]]; then
        packages=("libpq-dev" "postgresql-client")
    elif [[ "${1}" == "prod" ]]; then
        packages=("postgresql-client")
    else
        echo
        echo "Specify either prod or dev"
        echo
        exit 1
    fi
    common::import_trusted_gpg "7FCC7D46ACCC4CF8" "postgres"
    echo "deb [arch=amd64,arm64] https://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > \
        /etc/apt/sources.list.d/pgdg.list
    apt-get update
    apt-get install --no-install-recommends -y "${packages[@]}"
    apt-get autoremove -yqq --purge
    apt-get clean && rm -rf /var/lib/apt/lists/*
}
if [[ ${INSTALL_POSTGRES_CLIENT:="true"} == "true" ]]; then
    install_postgres_client "${@}"
fi
EOF
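The postgres script above picks its package list from a `dev` or `prod` argument before calling `apt-get`. The selection step can be sketched on its own as follows; `select_postgres_packages` is a hypothetical helper written for this illustration, and it uses `case` with `return 1` instead of the original `if`/`elif` chain with `exit 1` so it can be called without terminating the shell.

```shell
#!/usr/bin/env bash
set -euo pipefail

declare -a packages

# Hypothetical helper: fills the global "packages" array for a build mode.
# "${1-}" expands to an empty string when no argument is given, so a missing
# mode reaches the error branch instead of tripping "set -u".
select_postgres_packages() {
    local mode="${1-}"
    case "${mode}" in
        dev)
            # Development images also need the libpq headers.
            packages=("libpq-dev" "postgresql-client")
            ;;
        prod)
            packages=("postgresql-client")
            ;;
        *)
            echo "Specify either prod or dev" >&2
            return 1
            ;;
    esac
}

select_postgres_packages "dev"
echo "dev: ${packages[*]}"

select_postgres_packages "prod"
echo "prod: ${packages[*]}"
```

Keeping the list in an array and expanding it as `"${packages[@]}"`, as the original script does, passes each package name to `apt-get` as a separate argument.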
# The content below is automatically copied from scripts/docker/install_packaging_tools.sh
COPY <<"EOF" /install_packaging_tools.sh
#!/usr/bin/env bash
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
common::get_colors
common::get_packaging_tool
common::show_packaging_tool_version_and_location
common::install_packaging_tools
EOF
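Each inlined script above loads its shared helpers with `. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"`, which resolves `common.sh` relative to the script file instead of the caller's working directory. The sketch below demonstrates that lookup in a throwaway directory; the `main.sh` file name and the `demo::greet` function are assumptions made for this example only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a temporary directory holding a library and a script that sources it.
demo_dir="$(mktemp -d)"
trap 'rm -rf "${demo_dir}"' EXIT

# Hypothetical shared library with a single namespaced function.
cat > "${demo_dir}/common.sh" <<'LIBRARY'
function demo::greet() {
    echo "hello from common.sh"
}
LIBRARY

# The script locates common.sh next to its own file via BASH_SOURCE.
cat > "${demo_dir}/main.sh" <<'SCRIPT'
#!/usr/bin/env bash
set -euo pipefail
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
demo::greet
SCRIPT

# Run from an unrelated working directory to show the lookup still succeeds.
(cd / && bash "${demo_dir}/main.sh")
```

This is presumably why the Dockerfile copies every script to the same directory (`/`) as `/common.sh`: the relative lookup only works when the library sits beside the script that sources it.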
# The content below is automatically copied from scripts/docker/common.sh
COPY <<"EOF" /common.sh
#!/usr/bin/env bash
set -euo pipefail
function common::get_colors() {
    COLOR_BLUE=$'\e[34m'
    COLOR_GREEN=$'\e[32m'
    COLOR_RED=$'\e[31m'
    COLOR_RESET=$'\e[0m'
    COLOR_YELLOW=$'\e[33m'
    export COLOR_BLUE
    export COLOR_GREEN
    export COLOR_RED
    export COLOR_RESET
    export COLOR_YELLOW
}
function common::get_packaging_tool() {
: "${AIRFLOW_USE_UV:?Should be set}"
## IMPORTANT: IF YOU MODIFY THIS FUNCTION YOU SHOULD ALSO MODIFY CORRESPONDING FUNCTION IN
## `scripts/in_container/_in_container_utils.sh`
if [[ ${AIRFLOW_USE_UV} == "true" ]]; then
echo
echo "${COLOR_BLUE}Using 'uv' to install Airflow${COLOR_RESET}"
echo
export PACKAGING_TOOL="uv"
export PACKAGING_TOOL_CMD="uv pip"
# --no-binary is needed to avoid lxml and xmlsec using different versions of libxml2
# (binary lxml embeds its own libxml2, while xmlsec uses the system one).
# See https://bugs.launchpad.net/lxml/+bug/2110068
if [[ ${AIRFLOW_INSTALLATION_METHOD=} == "." && -f "./pyproject.toml" ]]; then
# for uv only install dev group when we install from sources
export EXTRA_INSTALL_FLAGS="--group=dev --no-binary lxml --no-binary xmlsec"
else
export EXTRA_INSTALL_FLAGS="--no-binary lxml --no-binary xmlsec"
fi
export EXTRA_UNINSTALL_FLAGS=""
* `chart-utils` have been noved to `helm-tests` from `devel-common` as they were only used there. * for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync * Updated `uv` version to 0.6.10 * We are using `uv sync` to perform "upgrade to newer depencies" in `canary` builds and locally * leveldb has been turned into "dependency group" and removed from apache-airflow and apache-airflow-core extras, it is now only available by google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
        export UPGRADE_TO_HIGHEST_RESOLUTION="--upgrade --resolution highest"
        export UPGRADE_IF_NEEDED="--upgrade"
        UV_CONCURRENT_DOWNLOADS=$(nproc --all)
        export UV_CONCURRENT_DOWNLOADS
        if [[ ${INCLUDE_PRE_RELEASE=} == "true" ]]; then
Replace chicken-egg providers with automated use of unreleased packages (#49799)

* Replace chicken-egg providers with automated use of unreleased packages

Now that we got rid of the .dev0 suffix, it is possible to rely entirely on building the packages locally using existing mechanisms that check whether packages have already been released (for CI builds), and on the fact that we need at least a pre-release version of packages if we are building a pre-release version of airflow.

It works as follows:

* for CI builds (generate constraints and PROD image builds) - we always attempt to build ALL provider packages, but without --skip-tag-check - which means that if a provider has already been released and its version did not change in main, we are not going to build it locally and we will use it from PyPI. However, if the provider version is updated and the provider has not yet been released (checked by tag) - it will be built locally from sources and used for constraint generation.
* for release PROD image builds, on the other hand, we NEVER build packages locally - we always rely on packages released on PyPI. However, if we are building a pre-release version of airflow, we automatically add the --pre flag that looks for pre-release packages in PyPI - this way a pre-release version of airflow can be built with pre-release versions of providers. We still attempt to use constraints first, however - so unless there are limits in apache airflow that prevent it from using released versions of providers, the constraint versions will be used - only if that fails will PROD images fall back to a non-constraint installation that can freely use pre-release versions of packages from PyPI.

This means, for example, that if we cherry-pick a change from main that increases the minimum version of a provider for apache-airflow to one that does not even have a pre-release version, building the rc version image for airflow will fail (which is a good thing). The lack of the --pre flag for a "release" version of Airflow also means that if airflow has a min version of a provider that has no "released" version yet (only rc) - it will also fail (which is also a good thing)

* Update scripts/in_container/run_generate_constraints.py
2025-04-26 15:23:21 +02:00
            EXTRA_INSTALL_FLAGS="${EXTRA_INSTALL_FLAGS} --prerelease if-necessary"
        fi
    else
        echo
        echo "${COLOR_BLUE}Using 'pip' to install Airflow${COLOR_RESET}"
        echo
        export PACKAGING_TOOL="pip"
        export PACKAGING_TOOL_CMD="pip"
        # --no-binary is needed in order to avoid libxml and xmlsec using different version of libxml2
        # (binary lxml embeds its own libxml2, while xmlsec uses system one).
        # See https://bugs.launchpad.net/lxml/+bug/2110068
        export EXTRA_INSTALL_FLAGS="--root-user-action ignore --no-binary lxml,xmlsec"
        export EXTRA_UNINSTALL_FLAGS="--yes"
        export UPGRADE_TO_HIGHEST_RESOLUTION="--upgrade --upgrade-strategy eager"
Switch from --user to venv for PROD image and enable uv (#37796)

This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems the `--user` installation created for us in the past and does it in a fully backwards compatible way.

This improves both "production" use for end users and local iteration on the PROD image during tests - but also CI.

Improvements for the "end user":

* users do not have to use `pip install --user` to install new packages any more, and it is not enabled by default with the PIP_USER flag.
* users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did)
* users can use `uv` to build a custom production image, which gives a 40%-50% saving in image build time compared to `pip`.
* python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images.

Improvements for development:

* when the image is built from sources (no --use-docker-context-files is specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - so both airflow and providers have the latest version inside the prod image.
* when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change.
* by default `uv` is used for the local source build for k8s tests, so even if you have to rebuild it, it is way faster (60%-80%) while iterating with the image.

CI/DEV tooling improvements:

* this PR switches to using `uv` by default for most prod images we build in CI, but it adds a check that the image still builds with `pip`.
* we also switch to a more PEP-standard way of installing packages from the local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815

Update contributing-docs/testing/k8s_tests.rst
Update docs/docker-stack/build.rst
Update scripts/docker/install_airflow.sh
Update docs/docker-stack/changelog.rst

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
        export UPGRADE_IF_NEEDED="--upgrade --upgrade-strategy only-if-needed"
        if [[ ${INCLUDE_PRE_RELEASE=} == "true" ]]; then
            EXTRA_INSTALL_FLAGS="${EXTRA_INSTALL_FLAGS} --pre"
        fi
    fi
}
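
The two branches above choose different upgrade and pre-release flags depending on the packaging tool. A minimal sketch of that selection, assuming a hypothetical helper `sketch_install_flags` (not part of the original script) that takes the tool name and the pre-release switch as arguments instead of reading environment variables. For brevity it prints the eager-upgrade flags and the pre-release flag on one line, whereas the script keeps them in separate variables (`UPGRADE_TO_HIGHEST_RESOLUTION` and `EXTRA_INSTALL_FLAGS`):

```shell
# Hypothetical helper, for illustration only: prints the eager-upgrade flags
# for the given packaging tool, followed by the pre-release flag if requested.
sketch_install_flags() {
    local tool="$1"
    local include_pre_release="${2:-false}"
    local flags
    case "${tool}" in
        uv) flags="--upgrade --resolution highest" ;;
        pip) flags="--upgrade --upgrade-strategy eager" ;;
        *)
            # Anything other than uv or pip is treated as an error in this sketch
            echo "Unknown packaging tool: ${tool}" >&2
            return 1
            ;;
    esac
    if [ "${include_pre_release}" = "true" ]; then
        case "${tool}" in
            uv) flags="${flags} --prerelease if-necessary" ;;
            pip) flags="${flags} --pre" ;;
        esac
    fi
    echo "${flags}"
}

sketch_install_flags uv true
sketch_install_flags pip
```

The point of the comparison is that the two tools spell the same intent differently, so the flags cannot be shared between the branches.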
Converts Dockerfiles to be standalone (#22492)

This change is one of the biggest optimizations to the Dockerfiles. It was a goal from the very beginning, but it has only been enabled by switching to buildkit and the recent release of support for the 1.4 dockerfile syntax. This syntax introduced two features:

* heredocs
* links for COPY commands

Both changes make it possible to solve multiple problems:

* COPY for build scripts suffers from permission problems. Depending on the umask setting of the host, the scripts could have different group permissions and invalidate the docker cache. Inlining the scripts (automatically by pre-commit) gets rid of the problem completely.
* COPY --link makes it possible to optimize and parallelize builds for the source code embedded in Dockerfile.ci. This should not only speed up building the images locally but also allow CI builds to use the cache more efficiently (when there is no source code change, the builds will use pre-cached layers from the cache more efficiently, and in parallel).
* The PROD Dockerfile is now completely standalone. You do not need any folders or files to build the Airflow image. At the same time, the versatility and support for multiple ways of building the image (as described in https://airflow.apache.org/docs/docker-stack/build.html) is maintained. This was a goal from the very beginning of the PROD Dockerfile but it was not easily achievable - heredocs make it possible to inline the scripts that are used for the build, and the pre-commits make sure that there is one source of truth and nicely editable scripts for both the PROD and CI Dockerfiles.

The last point is really cool, because it allows our users to build custom dockerfiles without checking out the Airflow code: it is enough to download the latest released Dockerfile and they can easily build the image.

Overall - this change will vastly optimize build speed for both PROD and CI images in multiple scenarios.
2022-03-27 19:19:02 +02:00
function common::get_airflow_version_specification() {
    if [[ -z ${AIRFLOW_VERSION_SPECIFICATION=}
        && -n ${AIRFLOW_VERSION}
        && ${AIRFLOW_INSTALLATION_METHOD} != "." ]]; then
        AIRFLOW_VERSION_SPECIFICATION="==${AIRFLOW_VERSION}"
    fi
}
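
The rule in `common::get_airflow_version_specification` can be tried in isolation. A minimal sketch, assuming a hypothetical function `sketch_version_specification` (not part of the original script) that takes the version, the installation method and any preset specification as arguments and prints the result; `apache-airflow` as an installation method and `3.0.0` as a version are only example inputs:

```shell
# Hypothetical stand-in for the logic above, using arguments instead of
# environment variables so it can be run on its own.
sketch_version_specification() {
    local version="$1"
    local installation_method="$2"
    local specification="${3:-}"
    # Pin to an exact version only when no specification was given, a version
    # is known, and we are not installing from local sources (".").
    if [ -z "${specification}" ] && [ -n "${version}" ] && [ "${installation_method}" != "." ]; then
        specification="==${version}"
    fi
    echo "${specification}"
}

sketch_version_specification "3.0.0" "apache-airflow"
sketch_version_specification "3.0.0" "."
sketch_version_specification "3.0.0" "apache-airflow" ">=3.0.0"
```

Installing from sources (`.`) leaves the specification empty, and an explicitly provided specification is never overwritten.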
function common::get_constraints_location() {
    # When installing from sources without upgrade, generate constraints from uv.lock
Switch CI dependency management from constraints to uv.lock (#63609)

* Switch CI dependency management from constraints to uv.lock

closes: #54609

* Fix selective_checks tests for push events without upgrade

Push events no longer trigger upgrade-to-newer-dependencies unless uv.lock or pyproject.toml files changed. Updated test expectations.

* Fix remaining selective_checks tests for push events

Update two more test cases that expected upgrade-to-newer-dependencies to be true for PUSH events.

* Fix CI failures: include uv.lock in Docker context and handle missing constraints

- Add uv.lock to .dockerignore allowlist so uv sync --frozen works in Docker builds
- Make packaging install in install_from_docker_context_files.sh conditional on constraints.txt existing, since the uv.lock path skips constraints download

* Fix static checks: update uv.lock and breeze docs after rebase

* Use install script with uv.lock constraints for dev dependencies in CI

Revert the entrypoint_ci.sh change from `uv sync --all-packages` back to using the install_development_dependencies.py script. The uv sync approach fails when provider source directories are not fully available in the container (e.g. with selected mounts). Instead, generate constraints from uv.lock via `uv export` and pass them to the existing script, which installs only the needed development dependencies via `uv pip install`.

Also add uv.lock to VOLUMES_FOR_SELECTED_MOUNTS so it is available inside containers using the "tests and providers" mount mode.
2026-03-15 15:20:53 +01:00
    if [[ ${AIRFLOW_INSTALLATION_METHOD=} == "." && -z "${UPGRADE_RANDOM_INDICATOR_STRING=}" ]]; then
        echo
        echo "${COLOR_BLUE}Installing from sources with uv.lock - generating constraints from uv.lock${COLOR_RESET}"
        echo
        uv export --frozen --no-hashes --no-emit-project --no-editable --no-header \
            --no-annotate > "${HOME}/constraints.txt" 2>/dev/null || true
        return
    fi
    # auto-detect Airflow-constraint reference and location
    if [[ -z "${AIRFLOW_CONSTRAINTS_REFERENCE=}" ]]; then
        if [[ ${AIRFLOW_VERSION} =~ v?2.* || ${AIRFLOW_VERSION} =~ v?3.* ]]; then
            AIRFLOW_CONSTRAINTS_REFERENCE=constraints-${AIRFLOW_VERSION}
        else
            AIRFLOW_CONSTRAINTS_REFERENCE=${DEFAULT_CONSTRAINTS_BRANCH}
        fi
    fi
    if [[ -z ${AIRFLOW_CONSTRAINTS_LOCATION=} ]]; then
        local constraints_base="https://raw.githubusercontent.com/${CONSTRAINTS_GITHUB_REPOSITORY}/${AIRFLOW_CONSTRAINTS_REFERENCE}"
        local python_version
        python_version=$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
        AIRFLOW_CONSTRAINTS_LOCATION="${constraints_base}/${AIRFLOW_CONSTRAINTS_MODE}-${python_version}.txt"
fi
Fix constraints use in CI image after uv change (#37845) With the switch to uv, constraints were no longer used in the CI image. In effect, PRs did not install the constrained dependencies but used the lowest-direct mode of installation, so direct dependencies would not be upgraded, only the transitive ones, and the risk of failure was small even if someone released a new, breaking dependency. The reason is that `uv` currently does not support installing constraints from a URL. We had been silently falling back to the "no-constraints" way in that case (this is the default mode if the constraint build fails for any reason). This introduced the risk that a breaking 3rd-party dependency release would also start breaking regular PRs, not only the "canary" build. We fix it by downloading constraints locally when they are remote and using them from there. While this is being worked on in https://github.com/astral-sh/uv/pull/2081 and likely to land in uv 0.1.14, it is also a good idea to download the constraints and keep them around: this is handy if you later want to use constraints to install a "golden" set of dependencies without having to build the right URL, since you can always use `${HOME}/constraints.txt`. This PR also changes the fallback mechanism to perform the lowest-direct upgrade only when the constraint build fails, rather than always running the lowest-direct upgrade even if the constraints install works fine. This makes sure that most PRs use exactly the constraint versions of the dependencies (at least the version of constraints generated the last time pyproject.toml changed).
2024-03-02 11:23:58 +01:00
if [[ ${AIRFLOW_CONSTRAINTS_LOCATION} =~ http.* ]]; then
echo
echo "${COLOR_BLUE}Downloading constraints from ${AIRFLOW_CONSTRAINTS_LOCATION} to ${HOME}/constraints.txt ${COLOR_RESET}"
echo
Add Python 3.14 Support (#63520) * Update python version exclusion to 3.15 * Add 3.14 metadata version classifiers and related constants * Regenerate Breeze command help screenshots * Assorted workarounds to fix breeze image building - constraints are skipped entirely - greenlet pin updated * Exclude cassandra * Exclude amazon * Exclude google * CI: Only add pydantic extra to Airflow 2 migration tests Before this fix there were two separate issues in the migration-test setup for Python 3.14: 1. The migration workflow always passes --airflow-extras pydantic. 2. For Python 3.14, the minimum Airflow version is resolved to 3.2.0 by get_min_airflow_version_for_python.py, and apache-airflow[pydantic]==3.2.0 is not a valid thing to install. So when constraints installation fails, the fallback path tries to install an invalid spec. * Disable DB migration tests for python 3.14 * Enforce werkzeug 3.x for python 3.14 * Increase K8s executor test timeout for Python 3.14 Python 3.14 changed the default multiprocessing start method from 'fork' to 'forkserver' on Linux. The forkserver start method is slower because each new process must import modules from scratch rather than copying the parent's address space. This makes `multiprocessing.Manager()` initialization take longer, causing the test to exceed its 10s timeout. * Adapt LocalExecutor tests for Python 3.14 forkserver default Python 3.14 changed the default multiprocessing start method from 'fork' to 'forkserver' on Linux. Like 'spawn', 'forkserver' doesn't share the parent's address space, so mock patches applied in the test process are invisible to worker subprocesses. 
- Skip tests that mock across process boundaries on non-fork methods - Add test_executor_lazy_worker_spawning to verify that non-fork start methods defer worker creation and skip gc.freeze - Make test_multiple_team_executors_isolation and test_global_executor_without_team_name assert the correct worker count for each start method instead of assuming pre-spawning - Remove skip from test_clean_stop_on_signal (works on all methods) and increase timeout from 5s to 30s for forkserver overhead Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Bump dependencies to versions supporting 3.14 * Fix PROD image build failing on Python 3.14 due to excluded providers The PROD image build installed all provider wheels regardless of Python version compatibility. Providers like google and amazon that exclude Python 3.14 were still passed to pip, causing resolution failures (e.g. ray has no cp314 wheel on PyPI). Two fixes: - get_distribution_specs.py now reads each wheel's Requires-Python metadata and skips incompatible wheels instead of passing them to pip. - The requires-python specifier generation used !=3.14 which per PEP 440 only excludes 3.14.0, not 3.14.3. Changed to !=3.14.* wildcard. * Split core test types into 2 matrix groups to avoid OOM on Python 3.14 Non-DB core tests use xdist which runs all test types in a single pytest process. With 2059 items across 4 workers, memory accumulates until the OOM killer strikes at ~86% completion (exit code 137). Split core test types into 2 groups (API/Always/CLI and Core/Other/Serialization), similar to how provider tests already use _split_list with NUMBER_OF_LOW_DEP_SLICES. Each group gets ~1000 items, well under the ~1770 threshold where OOM occurs. Update selective_checks test expectations to reflect the 2-group split. 
* Gracefully handle an already removed password file in fixture The old code had a check-then-act race (if `os.path.exists` → `os.remove`), which fails when the file doesn't exist at removal time. `contextlib.suppress(FileNotFoundError)` handles this atomically — if the file is missing (never created in this xdist worker, or removed between check and delete), it's silently ignored. * Fix OOM and flaky tests in test_process_utils Replace multiprocessing.Process with subprocess.Popen running minimal inline scripts. multiprocessing.Process uses fork(), which duplicates the entire xdist worker memory. At 95% test completion the worker has accumulated hundreds of MBs; forking it triggers the OOM killer (exit code 137) on Python 3.14. subprocess.Popen starts a fresh lightweight process (~10MB) without copying the parent's memory, avoiding the OOM entirely. Also replace the racy ps -ax process counting in TestKillChildProcessesByPids with psutil.pid_exists() checks on the specific PID — the old approach was non-deterministic because unrelated processes could start/stop between measurements. * Add prek hook to validate python_version markers for excluded providers When a provider declares excluded-python-versions in provider.yaml, every dependency string referencing that provider in pyproject.toml must carry a matching python_version marker. Missing markers cause excluded providers to be silently installed as transitive dependencies (e.g. aiobotocore pulling in amazon on Python 3.14). The new check-excluded-provider-markers hook reads exclusions from provider.yaml and validates all dependency strings in pyproject.toml at commit time, preventing regressions like the one fixed in the previous commit. * Update `uv.lock` --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:03:46 +02:00
if ! curl -sSf -o "${HOME}/constraints.txt" "${AIRFLOW_CONSTRAINTS_LOCATION}"; then
echo
echo "${COLOR_YELLOW}Constraints file not found at ${AIRFLOW_CONSTRAINTS_LOCATION} (new Python version being bootstrapped?).${COLOR_RESET}"
echo "${COLOR_YELLOW}Falling back to no-constraints installation.${COLOR_RESET}"
echo
AIRFLOW_CONSTRAINTS_LOCATION=""
# Create an empty constraints file so --constraint flag still works
touch "${HOME}/constraints.txt"
fi
else
echo
echo "${COLOR_BLUE}Copying constraints from ${AIRFLOW_CONSTRAINTS_LOCATION} to ${HOME}/constraints.txt ${COLOR_RESET}"
echo
cp "${AIRFLOW_CONSTRAINTS_LOCATION}" "${HOME}/constraints.txt"
fi
}
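The constraints handling above (download when the location is a URL, copy when it is a local path, fall back to an empty file when the download fails) can be sketched as a standalone helper. This is a minimal sketch, not the function used by the Dockerfile: the name `stage_constraints` and its two-argument interface are assumptions made for illustration.

```shell
# Hypothetical helper (not part of the Dockerfile): stage a constraints file locally.
# $1 = URL or local path of the constraints file (may be empty)
# $2 = destination file that a later "--constraint" flag would point at
stage_constraints() {
    local location="${1:-}"
    local target="${2:?destination path required}"
    case "${location}" in
        http://*|https://*)
            # Remote location: -f makes curl fail on HTTP errors such as 404.
            if ! curl -sSf -o "${target}" "${location}"; then
                # Download failed: leave an empty file so --constraint still works.
                : > "${target}"
            fi
            ;;
        "")
            # No location configured: an empty file means "no constraints".
            : > "${target}"
            ;;
        *)
            # Local path: copy it to the well-known destination.
            cp "${location}" "${target}"
            ;;
    esac
}
```

Writing an empty file in the fallback case keeps every later install command uniform, since it can always pass the same constraints path whether or not constraints were found.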
function common::show_packaging_tool_version_and_location() {
echo "PATH=${PATH}"
echo "Installed pip: $(pip --version): $(which pip)"
if [[ ${PACKAGING_TOOL} == "pip" ]]; then
echo "${COLOR_BLUE}Using 'pip' to install Airflow${COLOR_RESET}"
else
echo "${COLOR_BLUE}Using 'uv' to install Airflow${COLOR_RESET}"
echo "Installed uv: $(uv --version 2>/dev/null || echo "Not installed yet"): $(which uv 2>/dev/null)"
fi
}
function common::install_packaging_tools() {
: "${AIRFLOW_USE_UV:?Should be set}"
Switch from --user to venv for PROD image and enable uv (#37796) This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems the `--user` installation caused us in the past and does it in a fully backwards compatible way. This improves both "production" use for end users and local iteration on the PROD image during tests, but also CI. Improvements for "end user": * users do not have to use `pip install --user` to install new packages any more and it is not enabled by default with the PIP_USER flag. * users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did) * users can use `uv` to build a custom production image, which saves 40%-50% of image build time compared to `pip`. * python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images. Improvements for development: * when the image is built from sources (no --use-docker-context-files are specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image. * when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change. * By default `uv` is used for local source builds for k8s tests, so even if you have to rebuild it, it is way faster (60%-80%) while iterating with the image. 
CI/DEV tooling improvements: * this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check if the image still builds with `pip`. * we also switch to more PEP standard way of installing packages from local filesystem (package-name @ file:///FILE) Fixes: #37785 Fixes: #37815 Update contributing-docs/testing/k8s_tests.rst Update contributing-docs/testing/k8s_tests.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update scripts/docker/install_airflow.sh Update docs/docker-stack/changelog.rst Update docs/docker-stack/build.rst Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
if [[ "${VIRTUAL_ENV=}" != "" ]]; then
echo
echo "${COLOR_BLUE}Checking packaging tools in venv: ${VIRTUAL_ENV}${COLOR_RESET}"
echo
else
echo
echo "${COLOR_BLUE}Checking packaging tools for system Python installation: $(which python)${COLOR_RESET}"
echo
fi
if [[ ${AIRFLOW_PIP_VERSION=} == "" ]]; then
echo
echo "${COLOR_BLUE}Installing latest pip version${COLOR_RESET}"
echo
pip install --root-user-action ignore --disable-pip-version-check --upgrade pip
elif [[ ! ${AIRFLOW_PIP_VERSION} =~ ^[0-9].* ]]; then
echo
echo "${COLOR_BLUE}Installing pip version from spec ${AIRFLOW_PIP_VERSION}${COLOR_RESET}"
echo
# shellcheck disable=SC2086
pip install --root-user-action ignore --disable-pip-version-check "pip @ ${AIRFLOW_PIP_VERSION}"
else
local installed_pip_version
installed_pip_version=$(python -c 'from importlib.metadata import version; print(version("pip"))')
if [[ ${installed_pip_version} != "${AIRFLOW_PIP_VERSION}" ]]; then
echo
echo "${COLOR_BLUE}(Re)Installing pip version: ${AIRFLOW_PIP_VERSION}${COLOR_RESET}"
echo
pip install --root-user-action ignore --disable-pip-version-check "pip==${AIRFLOW_PIP_VERSION}"
fi
fi
if [[ ${AIRFLOW_UV_VERSION=} == "" ]]; then
echo
echo "${COLOR_BLUE}Installing latest uv version${COLOR_RESET}"
echo
pip install --root-user-action ignore --disable-pip-version-check --upgrade uv
elif [[ ! ${AIRFLOW_UV_VERSION} =~ ^[0-9].* ]]; then
echo
echo "${COLOR_BLUE}Installing uv version from spec ${AIRFLOW_UV_VERSION}${COLOR_RESET}"
echo
# shellcheck disable=SC2086
pip install --root-user-action ignore --disable-pip-version-check "uv @ ${AIRFLOW_UV_VERSION}"
else
local installed_uv_version
installed_uv_version=$(python -c 'from importlib.metadata import version; print(version("uv"))' 2>/dev/null || echo "Not installed yet")
if [[ ${installed_uv_version} != "${AIRFLOW_UV_VERSION}" ]]; then
echo
echo "${COLOR_BLUE}(Re)Installing uv version: ${AIRFLOW_UV_VERSION}${COLOR_RESET}"
echo
# shellcheck disable=SC2086
pip install --root-user-action ignore --disable-pip-version-check "uv==${AIRFLOW_UV_VERSION}"
fi
fi
Switch pre-commit to prek (#54258) pre-commit is a fantastic tool, and we used it heavily for years, but the tool has stagnated and shows no sign of adapting to our needs. For years we tried to convince the pre-commit maintainers that things like autocomplete are necessary, but this met with resistance (if not hostility) from the maintainer. There was also no chance for them to accept the expectations of bigger projects like ours, where we have a huge monorepo and not only multiple needs but also different parts of the repo needing different language support (golang, typescript soon), and apparently the maintainer of pre-commit does not think a monorepo is a good thing at all. Similarly, they did not recognize the rise of `uv`, and the only way to use `uv` with pre-commit is to install `pre-commit-uv`, which essentially patches pre-commit with uv support. This is not really sustainable and the tool lags behind many of our needs. Luckily, there is a new project in town, prek, a rewrite of pre-commit that is 100% compatible (now), 10x faster (because Rust), uses `uv` natively, already supports auto-complete, and has a very friendly maintainer who not only supports us but also happily works on improving `prek` to close all the gaps, and plans to implement (with our support and cooperation, of course) monorepo support that will allow us to modularise our pre-commits. This PR switches our pre-commit support to use prek exclusively: * breeze static checks command is completely removed * custom auto-complete code in breeze as well * instructions are updated to set up prek instead of pre-commit * CI is updated to run prek instead of pre-commit * documentation for static checks is reviewed and new features that prek enables are added
2025-08-17 09:00:14 +02:00
if [[ ${AIRFLOW_PREK_VERSION=} == "" ]]; then
echo
echo "${COLOR_BLUE}Installing latest prek, uv${COLOR_RESET}"
echo
uv tool install prek --with uv
# make sure that the venv/user in .local exists
mkdir -p "${HOME}/.local/bin"
else
echo
echo "${COLOR_BLUE}Installing predefined versions of prek, uv:${COLOR_RESET}"
echo "${COLOR_BLUE}prek(${AIRFLOW_PREK_VERSION}) uv(${AIRFLOW_UV_VERSION})${COLOR_RESET}"
echo
uv tool install "prek==${AIRFLOW_PREK_VERSION}" --with "uv==${AIRFLOW_UV_VERSION}"
# make sure that the venv/user in .local exists
mkdir -p "${HOME}/.local/bin"
fi
}
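The pip and uv branches in the function above share one decision: an empty version means "latest", a value that does not start with a digit is treated as a direct reference (`name @ spec`), and a value that starts with a digit becomes an exact pin (`name==version`). A minimal sketch of that decision as a pure function follows. The helper name `requirement_for` is hypothetical, and the sketch only builds the requirement string; it does not reproduce the `--upgrade` flag or the "reinstall only if the installed version differs" check.

```shell
# Hypothetical helper (not part of the Dockerfile): map a version setting
# to the requirement string that would be handed to "pip install".
# $1 = package name, $2 = version setting (empty, a version number, or a spec/URL)
requirement_for() {
    local package="${1:?package name required}"
    local version="${2:-}"
    case "${version}" in
        "")
            # No version requested: install whatever is latest.
            printf '%s\n' "${package}"
            ;;
        [0-9]*)
            # Starts with a digit: treat it as an exact version pin.
            printf '%s==%s\n' "${package}" "${version}"
            ;;
        *)
            # Anything else: treat it as a direct reference (URL or VCS spec).
            printf '%s @ %s\n' "${package}" "${version}"
            ;;
    esac
}
```

For example, `requirement_for pip 24.0` prints `pip==24.0`, and `requirement_for uv ""` prints `uv`.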
function common::import_trusted_gpg() {
common::get_colors
local key=${1:?${COLOR_RED}First argument expects OpenPGP Key ID${COLOR_RESET}}
local name=${2:?${COLOR_RED}Second argument expects trust storage name${COLOR_RESET}}
# Note that not every keyserver can be used to retrieve keys:
# sks-keyservers.net: Unmaintained and DNS taken down due to GDPR requests.
# keys.openpgp.org: User ID mandatory, not suitable for APT repositories.
# keyring.debian.org: Only accepts keys in the Debian keyring.
# pgp.mit.edu: High response time.
local keyservers=(
"hkps://keyserver.ubuntu.com"
"hkps://pgp.surf.nl"
)
GNUPGHOME="$(mktemp -d)"
export GNUPGHOME
set +e
for keyserver in $(shuf -e "${keyservers[@]}"); do
echo "${COLOR_BLUE}Try to receive GPG public key ${key} from ${keyserver}${COLOR_RESET}"
gpg --keyserver "${keyserver}" --recv-keys "${key}" 2>&1 && break
echo "${COLOR_YELLOW}Unable to receive GPG public key ${key} from ${keyserver}${COLOR_RESET}"
done
set -e
gpg --export "${key}" > "/etc/apt/trusted.gpg.d/${name}.gpg"
gpgconf --kill all
rm -rf "${GNUPGHOME}"
unset GNUPGHOME
}
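The keyserver loop above is an instance of a "try each candidate until one succeeds" pattern. A minimal sketch of that pattern in isolation follows. The helper name `try_each` is hypothetical, and unlike the function above it reports failure to the caller when every candidate fails instead of continuing.

```shell
# Hypothetical helper (not part of the Dockerfile): run a command against each
# candidate in turn and stop at the first one that succeeds.
# $1 = name of a command or function taking one candidate as its argument
# remaining arguments = candidates, tried in the order given
try_each() {
    local fetch="${1:?command required}"
    shift
    local candidate
    for candidate in "$@"; do
        if "${fetch}" "${candidate}"; then
            # Print the candidate that worked so the caller can log it.
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    # Every candidate failed.
    return 1
}
```

Returning a non-zero status when all candidates fail lets the caller decide whether to abort, which may be preferable when a later step would otherwise export an empty keyring.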
EOF
# The content below is automatically copied from scripts/docker/pip
COPY <<"EOF" /pip
#!/usr/bin/env bash
COLOR_RED=$'\e[31m'
COLOR_RESET=$'\e[0m'
COLOR_YELLOW=$'\e[33m'
if [[ $(id -u) == "0" ]]; then
    echo
    echo "${COLOR_RED}You are running pip as root. Please use 'airflow' user to run pip!${COLOR_RESET}"
    echo
    echo "${COLOR_YELLOW}See: https://airflow.apache.org/docs/docker-stack/build.html#adding-new-pypi-packages-individually${COLOR_RESET}"
    echo
    exit 1
fi
exec "${HOME}"/.local/bin/pip "${@}"
EOF
# The content below is automatically copied from scripts/docker/install_from_docker_context_files.sh
COPY <<"EOF" /install_from_docker_context_files.sh
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
function install_airflow_and_providers_from_docker_context_files(){
    local flags=()
    if [[ ${INSTALL_MYSQL_CLIENT} != "true" ]]; then
        AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/mysql,}
    fi
    if [[ ${INSTALL_POSTGRES_CLIENT} != "true" ]]; then
        AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/postgres,}
    fi
    if [[ ! -d /docker-context-files ]]; then
        echo
        echo "${COLOR_RED}You must provide a folder via --build-arg DOCKER_CONTEXT_FILES=<FOLDER> and you missed it!${COLOR_RESET}"
        echo
        exit 1
    fi
    # This is needed to get distribution names for local context distributions
    if [[ -f "${HOME}/constraints.txt" ]]; then
        ${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${ADDITIONAL_PIP_INSTALL_FLAGS} --constraint ${HOME}/constraints.txt packaging
    else
        ${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${ADDITIONAL_PIP_INSTALL_FLAGS} packaging
    fi
    if [[ -n ${AIRFLOW_EXTRAS=} ]]; then
        AIRFLOW_EXTRAS_TO_INSTALL="[${AIRFLOW_EXTRAS}]"
    else
        AIRFLOW_EXTRAS_TO_INSTALL=""
    fi
# Find apache-airflow distribution in docker-context files
readarray -t install_airflow_distribution < <(EXTRAS="${AIRFLOW_EXTRAS_TO_INSTALL}" \
python /scripts/docker/get_distribution_specs.py /docker-context-files/apache?airflow?[0-9]*.{whl,tar.gz} 2>/dev/null || true)
Switch from --user to venv for PROD image and enable uv (#37796) This PR introduces a joint way to treat the .local (--user) folder as both - venv and `--user` package installation. It fixes a number of problems the `--user` installation created us in the past and does it in fully backwards compatible way. This improves both "production" use for end user as well as local iteration on the PROD image during tests - but also for CI. Improvements for "end user": * user does not have to use `pip install --user` to install new packages any more and it is not enabled by default with PIP_USER flag. * users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did) * users can use `uv` to build custom production image, which gives 40%-50% saving for image build time compring to `pip`. * python -m venv --system-site-packages continues to use the .local packages from the .local installation (and not uses them if --system-site-packages is not used) - so we have full compatibility with previous images. Improvements for development: * when image is built from sources (no --use-docker-context-files are specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image. * when local sources changes and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow" which is where editable installlation is done from. You only need to rebuild the image if dependencies change. * By default `uv` is used for local source build for k8s tests so even if you have to rebuild it, it is way faster (60%-80%) during iterating with the image. 
CI/DEV tooling improvements: * this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check if the image still builds with `pip`. * we also switch to more PEP standard way of installing packages from local filesystem (package-name @ file:///FILE) Fixes: #37785 Fixes: #37815 Update contributing-docs/testing/k8s_tests.rst Update contributing-docs/testing/k8s_tests.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update scripts/docker/install_airflow.sh Update docs/docker-stack/changelog.rst Update docs/docker-stack/build.rst Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
echo
Move airflow sources to airflow-core package (#47798) This is continuation of the separation of the Airflow codebase into separate distributions. This one splits airflow into two of them: * apache-airflow - becomes an empty, meta no-code distribution that only has dependencies to apache-airflow-core and task-sdk distributions and it has preinstalled provider distributions added in standard "wheel" distribution. All "extras" lead either to "apache-airflow-core" extras or to providers - the dependencies and optional dependencies are calculated differently depending on "editable" or "standard" mode - in editable mode, just provider dependencies are installed for preinstalled providers in standard mode - those preinstalled providers are dependencies. * the apache-airflow-core distribution contains all airflow core sources (previously in apache-airflow) and it has no provider extras. Thanks to that apache-airflow distribution does not have any dynamically calculated dependencies. * the apache-airflow-core distribution hs "hatch_build_airflow_core.py" build hooks that add custom build target and implement custom cleanup in order to implement compiling assets as part of the build. * During the move, the following changes were applied for consistency: * packages when used in context of distribution packages have been renamed to "distributions" - including all documentations and commands in breeze to void confusion with import packages (see https://packaging.python.org/en/latest/discussions/distribution-package-vs-import-package/) * all tests in `airflow-core` follow now the same convention where tests are in `unit`, `system` and `integration` package. no extra package has been as second level, because all the provider tests have "<PROVIDER>" there, so we just have to avoid naming airflow unit."<PROVIDER>" with the same name as provider. * all tooling in CI/DEV have been updated to follow the new structure. 
We should always build to packages now when we are building them using `breeze`.
2025-03-21 14:25:26 +01:00
echo "${COLOR_BLUE}Found apache-airflow distributions in docker-context-files folder: ${install_airflow_distribution[*]}${COLOR_RESET}"
Switch from --user to venv for PROD image and enable uv (#37796) This PR introduces a joint way to treat the .local (--user) folder as both - venv and `--user` package installation. It fixes a number of problems the `--user` installation created us in the past and does it in fully backwards compatible way. This improves both "production" use for end user as well as local iteration on the PROD image during tests - but also for CI. Improvements for "end user": * user does not have to use `pip install --user` to install new packages any more and it is not enabled by default with PIP_USER flag. * users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did) * users can use `uv` to build custom production image, which gives 40%-50% saving for image build time compring to `pip`. * python -m venv --system-site-packages continues to use the .local packages from the .local installation (and not uses them if --system-site-packages is not used) - so we have full compatibility with previous images. Improvements for development: * when image is built from sources (no --use-docker-context-files are specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image. * when local sources changes and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow" which is where editable installlation is done from. You only need to rebuild the image if dependencies change. * By default `uv` is used for local source build for k8s tests so even if you have to rebuild it, it is way faster (60%-80%) during iterating with the image. 
CI/DEV tooling improvements: * this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check if the image still builds with `pip`. * we also switch to more PEP standard way of installing packages from local filesystem (package-name @ file:///FILE) Fixes: #37785 Fixes: #37815 Update contributing-docs/testing/k8s_tests.rst Update contributing-docs/testing/k8s_tests.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update scripts/docker/install_airflow.sh Update docs/docker-stack/changelog.rst Update docs/docker-stack/build.rst Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
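The commit message above mentions installing local files with the `package-name @ file:///FILE` form. The following is only a minimal sketch of how such a spec could be derived from a file name in bash. The helper name `to_direct_reference` and the example wheel path are hypothetical, and this is not the logic of the real `get_distribution_specs.py`. It assumes normalized file names, where the distribution name is everything before the first `-`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, for illustration only (not the real get_distribution_specs.py).
to_direct_reference() {
    local file_path="$1"
    # Strip the directory part to get the bare file name.
    local file_name="${file_path##*/}"
    # Assumption: the distribution name is everything before the first "-".
    local dist_name="${file_name%%-*}"
    # Present the name with hyphens instead of underscores.
    dist_name="${dist_name//_/-}"
    printf '%s @ file://%s\n' "${dist_name}" "${file_path}"
}

# Example path (an assumption, no such file needs to exist for this sketch).
spec="$(to_direct_reference /docker-context-files/apache_airflow_core-3.0.0-py3-none-any.whl)"
echo "${spec}"
```

A spec in this form can be passed to an installer as a single requirement, which is why the surrounding script collects such specs into arrays.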
echo
Move airflow sources to airflow-core package (#47798)

This is a continuation of the separation of the Airflow codebase into separate distributions. This one splits airflow into two of them:
* apache-airflow - becomes an empty, no-code meta distribution that only has dependencies on the apache-airflow-core and task-sdk distributions, and it has preinstalled provider distributions added in the standard "wheel" distribution. All "extras" lead either to "apache-airflow-core" extras or to providers - the dependencies and optional dependencies are calculated differently depending on "editable" or "standard" mode - in editable mode, just provider dependencies are installed for preinstalled providers; in standard mode, those preinstalled providers are dependencies.
* the apache-airflow-core distribution contains all airflow core sources (previously in apache-airflow) and it has no provider extras. Thanks to that apache-airflow distribution does not have any dynamically calculated dependencies.
* the apache-airflow-core distribution has "hatch_build_airflow_core.py" build hooks that add a custom build target and implement custom cleanup in order to compile assets as part of the build.
* During the move, the following changes were applied for consistency:
  * packages, when used in the context of distribution packages, have been renamed to "distributions" - including all documentation and commands in breeze - to avoid confusion with import packages (see https://packaging.python.org/en/latest/discussions/distribution-package-vs-import-package/)
  * all tests in `airflow-core` now follow the same convention where tests are in the `unit`, `system` and `integration` packages. No extra package has been added as a second level, because all the provider tests have "<PROVIDER>" there, so we just have to avoid naming airflow unit."<PROVIDER>" with the same name as provider.
  * all tooling in CI/DEV has been updated to follow the new structure.

We should always build to packages now when we are building them using `breeze`.
2025-03-21 14:25:26 +01:00
if [[ -z "${install_airflow_distribution[*]}" && ${AIRFLOW_VERSION=} != "" ]]; then
    # When we install only provider distributions from docker-context files, we need to still
    # install airflow from PyPI when AIRFLOW_VERSION is set. This handles the case where
    # pre-release dockerhub image of airflow is built, but we want to install some providers from
    # docker-context files
    install_airflow_distribution=("apache-airflow[${AIRFLOW_EXTRAS}]==${AIRFLOW_VERSION}")
fi
# Find apache-airflow-core distribution in docker-context files
readarray -t install_airflow_core_distribution < <(EXTRAS="" \
    python /scripts/docker/get_distribution_specs.py /docker-context-files/apache?airflow?core?[0-9]*.{whl,tar.gz} 2>/dev/null || true)
echo
echo "${COLOR_BLUE}Found apache-airflow-core distributions in docker-context-files folder: ${install_airflow_core_distribution[*]}${COLOR_RESET}"
echo
if [[ -z "${install_airflow_core_distribution[*]}" && ${AIRFLOW_VERSION=} != "" ]]; then
    # When we install only provider distributions from docker-context files, we need to still
    # install airflow from PyPI when AIRFLOW_VERSION is set. This handles the case where
    # pre-release dockerhub image of airflow is built, but we want to install some providers from
    # docker-context files
    install_airflow_core_distribution=("apache-airflow-core==${AIRFLOW_VERSION}")
fi
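The fallback above can be tried in isolation. This is only a sketch under stated assumptions: `find_specs` is a hypothetical stand-in for the call to `get_distribution_specs.py` that behaves as if no matching files were found, and the value of `AIRFLOW_VERSION` is an example.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the spec lookup: prints nothing and fails,
# as if no matching files existed in docker-context-files.
find_specs() { return 1; }

AIRFLOW_VERSION="3.0.0"   # example value, an assumption for this sketch

# "|| true" keeps the failing lookup from aborting the script;
# readarray then produces an empty array because there is no output.
readarray -t core_distribution < <(find_specs 2>/dev/null || true)

# "${AIRFLOW_VERSION=}" assigns an empty default if the variable is unset,
# so the comparison is safe even when the variable was never defined.
if [[ -z "${core_distribution[*]}" && ${AIRFLOW_VERSION=} != "" ]]; then
    core_distribution=("apache-airflow-core==${AIRFLOW_VERSION}")
fi

echo "${core_distribution[*]}"
```

When the lookup does print specs, the array is non-empty and the pinned PyPI requirement is not substituted.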
AIP-81 airflowctl Include CI/breeze unit-testing and distribution commands (#48099)

* Include unit-testing into CI and breeze, add distribution pieces
* Merge task-sdk and airflow-ctl test workflow and can be extended for each non-core distro, update release management doc packages to distributions
* Remove no needed comment
* Remove duplicate ISSUE_MATCH_IN_BODY definition, unify non-core release logic and include airflowctl release method in release_management_commands.py, create DistributionPackageBuildType for identifying dist name
* Update dev/breeze/doc/05_test_commands.rst

  Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
* Fix dash problem
* Remove not used vars from ci.yml
* Update breeze selective check tests
* Update breeze selective check tests, fix typo in release_management_commands.py, fix pre-commit naming in mypy, fix dist naming
* Fix pre-commit hook, fix dist path for release_management_commands.py, fix breeze test
* add airflowctl to mypy_folder.py, include __init__.py to airflowctl, include into missing scripts for installation and release, pre-commit adjustment, files are moved to src/airflow/ctl structure to fit into generic structure, include airflow-ctl into .dockerignore,
* Remove uv workspaces for now which preventing ci image to be built
* Fix airflow-ctl workspace and include devel-common again along with pytest_plugins to make breeze testing work
* Revert provider yaml workspace changes
* Remove bespoke handle of provider.toml and remove airflow-ctl from provider.toml template
* Move back distribution name to airflowctl, update CI logic to more dynamic via inputs for non-core distributions
* Fix path in mypy, remove not needed __init__.py and duplicate conftest in tests
* Remove airflow-ctl from providers test

---------

Co-authored-by: LIU ZHE YOU <68415893+jason810496@users.noreply.github.com>
2025-03-31 10:54:10 +02:00
# Find Provider/TaskSDK/CTL distributions in docker-context files
readarray -t airflow_distributions< <(python /scripts/docker/get_distribution_specs.py /docker-context-files/apache?airflow?{providers,task?sdk,airflowctl}*.{whl,tar.gz} 2>/dev/null || true)
echo
echo "${COLOR_BLUE}Found provider distributions in docker-context-files folder: ${airflow_distributions[*]}${COLOR_RESET}"
echo
if [[ ${USE_CONSTRAINTS_FOR_CONTEXT_DISTRIBUTIONS=} == "true" ]]; then
    local python_version
    python_version=$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
    local local_constraints_file=/docker-context-files/constraints-"${python_version}"/${AIRFLOW_CONSTRAINTS_MODE}-"${python_version}".txt
    if [[ -f "${local_constraints_file}" ]]; then
        echo
        echo "${COLOR_BLUE}Installing docker-context-files distributions with constraints found in ${local_constraints_file}${COLOR_RESET}"
        echo
        # Force reinstall of all airflow + provider distributions with the constraints found in
        # the local constraints file
        flags=(--upgrade --constraint "${local_constraints_file}")
        echo
        echo "${COLOR_BLUE}Copying ${local_constraints_file} to ${HOME}/constraints.txt${COLOR_RESET}"
        echo
        cp "${local_constraints_file}" "${HOME}/constraints.txt"
    else
        echo
echo "${COLOR_BLUE}Installing docker-context-files distributions with constraints from GitHub${COLOR_RESET}"
echo
flags=(--constraint "${HOME}/constraints.txt")
fi
else
echo
echo "${COLOR_BLUE}Installing docker-context-files distributions without constraints${COLOR_RESET}"
echo
flags=()
fi
# First attempt: install with the flags selected above (constraints, if any).
set -x
if ! ${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} \
${ADDITIONAL_PIP_INSTALL_FLAGS} \
"${flags[@]}" \
"${install_airflow_distribution[@]}" "${install_airflow_core_distribution[@]}" "${airflow_distributions[@]}"; then
set +x
if [[ ${AIRFLOW_FALLBACK_NO_CONSTRAINTS_INSTALLATION} != "true" ]]; then
echo
echo "${COLOR_RED}Failing because constraints installation failed and fallback is disabled.${COLOR_RESET}"
echo
exit 1
fi
echo
echo "${COLOR_YELLOW}Likely there are new dependencies conflicting with constraints.${COLOR_RESET}"
echo
echo "${COLOR_BLUE}Falling back to no-constraints installation.${COLOR_RESET}"
echo
# Second attempt: the same install command, this time without the constraint flags.
set -x
${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} \
${ADDITIONAL_PIP_INSTALL_FLAGS} \
"${install_airflow_distribution[@]}" "${install_airflow_core_distribution[@]}" \
"${airflow_distributions[@]}"
fi
set +x
common::install_packaging_tools
# We use pip check here to make sure that whatever `uv` installs, is also "correct" according to `pip`
pip check
}
function install_all_other_distributions_from_docker_context_files() {
echo
echo "${COLOR_BLUE}Force re-installing all other distributions from local files without dependencies${COLOR_RESET}"
echo
local reinstalling_other_distributions
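# Collect every local wheel or sdist whose file name contains neither apache_airflow nor apache-airflow.
# shellcheck disable=SC2010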
reinstalling_other_distributions=$(ls /docker-context-files/*.{whl,tar.gz} 2>/dev/null | \
grep -v apache_airflow | grep -v apache-airflow || true)
if [[ -n "${reinstalling_other_distributions}" ]]; then
set -x
${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${ADDITIONAL_PIP_INSTALL_FLAGS} \
--force-reinstall --no-deps --no-index ${reinstalling_other_distributions}
common::install_packaging_tools
set +x
fi
}
common::get_colors
common::get_packaging_tool
common::get_airflow_version_specification
common::get_constraints_location
common::show_packaging_tool_version_and_location
install_airflow_and_providers_from_docker_context_files
Move airflow sources to airflow-core package (#47798)

This is a continuation of the separation of the Airflow codebase into separate distributions. This one splits airflow into two of them:

* apache-airflow - becomes an empty, meta no-code distribution that only has dependencies on the apache-airflow-core and task-sdk distributions, and it has preinstalled provider distributions added in the standard "wheel" distribution. All "extras" lead either to "apache-airflow-core" extras or to providers. The dependencies and optional dependencies are calculated differently depending on "editable" or "standard" mode: in editable mode, just provider dependencies are installed for preinstalled providers; in standard mode, those preinstalled providers are dependencies.
* the apache-airflow-core distribution contains all airflow core sources (previously in apache-airflow) and it has no provider extras. Thanks to that, the apache-airflow distribution does not have any dynamically calculated dependencies.
* the apache-airflow-core distribution has "hatch_build_airflow_core.py" build hooks that add a custom build target and implement custom cleanup in order to compile assets as part of the build.
* During the move, the following changes were applied for consistency:
  * packages, when used in the context of distribution packages, have been renamed to "distributions" (including all documentation and commands in breeze) to avoid confusion with import packages (see https://packaging.python.org/en/latest/discussions/distribution-package-vs-import-package/)
  * all tests in `airflow-core` now follow the same convention where tests are in the `unit`, `system` and `integration` packages. No extra package has been added as second level, because all the provider tests have "<PROVIDER>" there, so we just have to avoid naming airflow unit."<PROVIDER>" with the same name as a provider.
  * all tooling in CI/DEV has been updated to follow the new structure.

We should always build to packages now when we are building them using `breeze`.
2025-03-21 14:25:26 +01:00
install_all_other_distributions_from_docker_context_files
EOF
# The content below is automatically copied from scripts/docker/get_distribution_specs.py
COPY <<"EOF" /get_distribution_specs.py
Switch from --user to venv for PROD image and enable uv (#37796)

This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems the `--user` installation created for us in the past and does it in a fully backwards-compatible way. This improves "production" use for end users, local iteration on the PROD image during tests, and CI.

Improvements for "end user":

* users do not have to use `pip install --user` to install new packages any more, and it is not enabled by default with the PIP_USER flag.
* users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did)
* users can use `uv` to build a custom production image, which gives a 40%-50% saving in image build time compared to `pip`.
* python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images.

Improvements for development:

* when the image is built from sources (no --use-docker-context-files is specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image.
* when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change.
* By default `uv` is used for local source build for k8s tests, so even if you have to rebuild it, it is way faster (60%-80%) during iterating with the image.

CI/DEV tooling improvements:

* this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check if the image still builds with `pip`.
* we also switch to a more PEP-standard way of installing packages from the local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815

Update contributing-docs/testing/k8s_tests.rst
Update contributing-docs/testing/k8s_tests.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update scripts/docker/install_airflow.sh
Update docs/docker-stack/changelog.rst
Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
#!/usr/bin/env python
from __future__ import annotations
import os
import sys
Fix test infrastructure for Python-version-excluded providers (#63793)

* Skip provider tests when all test directories are excluded

  When running Providers[google] or Providers[amazon] on Python 3.14, generate_args_for_pytest removes the test folders for excluded providers, but the skip check in _run_test only triggered when the --ignore filter itself removed something. Since the folders were already removed upstream, the guard condition was never met, leaving pytest with only flags and no test directories, causing it to crash on unrecognized custom arguments. Remove the overly strict guard so the skip fires whenever no test directories remain in the args.

* Fix PROD image docker tests for Python-version-excluded providers

  The docker tests expected all providers from prod_image_installed_providers.txt to be present, but providers like google and amazon declare excluded-python-versions in their provider.yaml. On Python 3.14, these providers are correctly excluded from the PROD image at build time, but the tests didn't account for this. Read provider.yaml exclusions and filter expected providers/imports based on the Docker image's Python version.

* Skip Python-incompatible provider wheels during PROD image build

  get_distribution_specs.py now reads Requires-Python metadata from each wheel and skips wheels that are incompatible with the running interpreter. This prevents excluded providers (e.g. amazon on 3.14) from being passed to pip/uv and installed despite their exclusion.

  Also fix the requires-python specifier generation in packages.py: !=3.14 per PEP 440 only excludes 3.14.0, not 3.14.2. Use !=3.14.* wildcard to exclude the entire minor version.
2026-03-17 17:31:30 +02:00
import zipfile
from email.parser import HeaderParser
from pathlib import Path
from packaging.specifiers import InvalidSpecifier, SpecifierSet
from packaging.utils import (
    InvalidSdistFilename,
    InvalidWheelFilename,
    parse_sdist_filename,
    parse_wheel_filename,
)
_CURRENT_PYTHON = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"


def _compatible_with_current_python(wheel_path: str) -> bool:
    """Return False if the wheel's Requires-Python excludes the running interpreter."""
    try:
        with zipfile.ZipFile(wheel_path) as zf:
            for name in zf.namelist():
                if name.endswith(".dist-info/METADATA"):
                    requires = HeaderParser().parsestr(zf.read(name).decode("utf-8")).get("Requires-Python")
                    if requires:
                        return _CURRENT_PYTHON in SpecifierSet(requires)
                    return True
    except (zipfile.BadZipFile, InvalidSpecifier, KeyError) as exc:
        print(f"Warning: could not check Requires-Python for {wheel_path}: {exc}", file=sys.stderr)
    return True
def print_package_specs(extras: str = "") -> None:
    for package_path in sys.argv[1:]:
        try:
            package, _, _, _ = parse_wheel_filename(Path(package_path).name)
        except InvalidWheelFilename:
            try:
                package, _ = parse_sdist_filename(Path(package_path).name)
            except InvalidSdistFilename:
                print(f"Could not parse package name from {package_path}", file=sys.stderr)
                continue
        if package_path.endswith(".whl") and not _compatible_with_current_python(package_path):
            print(
                f"Skipping {package} (Requires-Python not satisfied by {_CURRENT_PYTHON})",
                file=sys.stderr,
            )
            continue
        print(f"{package}{extras} @ file://{package_path}")


if __name__ == "__main__":
    print_package_specs(extras=os.environ.get("EXTRAS", ""))
EOF
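The wheel check in the script above reads `Requires-Python` from the wheel's `METADATA` file, and the #63793 commit message notes that `!=3.14` excludes only 3.14.0 while `!=3.14.*` excludes the whole minor series. The sketch below illustrates both points using only the standard library. It is not part of the Airflow sources: `make_fake_wheel`, `read_requires_python` and `excluded_by_not_equal` are hypothetical helper names, the in-memory "wheel" contains nothing but a `METADATA` file, and the `!=` matcher is a deliberately simplified stand-in for `packaging.specifiers.SpecifierSet` that handles plain numeric release versions only.

```python
from __future__ import annotations

import io
import zipfile
from email.parser import HeaderParser


def make_fake_wheel(requires_python: str) -> bytes:
    """Build an in-memory zip that looks just enough like a wheel for this demo."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        metadata = (
            "Metadata-Version: 2.1\n"
            "Name: demo-provider\n"
            "Version: 1.0.0\n"
            f"Requires-Python: {requires_python}\n"
        )
        zf.writestr("demo_provider-1.0.0.dist-info/METADATA", metadata)
    return buffer.getvalue()


def read_requires_python(wheel_bytes: bytes) -> str | None:
    """Return the Requires-Python header from the wheel's METADATA, if present."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith(".dist-info/METADATA"):
                headers = HeaderParser().parsestr(zf.read(name).decode("utf-8"))
                return headers.get("Requires-Python")
    return None


def _pad(parts: list[int], width: int) -> list[int]:
    """Right-pad a release tuple with zeros so that 3.14 compares equal to 3.14.0."""
    return parts + [0] * (width - len(parts))


def excluded_by_not_equal(version: str, clause: str) -> bool:
    """Simplified `!=` check (assumption: numeric release segments only)."""
    if not clause.startswith("!="):
        raise ValueError(f"expected a '!=' clause, got {clause!r}")
    target = clause[2:].strip()
    release = [int(part) for part in version.split(".")]
    if target.endswith(".*"):
        # Prefix match: "!=3.14.*" excludes every 3.14.x release.
        prefix = [int(part) for part in target[:-2].split(".")]
        return release[: len(prefix)] == prefix
    # Strict match: the shorter side is zero-padded, so "3.14" means "3.14.0".
    wanted = [int(part) for part in target.split(".")]
    width = max(len(release), len(wanted))
    return _pad(release, width) == _pad(wanted, width)


if __name__ == "__main__":
    wheel = make_fake_wheel(">=3.10,!=3.14.*")
    print(read_requires_python(wheel))
    print(excluded_by_not_equal("3.14.2", "!=3.14"))    # strict form
    print(excluded_by_not_equal("3.14.2", "!=3.14.*"))  # wildcard form
```

Because the script compares the full `major.minor.micro` version of the running interpreter, a provider wheel published with the strict `!=3.14` form would still be treated as compatible on 3.14.2, which is why the wildcard form matters there.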
Simplify tooling by switching completely to uv (#48223)

The lazy consensus decision has been made at the devlist to switch entirely to `uv` as development tool: link: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256

This PR implements that decision and removes a lot of baggage connected to using `pip` in addition to uv to install and sync the environment. It also introduces more consistency in the way distribution packages are used in airflow sources - basically switching all internal distributions to use the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to the local virtualenv environment.

Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move.

This PR is "safe" in terms of the airflow and providers' code - it does not **really** (except occasional imports and type hint changes resulting from better isolation of packages) change Airflow code, nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider code is modified. It does move the test code in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.

What is still NOT done after that move and will be covered in the follow-up changes:

* isolating docs-building to have separate configuration for docs building per distribution - allowing to run doc build locally with its own conf.py file
* moving some of the tests and checks out from breeze container image up to the local environment (for example mypy checks) and likely isolating them per-provider
* Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up.
* PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`.

This is the detailed list of changes implemented in this PR:

* uv is now mandatory to install as a prerequisite in order to develop airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together.
* removed meta-package `hatch_build.py` and replaced it with pre-commit automatically modifying declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (also renaming the file to `hatch_build.py` and moving all airflow dependencies to `pyproject.toml`)
* converted "loose" packages in airflow repo into distributions:
  * docker-tests
  * kubernetes-tests
  * helm-tests
  * dev (here we do not have `src` subfolder - sources are directly in the distribution, which is for now inconsistent with other distributions).

  The names of the `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports are always referring to base of each distribution and are not used from the content root.
* Each of the distributions (on top of already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together, allowing to install them all together with a simple `uv sync --all-packages` command and come up with a consistent set of dependencies that are good for all those packages (yay!). This is used to build the CI image with a single common environment to run the tests (with some quirks due to constraints use where we have to manually list all distributions until we switch to `uv.lock` mechanism)
* `doc` code is moved to `devel-common` distribution. The `doc` folder only keeps README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in `generated/generated-docs/` folder which is entirely .gitignored.
* the documentation is now fully moved to:
  * `airflow-core/docs` - documentation for Airflow Core
  * `providers/**/docs` - documentation for Providers
  * `chart/docs` - documentation for Helm Chart
  * `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
  * `providers-summary-docs` - documentation for provider summary page
* `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to pyproject.toml files. This way, except for the custom build hook, we have no dynamic components in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places; the only references to extras remain in the user documentation - we stop using extras for local development, we switch to using dependency groups.
* backtracking command was removed from breeze - we did not need it since we started using `uv`
* internal commands (except constraint generation) have been moved to `uv` from `pip`
* breeze requires `uv` to be installed and expects to be installed with `uv tool install -e ./dev/breeze`
* pyproject.tomls are dynamically modified when we add a version suffix dynamically (`--version-suffix-for-pypi`) - only for the time of building the versions with updated suffix
* `mypy` checks are now consistently used across all the different distributions and for consistency (and to fix some of the issues with namespace packages) rather than using "folder" approach when running mypy checks, even if we run mypy for whole distribution, we run check on individual files rather than on a folder. That adds consistency in execution of mypy heuristics. Rather than using in-container mypy script, all the logic of selection and parameters passed to mypy are in pre-commit code. For now we are still using CI image to run mypy because mypy is very sensitive to version of dependencies installed; we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should be generally kept consistent in the future
* the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` which will make it easier to install them for some OS-es (basic is used as default set of dependencies to cover most common set of development dependencies to be used for development)
* generated/provider_dependencies.json is not committed to the repository any longer. It is .gitignored and generated on the fly as needed (breeze will generate it automatically when empty and pre-commit will always regenerate it to be consistent with provider's pyproject.toml files.
* `chart-utils` have been noved to `helm-tests` from `devel-common` as they were only used there. * for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync * Updated `uv` version to 0.6.10 * We are using `uv sync` to perform "upgrade to newer depencies" in `canary` builds and locally * leveldb has been turned into "dependency group" and removed from apache-airflow and apache-airflow-core extras, it is now only available by google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
# The content below is automatically copied from scripts/docker/install_airflow_when_building_images.sh
COPY <<"EOF" /install_airflow_when_building_images.sh
#!/usr/bin/env bash
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
function install_from_sources() {
    local extra_sync_flags
    extra_sync_flags=""
    # When a virtualenv is already active, sync into it rather than the project's own .venv
    if [[ ${VIRTUAL_ENV=} != "" ]]; then
        extra_sync_flags="--active"
    fi
    if [[ "${UPGRADE_RANDOM_INDICATOR_STRING=}" != "" ]]; then
        if [[ ${PACKAGING_TOOL_CMD} == "pip" ]]; then
            set +x
            echo
            echo "${COLOR_RED}We only support uv, not pip, for upgrading dependencies!${COLOR_RESET}"
            echo
            exit 1
        fi
        set +x
        echo
        echo "${COLOR_BLUE}Attempting to upgrade all packages to highest versions.${COLOR_RESET}"
        echo
        # --no-binary-package is needed so that lxml and xmlsec do not use different versions of libxml2
        # (binary lxml embeds its own libxml2, while xmlsec uses the system one).
        # See https://bugs.launchpad.net/lxml/+bug/2110068
        set -x
        uv sync --all-packages --resolution highest --group dev --group docs --group docs-gen \
            --group leveldb ${extra_sync_flags} --no-binary-package lxml --no-binary-package xmlsec \
            --no-python-downloads --no-managed-python
Simplify tooling by switching completely to uv (#48223)

The lazy consensus decision has been made on the devlist to switch entirely to `uv` as the development tool: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256

This PR implements that decision and removes a lot of baggage connected to using `pip` in addition to `uv` to install and sync the environment. It also introduces more consistency in how distribution packages are used in airflow sources - basically switching all internal distributions to the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to the local virtualenv environment.

Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move.

This PR is "safe" in terms of the airflow and providers' code - it does not **really** (except occasional imports and type hint changes resulting from better isolation of packages) change Airflow code, nor should it affect any airflow or provider code, because it does not move any of the folders containing airflow or provider code. It does move the test code - in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.

What is still NOT done after that move and will be covered in follow-up changes:

* isolating docs-building to have separate configuration for docs building per distribution - allowing to run the doc build locally with its own conf.py file
* moving some of the tests and checks out of the breeze container image to the local environment (for example mypy checks) and likely isolating them per-provider
* Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to the `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up.
* PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`.

This is the detailed list of changes implemented in this PR:

* uv is now a mandatory pre-requisite for developing airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together.
* removed meta-package `hatch_build.py` and replaced it with pre-commit automatically modifying the declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (renaming the file to `hatch_build.py`) and moved all airflow dependencies to `pyproject.toml`
* converted "loose" packages in airflow repo into distributions:
  * docker-tests
  * kubernetes-tests
  * helm-tests
  * dev (here we do not have a `src` subfolder - sources are directly in the distribution, which is for now inconsistent with other distributions).

  The `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports always refer to the base of each distribution and are not used from the content root.
* Each of the distributions (on top of the already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together, allowing to install them all together with a simple `uv sync --all-packages` command and come up with a consistent set of dependencies that are good for all those packages (yay!). This is used to build the CI image with a single common environment to run the tests (with some quirks due to constraints use where we have to manually list all distributions until we switch to the `uv.lock` mechanism)
* `doc` code is moved to the `devel-common` distribution. The `doc` folder only keeps a README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in the `generated/generated-docs/` folder which is entirely .gitignored.
* the documentation is now fully moved to:
  * `airflow-core/docs` - documentation for Airflow Core
  * `providers/**/docs` - documentation for Providers
  * `chart/docs` - documentation for Helm Chart
  * `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
  * `providers-summary-docs` - documentation for provider summary page
* `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to pyproject.toml files - this way - except the custom build hook - we have no dynamic components in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places; the only references to extras remain in the user documentation - we stop using extras for local development and switch to using dependency groups.
* backtracking command was removed from breeze - we have not needed it since we started using `uv`
* internal commands (except constraint generation) have been moved to `uv` from `pip`
* breeze requires `uv` to be installed and expects to be installed by `uv tool install -e ./dev/breeze`
* pyproject.toml files are dynamically modified when we add a version suffix (`--version-suffix-for-pypi`) - only for the time of building the versions with the updated suffix
* `mypy` checks are now consistently used across all the different distributions and, for consistency (and to fix some of the issues with namespace packages), rather than using the "folder" approach when running mypy checks, even if we run mypy for a whole distribution, we run the check on individual files rather than on a folder. That adds consistency in execution of mypy heuristics. Rather than using an in-container mypy script, all the logic of selection and parameters passed to mypy is in pre-commit code. For now we are still using the CI image to run mypy because mypy is very sensitive to the versions of dependencies installed; we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` - which will make it easier to install them for some OS-es (basic is used as the default set of dependencies to cover the most common set of development dependencies)
* generated/provider_dependencies.json is no longer committed to the repository. It is .gitignored and generated on the fly as needed (breeze will generate it automatically when empty and pre-commit will always regenerate it to be consistent with providers' pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync
* Updated `uv` version to 0.6.10
* We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally
* leveldb has been turned into a "dependency group" and removed from apache-airflow and apache-airflow-core extras; it is now only available via the google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
else
set +x
echo
Switch CI dependency management from constraints to uv.lock (#63609) * Switch CI dependency management from constraints to uv.lock closes: #54609 * Fix selective_checks tests for push events without upgrade Push events no longer trigger upgrade-to-newer-dependencies unless uv.lock or pyproject.toml files changed. Updated test expectations. * Fix remaining selective_checks tests for push events Update two more test cases that expected upgrade-to-newer-dependencies to be true for PUSH events. * Fix CI failures: include uv.lock in Docker context and handle missing constraints - Add uv.lock to .dockerignore allowlist so uv sync --frozen works in Docker builds - Make packaging install in install_from_docker_context_files.sh conditional on constraints.txt existing, since the uv.lock path skips constraints download * Fix static checks: update uv.lock and breeze docs after rebase * Use install script with uv.lock constraints for dev dependencies in CI Revert the entrypoint_ci.sh change from `uv sync --all-packages` back to using the install_development_dependencies.py script. The uv sync approach fails when provider source directories are not fully available in the container (e.g. with selected mounts). Instead, generate constraints from uv.lock via `uv export` and pass them to the existing script, which installs only the needed development dependencies via `uv pip install`. Also add uv.lock to VOLUMES_FOR_SELECTED_MOUNTS so it is available inside containers using the "tests and providers" mount mode.
2026-03-15 15:20:53 +01:00
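The last item of the commit message above describes exporting the pins from `uv.lock` and passing them as constraints to `uv pip install`. The following is a minimal sketch of that flow, not the Airflow implementation (which goes through install_development_dependencies.py). The names `run_or_print`, `install_with_lock_constraints`, `DRY_RUN` and the constraints path are hypothetical, and the exact `uv export` flags used by the real scripts are an assumption. With `DRY_RUN=true` (the default here) the commands are only printed, so the sketch can be read and run without `uv` installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Airflow scripts.
set -euo pipefail

# Print the command in dry-run mode (default), otherwise execute it.
function run_or_print() {
    if [[ ${DRY_RUN:-true} == "true" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1: path of the constraints file to write; remaining args: packages to install.
function install_with_lock_constraints() {
    local constraints_file="${1}"
    shift
    # Export the versions pinned in uv.lock as a requirements-style file,
    # without re-resolving and without listing the workspace members themselves.
    run_or_print uv export --frozen --all-packages --no-hashes --no-emit-workspace \
        --output-file "${constraints_file}"
    # Install only the requested packages, constrained to the exported pins.
    run_or_print uv pip install -c "${constraints_file}" "$@"
}

install_with_lock_constraints /tmp/uv-lock-constraints.txt pytest
```

Setting `DRY_RUN=false` would execute the two commands instead of printing them; that requires `uv` on the `PATH` and a `uv.lock` in the current project.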
echo "${COLOR_BLUE}Installing all packages from uv.lock (frozen).${COLOR_RESET}"
echo
# Use uv sync --frozen to install exactly what is pinned in uv.lock without re-resolving.
# --no-binary-package is needed to avoid lxml and xmlsec using different versions of
# libxml2 (binary lxml embeds its own libxml2, while xmlsec uses the system one).
# See https://bugs.launchpad.net/lxml/+bug/2110068
set -x
if ! uv sync --all-packages --frozen --group dev --group docs --group docs-gen \
    --group leveldb ${extra_sync_flags} --no-binary-package lxml --no-binary-package xmlsec \
    --no-python-downloads --no-managed-python; then
set +x
if [[ ${AIRFLOW_FALLBACK_NO_CONSTRAINTS_INSTALLATION} != "true" ]]; then
echo
echo "${COLOR_RED}Failing because frozen uv.lock installation failed and fallback is disabled.${COLOR_RESET}"
echo
exit 1
fi
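The two `uv sync` invocations in this part of the script differ only in how versions are chosen: `--resolution highest` in the upgrade branch and `--frozen` in the branch that installs from `uv.lock`. The sketch below shows one way to keep the shared flags in a single place. The function name `build_sync_command` and its `upgrade`/`frozen` mode argument are hypothetical, and `${extra_sync_flags}` from the script is left out for brevity. It only prints the command line and does not run `uv`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the Airflow scripts.
set -euo pipefail

# $1: "upgrade" to resolve to the highest versions, anything else to use uv.lock as-is.
function build_sync_command() {
    local mode="${1}"
    local -a args=(uv sync --all-packages)
    if [[ ${mode} == "upgrade" ]]; then
        args+=(--resolution highest)
    else
        args+=(--frozen)
    fi
    # Flags shared by both branches of the script.
    args+=(--group dev --group docs --group docs-gen --group leveldb)
    args+=(--no-binary-package lxml --no-binary-package xmlsec)
    args+=(--no-python-downloads --no-managed-python)
    echo "${args[*]}"
}

build_sync_command upgrade
build_sync_command frozen
```

Building the arguments in an array keeps each flag as a separate word, so the command could later be executed with `"${args[@]}"` without quoting problems.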
Simplify tooling by switching completely to uv (#48223)
A lazy consensus decision was made on the devlist to switch entirely to `uv` as the development tool: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256
This PR implements that decision and removes a lot of baggage connected with using `pip` in addition to uv to install and sync the environment. It also makes the way distribution packages are used in the airflow sources more consistent, basically switching all internal distributions to the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to the local virtualenv environment.
Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move.
This PR is "safe" in terms of the airflow and provider code: apart from occasional import and type hint changes resulting from better isolation of packages, it does not **really** change Airflow code, nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider code is modified. It does move the test code in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.
What is still NOT done after that move and will be covered in follow-up changes:
* isolating docs building to have a separate configuration per distribution - allowing to run the doc build locally with its own conf.py file
* moving some of the tests and checks out of the breeze container image to the local environment (for example mypy checks) and likely isolating them per provider
* Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to the `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up.
* PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`.
This is the detailed list of changes implemented in this PR:
* uv is now a mandatory prerequisite for developing airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together.
* removed meta-package `hatch_build.py` and replaced it with a pre-commit that automatically modifies the declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (renaming the file to `hatch_build.py`) and moved all airflow dependencies to `pyproject.toml`
* converted "loose" packages in airflow repo into distributions:
  * docker-tests
  * kubernetes-tests
  * helm-tests
  * dev (here we do not have `src` subfolder - sources are directly in the distribution, which is for now inconsistent with other distributions).
  The `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports always refer to the base of each distribution and are not used from the content root.
* Each of the distributions (on top of the already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together, allowing to install them all together with a simple `uv sync --all-packages` command and come up with a consistent set of dependencies that are good for all those packages (yay!). This is used to build the CI image with a single common environment to run the tests (with some quirks due to constraints use where we have to manually list all distributions until we switch to the `uv.lock` mechanism)
* `doc` code is moved to `devel-common` distribution. The `doc` folder only keeps README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in `generated/generated-docs/` folder which is entirely .gitignored.
* the documentation is now fully moved to:
  * `airflow-core/docs` - documentation for Airflow Core
  * `providers/**/docs` - documentation for Providers
  * `chart/docs` - documentation for Helm Chart
  * `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
  * `providers-summary-docs` - documentation for provider summary page
* `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to pyproject.toml files - this way - except the custom build hook - we have no dynamic components in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places; the only references to extras remain in the user documentation - we stop using extras for local development and switch to using dependency groups.
* backtracking command was removed from breeze - we have not needed it since we started using `uv`
* internal commands (except constraint generation) have been moved to `uv` from `pip`
* breeze requires `uv` to be installed and expects to be installed with `uv tool install -e ./dev/breeze`
* pyproject.tomls are dynamically modified when we add a version suffix dynamically (`--version-suffix-for-pypi`) - only for the time of building the versions with updated suffix
* `mypy` checks are now consistently used across all the different distributions and for consistency (and to fix some of the issues with namespace packages) rather than using "folder" approach when running mypy checks, even if we run mypy for whole distribution, we run check on individual files rather than on a folder. That adds consistency in execution of mypy heuristics. Rather than using an in-container mypy script, all the logic of selection and parameters passed to mypy is in pre-commit code. For now we are still using the CI image to run mypy because mypy is very sensitive to the versions of dependencies installed; we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all`, which will make it easier to install them for some OS-es (basic is used as the default set of dependencies to cover the most common set of development dependencies)
* generated/provider_dependencies.json are not committed to the repository any longer. They are .gitignored and generated on the fly as needed (breeze will generate them automatically when empty and pre-commit will always regenerate them to be consistent with providers' pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync
* Updated `uv` version to 0.6.10
* We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally
* leveldb has been turned into a "dependency group" and removed from apache-airflow and apache-airflow-core extras; it is now only available via the google provider's leveldb optional extra when installing with `pip`
2025-04-02 13:11:13 +02:00
echo
echo "${COLOR_YELLOW}Likely pyproject.toml has new dependencies not reflected in uv.lock.${COLOR_RESET}"
echo
echo "${COLOR_BLUE}Falling back to re-resolving dependencies (uv sync without --frozen).${COLOR_RESET}"
echo
set -x
uv sync --all-packages --group dev --group docs --group docs-gen \
--group leveldb ${extra_sync_flags} --no-binary-package lxml --no-binary-package xmlsec \
--no-python-downloads --no-managed-python
set +x
fi
fi
}
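The function that follows assembles a requirement specifier of the form `apache-airflow[EXTRAS]VERSION_SPEC`. As a minimal sketch of that composition, assuming illustrative values for the extras and version specification: the helper name `compose_airflow_spec` is hypothetical and not part of the script, and unlike the script it drops the brackets when no extras are given.

```shell
# Hypothetical helper (not in the original script): composes a requirement
# specifier from an extras list and a version specification.
compose_airflow_spec() {
    local extras="${1}"
    local version_spec="${2}"
    if [ -n "${extras}" ]; then
        # Same shape as the specifier built in install_from_external_spec
        echo "apache-airflow[${extras}]${version_spec}"
    else
        # Assumption of this sketch: omit the brackets when there are no extras
        echo "apache-airflow${version_spec}"
    fi
}

# The argument values below are examples chosen for illustration only.
compose_airflow_spec "celery,postgres" "==3.0.6"
compose_airflow_spec "" ""
```

The resulting string is what gets handed to the packaging tool's `install` command, so quoting it as a single argument keeps the brackets from being interpreted by the shell.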
function install_from_external_spec() {
    local installation_command_flags
    if [[ ${AIRFLOW_INSTALLATION_METHOD} == "apache-airflow" ]]; then
Switch from --user to venv for PROD image and enable uv (#37796)

This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems that the `--user` installation caused us in the past, and does so in a fully backwards-compatible way. This improves "production" use for end users as well as local iteration on the PROD image during tests, and also CI.

Improvements for "end user":
* users do not have to use `pip install --user` to install new packages any more, and it is not enabled by default with the PIP_USER flag.
* users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did)
* users can use `uv` to build a custom production image, which gives a 40%-50% saving in image build time compared to `pip`.
* `python -m venv --system-site-packages` continues to use the packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images.

Improvements for development:
* when the image is built from sources (no --use-docker-context-files specified), airflow is installed in --editable mode, which means that airflow and all providers are installed locally from airflow sources, not from packages - so both airflow and providers have the latest version inside the prod image.
* when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change.
* by default `uv` is used for the local source build for k8s tests, so even if you have to rebuild it, it is way faster (60%-80%) while iterating with the image.

CI/DEV tooling improvements:
* this PR switches to using `uv` by default for most prod images we build in CI, but it adds a check that the image still builds with `pip`.
* we also switch to a more PEP-standard way of installing packages from the local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815

Update contributing-docs/testing/k8s_tests.rst
Update contributing-docs/testing/k8s_tests.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update scripts/docker/install_airflow.sh
Update docs/docker-stack/changelog.rst
Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
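The commit message above mentions installing packages from the local filesystem using the `package-name @ file:///FILE` form. A minimal sketch of building such a direct reference is shown here; the helper name `direct_reference`, the package name, and the wheel path are all hypothetical, and the sketch assumes the path contains no characters that need percent-encoding.

```shell
# Hypothetical helper: turns a distribution name and a local file path into
# a "name @ file:///absolute/path" direct reference.
direct_reference() {
    local name="${1}"
    local file_path="${2}"
    case "${file_path}" in
        /*) ;;                                  # already absolute
        *) file_path="$(pwd)/${file_path}" ;;   # make relative paths absolute
    esac
    # "file://" followed by an absolute path yields the three-slash form
    echo "${name} @ file://${file_path}"
}

# Example values are assumptions chosen for illustration only.
direct_reference "example-pkg" "/tmp/dist/example_pkg-1.0-py3-none-any.whl"
```

A string in this form can be passed as a single quoted argument to `pip install` or `uv pip install`.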
        installation_command_flags="apache-airflow[${AIRFLOW_EXTRAS}]${AIRFLOW_VERSION_SPECIFICATION}"
    else
        echo
        echo "${COLOR_RED}The '${AIRFLOW_INSTALLATION_METHOD}' installation method is not supported${COLOR_RESET}"
        echo
Migrated prod builds to use python built from source (#53770)

Moved the prod build to also use the Python built from source, as done by CI. This lets us iterate faster on our Python versions without needing to wait for the community image of Python. This involves a number of changes:
* 3.0.5 -> 3.0.6 airflow version change
* use released official packages rather than the git repo to install Python
* added wget (it was present in the original image to pull python packages, so added for compatibility)
* added these flags to the python build (same as in the original build):
  * --with-ensurepip --build="$gnuArch"
  * --enable-loadable-sqlite-extension
  * --enable-option-checking=fatal
  * --enable-shared
  * --with-lto
* added cleanup of apt after installing packages
* added removal of .pyc/.test etc. files (saves 350 MB)
* added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image.
* we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the official Python image had a very old setuptools version.
* checked all "customize" scripts and made them "work"
* updated the "version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere
* updated Changelog and documentation to describe the new ways of building images
* removed installation with GitHub URL (it won't work easily after splitting into multiple packages) and it's not needed
* all python, pip and similar links are created in /usr/python/bin
* /usr/python/bin is always first in the PATH - before /usr/local/bin
* added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local
* removal of installed editable distributions in breeze now happens first, and THEN we install when --use-airflow-version is used.
* LD_LIBRARY_PATH was not set, so the shared python libraries could not be loaded when a venv was created

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
        echo "${COLOR_YELLOW}Supported methods are ('.', 'apache-airflow')${COLOR_RESET}"
        echo
        exit 1
    fi
Simplify tooling by switching completely to uv (#48223)

The lazy consensus decision has been made at the devlist to switch entirely to `uv` as the development tool: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256

This PR implements that decision and removes a lot of baggage connected to using `pip` in addition to `uv` to install and sync the environment. It also introduces more consistency in the way distribution packages are used in airflow sources - basically switching all internal distributions to the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to a local virtualenv environment.

Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move.

This PR is "safe" in terms of the airflow and provider code - it does not **really** change Airflow code (except occasional imports and type hint changes resulting from better isolation of packages), nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider code is modified. It does move the test code - in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.

What is still NOT done after that move and will be covered in follow-up changes:
* isolating docs-building to have a separate configuration for docs building per distribution - allowing a doc build to run locally with its own conf.py file
* moving some of the tests and checks out of the breeze container image to the local environment (for example mypy checks) and likely isolating them per provider
* constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to the `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up.
* the PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`.

This is the detailed list of changes implemented in this PR:
* uv is now mandatory to install as a prerequisite in order to develop airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together.
* removed the meta-package `hatch_build.py` and replaced it with a pre-commit automatically modifying the declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (renaming the file to `hatch_build.py` and moving all airflow dependencies to `pyproject.toml`)
* converted "loose" packages in the airflow repo into distributions:
  * docker-tests
  * kubernetes-tests
  * helm-tests
  * dev (here we do not have a `src` subfolder - sources are directly in the distribution, which is for now inconsistent with other distributions).

  The `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports always refer to the base of each distribution and are not used from the content root.
* each of the distributions (on top of the already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together, allowing them all to be installed together with a simple `uv sync --all-packages` command that comes up with a consistent set of dependencies that are good for all those packages (yay!). This is used to build the CI image with a single common environment to run the tests (with some quirks due to constraints use, where we have to manually list all distributions until we switch to the `uv.lock` mechanism)
* `doc` code is moved to the `devel-common` distribution. The `doc` folder only keeps a README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in the `generated/generated-docs/` folder, which is entirely .gitignored.
* the documentation is now fully moved to:
  * `airflow-core/docs` - documentation for Airflow Core
  * `providers/**/docs` - documentation for Providers
  * `chart/docs` - documentation for Helm Chart
  * `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
  * `providers-summary-docs` - documentation for provider summary page
* `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to pyproject.toml files - this way - except the custom build hook - we have no dynamic components in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places; the only references to extras remain in the user documentation - we stop using extras for local development and switch to using dependency groups.
* the backtracking command was removed from breeze - we have not needed it since we started using `uv`
* internal commands (except constraint generation) have been moved to `uv` from `pip`
* breeze requires `uv` to be installed and expects to be installed with `uv tool install -e ./dev/breeze`
* pyproject.tomls are dynamically modified when we add a version suffix (`--version-suffix-for-pypi`) - only for the time of building the versions with the updated suffix
* `mypy` checks are now consistently used across all the different distributions, and for consistency (and to fix some of the issues with namespace packages), rather than using the "folder" approach when running mypy checks, even if we run mypy for a whole distribution, we run the check on individual files rather than on a folder. That adds consistency in the execution of mypy heuristics. Rather than using an in-container mypy script, all the logic of selection and parameters passed to mypy is in pre-commit code. For now we are still using the CI image to run mypy because mypy is very sensitive to the versions of installed dependencies; we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` - which will make it easier to install them on some OSes (`basic` is the default set, covering the most common development dependencies)
* generated/provider_dependencies.json is no longer committed to the repository. It is .gitignored and generated on the fly as needed (breeze will generate it automatically when empty, and pre-commit will always regenerate it to be consistent with the providers' pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment, and we use `uv sync` to keep it in sync
* updated `uv` version to 0.6.10
* we are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally
* leveldb has been turned into a "dependency group" and removed from apache-airflow and apache-airflow-core extras; it is now only available via the google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
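The commit message above names two `uv` commands for setting up a development environment: `uv tool install -e ./dev/breeze` and `uv sync --all-packages`. A minimal sketch of wrapping them follows. The `DRY_RUN` switch and the `run` and `setup_dev_env` helpers are hypothetical additions for this example; it defaults to only printing the commands, so it can be run without `uv` installed, and running it for real assumes the current directory is the root of an Airflow checkout.

```shell
# DRY_RUN is a hypothetical switch added for this sketch; by default the
# commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-true}"

run() {
    if [ "${DRY_RUN}" = "true" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_dev_env() {
    # Install breeze as an editable uv tool, as the commit message describes
    run uv tool install -e ./dev/breeze
    # Sync one environment covering all workspace distributions
    run uv sync --all-packages
}

setup_dev_env
```

Setting `DRY_RUN=false` before calling `setup_dev_env` executes the commands instead of printing them.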
    if [[ "${UPGRADE_RANDOM_INDICATOR_STRING=}" != "" ]]; then
        echo
Move airflow sources to airflow-core package (#47798)

This is a continuation of the separation of the Airflow codebase into separate distributions. This one splits airflow into two of them:
* apache-airflow - becomes an empty, meta, no-code distribution that only has dependencies on the apache-airflow-core and task-sdk distributions, and it has preinstalled provider distributions added in the standard "wheel" distribution. All "extras" lead either to "apache-airflow-core" extras or to providers - the dependencies and optional dependencies are calculated differently depending on "editable" or "standard" mode - in editable mode, only the provider dependencies are installed for preinstalled providers; in standard mode, those preinstalled providers are themselves dependencies.
* the apache-airflow-core distribution contains all airflow core sources (previously in apache-airflow) and it has no provider extras. Thanks to that, the apache-airflow distribution does not have any dynamically calculated dependencies.
* the apache-airflow-core distribution has "hatch_build_airflow_core.py" build hooks that add a custom build target and implement custom cleanup in order to compile assets as part of the build.
* during the move, the following changes were applied for consistency:
  * packages, when used in the context of distribution packages, have been renamed to "distributions" - including all documentation and commands in breeze - to avoid confusion with import packages (see https://packaging.python.org/en/latest/discussions/distribution-package-vs-import-package/)
  * all tests in `airflow-core` now follow the same convention, where tests are in the `unit`, `system` and `integration` packages. No extra package has been added at the second level, because all the provider tests have "<PROVIDER>" there, so we just have to avoid naming airflow unit."<PROVIDER>" with the same name as a provider.
  * all tooling in CI/DEV has been updated to follow the new structure. We should always build two packages now when we are building them using `breeze`.
2025-03-21 14:25:26 +01:00
        echo "${COLOR_BLUE}Remove airflow and all provider distributions installed before potentially${COLOR_RESET}"
        echo
        set -x
        ${PACKAGING_TOOL_CMD} freeze | grep apache-airflow | xargs ${PACKAGING_TOOL_CMD} uninstall ${EXTRA_UNINSTALL_FLAGS} 2>/dev/null || true
        set +x
        echo
echo "${COLOR_BLUE}Installing all packages with highest resolutions. Installation method: ${AIRFLOW_INSTALLATION_METHOD}${COLOR_RESET}"
echo
set -x
${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${UPGRADE_TO_HIGHEST_RESOLUTION} ${ADDITIONAL_PIP_INSTALL_FLAGS} ${installation_command_flags}
set +x
else
echo
echo "${COLOR_BLUE}Installing all packages with constraints. Installation method: ${AIRFLOW_INSTALLATION_METHOD}${COLOR_RESET}"
echo
set -x
# First attempt: install against the local constraints file.
if ! ${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${ADDITIONAL_PIP_INSTALL_FLAGS} ${installation_command_flags} --constraint "${HOME}/constraints.txt"; then
set +x
if [[ ${AIRFLOW_FALLBACK_NO_CONSTRAINTS_INSTALLATION} != "true" ]]; then
echo
echo "${COLOR_RED}Failing because constraints installation failed and fallback is disabled.${COLOR_RESET}"
echo
exit 1
fi
echo
echo "${COLOR_YELLOW}Likely pyproject.toml has new dependencies conflicting with constraints.${COLOR_RESET}"
echo
echo "${COLOR_BLUE}Falling back to no-constraints installation.${COLOR_RESET}"
echo
set -x
# Fallback: retry the same install without the constraints file.
${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${UPGRADE_IF_NEEDED} ${ADDITIONAL_PIP_INSTALL_FLAGS} ${installation_command_flags}
set +x
Fix constraints use in CI image after uv change (#37845) With the switch to uv, we skipped constraints in the CI image. In effect, all PRs were installed not with constrained dependencies but in the lowest-direct mode of installation, so direct dependencies would not be upgraded in such a case, only the transitive ones; the risk of failure was therefore small anyway, even if someone released a new, breaking dependency. The reason is that `uv` currently does not support installing constraints from a URL. We had been silently falling back to the "no-constraints" way in such a case (this is the default mode if the constraint build fails for any reason). It introduced the risk that if a 3rd-party breaking dependency was released, it would also start breaking regular PRs, not only the "canary" build. We fix it by downloading constraints locally when they are remote and using them from there. While URL support is being worked on in https://github.com/astral-sh/uv/pull/2081 and likely to land in uv 0.1.14, it's also a good idea to actually download the constraints and keep them around - this might be handy if you later want to use constraints to install a "golden" set of dependencies without having to build the right URL - you can always use `${HOME}/constraints.txt`. This PR fixes it and also changes the fallback mechanism to perform the lowest-direct upgrade only if the constraint build fails, rather than always running the lowest-direct upgrade even if the constraints install works fine. This makes sure that most PRs use exactly the constraint versions of the dependencies (at least the version of constraints that were generated the last time pyproject.toml changed).
2024-03-02 11:23:58 +01:00
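The "download constraints locally when they are remote" step described in the commit message above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the Airflow scripts: the function name `fetch_constraints` is hypothetical, and the only detail taken from the message is that the constraints end up in `${HOME}/constraints.txt`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name and structure are assumptions, not Airflow code):
# make sure a constraints file is available locally, whether the given
# location is a remote URL or a path on disk.
fetch_constraints() {
    local location="${1}"
    local target="${HOME}/constraints.txt"
    if [[ ${location} =~ ^https?:// ]]; then
        # Remote constraints: download them so that the installer only ever
        # sees a local file. --fail makes curl return non-zero on HTTP errors.
        curl --silent --show-error --fail --location \
            --output "${target}" "${location}" || return 1
    else
        # Local constraints: just copy them to the well-known location.
        cp "${location}" "${target}" || return 1
    fi
    # Print the local path so callers can pass it to the installer.
    echo "${target}"
}
```

A caller could then run the constrained install with the printed path and fall back to another installation mode only when either the download or the install returns a non-zero status.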
fi
fi
}
function install_airflow_when_building_images() {
    # Remove mysql from extras if client is not going to be installed
    if [[ ${INSTALL_MYSQL_CLIENT} != "true" ]]; then
        AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/mysql,}
        echo "${COLOR_YELLOW}MYSQL client installation is disabled. Extra 'mysql' installations were therefore omitted.${COLOR_RESET}"
    fi
    # Remove postgres from extras if client is not going to be installed
    if [[ ${INSTALL_POSTGRES_CLIENT} != "true" ]]; then
        AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS/postgres,}
        echo "${COLOR_YELLOW}Postgres client installation is disabled. Extra 'postgres' installations were therefore omitted.${COLOR_RESET}"
    fi
    # Determine the installation_command_flags based on AIRFLOW_INSTALLATION_METHOD method
    if [[ ${AIRFLOW_INSTALLATION_METHOD} == "." ]]; then
        install_from_sources
    else
        install_from_external_spec
    fi
    set +x
    common::install_packaging_tools
    echo
    echo "${COLOR_BLUE}Running 'pip check'${COLOR_RESET}"
    echo
    # We use pip check here to make sure that whatever `uv` installs, is also "correct" according to `pip`
    pip check
}
common::get_colors
common::get_packaging_tool
common::get_airflow_version_specification
common::get_constraints_location
common::show_packaging_tool_version_and_location
Simplify tooling by switching completely to uv (#48223)

A lazy consensus decision has been made on the devlist to switch entirely to `uv` as the development tool: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256

This PR implements that decision and removes a lot of baggage connected with using `pip` in addition to `uv` to install and sync the environment. It also makes the use of distribution packages in the airflow sources more consistent, basically switching all internal distributions to the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution, opening the way to moving more of the "sub-workflows" from the CI image to a local virtualenv environment.

Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move.

This PR is "safe" in terms of airflow and provider code: apart from occasional import and type hint changes resulting from better isolation of packages, it does not change Airflow code and should not affect any airflow or provider code, because it does not move any of the folders that hold airflow or provider code. It does move the test code in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.

What is still NOT done after that move and will be covered in follow-up changes:

* isolating docs building to have a separate configuration per distribution, allowing a doc build to run locally with its own conf.py file
* moving some of the tests and checks out of the breeze container image to the local environment (for example mypy checks) and likely isolating them per provider
* Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to the `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up.
* The PROD image is still built with `pip` by default when using `PyPI` or distribution packages, but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`.

This is the detailed list of changes implemented in this PR:

* uv is now a mandatory prerequisite for developing airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development, including development dependencies and installing several distributions together.
* removed the meta-package `hatch_build.py` and replaced it with a pre-commit that automatically modifies the declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (renaming the file to `hatch_build.py`) and moved all airflow dependencies to `pyproject.toml`
* converted "loose" packages in the airflow repo into distributions:
  * docker-tests
  * kubernetes-tests
  * helm-tests
  * dev (here we do not have a `src` subfolder - sources are directly in the distribution, which is for now inconsistent with other distributions).

  The `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports always refer to the base of each distribution and are not used from the content root.
* Each of the distributions (on top of the already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together, allowing them all to be installed together with a simple `uv sync --all-packages` command that comes up with a consistent set of dependencies that is good for all those packages (yay!). This is used to build the CI image with a single common environment to run the tests (with some quirks due to constraints use, where we have to manually list all distributions until we switch to the `uv.lock` mechanism)
* `doc` code is moved to the `devel-common` distribution. The `doc` folder only keeps a README explaining where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in the `generated/generated-docs/` folder, which is entirely .gitignored.
* the documentation is now fully moved to:
  * `airflow-core/docs` - documentation for Airflow Core
  * `providers/**/docs` - documentation for Providers
  * `chart/docs` - documentation for Helm Chart
  * `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
  * `providers-summary-docs` - documentation for provider summary page
* `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to the pyproject.toml files - this way, except for the custom build hook, we have no dynamic components in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places; the only references to extras remain in the user documentation - we stop using extras for local development and switch to using dependency groups.
* the backtracking command was removed from breeze - we have not needed it since we started using `uv`
* internal commands (except constraint generation) have been moved from `pip` to `uv`
* breeze requires `uv` to be installed and expects to be installed by `uv tool install -e ./dev/breeze`
* pyproject.toml files are dynamically modified when we add a version suffix (`--version-suffix-for-pypi`) - only for the time of building the versions with the updated suffix
* `mypy` checks are now used consistently across all the different distributions. For consistency (and to fix some of the issues with namespace packages), rather than using the "folder" approach when running mypy checks, we run the check on individual files rather than on a folder, even if we run mypy for a whole distribution. That adds consistency in the execution of mypy heuristics. Rather than using an in-container mypy script, all the logic of selection and the parameters passed to mypy are in pre-commit code. For now we are still using the CI image to run mypy, because mypy is very sensitive to the versions of installed dependencies; we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` - which will make it easier to install them on some OSes (basic is the default set, covering the most common development dependencies)
* generated/provider_dependencies.json is no longer committed to the repository. It is .gitignored and generated on the fly as needed (breeze will generate it automatically when empty and pre-commit will always regenerate it to be consistent with the providers' pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync
* Updated `uv` version to 0.6.10
* We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally
* leveldb has been turned into a "dependency group" and removed from the apache-airflow and apache-airflow-core extras; it is now only available via the google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
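The commit message above notes that the PROD image is built with `pip` by default for PyPI or distribution packages, while `uv` is forced when building from sources. A minimal sketch of that selection rule follows. The function name `choose_packaging_tool` and the `sources` / `pypi` / `packages` labels are hypothetical and are not taken from the Airflow scripts; the real logic may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (an illustration only, not part of the Airflow scripts).
#   $1 - installation method: "sources", "pypi" or "packages" (assumed labels)
#   $2 - requested packaging tool; defaults to "pip" when omitted (assumed parameter)
function choose_packaging_tool() {
    local installation_method="${1}"
    local requested_tool="${2:-pip}"
    if [[ "${installation_method}" == "sources" ]]; then
        # Building from sources: uv is forced regardless of the requested tool.
        echo "uv"
    else
        # PyPI or distribution packages: honour the requested tool (pip by default).
        echo "${requested_tool}"
    fi
}

choose_packaging_tool sources pip   # prints "uv"
choose_packaging_tool pypi          # prints "pip"
```

Keeping the decision in a single function would make the "uv is forced for sources" rule easy to check in isolation from the rest of the build.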
install_airflow_when_building_images
EOF
# The content below is automatically copied from scripts/docker/install_additional_dependencies.sh
COPY <<"EOF" /install_additional_dependencies.sh
#!/usr/bin/env bash
set -euo pipefail
: "${ADDITIONAL_PYTHON_DEPS:?Should be set}"
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
function install_additional_dependencies() {
    if [[ "${UPGRADE_RANDOM_INDICATOR_STRING=}" != "" ]]; then
        echo
        echo "${COLOR_BLUE}Installing additional dependencies while upgrading to newer dependencies${COLOR_RESET}"
        echo
        set -x
2025-04-02 13:11:13 +02:00
        ${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${UPGRADE_TO_HIGHEST_RESOLUTION} \
            ${ADDITIONAL_PIP_INSTALL_FLAGS} \
            ${ADDITIONAL_PYTHON_DEPS}
        set +x
        common::install_packaging_tools
        echo
        echo "${COLOR_BLUE}Running 'pip check'${COLOR_RESET}"
        echo
        # We use pip check here to make sure that whatever `uv` installs, is also "correct" according to `pip`
        pip check
    else
        echo
        echo "${COLOR_BLUE}Installing additional dependencies upgrading only if needed${COLOR_RESET}"
        echo
        set -x
        ${PACKAGING_TOOL_CMD} install ${EXTRA_INSTALL_FLAGS} ${UPGRADE_IF_NEEDED} \
            ${ADDITIONAL_PIP_INSTALL_FLAGS} \
            ${ADDITIONAL_PYTHON_DEPS}
        set +x
        common::install_packaging_tools
        echo
        echo "${COLOR_BLUE}Running 'pip check'${COLOR_RESET}"
        echo
        # We use pip check here to make sure that whatever `uv` installs, is also "correct" according to `pip`
        pip check
    fi
}
common::get_colors
common::get_packaging_tool
common::get_airflow_version_specification
common::get_constraints_location
common::show_packaging_tool_version_and_location
install_additional_dependencies
EOF
# The content below is automatically copied from scripts/docker/create_prod_venv.sh
COPY <<"EOF" /create_prod_venv.sh
#!/usr/bin/env bash
. "$( dirname "${BASH_SOURCE[0]}" )/common.sh"
function create_prod_venv() {
    echo
    echo "${COLOR_BLUE}Removing ${HOME}/.local and re-creating it as virtual environment.${COLOR_RESET}"
    rm -rf ~/.local
    python -m venv ~/.local
    echo "${COLOR_BLUE}The ${HOME}/.local virtualenv created.${COLOR_RESET}"
}
common::get_colors
common::get_packaging_tool
common::show_packaging_tool_version_and_location
create_prod_venv
common::install_packaging_tools
EOF
# The content below is automatically copied from scripts/docker/entrypoint_prod.sh
COPY <<"EOF" /entrypoint_prod.sh
#!/usr/bin/env bash
AIRFLOW_COMMAND="${1:-}"
set -euo pipefail
LD_PRELOAD="/usr/lib/$(uname -m)-linux-gnu/libstdc++.so.6"
export LD_PRELOAD
function run_check_with_retries {
    # Runs the given command until it succeeds, for at most CONNECTION_CHECK_MAX_COUNT attempts,
    # sleeping CONNECTION_CHECK_SLEEP_TIME seconds between attempts. Exits with 1 when retries run out.
    local cmd
    cmd="${1}"
    local countdown
    countdown="${CONNECTION_CHECK_MAX_COUNT}"
    while true
    do
        set +e
        local last_check_result
        local res
        last_check_result=$(eval "${cmd} 2>&1")
        res=$?
        set -e
        if [[ ${res} == 0 ]]; then
            echo
            break
        else
            echo -n "."
            countdown=$((countdown-1))
        fi
        if [[ ${countdown} == 0 ]]; then
            echo
            echo "ERROR! Maximum number of retries (${CONNECTION_CHECK_MAX_COUNT}) reached."
            echo
            echo "Last check result:"
            echo "$ ${cmd}"
            echo "${last_check_result}"
            echo
            exit 1
        else
            sleep "${CONNECTION_CHECK_SLEEP_TIME}"
        fi
    done
}
function run_nc() {
    # Checks if it is possible to connect to the host using netcat.
    #
    # We want to avoid misleading messages and perform only forward lookup of the service IP address.
    # Netcat when run without -n performs both forward and reverse lookup and fails if the reverse
    # lookup name does not match the original name even if the host is reachable via IP. This happens
    # randomly with docker-compose in GitHub Actions.
    # Since we are not using reverse lookup elsewhere, we can perform the forward lookup in Python,
    # use the IP in nc, and add the '-n' switch to disable any DNS use.
    # Even if this message might be harmless, it might hide the real reason for the problem,
    # which is the long time needed to start some services. Seeing this message might be totally
    # misleading when you try to analyse the problem, which is why it is best to avoid it.
    local host="${1}"
    local port="${2}"
    local ip
    ip=$(python -c "import socket; print(socket.gethostbyname('${host}'))")
    nc -zvvn "${ip}" "${port}"
}
function wait_for_connection {
    # Waits for a connection to the backend specified via the URL passed as the first parameter.
    # Detects the backend type from the URL scheme and assigns
    # default port numbers if not specified in the URL.
    # Then it loops until a connection to the specified host/port can be established.
    # It tries `CONNECTION_CHECK_MAX_COUNT` times and sleeps `CONNECTION_CHECK_SLEEP_TIME` between checks.
    local connection_url
    connection_url="${1}"
    local detected_backend
    detected_backend=$(python -c "from urllib.parse import urlsplit; import sys; print(urlsplit(sys.argv[1]).scheme)" "${connection_url}")
    local detected_host
    detected_host=$(python -c "from urllib.parse import urlsplit; import sys; print(urlsplit(sys.argv[1]).hostname or '')" "${connection_url}")
    local detected_port
    detected_port=$(python -c "from urllib.parse import urlsplit; import sys; print(urlsplit(sys.argv[1]).port or '')" "${connection_url}")
    echo BACKEND="${BACKEND:=${detected_backend}}"
    readonly BACKEND
    if [[ -z "${detected_port=}" ]]; then
        if [[ ${BACKEND} == "postgres"* ]]; then
            detected_port=5432
        elif [[ ${BACKEND} == "mysql"* ]]; then
            detected_port=3306
        elif [[ ${BACKEND} == "mssql"* ]]; then
            detected_port=1433
        elif [[ ${BACKEND} == "redis"* ]]; then
            detected_port=6379
        elif [[ ${BACKEND} == "amqp"* ]]; then
            detected_port=5672
        fi
    fi
    detected_host=${detected_host:="localhost"}
    # Allow the DB parameters to be overridden by environment variable
    echo DB_HOST="${DB_HOST:=${detected_host}}"
    readonly DB_HOST
    echo DB_PORT="${DB_PORT:=${detected_port}}"
    readonly DB_PORT
    if [[ -n "${DB_HOST=}" ]] && [[ -n "${DB_PORT=}" ]]; then
        run_check_with_retries "run_nc ${DB_HOST@Q} ${DB_PORT@Q}"
    else
        >&2 echo "The connection details to the broker could not be determined. Connectivity checks were skipped."
    fi
}
function create_www_user() {
    local local_password=""
    # Warning: command environment variables (*_CMD) have priority over usual configuration variables
    # for configuration parameters that require sensitive information. This is the case for the SQL database
    # and the broker backend in this entrypoint script.
    if [[ -n "${_AIRFLOW_WWW_USER_PASSWORD_CMD=}" ]]; then
        local_password=$(eval "${_AIRFLOW_WWW_USER_PASSWORD_CMD}")
        unset _AIRFLOW_WWW_USER_PASSWORD_CMD
    elif [[ -n "${_AIRFLOW_WWW_USER_PASSWORD=}" ]]; then
        local_password="${_AIRFLOW_WWW_USER_PASSWORD}"
        unset _AIRFLOW_WWW_USER_PASSWORD
    fi
    if [[ -z ${local_password} ]]; then
        echo
        echo "ERROR! Airflow Admin password not set via _AIRFLOW_WWW_USER_PASSWORD or _AIRFLOW_WWW_USER_PASSWORD_CMD variables!"
        echo
        exit 1
    fi
    if airflow config get-value core auth_manager | grep -q "FabAuthManager"; then
        airflow users create \
            --username "${_AIRFLOW_WWW_USER_USERNAME="admin"}" \
            --firstname "${_AIRFLOW_WWW_USER_FIRSTNAME="Airflow"}" \
            --lastname "${_AIRFLOW_WWW_USER_LASTNAME="Admin"}" \
            --email "${_AIRFLOW_WWW_USER_EMAIL="airflowadmin@example.com"}" \
            --role "${_AIRFLOW_WWW_USER_ROLE="Admin"}" \
            --password "${local_password}" || true
    else
        echo "Skipping user creation as auth manager different from Fab is used"
    fi
}
function create_system_user_if_missing() {
    # This is needed for OpenShift-compatible container execution. On OpenShift, a random
    # user id is used when starting the image, but group 0 is kept as the user group. Our production
    # image is OpenShift compatible, so permissions on all folders are set so that group 0 can exercise
    # the same privileges as the default "airflow" user. This code checks if the user is already
    # present in /etc/passwd and creates the system user dynamically if not, including setting its
    # HOME directory to /home/airflow so that (for example) the ${HOME}/.local folder where airflow is
    # installed can be automatically added to PYTHONPATH.
    if ! whoami &> /dev/null; then
        if [[ -w /etc/passwd ]]; then
            echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${AIRFLOW_USER_HOME_DIR}:/sbin/nologin" \
                >> /etc/passwd
        fi
        export HOME="${AIRFLOW_USER_HOME_DIR}"
    fi
}
function set_pythonpath_for_root_user() {
    # Airflow is installed as a local user application, which means that if the container is running as root
    # the application is not available, because Python then only loads system-wide applications.
    # This function also adds applications installed for the local user "airflow" to PYTHONPATH.
    if [[ $UID == "0" ]]; then
        local python_major_minor
        python_major_minor=$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
        export PYTHONPATH="${AIRFLOW_USER_HOME_DIR}/.local/lib/python${python_major_minor}/site-packages:${PYTHONPATH:-}"
        >&2 echo "The container is run as root user. For security, consider using a regular user account."
    fi
}
function wait_for_airflow_db() {
    # Wait for the command to run successfully to validate the database connection.
    run_check_with_retries "airflow db check"
}
function migrate_db() {
    # Runs airflow db migrate
    airflow db migrate || true
}
function wait_for_celery_broker() {
    # Verifies connection to Celery Broker
    local executor
    executor="$(airflow config get-value core executor)"
    if [[ "${executor}" == "CeleryExecutor" ]]; then
        local connection_url
        connection_url="$(airflow config get-value celery broker_url)"
        wait_for_connection "${connection_url}"
    fi
}
function exec_to_bash_or_python_command_if_specified() {
    # If the command is 'bash' or 'python', replace this script with that
    # command via exec, passing the remaining arguments through.
    if [[ ${AIRFLOW_COMMAND} == "bash" ]]; then
        shift
        exec "/bin/bash" "${@}"
    elif [[ ${AIRFLOW_COMMAND} == "python" ]]; then
        shift
        exec "python" "${@}"
    fi
}
function check_uid_gid() {
    if [[ $(id -g) == "0" ]]; then
        return
    fi
    if [[ $(id -u) == "50000" ]]; then
        >&2 echo
        >&2 echo "WARNING! You should run the image with GID (Group ID) set to 0"
        >&2 echo " even if you use 'airflow' user (UID=50000)"
        >&2 echo
        >&2 echo " You started the image with UID=$(id -u) and GID=$(id -g)"
        >&2 echo
        >&2 echo " This is to make sure you can run the image with an arbitrary UID in the future."
        >&2 echo
        >&2 echo " See more about it in the Airflow's docker image documentation"
        >&2 echo " https://airflow.apache.org/docs/docker-stack/entrypoint.html"
        >&2 echo
        # We still allow the image to run with `airflow` user.
        return
    else
        >&2 echo
        >&2 echo "ERROR! You should run the image with GID=0"
        >&2 echo
        >&2 echo " You started the image with UID=$(id -u) and GID=$(id -g)"
        >&2 echo
        >&2 echo "The image should always be run with GID (Group ID) set to 0 regardless of the UID used."
        >&2 echo " This is to make sure you can run the image with an arbitrary UID."
        >&2 echo
        >&2 echo " See more about it in the Airflow's docker image documentation"
        >&2 echo " https://airflow.apache.org/docs/docker-stack/entrypoint.html"
        # This will not work so we fail hard
        exit 1
    fi
}
unset PIP_USER
check_uid_gid
umask 0002
CONNECTION_CHECK_MAX_COUNT=${CONNECTION_CHECK_MAX_COUNT:=20}
readonly CONNECTION_CHECK_MAX_COUNT
CONNECTION_CHECK_SLEEP_TIME=${CONNECTION_CHECK_SLEEP_TIME:=3}
readonly CONNECTION_CHECK_SLEEP_TIME
create_system_user_if_missing
set_pythonpath_for_root_user
if [[ "${CONNECTION_CHECK_MAX_COUNT}" -gt "0" ]]; then
    wait_for_airflow_db
fi
if [[ -n "${_AIRFLOW_DB_UPGRADE=}" ]] || [[ -n "${_AIRFLOW_DB_MIGRATE=}" ]] ; then
    migrate_db
fi
if [[ -n "${_AIRFLOW_DB_UPGRADE=}" ]] ; then
    >&2 echo "WARNING: Environment variable '_AIRFLOW_DB_UPGRADE' is deprecated, please use '_AIRFLOW_DB_MIGRATE' instead"
fi
if [[ -n "${_AIRFLOW_WWW_USER_CREATE=}" ]] ; then
    create_www_user
fi
if [[ -n "${_PIP_ADDITIONAL_REQUIREMENTS=}" ]] ; then
    >&2 echo
    >&2 echo "!!!!! Installing additional requirements: '${_PIP_ADDITIONAL_REQUIREMENTS}' !!!!!!!!!!!!"
    >&2 echo
    >&2 echo "WARNING: This is a development/test feature only. NEVER use it in production!"
    >&2 echo " Instead, build a custom image as described in"
    >&2 echo
    >&2 echo " https://airflow.apache.org/docs/docker-stack/build.html"
    >&2 echo
    >&2 echo " Adding requirements at container startup is fragile and is done every time"
    >&2 echo " the container starts, so it is only useful for testing and trying out"
    >&2 echo " of adding dependencies."
    >&2 echo
    pip install --root-user-action ignore ${_PIP_ADDITIONAL_REQUIREMENTS}
fi
exec_to_bash_or_python_command_if_specified "${@}"
if [[ ${AIRFLOW_COMMAND} == "airflow" ]]; then
    AIRFLOW_COMMAND="${2:-}"
    shift
fi
if [[ ${AIRFLOW_COMMAND} =~ ^(scheduler|celery)$ ]] \
    && [[ "${CONNECTION_CHECK_MAX_COUNT}" -gt "0" ]]; then
    wait_for_celery_broker
fi
if [[ "$#" -eq 0 && "${_AIRFLOW_DB_MIGRATE}" == "true" ]]; then
    echo "[INFO] No commands passed and _AIRFLOW_DB_MIGRATE=true. Exiting script with code 0."
    exit 0
fi
exec "airflow" "${@}"
EOF
# The content below is automatically copied from scripts/docker/clean-logs.sh
COPY <<"EOF" /clean-logs.sh
#!/usr/bin/env bash
set -euo pipefail
readonly DIRECTORY="${AIRFLOW_HOME:-/usr/local/airflow}"
readonly RETENTION_DAYS="${AIRFLOW__LOG_RETENTION_DAYS:-15}"
readonly RETENTION_MINUTES="${AIRFLOW__LOG_RETENTION_MINUTES:-0}"
readonly FREQUENCY="${AIRFLOW__LOG_CLEANUP_FREQUENCY_MINUTES:-15}"
readonly MAX_PERCENT="${AIRFLOW__LOG_MAX_SIZE_PERCENT:-0}"
trap "exit" INT TERM
MAX_SIZE_BYTES="${AIRFLOW__LOG_MAX_SIZE_BYTES:-0}"
if [[ "$MAX_SIZE_BYTES" -eq 0 && "$MAX_PERCENT" -gt 0 ]]; then
    total_space=$(df -k "${DIRECTORY}"/logs 2>/dev/null | tail -1 | awk '{print $2}' || echo "0")
    MAX_SIZE_BYTES=$(( total_space * 1024 * MAX_PERCENT / 100 ))
    echo "Computed MAX_SIZE_BYTES from ${MAX_PERCENT}% of disk: ${MAX_SIZE_BYTES} bytes"
fi
readonly MAX_SIZE_BYTES
readonly EVERY=$((FREQUENCY*60))
echo "Cleaning logs every $EVERY seconds"
if [[ "$MAX_SIZE_BYTES" -gt 0 ]]; then
    echo "Max log size limit: $MAX_SIZE_BYTES bytes"
fi
retention_days="${RETENTION_DAYS}"
while true; do
    total_retention_minutes=$(( (retention_days * 1440) + RETENTION_MINUTES ))
    echo "Trimming airflow logs older than ${total_retention_minutes} minutes."
    find "${DIRECTORY}"/logs \
        -type d -name 'lost+found' -prune -o \
        -type f -mmin +"${total_retention_minutes}" -name '*.log' -print0 | \
        xargs -0 rm -f || true
    if [[ "$MAX_SIZE_BYTES" -gt 0 && "$retention_days" -ge 0 ]]; then
        current_size=$(df -k "${DIRECTORY}"/logs 2>/dev/null | tail -1 | awk '{print $3}' || echo "0")
        current_size=$(( current_size * 1024 ))
        if [[ "$current_size" -gt "$MAX_SIZE_BYTES" ]]; then
            retention_days=$((retention_days - 1))
            echo "Size ($current_size bytes) exceeds limit ($MAX_SIZE_BYTES bytes). Reducing retention to ${retention_days} days."
            continue
        fi
    fi
    find "${DIRECTORY}"/logs -type d -empty -delete || true
    retention_days="${RETENTION_DAYS}"
    seconds=$(( $(date -u +%s) % EVERY))
    (( seconds < 1 )) || sleep $((EVERY - seconds - 1))
    sleep 1
done
EOF
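The arithmetic used by clean-logs.sh above can be checked in isolation. The following is a minimal sketch, not part of the Dockerfile: the function names `retention_in_minutes`, `max_size_bytes_from_percent`, and `seconds_until_next_run` are hypothetical helpers introduced here for illustration, and they only mirror the expressions that appear in the script (days converted to minutes, a byte limit derived from a percentage of the disk size reported by `df -k`, and the sleep that aligns runs to multiples of the cleanup period).

```shell
# Hypothetical helpers (assumed names, not defined in clean-logs.sh) that
# reproduce the script's arithmetic so it can be tested without touching disk.

# Retention window in minutes: days * 1440 plus the extra minutes.
retention_in_minutes() {
    local days="$1" minutes="$2"
    echo $(( (days * 1440) + minutes ))
}

# Byte limit derived from the total size in KiB (as printed by `df -k`)
# and a percentage, using the same integer arithmetic as the script.
max_size_bytes_from_percent() {
    local total_kb="$1" percent="$2"
    echo $(( total_kb * 1024 * percent / 100 ))
}

# Total seconds the loop sleeps before its next pass. The script sleeps
# (every - elapsed - 1) seconds unless elapsed is 0, then always 1 more.
seconds_until_next_run() {
    local now="$1" every="$2"
    local elapsed=$(( now % every ))
    if (( elapsed < 1 )); then
        echo 1
    else
        echo $(( every - elapsed ))
    fi
}
```

With the script's defaults (15 days, 0 minutes), `retention_in_minutes 15 0` corresponds to the value passed to `find -mmin`. The sleep helper shows why consecutive passes land on multiples of the period rather than drifting by the time each pass takes.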
# The content below is automatically copied from scripts/docker/airflow-scheduler-autorestart.sh
COPY <<"EOF" /airflow-scheduler-autorestart.sh
#!/usr/bin/env bash
while echo "Running"; do
    airflow scheduler -n 5
    return_code=$?
    if (( return_code != 0 )); then
        echo "Scheduler crashed with exit code $return_code. Respawning.." >&2
        date >> /tmp/airflow_scheduler_errors.txt
    fi
    sleep 1
done
EOF
##############################################################################################
# This is the build image where we build all dependencies
##############################################################################################
Migrated prod builds to use python built from source (#53770) Moved the prod build to also use the src built python as done by CI. This let's us iterate faster on our python versions without needing to wait for the community image of python. This involves a number of changes: * 3.0.5 -> 3.0.6 airflow version change * use released official packages rather than git repo to install Python * added wget (it was added in the original image to pull python packages so added for compatibility) added those flags to python build (same as in the * original build) * --with-ensurepip --build="$gnuArch" * --enable-loadable-sqlite-extension * --enable-option-checking=fatal * --enable-shared * --with-lto * added cleanup of apt after installing packages * added removal of .pyc/.test etc. files (saves 350 MB) * added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image. * we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the Python official image had a very old setuptools version. * checked all "customize" scripts and make them "work" * Updated the 'version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere * Updated Changelog and documentation to update new ways of building images * removed installation with GitHub URL (it won't work easily after splitting to multiple packages - not easily at least) and it's not needed * all python, pip and similar links are created in /usr/python/bin * /usr/python/bin is always first in the PATH - before /usr/local/bin * added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local * removal of installed editable distributions in breeze happens now first and THEN we install when --use-airflow-version is used. * LD_LIBRARY_PATH was not set so the shared python libraries could not be loaded when venv was created Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
FROM ${BASE_IMAGE} as airflow-build-image
# The nolog bash flag is currently ignored, but you can replace it with
# xtrace to show the commands being executed.
SHELL ["/bin/bash", "-o", "pipefail", "-o", "errexit", "-o", "nounset", "-o", "nolog", "-c"]
ARG BASE_IMAGE
# Make sure noninteractive debian install is used and language variables set
ENV BASE_IMAGE=${BASE_IMAGE} \
    DEBIAN_FRONTEND=noninteractive LANGUAGE=C.UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8 \
    LC_CTYPE=C.UTF-8 LC_MESSAGES=C.UTF-8 \
Simplify caching mechanisms for CI and PROD images (#45266) For a long time we had used a sophisticated mechanism to speed up our CI jobs by building the images in "pull_request_target" workflow and pushing them to GitHub registry. That however had several drawbacks: * CI image was complex when it comes to layer setup (we had to pre- cache installed dependencies by installing them from branch tip * The pull_request_target is a very dangerous workflow, we had a number of security problems with it (and it's difficult to debug) * Caching of `pip` and `uv` was not used because it increased size of the image significantly This PR significantly improves the caching mechanisms for the images building of several advacements that were not possible before: * The upload-artifacts@v4 action and improved stash action developed by @assignUser and published in "apache/infrastructure-actions" allows us to store all images (8GB per run) in artifacts rather than in registry - so we can do the image build once and share it with all the jobs. * The uv speed is "enough" to allow occasional installation of Airlfow locally. This allows to utilize cache-mount and locally build uv cache, rather than rely on "remote" cache when we are building local images for breeze. The first time you build local breeze image it will take 2-5 more minutes (depending on your network speed, but because we can utilise cache mounts, every subsequent build should be very fast - even if all dependencies change. Using uv also allows to "always" reinstall airflow when you build the image even if single source file changed, because with cache it takes sub-seconds to reinstall airflow and all dependencies. 
* the cache mounts are not included in the image size, and since we can export and import images in CI in artifacts and we do not need to rebuild them, the images shared as compressed artifacts are relatively small (2GB) - cache of `uv` is around 4GB on top of that so sharing image built in the "build image" job with other jobs in the same workflow is fast. * we are still using registry cache for the "non-python" parts of the image - both CI and breeze image build speed benefit from using the image cache for system dependencies, database clients etc. this helps with faster rebuilds of the images for local development environment * documentation has been updated to reflect the new CI setup. The diagrams showing the workflows of ours are no longer needed as the workflows are quite straightforward when they are looked at. Fixes: #42999 Fixes: #43268
2024-12-29 22:58:27 +01:00
    PIP_CACHE_DIR=/tmp/.cache/pip \
    UV_CACHE_DIR=/tmp/.cache/uv
ARG DEV_APT_DEPS=""
More customizable build process for Docker images (#11176) * Allows more customizations for image building. This is the third (and not last) part of making the Production image more corporate-environment friendly. It's been prepared for the request of one of the big Airflow user (company) that has rather strict security requirements when it comes to preparing and building images. They are committed to synchronizing with the progress of Apache Airflow 2.0 development and making the image customizable so that they can build it using only sources controlled by them internally was one of the important requirements for them. This change adds the possibilty of customizing various steps in the build process: * adding custom scripts to be run before installation of both build image and runtime image. This allows for example to add installing custom GPG keys, and adding custom sources. * customizing the way NodeJS and Yarn are installed in the build image segment - as they might rely on their own way of installation. * adding extra packages to be installed during both build and dev segment build steps. This is crucial to achieve the same size optimizations as the original image. * defining additional environment variables (for example environment variables that indicate acceptance of the EULAs in case of installing proprietary packages that require EULA acceptance - both in the build image and runtime image (again the goal is to keep the image optimized for size) The image build process remains the same when no customization options are specified, but having those options increases flexibility of the image build process in corporate environments. This is part of #11171. This change also fixes some of the issues opened and raised by other users of the Dockerfile. Fixes: #10730 Fixes: #10555 Fixes: #10856 Input from those issues has been taken into account when this change was designed so that the cases described in those issues could be implemented. 
Example from one of the issue landed as an example way of building highly customized Airflow Image using those customization options. Depends on #11174 * Update IMAGES.rst Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-29 15:30:00 +02:00
ARG ADDITIONAL_DEV_APT_DEPS=""
ARG DEV_APT_COMMAND=""
ARG ADDITIONAL_DEV_APT_COMMAND=""
ARG ADDITIONAL_DEV_APT_ENV=""
ARG AIRFLOW_PYTHON_VERSION
ENV DEV_APT_DEPS=${DEV_APT_DEPS} \
ADDITIONAL_DEV_APT_DEPS=${ADDITIONAL_DEV_APT_DEPS} \
DEV_APT_COMMAND=${DEV_APT_COMMAND} \
ADDITIONAL_DEV_APT_COMMAND=${ADDITIONAL_DEV_APT_COMMAND} \
ADDITIONAL_DEV_APT_ENV=${ADDITIONAL_DEV_APT_ENV} \
AIRFLOW_PYTHON_VERSION=${AIRFLOW_PYTHON_VERSION}
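# Usage sketch (illustrative only, not part of the upstream file): the dev apt arguments above can be
# set at build time with the standard `--build-arg` flag. This assumes the build is run from the
# directory containing this Dockerfile; the package names and the image tag are hypothetical placeholders.
#   docker build . --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" --tag my-airflow-image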
Migrated prod builds to use python built from source (#53770) Moved the prod build to also use the src built python as done by CI. This lets us iterate faster on our python versions without needing to wait for the community image of python. This involves a number of changes: * 3.0.5 -> 3.0.6 airflow version change * use released official packages rather than git repo to install Python * added wget (it was added in the original image to pull python packages so added for compatibility) * added those flags to python build (same as in the original build) * --with-ensurepip --build="$gnuArch" * --enable-loadable-sqlite-extension * --enable-option-checking=fatal * --enable-shared * --with-lto * added cleanup of apt after installing packages * added removal of .pyc/.test etc. files (saves 350 MB) * added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image. * we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the Python official image had a very old setuptools version. * checked all "customize" scripts and made them "work" * Updated the "version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere * Updated Changelog and documentation to update new ways of building images * removed installation with GitHub URL (it won't work easily after splitting to multiple packages - not easily at least) and it's not needed * all python, pip and similar links are created in /usr/python/bin * /usr/python/bin is always first in the PATH - before /usr/local/bin * added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local * removal of installed editable distributions in breeze happens now first and THEN we install when --use-airflow-version is used. * LD_LIBRARY_PATH was not set so the shared python libraries could not be loaded when venv was created Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
ARG PYTHON_LTO
COPY --from=scripts install_os_dependencies.sh /scripts/docker/
RUN PYTHON_LTO=${PYTHON_LTO} bash /scripts/docker/install_os_dependencies.sh dev
Fix, cleanup and refactor adding apt dependencies when building image (#55151) After migrating to python built from sources, it turned out that we had a weird mixture of system and source build python in our CI images. There were some dependencies installed that installed python3 debian dependencies and they were breaking how source build python 3.11 interacted with the system one. Namely - debian Python 3.11 had internal ABI incompatibility between 3.11.4 and 3.11.5 that caused the ssl build with latest 3.11.5 to fail when 3.11.5 system python has been installed while the new python was being built - because system includes interfered with the build process. The system python was pulled by "software-properties-common" and "gdb" - we do not need to use either of them during the build process, and user in In order to avoid this, we now make sure that we do not install any debian dependencies that pull any of the python3 system packages, before we install Python from sources. We also reviewed and cleaned how we configure the list of packages installed during image building: * packages are reviewed and unnecessary ones removed * packages that are pulling python3 system packages are moved to additional apt dev dependencies * additional apt dev dependencies are installed AFTER python is installed * the packages listed in scripts are now one-per-line * they are alphabetically sorted * finally LD_LIBRARY_PATH is set to have the "source-installed" python libraries always prioritised over the system libraries if system libraries are installed to the main "/usr/lib" directories.
2025-09-01 23:16:20 +02:00
# In case system python is installed, setting LD_LIBRARY_PATH ensures that the python libraries
# installed from sources (which are newer) are found before the system python libraries - the
# python interpreter might break if the old system libraries are accidentally used.
ENV LD_LIBRARY_PATH="/usr/python/lib"
ARG INSTALL_MYSQL_CLIENT="true"
Change default MySQL client to MariaDB (#36243) This PR is a response to a pretty catastrophic issue caused by an expiring key on the MySQL repository on 14th of December. Oracle does not follow the best practices for signing their packages (while all others do) and their packages and repositories are signed with a key with a short expiry date. This basically puts an expiry on their repository, and anyone who releases images following best practices of installation, while keeping the repository after installing mysql libraries, has to rebuild their past released images every 2 years or so. This is the last straw for our MySQL client installation problems (we had a few, especially for ARM images) and we decided that we will switch to MariaDB client libraries by default (while allowing our users to build custom images with MySQL libraries). The issue tracked in Airflow repository is: #36231 The issues in Oracle's MySQL repo: * https://bugs.mysql.com/bug.php?id=113427 * https://bugs.mysql.com/bug.php?id=113428 * https://bugs.mysql.com/bug.php?id=113432 This PR implements a number of changes: * MariaDB client is now default client used for both ARM and X86 * both pre-2023 and 2023 keys for MySQL are now added to be trusted when custom image with MySQL client is built * MySQL repository is removed after installing MySQL (to avoid repeating similar fiasco for MySQL users in 2025) * changelog added and instructions on how to build custom image with MySQL client * one of our test suites is converted to use "current" image not latest released image (that was bug in our CI). * test was added in `canary` and `release` builds in CI to also test build of custom image with MySQL client
2023-12-15 19:13:00 +01:00
ARG INSTALL_MYSQL_CLIENT_TYPE="mariadb"
ARG INSTALL_MSSQL_CLIENT="true"
ARG INSTALL_POSTGRES_CLIENT="true"
Optimize PROD image caching in CI (#35438) Turns out that some of the layers in our PROD image got invalidated because AIRFLOW_CONSTRAINTS_MODE used to build the cache for PROD image is "constraints" by default, while building images in "build-images" workflow for regular PRs and canary build uses "constraints-source-providers". The former is fine as default for PROD image (as opposed to CI image, we build PROD image from released PyPI packages by default) but the latter is "proper" for the CI cache, because there, the image is built out of local packages prepared from sources. Turns out that the CONSTRAINT_MODE parameter had a profound impact on caching - because it was set before the "install_packages_from_branch_tip" step and - in fact - even before "install database clients" step, which caused our cache to only work for the "base OS dependencies" - installing database clients and installing airflow from branch tip (which works great for CI image) had always been done in PRs because the layers in cache with constraints env invalidated all subsequent layers. This had no big impact before when testing usually took much longer time - but since the testing has been vastly improved in #35160, now PROD image building continues running even after tests complete and becomes the next frontier of optimization. This PR optimizes PROD image building in two ways: * caching is prepared with "source_providers" constraint mode, same as regular build * the AIRFLOW_CONSTRAINT_MODE and related arguments are moved after installing database clients, so that this parameter does not impact their caching.
2023-11-04 16:58:55 +01:00
ENV INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
INSTALL_MYSQL_CLIENT_TYPE=${INSTALL_MYSQL_CLIENT_TYPE} \
INSTALL_MSSQL_CLIENT=${INSTALL_MSSQL_CLIENT} \
INSTALL_POSTGRES_CLIENT=${INSTALL_POSTGRES_CLIENT}
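# Usage sketch (illustrative only, not part of the upstream file): MariaDB client libraries are the
# default, and a custom image with the MySQL client can be requested at build time. The value "mysql"
# is an assumption about the accepted alternative to "mariadb"; the image tag is a hypothetical placeholder.
#   docker build . --build-arg INSTALL_MYSQL_CLIENT_TYPE="mysql" --tag my-airflow-image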
COPY --from=scripts common.sh /scripts/docker/
# Only copy mysql/mssql/postgres installation scripts for now - so that changing the other
# scripts which are needed much later will not invalidate the docker layer here
COPY --from=scripts install_mysql.sh install_mssql.sh install_postgres.sh /scripts/docker/
RUN bash /scripts/docker/install_mysql.sh dev && \
bash /scripts/docker/install_mssql.sh dev && \
bash /scripts/docker/install_postgres.sh dev
ENV PATH=${PATH}:/opt/mssql-tools/bin
# By default we do not install from docker context files, but if we decide to install from docker context
# files, we should override this variable to "docker-context-files"
ARG DOCKER_CONTEXT_FILES="Dockerfile"
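# Usage sketch (illustrative only, not part of the upstream file): to install from docker context files,
# override the argument above at build time. This assumes a "docker-context-files" folder exists in the
# build context; the image tag is a hypothetical placeholder.
#   docker build . --build-arg DOCKER_CONTEXT_FILES="docker-context-files" --tag my-airflow-image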
Simplify tooling by switching completely to uv (#48223) The lazy consensus decision has been made at the devlist to switch entirely to `uv` as development tool: link: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256 This PR implements that decision and removes a lot of baggage connected to using `pip` additionally to uv to install and sync the environment. It also introduces more consistency in how distribution packages are used in airflow sources - basically switching all internal distributions to use `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to local virtualenv environment. Unfortunately, such change cannot be done incrementally, really, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move. This PR is "safe" in terms of the airflow and provider's code - it does not **really** (except occasional imports and type hint changes resulting from better isolation of packages) change Airflow code, nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider's code is modified. It does move the test code - in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.
What is still NOT done after that move and will be covered in the follow-up changes: * isolating docs-building to have separate configuration for docs building per distribution - allowing to run doc build locally with its own conf.py file * moving some of the tests and checks out from breeze container image up to the local environment (for example mypy checks) and likely isolating them per-provider * Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to `uv.lock` mechanism. * potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up. * PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`. This is the detailed list of changes implemented in this PR: * uv is now mandatory to install as pre-requisite in order to develop airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together. * removed meta-package `hatch_build.py` and replacing it with pre-commit automatically modifying declarative pyproject.toml * stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (and renaming the file to `hatch_build.py`) and moving all airflow dependencies to `pyproject.toml` * converted "loose" packages in airflow repo into distributions: * docker-tests * kubernetes-tests * helm-tests * dev (here we do not have `src` subfolder - sources are directly in the distribution, which is for-now inconsistent with other distributions).
The names of the `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports are always referring to base of each distribution and are not used from the content root. * Each of the distributions (on top of already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together allowing to install them all together with a simple `uv sync --all-packages` command and come up with consistent set of dependencies that are good for all those packages (yay!). This is used to build CI image with single common environment to run the tests (with some quirks due to constraints use where we have to manually list all distributions until we switch to `uv.lock` mechanism) * `doc` code is moved to `devel-common` distribution. The `doc` folder only keeps README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in `generated/generated-docs/` folder which is entirely .gitignored. * the documentation is now fully moved to: * `airflow-core/docs` - documentation for Airflow Core * `providers/**/docs` - documentation for Providers * `chart/docs` - documentation for Helm Chart * `task-sdk/docs` - documentation for Task SDK (new format not yet published) * `docker-stack-docs` - documentation for Docker Stack * `providers-summary-docs` - documentation for provider summary page * `versions` are not dynamically retrieved from `__init__.py` - all of them are synchronized directly to pyproject.toml files - this way - except the custom build hook - we have no dynamic components in our `pyproject.toml` properties. * references to extras were removed from INSTALL and other places, the only references to extras remain in the user documentation - we stop using extras for local development, we switch to using dependency groups.
* backtracking command was removed from breeze - we did not need it since we started using `uv` * internal commands (except constraint generation) have been moved to `uv` from `pip` * breeze requires `uv` to be installed and expects to be installed by `uv tool install -e ./dev/breeze` * pyproject.tomls are dynamically modified when we add a version suffix dynamically (`--version-suffix-for-pypi`) - only for the time of building the versions with updated suffix * `mypy` checks are now consistently used across all the different distributions and for consistency (and to fix some of the issues with namespace packages) rather than using "folder" approach when running mypy checks, even if we run mypy for whole distribution, we run check on individual files rather than on a folder. That adds consistency in execution of mypy heuristics. Rather than using in-container mypy script all the logic of selection and parameters passed to mypy are in pre-commit code. For now we are still using CI image to run mypy because mypy is very sensitive to version of dependencies installed, we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows. * lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should be generally kept consistently for the future * the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` which will make it easier to install them for some OS-es (basic is used as default set of dependencies to cover most common set of development dependencies to be used for development) * generated/provider_dependencies.json are not committed to the repository any longer. They are .gitignored and generated on-the-fly as needed (breeze will generate them automatically when empty and pre-commit will always regenerate them to be consistent with provider's pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there. * for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync * Updated `uv` version to 0.6.10 * We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally * leveldb has been turned into "dependency group" and removed from apache-airflow and apache-airflow-core extras, it is now only available by google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
ARG AIRFLOW_IMAGE_TYPE
ARG AIRFLOW_HOME
ARG AIRFLOW_USER_HOME_DIR
ARG AIRFLOW_UID
RUN adduser --gecos "First Last,RoomNumber,WorkPhone,HomePhone" --disabled-password \
--quiet "airflow" --uid "${AIRFLOW_UID}" --gid "0" --home "${AIRFLOW_USER_HOME_DIR}" && \
mkdir -p ${AIRFLOW_HOME} && chown -R "airflow:0" "${AIRFLOW_USER_HOME_DIR}" ${AIRFLOW_HOME}
Switch from --user to venv for PROD image and enable uv (#37796) This PR introduces a joint way to treat the .local (--user) folder as both - venv and `--user` package installation. It fixes a number of problems the `--user` installation created for us in the past and does it in a fully backwards compatible way. This improves both "production" use for end user as well as local iteration on the PROD image during tests - but also for CI. Improvements for "end user": * user does not have to use `pip install --user` to install new packages any more and it is not enabled by default with PIP_USER flag. * users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did) * users can use `uv` to build custom production image, which gives 40%-50% saving for image build time compared to `pip`. * python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images. Improvements for development: * when image is built from sources (no --use-docker-context-files are specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image. * when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow" which is where editable installation is done from. You only need to rebuild the image if dependencies change. * By default `uv` is used for local source build for k8s tests so even if you have to rebuild it, it is way faster (60%-80%) during iterating with the image.
CI/DEV tooling improvements: * this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check if the image still builds with `pip`. * we also switch to more PEP standard way of installing packages from local filesystem (package-name @ file:///FILE) Fixes: #37785 Fixes: #37815 Update contributing-docs/testing/k8s_tests.rst Update contributing-docs/testing/k8s_tests.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update scripts/docker/install_airflow.sh Update docs/docker-stack/changelog.rst Update docs/docker-stack/build.rst Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
COPY --chown=${AIRFLOW_UID}:0 ${DOCKER_CONTEXT_FILES} /docker-context-files
USER airflow
ARG AIRFLOW_REPO=apache/airflow
ARG AIRFLOW_BRANCH=main
ARG AIRFLOW_EXTRAS
ARG ADDITIONAL_AIRFLOW_EXTRAS=""
# Allows overriding the constraints source
ARG CONSTRAINTS_GITHUB_REPOSITORY="apache/airflow"
ARG AIRFLOW_CONSTRAINTS_MODE="constraints"
ARG AIRFLOW_CONSTRAINTS_REFERENCE=""
ARG AIRFLOW_CONSTRAINTS_LOCATION=""
ARG DEFAULT_CONSTRAINTS_BRANCH="constraints-main"
# By default do not fall back to installation without constraints because it can hide problems with constraints
ARG AIRFLOW_FALLBACK_NO_CONSTRAINTS_INSTALLATION="false"
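# Usage sketch (illustrative only, not part of the upstream file): the constraint arguments above can be
# overridden at build time. The values shown are taken from names that appear in this file and its
# history ("constraints-source-providers", "constraints-main"); whether this combination suits a given
# build is an assumption, and the image tag is a hypothetical placeholder.
#   docker build . --build-arg AIRFLOW_CONSTRAINTS_MODE="constraints-source-providers" --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-main" --tag my-airflow-image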
# By default pip shows a progress bar, but you can disable it.
ARG PIP_PROGRESS_BAR
# This is the airflow version that is put in the label of the built image
ARG AIRFLOW_VERSION
# By default the latest released version of airflow is installed (when empty), but this value can be
# overridden to install a version matching a specification (for example ==2.0.2 or <3.0.0).
ARG AIRFLOW_VERSION_SPECIFICATION
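# Usage sketch (illustrative only, not part of the upstream file): pinning the installed version at build
# time, reusing the "==2.0.2" specification from the comment above. Setting AIRFLOW_VERSION alongside it
# (so that the image label matches) is an assumption; the image tag is a hypothetical placeholder.
#   docker build . --build-arg AIRFLOW_VERSION="2.0.2" --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.2" --tag my-airflow-image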
# Determines the way airflow is installed. By default we install airflow from the PyPI `apache-airflow`
# package, but it can also be `.` for local installation or a GitHub URL pointing to a specific branch
# or tag of Airflow. Note that for local source installation you need to have local sources of
# Airflow checked out together with the Dockerfile and AIRFLOW_SOURCES_FROM and AIRFLOW_SOURCES_TO
# set to "." and "/opt/airflow" respectively.
ARG AIRFLOW_INSTALLATION_METHOD="apache-airflow"
# By default we do not upgrade to latest dependencies
Simplify tooling by switching completely to uv (#48223)

A lazy consensus decision has been made on the devlist to switch entirely to `uv` as the development tool: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256

This PR implements that decision and removes a lot of baggage connected to using `pip` in addition to uv to install and sync the environment. It also introduces more consistency in how distribution packages are used in airflow sources - basically switching all internal distributions to the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to a local virtualenv environment.

Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move.

This PR is "safe" in terms of the airflow and provider code - it does not **really** (except occasional imports and type hint changes resulting from better isolation of packages) change Airflow code, nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider code is modified. It does move the test code - in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.

What is still NOT done after that move and will be covered in follow-up changes:

* isolating docs-building to have a separate configuration for docs building per distribution - allowing to run a doc build locally with its own conf.py file
* moving some of the tests and checks out from the breeze container image up to the local environment (for example mypy checks) and likely isolating them per-provider
* Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to the `uv.lock` mechanism.
* potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up.
* PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`.

This is the detailed list of changes implemented in this PR:

* uv is now mandatory to install as a pre-requisite in order to develop airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together.
* removed meta-package `hatch_build.py` and replaced it with a pre-commit automatically modifying the declarative pyproject.toml
* stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (renaming the file to `hatch_build.py` and moving all airflow dependencies to `pyproject.toml`)
* converted "loose" packages in the airflow repo into distributions:
  * docker-tests
  * kubernetes-tests
  * helm-tests
  * dev (here we do not have a `src` subfolder - sources are directly in the distribution, which is for now inconsistent with other distributions).

  The `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports always refer to the base of each distribution and are not used from the content root.
* Each of the distributions (on top of the already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together, allowing to install them all together with a simple `uv sync --all-packages` command and come up with a consistent set of dependencies that are good for all those packages (yay!). This is used to build the CI image with a single common environment to run the tests (with some quirks due to constraints use, where we have to manually list all distributions until we switch to the `uv.lock` mechanism)
* `doc` code is moved to the `devel-common` distribution. The `doc` folder only keeps a README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in the `generated/generated-docs/` folder, which is entirely .gitignored.
* the documentation is now fully moved to:
  * `airflow-core/docs` - documentation for Airflow Core
  * `providers/**/docs` - documentation for Providers
  * `chart/docs` - documentation for Helm Chart
  * `task-sdk/docs` - documentation for Task SDK (new format not yet published)
  * `docker-stack-docs` - documentation for Docker Stack
  * `providers-summary-docs` - documentation for provider summary page
* `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to pyproject.toml files - this way, except for the custom build hook, we have no dynamic components in our `pyproject.toml` properties.
* references to extras were removed from INSTALL and other places; the only references to extras remain in the user documentation - we stop using extras for local development and switch to using dependency groups.
* backtracking command was removed from breeze - we did not need it since we started using `uv`
* internal commands (except constraint generation) have been moved to `uv` from `pip`
* breeze requires `uv` to be installed and expects to be installed by `uv tool install -e ./dev/breeze`
* pyproject.tomls are dynamically modified when we add a version suffix dynamically (`--version-suffix-for-pypi`) - only for the time of building the versions with the updated suffix
* `mypy` checks are now consistently used across all the different distributions, and for consistency (and to fix some of the issues with namespace packages), rather than using the "folder" approach when running mypy checks, even if we run mypy for a whole distribution, we run the check on individual files rather than on a folder. That adds consistency in the execution of mypy heuristics. Rather than using an in-container mypy script, all the logic of selection and parameters passed to mypy is in pre-commit code. For now we are still using the CI image to run mypy because mypy is very sensitive to the version of dependencies installed; we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows.
* lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should generally be kept consistent in the future
* the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all`, which will make it easier to install them for some OSes (basic is used as the default set of dependencies to cover the most common set of development dependencies)
* generated/provider_dependencies.json is no longer committed to the repository. It is .gitignored and generated on the fly as needed (breeze will generate it automatically when empty and pre-commit will always regenerate it to be consistent with the providers' pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there.
* for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync
* Updated `uv` version to 0.6.10
* We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally
* leveldb has been turned into a "dependency group" and removed from apache-airflow and apache-airflow-core extras; it is now only available via the google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
ARG UPGRADE_RANDOM_INDICATOR_STRING=""
Switch from --user to venv for PROD image and enable uv (#37796)

This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems the `--user` installation created for us in the past and does it in a fully backwards compatible way. This improves both "production" use for the end user as well as local iteration on the PROD image during tests - but also for CI.

Improvements for "end user":

* the user does not have to use `pip install --user` to install new packages any more and it is not enabled by default with the PIP_USER flag.
* users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did)
* users can use `uv` to build a custom production image, which gives 40%-50% saving for image build time compared to `pip`.
* python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images.

Improvements for development:

* when the image is built from sources (no --use-docker-context-files are specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image.
* when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change.
* By default `uv` is used for the local source build for k8s tests, so even if you have to rebuild it, it is way faster (60%-80%) while iterating with the image.

CI/DEV tooling improvements:

* this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check that the image still builds with `pip`.
* we also switch to a more PEP-standard way of installing packages from the local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815

Update contributing-docs/testing/k8s_tests.rst
Update contributing-docs/testing/k8s_tests.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update scripts/docker/install_airflow.sh
Update docs/docker-stack/changelog.rst
Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
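# Illustration only (not part of the upstream Dockerfile): the commit message above mentions
# installing packages from the local filesystem as "package-name @ file:///FILE". A minimal
# sketch of building such a requirement string follows. The helper name
# `local_file_requirement` and the wheel path are hypothetical, and the sketch assumes an
# absolute path with no characters that need URL-encoding.

```shell
# Hypothetical helper: print "<name> @ file://<absolute path>".
# Because the path is absolute (starts with "/"), the result contains "file:///".
local_file_requirement() {
    local name="$1"
    local abs_path="$2"
    printf '%s @ file://%s\n' "${name}" "${abs_path}"
}

# Example with a made-up wheel path:
local_file_requirement "my-package" "/docker-context-files/my_package-1.0-py3-none-any.whl"
```

# The resulting string can be passed as a single quoted argument to an installer that
# accepts direct references.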
ARG AIRFLOW_SOURCES_FROM
ARG AIRFLOW_SOURCES_TO
2022-03-27 19:19:02 +02:00
ENV AIRFLOW_USER_HOME_DIR=${AIRFLOW_USER_HOME_DIR}
RUN if [[ -f /docker-context-files/pip.conf ]]; then \
        mkdir -p "${AIRFLOW_USER_HOME_DIR}/.config/pip"; \
        cp /docker-context-files/pip.conf "${AIRFLOW_USER_HOME_DIR}/.config/pip/pip.conf"; \
    fi; \
    if [[ -f /docker-context-files/.piprc ]]; then \
        cp /docker-context-files/.piprc "${AIRFLOW_USER_HOME_DIR}/.piprc"; \
    fi
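# Illustration only (not part of the upstream Dockerfile): the RUN step above copies optional
# pip configuration from the build context into the user's home directory. The same logic can
# be tried outside of a build with the sketch below. The function name `copy_pip_config` and
# its two directory arguments are hypothetical; it uses `[ ... ]` instead of `[[ ... ]]` so it
# also runs in plain POSIX-style shells.

```shell
# Hypothetical helper mirroring the RUN step: copy pip.conf and .piprc if they exist.
copy_pip_config() {
    local context_dir="$1"   # stands in for /docker-context-files
    local home_dir="$2"      # stands in for ${AIRFLOW_USER_HOME_DIR}
    if [ -f "${context_dir}/pip.conf" ]; then
        mkdir -p "${home_dir}/.config/pip"
        cp "${context_dir}/pip.conf" "${home_dir}/.config/pip/pip.conf"
    fi
    if [ -f "${context_dir}/.piprc" ]; then
        cp "${context_dir}/.piprc" "${home_dir}/.piprc"
    fi
}

# Try it in throwaway directories: only pip.conf is present, so only it is copied.
demo_ctx="$(mktemp -d)"
demo_home="$(mktemp -d)"
printf '[global]\n' > "${demo_ctx}/pip.conf"
copy_pip_config "${demo_ctx}" "${demo_home}"
ls -A "${demo_home}/.config/pip"
```

# Both files are optional: when neither exists, the helper (like the RUN step) does nothing.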
# Additional PIP flags passed to all pip install commands except reinstalling pip itself
ARG ADDITIONAL_PIP_INSTALL_FLAGS=""
ARG AIRFLOW_PIP_VERSION
ARG AIRFLOW_UV_VERSION
ARG AIRFLOW_USE_UV
ARG INCLUDE_PRE_RELEASE="false"
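# Illustration only (not part of the upstream Dockerfile): the AIRFLOW_EXTRAS assignment in the
# ENV instruction that follows joins two lists with `${ADDITIONAL_AIRFLOW_EXTRAS:+,}`, which
# expands to a comma only when ADDITIONAL_AIRFLOW_EXTRAS is set and non-empty. A minimal shell
# sketch of the same expansion; the function name `join_extras` and the sample extras are
# hypothetical.

```shell
# Hypothetical helper: join a base list and an optional additional list with a comma.
join_extras() {
    local base="$1"
    local additional="$2"
    # ${additional:+,} yields "," only if $additional is non-empty, so no trailing comma
    # is produced when there is nothing to append.
    printf '%s\n' "${base}${additional:+,}${additional}"
}

join_extras "celery,postgres" "google"
join_extras "celery,postgres" ""
```

# Note that the expansion does not guard against an empty base list: with an empty first
# argument and a non-empty second one, the result starts with a comma.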
ENV AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION} \
AIRFLOW_UV_VERSION=${AIRFLOW_UV_VERSION} \
AIRFLOW_USE_UV=${AIRFLOW_USE_UV} \
AIRFLOW_VERSION=${AIRFLOW_VERSION} \
AIRFLOW_INSTALLATION_METHOD=${AIRFLOW_INSTALLATION_METHOD} \
AIRFLOW_VERSION_SPECIFICATION=${AIRFLOW_VERSION_SPECIFICATION} \
AIRFLOW_SOURCES_FROM=${AIRFLOW_SOURCES_FROM} \
AIRFLOW_SOURCES_TO=${AIRFLOW_SOURCES_TO} \
AIRFLOW_REPO=${AIRFLOW_REPO} \
AIRFLOW_BRANCH=${AIRFLOW_BRANCH} \
AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}${ADDITIONAL_AIRFLOW_EXTRAS:+,}${ADDITIONAL_AIRFLOW_EXTRAS} \
CONSTRAINTS_GITHUB_REPOSITORY=${CONSTRAINTS_GITHUB_REPOSITORY} \
AIRFLOW_CONSTRAINTS_MODE=${AIRFLOW_CONSTRAINTS_MODE} \
AIRFLOW_CONSTRAINTS_REFERENCE=${AIRFLOW_CONSTRAINTS_REFERENCE} \
AIRFLOW_CONSTRAINTS_LOCATION=${AIRFLOW_CONSTRAINTS_LOCATION} \
AIRFLOW_FALLBACK_NO_CONSTRAINTS_INSTALLATION=${AIRFLOW_FALLBACK_NO_CONSTRAINTS_INSTALLATION} \
DEFAULT_CONSTRAINTS_BRANCH=${DEFAULT_CONSTRAINTS_BRANCH} \
PATH=${AIRFLOW_USER_HOME_DIR}/.local/bin:${PATH} \
PIP_PROGRESS_BAR=${PIP_PROGRESS_BAR} \
ADDITIONAL_PIP_INSTALL_FLAGS=${ADDITIONAL_PIP_INSTALL_FLAGS} \
AIRFLOW_HOME=${AIRFLOW_HOME} \
* `chart-utils` have been noved to `helm-tests` from `devel-common` as they were only used there. * for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync * Updated `uv` version to 0.6.10 * We are using `uv sync` to perform "upgrade to newer depencies" in `canary` builds and locally * leveldb has been turned into "dependency group" and removed from apache-airflow and apache-airflow-core extras, it is now only available by google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
AIRFLOW_IMAGE_TYPE=${AIRFLOW_IMAGE_TYPE} \
AIRFLOW_UID=${AIRFLOW_UID} \
INCLUDE_PRE_RELEASE=${INCLUDE_PRE_RELEASE} \
UPGRADE_RANDOM_INDICATOR_STRING=${UPGRADE_RANDOM_INDICATOR_STRING}
Switch from --user to venv for PROD image and enable uv (#37796)

This PR introduces a joint way to treat the .local (--user) folder as both - venv and `--user` package installation. It fixes a number of problems the `--user` installation created for us in the past and does so in a fully backwards-compatible way. This improves both "production" use for the end user as well as local iteration on the PROD image during tests - but also for CI.

Improvements for "end user":

* the user does not have to use `pip install --user` to install new packages any more and it is not enabled by default with the PIP_USER flag.
* users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did)
* users can use `uv` to build a custom production image, which gives a 40%-50% saving in image build time compared to `pip`.
* python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images.

Improvements for development:

* when the image is built from sources (no --use-docker-context-files is specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image.
* when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change.
* By default `uv` is used for the local source build for k8s tests, so even if you have to rebuild it, it is way faster (60%-80%) when iterating with the image.

CI/DEV tooling improvements:

* this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check that the image still builds with `pip`.
* we also switch to a more PEP-standard way of installing packages from the local filesystem (package-name @ file:///FILE)

Fixes: #37785
Fixes: #37815

Update contributing-docs/testing/k8s_tests.rst
Update contributing-docs/testing/k8s_tests.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update docs/docker-stack/build.rst
Update scripts/docker/install_airflow.sh
Update docs/docker-stack/changelog.rst
Update docs/docker-stack/build.rst

Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
# Copy all scripts required for installation - changing any of those should lead to
# rebuilding from here
Simplify caching mechanisms for CI and PROD images (#45266)

For a long time we had used a sophisticated mechanism to speed up our CI jobs by building the images in the "pull_request_target" workflow and pushing them to the GitHub registry. That however had several drawbacks:

* The CI image was complex when it comes to layer setup (we had to pre-cache installed dependencies by installing them from the branch tip)
* The pull_request_target is a very dangerous workflow; we had a number of security problems with it (and it's difficult to debug)
* Caching of `pip` and `uv` was not used because it increased the size of the image significantly

This PR significantly improves the caching mechanisms for image building, thanks to several advancements that were not possible before:

* The upload-artifacts@v4 action and the improved stash action developed by @assignUser and published in "apache/infrastructure-actions" allow us to store all images (8GB per run) in artifacts rather than in the registry - so we can do the image build once and share it with all the jobs.
* The uv speed is "enough" to allow occasional installation of Airflow locally. This allows us to utilize cache mounts and a locally built uv cache, rather than rely on a "remote" cache when we are building local images for breeze. The first time you build the local breeze image it will take 2-5 more minutes (depending on your network speed), but because we can utilise cache mounts, every subsequent build should be very fast - even if all dependencies change. Using uv also allows us to "always" reinstall airflow when you build the image even if a single source file changed, because with the cache it takes sub-seconds to reinstall airflow and all dependencies.
* the cache mounts are not included in the image size, and since we can export and import images in CI in artifacts and we do not need to rebuild them, the images shared as compressed artifacts are relatively small (2GB) - the cache of `uv` is around 4GB on top of that - so sharing the image built in the "build image" job with other jobs in the same workflow is fast.
* we are still using the registry cache for the "non-python" parts of the image - both CI and breeze image build speed benefit from using the image cache for system dependencies, database clients etc. This helps with faster rebuilds of the images for the local development environment
* documentation has been updated to reflect the new CI setup. The diagrams showing our workflows are no longer needed, as the workflows are quite straightforward to read.

Fixes: #42999
Fixes: #43268
2024-12-29 22:58:27 +01:00
COPY --from=scripts common.sh install_packaging_tools.sh create_prod_venv.sh /scripts/docker/
2021-03-23 04:13:17 +01:00
# We can set this value to true in case we want to install .whl/.tar.gz packages placed in the
# docker-context-files folder. This can be done for both additional packages you want to install
Move airflow sources to airflow-core package (#47798)

This is a continuation of the separation of the Airflow codebase into separate distributions. This one splits airflow into two of them:

* apache-airflow - becomes an empty, meta no-code distribution that only has dependencies on the apache-airflow-core and task-sdk distributions, and it has preinstalled provider distributions added in the standard "wheel" distribution. All "extras" lead either to "apache-airflow-core" extras or to providers - the dependencies and optional dependencies are calculated differently depending on "editable" or "standard" mode - in editable mode, just provider dependencies are installed for preinstalled providers; in standard mode, those preinstalled providers are dependencies.
* the apache-airflow-core distribution contains all airflow core sources (previously in apache-airflow) and it has no provider extras. Thanks to that, the apache-airflow distribution does not have any dynamically calculated dependencies.
* the apache-airflow-core distribution has "hatch_build_airflow_core.py" build hooks that add a custom build target and implement custom cleanup in order to implement compiling assets as part of the build.
* During the move, the following changes were applied for consistency:
  * packages, when used in the context of distribution packages, have been renamed to "distributions" - including all documentation and commands in breeze - to avoid confusion with import packages (see https://packaging.python.org/en/latest/discussions/distribution-package-vs-import-package/)
  * all tests in `airflow-core` now follow the same convention where tests are in the `unit`, `system` and `integration` packages. No extra package has been added at the second level, because all the provider tests have "<PROVIDER>" there, so we just have to avoid naming airflow unit."<PROVIDER>" with the same name as a provider.
  * all tooling in CI/DEV has been updated to follow the new structure. We should always build to packages now when we are building them using `breeze`.
2025-03-21 14:25:26 +01:00
# as well as Airflow and provider distributions (it will be automatically detected if airflow
# is installed from docker-context files rather than from PyPI)
ARG INSTALL_DISTRIBUTIONS_FROM_CONTEXT="false"
# Normally constraints are not used when context packages are built - because we might have packages
# that are conflicting with Airflow constraints, however there are cases when we want to use constraints
# for example in CI builds when we already have source-package constraints - either from github branch or
# from eager-upgraded constraints by the CI builds
ARG USE_CONSTRAINTS_FOR_CONTEXT_DISTRIBUTIONS="false"
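# Example (an illustrative sketch, not part of the upstream Dockerfile): assuming the
# .whl/.tar.gz files were already placed in the docker-context-files folder, the two
# args above could be overridden at build time. The image tag below is hypothetical.
#
#   docker build . \
#     --build-arg INSTALL_DISTRIBUTIONS_FROM_CONTEXT="true" \
#     --build-arg USE_CONSTRAINTS_FOR_CONTEXT_DISTRIBUTIONS="true" \
#     --tag my-airflow:context-build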
# In case of Production build image segment we want to pre-install main version of airflow
2020-10-21 22:32:41 +10:00
# dependencies from GitHub so that we do not always have to reinstall them from scratch.
Standardize airflow build process and switch to Hatchling build backend (#36537)

This PR changes the Airflow installation and build backend to use the new standard Python ways of building Python applications. We've been trying to do it for quite a while. Airflow traditionally has been using a complex and convoluted build process based on setuptools and an (extremely) custom setup.py file. It survived the migration to Airflow 2.0 and splitting the Airflow monorepo into Airflow and Providers, adding pre-installed providers and switching providers to use flit (and follow build standards).

So far tooling in the Python ecosystem had not been able to fulfill our needs and we refrained from developing our own tooling, but finally, with the appearance of Hatch (managed by the Python Packaging Authority) and a few recent advancements there, we are able to switch to the Python standard ways of managing project dependency configuration and project build setup (with a few customizations).

This PR makes the airflow build process follow those standard PEPs:

* Airflow has all build configuration stored in pyproject.toml following PEP 518, which allows any frontend (`pip`, `poetry`, `hatch`, `flit`, or whatever other frontend is used) to install the required build dependencies to install Airflow locally and to build distribution packages (sdist/wheel)
* The Hatchling backend follows PEP 517 for the standard source tree and build backend implementation, which allows executing the build in a frontend-independent way
* We store all project metadata in pyproject.toml - following PEP 621 where all necessary project metadata components were defined.
* We plug into Hatchling "editable build" hooks following PEP 660. Hatchling internally builds an editable wheel that is used as an ephemeral step and communication between backend and frontend (and this ephemeral wheel is used to make an editable installation of the project - suitable for fast iteration on code without reinstalling the package).

With Airflow having many provider packages in a single source tree where we want to be able to install and develop airflow and providers together, it is not a small feat to implement the case where editable installation has to behave quite a bit differently when it comes to packaging and dependencies for an editable install (when you want to edit sources directly) and an installable package (where you want to have a separate Airflow package and provider packages). Fortunately the standardisation efforts in the Python Packaging community and the tooling implementing it have finally made it possible.

Some of the important ways this has been achieved:

* We continue using provider.yaml in providers as the single source of truth for per-provider dependencies. We added a possibility to specify "devel-dependencies" in provider.yaml so that all per-provider dependencies in `generated/provider_dependencies.json` and `pyproject.toml` are generated from those dependencies via the update-providers-dependencies pre-commit.
* Pyproject.toml is generally managed manually, but the part where provider dependencies and bundle dependencies are used is automatically updated by a pre-commit whenever provider dependencies change. Those generated provider dependencies contain just dependencies of providers - not the provider packages, but in the final "standard" wheel file they are replaced with "apache-airflow-providers-PROVIDER" dependencies - so that the wheel package will only install the provider and use the dependencies of that version of the provider it installs.
* We are utilising custom hatchling build hooks (PEP 660 standard) that allow modifying the 'standard' wheel package on the fly when the wheel is being prepared, by adding preinstalled package dependencies (which are not needed in the editable build) and by removing all devel extras (that are not needed in the PyPI distributed wheel package). This solves the conundrum of having different "editable" and "standard" behaviour while keeping the same project specification in pyproject.toml.
* We added a description of how `Hatch` can be employed as a build frontend in order to manage the local virtualenv and install Airflow in an editable way easily - while keeping all properties of the installed application (including a working airflow cli and package metadata discovery) - as well as how to use PEP-standard ways of building wheel and sdist packages.
* We have a custom step (following PEP standards) to inject airflow-specific build steps - compiling www assets and generating the git commit hash version to display it in the UI
* We also show how all this makes it easy to manage local virtualenvs and editable installations for Airflow contributors - without vendor lock-in of the build tools, as by following standard PEPs Airflow can be locally and editably installed by anyone using any build frontend tools following the standards - whether you use `pip`, `poetry`, `Hatch`, `flit` or any other frontend build tools, Airflow local installation and package building will work the same way for all of them, where both "editable" and "standard" package preparation is managed by the `hatchling` backend in the same way.
* Previously our extras contained a ".", which is not a normalized name for extras - `pip` and other tools replaced it automatically with `_`. This change updates the extra names to contain '-' rather than '.' in the name, following PEP-685. This should be fully backwards compatible; users will still be able to use "." but it will be normalized to "-" in Airflow packages. This is also future-proof, as it is expected that all package managers and tools will eventually use PEP-685 applied to extras, even if currently some of the tools (pip + setuptools) might generate warnings.
* Additionally, this change organizes the documentation around the extras and dependencies, explaining the reasoning behind all the different extras we have.
* As a bonus (and this is what we used to test it all) we are documenting how to use the Hatch frontend to:
  * manage multiple Python installations
  * manage multiple Python virtualenv environments
  * build Airflow packages for release management
2024-01-10 21:19:02 +01:00
# The Airflow and providers are uninstalled, only dependencies remain
# the cache is only used when "upgrade to newer dependencies" is not set to automatically
# account for removed dependencies (we do not install them in the first place) and in case
# INSTALL_DISTRIBUTIONS_FROM_CONTEXT is not set (because then caching it from main makes no sense).
Switch from --user to venv for PROD image and enable uv (#37796) This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems the `--user` installation created for us in the past and does it in a fully backwards compatible way. This improves both "production" use for end users as well as local iteration on the PROD image during tests - but also for CI. Improvements for "end user": * users do not have to use `pip install --user` to install new packages any more and it is not enabled by default with the PIP_USER flag. * users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did) * users can use `uv` to build a custom production image, which gives a 40%-50% saving in image build time compared to `pip`. * python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images. Improvements for development: * when the image is built from sources (no --use-docker-context-files is specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image. * when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow" which is where the editable installation is done from. You only need to rebuild the image if dependencies change. * By default `uv` is used for local source build for k8s tests so even if you have to rebuild it, it is way faster (60%-80%) when iterating with the image. 
CI/DEV tooling improvements: * this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check if the image still builds with `pip`. * we also switch to more PEP standard way of installing packages from local filesystem (package-name @ file:///FILE) Fixes: #37785 Fixes: #37815 Update contributing-docs/testing/k8s_tests.rst Update contributing-docs/testing/k8s_tests.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update scripts/docker/install_airflow.sh Update docs/docker-stack/changelog.rst Update docs/docker-stack/build.rst Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
# By default PIP installs everything to ~/.local and it's also treated as VIRTUALENV
ENV VIRTUAL_ENV="${AIRFLOW_USER_HOME_DIR}/.local"
Migrated prod builds to use python built from source (#53770) Moved the prod build to also use the src built python as done by CI. This lets us iterate faster on our python versions without needing to wait for the community image of python. This involves a number of changes: * 3.0.5 -> 3.0.6 airflow version change * use released official packages rather than git repo to install Python * added wget (it was added in the original image to pull python packages so added for compatibility) * added those flags to python build (same as in the original build) * --with-ensurepip --build="$gnuArch" * --enable-loadable-sqlite-extension * --enable-option-checking=fatal * --enable-shared * --with-lto * added cleanup of apt after installing packages * added removal of .pyc/.test etc. files (saves 350 MB) * added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image. * we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the Python official image had a very old setuptools version. * checked all "customize" scripts and made them "work" * Updated the "version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere * Updated Changelog and documentation to update new ways of building images * removed installation with GitHub URL (it won't work easily after splitting to multiple packages - not easily at least) and it's not needed * all python, pip and similar links are created in /usr/python/bin * /usr/python/bin is always first in the PATH - before /usr/local/bin * added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local * removal of installed editable distributions in breeze happens now first and THEN we install when --use-airflow-version is used. * LD_LIBRARY_PATH was not set so the shared python libraries could not be loaded when venv was created Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
ENV PATH="/usr/python/bin:$PATH"
Simplify caching mechanisms for CI and PROD images (#45266) For a long time we had used a sophisticated mechanism to speed up our CI jobs by building the images in "pull_request_target" workflow and pushing them to GitHub registry. That however had several drawbacks: * CI image was complex when it comes to layer setup (we had to pre-cache installed dependencies by installing them from branch tip) * The pull_request_target is a very dangerous workflow, we had a number of security problems with it (and it's difficult to debug) * Caching of `pip` and `uv` was not used because it increased size of the image significantly This PR significantly improves the caching mechanisms for building the images, thanks to several advancements that were not possible before: * The upload-artifacts@v4 action and improved stash action developed by @assignUser and published in "apache/infrastructure-actions" allows us to store all images (8GB per run) in artifacts rather than in registry - so we can do the image build once and share it with all the jobs. * The uv speed is "enough" to allow occasional installation of Airflow locally. This allows us to utilize cache-mount and locally build uv cache, rather than rely on "remote" cache when we are building local images for breeze. The first time you build local breeze image it will take 2-5 more minutes (depending on your network speed), but because we can utilise cache mounts, every subsequent build should be very fast - even if all dependencies change. Using uv also allows us to "always" reinstall airflow when you build the image even if a single source file changed, because with cache it takes sub-seconds to reinstall airflow and all dependencies. 
* the cache mounts are not included in the image size, and since we can export and import images in CI in artifacts and we do not need to rebuild them, the images shared as compressed artifacts are relatively small (2GB) - cache of `uv` is around 4GB on top of that so sharing image built in the "build image" job with other jobs in the same workflow is fast. * we are still using registry cache for the "non-python" parts of the image - both CI and breeze image build speed benefit from using the image cache for system dependencies, database clients etc. This helps with faster rebuilds of the images for local development environment * documentation has been updated to reflect the new CI setup. The diagrams showing our workflows are no longer needed as the workflows are quite straightforward to read. Fixes: #42999 Fixes: #43268
2024-12-29 22:58:27 +01:00
RUN bash /scripts/docker/install_packaging_tools.sh; bash /scripts/docker/create_prod_venv.sh
COPY --chown=airflow:0 ${AIRFLOW_SOURCES_FROM} ${AIRFLOW_SOURCES_TO}
# Add extra python dependencies
ARG ADDITIONAL_PYTHON_DEPS=""
ARG VERSION_SUFFIX=""
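# Illustrative usage (a sketch, not part of the build): assuming this Dockerfile is built
# directly with `docker build` and BuildKit enabled, the ADDITIONAL_PYTHON_DEPS arg above can be
# set on the command line. The package pin and the image tag are hypothetical placeholders:
#
#   docker build . --build-arg ADDITIONAL_PYTHON_DEPS="example-package==1.0" --tag my-airflow:custom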
ENV ADDITIONAL_PYTHON_DEPS=${ADDITIONAL_PYTHON_DEPS} \
INSTALL_DISTRIBUTIONS_FROM_CONTEXT=${INSTALL_DISTRIBUTIONS_FROM_CONTEXT} \
USE_CONSTRAINTS_FOR_CONTEXT_DISTRIBUTIONS=${USE_CONSTRAINTS_FOR_CONTEXT_DISTRIBUTIONS} \
VERSION_SUFFIX=${VERSION_SUFFIX}
2022-06-12 22:59:48 +12:00
WORKDIR ${AIRFLOW_HOME}
Simplify tooling by switching completely to uv (#48223) The lazy consensus decision has been made at the devlist to switch entirely to `uv` as development tool: link: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256 This PR implements that decision and removes a lot of baggage connected to using `pip` additionally to uv to install and sync the environment. It also introduces more consistency in the way distribution packages are used in airflow sources - basically switching all internal distributions to use the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to local virtualenv environment. Unfortunately, such a change cannot be done incrementally, really, because any change in the project layout brings with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move. This PR is "safe" in terms of the airflow and provider's code - it does not **really** (except occasional imports and type hint changes resulting from better isolation of packages) change Airflow code nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider's code is modified. It does move the test code - in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package. 
What is still NOT done after that move and will be covered in the follow-up changes: * isolating docs-building to have separate configuration for docs building per distribution - allowing to run doc build locally with its own conf.py file * moving some of the tests and checks out from breeze container image up to the local environment (for example mypy checks) and likely isolating them per-provider * Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to `uv.lock` mechanism. * potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up. * PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`. This is the detailed list of changes implemented in this PR: * uv is now mandatory to install as pre-requisite in order to develop airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together. * removed meta-package `hatch_build.py` and replacing it with pre-commit automatically modifying declarative pyproject.toml * stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (and renaming the file to `hatch_build.py`) and moving all airflow dependencies to `pyproject.toml` * converted "loose" packages in airflow repo into distributions: * docker-tests * kubernetes-tests * helm-tests * dev (here we do not have `src` subfolder - sources are directly in the distribution, which is for-now inconsistent with other distributions). 
The names of the `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports are always referring to base of each distribution and are not used from the content root. * Each of the distributions (on top of the already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together allowing to install them all together with a simple `uv sync --all-packages` command and come up with consistent set of dependencies that are good for all those packages (yay!). This is used to build CI image with single common environment to run the tests (with some quirks due to constraints use where we have to manually list all distributions until we switch to `uv.lock` mechanism) * `doc` code is moved to `devel-common` distribution. The `doc` folder only keeps README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in `generated/generated-docs/` folder which is entirely .gitignored. * the documentation is now fully moved to: * `airflow-core/docs` - documentation for Airflow Core * `providers/**/docs` - documentation for Providers * `chart/docs` - documentation for Helm Chart * `task-sdk/docs` - documentation for Task SDK (new format not yet published) * `docker-stack-docs` - documentation for Docker Stack * `providers-summary-docs` - documentation for provider summary page * `versions` are not dynamically retrieved from `__init__.py`; all of them are synchronized directly to pyproject.toml files - this way - except the custom build hook - we have no dynamic components in our `pyproject.toml` properties. * references to extras were removed from INSTALL and other places, the only references to extras remain in the user documentation - we stop using extras for local development, we switch to using dependency groups. 
* backtracking command was removed from breeze - we did not need it since we started using `uv` * internal commands (except constraint generation) have been moved to `uv` from `pip` * breeze requires `uv` to be installed and expects to be installed by `uv tool install -e ./dev/breeze` * pyproject.tomls are dynamically modified when we add a version suffix dynamically (`--version-suffix-for-pypi`) - only for the time of building the versions with updated suffix * `mypy` checks are now consistently used across all the different distributions and for consistency (and to fix some of the issues with namespace packages) rather than using "folder" approach when running mypy checks, even if we run mypy for whole distribution, we run check on individual files rather than on a folder. That adds consistency in execution of mypy heuristics. Rather than using in-container mypy script all the logic of selection and parameters passed to mypy are in pre-commit code. For now we are still using CI image to run mypy because mypy is very sensitive to version of dependencies installed, we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows. * lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should be generally kept consistently for the future * the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` which will make it easier to install them for some OS-es (basic is used as default set of dependencies to cover most common set of development dependencies to be used for development) * generated/provider_dependencies.json are not committed to the repository any longer. They are .gitignored and generated on the fly as needed (breeze will generate them automatically when empty and pre-commit will always regenerate them to be consistent with provider's pyproject.toml files). 
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there. * for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync * Updated `uv` version to 0.6.10 * We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally * leveldb has been turned into "dependency group" and removed from apache-airflow and apache-airflow-core extras, it is now only available by google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
COPY --from=scripts install_from_docker_context_files.sh install_airflow_when_building_images.sh \
install_additional_dependencies.sh create_prod_venv.sh get_distribution_specs.py /scripts/docker/
# Useful for creating a cache id based on the underlying architecture, preventing the use of cached python packages from
# an incorrect architecture.
ARG TARGETARCH
# Value to be able to easily change cache id and therefore use a bare new cache
ARG DEPENDENCY_CACHE_EPOCH="11"
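# Illustrative usage (a sketch, not part of the build): assuming this Dockerfile is built
# directly with `docker build` and BuildKit enabled, passing a different value changes the cache
# mount id (prod-$TARGETARCH-$DEPENDENCY_CACHE_EPOCH) used by the RUN instruction below, so the
# build starts from an empty package cache. The value "12" is an arbitrary example:
#
#   docker build . --build-arg DEPENDENCY_CACHE_EPOCH="12"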
# hadolint ignore=SC2086, SC2010, DL3042
RUN --mount=type=cache,id=prod-$TARGETARCH-$DEPENDENCY_CACHE_EPOCH,target=/tmp/.cache/,uid=${AIRFLOW_UID} \
if [[ ${INSTALL_DISTRIBUTIONS_FROM_CONTEXT} == "true" ]]; then \
bash /scripts/docker/install_from_docker_context_files.sh; \
fi; \
if ! airflow version 2>/dev/null >/dev/null; then \
bash /scripts/docker/install_airflow_when_building_images.sh; \
fi; \
if [[ -n "${ADDITIONAL_PYTHON_DEPS}" ]]; then \
bash /scripts/docker/install_additional_dependencies.sh; \
fi; \
find "${AIRFLOW_USER_HOME_DIR}/.local/" -name '*.pyc' -print0 | xargs -0 rm -f || true ; \
find "${AIRFLOW_USER_HOME_DIR}/.local/" -type d -name '__pycache__' -print0 | xargs -0 rm -rf || true ; \
# make sure that all directories and files in .local are also group accessible
Switch from --user to venv for PROD image and enable uv (#37796)
This PR introduces a joint way to treat the .local (--user) folder as both a venv and a `--user` package installation. It fixes a number of problems the `--user` installation caused us in the past and does it in a fully backwards-compatible way. This improves both "production" use for end users and local iteration on the PROD image during tests - but also CI.
Improvements for "end user":
* users do not have to use `pip install --user` to install new packages any more, and it is not enabled by default with the PIP_USER flag.
* users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did)
* users can use `uv` to build a custom production image, which gives a 40%-50% saving in image build time compared to `pip`.
* python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images.
Improvements for development:
* when the image is built from sources (no --use-docker-context-files are specified), airflow is installed in --editable mode, which means that airflow and all providers are installed locally from airflow sources, not from packages - so both airflow and providers have the latest version inside the prod image.
* when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow", which is where the editable installation is done from. You only need to rebuild the image if dependencies change.
* By default `uv` is used for the local source build for k8s tests, so even if you have to rebuild the image, it is way faster (60%-80%) while iterating with it.
CI/DEV tooling improvements:
* this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check that the image still builds with `pip`.
* we also switch to a more PEP-standard way of installing packages from the local filesystem (package-name @ file:///FILE)
Fixes: #37785 Fixes: #37815
Update contributing-docs/testing/k8s_tests.rst Update contributing-docs/testing/k8s_tests.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update scripts/docker/install_airflow.sh Update docs/docker-stack/changelog.rst Update docs/docker-stack/build.rst
Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
find "${AIRFLOW_USER_HOME_DIR}/.local" -executable ! -type l -print0 | xargs --null chmod g+x; \
find "${AIRFLOW_USER_HOME_DIR}/.local" ! -type l -print0 | xargs --null chmod g+rw
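The two `find` calls above can be tried outside the image on a throwaway directory. This is only a sketch: the function name `make_group_accessible` is hypothetical, and it assumes GNU findutils (for `-executable` and `xargs --null`), as the Dockerfile itself does.

```bash
# Hypothetical helper mirroring the two `find` calls from the Dockerfile.
# Assumes GNU find and GNU xargs.
make_group_accessible() {
    local target_dir="$1"
    # Anything the current user can execute or search (directories included)
    # becomes group-executable; symlinks are skipped.
    find "${target_dir}" -executable ! -type l -print0 | xargs --null chmod g+x
    # Everything except symlinks becomes group-readable and group-writable.
    find "${target_dir}" ! -type l -print0 | xargs --null chmod g+rw
}
```

Calling `make_group_accessible /some/scratch/dir` leaves non-executable files without the group execute bit, because the first `find` only selects entries that are already executable.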
# In case there is a requirements.txt file in "docker-context-files" it will be installed
# during the build additionally to whatever has been installed so far. It is recommended that
# the requirements.txt contains only dependencies with == version specification
# hadolint ignore=DL3042
Simplify caching mechanisms for CI and PROD images (#45266)
For a long time we had used a sophisticated mechanism to speed up our CI jobs by building the images in the "pull_request_target" workflow and pushing them to the GitHub registry. That, however, had several drawbacks:
* The CI image was complex when it comes to layer setup (we had to pre-cache installed dependencies by installing them from the branch tip)
* The pull_request_target is a very dangerous workflow; we had a number of security problems with it (and it's difficult to debug)
* Caching of `pip` and `uv` was not used because it increased the size of the image significantly
This PR significantly improves the caching mechanisms for the images, building on several advancements that were not possible before:
* The upload-artifacts@v4 action and the improved stash action developed by @assignUser and published in "apache/infrastructure-actions" allow us to store all images (8GB per run) in artifacts rather than in the registry - so we can do the image build once and share it with all the jobs.
* The uv speed is "enough" to allow occasional installation of Airflow locally. This allows us to utilize cache mounts and a locally built uv cache, rather than rely on a "remote" cache when we are building local images for breeze. The first time you build a local breeze image it will take 2-5 more minutes (depending on your network speed), but because we can utilise cache mounts, every subsequent build should be very fast - even if all dependencies change. Using uv also allows us to "always" reinstall airflow when you build the image, even if a single source file changed, because with the cache it takes sub-seconds to reinstall airflow and all dependencies.
* the cache mounts are not included in the image size, and since we can export and import images in CI in artifacts and do not need to rebuild them, the images shared as compressed artifacts are relatively small (2GB) - the cache of `uv` is around 4GB on top of that - so sharing the image built in the "build image" job with other jobs in the same workflow is fast.
* we are still using the registry cache for the "non-python" parts of the image - both CI and breeze image build speed benefit from using the image cache for system dependencies, database clients etc. This helps with faster rebuilds of the images for the local development environment
* documentation has been updated to reflect the new CI setup. The diagrams showing our workflows are no longer needed, as the workflows are quite straightforward to read.
Fixes: #42999 Fixes: #43268
2024-12-29 22:58:27 +01:00
RUN --mount=type=cache,id=prod-$TARGETARCH-$DEPENDENCY_CACHE_EPOCH,target=/tmp/.cache/,uid=${AIRFLOW_UID} \
if [[ -f /docker-context-files/requirements.txt ]]; then \
pip install -r /docker-context-files/requirements.txt; \
fi
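The comment before this step recommends that `requirements.txt` contain only `==` pins. A rough pre-build check could look like the sketch below; the function name `find_unpinned_requirements` is hypothetical and the match is a simple heuristic (any non-comment line without `==` is reported, including option lines such as `-r other.txt`), not how pip parses requirement files.

```bash
# Hypothetical helper: print every requirement line that has no "==" pin.
# Blank lines and comment lines are skipped.
# Returns 1 if at least one such line was found, 0 otherwise.
find_unpinned_requirements() {
    local requirements_file="$1"
    local comment_re='^[[:space:]]*#'
    local line
    local found=0
    while IFS= read -r line || [[ -n "${line}" ]]; do
        # Skip lines that are empty or whitespace-only.
        if [[ -z "${line//[[:space:]]/}" ]]; then
            continue
        fi
        # Skip comment lines.
        if [[ "${line}" =~ ${comment_re} ]]; then
            continue
        fi
        if [[ "${line}" != *"=="* ]]; then
            echo "${line}"
            found=1
        fi
    done < "${requirements_file}"
    return "${found}"
}
```

Running `find_unpinned_requirements docker-context-files/requirements.txt` before the build lists the lines worth reviewing; an empty output with exit status 0 means every requirement line carries a `==` specifier.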
##############################################################################################
# This is the actual Airflow image - much smaller than the build one. We copy
# installed Airflow and all its dependencies from the build image to make it smaller.
##############################################################################################
Migrated prod builds to use python built from source (#53770)
Moved the prod build to also use the source-built Python, as done by CI. This lets us iterate faster on our Python versions without needing to wait for the community image of Python. This involves a number of changes:
* 3.0.5 -> 3.0.6 airflow version change
* use released official packages rather than the git repo to install Python
* added wget (it was present in the original image to pull python packages, so it is added for compatibility)
* added those flags to the python build (same as in the original build)
  * --with-ensurepip --build="$gnuArch"
  * --enable-loadable-sqlite-extension
  * --enable-option-checking=fatal
  * --enable-shared
  * --with-lto
* added cleanup of apt after installing packages
* added removal of .pyc/.test etc. files (saves 350 MB)
* added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image.
* we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the official Python image had a very old setuptools version.
* checked all "customize" scripts and made them "work"
* Updated the "version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere
* Updated the Changelog and documentation to describe the new ways of building images
* removed installation with a GitHub URL (it won't work easily after splitting to multiple packages) and it's not needed
* all python, pip and similar links are created in /usr/python/bin
* /usr/python/bin is always first in the PATH - before /usr/local/bin
* added a changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local
* removal of installed editable distributions in breeze now happens first and THEN we install when --use-airflow-version is used.
* LD_LIBRARY_PATH was not set, so the shared python libraries could not be loaded when a venv was created
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
FROM ${BASE_IMAGE} as main
# Nolog bash flag is currently ignored - but you can replace it with other flags (for example
# xtrace - to show commands executed)
SHELL ["/bin/bash", "-o", "pipefail", "-o", "errexit", "-o", "nounset", "-o", "nolog", "-c"]
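The effect of the `pipefail` flag set in the `SHELL` instruction above can be checked with plain bash. The helper name `pipeline_status` is hypothetical; it only wraps `bash -c 'false | true'` so the exit status can be compared with and without the flag.

```bash
# Hypothetical helper: run the pipeline `false | true` in a fresh bash started
# with the given options and print the pipeline's exit status.
pipeline_status() {
    local status=0
    bash "$@" -c 'false | true' || status=$?
    echo "${status}"
}

pipeline_status              # default: status of the last command in the pipeline
pipeline_status -o pipefail  # pipefail: a failure anywhere in the pipeline is reported
```

Without `pipefail` the failing `false` is masked by the trailing `true`; with it, the RUN step would fail. Replacing `nolog` with `xtrace` in the same way, as the comment suggests, would print each command as it runs.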
ARG AIRFLOW_UID
LABEL org.apache.airflow.distro="debian" \
org.apache.airflow.module="airflow" \
org.apache.airflow.component="airflow" \
org.apache.airflow.image="airflow" \
org.apache.airflow.uid="${AIRFLOW_UID}"
ARG BASE_IMAGE
# Make sure noninteractive debian install is used and language variables set
ENV BASE_IMAGE=${BASE_IMAGE} \
DEBIAN_FRONTEND=noninteractive LANGUAGE=C.UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8 \
LC_CTYPE=C.UTF-8 LC_MESSAGES=C.UTF-8 \
PIP_CACHE_DIR=/tmp/.cache/pip \
UV_CACHE_DIR=/tmp/.cache/uv
ARG RUNTIME_APT_DEPS=""
More customizable build process for Docker images (#11176)
* Allows more customizations for image building.
This is the third (and not last) part of making the Production image more corporate-environment friendly. It has been prepared at the request of one of the big Airflow users (a company) that has rather strict security requirements when it comes to preparing and building images. They are committed to synchronizing with the progress of Apache Airflow 2.0 development, and making the image customizable so that they can build it using only sources controlled by them internally was one of the important requirements for them.
This change adds the possibility of customizing various steps in the build process:
* adding custom scripts to be run before installation of both the build image and the runtime image. This allows, for example, installing custom GPG keys and adding custom sources.
* customizing the way NodeJS and Yarn are installed in the build image segment - as they might rely on their own way of installation.
* adding extra packages to be installed during both build and dev segment build steps. This is crucial to achieve the same size optimizations as the original image.
* defining additional environment variables (for example environment variables that indicate acceptance of the EULAs in case of installing proprietary packages that require EULA acceptance) - both in the build image and the runtime image (again the goal is to keep the image optimized for size)
The image build process remains the same when no customization options are specified, but having those options increases the flexibility of the image build process in corporate environments. This is part of #11171. This change also fixes some of the issues opened and raised by other users of the Dockerfile.
Fixes: #10730 Fixes: #10555 Fixes: #10856
Input from those issues has been taken into account when this change was designed, so that the cases described in those issues could be implemented. An example from one of the issues landed as an example way of building a highly customized Airflow image using those customization options.
Depends on #11174
* Update IMAGES.rst
Co-authored-by: Kamil Breguła <mik-laj@users.noreply.github.com>
2020-09-29 15:30:00 +02:00
ARG ADDITIONAL_RUNTIME_APT_DEPS=""
ARG RUNTIME_APT_COMMAND="echo"
ARG ADDITIONAL_RUNTIME_APT_COMMAND=""
ARG ADDITIONAL_RUNTIME_APT_ENV=""
ARG INSTALL_MYSQL_CLIENT="true"
ARG INSTALL_MYSQL_CLIENT_TYPE="mariadb"
ARG INSTALL_MSSQL_CLIENT="true"
ARG INSTALL_POSTGRES_CLIENT="true"
ARG AIRFLOW_INSTALLATION_METHOD="apache-airflow"
ENV RUNTIME_APT_DEPS=${RUNTIME_APT_DEPS} \
ADDITIONAL_RUNTIME_APT_DEPS=${ADDITIONAL_RUNTIME_APT_DEPS} \
RUNTIME_APT_COMMAND=${RUNTIME_APT_COMMAND} \
ADDITIONAL_RUNTIME_APT_COMMAND=${ADDITIONAL_RUNTIME_APT_COMMAND} \
INSTALL_MYSQL_CLIENT=${INSTALL_MYSQL_CLIENT} \
INSTALL_MYSQL_CLIENT_TYPE=${INSTALL_MYSQL_CLIENT_TYPE} \
INSTALL_MSSQL_CLIENT=${INSTALL_MSSQL_CLIENT} \
INSTALL_POSTGRES_CLIENT=${INSTALL_POSTGRES_CLIENT} \
GUNICORN_CMD_ARGS="--worker-tmp-dir /dev/shm" \
AIRFLOW_INSTALLATION_METHOD=${AIRFLOW_INSTALLATION_METHOD}
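Each `ARG` above has a default and is copied into `ENV`, so it can be overridden with docker's `--build-arg` flag at build time. The sketch below only prints such a command instead of running it; the function name `print_build_command` and the values used in the usage line are illustrative assumptions, and values containing spaces would need extra quoting.

```bash
# Hypothetical helper: print (not run) a `docker build` command that overrides
# runtime ARGs of the "main" stage. Each argument is a KEY=VALUE pair.
print_build_command() {
    local cmd=(docker build .)
    local pair
    for pair in "$@"; do
        cmd+=(--build-arg "${pair}")
    done
    # Join the array elements with single spaces for display.
    printf '%s\n' "${cmd[*]}"
}

print_build_command INSTALL_MSSQL_CLIENT=false ADDITIONAL_RUNTIME_APT_DEPS=vim
```

Printing the command first makes it easy to review which of the defaults are being changed before an actual build is started.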
ARG PYTHON_LTO
Migrated prod builds to use python built from source (#53770) Moved the prod build to also use the source-built python as done by CI. This lets us iterate faster on our python versions without needing to wait for the community image of python. This involves a number of changes: * 3.0.5 -> 3.0.6 airflow version change * use released official packages rather than git repo to install Python * added wget (it was added in the original image to pull python packages so added for compatibility) * added those flags to python build (same as in the original build) * --with-ensurepip --build="$gnuArch" * --enable-loadable-sqlite-extension * --enable-option-checking=fatal * --enable-shared * --with-lto * added cleanup of apt after installing packages * added removal of .pyc/.test etc. files (saves 350 MB) * added relinking of symbolic links from /usr/python/bin to /usr/local/bin in the "main" image as well as in the build image. * we do not need AIRFLOW_SETUPTOOLS_VERSION any more - this was only added to upgrade setuptools, because the Python official image had a very old setuptools version. * checked all "customize" scripts and made them "work" * Updated the "version upgrade" script to upgrade AIRFLOW_PYTHON_VERSION everywhere * Updated Changelog and documentation to reflect new ways of building images * removed installation with GitHub URL (it won't work easily after splitting to multiple packages - not easily at least) and it's not needed * all python, pip and similar links are created in /usr/python/bin * /usr/python/bin is always first in the PATH - before /usr/local/bin * added changelog entry explaining that Python's installation home has been moved to /usr/python/ from /usr/local * removal of installed editable distributions in breeze happens now first and THEN we install when --use-airflow-version is used. * LD_LIBRARY_PATH was not set so the shared python libraries could not be loaded when venv was created Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
2025-08-30 20:18:30 +05:30
COPY --from=airflow-build-image "/usr/python/" "/usr/python/"
COPY --from=scripts install_os_dependencies.sh /scripts/docker/
RUN bash /scripts/docker/install_os_dependencies.sh runtime
# Having the variable in final image allows to disable providers manager warnings when
# production image is prepared from sources rather than from package
ARG AIRFLOW_IMAGE_REPOSITORY
ARG AIRFLOW_IMAGE_README_URL
ARG AIRFLOW_USER_HOME_DIR
ARG AIRFLOW_HOME
Simplify tooling by switching completely to uv (#48223) The lazy consensus decision has been made at the devlist to switch entirely to `uv` as development tool: link: https://lists.apache.org/thread/6xxdon9lmjx3xh8zw09xc5k9jxb2n256 This PR implements that decision and removes a lot of baggage connected to using `pip` in addition to `uv` to install and sync the environment. It also introduces more consistency in the way distribution packages are used in airflow sources - basically switching all internal distributions to use the `pyproject.toml` approach and linking them all together via `uv`'s workspace feature. This enables much more streamlined development workflows, where any part of airflow development is manageable using `uv sync` in the right distribution - opening the way to moving more of the "sub-workflows" from the CI image to local virtualenv environment. Unfortunately, such a change cannot really be done incrementally, because any change in the project layout drags with it a lot of changes in the test/CI/management scripts, so we have to implement one big PR covering the move. This PR is "safe" in terms of the airflow and provider's code - it does not **really** (except occasional imports and type hint changes resulting from better isolation of packages) change Airflow code, nor should it affect any airflow or provider code, because it does not move any of the folders where airflow or provider's code is modified. It does move the test code - in a number of "auxiliary" distributions we have. It also moves the `docs` generation code to `devel-common` and introduces separate conf.py files for every doc package.
What is still NOT done after that move and will be covered in the follow-up changes: * isolating docs-building to have separate configuration for docs building per distribution - allowing to run doc build locally with its own conf.py file * moving some of the tests and checks out from breeze container image up to the local environment (for example mypy checks) and likely isolating them per-provider * Constraints are still generated using `pip freeze` and automatically managed by our custom scripts in `canary` builds - this will be replaced later by switching to `uv.lock` mechanism. * potentially, we could merge `devel-common` and `dev` - to be considered as a follow-up. * PROD image is still built with `pip` by default when using `PyPI` or distribution packages - but we do not support building the source image with `pip` - when building from sources, uv is forced internally to install packages. Currently we have no plans to change default PROD building to use `uv`. This is the detailed list of changes implemented in this PR: * uv is now mandatory to install as pre-requisite in order to develop airflow. We do not support installing airflow for development with `pip` - there will be a lot of cases where it will not work for development - including development dependencies and installing several distributions together. * removed meta-package `hatch_build.py` and replaced it with pre-commit automatically modifying declarative pyproject.toml * stripped down `hatch_build_airflow_core.py` to only cover custom git and asset build hooks (and renaming the file to `hatch_build.py`) and moving all airflow dependencies to `pyproject.toml` * converted "loose" packages in airflow repo into distributions: * docker-tests * kubernetes-tests * helm-tests * dev (here we do not have `src` subfolder - sources are directly in the distribution, which is for-now inconsistent with other distributions).
The names of the `_tests` distribution folders have been renamed to the `-tests` convention to make sure the imports are always referring to base of each distribution and are not used from the content root. * Each of the distributions (on top of already existing airflow-core, task-sdk, devel-common and 90+ providers) has its own set of dependencies, and the top-level meta-package workspace root brings those distributions together allowing to install them all together with a simple `uv sync --all-packages` command and come up with consistent set of dependencies that are good for all those packages (yay!). This is used to build CI image with single common environment to run the tests (with some quirks due to constraints use where we have to manually list all distributions until we switch to `uv.lock` mechanism) * `doc` code is moved to `devel-common` distribution. The `doc` folder only keeps README informing where the other doc code is, the spelling_wordlist.txt and start_docs_server.sh. The documentation is generated in `generated/generated-docs/` folder which is entirely .gitignored. * the documentation is now fully moved to: * `airflow-core/docs` - documentation for Airflow Core * `providers/**/docs` - documentation for Providers * `chart/docs` - documentation for Helm Chart * `task-sdk/docs` - documentation for Task SDK (new format not yet published) * `docker-stack-docs` - documentation for Docker Stack * `providers-summary-docs` - documentation for provider summary page * `versions` are not dynamically retrieved from `__init__.py` all of them are synchronized directly to pyproject.toml files - this way - except the custom build hook - we have no dynamic components in our `pyproject.toml` properties. * references to extras were removed from INSTALL and other places, the only references to extras remain in the user documentation - we stop using extras for local development, we switch to using dependency groups.
* backtracking command was removed from breeze - we did not need it since we started using `uv` * internal commands (except constraint generation) have been moved to `uv` from `pip` * breeze requires `uv` to be installed and expects to be installed with `uv tool install -e ./dev/breeze` * pyproject.tomls are dynamically modified when we add a version suffix dynamically (`--version-suffix-for-pypi`) - only for the time of building the versions with updated suffix * `mypy` checks are now consistently used across all the different distributions and for consistency (and to fix some of the issues with namespace packages) rather than using "folder" approach when running mypy checks, even if we run mypy for whole distribution, we run check on individual files rather than on a folder. That adds consistency in execution of mypy heuristics. Rather than using in-container mypy script all the logic of selection and parameters passed to mypy are in pre-commit code. For now we are still using CI image to run mypy because mypy is very sensitive to version of dependencies installed, we should be able to switch to running mypy locally once we have the `uv.lock` mechanism incorporated in our workflows. * lower bounds for dependencies have been set consistently across all the distributions. With `uv sync` and dependabot, those should be generally kept consistent for the future * the `devel-common` dependencies have been grouped together in `devel-common` extras - including `basic`, `doc`, `doc-gen`, and `all` which will make it easier to install them for some OS-es (basic is used as default set of dependencies to cover most common set of development dependencies to be used for development) * generated/provider_dependencies.json are not committed to the repository any longer. They are .gitignored and generated on the fly as needed (breeze will generate them automatically when empty and pre-commit will always regenerate them to be consistent with provider's pyproject.toml files).
* `chart-utils` have been moved to `helm-tests` from `devel-common` as they were only used there. * for k8s tests we are using the `uv` main `.venv` environment rather than creating our own `.build` environment and we use `uv sync` to keep it in sync * Updated `uv` version to 0.6.10 * We are using `uv sync` to perform "upgrade to newer dependencies" in `canary` builds and locally * leveldb has been turned into "dependency group" and removed from apache-airflow and apache-airflow-core extras, it is now only available by google provider's leveldb optional extra to install with `pip`
2025-04-02 13:11:13 +02:00
ARG AIRFLOW_IMAGE_TYPE
# By default PIP installs everything to ~/.local
ENV PATH="${AIRFLOW_USER_HOME_DIR}/.local/bin:/usr/python/bin:${PATH}" \
Switch from --user to venv for PROD image and enable uv (#37796) This PR introduces a joint way to treat the .local (--user) folder as both - venv and `--user` package installation. It fixes a number of problems the `--user` installation created for us in the past and does it in a fully backwards compatible way. This improves both "production" use for end user as well as local iteration on the PROD image during tests - but also for CI. Improvements for "end user": * user does not have to use `pip install --user` to install new packages any more and it is not enabled by default with PIP_USER flag. * users can use uv to install packages when they extend the image (but it's not obligatory - pip continues working as it did) * users can use `uv` to build custom production image, which gives 40%-50% saving for image build time compared to `pip`. * python -m venv --system-site-packages continues to use the .local packages from the .local installation (and does not use them if --system-site-packages is not used) - so we have full compatibility with previous images. Improvements for development: * when image is built from sources (no --use-docker-context-files is specified), airflow is installed in --editable mode, which means that airflow + all providers are installed locally from airflow sources, not from packages - which means that both airflow and providers have the latest version inside the prod image. * when local sources change and you want to run k8s tests locally, it is now WAY faster (several minutes) to iterate with your changes because you do not have to rebuild the base image - the only thing needed is to copy sources to the PROD image to "/opt/airflow" which is where editable installation is done from. You only need to rebuild the image if dependencies change. * By default `uv` is used for local source build for k8s tests so even if you have to rebuild it, it is way faster (60%-80%) during iterating with the image.
CI/DEV tooling improvements: * this PR switches to use `uv` by default for most prod images we build in CI, but it adds a check if the image still builds with `pip`. * we also switch to more PEP standard way of installing packages from local filesystem (package-name @ file:///FILE) Fixes: #37785 Fixes: #37815 Update contributing-docs/testing/k8s_tests.rst Update contributing-docs/testing/k8s_tests.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update docs/docker-stack/build.rst Update scripts/docker/install_airflow.sh Update docs/docker-stack/changelog.rst Update docs/docker-stack/build.rst Co-authored-by: Niko Oliveira <onikolas@amazon.com>
2024-03-06 01:27:15 +01:00
VIRTUAL_ENV="${AIRFLOW_USER_HOME_DIR}/.local" \
AIRFLOW_UID=${AIRFLOW_UID} \
AIRFLOW_USER_HOME_DIR=${AIRFLOW_USER_HOME_DIR} \
AIRFLOW_HOME=${AIRFLOW_HOME} \
AIRFLOW_IMAGE_TYPE=${AIRFLOW_IMAGE_TYPE}
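The ENV instruction above places `${AIRFLOW_USER_HOME_DIR}/.local/bin` first on `PATH`, followed by `/usr/python/bin`, ahead of everything else. A minimal shell sketch of what that ordering means for command lookup is shown here; the temporary directories and the `demo-python` command name are hypothetical stand-ins and are not part of the image.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directories standing in for ~/.local/bin and /usr/python/bin (illustrative only).
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/local_bin" "${demo_root}/python_bin"

# The same hypothetical command exists in both directories.
printf '#!/bin/sh\necho "from .local/bin"\n' > "${demo_root}/local_bin/demo-python"
printf '#!/bin/sh\necho "from /usr/python/bin"\n' > "${demo_root}/python_bin/demo-python"
chmod +x "${demo_root}/local_bin/demo-python" "${demo_root}/python_bin/demo-python"

# Mirror the ordering of the ENV PATH line: venv bin first, then the Python home.
# The assignment happens inside the command substitution, so the outer PATH is untouched.
resolved="$(
  PATH="${demo_root}/local_bin:${demo_root}/python_bin:${PATH}"
  demo-python
)"
echo "${resolved}"

rm -rf "${demo_root}"
```

The first directory on `PATH` that contains the command wins, so anything installed into the `.local` venv shadows the same-named tool from the Python home.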
Fix ARM image building after Cython 3.0.0 release (#32748) Work around an issue with installing pymssql on ARM architecture triggered by Cython 3.0.0 release as of 18 July 2023. The problem is that pip uses latest Cython to compile pymssql and since we are using setuptools, there is no easy way to fix version of Cython used to compile packages. This triggers a problem with newer `pip` versions that have build isolation enabled by default because there is no (easy) way to pin build dependencies for dependent packages. If a package does not have limit on build dependencies, it will use the latest version of them to build that particular package. The workaround to the problem suggested in the last thread by Pradyun Gedam - pip maintainer - is to use PIP_CONSTRAINT environment variable and constrain the version of Cython used while installing the package. Which is precisely what we are doing here. Note that it does not work if we pass ``--constraint`` option to pip because it will not be passed to the package being built in isolation. The fact that the PIP_CONSTRAINT env variable works in the isolation is a bit of a side effect of how env variables work and that they are passed to subprocesses as pip launches a subprocess `pip` to build the package. This is a temporary solution until the issue is resolved in pymssql or Cython. Issues/discussions that track it: * https://github.com/cython/cython/issues/5541 * https://github.com/pymssql/pymssql/pull/827 * https://discuss.python.org/t/no-way-to-pin-build-dependencies/29833 Since we have to change Dockerfile around installing `pip`, also version of `pip` has been upgraded to latest - 23.2
2023-07-21 19:27:51 +02:00
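The commit message above relies on `PIP_CONSTRAINT` being inherited by the subprocess that pip launches for an isolated build. The sketch below illustrates only that inheritance mechanism; the constraints file location and the `Cython<3` bound are assumptions made for illustration, and a plain child shell stands in for pip's build subprocess.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical constraints file; the exact Cython bound is an assumption for illustration.
constraint_file="$(mktemp)"
echo "Cython<3" > "${constraint_file}"

# Exported variables are inherited by child processes. Per the commit message, this is
# why the isolated build sees the constraint while a --constraint option is not passed on.
export PIP_CONSTRAINT="${constraint_file}"

# Stand-in for the build subprocess: a child shell that reads the inherited variable.
seen_by_child="$(sh -c 'cat "${PIP_CONSTRAINT}"')"
echo "child process sees: ${seen_by_child}"

# With the variable exported, the install itself would be run as usual, e.g.:
#   pip install pymssql

unset PIP_CONSTRAINT
rm -f "${constraint_file}"
```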
COPY --from=scripts common.sh /scripts/docker/
# Only copy mysql/mssql installation scripts for now - so that changing the other
# scripts which are needed much later will not invalidate the docker layer here.
COPY --from=scripts install_mysql.sh install_mssql.sh install_postgres.sh /scripts/docker/
# We run scripts with bash here to make sure we can execute the scripts. Changing to +x might have an
# unexpected result - the cache for Dockerfiles might get invalidated in case the host system
# had different umask set and group x bit was not set. In Azure the bit might be not set at all.
2022-06-12 22:59:48 +12:00
# That also protects against AUFS Docker backend problem where changing the executable bit required sync
RUN bash /scripts/docker/install_mysql.sh prod \
&& bash /scripts/docker/install_mssql.sh prod \
&& bash /scripts/docker/install_postgres.sh prod \
&& adduser --gecos "First Last,RoomNumber,WorkPhone,HomePhone" --disabled-password \
--quiet "airflow" --uid "${AIRFLOW_UID}" --gid "0" --home "${AIRFLOW_USER_HOME_DIR}" \
2020-12-17 19:53:35 +10:00
# Make Airflow files belong to the root group and are accessible. This is to accommodate the guidelines from
# OpenShift https://docs.openshift.com/enterprise/3.0/creating_images/guidelines.html
&& mkdir -pv "${AIRFLOW_HOME}" \
&& mkdir -pv "${AIRFLOW_HOME}/dags" \
&& mkdir -pv "${AIRFLOW_HOME}/logs" \
&& chown -R airflow:0 "${AIRFLOW_USER_HOME_DIR}" "${AIRFLOW_HOME}" \
&& chmod -R g+rw "${AIRFLOW_USER_HOME_DIR}" "${AIRFLOW_HOME}" \
&& find "${AIRFLOW_HOME}" -executable ! -type l -print0 | xargs --null chmod g+x \
&& find "${AIRFLOW_USER_HOME_DIR}" -executable ! -type l -print0 | xargs --null chmod g+x
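The last steps of the RUN instruction above open the Airflow directories to the root group: `chmod -R g+rw` everywhere, then group execute only on entries that are already executable, skipping symlinks. A minimal sketch of that pattern on a throwaway directory follows; the file names are hypothetical, and GNU `find`, `xargs` and `stat` are assumed, as in the image.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory standing in for AIRFLOW_HOME (illustrative only).
demo_home="$(mktemp -d)"
touch "${demo_home}/run.sh" "${demo_home}/notes.txt"
chmod 700 "${demo_home}/run.sh"     # executable by the owner only
chmod 600 "${demo_home}/notes.txt"  # not executable at all

# Same pattern as the RUN step: group read/write everywhere, then group execute
# only on entries the current user can already execute, skipping symlinks.
chmod -R g+rw "${demo_home}"
find "${demo_home}" -executable ! -type l -print0 | xargs --null chmod g+x

script_mode="$(stat -c '%a' "${demo_home}/run.sh")"
notes_mode="$(stat -c '%a' "${demo_home}/notes.txt")"
echo "run.sh=${script_mode} notes.txt=${notes_mode}"

rm -rf "${demo_home}"
```

Using `-print0` with `xargs --null` keeps paths containing spaces or newlines intact, and filtering on `-executable` avoids marking plain data files as executable for the group.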
ARG AIRFLOW_SOURCES_FROM
ARG AIRFLOW_SOURCES_TO
COPY --from=airflow-build-image --chown=airflow:0 \
"${AIRFLOW_USER_HOME_DIR}/.local" "${AIRFLOW_USER_HOME_DIR}/.local"
COPY --from=airflow-build-image --chown=airflow:0 \
     "${AIRFLOW_USER_HOME_DIR}/constraints.txt" "${AIRFLOW_USER_HOME_DIR}/constraints.txt"
# For an editable build, also copy the Airflow sources so that they are available in the main image.
# For a regular (non-editable) image, this copies only the Dockerfile to /Dockerfile.
COPY --from=airflow-build-image --chown=airflow:0 "${AIRFLOW_SOURCES_TO}" "${AIRFLOW_SOURCES_TO}"
COPY --from=scripts entrypoint_prod.sh /entrypoint
COPY --from=scripts clean-logs.sh /clean-logs
COPY --from=scripts airflow-scheduler-autorestart.sh /airflow-scheduler-autorestart
# Make /etc/passwd root-group-writeable so that user can be dynamically added by OpenShift
# See https://github.com/apache/airflow/issues/9248
# Set default groups for airflow and root user
RUN chmod a+rx /entrypoint /clean-logs \
    && chmod g=u /etc/passwd \
    && chmod g+w "${AIRFLOW_USER_HOME_DIR}/.local" \
    && usermod -g 0 airflow -G 0
# Make sure that the venv is activated for all users,
# including plain sudo and sudo with the --interactive flag
RUN sed --in-place=.bak "s/secure_path=\"/secure_path=\"$(echo -n ${AIRFLOW_USER_HOME_DIR} | \
        sed 's/\//\\\//g')\/.local\/bin:/" /etc/sudoers
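# Illustration of the sed command above. This is only a sketch: it assumes
# AIRFLOW_USER_HOME_DIR=/home/airflow and a Debian-style sudoers entry, neither of which
# is defined in this part of the file.
#   before: Defaults secure_path="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
#   after:  Defaults secure_path="/home/airflow/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# The inner sed only escapes the slashes in the path so that it can be embedded in the outer
# expression; --in-place=.bak keeps the unmodified file as /etc/sudoers.bak.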
ARG AIRFLOW_VERSION
ARG AIRFLOW_PIP_VERSION
ARG AIRFLOW_UV_VERSION
ARG AIRFLOW_USE_UV
ARG AIRFLOW_PYTHON_VERSION
# See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
# to learn more about how the image handles signals.
# Also set "(airflow)" as the shell prompt.
ENV DUMB_INIT_SETSID="1" \
    PS1="(airflow)" \
    AIRFLOW_VERSION=${AIRFLOW_VERSION} \
    AIRFLOW_PYTHON_VERSION=${AIRFLOW_PYTHON_VERSION} \
    AIRFLOW__CORE__LOAD_EXAMPLES="false" \
    PATH="/root/bin:${PATH}" \
    AIRFLOW_PIP_VERSION=${AIRFLOW_PIP_VERSION} \
    AIRFLOW_UV_VERSION=${AIRFLOW_UV_VERSION} \
    AIRFLOW_USE_UV=${AIRFLOW_USE_UV}
# Add protection against running pip as root user
RUN mkdir -pv /root/bin
COPY --from=scripts pip /root/bin/pip
RUN chmod u+x /root/bin/pip
WORKDIR ${AIRFLOW_HOME}
EXPOSE 8080
USER ${AIRFLOW_UID}
# These should be set and used as late as possible, because any change in commit/build
# otherwise invalidates the layers that follow
ARG BUILD_ID
ARG COMMIT_SHA
ARG AIRFLOW_IMAGE_REPOSITORY
ARG AIRFLOW_IMAGE_DATE_CREATED
ENV BUILD_ID=${BUILD_ID} COMMIT_SHA=${COMMIT_SHA}
LABEL org.apache.airflow.distro="debian" \
  org.apache.airflow.module="airflow" \
  org.apache.airflow.component="airflow" \
  org.apache.airflow.image="airflow" \
  org.apache.airflow.version="${AIRFLOW_VERSION}" \
  org.apache.airflow.python.version="${AIRFLOW_PYTHON_VERSION}" \
  org.apache.airflow.uid="${AIRFLOW_UID}" \
  org.apache.airflow.main-image.build-id="${BUILD_ID}" \
  org.apache.airflow.main-image.commit-sha="${COMMIT_SHA}" \
  org.opencontainers.image.source="${AIRFLOW_IMAGE_REPOSITORY}" \
  org.opencontainers.image.created=${AIRFLOW_IMAGE_DATE_CREATED} \
  org.opencontainers.image.authors="dev@airflow.apache.org" \
  org.opencontainers.image.url="https://airflow.apache.org" \
  org.opencontainers.image.documentation="https://airflow.apache.org/docs/docker-stack/index.html" \
  org.opencontainers.image.version="${AIRFLOW_VERSION}" \
  org.opencontainers.image.revision="${COMMIT_SHA}" \
  org.opencontainers.image.vendor="Apache Software Foundation" \
  org.opencontainers.image.licenses="Apache-2.0" \
  org.opencontainers.image.ref.name="airflow" \
  org.opencontainers.image.title="Production Airflow Image" \
  org.opencontainers.image.description="Reference, production-ready Apache Airflow image"
ENTRYPOINT ["/usr/bin/dumb-init", "--", "/entrypoint"]
CMD []