.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

Recipes
=======

Users sometimes share interesting ways of using the Docker images. We encourage users to contribute these
recipes to the documentation by submitting a pull request, in case they prove useful to other members of
the community. The sections below capture this knowledge.

Google Cloud SDK installation
-----------------------------

Some operators, such as :class:`~airflow.providers.google.cloud.operators.kubernetes_engine.GKEStartPodOperator`,
require the `Google Cloud SDK <https://cloud.google.com/sdk>`__ (which includes ``gcloud``) to be installed.
You can also run ``gcloud`` commands with the BashOperator.

Create a new Dockerfile like the one shown below.

.. exampleinclude:: /docker-images-recipes/gcloud.Dockerfile
    :language: dockerfile

Then build a new image.

.. code-block:: bash

    docker build . \
        --pull \
        --build-arg BASE_AIRFLOW_IMAGE="apache/airflow:2.0.2" \
        --tag my-airflow-image:0.0.1
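
Once the build finishes, it may be worth a quick sanity check that ``gcloud`` actually landed in the image. A minimal sketch, assuming the ``my-airflow-image:0.0.1`` tag used above and a local Docker daemon (it degrades gracefully when either is missing):

```shell
# Sanity check: run gcloud inside the freshly built image.
# Assumes the my-airflow-image:0.0.1 tag from the build step above;
# falls back to a message when docker or the image is unavailable.
if command -v docker >/dev/null 2>&1; then
    docker run --rm my-airflow-image:0.0.1 gcloud --version \
        || echo "image my-airflow-image:0.0.1 not found; build it first"
else
    echo "docker not available; skipping image check"
fi
```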

Apache Hadoop Stack installation
--------------------------------

Airflow is often used to run tasks on a Hadoop cluster, which requires a Java Runtime Environment (JRE).
Below are the steps to install the tools that are frequently used in the Hadoop world:

- Java Runtime Environment (JRE)
- Apache Hadoop
- Apache Hive
- `Cloud Storage connector for Apache Hadoop <https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage>`__
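
As an illustration of what such a Dockerfile ends up wiring together, the tools above typically need ``JAVA_HOME``/``HADOOP_HOME``-style variables and their ``bin`` directories on the ``PATH``. The paths below are hypothetical placeholders, not the ones the recipe's Dockerfile actually uses:

```shell
# Hypothetical install locations; the recipe's Dockerfile defines its own.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
# The Hadoop and Hive CLIs must be reachable on PATH for Airflow tasks.
export PATH="$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin"
echo "$PATH" | grep -q "$HADOOP_HOME/bin" && echo "hadoop tools on PATH"
```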

Create a new Dockerfile like the one shown below.

.. exampleinclude:: /docker-images-recipes/hadoop.Dockerfile
    :language: dockerfile

Then build a new image.

.. code-block:: bash

    docker build . \
        --pull \
        --build-arg BASE_AIRFLOW_IMAGE="apache/airflow:2.0.2" \
        --tag my-airflow-image:0.0.1

Apache Beam Go Stack installation
---------------------------------

To be able to run a Beam Go Pipeline with the :class:`~airflow.providers.apache.beam.operators.beam.BeamRunGoPipelineOperator`,
you will need Go in your container. Install Airflow with ``apache-airflow-providers-google>=6.5.0``
and ``apache-airflow-providers-apache-beam>=3.2.0``.

Create a new Dockerfile like the one shown below.

.. exampleinclude:: /docker-images-recipes/go-beam.Dockerfile
    :language: dockerfile

Then build a new image.

.. code-block:: bash

    docker build . \
        --pull \
        --build-arg BASE_AIRFLOW_IMAGE="apache/airflow:2.2.5" \
        --tag my-airflow-image:0.0.1
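
As with the other recipes, a quick check that the Go toolchain made it into the image can save a debugging round-trip later. A minimal sketch, assuming the ``my-airflow-image:0.0.1`` tag from the build step and a local Docker daemon:

```shell
# Sanity check: confirm the Go toolchain is present in the image.
# Assumes the my-airflow-image:0.0.1 tag from the build step above;
# falls back to a message when docker or the image is unavailable.
if command -v docker >/dev/null 2>&1; then
    docker run --rm my-airflow-image:0.0.1 go version \
        || echo "image my-airflow-image:0.0.1 not found; build it first"
else
    echo "docker not available; skipping image check"
fi
```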