ARROW-7101: [CI] Refactor docker-compose setup and use it with GitHub Actions
## Projecting ideas from ursabot
### Parametric docker images
The images are better parameterized now, meaning that we can build more variants of the same service. A couple of examples:
```console
UBUNTU=16.04 docker-compose build ubuntu-cpp
ARCH=arm64v8 UBUNTU=18.04 docker-compose build ubuntu-cpp
PYTHON=3.6 docker-compose build conda-python
ARCH=arm32v7 PYTHON=3.6 PANDAS=0.25 docker-compose build conda-python-pandas
```
Each variant has its own docker image following a string naming schema:
`{org}/{arch}-{platform}-{platform-version}[[-{variant}-{variant-version}]..]:latest`
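As a rough illustration of the schema, the sketch below composes an image name from its parts. The `image_name` helper and the argument values are hypothetical and only mirror the pattern shown above; they are not taken from the docker-compose file itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compose an image name following the schema above.
# Usage: image_name ORG ARCH PLATFORM PLATFORM_VERSION [VARIANT VARIANT_VERSION]...
image_name() {
  local name="${1}/${2}-${3}-${4}"
  shift 4
  # Each remaining pair of arguments adds one {variant}-{variant-version} suffix.
  while [ "$#" -ge 2 ]; do
    name="${name}-${1}-${2}"
    shift 2
  done
  echo "${name}:latest"
}

# Example values are assumptions chosen for illustration only.
image_name apache amd64 ubuntu 18.04
image_name apache arm32v7 conda latest python 3.6 pandas 0.25
```

Building the name from positional pairs keeps the optional variant part open-ended, which matches the repeated `[-{variant}-{variant-version}]` group in the schema.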
### Use *_build.sh and *_test.sh for each job
The docker images provide the environment, and each language backend should usually implement two scripts, a `build.sh` and a `test.sh`. This way dependent builds, such as the docker python, r or c glib builds, can reuse the build script of their ancestor without running its tests.
With small enough scripts, even the non-docker builds should be reproducible locally if the environment is properly set up. GitHub Actions supports bash scripts across all three platforms, so we can reuse the same `*_build.sh` and `*_test.sh` scripts to execute the builds in docker, on the CI or locally.
## Using GitHub Actions for running the builds
Regardless of the CI we're going to choose, the isolation constraint of different platforms requires some sort of virtualisation. Currently linux (and windows, but I have not tried it yet) has lightweight containerisation, so we should keep the linux builds isolated in docker containers. The rest of the platforms (windows and macOS) should be executed on the CI system.
GitHub Actions supports all three major platforms: linux, windows and macOS. I've added cross-platform builds for a couple of languages, like Rust and Go; the rest are work in progress.
### Workflow
A workflow should define all builds of a language, mostly because the path filters can be defined at the workflow level. For example, the python builds should be triggered if either a `cpp/**` or a `python/**` file changes, which can be covered in the same workflow file.
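The effect of such a path filter can be sketched in plain bash. This is only an illustration of the matching idea, not how GitHub Actions evaluates filters; the function name `python_workflow_should_run` and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check mirroring a workflow-level path filter:
# succeed if any changed path belongs to the C++ or Python sources.
python_workflow_should_run() {
  local path
  for path in "$@"; do
    case "${path}" in
      # In a case pattern, * also matches slashes, so nested files match too.
      cpp/*|python/*) return 0 ;;
    esac
  done
  return 1
}

# Example invocation with made-up changed paths.
if python_workflow_should_run "cpp/src/arrow/array.cc" "docs/index.rst"; then
  echo "trigger python builds"
fi
```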
## Feature parity with the current builds
Reaching feature parity with all of the builds below is not a goal for this PR, but the difficult ones should at least have a tracking JIRA ticket.
### Travis-CI
- [x] **Lint, Release tests**:
- `Lint / C++, Python, R, Rust, Docker, RAT`
- `Dev / Source Release`
- [x] **C++ unit tests w/ conda-forge toolchain, coverage**: without coverage
- `C++ / AMD64 Conda C++`
- [x] **Python 3.6 unit tests, conda-forge toolchain, coverage**: without coverage
- `Python / AMD64 Conda Python 3.6`
- [x] **[OS X] C++ w/ Xcode 9.3**:
- `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
- [x] **[OS X] Python w/ Xcode 9.3**:
- `Python / AMD64 MacOS 10.14 Python 3`: with Xcode 10.3
- [x] **Java OpenJDK8 and OpenJDK11**:
- `Java / AMD64 Debian Java JDK 8 Maven 3.5.2`
- `Java / AMD64 Debian Java JDK 11 Maven 3.6.2`
- [x] **Protocol / Flight Integration Tests**:
- `Dev / Protocol Test`
- [x] **NodeJS**: without running lint and coverage
- `NodeJS / AMD64 Debian NodeJS 11`
- [x] **C++ & GLib & Ruby w/ gcc 5.4**:
- `C++ / AMD64 Debian 10 C++`: with GCC 8.3
- `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
- `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
- `C GLib / AMD64 Ubuntu 18.04 C GLib`
- `Ruby / AMD64 Ubuntu 18.04 Ruby`
- [x] **[OS X] C++ & GLib & Ruby w/ XCode 10.2 & Homebrew**
- `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
- `C GLib / AMD64 MacOS 10.14 C Glib`: with Xcode 10.3
- `Ruby / AMD64 MacOS 10.14 Ruby`: with Xcode 10.3
- [x] **Go**: without coverage
- `Go / AMD64 Debian Go 1.12`
- [x] **R (with and without libarrow)**:
- `R / AMD64 Conda R 3.6`: with libarrow
- `R / AMD64 Ubuntu 18.04 R 3.6`: with libarrow
### Appveyor
- ~JOB=Build, GENERATOR=Ninja, CONFIGURATION=Release, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017~
- ~JOB=Toolchain, GENERATOR=Ninja, CONFIGURATION=Release, ARROW_S3=ON, ARROW_BUILD_FLIGHT=ON, ARROW_BUILD_GANDIVA=ON~
- ~JOB=Build_Debug, GENERATOR=Ninja, CONFIGURATION=Debug~
- ~JOB=MinGW32, MINGW_ARCH=i686, MINGW_PACKAGE_PREFIX=mingw-w64-i686, MINGW_PREFIX=c:\msys64\mingw32, MSYSTEM=MINGW32, USE_CLCACHE=false~
- ~JOB=MinGW64, MINGW_ARCH=x86_64, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64, MINGW_PREFIX=c:\msys64\mingw64, MSYSTEM=MINGW64, USE_CLCACHE=false~
- [x] **JOB=Rust, TARGET=x86_64-pc-windows-msvc, USE_CLCACHE=false**:
- `Rust / AMD64 Windows 2019 Rust nightly-2019-09-25`
- [x] **JOB=C#, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017, USE_CLCACHE=false**
- `C# / AMD64 Windows 2019 C# 2.2.103`
- [x] **JOB=Go, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64 ...**:
- `Go / AMD64 Windows 2019 Go 1.12`
- ~JOB=R with libarrow, USE_CLCACHE=false, TEST_R_WITH_ARROW=TRUE, RWINLIB_LOCAL=%APPVEYOR_BUILD_FOLDER%\libarrow.zip~
### Github Actions
- [x] **Windows MSVC C++ / Build (Visual Studio 16 2019)**:
- `C++ / AMD64 Windows 2019 C++`: without tests
- [x] **Windows MSVC C++ / Build (Visual Studio 15 2017)**:
- `C++ / AMD64 Windows 2016 C++`: without tests
- [x] **Linux docker-compose / Test (C++ w/ clang-7 & system packages)**: all have llvm for gandiva but the compiler is set to gcc
- `C++ / AMD64 Debian 10 C++`: with GCC 8.3
- `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
- `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
- [x] **Linux docker-compose / Test (Rust)**: without rustfmt
- `Rust / AMD64 Debian Rust nightly-2019-09-25`
- [x] **Linux docker-compose / Test (Lint, Release tests)**:
- `Lint / C++, Python, R, Rust, Docker, RAT`
- `Dev / Source Release`
### Nightly Crossbow tests
The packaging builds are out of the scope of this PR, but the nightly **dockerized test** tasks are in.
Nightly tests:
- [x] docker-r
- [x] docker-r-conda
- [x] docker-r-sanitizer
- [x] docker-rust
- [x] docker-cpp
- [x] docker-cpp-cmake32
- [x] docker-cpp-release
- [x] docker-cpp-static-only
- [x] docker-c_glib
- [x] docker-go
- [x] docker-python-2.7
- [x] docker-python-3.6
- [x] docker-python-3.7
- [x] docker-python-2.7-nopandas
- [x] docker-python-3.6-nopandas
- [x] docker-java
- [x] docker-js
- [x] docker-docs
- [x] docker-lint
- [x] docker-iwyu: included in the lint
- [x] docker-clang-format: included in the lint
- [x] docker-pandas-master
- [x] docker-dask-integration
- [x] docker-hdfs-integration
- [x] docker-spark-integration
- [x] docker-turbodbc-integration
# TODOs left:
- [x] Fix the Apidoc generation for c_glib
- [x] Fix the JNI test for Gandiva and ORC
- [x] Test that crossbow tests are passing
- ~Optionally restore the travis configuration to incrementally decommission old builds~
## Follow-up JIRAs:
- [Archery] Consider porting the docker tool of ursabot to archery
- [Archery] Consider using archery with or instead of the pre-commit hooks
- [Archery] Create a wrapper script in archery for docker compose in order to run the containers with the host's user and group
- [C++] GCC 5.4.0 has compile errors, reproduce with `UBUNTU=16.04 docker-compose run ubuntu-cpp`
- [C++][CI] Test the ported fuzzit integration image
- [C++][CI] Turn off unnecessary features in the integration tests (spark/turbodbc/dask/hdfs)
- [C++][CI] Revisit ASAN UBSAN settings in every C++ based image
- [CI] Consider re-adding the removed debian testing image
- [Go][CI] Pre-install the go dependencies in the dockerfile using go get
- [JS][CI] Pre-install the JS dependencies in the dockerfile
- [Rust][CI] Pre-install the rust dependencies in the dockerfile
- [Java][CI] Pre-install the java dependencies in the dockerfile
- [Ruby][CI] Pre-install the ruby dependencies in the dockerfile and remove it from the test script
- [C#][CI] Pre-install the C# dependencies in the dockerfile
- [R][CI] Fix the r-sanitizer build https://issues.apache.org/jira/browse/ARROW-6957
- [GLIB][MacOS] Fail to execute lua examples (fails to load 'lgi.corelgilua51' despite that lgi is installed)
- [C++][CMake] Automatically set ARROW_GANDIVA_PC_CXX_FLAGS for conda and OSX sdk (see cpp_build.sh)
- [C++][CI] Hiveserver2 integration test fails to connect to impala container
- [CI][Spark] Support specific Spark version in the integration test including latest
- [JS][CI] Move nodejs linting from js_build.sh to archery
- [Python][CI] create a docker image for python ASV benchmarks and fix the script
- [CI] Find a short but related prefix for the env vars used for the docker-compose file to prevent collisions
- [C#] the docker container fails to run because of the ubuntu host versions, see https://github.com/dotnet/core/issues/3509
- [C++][Windows] Enable more features on the windows GHA build
- [Doc] document docker-compose usage in the developer sphinx guide
- [CI][C++] Add .ccache to the docker-compose mounts
- [Archery][CI] Refactor the ci/scripts to sourceable bash functions or to archery directly
- [C++][CI] Use scripts/util_coredump.sh to show automatic backtraces
- [C++] Fix the hanging C++ tests in Windows 2019
- [CI] Ask INFRA to set up the DOCKERHUB_* secrets for GitHub actions
- [C++][CI] Running Gandiva tests fails on Fedora:
Reproduce with: `docker-compose run -e ARROW_GANDIVA=ON fedora-cpp`
```
Running gandiva-internals-test, redirecting output into /build/cpp/build/test-logs/gandiva-internals-test.txt (attempt 1/1)
: CommandLine Error: Option 'x86-experimental-vector-widening-legalization' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
/build/cpp/src/gandiva
```
- [JS][CI] NodeJS build fails on Github Actions Windows node
```
> NODE_NO_WARNINGS=1 gulp build
# 'NODE_NO_WARNINGS' is not recognized as an internal or external command,
# operable program or batch file.
# npm ERR! code ELIFECYCLE
# npm ERR! errno 1
# npm ERR! apache-arrow@1.0.0-SNAPSHOT build: `NODE_NO_WARNINGS=1 gulp build`
# npm ERR! Exit status 1
# npm ERR!
# npm ERR! Failed at the apache-arrow@1.0.0-SNAPSHOT build script.
# npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
```
Closes #5589 from kszucs/docker-refactor and squashes the following commits:
5105d12e6 <Krisztián Szűcs> Rename pull-request folder to dev_cron
e9e9a7eec <Krisztián Szűcs> Use underscores for naming the workflow files
a92c99d03 <Krisztián Szűcs> Disable hanging C++ tests on windows
f158c89b5 <Krisztián Szűcs> Attempt to push from apache/arrow master; Don't push from crossbow tasks
0e1d470a1 <Krisztián Szűcs> Turn off ORC on macOS C++ test due to link error
258db5cff <Krisztián Szűcs> Only push docker images from apache/arrow repository
acdfcf086 <Krisztián Szűcs> Remove ORC from the brewfile
5102b85b1 <Krisztián Szűcs> Fix nodeJS workflow
032d6a388 <Krisztián Szűcs> Turn off 2 python builds
7f15b97a8 <Krisztián Szűcs> Filter branches
48b8d128a <Krisztián Szűcs> Fix workflows
36ad9d297 <Krisztián Szűcs> Disable builds
0f603af0c <Krisztián Szűcs> master only and cron workflows
28cc2d78d <Krisztián Szűcs> Rename Java JNI workflow
bcd8af7b7 <Krisztián Szűcs> Port the remaining travis utility scripts
ed5688154 <Krisztián Szűcs> Usage comments; recommend installing pandas from the docs because of its removal from conda_env_python
3c8c023ce <Krisztián Szűcs> Use Arch in volumes; some comments; remove conda version 'latest' from the images
771b023a8 <Krisztián Szűcs> Cleanup files; separate JNI builds
97ff8a122 <Krisztián Szűcs> Push docker images only from master
dc00b4297 <Krisztián Szűcs> Enable path filters
e0e2e1f46 <Krisztián Szűcs> Fix pandas master build
3814e0828 <Krisztián Szűcs> Fix manylinux volumes
c18edda70 <Krisztián Szűcs> Add CentOS version to the manylinux image names
c8b9dd6b1 <Krisztián Szűcs> Missing --pyargs argument for the python test command
33e646981 <Krisztián Szűcs> Turn off gandiva and flight for the HDFS test
b9c547889 <Krisztián Szűcs> Refactor docker-compose file and use it with github actions.
Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
2019-11-12 11:07:48 +01:00
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

set -ex

arrow_dir=${1}
build_dir=${2}

source_dir=${arrow_dir}/python
python_build_dir=${build_dir}/python

: "${BUILD_DOCS_PYTHON:=OFF}"

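The `: "${VAR:=default}"` line relies on standard shell parameter expansion: `:=` assigns the default only when the variable is unset or empty, and the `:` builtin discards the expanded value. A minimal sketch of that behaviour follows; the `first_value` and `second_value` names are only there for illustration.

```shell
#!/usr/bin/env bash
# Case 1: the variable is unset, so := assigns the default.
unset BUILD_DOCS_PYTHON
: "${BUILD_DOCS_PYTHON:=OFF}"
first_value=${BUILD_DOCS_PYTHON}
echo "default applied: ${first_value}"

# Case 2: the caller already provided a value, so := leaves it untouched.
BUILD_DOCS_PYTHON=ON
: "${BUILD_DOCS_PYTHON:=OFF}"
second_value=${BUILD_DOCS_PYTHON}
echo "caller value kept: ${second_value}"
```

By contrast, `${VAR:-default}` only substitutes the default in that one expansion and does not assign it back to `VAR`.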
ARROW-16219: [CI] Fix git config to prevent SCM tools failure
This PR fixes the CI failures due to the latest git release fixing CVE-2022-24765.
I have been able to see the build passing the scm step with the change here:
https://app.travis-ci.com/github/raulcd/arrow/builds/249688925
The above build still fails because of some failing tests, but the failures are no longer related to installation.
And this was the failure before the change:
https://app.travis-ci.com/github/raulcd/arrow/builds/249683847
I have also been able to reproduce the issue with `verify-conda-rc` locally.
The failure:
```
$ docker-compose run conda-verify-rc
....
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
/arrow/python /arrow /
Traceback (most recent call last):
File "/arrow/python/setup.py", line 607, in <module>
setup(
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 109, in setup
_setup_distribution = dist = klass(attrs)
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/dist.py", line 462, in __init__
_Distribution.__init__(
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 293, in __init__
self.finalize_options()
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/dist.py", line 886, in finalize_options
ep(self)
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools/dist.py", line 907, in _finalize_setup_keywords
ep.load()(self, ep.name, value)
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools_scm/integration.py", line 75, in version_keyword
_assign_version(dist, config)
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools_scm/integration.py", line 51, in _assign_version
_version_missing(config)
File "/tmp/arrow-HEAD.YUVPq/mambaforge/envs/conda-source/lib/python3.10/site-packages/setuptools_scm/__init__.py", line 106, in _version_missing
raise LookupError(
LookupError: setuptools-scm was unable to detect version for /arrow.
Make sure you're either building from a fully intact git repository or PyPI tarballs. Most other sources (such as GitHub's tarballs, a git checkout without the .git folder) don't contain the necessary metadata and will not work.
For example, if you're using pip, instead of https://github.com/user/proj/archive/master.zip use git+https://github.com/user/proj.git#egg=proj
Failed to verify release candidate. See /tmp/arrow-HEAD.YUVPq for details.
```
I am waiting to validate the verify-rc fix at the moment and will update once the local build finishes.
Closes #12945 from raulcd/ARROW-16219
Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
2022-04-22 18:23:40 +02:00
if [ -x "$(command -v git)" ]; then
  git config --global --add safe.directory "${arrow_dir}"
fi

if [ -n "${ARROW_PYTHON_VENV:-}" ]; then
  # We don't need to follow this external file.
  # See also: https://www.shellcheck.net/wiki/SC1091
  #
  # shellcheck source=/dev/null
  . "${ARROW_PYTHON_VENV}/bin/activate"
fi

2022-01-04 16:50:25 +01:00
|
|
|
case "$(uname)" in
|
|
|
|
|
Linux)
|
|
|
|
|
n_jobs=$(nproc)
|
|
|
|
|
;;
|
|
|
|
|
Darwin)
|
|
|
|
|
n_jobs=$(sysctl -n hw.ncpu)
|
|
|
|
|
;;
|
|
|
|
|
MINGW*)
|
|
|
|
|
n_jobs=${NUMBER_OF_PROCESSORS:-1}
|
|
|
|
|
;;
|
|
|
|
|
*)
|
|
|
|
|
n_jobs=${NPROC:-1}
|
|
|
|
|
;;
|
|
|
|
|
esac
|
|
|
|
|
|
2025-08-13 06:26:26 +09:00
|
|
|
if [ -n "${CONDA_PREFIX}" ]; then
|
2019-12-18 17:57:32 +01:00
|
|
|
echo -e "===\n=== Conda environment for build\n==="
|
|
|
|
|
conda list
|
|
|
|
|
fi
|
|
|
|
|
|
2026-03-09 09:47:21 +01:00
|
|
|
export CMAKE_BUILD_PARALLEL_LEVEL=${n_jobs}
|
|
|
|
|
export CMAKE_GENERATOR=${CMAKE_GENERATOR:-Ninja}
|
2023-09-01 09:58:27 +02:00
|
|
|
export PYARROW_WITH_ACERO=${ARROW_ACERO:-OFF}
|
2024-02-09 01:41:36 +00:00
|
|
|
export PYARROW_WITH_AZURE=${ARROW_AZURE:-OFF}
|
ARROW-7101: [CI] Refactor docker-compose setup and use it with GitHub Actions
## Projecting ideas from ursabot
### Parametric docker images
The images are now better parameterized, so we can build more variants of the same service. A couple of examples:
```console
UBUNTU=16.04 docker-compose build ubuntu-cpp
ARCH=arm64v8 UBUNTU=18.04 docker-compose build ubuntu-cpp
PYTHON=3.6 docker-compose build conda-python
ARCH=arm32v7 PYTHON=3.6 PANDAS=0.25 docker-compose build conda-python-pandas
```
Each variant has its own docker image following a string naming schema:
`{org}/{arch}-{platform}-{platform-version}[[-{variant}-{variant-version}]..]:latest`
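As a rough illustration of the schema, the name could be assembled by a small shell helper. This is only a sketch: `image_name` is a hypothetical function, not something this PR adds, and the org, platform and variant values in the example calls are assumed.

```shell
# Hypothetical helper: compose an image name following the schema
# {org}/{arch}-{platform}-{platform-version}[-{variant}-{variant-version}...]:latest
image_name() {
  local org=$1 arch=$2 platform=$3 platform_version=$4
  shift 4
  local name="${org}/${arch}-${platform}-${platform_version}"
  # Any remaining arguments are consumed as variant / variant-version pairs.
  while [ "$#" -ge 2 ]; do
    name="${name}-$1-$2"
    shift 2
  done
  echo "${name}:latest"
}

# Example calls with assumed values:
image_name apache amd64 ubuntu 18.04
image_name apache arm64v8 debian 10 python 3.6 pandas 0.25
```

Passing the variants as trailing pairs keeps the helper open-ended, mirroring the repeatable `[-{variant}-{variant-version}]` part of the schema.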
### Use *_build.sh and *_test.sh for each job
The docker images provide the environment, and each language backend should usually implement two scripts, a `build.sh` and a `test.sh`. This way dependent builds, like the docker python, r or c glib ones, are able to reuse the build script of the ancestor without running its tests.
With small enough scripts and a properly set up environment, even the non-docker builds should be reproducible locally. GitHub Actions supports bash scripts across all three platforms, so we can reuse the same `*_build.sh` and `*_test.sh` scripts to execute the builds either in docker, on the CI or locally.
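The reuse pattern can be sketched as follows. `run_job` is a hypothetical driver written for illustration only; the `{lang}_build.sh` / `{lang}_test.sh` file naming follows the heading above, while the argument order and the absence of script arguments are assumptions.

```shell
# Hypothetical driver (not part of this PR): run the ancestor's build script,
# then the job's own build and test scripts. The ancestor's test script is
# deliberately never invoked.
run_job() {
  local scripts_dir=$1 lang=$2 ancestor=${3:-}
  if [ -n "${ancestor}" ]; then
    bash "${scripts_dir}/${ancestor}_build.sh" || return 1
  fi
  bash "${scripts_dir}/${lang}_build.sh" || return 1
  bash "${scripts_dir}/${lang}_test.sh"
}
```

For instance, `run_job ci/scripts python cpp` would run `cpp_build.sh`, `python_build.sh` and `python_test.sh` in that order, assuming those files exist under `ci/scripts`.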
## Using GitHub Actions for running the builds
Regardless of which CI we choose, isolating builds across platforms requires some sort of virtualisation. Currently linux (and windows, though I have not tried it yet) has lightweight containerisation, so we should keep the linux builds isolated in docker containers. The builds for the remaining platforms (windows and macOS) should be executed directly on the CI system.
GitHub Actions supports all three major platforms: linux, windows and macOS. I've added cross-platform builds for a couple of languages, like Rust and Go; the rest are work in progress.
### Workflow
A workflow should define all builds of a language, mostly because path filters are defined at the workflow level. For example, the python builds should be triggered if either a cpp/** or a python/** file changes, and both patterns can be covered in the same workflow file.
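The decision such a filter makes can be mimicked in shell. GitHub Actions evaluates path filters itself in the workflow file, so the sketch below only illustrates the logic; `should_run_python_builds` is a hypothetical name.

```shell
# Hypothetical illustration of the filter logic: succeed if any changed path
# lies under cpp/ or python/. In a case pattern, * also matches slashes, so
# cpp/* here behaves like the cpp/** glob mentioned above.
should_run_python_builds() {
  local path
  for path in "$@"; do
    case "${path}" in
      cpp/*|python/*) return 0 ;;
    esac
  done
  return 1
}

# Example with assumed file names:
if should_run_python_builds docs/index.rst cpp/src/arrow/array.cc; then
  echo "python builds triggered"
fi
```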
## Feature parity with the current builds
Reaching feature parity with all of the builds below is not a goal for this PR, but the difficult ones should at least have a tracking JIRA ticket.
### Travis-CI
- [x] **Lint, Release tests**:
  - `Lint / C++, Python, R, Rust, Docker, RAT`
  - `Dev / Source Release`
- [x] **C++ unit tests w/ conda-forge toolchain, coverage**: without coverage
  - `C++ / AMD64 Conda C++`
- [x] **Python 3.6 unit tests, conda-forge toolchain, coverage**: without coverage
  - `Python / AMD64 Conda Python 3.6`
- [x] **[OS X] C++ w/ Xcode 9.3**:
  - `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
- [x] **[OS X] Python w/ Xcode 9.3**:
  - `Python / AMD64 MacOS 10.14 Python 3`: with Xcode 10.3
- [x] **Java OpenJDK8 and OpenJDK11**:
  - `Java / AMD64 Debian Java JDK 8 Maven 3.5.2`
  - `Java / AMD64 Debian Java JDK 11 Maven 3.6.2`
- [x] **Protocol / Flight Integration Tests**:
  - `Dev / Protocol Test`
- [x] **NodeJS**: without running lint and coverage
  - `NodeJS / AMD64 Debian NodeJS 11`
- [x] **C++ & GLib & Ruby w/ gcc 5.4**:
  - `C++ / AMD64 Debian 10 C++`: with GCC 8.3
  - `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
  - `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
  - `C GLib / AMD64 Ubuntu 18.04 C GLib`
  - `Ruby / AMD64 Ubuntu 18.04 Ruby`
- [x] **[OS X] C++ & GLib & Ruby w/ XCode 10.2 & Homebrew**
  - `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
  - `C GLib / AMD64 MacOS 10.14 C Glib`: with Xcode 10.3
  - `Ruby / AMD64 MacOS 10.14 Ruby`: with Xcode 10.3
- [x] **Go**: without coverage
  - `Go / AMD64 Debian Go 1.12`
- [x] **R (with and without libarrow)**:
  - `R / AMD64 Conda R 3.6`: with libarrow
  - `R / AMD64 Ubuntu 18.04 R 3.6`: with libarrow
### Appveyor
- ~JOB=Build, GENERATOR=Ninja, CONFIGURATION=Release, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017~
- ~JOB=Toolchain, GENERATOR=Ninja, CONFIGURATION=Release, ARROW_S3=ON, ARROW_BUILD_FLIGHT=ON, ARROW_BUILD_GANDIVA=ON~
- ~JOB=Build_Debug, GENERATOR=Ninja, CONFIGURATION=Debug~
- ~JOB=MinGW32, MINGW_ARCH=i686, MINGW_PACKAGE_PREFIX=mingw-w64-i686, MINGW_PREFIX=c:\msys64\mingw32, MSYSTEM=MINGW32, USE_CLCACHE=false~
- ~JOB=MinGW64, MINGW_ARCH=x86_64, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64, MINGW_PREFIX=c:\msys64\mingw64, MSYSTEM=MINGW64, USE_CLCACHE=false~
- [x] **JOB=Rust, TARGET=x86_64-pc-windows-msvc, USE_CLCACHE=false**:
  - `Rust / AMD64 Windows 2019 Rust nightly-2019-09-25`
- [x] **JOB=C#, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017, USE_CLCACHE=false**
  - `C# / AMD64 Windows 2019 C# 2.2.103`
- [x] **JOB=Go, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64 ...**:
  - `Go / AMD64 Windows 2019 Go 1.12`
- ~JOB=R with libarrow, USE_CLCACHE=false, TEST_R_WITH_ARROW=TRUE, RWINLIB_LOCAL=%APPVEYOR_BUILD_FOLDER%\libarrow.zip~
### Github Actions
- [x] **Windows MSVC C++ / Build (Visual Studio 16 2019)**:
  - `C++ / AMD64 Windows 2019 C++`: without tests
- [x] **Windows MSVC C++ / Build (Visual Studio 15 2017)**:
  - `C++ / AMD64 Windows 2016 C++`: without tests
- [x] **Linux docker-compose / Test (C++ w/ clang-7 & system packages)**: all have llvm for gandiva but the compiler is set to gcc
  - `C++ / AMD64 Debian 10 C++`: with GCC 8.3
  - `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
  - `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
- [x] **Linux docker-compose / Test (Rust)**: without rustfmt
  - `Rust / AMD64 Debian Rust nightly-2019-09-25`
- [x] **Linux docker-compose / Test (Lint, Release tests)**:
  - `Lint / C++, Python, R, Rust, Docker, RAT`
  - `Dev / Source Release`
### Nightly Crossbow tests
The packaging builds are out of the scope of this PR, but the nightly **dockerized test** tasks are in.
Nightly tests:
- [x] docker-r
- [x] docker-r-conda
- [x] docker-r-sanitizer
- [x] docker-rust
- [x] docker-cpp
- [x] docker-cpp-cmake32
- [x] docker-cpp-release
- [x] docker-cpp-static-only
- [x] docker-c_glib
- [x] docker-go
- [x] docker-python-2.7
- [x] docker-python-3.6
- [x] docker-python-3.7
- [x] docker-python-2.7-nopandas
- [x] docker-python-3.6-nopandas
- [x] docker-java
- [x] docker-js
- [x] docker-docs
- [x] docker-lint
- [x] docker-iwyu: included in the lint
- [x] docker-clang-format: included in the lint
- [x] docker-pandas-master
- [x] docker-dask-integration
- [x] docker-hdfs-integration
- [x] docker-spark-integration
- [x] docker-turbodbc-integration
# TODOs left:
- [x] Fix the Apidoc generation for c_glib
- [x] Fix the JNI test for Gandiva and ORC
- [x] Test that crossbow tests are passing
- ~Optionally restore the travis configuration to incrementally decommission old builds~
## Follow-up JIRAs:
- [Archery] Consider porting the docker tool of ursabot to archery
- [Archery] Consider using archery with or instead of the pre-commit hooks
- [Archery] Create a wrapper script in archery for docker compose in order to run the containers with the host's user and group
- [C++] GCC 5.4.0 has compile errors, reproduce with UBUNTU=16.04 docker-compose run ubuntu-cpp
- [C++][CI] Test the ported fuzzit integration image
- [C++][CI] Turn off unnecessary features in the integration tests (spark/turbodbc/dask/hdfs)
- [C++][CI] Revisit ASAN UBSAN settings in every C++ based image
- [CI] Consider re-adding the removed debian testing image
- [Go][CI] Pre-install the go dependencies in the dockerfile using go get
- [JS][CI] Pre-install the JS dependencies in the dockerfile
- [Rust][CI] Pre-install the rust dependencies in the dockerfile
- [Java][CI] Pre-install the java dependencies in the dockerfile
- [Ruby][CI] Pre-install the ruby dependencies in the dockerfile and remove it from the test script
- [C#][CI] Pre-install the C# dependencies in the dockerfile
- [R][CI] Fix the r-sanitizer build https://issues.apache.org/jira/browse/ARROW-6957
- [GLIB][MacOS] Fail to execute lua examples (fails to load 'lgi.corelgilua51' despite that lgi is installed)
- [C++][CMake] Automatically set ARROW_GANDIVA_PC_CXX_FLAGS for conda and OSX sdk (see cpp_build.sh)
- [C++][CI] Hiveserver2 integration test fails to connect to impala container
- [CI][Spark] Support specific Spark version in the integration test including latest
- [JS][CI] Move nodejs linting from js_build.sh to archery
- [Python][CI] create a docker image for python ASV benchmarks and fix the script
- [CI] Find a short but related prefix for the env vars used for the docker-compose file to prevent collisions
- [C#] the docker container fails to run because of the ubuntu host versions, see https://github.com/dotnet/core/issues/3509
- [C++][Windows] Enable more features on the windows GHA build
- [Doc] document docker-compose usage in the developer sphinx guide
- [CI][C++] Add .ccache to the docker-compose mounts
- [Archery][CI] Refactor the ci/scripts into sourceable bash functions or into archery directly
- [C++][CI] Use scripts/util_coredump.sh to show automatic backtraces
- [C++] Fix the hanging C++ tests in Windows 2019
- [CI] Ask INFRA to set up the DOCKERHUB_* secrets for GitHub actions
- [C++][CI] Running Gandiva tests fails on Fedora:
Reproduce with: `docker-compose run -e ARROW_GANDIVA=ON fedora-cpp`
```
Running gandiva-internals-test, redirecting output into /build/cpp/build/test-logs/gandiva-internals-test.txt (attempt 1/1)
: CommandLine Error: Option 'x86-experimental-vector-widening-legalization' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
/build/cpp/src/gandiva
```
- [JS][CI] NodeJS build fails on Github Actions Windows node
```
> NODE_NO_WARNINGS=1 gulp build
# 'NODE_NO_WARNINGS' is not recognized as an internal or external command,
# operable program or batch file.
# npm ERR! code ELIFECYCLE
# npm ERR! errno 1
# npm ERR! apache-arrow@1.0.0-SNAPSHOT build: `NODE_NO_WARNINGS=1 gulp build`
# npm ERR! Exit status 1
# npm ERR!
# npm ERR! Failed at the apache-arrow@1.0.0-SNAPSHOT build script.
# npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
```
Closes #5589 from kszucs/docker-refactor and squashes the following commits:
5105d12e6 <Krisztián Szűcs> Rename pull-request folder to dev_cron
e9e9a7eec <Krisztián Szűcs> Use underscores for naming the workflow files
a92c99d03 <Krisztián Szűcs> Disable hanging C++ tests on windows
f158c89b5 <Krisztián Szűcs> Attempt to push from apache/arrow master; Don't push from crossbow tasks
0e1d470a1 <Krisztián Szűcs> Turn off ORC on macOS C++ test due to link error
258db5cff <Krisztián Szűcs> Only push docker images from apache/arrow repository
acdfcf086 <Krisztián Szűcs> Remove ORC from the brewfile
5102b85b1 <Krisztián Szűcs> Fix nodeJS workflow
032d6a388 <Krisztián Szűcs> Turn off 2 python builds
7f15b97a8 <Krisztián Szűcs> Filter branches
48b8d128a <Krisztián Szűcs> Fix workflows
36ad9d297 <Krisztián Szűcs> Disable builds
0f603af0c <Krisztián Szűcs> master only and cron workflows
28cc2d78d <Krisztián Szűcs> Rename Java JNI workflow
bcd8af7b7 <Krisztián Szűcs> Port the remaining travis utility scripts
ed5688154 <Krisztián Szűcs> Usage comments; recommend installing pandas from the docs because of its removal from conda_env_python
3c8c023ce <Krisztián Szűcs> Use Arch in volumes; some comments; remove conda version 'latest' from the images
771b023a8 <Krisztián Szűcs> Cleanup files; separate JNI builds
97ff8a122 <Krisztián Szűcs> Push docker images only from master
dc00b4297 <Krisztián Szűcs> Enable path filters
e0e2e1f46 <Krisztián Szűcs> Fix pandas master build
3814e0828 <Krisztián Szűcs> Fix manylinux volumes
c18edda70 <Krisztián Szűcs> Add CentOS version to the manylinux image names
c8b9dd6b1 <Krisztián Szűcs> Missing --pyargs argument for the python test command
33e646981 <Krisztián Szűcs> Turn off gandiva and flight for the HDFS test
b9c547889 <Krisztián Szűcs> Refactor docker-compose file and use it with github actions.
Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
2019-11-12 11:07:48 +01:00
export PYARROW_WITH_CUDA=${ARROW_CUDA:-OFF}
2022-03-08 12:00:19 +01:00
export PYARROW_WITH_DATASET=${ARROW_DATASET:-ON}
2019-11-12 11:07:48 +01:00
export PYARROW_WITH_FLIGHT=${ARROW_FLIGHT:-OFF}
export PYARROW_WITH_GANDIVA=${ARROW_GANDIVA:-OFF}
2022-06-12 02:50:28 -07:00
export PYARROW_WITH_GCS=${ARROW_GCS:-OFF}
2022-03-08 12:00:19 +01:00
export PYARROW_WITH_HDFS=${ARROW_HDFS:-ON}
export PYARROW_WITH_ORC=${ARROW_ORC:-OFF}
- [C++][Windows] Enable more features on the windows GHA build
- [Doc] document docker-compose usage in the developer sphinx guide
- [CI][C++] Add .ccache to the docker-compose mounts
- [Archery][CI] Refactor the ci/scripts to a sourceable bash functions or to archery directly
- [C++][CI] Use scripts/util_coredump.sh to show automatic backtraces
- [C++] Fix the hanging C++ tests in Windows 2019
- [CI] Ask INFRA to set up the DOCKERHUB_* secrets for GitHub actions
- [C++][CI] Running Gandiva tests fails on Fedora:
Reproduce with: `docker-compose run -e ARROW_GANDIVA=ON fedora-cpp`
```
Running gandiva-internals-test, redirecting output into /build/cpp/build/test-logs/gandiva-internals-test.txt (attempt 1/1)
1364
: CommandLine Error: Option 'x86-experimental-vector-widening-legalization' registered more than once!
1365
LLVM ERROR: inconsistency in registered CommandLine options
1366
/build/cpp/src/gandiva
```
- [JS][CI] NodeJS build fails on Github Actions Windows node
```
> NODE_NO_WARNINGS=1 gulp build
# 'NODE_NO_WARNINGS' is not recognized as an internal or external command,
# operable program or batch file.
# npm ERR! code ELIFECYCLE
# npm ERR! errno 1
# npm ERR! apache-arrow@1.0.0-SNAPSHOT build: `NODE_NO_WARNINGS=1 gulp build`
# npm ERR! Exit status 1
# npm ERR!
# npm ERR! Failed at the apache-arrow@1.0.0-SNAPSHOT build script.
# npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
```
Closes #5589 from kszucs/docker-refactor and squashes the following commits:
5105d12e6 <Krisztián Szűcs> Rename pull-request folder to dev_cron
e9e9a7eec <Krisztián Szűcs> Use underscores for naming the workflow files
a92c99d03 <Krisztián Szűcs> Disable hanging C++ tests on windows
f158c89b5 <Krisztián Szűcs> Attempt to push from apache/arrow master; Don't push from crossbow tasks
0e1d470a1 <Krisztián Szűcs> Turn off ORC on macOS C++ test due to link error
258db5cff <Krisztián Szűcs> Only push docker images from apache/arrow repository
acdfcf086 <Krisztián Szűcs> Remove ORC from the brewfile
5102b85b1 <Krisztián Szűcs> Fix nodeJS workflow
032d6a388 <Krisztián Szűcs> Turn off 2 python builds
7f15b97a8 <Krisztián Szűcs> Filter branches
48b8d128a <Krisztián Szűcs> Fix workflows
36ad9d297 <Krisztián Szűcs> Disable builds
0f603af0c <Krisztián Szűcs> master only and cron workflows
28cc2d78d <Krisztián Szűcs> Rename Java JNI workflow
bcd8af7b7 <Krisztián Szűcs> Port the remaining travis utility scripts
ed5688154 <Krisztián Szűcs> Usage comments; recommend installing pandas from the docs because of its removal from conda_env_python
3c8c023ce <Krisztián Szűcs> Use Arch in volumes; some comments; remove conda version 'latest' from the images
771b023a8 <Krisztián Szűcs> Cleanup files; separate JNI builds
97ff8a122 <Krisztián Szűcs> Push docker images only from master
dc00b4297 <Krisztián Szűcs> Enable path filters
e0e2e1f46 <Krisztián Szűcs> Fix pandas master build
3814e0828 <Krisztián Szűcs> Fix manylinux volumes
c18edda70 <Krisztián Szűcs> Add CentOS version to the manylinux image names
c8b9dd6b1 <Krisztián Szűcs> Missing --pyargs argument for the python test command
33e646981 <Krisztián Szűcs> Turn off gandiva and flight for the HDFS test
b9c547889 <Krisztián Szűcs> Refactor docker-compose file and use it with github actions.
Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
2019-11-12 11:07:48 +01:00
export PYARROW_WITH_PARQUET=${ARROW_PARQUET:-OFF}
2022-03-08 12:00:19 +01:00
export PYARROW_WITH_PARQUET_ENCRYPTION=${PARQUET_REQUIRE_ENCRYPTION:-ON}
export PYARROW_WITH_S3=${ARROW_S3:-OFF}
2022-05-20 13:50:01 -10:00
export PYARROW_WITH_SUBSTRAIT=${ARROW_SUBSTRAIT:-OFF}
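The `PYARROW_WITH_*` exports above rely on the `${VAR:-default}` expansion: each flag takes the value of the corresponding build variable when that variable is set and non-empty, and otherwise falls back to the literal after `:-`. A minimal sketch of that behaviour follows; `resolve_flag` is a hypothetical helper introduced only for illustration and is not part of the script.

```shell
#!/usr/bin/env bash
# Illustration only: resolve_flag is a hypothetical helper, not part of the build script.
# It prints the value that the ${value:-fallback} expansion yields.
resolve_flag() {
  local value="$1" fallback="$2"
  printf '%s\n' "${value:-${fallback}}"
}

# Variable not set at all: the fallback is used.
unset ARROW_PARQUET
parquet_unset="$(resolve_flag "${ARROW_PARQUET:-}" OFF)"

# Variable set but empty: ":-" still picks the fallback.
ARROW_PARQUET=""
parquet_empty="$(resolve_flag "${ARROW_PARQUET}" OFF)"

# Variable set to a real value: that value is passed through.
ARROW_PARQUET=ON
parquet_on="$(resolve_flag "${ARROW_PARQUET}" OFF)"

echo "unset=${parquet_unset} empty=${parquet_empty} set=${parquet_on}"
```

Because the fallback is applied per variable, a caller can enable a single component (for example `ARROW_S3=ON`) without touching the others.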
2025-08-13 06:26:26 +09:00
: "${CMAKE_PREFIX_PATH:=${ARROW_HOME}}"
2024-04-23 13:50:45 +09:00
export CMAKE_PREFIX_PATH
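The `CMAKE_PREFIX_PATH` lines above combine the `:` no-op builtin with the `${VAR:=value}` expansion and then export the result. Unlike `:-`, the `:=` form writes the fallback back into the variable, while a value already supplied by the caller is left untouched. A small sketch, assuming `/opt/arrow` and `/usr/local` as placeholder paths that do not come from the script:

```shell
#!/usr/bin/env bash
# Placeholder paths for illustration; not values taken from the script.
ARROW_HOME=/opt/arrow

# Case 1: CMAKE_PREFIX_PATH is unset, so ":=" assigns ARROW_HOME to it.
unset CMAKE_PREFIX_PATH
: "${CMAKE_PREFIX_PATH:=${ARROW_HOME}}"
export CMAKE_PREFIX_PATH
defaulted="${CMAKE_PREFIX_PATH}"

# Case 2: the caller already provided a value, so ":=" leaves it alone.
CMAKE_PREFIX_PATH=/usr/local
: "${CMAKE_PREFIX_PATH:=${ARROW_HOME}}"
preset="${CMAKE_PREFIX_PATH}"

echo "defaulted=${defaulted} preset=${preset}"
```

Splitting the assignment from the `export` keeps the defaulting step free of side effects other than setting the variable.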
2019-11-12 11:07:48 +01:00
export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
2026-02-09 19:47:45 +01:00
export DYLD_LIBRARY_PATH=${ARROW_HOME}/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
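The `DYLD_LIBRARY_PATH` line uses `${VAR:+:${VAR}}`, which expands to a colon plus the previous value only when the variable is set and non-empty, so no stray trailing separator is produced. The `LD_LIBRARY_PATH` line above appends `:${LD_LIBRARY_PATH}` unconditionally. The sketch below compares the two forms on a throwaway variable; `SEARCH_PATH`, `/opt/arrow` and `/usr/lib` are hypothetical names used only here.

```shell
#!/usr/bin/env bash
# Placeholder values for illustration; not taken from the script.
ARROW_HOME=/opt/arrow

# With an empty previous value, the unconditional form keeps a trailing colon,
# while the ":+" form adds nothing.
SEARCH_PATH=""
unconditional="${ARROW_HOME}/lib:${SEARCH_PATH}"
conditional="${ARROW_HOME}/lib${SEARCH_PATH:+:${SEARCH_PATH}}"

# With a non-empty previous value, the ":+" form inserts the separator.
SEARCH_PATH=/usr/lib
prepended="${ARROW_HOME}/lib${SEARCH_PATH:+:${SEARCH_PATH}}"

echo "unconditional=${unconditional}"
echo "conditional=${conditional}"
echo "prepended=${prepended}"
```

The conditional form is also safe under `set -u`, since `${VAR:+...}` does not fail when the variable is unset.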
|
ARROW-7101: [CI] Refactor docker-compose setup and use it with GitHub Actions
## Projecting ideas from ursabot
### Parametric docker images
The images are better parameterized now, meaning that we can build more variant of the same service. Couple of examples:
```console
UBUNTU=16.04 docker-compose build ubuntu-cpp
ARCH=arm64v8 UBUNTU=18.04 docker-compose build ubuntu-cpp
PYTHON=3.6 docker-compose build conda-python
ARCH=arm32v7 PYTHON=3.6 PANDAS=0.25 docker-compose build conda-python-pandas
```
Each variant has it's own docker image following a string naming schema:
`{org}/{arch}-{platform}-{platform-version}[[-{variant}-{variant-version}]..]:latest`
### Use *_build.sh and *_test.sh for each job
The docker images provide the environment, and each language backend usually should implement two scripts, a `build.sh` and a `test.sh`. This way dependent build like the docker python, r or c glib are able to reuse the build script of the ancestor without running its tests.
With small enough scripts, if the environment is properly set up even the non-docker builds should be reproducible locally. GitHub Actions support bash scripts across all three platforms, so we can reuse the same `*_build.sh` and `*_test.sh` scripts to execute the builds either in docker, on the CI or locally.
## Using GitHub Actions for running the builds
Regardless of the CI we're going to choose, the isolation constraint of different platforms requires some sort of virtualisation. Currently linux (and windows, but I have not tried it yet) has lightweight containerisation, so we should keep the linux builds isolated in docker containers. The rest of the platforms (windows and macOS) should be executed on the CI system.
GitHub Actions support all three major platforms, linux, windows and macOS. I've added cross platform builds for a couple of languages, like Rust, and Go, the rest are work in progress.
### Workflow
A workflow should define all builds of a language, mostly because the path filters can be defined at the workflow level. For example, the Python builds should be triggered if either a cpp/** or a python/** file changes, which can be covered in the same workflow file.
## Feature parity with the current builds
Reaching feature parity with all of the builds below is not a goal for this PR, but the difficult ones should at least have a tracking JIRA ticket.
### Travis-CI
- [x] **Lint, Release tests**:
- `Lint / C++, Python, R, Rust, Docker, RAT`
- `Dev / Source Release`
- [x] **C++ unit tests w/ conda-forge toolchain, coverage**: without coverage
- `C++ / AMD64 Conda C++`
- [x] **Python 3.6 unit tests, conda-forge toolchain, coverage**: without coverage
- `Python / AMD64 Conda Python 3.6`
- [x] **[OS X] C++ w/ Xcode 9.3**:
- `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
- [x] **[OS X] Python w/ Xcode 9.3**:
- `Python / AMD64 MacOS 10.14 Python 3`: with Xcode 10.3
- [x] **Java OpenJDK8 and OpenJDK11**:
- `Java / AMD64 Debian Java JDK 8 Maven 3.5.2`
- `Java / AMD64 Debian Java JDK 11 Maven 3.6.2`
- [x] **Protocol / Flight Integration Tests**:
- `Dev / Protocol Test`
- [x] **NodeJS**: without running lint and coverage
- `NodeJS / AMD64 Debian NodeJS 11`
- [x] **C++ & GLib & Ruby w/ gcc 5.4**:
- `C++ / AMD64 Debian 10 C++`: with GCC 8.3
- `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
- `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
- `C GLib / AMD64 Ubuntu 18.04 C GLib`
- `Ruby / AMD64 Ubuntu 18.04 Ruby`
- [x] **[OS X] C++ & GLib & Ruby w/ XCode 10.2 & Homebrew**
- `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
- `C GLib / AMD64 MacOS 10.14 C Glib`: with Xcode 10.3
- `Ruby / AMD64 MacOS 10.14 Ruby`: with Xcode 10.3
- [x] **Go**: without coverage
- `Go / AMD64 Debian Go 1.12`
- [x] **R (with and without libarrow)**:
- `R / AMD64 Conda R 3.6`: with libarrow
- `R / AMD64 Ubuntu 18.04 R 3.6`: with libarrow
### Appveyor
- ~JOB=Build, GENERATOR=Ninja, CONFIGURATION=Release, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017~
- ~JOB=Toolchain, GENERATOR=Ninja, CONFIGURATION=Release, ARROW_S3=ON, ARROW_BUILD_FLIGHT=ON, ARROW_BUILD_GANDIVA=ON~
- ~JOB=Build_Debug, GENERATOR=Ninja, CONFIGURATION=Debug~
- ~JOB=MinGW32, MINGW_ARCH=i686, MINGW_PACKAGE_PREFIX=mingw-w64-i686, MINGW_PREFIX=c:\msys64\mingw32, MSYSTEM=MINGW32, USE_CLCACHE=false~
- ~JOB=MinGW64, MINGW_ARCH=x86_64, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64, MINGW_PREFIX=c:\msys64\mingw64, MSYSTEM=MINGW64, USE_CLCACHE=false~
- [x] **JOB=Rust, TARGET=x86_64-pc-windows-msvc, USE_CLCACHE=false**:
- `Rust / AMD64 Windows 2019 Rust nightly-2019-09-25`
- [x] **JOB=C#, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017, USE_CLCACHE=false**
- `C# / AMD64 Windows 2019 C# 2.2.103`
- [x] **JOB=Go, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64 ...**:
- `Go / AMD64 Windows 2019 Go 1.12`
- ~JOB=R with libarrow, USE_CLCACHE=false, TEST_R_WITH_ARROW=TRUE, RWINLIB_LOCAL=%APPVEYOR_BUILD_FOLDER%\libarrow.zip~
### Github Actions
- [x] **Windows MSVC C++ / Build (Visual Studio 16 2019)**:
- `C++ / AMD64 Windows 2019 C++`: without tests
- [x] **Windows MSVC C++ / Build (Visual Studio 15 2017)**:
- `C++ / AMD64 Windows 2016 C++`: without tests
- [x] **Linux docker-compose / Test (C++ w/ clang-7 & system packages)**: all have llvm for gandiva but the compiler is set to gcc
- `C++ / AMD64 Debian 10 C++`: with GCC 8.3
- `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
- `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
- [x] **Linux docker-compose / Test (Rust)**: without rustfmt
- `Rust / AMD64 Debian Rust nightly-2019-09-25`
- [x] **Linux docker-compose / Test (Lint, Release tests)**:
- `Lint / C++, Python, R, Rust, Docker, RAT`
- `Dev / Source Release`
### Nightly Crossbow tests
The packaging builds are out of the scope of this PR, but the nightly **dockerized test** tasks are in.
Nightly tests:
- [x] docker-r
- [x] docker-r-conda
- [x] docker-r-sanitizer
- [x] docker-rust
- [x] docker-cpp
- [x] docker-cpp-cmake32
- [x] docker-cpp-release
- [x] docker-cpp-static-only
- [x] docker-c_glib
- [x] docker-go
- [x] docker-python-2.7
- [x] docker-python-3.6
- [x] docker-python-3.7
- [x] docker-python-2.7-nopandas
- [x] docker-python-3.6-nopandas
- [x] docker-java
- [x] docker-js
- [x] docker-docs
- [x] docker-lint
- [x] docker-iwyu: included in the lint
- [x] docker-clang-format: included in the lint
- [x] docker-pandas-master
- [x] docker-dask-integration
- [x] docker-hdfs-integration
- [x] docker-spark-integration
- [x] docker-turbodbc-integration
# TODOs left:
- [x] Fix the Apidoc generation for c_glib
- [x] Fix the JNI test for Gandiva and ORC
- [x] Test that crossbow tests are passing
- ~Optionally restore the travis configuration to incrementally decommission old builds~
## Follow-up JIRAs:
- [Archery] Consider porting the docker tool of ursabot to archery
- [Archery] Consider using archery with or instead of the pre-commit hooks
- [Archery] Create a wrapper script in archery for docker compose in order to run the containers with the host's user and group
- [C++] GCC 5.4.0 has compile errors, reproduce with UBUNTU=16.04 docker-compose run ubuntu-cpp
- [C++][CI] Test the ported fuzzit integration image
- [C++][CI] Turn off unnecessary features in the integration tests (spark/turbodbc/dask/hdfs)
- [C++][CI] Revisit ASAN UBSAN settings in every C++ based image
- [CI] Consider re-adding the removed debian testing image
- [Go][CI] Pre-install the go dependencies in the dockerfile using go get
- [JS][CI] Pre-install the JS dependencies in the dockerfile
- [Rust][CI] Pre-install the rust dependencies in the dockerfile
- [Java][CI] Pre-install the java dependencies in the dockerfile
- [Ruby][CI] Pre-install the ruby dependencies in the dockerfile and remove it from the test script
- [C#][CI] Pre-install the C# dependencies in the dockerfile
- [R][CI] Fix the r-sanitizer build https://issues.apache.org/jira/browse/ARROW-6957
- [GLIB][MacOS] Fail to execute lua examples (fails to load 'lgi.corelgilua51' despite that lgi is installed)
- [C++][CMake] Automatically set ARROW_GANDIVA_PC_CXX_FLAGS for conda and OSX sdk (see cpp_build.sh)
- [C++][CI] Hiveserver2 integration test fails to connect to impala container
- [CI][Spark] Support specific Spark version in the integration test including latest
- [JS][CI] Move nodejs linting from js_build.sh to archery
- [Python][CI] create a docker image for python ASV benchmarks and fix the script
- [CI] Find a short but related prefix for the env vars used for the docker-compose file to prevent collisions
- [C#] the docker container fails to run because of the ubuntu host versions, see https://github.com/dotnet/core/issues/3509
- [C++][Windows] Enable more features on the windows GHA build
- [Doc] document docker-compose usage in the developer sphinx guide
- [CI][C++] Add .ccache to the docker-compose mounts
- [Archery][CI] Refactor the ci/scripts into sourceable bash functions or into archery directly
- [C++][CI] Use scripts/util_coredump.sh to show automatic backtraces
- [C++] Fix the hanging C++ tests in Windows 2019
- [CI] Ask INFRA to set up the DOCKERHUB_* secrets for GitHub actions
- [C++][CI] Running Gandiva tests fails on Fedora:
Reproduce with: `docker-compose run -e ARROW_GANDIVA=ON fedora-cpp`
```
Running gandiva-internals-test, redirecting output into /build/cpp/build/test-logs/gandiva-internals-test.txt (attempt 1/1)
: CommandLine Error: Option 'x86-experimental-vector-widening-legalization' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
/build/cpp/src/gandiva
```
- [JS][CI] NodeJS build fails on Github Actions Windows node
```
> NODE_NO_WARNINGS=1 gulp build
# 'NODE_NO_WARNINGS' is not recognized as an internal or external command,
# operable program or batch file.
# npm ERR! code ELIFECYCLE
# npm ERR! errno 1
# npm ERR! apache-arrow@1.0.0-SNAPSHOT build: `NODE_NO_WARNINGS=1 gulp build`
# npm ERR! Exit status 1
# npm ERR!
# npm ERR! Failed at the apache-arrow@1.0.0-SNAPSHOT build script.
# npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
```
Closes #5589 from kszucs/docker-refactor and squashes the following commits:
5105d12e6 <Krisztián Szűcs> Rename pull-request folder to dev_cron
e9e9a7eec <Krisztián Szűcs> Use underscores for naming the workflow files
a92c99d03 <Krisztián Szűcs> Disable hanging C++ tests on windows
f158c89b5 <Krisztián Szűcs> Attempt to push from apache/arrow master; Don't push from crossbow tasks
0e1d470a1 <Krisztián Szűcs> Turn off ORC on macOS C++ test due to link error
258db5cff <Krisztián Szűcs> Only push docker images from apache/arrow repository
acdfcf086 <Krisztián Szűcs> Remove ORC from the brewfile
5102b85b1 <Krisztián Szűcs> Fix nodeJS workflow
032d6a388 <Krisztián Szűcs> Turn off 2 python builds
7f15b97a8 <Krisztián Szűcs> Filter branches
48b8d128a <Krisztián Szűcs> Fix workflows
36ad9d297 <Krisztián Szűcs> Disable builds
0f603af0c <Krisztián Szűcs> master only and cron workflows
28cc2d78d <Krisztián Szűcs> Rename Java JNI workflow
bcd8af7b7 <Krisztián Szűcs> Port the remaining travis utility scripts
ed5688154 <Krisztián Szűcs> Usage comments; recommend installing pandas from the docs because of its removal from conda_env_python
3c8c023ce <Krisztián Szűcs> Use Arch in volumes; some comments; remove conda version 'latest' from the images
771b023a8 <Krisztián Szűcs> Cleanup files; separate JNI builds
97ff8a122 <Krisztián Szűcs> Push docker images only from master
dc00b4297 <Krisztián Szűcs> Enable path filters
e0e2e1f46 <Krisztián Szűcs> Fix pandas master build
3814e0828 <Krisztián Szűcs> Fix manylinux volumes
c18edda70 <Krisztián Szűcs> Add CentOS version to the manylinux image names
c8b9dd6b1 <Krisztián Szűcs> Missing --pyargs argument for the python test command
33e646981 <Krisztián Szűcs> Turn off gandiva and flight for the HDFS test
b9c547889 <Krisztián Szűcs> Refactor docker-compose file and use it with github actions.
Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
2019-11-12 11:07:48 +01:00

# https://github.com/apache/arrow/issues/41429
# TODO: We want to out-of-source build. This is a workaround. We copy
# all needed files to the build directory from the source directory
# and build in the build directory.
rm -rf "${python_build_dir}"
cp -aL "${source_dir}" "${python_build_dir}"
pushd "${python_build_dir}"
# - Cannot use build isolation as we want to use specific dependency versions
#   (e.g. Numpy, Pandas) on some CI jobs.
${PYTHON:-python} -m pip install --no-deps --no-build-isolation -vv -C cmake.build-type="${CMAKE_BUILD_TYPE:-Debug}" .
popd

if [ "${BUILD_DOCS_PYTHON}" == "ON" ]; then
  # https://github.com/apache/arrow/issues/41429
  # TODO: We want to out-of-source build. This is a workaround.
  #
  # Copy docs/source because the "autosummary_generate = True"
  # configuration generates files to docs/source/python/generated/.
  rm -rf "${python_build_dir}/docs/source"
  mkdir -p "${python_build_dir}/docs"
  cp -a "${arrow_dir}/docs/source" "${python_build_dir}/docs/"
  rm -rf "${python_build_dir}/format"
  cp -a "${arrow_dir}/format" "${python_build_dir}/"
  rm -rf "${python_build_dir}/cpp/examples"
  mkdir -p "${python_build_dir}/cpp"
  cp -a "${arrow_dir}/cpp/examples" "${python_build_dir}/cpp/"
  rm -rf "${python_build_dir}/ci"
  cp -a "${arrow_dir}/ci/" "${python_build_dir}/"
  export ARROW_CPP_DOXYGEN_XML=${build_dir}/cpp/apidoc/xml
  pushd "${build_dir}"
  sphinx-build \
    -j auto \
    -b html \
    "${python_build_dir}/docs/source" \
    "${build_dir}/docs"
  popd
fi