914 Commits

Author SHA1 Message Date
Tian Gao
d51b39b8a2 [SPARK-56186][PYTHON] Retire pypy
### What changes were proposed in this pull request?

We retire PyPy:
* Remove all PyPy-related code in PySpark (the only part that really mattered is the simplified traceback, so it will probably keep working)
* Remove all PyPy skips in tests
* Remove the master CI for PyPy. **branch-4.0 and branch-4.1 tests are kept**
* Remove the PyPy 3.11 docker image (3.10 is kept for testing)
* Remove PyPy from the docs (we should probably do the same for the actual Spark website too)

### Why are the changes needed?

We had a discussion in https://lists.apache.org/thread/glcq0zgr33sozo7y4y7jqph24yh3m92p about dropping support for PyPy, and it received many +1s and no -1s.

`numpy` has dropped support for PyPy, and PyPy is not really under active development anymore.

### Does this PR introduce _any_ user-facing change?

Yes, PyPy is no longer officially supported. We still expect most of the old PyPy code to work, but we should not make any promises.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54988 from gaogaotiantian/retire-pypy.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-03-25 07:48:07 +09:00
Dongjoon Hyun
dc93335c9b [SPARK-56126][INFRA] Sync docker-related GitHub Actions versions to the ASF approved patterns
### What changes were proposed in this pull request?

This PR aims to sync `docker`-related GitHub Actions versions to the ASF approved patterns.

### Why are the changes needed?

Currently, the CI is blocked by the ASF check because of the recent change.
- https://github.com/apache/spark/actions/workflows/build_main.yml
  - https://github.com/apache/spark/actions/runs/23362042477
- https://github.com/apache/spark/actions/workflows/build_non_ansi.yml
  - https://github.com/apache/spark/actions/runs/23369253367

> The actions docker/login-action@v3, docker/setup-qemu-action@v3, docker/setup-buildx-action@v3, and docker/build-push-action@v6 are not allowed in apache/spark because all actions must be from a repository owned by your enterprise, created by GitHub, or match one of the patterns:

<img width="905" height="380" alt="Screenshot 2026-03-20 at 20 32 56" src="https://github.com/user-attachments/assets/2582b68a-6303-44ab-b961-d9b753072f1e" />

This is due to the following change.
- https://github.com/apache/infrastructure-actions/pull/547

As of now, the updated patterns are the following.

- 07f5f9d2b0/approved_patterns.yml (L100-L104)
```
- docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8
- docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9
- docker/metadata-action@c299e40c65443455700f0fdfc63efafe5b349051
- docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f
- docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Checked manually as follows, because the updated CI has to be triggered manually.

```
$ git grep 'uses: docker' | sort | uniq -c
   5 .github/workflows/build_and_test.yml:        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8
   1 .github/workflows/build_and_test.yml:        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9
   1 .github/workflows/build_and_test.yml:        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f
   1 .github/workflows/build_and_test.yml:        uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392
  16 .github/workflows/build_infra_images_cache.yml:        uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8
   1 .github/workflows/build_infra_images_cache.yml:        uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9
   1 .github/workflows/build_infra_images_cache.yml:        uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f
   1 .github/workflows/build_infra_images_cache.yml:        uses: docker/setup-qemu-action@29109295f81e9208d7d86ff1c6c12d2833863392
```

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-opus-4-6)

Closes #54935 from dongjoon-hyun/SPARK-56126.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-03-20 21:40:58 -07:00
Tian Gao
eac3fe36e8 [SPARK-56018][PYTHON] Use ruff as formatter
### What changes were proposed in this pull request?

Replace `black` with `ruff format`.

### Why are the changes needed?

There are a few reasons we should use `ruff`:

1. We already use `ruff` as the linter; using it for `format` too removes a dependency, which makes upgrades easier
2. `ruff` is significantly faster than `black`, which is helpful for our pre-commit hooks
3. `ruff` is more customizable if we ever need that
4. Personally I think the taste of `ruff` is slightly better than `black`'s. For example:
    * `ruff` enforces blank lines around `import`, `class` and `function` better
    * `ruff` will put code back on a single line if it fits
    * `ruff` always uses double quotes when it can

There are some other details you'll notice if you take a look at the diff. I think overall `ruff` generates slightly better code than `black` (and `ruff` is probably a bit stricter than `black`).

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI needs to pass because we removed the black dependency.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54840 from gaogaotiantian/use-ruff-as-formatter.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-03-20 07:20:37 +09:00
yangjie01
cbcee8c2f6 [SPARK-55986][PYTHON] Upgrade black to 26.3.1
### What changes were proposed in this pull request?
This PR aims to upgrade `black` from 23.12.1 to 26.3.1.

### Why are the changes needed?
To fix https://github.com/apache/spark/security/dependabot/172

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #54782 from LuciferYang/black-26.3.1.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-03-16 09:20:05 -07:00
Kent Yao
3126983706 [SPARK-55829][INFRA][FOLLOWUP] Fix static-only CI check skipping all jobs on empty diff
### What changes were proposed in this pull request?

Fixes a bug in #54616 (SPARK-55829) where the static-only CI check incorrectly skips PySpark, SparkR, and TPC-DS jobs on **all** pushes when there are no fork-only changes.

**Root cause**: When `APACHE_SPARK_REF == HEAD` (e.g., syncing fork with upstream), `git diff` returns empty output. The `for` loop runs zero iterations, `static_only` stays `true`, and all jobs are skipped.

**Fix**:
- Default `static_only=false` when diff is empty (no changed files → run everything)
- Only set `static_only=true` when there ARE changed files and ALL match `*/resources/*/static/*`
- Narrowed pattern: removed `*.css|*.js|*.html` (could false-positive on non-static files like Scala test CSS assertions)
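The corrected control flow can be sketched roughly like this (the function and variable names are illustrative, not the exact ones in `build_and_test.yml`):

```shell
# Hedged sketch of the fixed check; names are illustrative.
check_static_only() {
  # $@ = changed files, i.e. the output of `git diff --name-only`
  if [ "$#" -eq 0 ]; then
    echo false   # empty diff (APACHE_SPARK_REF == HEAD): run everything
    return
  fi
  for f in "$@"; do
    case "$f" in
      */resources/*/static/*) ;;   # static UI resource: keep scanning
      *) echo false; return ;;     # any non-static file: full CI
    esac
  done
  echo true                        # every changed file was static
}

check_static_only                                       # empty diff -> false
check_static_only core/src/main/scala/Foo.scala         # -> false
```

With an empty diff this now prints `false`, so all jobs run as normal.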

### Why are the changes needed?

The bug caused all build/test jobs to be skipped on fork master pushes after merging upstream changes: https://github.com/yaooqinn/spark/actions/runs/22678711938

### Does this PR introduce _any_ user-facing change?

No. CI-only fix.

### How was this patch tested?

Verified logic: empty diff → `static_only=false` → all jobs run as normal.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with GitHub Copilot.

Closes #54625 from yaooqinn/SPARK-55829-fix.

Authored-by: Kent Yao <kentyao@microsoft.com>
Signed-off-by: Kent Yao <kentyao@microsoft.com>
2026-03-05 12:48:18 +08:00
Kent Yao
99a9fb3794 [SPARK-55829][INFRA] Skip PySpark/SparkR/TPC-DS CI for static-resource-only changes
### What changes were proposed in this pull request?

When a PR only modifies static UI resources (JS, CSS, HTML files), skip PySpark, SparkR, TPC-DS, and Docker integration tests in CI.

Adds a `static_only` check in the precondition job of `build_and_test.yml`. After `is-changed.py` determines affected modules, it iterates all changed files. If every file matches `*/resources/*/static/*`, `*.css`, `*.js`, or `*.html`, it overrides:
- `pyspark` → `false`
- `pandas` → `false`
- `sparkr` → `false`
- `tpcds` → `false`
- `docker` → `false`

Build and lint jobs still run (needed for compilation check and lint-js).
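The override step can be sketched as follows. This is a simplified illustration under the assumption that the real logic runs in the `precondition` job and writes its overrides to `$GITHUB_OUTPUT`; the function name here is made up:

```shell
# Simplified sketch; the actual workflow writes these lines to $GITHUB_OUTPUT.
emit_overrides() {
  # $@ = changed files; prints "<module>=false" lines when all files are static
  [ "$#" -gt 0 ] || return 0
  for f in "$@"; do
    case "$f" in
      */resources/*/static/*|*.css|*.js|*.html) ;;  # static resource patterns
      *) return 0 ;;                                # non-static file: no override
    esac
  done
  for mod in pyspark pandas sparkr tpcds docker; do
    echo "$mod=false"
  done
}

emit_overrides ui/static/app.js docs/style.css   # prints five "<module>=false" lines
```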

### Why are the changes needed?

UI modernization PRs (SPARK-55760) change only JS/CSS/HTML files but currently trigger the full CI matrix (~30 jobs). PySpark, SparkR, TPC-DS, and Docker tests are completely unaffected by static resource changes. Skipping them saves ~20 CI jobs per UI-only PR.

### Does this PR introduce _any_ user-facing change?

No. Only affects CI pipeline behavior for static-resource-only PRs.

### How was this patch tested?

Logic verified by inspecting `is-changed.py` module system — `core/` changes trigger all downstream modules because `is-changed.py` does not distinguish source code from static resources within a module.

### Was this patch authored or co-authored using generative AI tooling?

Yes, co-authored with GitHub Copilot.

Closes #54616 from yaooqinn/SPARK-55760-ci-skip.

Authored-by: Kent Yao <kentyao@microsoft.com>
Signed-off-by: Kent Yao <kentyao@microsoft.com>
2026-03-05 00:07:13 +08:00
Dongjoon Hyun
eae6ede292 [SPARK-55741][INFRA] Use ubuntu-slim for GitHub Actions precondition and ui jobs
### What changes were proposed in this pull request?

This PR aims to use `ubuntu-slim` for GitHub Actions `precondition` and `ui` job.

### Why are the changes needed?

`ubuntu-slim` is a new cost-efficient runner to fit small jobs like `precondition`. We had better use this to save ASF infra usage (if possible)
- https://github.blog/changelog/2025-10-28-1-vcpu-linux-runner-now-available-in-github-actions-in-public-preview/
- https://github.com/actions/runner-images/blob/main/images/ubuntu-slim/ubuntu-slim-Readme.md

| Feature | `ubuntu-slim` | `ubuntu-latest` |
| :--- | :---: | :---: |
| **Cost (per minute)** | **$0.002** | **$0.012** |
| **Cost for 1,000 mins** | **$2.00** | **$12.00** |
| **CPU** | 1 vCPU | 4 vCPU |
| **Memory (RAM)** | 5 GB | 16 GB |
| **Storage (SSD)** | 14 GB | 14 GB |
| **Maximum Runtime** | **15 Minutes** | 6 Hours |
| **Isolation Type** | Container-based (L2) | Dedicated VM |

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

<img width="365" height="212" alt="Screenshot 2026-02-26 at 21 51 06" src="https://github.com/user-attachments/assets/dbcdb852-5e29-4adc-8846-7f260664f62f" />

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Gemini 3.1 Pro (High)` on `Antigravity`

Closes #54537 from dongjoon-hyun/SPARK-55741.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-27 09:03:18 -08:00
Cheng Pan
b9ad073424 [MINOR][INFRA] Update outdated comment for GHA runner
### What changes were proposed in this pull request?

GHA runners for public repos used to be limited to 2 cores / 7 GB; the limit is now 4 cores / 16 GB.

https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories

### Why are the changes needed?

Update outdated comments.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54541 from pan3793/minor-gha.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2026-02-27 16:34:14 +08:00
Ruifeng Zheng
c15e36b70b [SPARK-55705][PYTHON][INFRA] Upgrade PyArrow to 23
### What changes were proposed in this pull request?
Upgrade PyArrow to 23

### Why are the changes needed?
refresh the images to test against latest pyarrow

### Does this PR introduce _any_ user-facing change?
no, infra-only

### How was this patch tested?
1. For changes in lint/doc/python3-12, the PR builder should cover them;
2. For the other places, we will monitor the scheduled jobs

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #54504 from zhengruifeng/upgrade_pa_23.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-26 09:05:40 -08:00
Cheng Pan
02453b101a [SPARK-55712][INFRA] Allow run benchmark with JDK 25
### What changes were proposed in this pull request?

As title.

### Why are the changes needed?

To enable Java 25 support.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54512 from pan3793/SPARK-55712.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2026-02-26 19:14:44 +08:00
Dongjoon Hyun
1d5681382b [SPARK-55703][K8S][DOCS][INFRA] Upgrade Volcano to 1.14.1
### What changes were proposed in this pull request?

This PR aims to upgrade `Volcano` to 1.14.1 in K8s integration test document and GA job.

### Why are the changes needed?

To use the latest version for testing and documentation for Apache Spark 4.2.0.
- https://github.com/volcano-sh/volcano/releases/tag/v1.14.1
  - https://github.com/volcano-sh/volcano/pull/5041

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Gemini 3.1 Pro (High)` on `Antigravity`

Closes #54502 from dongjoon-hyun/SPARK-55703.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-25 21:21:48 -08:00
Cheng Pan
32c3b666da [SPARK-55678][BUILD][FOLLOWUP] Add Maven Java 25 daily CI and update badges
### What changes were proposed in this pull request?

- Add Maven Java 25 daily CI
- Update pipeline badges in README

### Why are the changes needed?

Address comments https://github.com/apache/spark/pull/54472#issuecomment-3956562070

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Review and monitor CI status after merging.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54474 from pan3793/SPARK-55678-followup.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
2026-02-25 13:53:04 +08:00
Cheng Pan
1d43629af3 [SPARK-55678][BUILD] Add daily test for Java 25
### What changes were proposed in this pull request?

Add daily test for Java 25 by
```
$ cp .github/workflows/build_java21.yml .github/workflows/build_java25.yml
```
and replace 21 with 25.

Don't expect all UTs to pass yet; there are issues to fix.

### Why are the changes needed?

Spark now basically works with JDK 25, and a daily CI helps us find the remaining issues.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Monitor daily test reports after merging.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54472 from pan3793/SPARK-55678.

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
2026-02-25 11:38:36 +08:00
Tian Gao
a8d01a5e18 [SPARK-54868][PYTHON][INFRA][FOLLOWUP] Add PYSPARK_TEST_TIMEOUT to hosted runner test action
### What changes were proposed in this pull request?

Add `PYSPARK_TEST_TIMEOUT` to hosted runner test action.

### Why are the changes needed?

The tests could get stuck on hosted runners:

https://github.com/apache/spark/actions/runs/22286360532

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

action change.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54431 from gaogaotiantian/add-timeout-hosted-runner.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-24 11:30:47 +08:00
Dongjoon Hyun
d2be5d2fab [SPARK-55604][INFRA] Make actions/* GitHub Actions jobs up-to-date
### What changes were proposed in this pull request?

This PR aims to make `actions/*` GitHub Actions jobs up-to-date.

### Why are the changes needed?

To keep the CIs up-to-date.

| Action | Old Version | New Version |
| :--- | :--- | :--- |
| `actions/checkout` | `v4` | **`v6`** |
| `actions/cache` | `v4` | **`v5`** |
| `actions/setup-java` | `v4` | **`v5`** |
| `actions/setup-python` | `v5` | **`v6`** |
| `actions/upload-artifact` | `v4` | **`v6`** |
| `actions/download-artifact` | `v5` | **`v6`** |
| `actions/github-script` | `v7` | **`v8`** |
| `actions/stale` | `c201d45...` (v1.1.0) | **`v10`** |

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Gemini 3 Pro (High)` on `Antigravity`

Closes #54377 from dongjoon-hyun/SPARK-55604.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-19 07:34:11 -08:00
Ruifeng Zheng
eb1673833e [SPARK-55518][INFRA][PYTHON][DOCS] Upgrade Python to 3.12 in doc build
### What changes were proposed in this pull request?
Upgrade Python to 3.12 in doc build

### Why are the changes needed?
1. Upgrade Python to 3.12 in the doc build;
2. Unpin `pyzmq<24.0.0` (introduced in https://github.com/apache/spark/pull/37904) for the Python linter; otherwise the Python installation fails

### Does this PR introduce _any_ user-facing change?
No, infra-only

### How was this patch tested?
CI, the PR builder should cover this change

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #54310 from zhengruifeng/doc_py312.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-02-19 10:13:06 +09:00
Dongjoon Hyun
78ab6e2e2f [SPARK-55594][INFRA] Fix test_report.yml to ignore pages.yml
### What changes were proposed in this pull request?

This PR aims to fix `test_report.yml` to ignore `pages.yml`.

### Why are the changes needed?

Currently, the CIs seem to hit a corner case where the `test_report` job fails on `GitHub Pages deployment` with a `No test results found!` error, like the following. We had better ignore `pages.yml`.

<img width="652" height="200" alt="Screenshot 2026-02-18 at 11 35 32" src="https://github.com/user-attachments/assets/b2849e0f-be90-45b4-9ca9-dabf336b06b5" />

<img width="729" height="189" alt="Screenshot 2026-02-18 at 11 35 52" src="https://github.com/user-attachments/assets/b046a91c-39a6-4373-a0cf-5d09e6f947d1" />

### Does this PR introduce _any_ user-facing change?

No behavior change.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Gemini 3 Pro (High)` on `Antigravity`

Closes #54369 from dongjoon-hyun/SPARK-55594.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-18 15:13:10 -08:00
Dongjoon Hyun
4463ba6edd [SPARK-55511][K8S][DOCS][INFRA] Upgrade Volcano to 1.14.0
### What changes were proposed in this pull request?

This PR aims to upgrade `Volcano` to 1.14.0 in K8s integration test document and GA job.

### Why are the changes needed?

To use the latest version for testing and documentation for Apache Spark 4.2.0.
- https://github.com/volcano-sh/volcano/releases/tag/v1.14.0

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass GA.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Gemini 3 Pro (High)` on `Antigravity`

Closes #54300 from dongjoon-hyun/SPARK-55511.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-12 20:07:16 -08:00
Tian Gao
935e5cd146 [SPARK-55367][PYTHON] Use venv for run-pip-tests
### What changes were proposed in this pull request?

Use `venv` instead of `conda` or `virtualenv` for `run-pip-tests`. Remove the `conda` dependency in our CI.

### Why are the changes needed?

`run-pip-tests` requires a virtual environment, which we used to create with `conda` or `virtualenv`. However, `venv` (https://docs.python.org/3/library/venv.html) has been the recommended way to create a virtual environment since Python 3.5. It's a standard library module, so we don't need any new dependency; it just requires Python to work.

This way we can simply remove the conda part, which has been messing with our CI when it installs the same version of Python as our docker image.
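The venv-based setup boils down to something like this (the environment directory name is illustrative):

```shell
# Minimal venv sketch; no conda or virtualenv needed, just the Python stdlib.
ENV_DIR="${TMPDIR:-/tmp}/pip-test-env"
python3 -m venv "$ENV_DIR"
# Activation is just a PATH tweak; scripts can also call the interpreter directly:
"$ENV_DIR/bin/python" -c 'import sys; print(sys.prefix)'
```

Calling `$ENV_DIR/bin/python` directly avoids any activation step entirely, which is one less thing to interfere with the surrounding CI environment.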

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

I tried it locally and it worked. Let's wait for CI results.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54154 from gaogaotiantian/redo-pip-test.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-11 09:20:31 +08:00
Ruifeng Zheng
6757f78774 [SPARK-55446][PYTHON][INFRA] Upgrade Python to 3.12 in UDS/noANSI/RocksDB tests
### What changes were proposed in this pull request?
Upgrade Python to 3.12 in the UDS/noANSI/RocksDB tests

### Why are the changes needed?
To test against the current major Python version

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
will monitor the scheduled jobs

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #54222 from zhengruifeng/upgrade_rock_uds_312.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-09 17:55:57 +08:00
Ruifeng Zheng
3b5b8a639a [MINOR][INFRA] Use lsb_release -a to display the container os version
### What changes were proposed in this pull request?
Use `lsb_release -a` to display the container os version

### Why are the changes needed?
`uname -a` always shows the host OS version

### Does this PR introduce _any_ user-facing change?
no, infra-only

### How was this patch tested?
Checked in the `lint` job, which is currently based on Ubuntu 22.04.

```
uname -a
lsb_release -a
```

outputs

```
Linux ffe3f795a8eb 6.11.0-1018-azure #18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy
```

https://github.com/zhengruifeng/spark/actions/runs/21808410627/job/62915850607

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #54219 from zhengruifeng/show_container_os.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-09 12:33:25 +08:00
Hyukjin Kwon
ee58e0e175 [SPARK-55433][INFRA] Remove labeler in GitHub Actions
### What changes were proposed in this pull request?

Remove `labeler.yml` from GitHub Actions.

### Why are the changes needed?

There might be a security issue.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Will monitor the CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54216 from HyukjinKwon/SPARK-55433.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-02-09 08:48:38 +09:00
Hyukjin Kwon
0534f98e5f [SPARK-54860][INFRA] Followup of the revert to set the permission correctly
### What changes were proposed in this pull request?

This is a followup of 63f1f96a89 to clean up the revert.

### Why are the changes needed?

Followup of the revert. The revert wasn't clean.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

N/A

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54213 from HyukjinKwon/follow-up-permission.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-02-09 07:58:38 +09:00
Hyukjin Kwon
63f1f96a89 Revert "[SPARK-54860][INFRA] Add JIRA Ticket Validating in GHA"
This reverts commit 678f314c33.
2026-02-09 07:46:50 +09:00
Tian Gao
71f514774b [SPARK-55383][INFRA] Only send test report to codecov in coverage run
### What changes were proposed in this pull request?

Instead of trying to send test report on all main branch commits, we only send it for coverage run.

Also removed an unused coverage token.

### Why are the changes needed?

We tried the test dashboard of codecov - https://app.codecov.io/github/apache/spark/tests but it's not super great.

* It can't parse the xml file to locate the failed test case. Instead it records a failure for the full test suite.
* It does not have a timeline so we don't really know if the test failed recently.
* The deduplication is bad.
* It messed up our coverage report - the backend can't split the test and coverage reports into their own categories.

However, it's not completely useless. The average test times actually help us locate the slowest tests. And it does not mess up the coverage report, as long as each commit has both the coverage report and the test report.

So we can send a report only on the coverage run - a single report each day for both coverage and test speed.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI should pass, hopefully coverage can recover.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54168 from gaogaotiantian/revert-test-report.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-02-09 07:12:11 +09:00
Dongjoon Hyun
de34528883 [SPARK-55423][INFRA] Set strategy.max-parallel to 20 for all GitHub Action jobs
### What changes were proposed in this pull request?

This PR aims to set `strategy.max-parallel` to 20 for all GitHub Action jobs.

### Why are the changes needed?

The ASF Infra team directly requested this via email on the private@spark mailing list.

- https://lists.apache.org/thread/voqz9tp3m8wj00lp0y81n25qgvc90f3q

Here is the `GitHub Actions` syntax.
- https://docs.github.com/en/actions/reference/workflows-and-actions/workflow-syntax#jobsjob_idstrategymax-parallel

<img width="762" height="112" alt="Screenshot 2026-02-07 at 21 09 02" src="https://github.com/user-attachments/assets/770d1b81-390b-49d1-8518-70cb20eb93af" />

### Does this PR introduce _any_ user-facing change?

No Apache Spark behavior change.
- Technically, the PR builder may use more than 20 jobs on the PR contributor's GitHub repo; those jobs will now be limited.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: `Opus 4.5` on `Claude Code`

Closes #54204 from dongjoon-hyun/SPARK-55423.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-07 21:55:31 -08:00
Ruifeng Zheng
8c72eff74b [SPARK-55180][PYTHON][INFRA][FOLLOW-UP] Delete unused yml file
### What changes were proposed in this pull request?
followup of https://github.com/apache/spark/pull/53960, the yml file is not used

### Why are the changes needed?
code clean up

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #54198 from zhengruifeng/del_313t.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-07 14:40:57 -08:00
Ruifeng Zheng
ee324696f9 [SPARK-55358][PYTHON][INFRA] Upgrade Python 3.12 test image to Ubuntu 24.04
### What changes were proposed in this pull request?
Upgrade Python 3.12 test image to Ubuntu 24.04

### Why are the changes needed?
Ubuntu 22.04 is getting out of date; we haven't upgraded the OS version for years.
And we have started to face test failures specific to old versions (https://github.com/apache/spark/pull/54129)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #54130 from zhengruifeng/test_ubuntu_2404.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-06 17:21:57 +08:00
Dongjoon Hyun
f0d9f993fc [SPARK-55386][INFRA] Run Java 17/25 Maven install tests on PR build only
### What changes were proposed in this pull request?

This PR aims to run `Java 17` and `Java 25` Maven **build and install** tests on PR builder only. In other words, it will run on a repository which is not `apache/spark`.

Since we still have daily Maven jobs, it's okay to skip these **build and install** tests.

### Why are the changes needed?

To meet the **ASF Policy**:

**1. 20 JOB RULE**

> All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices.

Currently, our CI seems to trigger more than 20 jobs, although not all of them run at the same time.

<img width="552" height="67" alt="Screenshot 2026-02-05 at 20 49 58" src="https://github.com/user-attachments/assets/38b6733f-7cd4-46ad-948f-25b1dcfcd492" />

<img width="787" height="70" alt="Screenshot 2026-02-05 at 20 38 47" src="https://github.com/user-attachments/assets/2d6d01bc-c522-4d6f-be7c-5cad59520d83" />

**2. RUNNING TIME RULE**

> The average number of minutes a project uses per calendar week MUST NOT exceed the equivalent of 25 full-time runners (250,000 minutes, or 4,200 hours).

https://infra-reports.apache.org/#ghactions&project=spark&hours=168

This PR also reduces the CI cost by skipping the `Java 17 and 25` Maven builds on commit builders.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54170 from dongjoon-hyun/SPARK-55386.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-05 22:01:13 -08:00
Tian Gao
efe9b5335f Revert "[SPARK-55313][PYTHON][FOLLOW-UP] Only add condabin to PATH for pip tests"

This reverts commit 45879b772f.

### What changes were proposed in this pull request?

Revert 45879b7

### Why are the changes needed?

Breaking CI.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Revert

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54151 from gaogaotiantian/revert-conda.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-05 13:05:45 +08:00
Tian Gao
45879b772f [SPARK-55313][PYTHON][FOLLOW-UP] Only add condabin to PATH for pip tests
### What changes were proposed in this pull request?

Instead of adding `$CONDA/bin` to `PATH`, we add `$CONDA/condabin` which only contains `conda`, not the installed `python`.

### Why are the changes needed?

If we add `$CONDA/bin`, it's basically equivalent to activating the conda environment. `python3.12`, the default Python conda installs, will be in that directory, shadowing out our local Python - so we still don't have access to `coverage`.

https://github.com/apache/spark/actions/runs/21667536859/job/62466955323
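The shadowing effect itself is plain PATH precedence, which can be reproduced with a stand-in interpreter (the demo directory and stub are made up for illustration):

```shell
# PATH lookup takes the first match, so a conda-style bin dir shadows
# every later directory that also contains a python3.12.
DEMO="${TMPDIR:-/tmp}/condabin-demo"
mkdir -p "$DEMO/bin"
printf '#!/bin/sh\necho conda-python\n' > "$DEMO/bin/python3.12"
chmod +x "$DEMO/bin/python3.12"
PATH="$DEMO/bin:$PATH" command -v python3.12   # resolves to the demo stub
```

This is why exposing only `$CONDA/condabin` (which contains `conda` but no `python`) keeps the docker image's interpreter, and its installed `coverage`, first in line.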

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54149 from gaogaotiantian/fix-conda-again.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-05 10:59:25 +08:00
Tian Gao
5802a78b7a [SPARK-55313][PYTHON][FOLLOW-UP] Do not auto-activate conda for CI
### What changes were proposed in this pull request?

Set `auto-activate` to `false` for conda action.

### Why are the changes needed?

After reading https://github.com/conda-incubator/setup-miniconda/blob/main/action.yml I think the previous change to `activate-environment` applies to *new* shells, which is also helpful. However, the immediate fix we need is to not activate conda environments in the *current* shell.
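A minimal sketch of the resulting step, assuming the option names documented in `conda-incubator/setup-miniconda`'s `action.yml` (step name and action version are illustrative):

```yaml
- name: Install Conda for pip packaging test
  uses: conda-incubator/setup-miniconda@v3
  with:
    auto-activate: false        # don't activate an env in the *current* shell
    activate-environment: ""    # don't activate an env in *new* shells either
```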

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI should show the result (the previous fix did not fix the coverage run).

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54120 from gaogaotiantian/fix-conda-auto-activate.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-02-04 07:54:58 +09:00
Tian Gao
14bc85248f [SPARK-55313][PYTHON] Do not activate conda environment when installing conda
### What changes were proposed in this pull request?

Use `activate-environment: ""` to avoid activating any conda environment while installing conda.

### Why are the changes needed?

The coverage run is failing: https://github.com/apache/spark/actions/runs/21542850848/job/62079994728. After we bumped the Python version to 3.12, it conflicts with the conda-installed 3.12. Basically we:

* Install conda for pip test
* **Activate conda for python3.12**
* Run tests using `python3.12` which requires `coverage`

The tests accidentally picked up the `python3.12` from the activated conda environment instead of the one from the Docker image, and the conda interpreter does not have the `coverage` package installed.

We don't need to activate a conda environment; we only need the binary. (Actually, if we had that binary in the image we would not even need this action, but we can discuss that later.) It's better not to activate the environment at all so it won't interfere with our testing environment.
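The resolution order at play here can also be illustrated in Python: `shutil.which` resolves a name the same way the shell does, taking the first matching `PATH` entry. The paths below are hypothetical stand-ins for the conda env and the Docker image:

```python
import os
import shutil
import stat
import tempfile

tmp = tempfile.mkdtemp()
conda_bin = os.path.join(tmp, "conda", "bin")   # stand-in for the activated env
image_bin = os.path.join(tmp, "usr", "bin")     # stand-in for the image's python
for d in (conda_bin, image_bin):
    os.makedirs(d)
    exe = os.path.join(d, "python3.12")
    with open(exe, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IEXEC)

# With the conda env "activated" (its bin/ first on PATH), its interpreter
# shadows the image's copy:
path = os.pathsep.join([conda_bin, image_bin])
print(shutil.which("python3.12", path=path))  # resolves inside conda_bin
```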

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This should reflect on coverage CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54093 from gaogaotiantian/deactive-conda.

Authored-by: Tian Gao <gaogaotiantian@hotmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-02-03 13:50:43 +08:00
yangjie01
30ace9fc41 [SPARK-55309][BUILD] Upgrade protobuf to 33.5
### What changes were proposed in this pull request?
This pr aims to upgrade protobuf from 33.0 to 33.5:
- For Java, upgrading from version 4.33.0 to 4.33.5
- For Python, upgrading from version 6.33.0 to 6.33.5

### Why are the changes needed?
The new version brings a fix for CVE-2026-0994; the full release notes are as follows:

- https://github.com/protocolbuffers/protobuf/releases/tag/v33.5
- https://github.com/protocolbuffers/protobuf/releases/tag/v33.4
- https://github.com/protocolbuffers/protobuf/releases/tag/v33.3
- https://github.com/protocolbuffers/protobuf/releases/tag/v33.2
- https://github.com/protocolbuffers/protobuf/releases/tag/v33.1

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #54090 from LuciferYang/protobuf-33.5.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
2026-02-03 13:31:43 +08:00
Dongjoon Hyun
d3dc60299e [SPARK-55307][K8S][INFRA] Update setup-minikube to v0.0.21
### What changes were proposed in this pull request?

This PR aims to update `setup-minikube` to the latest version v0.0.21.

### Why are the changes needed?

To use the latest one.
- https://github.com/medyagh/setup-minikube/releases/tag/v0.0.21
  - https://github.com/medyagh/setup-minikube/pull/779
  - https://github.com/medyagh/setup-minikube/pull/712

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54087 from dongjoon-hyun/SPARK-55307.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-02-01 23:05:50 -08:00
Kent Yao
0aeb6f9f49 [SPARK-55286][INFRA] Add test summary to GitHub Actions for better failure visibility
### What changes were proposed in this pull request?

This PR adds `test-summary/actionv2` to GitHub Actions workflows to display test failures directly in the job summary.

**Jobs updated:**
- `build` - Scala/Java unit tests
- `pyspark` - Python tests
- `sparkr` - R tests
- `tpcds` - TPC-DS benchmark tests
- `docker-integration-tests` - Docker integration tests

The action parses JUnit XML test reports and generates a summary table showing:
- Failed tests grouped by class/suite name
- Error messages and stack traces
- Pass/fail/skip statistics
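A hedged sketch of what such a step could look like (the step name and the `paths` glob are illustrative, not the exact values used in the workflows):

```yaml
- name: Report test results
  if: always()   # emit the summary even when earlier steps fail
  uses: test-summary/action@v2
  with:
    paths: "**/target/test-reports/*.xml"   # JUnit XML reports (illustrative glob)
```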

### Why are the changes needed?

Currently, GitHub Actions CI generates verbose logs that make it hard to find test failures quickly. Developers have to scroll through extensive log output to identify which tests failed.

With this change, test failures appear directly in the GitHub Actions workflow summary, making it easy to identify failures at a glance.

**Example output:**
| Result | Test |
|--------|------|
| ❌ | org.apache.spark.sql.SomeTestSuite › testMethod1 |
| ❌ | org.apache.spark.sql.SomeTestSuite › testMethod2 |

### Does this PR introduce _any_ user-facing change?

No. This only affects the CI/CD workflow display.

### How was this patch tested?

- YAML syntax validated locally
- The `test-summary/action` is a well-maintained GitHub Action with 1000+ stars
- Uses `if: always()` to ensure summary is generated even when tests fail

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: GitHub Copilot

Closes #54069 from yaooqinn/SPARK-55286-test-summary.

Authored-by: Kent Yao <kentyao@microsoft.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-02-02 06:54:54 +09:00
Ruifeng Zheng
545d9e73a4 [SPARK-55287][INFRA] Consolidate steps in lint
### What changes were proposed in this pull request?
consolidate steps in lint

### Why are the changes needed?
to improve readability

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #54070 from zhengruifeng/lint_py312.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-02-02 06:47:45 +09:00
Yicong-Huang
9254e8912a [SPARK-55263][PYTHON][INFRA] Upgrade Python linter from 3.11 to 3.12 in CI
### What changes were proposed in this pull request?

Upgrade Python linter from Python 3.11 to Python 3.12 in CI to match the test environment.

### Why are the changes needed?

The CI test environment has been upgraded to use Python 3.12 as the default version, but the Python linter was still using Python 3.11, creating an inconsistency.

### Does this PR introduce _any_ user-facing change?

No. This is an internal CI infrastructure change that does not affect end users.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54042 from Yicong-Huang/SPARK-55263/feat/upgrade-python-linter-to-3.12.

Authored-by: Yicong-Huang <17627829+Yicong-Huang@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-01-30 09:35:56 +09:00
Ruifeng Zheng
3e07518ea2 Revert "[SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Mistake Commit"
revert a mistake commit

Closes #54050 from zhengruifeng/revert_mistake.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-29 10:59:05 +08:00
Ruifeng Zheng
44f61d5117 [SPARK-54943][PYTHON][TESTS][FOLLOW-UP] Disable test_pyarrow_array_cast
### What changes were proposed in this pull request?
Disable `test_pyarrow_array_cast`

### Why are the changes needed?
it is failing all scheduled jobs

### Does this PR introduce _any_ user-facing change?
no, test-only

### How was this patch tested?
will monitor the workflows

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #54049 from zhengruifeng/test_ubuntu_24.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-29 10:55:12 +08:00
Hyukjin Kwon
3092a77626 [SPARK-55265][INFRA] Increase timeout to 150 mins in GitHub Actions
### What changes were proposed in this pull request?

Increases timeout to 150 mins in GitHub Actions.

### Why are the changes needed?

Some jobs started to fail:
https://github.com/apache/spark/actions/runs/21453237849/job/61787264414. The failures seem legitimate. We should probably think about splitting the jobs further, but for now increasing the timeout unblocks other PRs.
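For reference, GitHub Actions sets this limit per job with `timeout-minutes` (the job name below is a placeholder):

```yaml
jobs:
  build:
    timeout-minutes: 150   # raised from the previous limit to avoid cancellations
```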

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Not tested.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #54045 from HyukjinKwon/SPARK-55265.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
2026-01-29 07:19:41 +09:00
yangjie01
589fedc9b2 [SPARK-55251][INFRA] Make Python Coverage test with Python 3.12
### What changes were proposed in this pull request?
This PR aims to make the daily `Python Coverage` test run with Python 3.12.

### Why are the changes needed?
`Python Coverage` should be tested with Python 3.12.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Monitor after this pr merged.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #54027 from LuciferYang/SPARK-55251.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-28 15:38:31 +08:00
yangjie01
c4f62d459f [SPARK-55191][INFRA][DOCS] Adjust the Python version used in the Python-only daily test
### What changes were proposed in this pull request?
This pr aims to adjust the Python version used in the Python-only daily test:
1. Adjust the daily tests using `python_hosted_runner_test.yml` to Python 3.12, including the Python ARM test and Python macOS 26 test.
2. Update the Python classic-only test to Python 3.12 and modify the Dockerfile generation process.
3. Update the Python-only test from Python 3.12 to Python 3.11 to ensure at least one Python-only daily test is validating Python 3.11.

### Why are the changes needed?
Adjust the Python version used in the Python-only daily test

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass Github Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53973 from LuciferYang/SPARK-55191.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-26 15:44:44 +08:00
yangjie01
718cbf54b3 [SPARK-55184][INFRA] Upgrade Python to 3.12 for Maven's daily testing
### What changes were proposed in this pull request?
This pr aims to upgrade Python to 3.12 for Maven's daily testing

### Why are the changes needed?
Keep it consistent with the daily test of sbt.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- test with maven : https://github.com/LuciferYang/spark/actions/runs/21344673830

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53966 from LuciferYang/maven-daily-py312.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
2026-01-26 13:57:49 +08:00
Ruifeng Zheng
823fd431a1 [SPARK-55180][PYTHON][INFRA] Remove scheduled job python 3.13 with no gil
### What changes were proposed in this pull request?
Remove scheduled job python 3.13 with no gil

### Why are the changes needed?
No-GIL Python is still experimental and not compatible with many PySpark features. Since we added a Python 3.14 no-GIL job, it is not necessary to keep both.

### Does this PR introduce _any_ user-facing change?
No, infra-only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #53960 from zhengruifeng/del_313_no_gil.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-26 13:55:14 +08:00
Ruifeng Zheng
357c780d39 [SPARK-55149][PYTHON][INFRA] Upgrade python to 3.12 in SQL tests
### What changes were proposed in this pull request?
Upgrade python to 3.12 in SQL tests

### Why are the changes needed?
to be consistent with python tests

### Does this PR introduce _any_ user-facing change?
no, infra-only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #53932 from zhengruifeng/py312_sql.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
2026-01-24 14:37:38 +09:00
Wenchen Fan
655787cae1 [SPARK-55115][INFRA] Use master branch's base Dockerfile for release builds
### What changes were proposed in this pull request?

Modified the release GitHub Actions workflow to always use master branch's `Dockerfile.base` when building the release Docker image, instead of using the `Dockerfile.base` from the release branch.

### Why are the changes needed?

Old branch Dockerfiles can become unmaintainable over time due to:

- Expired GPG keys (e.g., Node.js 12 repository keys)
- Outdated base images with broken package dependencies
- Package version conflicts with aging OS versions (e.g., Ubuntu 20.04)

The master branch's `Dockerfile.base` is actively maintained and uses modern base images (Ubuntu 22.04), making release builds more reliable. Only the base Dockerfile is pulled from master, while the main `Dockerfile` remains from the release branch to preserve any release-specific configurations.
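The underlying git mechanics can be sketched in a throwaway repository (file contents and branch names below are illustrative): `git checkout <branch> -- <path>` takes a single file from another branch while leaving the rest of the working tree untouched.

```shell
# Build a tiny repo with a master branch and an "old" release branch,
# then pull only Dockerfile.base from master into the release branch.
repo=$(mktemp -d)
cd "$repo"
git init -q -b master .
git config user.email release@example.com
git config user.name release
echo "FROM ubuntu:22.04" > Dockerfile.base
echo "release-branch Dockerfile" > Dockerfile
git add . && git commit -qm "master"
git checkout -qb branch-3.5
echo "FROM ubuntu:20.04" > Dockerfile.base
git commit -qam "older base image"

# Still on the release branch: take only Dockerfile.base from master.
git checkout master -- Dockerfile.base
cat Dockerfile.base   # FROM ubuntu:22.04
cat Dockerfile        # release-branch Dockerfile
```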

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested locally for all the active branches with all the release steps (except for uploading)

### Was this patch authored or co-authored using generative AI tooling?

Yes. cursor 2.3.41

Closes #53890 from cloud-fan/use-master-dockerfile-for-release.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
2026-01-23 17:50:06 +08:00
yangjie01
3f1c9a37a5 [SPARK-55128][INFRA][FOLLOWUP] Restore SQL tests by pin 'pandas==2.3.3' for maven daily test
### What changes were proposed in this pull request?
Similar to https://github.com/apache/spark/pull/53910, this pr pins the pandas version to 2.3.3.

### Why are the changes needed?
To restore the SQL tests in the Maven daily test.
- https://github.com/apache/spark/actions/runs/21249870076/job/61148348328

```
- udf/postgreSQL/udf-case.sql - Scalar Pandas UDF *** FAILED ***
  udf/postgreSQL/udf-case.sql - Scalar Pandas UDF
  Python: 3.11 Pandas: 3.0.0 PyArrow: 23.0.0
  Expected Some("struct<Two:string,i:int,f:double,i:int,j:int>"), but got Some("struct<>") Schema did not match for query #30
  SELECT '' AS `Two`, *
    FROM CASE_TBL a, CASE2_TBL b
    WHERE udf(COALESCE(f,b.i) = 2): -- !query
  SELECT '' AS `Two`, *
    FROM CASE_TBL a, CASE2_TBL b
    WHERE udf(COALESCE(f,b.i) = 2)
  -- !query schema
  struct<>
  -- !query output
  org.apache.spark.SparkRuntimeException
  {
    "errorClass" : "CAST_INVALID_INPUT",
    "sqlState" : "22018",
    "messageParameters" : {
      "ansiConfig" : "\"spark.sql.ansi.enabled\"",
      "expression" : "'nan'",
      "sourceType" : "\"STRING\"",
      "targetType" : "\"BOOLEAN\""
    },
    "queryContext" : [ {
      "objectType" : "",
      "objectName" : "",
      "startIndex" : 62,
      "stopIndex" : 85,
      "fragment" : "udf(COALESCE(f,b.i) = 2)"
    } ]
  } (SQLQueryTestSuite.scala:681)
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
monitor maven daily test after pr merged

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53933 from LuciferYang/SPARK-55128-FOLLOWUP.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-23 16:07:36 +08:00
Ruifeng Zheng
b06e9c5466 [SPARK-55142][PYTHON][INFRA] Apply Python 3.12 for PySpark Tests in PR build
### What changes were proposed in this pull request?
Apply Python 3.12 in PR build for pyspark tests

Note: doc, lint, and non-Python tests are excluded for now; they need to be upgraded in separate PRs.

### Why are the changes needed?
It has been more than one year since we upgraded it to Python 3.11 (3f8e395910, in Spark 4.0).

### Does this PR introduce _any_ user-facing change?
No, infra-only

### How was this patch tested?
PR build

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #53927 from zhengruifeng/upgrade_py_312.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-23 14:47:36 +08:00
Ruifeng Zheng
8b26f49a22 [SPARK-55141][PYTHON][INFRA] Set up a scheduled workflow for Pandas 3
### What changes were proposed in this pull request?
Set up a scheduled builder for Pandas 3

### Why are the changes needed?
For development purposes: to monitor how well PySpark is compatible with Pandas 3.

### Does this PR introduce _any_ user-facing change?
no, infra-only

### How was this patch tested?
test the image build with PR builder

this image is successfully built in https://github.com/zhengruifeng/spark/actions/runs/21272805063/job/61226373282

```
Successfully installed contourpy-1.3.3 coverage-7.13.1 cycler-0.12.1 et-xmlfile-2.0.0 fonttools-4.61.1 googleapis-common-protos-1.71.0 graphviz-0.20.3 grpcio-1.76.0 grpcio-status-1.76.0 joblib-1.5.3 kiwisolver-1.4.9 lxml-6.0.2 matplotlib-3.10.8 memory-profiler-0.61.0 numpy-2.4.1 openpyxl-3.1.5 packaging-26.0 pandas-3.0.0 pillow-12.1.0 plotly-5.24.1 protobuf-6.33.0 psutil-7.2.1 pyarrow-23.0.0 pyparsing-3.3.2 python-dateutil-2.9.0.post0 scikit-learn-1.8.0 scipy-1.17.0 tenacity-9.1.2 threadpoolctl-3.6.0 typing-extensions-4.15.0 unittest-xml-reporting-4.0.0 zstandard-0.25.0
```

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #53926 from zhengruifeng/infra_pandas_3.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
2026-01-23 11:48:55 +08:00