165 Commits

Author SHA1 Message Date
Divyashree Sreepathihalli
7af6ebf970 Delete .github/workflows/gemini-pr-re-fix.yml (#22746) 2026-04-21 13:54:32 -07:00
hertschuh
47dd150c64 Add a timeout to all test workflows. (#22675)
Apparently self-hosted runners do not have a timeout by default. I'm seeing some tests hanging in https://github.com/keras-team/keras/pull/22664
2026-04-13 17:54:13 -07:00
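A minimal sketch of a job-level timeout in a GitHub Actions workflow. The job name, runner labels, test command and the 60-minute value are hypothetical placeholders, not the values used in the Keras workflows.

```yaml
jobs:
  test:  # hypothetical job name
    runs-on: [self-hosted, linux]  # hypothetical runner labels
    # Fail the job instead of letting a hanging test hold the runner indefinitely.
    timeout-minutes: 60  # placeholder value
    steps:
      - uses: actions/checkout@v6
      - run: pytest keras  # hypothetical test command
```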
hertschuh
df0ec01329 Silence git security error with recent git. (#22657)
Fixes `fatal: detected dubious ownership in repository at '/__w/keras/keras'`
2026-04-08 18:10:16 -07:00
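One common way to silence this error is to mark the workspace as a safe directory before running git commands. This is a sketch, assuming a container job in which `$GITHUB_WORKSPACE` resolves to the path from the error message (`/__w/keras/keras`); the step name is hypothetical and the actual fix in #22657 may differ.

```yaml
steps:
  # Mark the checked-out workspace as safe so that recent git versions do not
  # abort with "fatal: detected dubious ownership in repository".
  - name: Mark workspace as a safe git directory
    run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
```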
hertschuh
cda104c1ca Fix CPU tests with CPU runners. (#22656)
`git` needs to be installed first, before the `checkout` and `filter` actions.
2026-04-08 17:22:46 -07:00
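The ordering constraint can be sketched as follows. The runner labels, the `python:3.11-slim` container (chosen only as an example of an image without git) and the path filter are hypothetical assumptions, not the Keras configuration.

```yaml
jobs:
  test:
    runs-on: [self-hosted, linux]  # hypothetical runner labels
    container: python:3.11-slim  # hypothetical image that ships without git
    steps:
      # Install git before any step that relies on it.
      - name: Install git
        run: apt-get update && apt-get install -y git
      - uses: actions/checkout@v6
      - uses: dorny/paths-filter@v4.0.1
        id: filter
        with:
          filters: |
            keras:
              - 'keras/**'
```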
hertschuh
7e441013c5 Use self-hosted runners for all CPU tests. (#22653)
Also made the file more consistent, both internally and with the GPU and TPU test files.

The multi-CPU tests are now combined with the normal JAX tests.
2026-04-08 16:48:16 -07:00
Divyashree Sreepathihalli
61fbd50526 Fix auto fix PR workflow (#22652)
* fix: Update Gemini PR Re-Fix Workflow

* remove set up comment

* fix workflow
2026-04-07 16:54:09 -07:00
Divyashree Sreepathihalli
e9c22eb856 Make Gemini create PRs from own fork (#22651)
* fix: Update Gemini PR Re-Fix Workflow

* remove set up comment
2026-04-07 16:32:04 -07:00
Divyashree Sreepathihalli
86d7deacb0 Fix security issue (#22644)
* fix security issue

* additional fixes
2026-04-06 20:55:46 -07:00
Divyashree Sreepathihalli
e083a7027a Add contributor agreement check (#22633)
* add autoclose feature to triage workflow

* Add PR template and check for PR contributions

* why

* remove wrong repo name
2026-04-03 14:54:20 -07:00
Aditya Goyal
c04b5e49d2 [OpenVINO] Move file level exclusions to test level exclusions (#22630)
* [OpenVINO] Move file level exclusions to test level exclusions

* modify gha to remove file level exclusion mechanism

* remove test file

---------

Co-authored-by: hertschuh <1091026+hertschuh@users.noreply.github.com>
2026-04-03 14:37:06 -07:00
Divyashree Sreepathihalli
488fc478f6 add autoclose feature to triage workflow (#22632) 2026-04-03 14:28:41 -07:00
Divyashree Sreepathihalli
050671c8cf Keras Automations : Refine auto fix to add unit tests (#22624)
* auto fix updates

* fix label

* fix auto fix workflow

* refine autofix pr

* add prompt to add unit tests
2026-04-02 14:30:23 -07:00
Divyashree Sreepathihalli
05bbdae31b Keras Automations: Refine Auto fix (#22623)
* auto fix updates

* fix label

* fix auto fix workflow

* refine autofix pr
2026-04-02 14:12:38 -07:00
Divyashree Sreepathihalli
b8ba027f4f Keras Automations: Auto fix improve (#22613)
* auto fix updates

* fix label

* fix auto fix workflow
2026-04-01 18:54:50 -07:00
Divyashree Sreepathihalli
d1e5f1c6d5 Keras automation: fix label name (#22611)
* auto fix updates

* fix label
2026-04-01 18:38:40 -07:00
Divyashree Sreepathihalli
a44d76e7cb auto fix updates (#22610) 2026-04-01 18:29:50 -07:00
hertschuh
8789611f53 Updates to dependabot.yml. (#22606)
- We can unpin `jax[and-cuda]` now that we've migrated to github runners for GPUs
- We have unpinned `ai-edge-litert` in the requirements files
- We do need to pin `tensorflow[and-cuda]` to 2.20.0 as 2.21 doesn't work with our setup

This is to prevent PRs like this: https://github.com/keras-team/keras/pull/22604
2026-04-01 11:26:28 -07:00
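A sketch of how such a pin can be expressed in `dependabot.yml` with an `ignore` rule. The directory and schedule are placeholders, and it assumes Dependabot matches `tensorflow[and-cuda]` by its base package name `tensorflow`; the actual rules in #22606 may be written differently.

```yaml
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"  # placeholder
    schedule:
      interval: "weekly"  # placeholder
    ignore:
      # Hypothetical rule: stay on TensorFlow 2.20.x, since 2.21 does not
      # work with the GPU test setup according to #22606.
      - dependency-name: "tensorflow"
        versions: [">=2.21"]
```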
dependabot[bot]
7537463bb9 Bump the github-actions group with 6 updates (#22603)
Bumps the github-actions group with 6 updates:

| Package | From | To |
| --- | --- | --- |
| [actions/checkout](https://github.com/actions/checkout) | `4` | `6` |
| [dorny/paths-filter](https://github.com/dorny/paths-filter) | `3.0.2` | `4.0.1` |
| [codecov/codecov-action](https://github.com/codecov/codecov-action) | `5.5.3` | `6.0.0` |
| [actions/github-script](https://github.com/actions/github-script) | `7` | `8` |
| [google-github-actions/run-gemini-cli](https://github.com/google-github-actions/run-gemini-cli) | `0.1.11` | `0.1.21` |
| [github/codeql-action](https://github.com/github/codeql-action) | `4.32.4` | `4.35.1` |


Updates `actions/checkout` from 4 to 6
- [Release notes](https://github.com/actions/checkout/releases)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

Updates `dorny/paths-filter` from 3.0.2 to 4.0.1
- [Release notes](https://github.com/dorny/paths-filter/releases)
- [Changelog](https://github.com/dorny/paths-filter/blob/master/CHANGELOG.md)
- [Commits](de90cc6fb3...fbd0ab8f3e)

Updates `codecov/codecov-action` from 5.5.3 to 6.0.0
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](1af58845a9...57e3a136b7)

Updates `actions/github-script` from 7 to 8
- [Release notes](https://github.com/actions/github-script/releases)
- [Commits](https://github.com/actions/github-script/compare/v7...v8)

Updates `google-github-actions/run-gemini-cli` from 0.1.11 to 0.1.21
- [Release notes](https://github.com/google-github-actions/run-gemini-cli/releases)
- [Changelog](https://github.com/google-github-actions/run-gemini-cli/blob/main/CHANGELOG.md)
- [Commits](a3bf790425...9dbec29a20)

Updates `github/codeql-action` from 4.32.4 to 4.35.1
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](89a39a4e59...c10b8064de)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: dorny/paths-filter
  dependency-version: 4.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: codecov/codecov-action
  dependency-version: 6.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: actions/github-script
  dependency-version: '8'
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: google-github-actions/run-gemini-cli
  dependency-version: 0.1.21
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-version: 4.35.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-01 11:01:31 -07:00
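PRs like this one, which bundle several action updates, come from a Dependabot `groups` entry. A minimal sketch, assuming a single group named `github-actions` that matches every action; the schedule is a placeholder.

```yaml
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "monthly"  # placeholder
    groups:
      # Collect every action update into one PR named after this group.
      github-actions:
        patterns:
          - "*"
```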
hertschuh
28a83c53a1 Fix workflow concurrency for tests. (#22597)
Test runs are not reliably kept for pulls into `master`: when many PRs are submitted to `master` back to back, tests are not run for some of them: https://github.com/keras-team/keras/commits/master/

This came from a misunderstanding of `cancel-in-progress`. `cancel-in-progress: true` means that any running job is immediately cancelled and replaced by the new one. `cancel-in-progress: false` means that there can be one running job and one pending job, but if more jobs are queued, the existing pending job is cancelled.

For pulls we don't want to cancel any of them. The way to achieve this is by not having the same `group`. This is done by putting `github.run_id`, which is unique, in the group.

We can now put `cancel-in-progress: true` since pulls will never be deduped. That's because `github.head_ref` is only populated for `pull_request`.
2026-03-31 18:24:08 -07:00
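The approach described above is commonly written as the following `concurrency` block. This is a sketch of the pattern; the exact group string in the Keras workflows may differ.

```yaml
concurrency:
  # For pull_request events, github.head_ref is the PR branch, so a new push to
  # the same PR joins the existing group and cancels the older run.
  # For pushes to master, github.head_ref is empty, so the unique
  # github.run_id is used instead and no run is ever cancelled.
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
```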
dagecko
cff7c7dcc8 fix: pin 4 unpinned action(s) (#22555) 2026-03-31 12:10:17 -07:00
Divyashree Sreepathihalli
8a941a4872 Keras automation: Enhance JSON parsing and error handling in workflow 2026-03-27 22:21:54 -07:00
Divyashree Sreepathihalli
1754de16ac Delete .github/workflows/gemini-issue-auto-fix.yml 2026-03-27 22:03:10 -07:00
Divyashree Sreepathihalli
37d6ec48d1 Keras automation: reset to working version (#22584)
* Enhance Gemini issue triage workflow

Updated the issue triage workflow to allow multiple labels and added a step to create the .gemini directory.

* fix issue triage

* reset to working version
2026-03-27 22:00:30 -07:00
Divyashree Sreepathihalli
5c38bb8752 Keras automation: Fix issue triage github wf (#22583)
* Enhance Gemini issue triage workflow

Updated the issue triage workflow to allow multiple labels and added a step to create the .gemini directory.

* fix issue triage
2026-03-27 21:50:57 -07:00
Divyashree Sreepathihalli
4882ddd4d0 Keras automations: decouple issue triage and automatic PR fix (#22581)
* auto generate fix PR

* fix error in workflow

* fix github flow

* fix github actions

* decouple PR and issue triage GH workflow
2026-03-27 21:02:25 -07:00
Divyashree Sreepathihalli
2407d7d859 Keras Automations: fix gh actions issue with Auto generate PR (#22580)
* auto generate fix PR

* fix error in workflow

* fix github flow

* fix github actions
2026-03-27 20:44:17 -07:00
Divyashree Sreepathihalli
964cc685a6 Keras Automation : fix workflow (#22579)
* auto generate fix PR

* fix error in workflow

* fix github flow
2026-03-27 20:09:53 -07:00
Divyashree Sreepathihalli
ce2efc12c7 Keras Automation : Fix Issue triage workflow (#22578)
* auto generate fix PR

* fix error in workflow
2026-03-27 20:03:52 -07:00
Divyashree Sreepathihalli
f26befae97 auto generate fix PR (#22577) 2026-03-27 19:54:56 -07:00
Divyashree Sreepathihalli
1ecfc641e3 Automate issue triage : Allow applying multiple labels (#22576)
* add automated-issue triage

* add gemini key

* automate response

* fix error in github actions

* allow applying multiple labels
2026-03-27 16:10:16 -07:00
Divyashree Sreepathihalli
135d6f9344 Automate issue triage - fix gh action error (#22575)
* add automated-issue triage

* add gemini key

* automate response

* fix error in github actions
2026-03-27 15:57:47 -07:00
Divyashree Sreepathihalli
a0a7c48794 Automate issue response (#22574)
* add automated-issue triage

* add gemini key

* automate response
2026-03-27 13:39:42 -07:00
Divyashree Sreepathihalli
7e33eaf024 Modify Gemini API key (#22573)
Updated Gemini automated issue triage workflow settings.
2026-03-27 13:24:42 -07:00
Divyashree Sreepathihalli
4181daca3b add automated-issue triage (#22572) 2026-03-27 11:34:23 -07:00
hertschuh
a2e97e1890 Revert workflows to be able to run GPU/TPU tests manually. (#22558)
Problem: using `pull_request_target` causes `actions/checkout` to retrieve the code from the `master` branch, thus not checking the code from the PR.

Solution: revert back to `pull_request`.

Problem: the `trigger_gpu_tpu_tests.yml` doesn't work, but makes it nearly impossible to trigger the tests manually.

Solution: remove them for now until we find a working solution.

Also simplified the `cancel-in-progress:` condition.
2026-03-26 21:47:44 -07:00
hertschuh
17581cf090 Configure concurrency on github test workflows. (#22553)
Problem: if the tests are triggered multiple times, for instance when pushing an update to the PR, the old tests are not cancelled. The runners are consumed for tests that are no longer useful.

Solution: configure the `concurrency` of the workflows to cancel the runs on PRs, but not on master or on releases.
2026-03-26 16:02:09 -07:00
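A sketch of a conditional `cancel-in-progress`, assuming the group is keyed on `github.ref`; the group string is an assumption, not necessarily what #22553 used.

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel superseded runs only for pull requests; runs on master and on
  # release branches are allowed to finish.
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}
```

With this grouping, runs on `master` still share a single group, so a pending run can be replaced by a newer one. #22597 later addressed exactly this by switching to a unique `github.run_id` for non-PR runs.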
hertschuh
f18fca83a0 Use pull_request_target event for GPU and TPU tests. (#22551)
This change https://github.com/keras-team/keras/pull/22504 causes the `kokoro:force-run` label to be removed automatically. However, unlike removing the label manually, it does not trigger the GPU / TPU tests.
2026-03-26 13:39:45 -07:00
hertschuh
6ce18fb22c Run multi-device tests on our 4 TPU machines. (#22492)
We have runners with 4 TPUs.
2026-03-26 11:34:41 -07:00
hertschuh
1d1b02190f Fix incorrect call to removeLabel. (#22508)
Follow-up to https://github.com/keras-team/keras/pull/22504
2026-03-25 16:51:01 -07:00
hertschuh
014a571391 Add delay before removing kokoro:force-run label. (#22507)
This is a follow-up to https://github.com/keras-team/keras/pull/22504

The workflow is currently failing with a `Label does not exist` error.

Trying to add a delay in case this is a race condition and the labels haven't fully been updated yet.
2026-03-25 16:41:30 -07:00
hertschuh
ea5584baa3 Replace Kokoro app unlabeling behavior with github workflow. (#22504)
Kokoro used to automatically remove the `kokoro:force-run` label to trigger the tests. This no longer works as we don't have any Kokoro tests anymore.

This github workflow has the same behavior.
2026-03-25 15:02:13 -07:00
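A minimal sketch of such a label-removing workflow using `actions/github-script`. The job name, runner, permissions and the 5-second delay (mirroring the delay tried in #22507) are assumptions. For pull requests from forks, the token of a `pull_request` run is read-only, so this sketch would only work for branches in the same repository.

```yaml
on:
  pull_request:
    types: [labeled]

jobs:
  remove-force-run-label:  # hypothetical job name
    if: github.event.label.name == 'kokoro:force-run'
    runs-on: ubuntu-latest
    permissions:
      issues: write
      pull-requests: write
    steps:
      - uses: actions/github-script@v8
        with:
          script: |
            // Short delay in case the new label is not visible to the API yet.
            await new Promise((resolve) => setTimeout(resolve, 5000));
            await github.rest.issues.removeLabel({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              name: 'kokoro:force-run',
            });
```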
hertschuh
3b2e3f8913 Move JAX CUDA tests to GPU runners. (#22462)
- Disable TF32 for better numerical accuracy
- Use the latest version of JAX and the CUDA 13 extra
2026-03-24 14:32:56 -07:00
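A sketch of a JAX GPU job along these lines. It assumes TF32 is disabled through JAX's `JAX_DEFAULT_MATMUL_PRECISION` setting and that the `jax[cuda13]` extra is used; the runner labels and test command are hypothetical, and #22462 may disable TF32 by other means.

```yaml
jobs:
  jax-gpu:  # hypothetical job name
    runs-on: [self-hosted, gpu]  # hypothetical runner labels
    env:
      KERAS_BACKEND: jax
      # Request full float32 matmul precision instead of TF32 on GPU.
      JAX_DEFAULT_MATMUL_PRECISION: highest
    steps:
      - uses: actions/checkout@v6
      - name: Install JAX with the CUDA 13 extra
        run: pip install --upgrade "jax[cuda13]"
      - run: pytest keras  # hypothetical test command
```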
hertschuh
e74cca7474 Mark multi-device tests and run them separately on CPU. (#22484)
Currently, we are running multi-device tests with JAX
- always on CPU
- always on GPU with kokoro (although both `distribution_lib_test.py` are skipped)
- never on TPU

[This code](https://github.com/keras-team/keras/blob/master/keras/src/backend/jax/distribution_lib_test.py#L25-L28) is run while collecting the list of unit tests and unintentionally applies to everything instead of just `jax/distribution_lib_test.py`.

We currently use 4 T4s for JAX GPU tests, however https://github.com/keras-team/keras/pull/22462 moves them to a single L4.

A subsequent PR will add multi-TPU tests.

- Makes the normal JAX CPU tests run on a single CPU
- Adds a JAX multi-CPU check to run all the tests tagged with `pytest.mark.multi_device`
- Makes `jax/distribution_lib_test.py` work with any number of devices (as long as it's even and greater than 4) instead of the hardcoded 8
- Tags tests from `jax/distribution_lib_test.py`, `jax/trainer_test.py`, `orbax_checkpoint_test.py` with `pytest.mark.multi_device` so that we can run them with `pytest -m multi_device`
2026-03-24 10:41:54 -07:00
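The split between single-device and multi-device CPU runs could be sketched as two steps. The XLA flag that simulates several host devices is real; the device count of 8 and the test path are placeholders.

```yaml
steps:
  - name: Single-device JAX tests
    env:
      KERAS_BACKEND: jax
    run: pytest -m "not multi_device" keras
  - name: Multi-device JAX tests on simulated CPU devices
    env:
      KERAS_BACKEND: jax
      # Ask XLA to expose several host (CPU) devices so that sharding code
      # can be exercised without accelerators. 8 is a placeholder count.
      XLA_FLAGS: "--xla_force_host_platform_device_count=8"
    run: pytest -m multi_device keras
```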
hertschuh
95eec730c6 Move TensorFlow CUDA tests to GPU runners. (#22461)
We are migrating to GPU custom runners for GPU tests instead of Kokoro.

- Disable TF32 for better numerical accuracy
- Pin tensorflow to 2.20 as 2.21 doesn't work with our L4 + CUDA13 setup
2026-03-23 09:26:21 -07:00
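A sketch of a TensorFlow GPU job with the pin and TF32 disabled. It assumes TF32 is turned off through NVIDIA's `NVIDIA_TF32_OVERRIDE` environment variable; #22461 may instead call `tf.config.experimental.enable_tensor_float_32_execution(False)`. Runner labels and the test command are hypothetical.

```yaml
jobs:
  tensorflow-gpu:  # hypothetical job name
    runs-on: [self-hosted, gpu]  # hypothetical runner labels
    env:
      KERAS_BACKEND: tensorflow
      # NVIDIA's override switch: "0" disables TF32 in NVIDIA math libraries.
      NVIDIA_TF32_OVERRIDE: "0"
    steps:
      - uses: actions/checkout@v6
      - name: Install TensorFlow pinned to 2.20
        run: pip install "tensorflow[and-cuda]==2.20.0"
      - run: pytest keras  # hypothetical test command
```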
hertschuh
7287937214 Fix github runner based Torch GPU tests. Fix torch LSTM. (#22421)
The GPU runner based tests were unintentionally run with the JAX backend and therefore running on CPU instead of GPU. This affected only the Torch test for now.

In order for the tests to pass on the NVIDIA L4 GPUs that we have, the following changes were needed:
- Added installing of `build-essential` to install a C++ compiler, which is needed by Torch Dynamo (Triton).
- Removed extra logging unintentionally added by the `pytest -s` option
- Changed `masking_test.py` and `lstm_test.py` to only use right padded masks (i.e. the Trues are on the left and the Falses on the right), which is required by CuDNN and is the normal use case for sequences.
- Lowered verification precision to 1e-5 for `bidirectional_test.py`, which now matches all the other RNN tests.
- Allowed a fallback for int8-by-int8 matmuls that use `torch._int_mm`, as it is not supported with CUDA 13.
- Turned off CuDNN's TF32, as it caused numerical differences that made some tests fail.
- Skipped broken LSTM tests; previously the issue was hidden by the fallback to the non-CuDNN implementation.
2026-03-20 16:03:33 -07:00
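The two workflow-level fixes (selecting the backend explicitly and installing a compiler) can be sketched as follows. Runner labels and the test command are hypothetical, and the step assumes a runner where `sudo apt-get` is available.

```yaml
jobs:
  torch-gpu:  # hypothetical job name
    runs-on: [self-hosted, gpu]  # hypothetical runner labels
    env:
      # Select the backend explicitly; the bug fixed here was that these
      # tests ran with the JAX backend on CPU.
      KERAS_BACKEND: torch
    steps:
      - uses: actions/checkout@v6
      # Torch Dynamo (Triton) needs a C++ compiler at runtime.
      - name: Install build-essential
        run: sudo apt-get update && sudo apt-get install -y build-essential
      - run: pytest keras  # hypothetical test command
```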
hertschuh
401e684770 Move Torch CUDA tests to GPU runners. (#21957)
We are migrating to GPU custom runners for GPU tests instead of Kokoro.

This will be done one backend at a time as the other backends require more fixes. However, the `gpu_tests.yml` file has logic for JAX and TensorFlow already.
2026-03-11 10:53:17 -07:00
dependabot[bot]
bfdc116b44 Bump the github-actions group with 2 updates (#22321)
Bumps the github-actions group with 2 updates: [actions/upload-artifact](https://github.com/actions/upload-artifact) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `actions/upload-artifact` from 6.0.0 to 7.0.0
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](b7c566a772...bbbca2ddaa)

Updates `github/codeql-action` from 4.32.0 to 4.32.4
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](b20883b0cd...89a39a4e59)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-version: 4.32.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-03-03 12:34:15 -08:00
dependabot[bot]
eeb16c3c4a Bump the github-actions group with 2 updates (#22093)
Bumps the github-actions group with 2 updates: [actions/checkout](https://github.com/actions/checkout) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `actions/checkout` from 6.0.1 to 6.0.2
- [Release notes](https://github.com/actions/checkout/releases)
- [Commits](https://github.com/actions/checkout/compare/v6.0.1...v6.0.2)

Updates `github/codeql-action` from 4.31.9 to 4.32.0
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](5d4e8d1aca...b20883b0cd)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-version: 4.32.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-02-02 12:50:16 -08:00
hertschuh
0dd27da953 TPU tests now verify that TPUs are detected and fail if not. (#22019)
This is to prevent silently falling back to running tests on CPU and thinking the tests pass on TPU.

Also some minor cleanup of the workflow file.
2026-01-19 22:50:39 -08:00
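A sketch of such a guard step, assuming the check is done with JAX; the step name is hypothetical and #22019 may implement the check differently.

```yaml
- name: Verify that TPUs are visible to JAX
  env:
    KERAS_BACKEND: jax
  # Fail fast instead of silently running the suite on CPU.
  run: |
    python -c "import jax; assert jax.default_backend() == 'tpu', jax.devices()"
```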
hertschuh
e819c744ba Trigger TPU tests on kokoro label removal rather than addition. (#22001)
Problem: when a PR is approved, `google-ml-butler` adds 2 labels: `kokoro:force-run` and `ready to pull`, in this order. The TPU tests workflow is triggered twice, once for each, but it is skipped for `ready to pull`. While the actual TPU tests are still run, what's shown in the UI is the skipped workflow because it was triggered last. So it's not directly possible to see the results of the TPU tests.

Solution: this changes the workflow to trigger on label removal. This works because kokoro immediately removes the label once it's set and it only removes one label at a time.

We will probably need to revisit once we migrate off of kokoro.
2026-01-14 09:26:23 -08:00
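A sketch of a trigger on label removal. The job name, runner labels and test command are hypothetical assumptions.

```yaml
on:
  pull_request:
    types: [unlabeled]

jobs:
  tpu-tests:  # hypothetical job name
    # Run only when kokoro:force-run is removed, which Kokoro does right
    # after the label is added, so `ready to pull` no longer masks the result.
    if: github.event.label.name == 'kokoro:force-run'
    runs-on: [self-hosted, tpu]  # hypothetical runner labels
    steps:
      - uses: actions/checkout@v6
      - run: pytest keras  # hypothetical test command
```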