Add general guidance for fixing flaky UI tests:
- Treat flakiness as a race condition, not a timeout issue
- Wait for unique state identifiers, not shared elements
- Understand framework built-in waits (findText already waits)
- Trace causality backwards to find correct wait condition
- State transitions have intermediate states
These principles should improve autofix success rate for UI test failures.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Install Neovim in workflows that run tests:
- testsMaintenance.yml: deals with @TestWithoutNeovim annotations
- codebaseMaintenance.yml: can run gradle tests
- youtrackAutoAnalysis.yml: uses TDD for bug fixes and features
Also add guidance in testsMaintenance to verify actual Neovim behavior
when working with skip reasons, and allow nvim/echo bash commands.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add disk space cleanup and reduce video recording size for Linux runners:
- Add "Free up disk space" step to remove .NET, Android SDK, GHC, CodeQL
- Reduce Xvfb resolution from 1920x1080 to 1280x720
- Optimize ffmpeg: 15fps, ultrafast preset, crf 28
This aligns with the existing Rider Linux workflow settings.
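The recording settings above could be expressed roughly as follows. This is a minimal sketch, not the workflow's actual step: the display number `:99`, the default output file name, and the `record_args` helper are assumptions made for illustration. The helper only prints the ffmpeg arguments so they can be inspected before use.

```shell
# Print the ffmpeg arguments for a low-footprint x11grab recording, one per line.
# Assumptions: display :99 and the default output file name are placeholders.
record_args() {
  display="${1:-:99}"
  out="${2:-screen-recording.mp4}"
  printf '%s\n' -f x11grab -video_size 1280x720 -framerate 15 -i "$display" \
    -c:v libx264 -preset ultrafast -crf 28 "$out"
}

# Intended use on a runner (not executed here):
#   Xvfb :99 -screen 0 1280x720x24 &
#   ffmpeg $(record_args :99 recording.mp4) &
```

A lower frame rate and a higher crf value trade visual quality for a smaller file, which is usually acceptable when the video is only used to diagnose test failures.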
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add explicit cache-read-only: false to ensure Gradle cache
is properly written and read on macOS builds, matching the
configuration used in the unified IntelliJ workflow.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added --configuration-cache flag to buildPlugin task in all Linux
UI test workflows to significantly improve build performance.
The configuration cache stores the result of the configuration phase
and reuses it for subsequent builds, avoiding the need to re-evaluate
build scripts when inputs haven't changed. This is particularly
beneficial for UI tests that run frequently on schedule.
Note on "Gradle User Home cache not found" message: This is expected
when the commit SHA changes, as the gradle-home cache key includes it
for security. However, the more important dependency and transform
caches use content-based hashing and are restored correctly across
commits, which is why builds still benefit from caching.
Updated workflows:
- runUiTestsLinux.yml
- runUiRdTestsLinux.yml
- runUiPyTestsLinux.yml
- runUiTestsUnified.yml (both jobs)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added explicit instruction to all UI test failure analysis prompts:
"IMPORTANT: If you have a concrete suggestion for fixing the test,
ALWAYS proceed with creating a branch and PR. Never ask for
permission - just do it."
This ensures Claude Code will automatically create fix PRs when it
identifies concrete solutions, eliminating the need for user
confirmation and enabling fully automated test maintenance.
Updated workflows:
- runUiTests.yml (IntelliJ macOS)
- runUiTestsLinux.yml (IntelliJ Linux)
- runUiTestsUnified.yml (IntelliJ unified)
- runUiRdTests.yml (Rider macOS)
- runUiRdTestsLinux.yml (Rider Linux)
- runUiPyTests.yml (PyCharm macOS)
- runUiPyTestsLinux.yml (PyCharm Linux)
- runUiOctopusTests.yml (Octopus macOS)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created a unified workflow that runs macOS and Linux tests in parallel,
then performs a single AI analysis of failures from both platforms.
Key features:
1. Parallel execution: test-macos and test-linux run simultaneously
2. Separate artifact uploads: macos-reports and linux-reports
3. Unified analysis job that:
   - Triggers if either platform fails
   - Downloads artifacts from both platforms
   - Provides Claude Code with context from both test runs
   - Identifies common vs platform-specific issues
   - Creates a single PR for common issues
   - Clearly labels platform-specific fixes
Benefits:
- Single unified fix for issues affecting both platforms
- Better context for AI analysis by comparing across platforms
- Reduced PR noise (one PR instead of two for common issues)
- Cost efficiency (one AI analysis instead of two)
The analyze-failures job has git and gh tools enabled to allow
automatic branch creation and PR submission when fixes are identified.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Windows UI tests have been removed. The project will continue to run
UI tests on macOS and Linux platforms, which provide sufficient coverage
for UI testing across different operating systems.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added the "Auto-click Allow button for screen recording permission" step
to all macOS UI test workflows that were missing it:
- runUiRdTests.yml (Rider macOS)
- runUiPyTests.yml (PyCharm macOS)
- runUiOctopusTests.yml (Octopus macOS)
This step automatically dismisses the macOS screen recording permission
dialog that appears when ffmpeg starts recording. Without this automation,
the dialog blocks the test execution and causes timeouts.
The step tries multiple coordinate positions using both cliclick and
AppleScript fallback to handle different screen resolutions and dialog
positions.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated all UI test workflow prompts to instruct Claude Code to
automatically create fixes when concrete solutions are identified:
1. Create a branch with descriptive name
2. Apply the suggested fix to the codebase
3. Run the specific failing test to verify the fix works
4. If the test passes, create a PR with clear documentation
Each workflow includes the appropriate test command for its IDE type:
- IntelliJ/Octopus: gradle :tests:ui-ij-tests:testUi --tests "..."
- Rider: gradle :tests:ui-rd-tests:testUi --tests "..."
- PyCharm: gradle :tests:ui-py-tests:testUi --tests "..."
This enables fully automated test fix proposals without manual
intervention, reducing the feedback loop for fixing flaky or broken
UI tests caused by platform changes.
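The mapping from IDE type to test command listed above can be sketched as a small helper. The IDE keys (`ij`, `octopus`, `rd`, `py`), the `ui_test_cmd` name, and the test filter in the usage line are hypothetical; only the Gradle task paths come from the list above.

```shell
# Print the Gradle command for running one UI test against a given IDE type.
# The IDE keys are hypothetical labels chosen for this sketch.
ui_test_cmd() {
  ide="$1"
  test_filter="$2"
  case "$ide" in
    ij|octopus) module="ui-ij-tests" ;;
    rd)         module="ui-rd-tests" ;;
    py)         module="ui-py-tests" ;;
    *) echo "unknown IDE type: $ide" >&2; return 1 ;;
  esac
  printf 'gradle :tests:%s:testUi --tests "%s"\n' "$module" "$test_filter"
}

# Example (SomeUiTest is a placeholder test name):
#   ui_test_cmd rd SomeUiTest
```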
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Clarified that existing UI tests without OS specification run on macOS
by updating workflow names to include "macOS" suffix.
Created Linux versions of Rider and PyCharm UI tests:
- runUiRdTestsLinux.yml: Rider tests on Linux with Xvfb setup
- runUiPyTestsLinux.yml: PyCharm tests on Linux with Xvfb and Python 3.10
Both new workflows follow the same Linux setup pattern as the existing
runUiTestsLinux.yml workflow, using x11grab for screen recording and
appropriate IDE type parameters (-PideaType=RD/PC).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When UI tests fail due to timeouts but the element is visible in the
video/screenshot, the failure may be caused by renamed properties or
changed class names. The hierarchy HTML file contains the actual UI
structure and can help identify these issues.
Updated all UI test workflows (macOS, Linux, Windows, Rider, PyCharm,
and Octopus) to instruct Claude Code to check build/reports/hierarchy-ideaVimTest.html
and suggest updated queries when this scenario occurs.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Allows the AI to remove temporary files when creating thumbnail grids
from screen recordings for better video analysis.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Creates a separate workflow for running UI tests on Ubuntu with:
- Xvfb virtual display for headless GUI testing
- FFmpeg screen recording using x11grab
- Claude Code AI analysis of test failures
- Artifact upload for test reports and recordings
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed Windows UI test workflow from PowerShell to bash:
- Run Idea: Now properly runs in background with `&`
- Screen recording: Uses bash syntax for background process and PID capture
- Stop recording: Uses standard bash kill command
Problem: PowerShell Start-Process completed immediately instead of
keeping gradle running in background, causing IntelliJ to never start
and health check to fail.
Solution: Use bash shell (available via Git Bash on Windows runners)
which properly handles background processes with `&` syntax.
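The background-process pattern described above looks roughly like this in bash. It is a sketch under assumptions: the helper names and the `BG_PID` variable are invented for illustration, and the commands in the usage comment stand in for the real workflow steps.

```shell
# Start a command in the background and remember its PID.
start_background() {
  "$@" &
  BG_PID=$!
}

# Stop the remembered process and reap it; never fails the calling step.
stop_background() {
  kill "$BG_PID" 2>/dev/null || true
  wait "$BG_PID" 2>/dev/null || true
}

# Intended use (commands and paths are placeholders):
#   start_background ffmpeg -f gdigrab -i desktop recording.mp4
#   ... run tests ...
#   stop_background
```

Because `&` returns control immediately while the child keeps running, the next workflow step can proceed, and `$!` gives a handle for stopping the process later.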
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added AppleScript automation to automatically grant screen recording
permission when the macOS permission dialog appears:
- Waits 2 seconds after ffmpeg starts
- Tries up to 10 times (1 second intervals) to click Allow button
- Uses SecurityAgent process to interact with system permission dialog
- Fails gracefully if dialog doesn't appear
This eliminates the manual permission prompt that was blocking
automated screen recording on macOS GitHub Actions runners.
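The retry behaviour described above (a bounded number of attempts, a fixed interval, and no failure when the dialog never appears) can be sketched as a generic loop. The `try_dismiss_dialog` helper and the `click_allow` command in the usage comment are hypothetical; the actual AppleScript is not reproduced here.

```shell
# Retry a command up to max_attempts times, pausing interval seconds between tries.
# Always returns 0 so a missing dialog does not fail the workflow step.
try_dismiss_dialog() {
  max_attempts="$1"
  interval="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@" 2>/dev/null; then
      echo "clicked on attempt $attempt"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "dialog not found, continuing"
  return 0
}

# Intended use (click_allow is a hypothetical wrapper around the AppleScript click):
#   sleep 2
#   try_dismiss_dialog 10 1 click_allow
```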
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created separate workflow for Windows UI tests:
- Runs on windows-latest
- FFmpeg installation via Chocolatey
- Screen recording using gdigrab (Windows GDI Grabber)
- No permission dialogs on Windows (unlike macOS)
- PowerShell for process management
- Gradle caching enabled
- Only runs IJ tests for now
- AI analysis on test failures
- Records at 30fps with H.264 codec
This provides an alternative platform for UI testing with
working screen recording capabilities.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Removed --no-configuration-cache flag from all runIdeForUiTests commands
to enable Gradle configuration cache and improve build performance.
This works together with the gradle/actions/setup-gradle@v4 action
to provide optimal caching.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enabled gradle/actions/setup-gradle@v4 for all UI test workflows:
- runUiTests.yml
- runUiPyTests.yml
- runUiRdTests.yml
- runUiOctopusTests.yml
This action automatically caches:
- Gradle wrapper
- Dependencies
- Gradle build cache
This should significantly speed up the "Build Plugin" step on
subsequent runs.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Extended macOS CI screen recording to all UI test workflows:
- Added FFmpeg screen recording to PyCharm, Rider, and Octopus tests
- Removed Linux UI tests workflow (runUiTestsLinux.yml)
Removed video-recorder-junit5 library from codebase:
- Removed dependency from all UI test modules
- Removed video recorder system properties from Gradle configs
- Removed @Video annotations and imports from all test files
- Removed "Move video" steps from all workflows
Updated Claude AI analysis prompts:
- Changed from dual video sources to single CI recording
- Added helpful ffmpeg commands for video analysis:
  * Extract frames at specific times
  * Create thumbnail grids
  * Get video duration
  * Extract last N seconds
This simplifies the video recording setup by relying solely on
CI-level screen capture, which provides complete coverage of the
entire test run.
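For reference, the four kinds of video analysis commands listed above are commonly written along these lines. The exact commands in the prompts are not shown in this message, so the function names, the 10-second sampling interval, and the 4x4 grid size are assumptions.

```shell
# Extract a single frame at a timestamp: extract_frame in.mp4 00:01:30 frame.png
extract_frame() { ffmpeg -ss "$2" -i "$1" -frames:v 1 "$3"; }

# Build a 4x4 thumbnail grid, sampling one frame every 10 seconds.
thumbnail_grid() { ffmpeg -i "$1" -vf "fps=1/10,scale=320:-1,tile=4x4" -frames:v 1 "$2"; }

# Print the video duration in seconds.
video_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Copy the last N seconds without re-encoding: last_seconds in.mp4 30 tail.mp4
last_seconds() { ffmpeg -sseof "-$2" -i "$1" -c copy "$3"; }
```

Since the failure usually happens near the end of a run, checking the duration and then extracting the last N seconds is often the quickest way to find the relevant frames.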
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed FFmpeg device from "1:0" to "0:none" for proper screen capture.
Added device listing step for debugging.
The previous device index caused "Invalid device index" error.
Using "0:none" captures screen 0 without audio, which is the correct
format for avfoundation screen capture on macOS runners.
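A sketch of the two steps, assuming ffmpeg is on the PATH of the macOS runner. The function names are invented for illustration, and the device indices reported by the listing can differ between machines, which is why the listing step is useful for debugging.

```shell
# List avfoundation devices. ffmpeg exits non-zero here because no real input
# is opened, so the exit status is ignored.
list_capture_devices() {
  ffmpeg -f avfoundation -list_devices true -i "" 2>&1 || true
}

# Record video device 0 with no audio device ("0:none") to the given file.
record_screen() {
  ffmpeg -f avfoundation -i "0:none" "$1"
}

# Intended use (not executed here; the output path is a placeholder):
#   list_capture_devices
#   record_screen screen-recording.mp4 &
```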
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added FFmpeg screen recording at CI level using avfoundation:
- Captures full screen at 30fps starting when IntelliJ launches
- Saves to build/reports/ci-screen-recording/screen-recording.mp4
- Complements Gradle test video with full session recording
- Updated AI analysis prompt to reference both video sources
This provides a complete recording of the entire test run, which may
catch issues that occur outside the focused Gradle test recording.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Created new runUiTestsLinux.yml with modern Linux configuration:
- Uses ubuntu-latest with Xvfb for headless display
- FFmpeg for video recording
- Same test suite as macOS (ui-ij-tests only)
- AI analysis on test failures
- Updated to Java 21 and latest action versions
Removed old commented-out Linux job from runUiTests.yml.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
When UI tests fail, Claude Code now automatically analyzes the failure by examining test reports, video recordings, and IDE logs. The analysis is saved to build/reports/ai-analysis/analysis.txt and included in uploaded artifacts.
Requires ANTHROPIC_API_KEY secret to be configured in GitHub repository settings.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>