trycua/cua

mirror of https://github.com/trycua/cua.git synced 2026-03-26 22:08:16 +00:00

ddupont 7d1fa31fb6 feat(sandbox-sdk): Cua Sandbox SDK — unified API for Linux, macOS, Windows, Android (#1218)

* feat(cua-sandbox): Add sandbox SDK with QEMU WSL2/KVM, Hyper-V, and Docker runtimes

- New cua-sandbox package: declarative Image API, layered disk caching, multi-runtime support
- QEMU WSL2 runtime: runs QEMU inside WSL2 with KVM hardware acceleration on Windows
- Hyper-V runtime: builds Windows images from ISO with native Hyper-V Gen2 VMs
- Shared Windows unattended install (builder/windows_unattend.py): Autounattend.xml, ISO creation
- OCI registry push/pull for QEMU disk images
- Computer-server setup script installs cua-computer-server only (no PyTorch/agent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): Add usage examples to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add cloud transport with ephemeral VM support

Cloud sandboxes are now the default path — sandbox() connects to the
CUA platform API, provisions VMs, and delegates control via HTTPTransport.
Ephemeral mode is inferred from the arguments: image= creates and destroys
the VM, name= only connects to an existing one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Android emulator runtime, transports, and example sandboxes

Adds AndroidEmulatorRuntime with headless toggle, ADB/VNC/SSH/QMP transports,
cloud transport timeout increase (10min), and example sandbox scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add ephemeral cloud sandbox example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): Remove name from ephemeral cloud example to trigger VM creation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Mobile interface for Android touch, gestures, and hardware keys

Adds sb.mobile.* methods (tap, swipe, scroll, pinch, home, back, etc.)
backed by ADB shell commands, and an ephemeral Android example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(ci): pass SLACK_WEBHOOK to cold start benchmark step

* add benchmark script

* feat(android): true MT Protocol B multitouch, gesture() API, auto port detection

- mobile.py: replace asyncio.gather pinch with single-shell MT Protocol B
  sendevent script; add gesture(*finger_paths) primitive; pinch_in/pinch_out
  delegate to gesture()
- android_emulator.py: make adb_port Optional[int]=None; add
  _find_free_emulator_port() scanning even console ports 5554-5682 via
  socket.bind
- examples/touch_test_app/: Android APK logging every MotionEvent as JSON
  to Logcat under tag "TouchTest"; supports RESET_LOG broadcast
- tests/test_android_multitouch.py: integration test suite using sandbox()
  context manager; Local/Cloud split (Cloud skipped without CUA_TEST_API_KEY)
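
The port scan described above can be sketched roughly as follows. This is a minimal illustration, not the body of `_find_free_emulator_port()` (which this log does not show); the function name, the `is_free` parameter, and the error message are assumptions made for the example.

```python
import socket


def _can_bind(port: int) -> bool:
    """Return True if a TCP socket can bind to 127.0.0.1:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind(("127.0.0.1", port))
        except OSError:
            return False
    return True


def find_free_emulator_port(start=5554, end=5682, is_free=_can_bind) -> int:
    """Scan even console ports in [start, end] and return the first free one.

    `is_free` is a hypothetical hook so the probe can be replaced in tests.
    """
    for port in range(start, end + 1, 2):  # step 2 keeps every candidate even
        if is_free(port):
            return port
    raise RuntimeError(f"no free emulator console port in {start}-{end}")
```

A stricter variant might also probe `port + 1`, since the emulator normally pairs each even console port with the next odd port for ADB.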

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add get_display_url(share=False) across transports

share=False → vnc://localhost:{port} for local VNC runtimes,
              https://cua.ai/connect/incus/{name} for cloud (auth-gated)
share=True  → noVNC/ws-scrcpy URL with embedded password (cloud only)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add ephemeral android test

* refactor(tests): move TouchTest APK to standalone repo; download from releases

- Remove examples/touch_test_app — now lives at
  https://github.com/trycua/android-touch-test-app
- test_android_multitouch.py: download APK from GitHub Releases by default
  (latest release URL) instead of building from source
- CUA_ANDROID_TEST_APK can still be set to a local path to override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): implement cloud Android multitouch tests

Extract shared test logic into _MultitouchTests mixin so Local and Cloud
classes run identical assertions. Add cloud_android_sb session fixture that
spins up an ephemeral cloud Android VM, installs the TouchTest APK via
curl + pm install, and yields the ready sandbox.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): implement apk_install for cloud transport; simplify root escalation

- CloudTransport._apply_image_layers: applies apk_install/run layers after
  server is ready (curl + pm install on device)
- Replace transport._adb_cmd("root") with sb.shell.run("su root id") in
  local fixture for consistency with cloud
- Cloud fixture now uses Image.android("14").apk_install(url) same as local

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add multitouch_gesture server action; fix cloud multi-touch injection

Move MT Protocol B sendevent injection to a server-side `multitouch_gesture`
action so that `adb root` can be called before injecting events. This fixes
cloud Android VMs where `su root sendevent` runs silently but events are not
delivered to the app (likely SELinux blocking kernel input injection from the
su context).

Changes:
- computer-server: add `multitouch_gesture` to AndroidAutomationHandler — calls
  `adb root`, detects touch device + axis range via `getevent -p`, builds and
  runs MT Protocol B sendevent script as root adbd
- computer-server/main.py: register `multitouch_gesture` in handlers map
- mobile.py: `gesture()` now sends the `multitouch_gesture` action with
  structured JSON params instead of building a shell script client-side;
  remove `_build_two_finger_script` and MT Protocol B helpers (logic in server)
- adb.py: handle `multitouch_gesture` via `adb root` + sendevent (local path)
- tests: `test_true_multitouch_*` use `sb.mobile.gesture()` instead of manual
  sendevent scripts; remove `su root id` escalation from fixtures
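
For orientation, a multitouch Protocol B event stream can be assembled roughly like this. It is a sketch under stated assumptions, not the server's implementation: the event codes are the standard Linux input constants, while the function names, the tracking-ID scheme, and the device path are hypothetical, and coordinates are assumed to be already scaled to the device's axis range.

```python
# Standard Linux input event constants (linux/input-event-codes.h).
EV_SYN, EV_ABS = 0x00, 0x03
SYN_REPORT = 0x00
ABS_MT_SLOT = 0x2F
ABS_MT_POSITION_X = 0x35
ABS_MT_POSITION_Y = 0x36
ABS_MT_TRACKING_ID = 0x39


def build_mt_events(finger_paths):
    """Turn per-finger paths into (type, code, value) tuples, one frame at a time.

    finger_paths: list of paths, one per finger; each path is a list of (x, y)
    points and all paths must have the same length.
    """
    n_frames = len(finger_paths[0])
    if any(len(path) != n_frames for path in finger_paths):
        raise ValueError("all finger paths must have the same number of points")
    events = []
    for frame in range(n_frames):
        for slot, path in enumerate(finger_paths):
            x, y = path[frame]
            events.append((EV_ABS, ABS_MT_SLOT, slot))
            if frame == 0:
                # A non-negative tracking ID marks the contact as down.
                events.append((EV_ABS, ABS_MT_TRACKING_ID, slot + 1))
            events.append((EV_ABS, ABS_MT_POSITION_X, x))
            events.append((EV_ABS, ABS_MT_POSITION_Y, y))
        events.append((EV_SYN, SYN_REPORT, 0))  # commit all slots together
    for slot in range(len(finger_paths)):
        events.append((EV_ABS, ABS_MT_SLOT, slot))
        events.append((EV_ABS, ABS_MT_TRACKING_ID, -1))  # -1 lifts the contact
    events.append((EV_SYN, SYN_REPORT, 0))
    return events


def to_sendevent_script(events, device="/dev/input/event1"):
    """Render events as sendevent lines; the device path is a placeholder."""
    return "\n".join(f"sendevent {device} {t} {c} {v}" for t, c, v in events)
```

Grouping every slot update before a single `SYN_REPORT` is what makes the fingers move in the same frame rather than one after another.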

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add _apply_image_layers to CloudTransport for apk_install support

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): add missing logger in android handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): fix duplicate logger definition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): replace sandbox()/close() with Sandbox.create/connect/ephemeral + disconnect/destroy

- Sandbox.create(image) — provision a persistent sandbox
- Sandbox.connect(name) — attach to an existing sandbox
- Sandbox.ephemeral(image) — async context manager, auto-destroys on exit
- Sandbox.disconnect() — drop connection, sandbox keeps running
- Sandbox.destroy() — disconnect + permanently delete
- Localhost.close() renamed to disconnect()
- sandbox() module-level function kept as deprecated shim
- Updated all tests, examples, conftest, agent docstring, and README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): add Localhost.connect() and make Sandbox.connect() dual-mode

- _ConnectResult supports both await and async with on connect()
- Sandbox.connect("name") works as plain await or context manager (disconnects on exit)
- Localhost.connect() mirrors the same pattern
- localhost() module-level function kept as deprecated shim
- conftest fixtures updated to use Localhost.connect()
- README updated
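
The dual-mode pattern (one return value usable with both `await` and `async with`) can be illustrated with a small stand-alone sketch. The class and function names below are hypothetical stand-ins, not the real `_ConnectResult` or `Sandbox`.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in for a connected sandbox."""

    def __init__(self, name):
        self.name = name
        self.connected = True

    async def disconnect(self):
        self.connected = False


class ConnectResult:
    """Wraps a coroutine so the caller can either await it or enter it."""

    def __init__(self, coro):
        self._coro = coro
        self._sandbox = None

    def __await__(self):
        # Plain `await connect(...)`: hand back the sandbox, no auto-cleanup.
        return self._coro.__await__()

    async def __aenter__(self):
        self._sandbox = await self._coro
        return self._sandbox

    async def __aexit__(self, exc_type, exc, tb):
        # Context-manager use: disconnect on exit, sandbox keeps running.
        await self._sandbox.disconnect()


async def _open(name):
    return FakeSandbox(name)


def connect(name):
    return ConnectResult(_open(name))
```

With this shape, `sb = await connect("a")` leaves cleanup to the caller, while `async with connect("b") as sb:` disconnects automatically when the block exits.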

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): update README with new API and connect() dual-mode examples

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add JPEG screenshot support and Android RL fleet benchmark

computer-server: add format/quality params to screenshot() on all handlers
(android, linux, macos, windows, base). Defaults to PNG for backwards compat;
pass format="jpeg" to get ~5-10x smaller payloads for RL workloads.
The existing inspect.signature dispatch picks up the new params automatically.

cua-sandbox: thread format/quality through Transport.screenshot(),
HTTPTransport, CloudTransport, Screen interface, and Sandbox.screenshot()
so callers can do sb.screenshot(format="jpeg", quality=85).

tests: add android_rps_benchmark.py — provisions N Android sandboxes in
parallel and drives them at a target aggregate RPS with per-command latency
logging, p50/p95/p99 reporting, and PASS/FAIL verdict for RL infra validation.
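
As an illustration of the latency report, percentiles can be computed with the nearest-rank method. This is a sketch only; the benchmark script's actual method is not shown here, and the function names are assumptions.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Return the p50/p95/p99 latencies as a dict."""
    return {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```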

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): update default screenshot quality to 95

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add pwa_install — build & install TWA APK from a PWA manifest URL

- Image.pwa_install(manifest_url) — new Android-only chainable layer that uses
  Bubblewrap to generate a signed debug APK from a Web App Manifest URL and
  install it via adb
- _bw_init.js — Node.js helper that calls @bubblewrap/core directly to generate
  twa-manifest.json non-interactively (bypasses the interactive CLI)
- AndroidEmulatorRuntime._apply_layers: handle pwa_install layer (init → update
  → build → adb install); auto-creates debug keystore; passes passwords via env
  vars; caches built APKs by manifest URL hash
- transport/*: add format/quality params to all screenshot() implementations;
  add convert_screenshot() helper in base.py for png→jpeg conversion
- examples/pwa_install_test.py: end-to-end test — installs Starbucks PWA,
  resolves launcher activity dynamically, launches and screenshots
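
Caching by manifest URL hash could look roughly like the sketch below. The hash algorithm, the truncated digest length, and the file layout are assumptions for illustration, not details taken from the runtime.

```python
import hashlib
from pathlib import Path


def cached_apk_path(manifest_url: str, cache_dir) -> Path:
    """Map a manifest URL to a stable cache file name (hypothetical layout)."""
    digest = hashlib.sha256(manifest_url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest[:16]}.apk"


def needs_build(manifest_url: str, cache_dir) -> bool:
    """True when no APK has been cached yet for this manifest URL."""
    return not cached_apk_path(manifest_url, cache_dir).exists()
```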

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): refactor android benchmark to measure max RPS

Remove --target-rps / _TokenBucket / PASS-FAIL verdict; workers now loop
as fast as possible so the run measures achievable throughput. Add flush=True
globally for real-time log output, and use JPEG screenshots.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): validate screenshot magic bytes match requested format

Raise ValueError if the returned image magic bytes don't match the requested
format, e.g. requested 'jpeg' but got 'png' (magic bytes: 89504e47).
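
A minimal sketch of such a check, assuming only PNG and JPEG are supported; the helper names are hypothetical, while the signatures are the standard file magic bytes.

```python
_MAGIC = {
    "png": b"\x89PNG\r\n\x1a\n",  # hex 89 50 4e 47 0d 0a 1a 0a
    "jpeg": b"\xff\xd8\xff",
}


def detect_format(data: bytes):
    """Return 'png' or 'jpeg' based on the leading bytes, else None."""
    for name, magic in _MAGIC.items():
        if data.startswith(magic):
            return name
    return None


def validate_screenshot(data: bytes, requested: str) -> bytes:
    """Raise ValueError when the payload does not match the requested format."""
    wanted = "jpeg" if requested.lower() == "jpg" else requested.lower()
    actual = detect_format(data)
    if actual != wanted:
        raise ValueError(
            f"requested {wanted!r} but got {actual!r} "
            f"(magic bytes: {data[:4].hex()})"
        )
    return data
```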

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): add local android benchmark using AndroidEmulatorRuntime

Mirror of android_rps_benchmark.py but uses local=True + AndroidEmulatorRuntime
for baremetal comparison against cloud.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add JPEG conversion to ADBTransport.screenshot

ADBTransport always returned PNG regardless of the format parameter.
Now converts to JPEG via Pillow when format='jpeg'/'jpg', matching
the behaviour of the server-side android handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): run ADB subprocess calls in thread executor

_adb_cmd was a synchronous subprocess.run that blocked the event loop,
preventing asyncio.sleep timers and task cancellation from firing on time.
Add _adb_cmd_async which runs _adb_cmd via loop.run_in_executor, and switch
screenshot, get_screen_size, and send to use it.
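
The pattern is roughly the following; `run_cmd` and `run_cmd_async` are illustrative names, not the transport's `_adb_cmd` / `_adb_cmd_async`.

```python
import asyncio
import subprocess
import sys


def run_cmd(args):
    """Blocking call: run a command and return its stdout bytes."""
    return subprocess.run(args, capture_output=True, check=True).stdout


async def run_cmd_async(args):
    """Run the blocking call in the default thread pool so the loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, run_cmd, args)


async def demo():
    # Timers and other tasks keep running while the subprocess executes.
    out, _ = await asyncio.gather(
        run_cmd_async([sys.executable, "-c", "print('ok')"]),
        asyncio.sleep(0.01),
    )
    return out
```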

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): use raw RGBA screencap + simplejpeg for faster JPEG screenshots

Replace PNG screencap + PIL JPEG encode with raw RGBA screencap + simplejpeg
(libjpeg-turbo, fastdct=True). This skips the emulator-side PNG encode
entirely and uses a faster JPEG encoder on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): revert to PNG screencap, keep simplejpeg for host-side encode

Raw RGBA screencap transfers ~10MB over ADB vs ~1-2MB for PNG (emulator
compresses before sending). Revert to -p PNG screencap, but use simplejpeg
(libjpeg-turbo, fastdct) instead of PIL for the host-side JPEG encode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert(cua-sandbox): revert simplejpeg, back to PIL for JPEG encode

simplejpeg showed no measurable improvement over PIL (p50 507ms vs 519ms,
within noise). The bottleneck is ADB transfer (~400ms), not encode time.
PIL produces smaller output (219KB vs 305KB) due to 4:2:0 vs 4:4:4 subsampling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add GRPCEmulatorTransport for fast Android screenshots

The Android emulator's gRPC service (EmulatorController) bypasses ADB entirely,
reducing screenshot latency from ~500ms to ~50ms. Changes:

- Add GRPCEmulatorTransport using getScreenshot(RGB888) + PIL JPEG encode
- Generate protobuf stubs from emulator_controller.proto into transport/_grpc_emulator/
- AndroidEmulatorRuntime now launches with -grpc <port> and sets grpc_port in RuntimeInfo
- sandbox._create picks GRPCEmulatorTransport when grpc_port is set, else falls back to ADB
- Add grpcio>=1.60.0 to cua-sandbox dependencies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add protobuf dependency for gRPC emulator stubs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): fix gRPC stubs and increase max message size to 32MB

- Regenerate emulator_controller stubs with grpcio-tools/_proto include path
  to resolve 'google/protobuf/empty.proto not loaded' error
- Fix relative import in generated grpc stub (bare import → from . import)
- Increase gRPC channel max_receive/send_message_length to 32MB
  (RGB888 screenshot is ~6MB, exceeding the 4MB default)

Result: gRPC screenshot transport now fully functional.
Benchmark: 48.90 RPS / p50=20ms vs ADB baseline 1.80 RPS / p50=519ms (27x faster)
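
The size arithmetic behind the 32MB limit can be sketched as below. The option keys are the standard gRPC channel arguments; the 1080x1920 resolution is an assumption chosen because it matches the "~6MB" figure, and the helper names are hypothetical.

```python
GRPC_DEFAULT_MAX_BYTES = 4 * 1024 * 1024   # default receive limit
MAX_MESSAGE_BYTES = 32 * 1024 * 1024       # limit chosen in this change


def rgb888_size(width: int, height: int) -> int:
    """Uncompressed RGB888 frame size: 3 bytes per pixel."""
    return width * height * 3


def channel_options(max_bytes: int = MAX_MESSAGE_BYTES):
    """Options list to pass when creating the channel, e.g.
    grpc.insecure_channel(target, options=channel_options())."""
    return [
        ("grpc.max_receive_message_length", max_bytes),
        ("grpc.max_send_message_length", max_bytes),
    ]


frame_bytes = rgb888_size(1080, 1920)  # assumed display resolution
print(frame_bytes, frame_bytes > GRPC_DEFAULT_MAX_BYTES)
```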

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(computer-server): note Android emulator gRPC interface and GRPCEmulatorTransport

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): implement touch/click and fix screen_size in GRPCEmulatorTransport

- send() now handles left_click, right_click, double_click, mouse_down, mouse_up
  via EmulatorController.sendTouch() (press + release TouchEvent pair)
- move_cursor is a no-op (no hover concept on Android)
- Fix get_screen_size(): was requesting 1x1 thumbnail which returned 1080x1;
  now requests full PNG so emulator returns native display dimensions
- Regenerate _grpc_emulator stubs with grpcio-tools/_proto include path

Benchmark (--action step = screen_size + tap + screenshot):
  42.2 RPS / p50=22ms / p95=32ms

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): full gRPC transport — multitouch, shell fallback, sync channel

- Switch grpc.aio → sync grpc channel + run_in_executor
  Avoids "Future attached to a different loop" in pytest session fixtures
- Add shell/run_command handler (ADB fallback via _find_adb)
- Add multitouch_gesture: interpolated N-finger sendTouch frames sent
  simultaneously per frame — passes all 17 multitouch tests
- Pass serial + sdk_root to GRPCEmulatorTransport from sandbox._create
- Regenerate _grpc_emulator stubs

All 17 TestAndroidMultitouchLocal tests pass.
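
The interpolation step can be pictured with a short sketch: each finger's start and end point is expanded into evenly spaced positions, and the frames are then grouped so every frame carries all fingers at once. Linear interpolation and the function names are assumptions; the transport's actual easing and frame count are not shown here.

```python
def interpolate_path(start, end, frames):
    """Linearly interpolate integer (x, y) points from start to end."""
    if frames < 2:
        raise ValueError("need at least two frames")
    (x0, y0), (x1, y1) = start, end
    last = frames - 1
    return [
        (round(x0 + (x1 - x0) * i / last), round(y0 + (y1 - y0) * i / last))
        for i in range(frames)
    ]


def gesture_frames(fingers, frames):
    """fingers: list of (start, end) pairs. Returns one list of points per frame."""
    paths = [interpolate_path(start, end, frames) for start, end in fingers]
    # zip(*paths) regroups per-finger paths into per-frame snapshots.
    return [list(points) for points in zip(*paths)]
```

Each returned frame would then be sent as one batch of touch events so the fingers move together.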

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): pin grpcio==1.78.0 and protobuf==6.31.1

Generated stubs require exact versions — grpcio-tools 1.78.0 was used to
regenerate and emulator_controller_pb2.py calls ValidateProtobufRuntimeVersion
with 6.31.1. Pinning eliminates stub regeneration on venv recreation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add agoda media type backward-compat aliases for ghcr.io images

Existing images on ghcr.io still use vnd.agoda.macosvz.* types. Keep them
as OCI_VM_{CONFIG,DISK,AUX}_LEGACY constants, include in VM_MEDIA_TYPES,
and match them in detect_format/detect_os so pulling those images still works.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): fix VNC backend, port, and pull ref for macos-tahoe-cua

- Change LUME_API_PORT from 8000 to 8443 (setup-cua.sh uses port 8443)
- Fix ConnectTimeout not caught in is_ready — was propagating immediately instead of retrying
- Fix pull payload: split full OCI ref (e.g. ghcr.io/trycua/img:tag) into registry/organization/image components to avoid lume API double-prefixing the org
- Install cua-computer-server[vnc] (includes vncdotool/twisted) in setup-cua.sh — required for VNC backend screenshots
- Add test_lume_macos_tahoe_cua test using Image.from_registry with LumeRuntime
- Replace vnd.agoda.macosvz media types with vnd.trycua.lume, keep legacy as backward-compat constants
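
Splitting a full OCI reference into components might look like the sketch below. The function name, the returned tuple shape, and the `latest` default are assumptions; the real pull payload fields are not shown in this log.

```python
def split_oci_ref(ref: str):
    """Split 'registry/organization/image[:tag|@digest]' into four parts."""
    parts = ref.split("/")
    if len(parts) < 3:
        raise ValueError(f"expected registry/organization/image, got {ref!r}")
    registry, organization = parts[0], parts[1]
    rest = "/".join(parts[2:])
    if "@" in rest:
        # Digest form: keep everything after '@' as the reference.
        image, _, reference = rest.partition("@")
    else:
        image, _, reference = rest.partition(":")
    return registry, organization, image, reference or "latest"
```

Sending the organization separately from the image name is what avoids the double prefix described above.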

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): auto-runtime, transport selection, macOS versions, error handling

- Fix local=True with no runtime not calling _auto_runtime — now auto-selects
  DockerRuntime/QEMURuntime/LumeRuntime/AndroidEmulatorRuntime/HyperVRuntime
- Fix transport selection preferring VNCTransport over HTTPTransport when both
  api_port and vnc_port are set (e.g. Docker containers, Lume VMs)
- Add MACOS_VERSION_IMAGES dict mapping version strings to OCI refs
  ("15"/"sequoia" → macos-sequoia-cua, "26"/"tahoe" → macos-tahoe-cua)
- Image.macos() now validates version and errors with supported list; default "26"
- LumeRuntime: handle async pull (ReadError on connection close), bump
  _wait_for_ip timeout to 3600s for large image pulls, use version map
- Add httpx.ReadError to is_ready exception handlers in docker/hyperv/lume
- Add auto-runtime tests (linux container, linux vm, macos, android, windows)
- Add cloud ephemeral tests (linux, android) and Sandbox.create persistent tests
- Fix test_macos_vm hardcoded api_port=18005 → LumeRuntime() with default port
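
The version lookup and validation can be sketched like this. Only the image names quoted above are used; the registry prefix is left out because it is not given here, and the function name is hypothetical.

```python
# Version strings and aliases mapped to image names (registry prefix omitted).
MACOS_VERSION_IMAGES = {
    "15": "macos-sequoia-cua",
    "sequoia": "macos-sequoia-cua",
    "26": "macos-tahoe-cua",
    "tahoe": "macos-tahoe-cua",
}


def resolve_macos_image(version: str = "26") -> str:
    """Return the image name for a version, or raise with the supported list."""
    key = str(version).lower()
    try:
        return MACOS_VERSION_IMAGES[key]
    except KeyError:
        supported = ", ".join(sorted(MACOS_VERSION_IMAGES))
        raise ValueError(
            f"unsupported macOS version {version!r}; supported: {supported}"
        ) from None
```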

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): replace legacy computer SDK examples with Cua Sandbox SDK

- Remove all examples using the old computer/agent SDK imports
- Add 11 new pytest-compatible examples covering all supported runtimes:
  linux/macos/windows/android × local/cloud × container/vm
- Each example is both runnable (if __name__ == "__main__") and a pytest test
- Docstrings optimized for answer engine discoverability
- Wire examples/sandboxes/ into pytest testpaths in pyproject.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox-sdk): persistent sandboxes, auto-ports, pull progress, lume async pull

Python SDK:
- Add random two-word sandbox names (_random_name) instead of "cua-sandbox" fallback
- Add _find_free_port() to docker/qemu runtimes to avoid port conflicts
- Add AndroidEmulatorRuntime with list/stop support, wired into _list_local
- Parallelize cua sb ls across Docker/Lume/QEMU/Android runtimes
- Fix UnboundLocalError for conditional HTTPTransport import
- Fix sandbox name resolution after runtime start (resolved_name)
- Fix Android reconnect to use GRPCEmulatorTransport
- Fix cua sb delete to skip confirmation prompt in non-interactive mode
- Add sandbox_state.py with grpc_port/adb_serial/sdk_root params
- Suppress httpx/cua_sandbox INFO logs in CLI output
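
One common way to implement a helper like `_find_free_port()` is to bind to port 0 and let the OS choose; whether the runtimes do exactly this is an assumption, and the function below is only a sketch.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```

The port is released when the socket closes, so another process could claim it before the runtime binds it; callers that care usually retry on a bind failure.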

Lume:
- Add POST /lume/pull/start async endpoint (202 immediately, polls via GET /lume/vms/{name})
- Add PullProgressTracker actor tracking download % per VM name
- Add downloadProgress field to GET /lume/vms/{name} during pulls
- Fix setProgress to clear stale errors so retries work
- Add progressHandler to pullImage(), handlePull, and lume pull CLI
- Add setTotal() in pullOCI so progress % is accurate (was always 0%)
- Unify /lume/pull and /lume/pull/start to both use progressHandler
- Add diagnostic logging for OCI config/nvram layer parsing
- Fix _wait_for_ip to raise immediately if VM status is "stopped"
- Reduce _wait_for_ip timeout from 3600s to 300s
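
The polling side of the async pull can be sketched as a loop that fails fast on a stopped VM and otherwise waits for an address. `downloadProgress` and the `"stopped"` status come from the notes above; the `ipAddress` field, the function name, and the callback shape are assumptions.

```python
import sys
import time


def wait_for_vm(get_vm, timeout: float = 300.0, interval: float = 2.0):
    """Poll get_vm() (a callable returning a dict) until the VM has an IP.

    Raises RuntimeError immediately if the VM reports status 'stopped',
    and TimeoutError if no IP appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        vm = get_vm()
        if vm.get("status") == "stopped":
            raise RuntimeError("VM is stopped; not waiting for an IP")
        if vm.get("ipAddress"):  # hypothetical field name
            return vm["ipAddress"]
        progress = vm.get("downloadProgress")
        if progress is not None:
            print(f"pull progress: {progress}%", file=sys.stderr)
        time.sleep(interval)
    raise TimeoutError(f"VM did not get an IP within {timeout}s")
```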

Examples:
- Add examples/sandboxes-cli/ with CLI-based persistent sandbox tests
- Tests assert VM appears in cua sb ls --all after launch and disappears after delete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on sync pull fallback with helpful auth hint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x connection drop on sync pull — check VM exists after ReadError

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on /pull/start for lume v0.3.x compat

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): poll VM status after /pull/start connection drop (lume v0.3.x)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x compat — sync pull + connection drop

lume v0.3.4 doesn't have /pull/start (drops connection immediately)
and also drops the connection on /lume/pull when done. Fall back to
sync /pull, handle the ReadError by verifying VM was created, then
run the VM and return directly instead of falling through to the
async poll path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): find lume binary in ~/.local/bin when not on PATH

lume installs to ~/.local/bin which may not be in PATH for non-interactive
shells (e.g. SSH sessions, LaunchAgents). Fall back to checking the
common install location directly.
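
A fallback lookup of this kind can be sketched with `shutil.which` plus a list of extra directories. The helper name and the `extra_dirs` parameter are illustrative, not the runtime's actual code.

```python
import os
import shutil
from pathlib import Path


def find_binary(name: str, extra_dirs=(Path.home() / ".local" / "bin",)):
    """Return the path to `name`, checking PATH first, then extra_dirs."""
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        candidate = Path(directory) / name
        if candidate.is_file() and os.access(candidate, os.X_OK):
            return str(candidate)
    return None
```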

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume,tests): redirect progress to stderr; add ~/.local/bin to PATH in tests

- lume.py: all pull progress prints go to sys.stderr so --json output
  is clean JSON on stdout (fixes JSONDecodeError in test_macos_local_vm)
- conftest.py: pytest_configure adds ~/.local/bin to PATH so cua/lume
  binaries installed there are found in non-interactive shells
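
The stdout/stderr split can be illustrated in a few lines; the function name and message text are made up for the example.

```python
import json
import sys


def emit_result(result: dict, progress_steps):
    """Write progress to stderr and exactly one JSON document to stdout."""
    for pct in progress_steps:
        print(f"pull progress: {pct}%", file=sys.stderr)
    print(json.dumps(result))
```

Because only the final document reaches stdout, a caller can pass the captured output straight to `json.loads` even while progress lines are being printed.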

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): pin macos-tahoe-cua to known-good sha256 digest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): wait for VNC readiness in is_ready(), not just HTTP /status

macOS VNC (Screen Sharing) starts after the HTTP computer-server, so
screenshot() fails immediately after launch. is_ready() now polls
POST /cmd screenshot until VNC accepts connections before returning.
Timeout extended to 180s to cover both phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): deliver VNC config to VM before is_ready check

lume v0.3.x doesn't push VNC port/password to the VM via VirtioFS,
so the computer-server uses a stale ~/.vnc.env from a previous run.
After _wait_for_ip, query the lume API for the current vncUrl, parse
port and password, write ~/.vnc.env via `lume ssh`, and restart the
computer-server LaunchAgent. This makes VNC available immediately.
Also reverts is_ready to HTTP-only check (no VNC phase needed).
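
Parsing the port and password out of a VNC URL might look like the sketch below. It assumes the URL has the shape `vnc://:password@host:port`; the exact format of lume's `vncUrl` is not shown here, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def parse_vnc_url(vnc_url: str):
    """Return (host, port, password) from a vnc:// URL (assumed format)."""
    parts = urlsplit(vnc_url)
    if parts.scheme != "vnc" or parts.port is None:
        raise ValueError(f"unexpected VNC URL: {vnc_url!r}")
    return parts.hostname, parts.port, parts.password or ""
```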

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): use pkill to restart computer-server after VNC config update

launchctl kickstart -k fails silently from a non-GUI SSH session.
Kill the python computer_server process directly so launchd revives
it with the new ~/.vnc.env config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): actually delete VM on Sandbox.delete() instead of just stopping

_delete_local called LumeRuntime().suspend() which only stops the VM,
leaving it in lume's registry as 'stopped'. Add LumeRuntime.delete()
which stops then DELETEs via the lume API, and use it in _delete_local.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): use :latest tag for macos-tahoe-cua (lume v0.3.4 can't pull by digest)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): check Homebrew/MacPorts paths on macOS; improve error message

qemu-system-x86_64 may be installed to /opt/homebrew/bin (Apple Silicon)
or /usr/local/bin (Intel) or /opt/local/bin (MacPorts) without those dirs
being on PATH in subprocess envs. Check known locations before failing.
Error message now also mentions MacPorts as an alternative.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): remove Windows-host-only guard from windows local VM test

QEMU is cross-platform; the test should run on any host where qemu-system-x86_64 is available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): fall back to bare-metal QEMU for Windows when Docker unavailable

When Docker is not installed or not running, and the image is a Windows VM,
use bare-metal QEMU mode instead of failing with "Docker is not installed".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(bench): add --provision/--continue/--delete modes to android benchmark

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): add pycdlib as a required dependency

pycdlib is used by the Windows ISO builder (windows_unattend.py) to create
the unattended install ISO. Without it, bare-metal Windows VM creation fails
with ModuleNotFoundError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): find OVMF firmware in Homebrew's share/qemu/ layout

When QEMU is installed via Homebrew, the binary is at /opt/homebrew/bin/qemu-system-x86_64
but firmware files are at /opt/homebrew/share/qemu/. The previous search only looked
in <bin_dir>/share/ which doesn't exist. Add <bin_dir>/../share/qemu/ to the search path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): increase bare-metal boot timeout to 600s for Windows/Android

Windows and Android VMs need 3-10 minutes to boot. The previous 120s default
was causing launch to time out before the OS was ready.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(benchmark): add provision resume + lower default parallel to 4

--provision now reads the existing state file and only provisions the
remaining sandboxes to reach --sandboxes N, appending new names.
Default --parallel lowered to 4 (fewer concurrent provisions
to reduce kopf event-loop overload at scale).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): add OrbStack and Homebrew to PATH in conftest

Ensures docker (OrbStack) and qemu (Homebrew) are found in subprocess calls
during pytest collection and test execution on macOS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): use Sandbox.connect(name=) for --continue reconnect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(windows): skip test on macOS ARM; validate cached base image size

- Skip Windows local VM test on macOS Apple Silicon: x86_64 Windows via
  QEMU TCG (no hardware accel) would take hours to install and boot.
- Add minimum size check in ensure_base_image to detect and rebuild
  incomplete/corrupt base images left behind by failed builds.
- Remove unused QEMUBaremetalRuntime assignment in _build_windows_base.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cloud-transport): fail fast on 4xx in _wait_for_server_ready + add debug logging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): skip 401 sandboxes in --continue, reconnect concurrently

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): --continue no longer deletes sandboxes, use --delete explicitly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): fix --delete to use CloudTransport instead of broken Sandbox(name=)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(grpc-emulator): pre-register empty_pb2 to fix protobuf 6.x descriptor load

AddSerializedFile fails on protobuf 6.33+ if google/protobuf/empty.proto
hasn't been loaded yet. Import empty_pb2 before the serialized file to
pre-register it in the descriptor pool.

Also add demo/ scripts for fleet throughput and ephemeral F-Droid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace Computer SDK references with Sandbox SDK throughout

- README.md: update packages table and hero code example to use cua-sandbox
- quickstart.mdx: install cua-sandbox instead of cua-computer; update hello/agent examples
- using-computer-sdk.mdx → using-sandbox-sdk.mdx: new doc with Sandbox SDK API
- using-agent-sdk.mdx: update Python examples to use Sandbox instead of Computer
- reference/sandbox-sdk/: new reference page for cua-sandbox API
- reference/meta.json + get-started/meta.json: update nav to sandbox-sdk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(readme): unified API example + platform support matrix

* docs(readme): replace iOS with BYOI (.qcow2, .iso) in platform matrix

* docs(readme): move Cua SDK section above CuaBot

* docs(readme): new header + add sb.mobile.gesture() to example

* feat(sandbox): add sb.tunnel.forward() port-forwarding interface

Adds Tunnel interface with forward() supporting ADB (Android), gRPC
emulator, and SSH transports. Includes CDP-over-ADB test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): gate android tests on Java only, not pre-installed SDK

SDK auto-installs on first run; only Java is a hard prereq.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(android): check java returncode in _java_env() — macOS stub exits non-zero

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tunnel): support abstract socket forwarding for Chrome DevTools on Android

adb forward tcp:0 localabstract:chrome_devtools_remote instead of tcp:9222.
Update test to use socket name and tunnel.port for all CDP URLs.
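
The forwarding command can be sketched as below. With `tcp:0`, adb picks a free host port and prints it, which is why the port has to be read back from the output. The helper names are hypothetical, and the `-s <serial>` selection is an assumption about how the device is addressed.

```python
def adb_forward_args(serial: str, abstract_socket: str = "chrome_devtools_remote"):
    """Build argv for forwarding a host port to an abstract Unix socket."""
    return [
        "adb", "-s", serial,
        "forward", "tcp:0", f"localabstract:{abstract_socket}",
    ]


def parse_forward_port(stdout: str) -> int:
    """adb prints the allocated host port when tcp:0 is requested."""
    return int(stdout.strip())


def cdp_version_url(port: int) -> str:
    """Chrome DevTools endpoint reachable through the forwarded port."""
    return f"http://127.0.0.1:{port}/json/version"
```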

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): add gym-pwa end-to-end Android test with CDP bonus

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): disable Chrome FRE before launching gym-pwa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(sandbox-sdk): fill documentation gaps vs Modal-style DX

- Add Sandbox section to guide: lifecycle, images, secrets, scale-out
- Add full sub-interface reference (shell, mouse, keyboard, screen,
  clipboard, tunnel, mobile, terminal, window, Localhost)
- Add migration guide from cua-computer to cua-sandbox
- Deprecate Computer SDK page with red callout + migration link
- Update quickstart with local Docker no-account path
- Update what-is-cua to reference Sandbox SDK instead of Computer Framework
- Wire all new pages into nav meta.json files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): clear Chrome data to bypass first-run wizard on emulator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(pwa_install): accept keystore param, auto-configure bubblewrap, return fingerprint

- Image.android().pwa_install() now accepts keystore, keystore_alias,
  keystore_password params — pass the keystore bundled in your PWA repo
  for deterministic fingerprints baked into assetlinks.json
- _build_pwa_apk auto-installs @bubblewrap/cli via npm if not on PATH
- _build_pwa_apk auto-writes ~/.bubblewrap/config.json from known JDK/SDK
  paths — no manual interactive setup required
- Returns (apk_path, sha256_fingerprint) tuple
- _bw_init.js accepts keystore path/alias/password as positional args
- Remove get_pwa_keystore_fingerprint (keystore in repo is the pattern)
- test_android_local_gym_pwa uses Sandbox.ephemeral + pwa_install with
  the committed android.keystore; launches TWA app instead of Chrome

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): rewrite quickstart to fix broken ephemeral/CLI flow

- Remove pre-create sandbox step — Sandbox.ephemeral manages its own lifecycle
- Remove outdated cua sandbox create --os/--size CLI usage
- Add local Docker path (no account needed) as primary hello world
- Fix VNC step to use Sandbox.create so sandbox is alive to open
- Clean up CLI reference to only show commands that are correct

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): update CLI reference to real cua-cli sandbox commands

- Rewrite cli/commands.mdx with actual cua sb launch <image> syntax
  (image as positional arg, --cpu/--memory/--disk/--region as options)
- Document all image shorthands (macos, ubuntu:24.04, windows, android)
- Fix quickstart VNC/cleanup steps to use cua sb vnc / cua sb ls / cua sb delete
- Fix using-sandbox-sdk.mdx CLI comment to show correct launch syntax
- Remove libs/python/cli (old mock CLI replaced by cua-cli)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): use Modal-hosted gym-pwa, fix 10.0.2.2 manifest fetch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples): add sandboxes section from examples/sandboxes/

One page per OS family (linux, macos, windows, android, custom-images),
each showing cloud + local variants with runnable code.

Every code block carries a `# source:` comment pointing to the corresponding
test file in examples/sandboxes/ so a future CI workflow can verify that
every doc example has a live test case.
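
A rough sketch of what such a CI check might look like, assuming doc examples are fenced Python blocks and the `# source:` path is relative to the repo root. The function name and the reported message format are hypothetical; no such workflow exists in this commit:

```python
import re
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this example can itself live in a fenced block
BLOCK_RE = re.compile(re.escape(FENCE) + r"python\n(.*?)" + re.escape(FENCE), re.DOTALL)
SOURCE_RE = re.compile(r"^# source:\s*(\S+)", re.MULTILINE)


def missing_sources(doc_text: str, repo_root: Path) -> list[str]:
    """List code blocks that lack a `# source:` comment or point at a missing file."""
    problems = []
    for index, block in enumerate(BLOCK_RE.findall(doc_text), start=1):
        match = SOURCE_RE.search(block)
        if match is None:
            problems.append(f"block {index}: no '# source:' comment")
        elif not (repo_root / match.group(1)).is_file():
            problems.append(f"block {index}: {match.group(1)} not found")
    return problems
```

A workflow could run this over every docs page and fail the build when the returned list is non-empty.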

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): auto-install Android build-tools required by bubblewrap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(what-is-cua): rewrite as concise Modal-style intro with full example

- Lead with a complete sandbox + agent snippet instead of graphics/diagrams
- Show the full API surface inline (shell, screenshot, mouse, keyboard, mobile, tunnel)
- Show the image builder pattern
- Remove ASCII diagrams and redundant explanation prose
- Keep use cases and next-steps links

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(quickstart): use sb.get_display_url(share=True) for live view

Replace persistent sandbox + CLI vnc with get_display_url(share=True)
inline in the agent script — simpler, no CLI needed, works with ephemeral.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: remove self-hosted sandboxes page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: rename Fundamentals section to Agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): prefer Java 21, fix jdkPath bundle format, create tools/ stub

- Auto-detect openjdk@21 (Gradle 8.x requires Java ≤ 21; openjdk@25 breaks)
- bubblewrap jdkPath must be .jdk bundle root (it appends Contents/Home)
- JAVA_HOME for gradle resolves to Contents/Home from the bundle
- Create sdk/tools/ stub so bubblewrap SDK validation passes
- Install build-tools;34.0.0 if missing
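
The path handling above can be sketched as follows. This is an illustration under stated assumptions, not the code in _build_pwa_apk: `pick_java_major` and `jdk_paths` are hypothetical names, and the bundle layout assumed is the macOS `.jdk` one described in the bullets:

```python
from pathlib import Path
from typing import Optional


def pick_java_major(installed: list[int], max_major: int = 21) -> Optional[int]:
    """Pick the newest installed Java major version that does not exceed max_major."""
    usable = [major for major in installed if major <= max_major]
    return max(usable) if usable else None


def jdk_paths(bundle_root: Path) -> tuple[Path, Path]:
    """Return (jdkPath for bubblewrap, JAVA_HOME for Gradle) for a .jdk bundle."""
    if bundle_root.suffix != ".jdk":
        raise ValueError(f"expected a .jdk bundle root, got {bundle_root}")
    # bubblewrap appends Contents/Home itself, so it gets the bundle root;
    # Gradle needs the resolved home directory.
    return bundle_root, bundle_root / "Contents" / "Home"
```

Passing the already-resolved `Contents/Home` path as jdkPath would make bubblewrap look for `Contents/Home/Contents/Home`, which is the failure this commit addresses.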

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): suppress Chrome FRE via set-debug-app + command-line flags

After installing the TWA APK, use adb to:
1. am set-debug-app --persistent com.android.chrome (enables flag file)
2. Write chrome-command-line with --no-first-run --disable-fre
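
The two steps could be expressed as adb invocations along these lines. This is a sketch only: the flag-file location `/data/local/tmp/chrome-command-line` and the leading `chrome` token (a placeholder for argv[0]) are assumptions based on the usual Chrome-on-Android convention, and `chrome_fre_commands` is a hypothetical helper that only builds the command lists without running them:

```python
FLAGS = ["--no-first-run", "--disable-fre"]
# Assumption: conventional location of the Chrome flag file on the device.
FLAG_FILE = "/data/local/tmp/chrome-command-line"


def chrome_fre_commands(package: str = "com.android.chrome") -> list[list[str]]:
    """Build the adb commands that mark Chrome debuggable and write its flag file."""
    flag_line = " ".join(["chrome", *FLAGS])
    return [
        # Step 1: make the package the persistent debug app so the flag file is read.
        ["adb", "shell", "am", "set-debug-app", "--persistent", package],
        # Step 2: write the flags; the redirect is evaluated by the device shell.
        ["adb", "shell", f"echo '{flag_line}' > {FLAG_FILE}"],
    ]
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` once an emulator is attached.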

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): don't override startUrl with manifest path in _bw_init.js

twa.startUrl should come from the Web App Manifest's start_url field,
not from the manifest file URL's pathname (which was /manifest.json).
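
To illustrate the difference in Python (the actual fix lives in _bw_init.js, which is not reproduced here): `twa_start_path` is a hypothetical helper, and falling back to `/` when `start_url` is absent is an assumption made for this sketch.

```python
from urllib.parse import urljoin, urlsplit


def twa_start_path(manifest_url: str, manifest: dict) -> str:
    """Derive the start path from the manifest's start_url, not the manifest's own URL."""
    start_url = manifest.get("start_url", "/")  # assumption: default to the site root
    # start_url may be relative, so resolve it against the manifest URL first.
    resolved = urlsplit(urljoin(manifest_url, start_url))
    path = resolved.path or "/"
    return f"{path}?{resolved.query}" if resolved.query else path
```

With a manifest served at `/manifest.json` and `start_url` set to `/?source=pwa`, this yields `/?source=pwa` rather than `/manifest.json`.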

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): add android local gym-pwa e2e test

End-to-end test for the gym-pwa PWA running as a TWA on a local Android
emulator. Uses the Modal-hosted gym at cuaai--todo-gym-web.modal.run.

Flow:
- POST /api/gym/start/add_item → fresh session + task prompt
- Launch TWA, warm up, re-launch to pick up the session
- Agent taps input, types "Buy groceries", taps Add
- GET /api/gym/evaluate (x-session-id header) → reward == 1.0
- CDP verification: query li span text via Chrome DevTools Remote
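
The two HTTP calls in this flow might be built as below, using only the standard library. The endpoints, header name, and reward threshold come from the list above; the empty POST body and the `{"reward": ...}` response shape are assumptions, and the helper names are hypothetical:

```python
from urllib.request import Request

BASE_URL = "https://cuaai--todo-gym-web.modal.run"


def start_request(task: str) -> Request:
    """POST /api/gym/start/<task> to open a fresh session."""
    return Request(f"{BASE_URL}/api/gym/start/{task}", method="POST")


def evaluate_request(session_id: str) -> Request:
    """GET /api/gym/evaluate, identifying the session via the x-session-id header."""
    return Request(
        f"{BASE_URL}/api/gym/evaluate",
        headers={"x-session-id": session_id},
        method="GET",
    )


def passed(evaluation: dict) -> bool:
    """Assumed response shape: a JSON object with a numeric 'reward' field."""
    return evaluation.get("reward") == 1.0
```

The requests are only constructed here; sending them with `urllib.request.urlopen` requires the hosted gym to be reachable.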

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): gym-pwa test uses ?session= URL + bgColor for session isolation

- POST /api/gym/start with bgColor; get back sessionId
- CDP Page.navigate to /?session=<id>&bg=<color> after TWA warm-up
- All API calls pass x-session-id header; no shared server state needed
- Pre-agent screenshot saved to /tmp/gym_pwa_pre_agent.png
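
A sketch of the session URL and the CDP navigation message, assuming the colour is passed as a plain string (whether the test includes a leading `#` is not stated here) and that the message is sent over the DevTools WebSocket. Both function names are hypothetical:

```python
import json
from urllib.parse import urlencode


def session_url(base_url: str, session_id: str, bg_color: str) -> str:
    """Build /?session=<id>&bg=<color> with both values percent-encoded."""
    query = urlencode({"session": session_id, "bg": bg_color})
    return f"{base_url}/?{query}"


def page_navigate_message(url: str, message_id: int = 1) -> str:
    """Serialize a Chrome DevTools Protocol Page.navigate command."""
    return json.dumps(
        {"id": message_id, "method": "Page.navigate", "params": {"url": url}}
    )
```

Carrying the session ID in the URL means each test run addresses its own session, which is what removes the need for shared server state.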

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(examples): use lighter bg color for gym-pwa test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: expand pwa_install docs with full params, signing flow, and requirements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): auto-generate sandbox SDK reference from source

Add cua-sandbox to SDK_CONFIGS in python-sdk.ts generator so the
reference page is generated from docstrings via griffe, matching the
format of computer-sdk and agent-sdk reference pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): replace cua-computer imports with cua-sandbox across guide and examples

Update all code blocks referencing the deprecated cua-computer SDK to
use cua-sandbox equivalents (Sandbox, Image) across guide, examples,
and reference pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): move interactive-shell + add tunneling to Sandbox section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(examples): move test_android_local_gym_pwa to examples/sandboxes/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: add cua-sandbox to bump/publish pipeline

- Add .bumpversion.cfg for sandbox-v* tag format
- Add cd-py-sandbox.yml workflow triggered by sandbox-v* tags
- Add pypi/sandbox option to release-bump-version.yml
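
As an illustration of the tag format, a release script could recognise sandbox tags like this. The three-part numeric version is an assumption (the commit only states the `sandbox-v*` prefix), and `parse_sandbox_tag` is a hypothetical helper:

```python
import re
from typing import Optional

# Assumption: versions are MAJOR.MINOR.PATCH; only the sandbox-v prefix is given above.
TAG_RE = re.compile(r"^sandbox-v(\d+)\.(\d+)\.(\d+)$")


def parse_sandbox_tag(tag: str) -> Optional[tuple[int, int, int]]:
    """Return (major, minor, patch) for a sandbox release tag, or None for other tags."""
    match = TAG_RE.match(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```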

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>