Mirror of https://github.com/trycua/cua.git, synced 2026-03-26 22:08:16 +00:00
* feat(cua-sandbox): Add sandbox SDK with QEMU WSL2/KVM, Hyper-V, and Docker runtimes

- New cua-sandbox package: declarative Image API, layered disk caching, multi-runtime support
- QEMU WSL2 runtime: runs QEMU inside WSL2 with KVM hardware acceleration on Windows
- Hyper-V runtime: builds Windows images from ISO with native Hyper-V Gen2 VMs
- Shared Windows unattended install (builder/windows_unattend.py): Autounattend.xml, ISO creation
- OCI registry push/pull for QEMU disk images
- Computer-server setup script installs cua-computer-server only (no PyTorch/agent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): Add usage examples to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add cloud transport with ephemeral VM support

Cloud sandboxes are now the default path — sandbox() connects to the CUA platform API, provisions VMs, and delegates control via HTTPTransport. Ephemeral inference: image= creates+destroys, name= connects only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Android emulator runtime, transports, and example sandboxes

Adds AndroidEmulatorRuntime with headless toggle, ADB/VNC/SSH/QMP transports, cloud transport timeout increase (10min), and example sandbox scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add ephemeral cloud sandbox example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): Remove name from ephemeral cloud example to trigger VM creation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Mobile interface for Android touch, gestures, and hardware keys

Adds sb.mobile.* methods (tap, swipe, scroll, pinch, home, back, etc.) backed by ADB shell commands, and an ephemeral Android example.

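As a rough illustration of what "backed by ADB shell commands" can look like, the sketch below builds `adb shell input ...` argument lists for a tap, a swipe, and a hardware key. The helper names (`adb_input`, `tap`, `swipe`, `key`) are hypothetical and are not taken from `mobile.py`; only the Android `input tap`, `input swipe`, and `input keyevent` commands are standard.

```python
from typing import List, Optional


def adb_input(args: List[str], serial: Optional[str] = None) -> List[str]:
    """Build an argv list for `adb shell input ...` (hypothetical helper)."""
    base = ["adb"] + (["-s", serial] if serial else [])
    return base + ["shell", "input"] + args


def tap(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    # `input tap X Y` injects a single touch at the given pixel.
    return adb_input(["tap", str(x), str(y)], serial)


def swipe(x1: int, y1: int, x2: int, y2: int,
          duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    # `input swipe X1 Y1 X2 Y2 DURATION_MS` drags from one point to another.
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_input(["swipe"] + coords, serial)


def key(name: str, serial: Optional[str] = None) -> List[str]:
    # "home" -> KEYCODE_HOME, "back" -> KEYCODE_BACK, and so on.
    return adb_input(["keyevent", f"KEYCODE_{name.upper()}"], serial)
```

Each helper returns a list that could be handed to `subprocess.run(...)`; keeping command construction separate from execution makes it testable without a device.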
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(ci): pass SLACK_WEBHOOK to cold start benchmark step

* add benchmark script

* feat(android): true MT Protocol B multitouch, gesture() API, auto port detection

- mobile.py: replace asyncio.gather pinch with single-shell MT Protocol B sendevent script; add gesture(*finger_paths) primitive; pinch_in/pinch_out delegate to gesture()
- android_emulator.py: make adb_port Optional[int]=None; add _find_free_emulator_port() scanning even console ports 5554-5682 via socket.bind
- examples/touch_test_app/: Android APK logging every MotionEvent as JSON to Logcat under tag "TouchTest"; supports RESET_LOG broadcast
- tests/test_android_multitouch.py: integration test suite using sandbox() context manager; Local/Cloud split (Cloud skipped without CUA_TEST_API_KEY)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add get_display_url(share=False) across transports

share=False → vnc://localhost:{port} for local VNC runtimes, https://cua.ai/connect/incus/{name} for cloud (auth-gated)
share=True → noVNC/ws-scrcpy URL with embedded password (cloud only)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add ephemeral android test

* refactor(tests): move TouchTest APK to standalone repo; download from releases

- Remove examples/touch_test_app — now lives at https://github.com/trycua/android-touch-test-app
- test_android_multitouch.py: download APK from GitHub Releases by default (latest release URL) instead of building from source
- CUA_ANDROID_TEST_APK can still be set to a local path to override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): implement cloud Android multitouch tests

Extract shared test logic into _MultitouchTests mixin so Local and Cloud classes run identical assertions. Add cloud_android_sb session fixture that spins up an ephemeral cloud Android VM, installs the TouchTest APK via curl + pm install, and yields the ready sandbox.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): implement apk_install for cloud transport; simplify root escalation

- CloudTransport._apply_image_layers: applies apk_install/run layers after server is ready (curl + pm install on device)
- Replace transport._adb_cmd("root") with sb.shell.run("su root id") in local fixture for consistency with cloud
- Cloud fixture now uses Image.android("14").apk_install(url) same as local

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add multitouch_gesture server action; fix cloud multi-touch injection

Move MT Protocol B sendevent injection to a server-side `multitouch_gesture` action so that `adb root` can be called before injecting events. This fixes cloud Android VMs where `su root sendevent` runs silently but events are not delivered to the app (likely SELinux blocking kernel input injection from the su context).

Changes:
- computer-server: add `multitouch_gesture` to AndroidAutomationHandler — calls `adb root`, detects touch device + axis range via `getevent -p`, builds and runs MT Protocol B sendevent script as root adbd
- computer-server/main.py: register `multitouch_gesture` in handlers map
- mobile.py: `gesture()` now sends the `multitouch_gesture` action with structured JSON params instead of building a shell script client-side; remove `_build_two_finger_script` and MT Protocol B helpers (logic in server)
- adb.py: handle `multitouch_gesture` via `adb root` + sendevent (local path)
- tests: `test_true_multitouch_*` use `sb.mobile.gesture()` instead of manual sendevent scripts; remove `su root id` escalation from fixtures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add _apply_image_layers to CloudTransport for apk_install support

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): add missing logger in android handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): fix duplicate logger definition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): replace sandbox()/close() with Sandbox.create/connect/ephemeral + disconnect/destroy

- Sandbox.create(image) — provision a persistent sandbox
- Sandbox.connect(name) — attach to an existing sandbox
- Sandbox.ephemeral(image) — async context manager, auto-destroys on exit
- Sandbox.disconnect() — drop connection, sandbox keeps running
- Sandbox.destroy() — disconnect + permanently delete
- Localhost.close() renamed to disconnect()
- sandbox() module-level function kept as deprecated shim
- Updated all tests, examples, conftest, agent docstring, and README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): add Localhost.connect() and make Sandbox.connect() dual-mode

- _ConnectResult supports both await and async with on connect()
- Sandbox.connect("name") works as plain await or context manager (disconnects on exit)
- Localhost.connect() mirrors the same pattern
- localhost() module-level function kept as deprecated shim
- conftest fixtures updated to use Localhost.connect()
- README updated

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): update README with new API and connect() dual-mode examples

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add JPEG screenshot support and Android RL fleet benchmark

computer-server: add format/quality params to screenshot() on all handlers (android, linux, macos, windows, base). Defaults to PNG for backwards compat; pass format="jpeg" to get ~5-10x smaller payloads for RL workloads. The existing inspect.signature dispatch picks up the new params automatically.

cua-sandbox: thread format/quality through Transport.screenshot(), HTTPTransport, CloudTransport, Screen interface, and Sandbox.screenshot() so callers can do sb.screenshot(format="jpeg", quality=85).

tests: add android_rps_benchmark.py — provisions N Android sandboxes in parallel and drives them at a target aggregate RPS with per-command latency logging, p50/p95/p99 reporting, and PASS/FAIL verdict for RL infra validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): update default screenshot quality to 95

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add pwa_install — build & install TWA APK from a PWA manifest URL

- Image.pwa_install(manifest_url) — new Android-only chainable layer that uses Bubblewrap to generate a signed debug APK from a Web App Manifest URL and install it via adb
- _bw_init.js — Node.js helper that calls @bubblewrap/core directly to generate twa-manifest.json non-interactively (bypasses the interactive CLI)
- AndroidEmulatorRuntime._apply_layers: handle pwa_install layer (init → update → build → adb install); auto-creates debug keystore; passes passwords via env vars; caches built APKs by manifest URL hash
- transport/*: add format/quality params to all screenshot() implementations; add convert_screenshot() helper in base.py for png→jpeg conversion
- examples/pwa_install_test.py: end-to-end test — installs Starbucks PWA, resolves launcher activity dynamically, launches and screenshots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): refactor android benchmark to measure max RPS

Remove --target-rps / _TokenBucket / PASS-FAIL verdict; workers now loop as fast as possible so the run measures achievable throughput. Add flush=True globally for real-time log output, and use JPEG screenshots.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): validate screenshot magic bytes match requested format

Raise ValueError if the returned image magic bytes don't match the requested format, e.g. requested 'jpeg' but got 'png' (magic bytes: 89504e47).

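A minimal sketch of the magic-byte check described above, assuming only PNG and JPEG need to be recognised. The function names `detect_image_format` and `validate_screenshot` are illustrative and may not match the actual cua-sandbox code; the PNG and JPEG signatures themselves are the standard ones.

```python
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"       # JPEG start-of-image marker


def detect_image_format(data: bytes) -> Optional[str]:
    """Return "png", "jpeg", or None based on the leading bytes."""
    if data.startswith(PNG_MAGIC):
        return "png"
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def validate_screenshot(data: bytes, requested: str) -> bytes:
    """Raise ValueError when the payload does not match the requested format."""
    want = "jpeg" if requested.lower() in ("jpeg", "jpg") else requested.lower()
    got = detect_image_format(data)
    if got != want:
        raise ValueError(
            f"requested {want!r} but got {got!r} (magic bytes: {data[:4].hex()})"
        )
    return data
```

The first four bytes of a PNG are `89 50 4e 47`, which is where the `89504e47` in the error message above comes from.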
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): add local android benchmark using AndroidEmulatorRuntime

Mirror of android_rps_benchmark.py but uses local=True + AndroidEmulatorRuntime for baremetal comparison against cloud.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add JPEG conversion to ADBTransport.screenshot

ADBTransport always returned PNG regardless of the format parameter. Now converts to JPEG via Pillow when format='jpeg'/'jpg', matching the behaviour of the server-side android handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): run ADB subprocess calls in thread executor

_adb_cmd was a synchronous subprocess.run that blocked the event loop, preventing asyncio.sleep timers and task cancellation from firing on time. Add _adb_cmd_async which runs _adb_cmd via loop.run_in_executor, and switch screenshot, get_screen_size, and send to use it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): use raw RGBA screencap + simplejpeg for faster JPEG screenshots

Replace PNG screencap + PIL JPEG encode with raw RGBA screencap (no emulator-side PNG encode) + simplejpeg (libjpeg-turbo, fastdct=True). Skips the emulator-side PNG encode entirely and uses a faster JPEG encoder on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): revert to PNG screencap, keep simplejpeg for host-side encode

Raw RGBA screencap transfers ~10MB over ADB vs ~1-2MB for PNG (emulator compresses before sending). Revert to -p PNG screencap, but use simplejpeg (libjpeg-turbo, fastdct) instead of PIL for the host-side JPEG encode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert(cua-sandbox): revert simplejpeg, back to PIL for JPEG encode

simplejpeg showed no measurable improvement over PIL (p50 507ms vs 519ms, within noise). The bottleneck is ADB transfer (~400ms), not encode time.
PIL produces smaller output (219KB vs 305KB) due to 4:2:0 vs 4:4:4 subsampling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add GRPCEmulatorTransport for fast Android screenshots

The Android emulator's gRPC service (EmulatorController) bypasses ADB entirely, reducing screenshot latency from ~500ms to ~50ms.

Changes:
- Add GRPCEmulatorTransport using getScreenshot(RGB888) + PIL JPEG encode
- Generate protobuf stubs from emulator_controller.proto into transport/_grpc_emulator/
- AndroidEmulatorRuntime now launches with -grpc <port> and sets grpc_port in RuntimeInfo
- sandbox._create picks GRPCEmulatorTransport when grpc_port is set, else falls back to ADB
- Add grpcio>=1.60.0 to cua-sandbox dependencies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add protobuf dependency for gRPC emulator stubs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): fix gRPC stubs and increase max message size to 32MB

- Regenerate emulator_controller stubs with grpcio-tools/_proto include path to resolve 'google/protobuf/empty.proto not loaded' error
- Fix relative import in generated grpc stub (bare import → from . import)
- Increase gRPC channel max_receive/send_message_length to 32MB (RGB888 screenshot is ~6MB, exceeding the 4MB default)

Result: gRPC screenshot transport now fully functional.
Benchmark: 48.90 RPS / p50=20ms vs ADB baseline 1.80 RPS / p50=519ms (27x faster)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(computer-server): note Android emulator gRPC interface and GRPCEmulatorTransport

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): implement touch/click and fix screen_size in GRPCEmulatorTransport

- send() now handles left_click, right_click, double_click, mouse_down, mouse_up via EmulatorController.sendTouch() (press + release TouchEvent pair)
- move_cursor is a no-op (no hover concept on Android)
- Fix get_screen_size(): was requesting 1x1 thumbnail which returned 1080x1; now requests full PNG so emulator returns native display dimensions
- Regenerate _grpc_emulator stubs with grpcio-tools/_proto include path

Benchmark (--action step = screen_size + tap + screenshot): 42.2 RPS / p50=22ms / p95=32ms

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): full gRPC transport — multitouch, shell fallback, sync channel

- Switch grpc.aio → sync grpc channel + run_in_executor
  Avoids "Future attached to a different loop" in pytest session fixtures
- Add shell/run_command handler (ADB fallback via _find_adb)
- Add multitouch_gesture: interpolated N-finger sendTouch frames sent simultaneously per frame — passes all 17 multitouch tests
- Pass serial + sdk_root to GRPCEmulatorTransport from sandbox._create
- Regenerate _grpc_emulator stubs

All 17 TestAndroidMultitouchLocal tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): pin grpcio==1.78.0 and protobuf==6.31.1

Generated stubs require exact versions — grpcio-tools 1.78.0 was used to regenerate and emulator_controller_pb2.py calls ValidateProtobufRuntimeVersion with 6.31.1. Pinning eliminates stub regeneration on venv recreation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add agoda media type backward-compat aliases for ghcr.io images

Existing images on ghcr.io still use vnd.agoda.macosvz.* types. Keep them as OCI_VM_{CONFIG,DISK,AUX}_LEGACY constants, include in VM_MEDIA_TYPES, and match them in detect_format/detect_os so pulling those images still works.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): fix VNC backend, port, and pull ref for macos-tahoe-cua

- Change LUME_API_PORT from 8000 to 8443 (setup-cua.sh uses port 8443)
- Fix ConnectTimeout not caught in is_ready — was propagating immediately instead of retrying
- Fix pull payload: split full OCI ref (e.g. ghcr.io/trycua/img:tag) into registry/organization/image components to avoid lume API double-prefixing the org
- Install cua-computer-server[vnc] (includes vncdotool/twisted) in setup-cua.sh — required for VNC backend screenshots
- Add test_lume_macos_tahoe_cua test using Image.from_registry with LumeRuntime
- Replace vnd.agoda.macosvz media types with vnd.trycua.lume, keep legacy as backward-compat constants

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): auto-runtime, transport selection, macOS versions, error handling

- Fix local=True with no runtime not calling _auto_runtime — now auto-selects DockerRuntime/QEMURuntime/LumeRuntime/AndroidEmulatorRuntime/HyperVRuntime
- Fix transport selection preferring VNCTransport over HTTPTransport when both api_port and vnc_port are set (e.g. Docker containers, Lume VMs)
- Add MACOS_VERSION_IMAGES dict mapping version strings to OCI refs ("15"/"sequoia" → macos-sequoia-cua, "26"/"tahoe" → macos-tahoe-cua)
- Image.macos() now validates version and errors with supported list; default "26"
- LumeRuntime: handle async pull (ReadError on connection close), bump _wait_for_ip timeout to 3600s for large image pulls, use version map
- Add httpx.ReadError to is_ready exception handlers in docker/hyperv/lume
- Add auto-runtime tests (linux container, linux vm, macos, android, windows)
- Add cloud ephemeral tests (linux, android) and Sandbox.create persistent tests
- Fix test_macos_vm hardcoded api_port=18005 → LumeRuntime() with default port

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): replace legacy computer SDK examples with Cua Sandbox SDK

- Remove all examples using the old computer/agent SDK imports
- Add 11 new pytest-compatible examples covering all supported runtimes: linux/macos/windows/android × local/cloud × container/vm
- Each example is both runnable (if __name__ == "__main__") and a pytest test
- Docstrings optimized for answer engine discoverability
- Wire examples/sandboxes/ into pytest testpaths in pyproject.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox-sdk): persistent sandboxes, auto-ports, pull progress, lume async pull

Python SDK:
- Add random two-word sandbox names (_random_name) instead of "cua-sandbox" fallback
- Add _find_free_port() to docker/qemu runtimes to avoid port conflicts
- Add AndroidEmulatorRuntime with list/stop support, wired into _list_local
- Parallelize cua sb ls across Docker/Lume/QEMU/Android runtimes
- Fix UnboundLocalError for conditional HTTPTransport import
- Fix sandbox name resolution after runtime start (resolved_name)
- Fix Android reconnect to use GRPCEmulatorTransport
- Fix cua sb delete to skip confirmation prompt in non-interactive mode
- Add sandbox_state.py with grpc_port/adb_serial/sdk_root params
- Suppress httpx/cua_sandbox INFO logs in CLI output

Lume:
- Add POST /lume/pull/start async endpoint (202 immediately, polls via GET /lume/vms/{name})
- Add PullProgressTracker actor tracking download % per VM name
- Add downloadProgress field to GET /lume/vms/{name} during pulls
- Fix setProgress to clear stale errors so retries work
- Add progressHandler to pullImage(), handlePull, and lume pull CLI
- Add setTotal() in pullOCI so progress % is accurate (was always 0%)
- Unify /lume/pull and /lume/pull/start to both use progressHandler
- Add diagnostic logging for OCI config/nvram layer parsing
- Fix _wait_for_ip to raise immediately if VM status is "stopped"
- Reduce _wait_for_ip timeout from 3600s to 300s

Examples:
- Add examples/sandboxes-cli/ with CLI-based persistent sandbox tests
- Tests assert VM appears in cua sb ls --all after launch and disappears after delete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on sync pull fallback with helpful auth hint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x connection drop on sync pull — check VM exists after ReadError

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on /pull/start for lume v0.3.x compat

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): poll VM status after /pull/start connection drop (lume v0.3.x)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x compat — sync pull + connection drop

lume v0.3.4 doesn't have /pull/start (drops connection immediately) and also drops the connection on /lume/pull when done. Fall back to sync /pull, handle the ReadError by verifying VM was created, then run the VM and return directly instead of falling through to the async poll path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): find lume binary in ~/.local/bin when not on PATH

lume installs to ~/.local/bin which may not be in PATH for non-interactive shells (e.g. SSH sessions, LaunchAgents). Fall back to checking the common install location directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume,tests): redirect progress to stderr; add ~/.local/bin to PATH in tests

- lume.py: all pull progress prints go to sys.stderr so --json output is clean JSON on stdout (fixes JSONDecodeError in test_macos_local_vm)
- conftest.py: pytest_configure adds ~/.local/bin to PATH so cua/lume binaries installed there are found in non-interactive shells

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): pin macos-tahoe-cua to known-good sha256 digest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): wait for VNC readiness in is_ready(), not just HTTP /status

macOS VNC (Screen Sharing) starts after the HTTP computer-server, so screenshot() fails immediately after launch. is_ready() now polls POST /cmd screenshot until VNC accepts connections before returning. Timeout extended to 180s to cover both phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): deliver VNC config to VM before is_ready check

lume v0.3.x doesn't push VNC port/password to the VM via VirtioFS, so the computer-server uses a stale ~/.vnc.env from a previous run. After _wait_for_ip, query the lume API for the current vncUrl, parse port and password, write ~/.vnc.env via `lume ssh`, and restart the computer-server LaunchAgent. This makes VNC available immediately. Also reverts is_ready to HTTP-only check (no VNC phase needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): use pkill to restart computer-server after VNC config update

launchctl kickstart -k fails silently from a non-GUI SSH session.
Kill the python computer_server process directly so launchd revives it with the new ~/.vnc.env config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): actually delete VM on Sandbox.delete() instead of just stopping

_delete_local called LumeRuntime().suspend() which only stops the VM, leaving it in lume's registry as 'stopped'. Add LumeRuntime.delete() which stops then DELETEs via the lume API, and use it in _delete_local.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): use :latest tag for macos-tahoe-cua (lume v0.3.4 can't pull by digest)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): check Homebrew/MacPorts paths on macOS; improve error message

qemu-system-x86_64 may be installed to /opt/homebrew/bin (Apple Silicon) or /usr/local/bin (Intel) or /opt/local/bin (MacPorts) without those dirs being on PATH in subprocess envs. Check known locations before failing. Error message now also mentions MacPorts as an alternative.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): remove Windows-host-only guard from windows local VM test

QEMU is cross-platform; the test should run on any host where qemu-system-x86_64 is available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): fall back to bare-metal QEMU for Windows when Docker unavailable

When Docker is not installed or not running, and the image is a Windows VM, use bare-metal QEMU mode instead of failing with "Docker is not installed".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(bench): add --provision/--continue/--delete modes to android benchmark

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): add pycdlib as a required dependency

pycdlib is used by the Windows ISO builder (windows_unattend.py) to create the unattended install ISO. Without it, bare-metal Windows VM creation fails with ModuleNotFoundError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): find OVMF firmware in Homebrew's share/qemu/ layout

When QEMU is installed via Homebrew, the binary is at /opt/homebrew/bin/qemu-system-x86_64 but firmware files are at /opt/homebrew/share/qemu/. The previous search only looked in <bin_dir>/share/ which doesn't exist. Add <bin_dir>/../share/qemu/ to the search path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): increase bare-metal boot timeout to 600s for Windows/Android

Windows and Android VMs need 3-10 minutes to boot. The previous 120s default was causing launch to time out before the OS was ready.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(benchmark): add provision resume + lower default parallel to 4

--provision now reads the existing state file and only provisions the remaining sandboxes to reach --sandboxes N, appending new names. Default --parallel lowered from 2 to 4 (fewer concurrent provisions to reduce kopf event-loop overload at scale).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): add OrbStack and Homebrew to PATH in conftest

Ensures docker (OrbStack) and qemu (Homebrew) are found in subprocess calls during pytest collection and test execution on macOS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): use Sandbox.connect(name=) for --continue reconnect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(windows): skip test on macOS ARM; validate cached base image size

- Skip Windows local VM test on macOS Apple Silicon: x86_64 Windows via QEMU TCG (no hardware accel) would take hours to install and boot.
- Add minimum size check in ensure_base_image to detect and rebuild incomplete/corrupt base images left behind by failed builds.
- Remove unused QEMUBaremetalRuntime assignment in _build_windows_base.

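The two QEMU lookup fixes in this part of the log (checking known Homebrew/MacPorts bin directories, and adding `<bin_dir>/../share/qemu/` to the firmware search path) can be sketched roughly as follows. The function names and the `KNOWN_BIN_DIRS` constant are hypothetical, not the runtime's real identifiers; only the directory paths come from the commit messages.

```python
import shutil
from pathlib import Path
from typing import Iterable, List, Optional

# Install locations named in the commit message (Apple Silicon, Intel, MacPorts).
KNOWN_BIN_DIRS = ("/opt/homebrew/bin", "/usr/local/bin", "/opt/local/bin")


def find_qemu_binary(name: str = "qemu-system-x86_64",
                     extra_dirs: Iterable[str] = KNOWN_BIN_DIRS) -> Optional[Path]:
    """Look on PATH first, then fall back to well-known install directories."""
    found = shutil.which(name)
    if found:
        return Path(found)
    for directory in extra_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def firmware_search_dirs(qemu_binary: Path) -> List[Path]:
    """Directories to scan for firmware, relative to the QEMU binary."""
    bin_dir = qemu_binary.parent
    return [
        bin_dir / "share",                  # the location searched before the fix
        bin_dir.parent / "share" / "qemu",  # <bin_dir>/../share/qemu (Homebrew layout)
    ]
```

For a binary at `/opt/homebrew/bin/qemu-system-x86_64`, the second entry resolves to `/opt/homebrew/share/qemu`, matching the layout described in the OVMF fix.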
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cloud-transport): fail fast on 4xx in _wait_for_server_ready + add debug logging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): skip 401 sandboxes in --continue, reconnect concurrently

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): --continue no longer deletes sandboxes, use --delete explicitly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): fix --delete to use CloudTransport instead of broken Sandbox(name=)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(grpc-emulator): pre-register empty_pb2 to fix protobuf 6.x descriptor load

AddSerializedFile fails on protobuf 6.33+ if google/protobuf/empty.proto hasn't been loaded yet. Import empty_pb2 before the serialized file to pre-register it in the descriptor pool. Also add demo/ scripts for fleet throughput and ephemeral F-Droid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace Computer SDK references with Sandbox SDK throughout

- README.md: update packages table and hero code example to use cua-sandbox
- quickstart.mdx: install cua-sandbox instead of cua-computer; update hello/agent examples
- using-computer-sdk.mdx → using-sandbox-sdk.mdx: new doc with Sandbox SDK API
- using-agent-sdk.mdx: update Python examples to use Sandbox instead of Computer
- reference/sandbox-sdk/: new reference page for cua-sandbox API
- reference/meta.json + get-started/meta.json: update nav to sandbox-sdk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(readme): unified API example + platform support matrix

* docs(readme): replace iOS with BYOI (.qcow2, .iso) in platform matrix

* docs(readme): move Cua SDK section above CuaBot

* docs(readme): new header + add sb.mobile.gesture() to example

* feat(sandbox): add sb.tunnel.forward() port-forwarding interface

Adds Tunnel interface with forward() supporting ADB (Android), gRPC emulator, and SSH transports. Includes CDP-over-ADB test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): gate android tests on Java only, not pre-installed SDK

SDK auto-installs on first run; only Java is a hard prereq.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(android): check java returncode in _java_env() — macOS stub exits non-zero

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tunnel): support abstract socket forwarding for Chrome DevTools on Android

adb forward tcp:0 localabstract:chrome_devtools_remote instead of tcp:9222. Update test to use socket name and tunnel.port for all CDP URLs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): add gym-pwa end-to-end Android test with CDP bonus

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): disable Chrome FRE before launching gym-pwa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(sandbox-sdk): fill documentation gaps vs Modal-style DX

- Add Sandbox section to guide: lifecycle, images, secrets, scale-out
- Add full sub-interface reference (shell, mouse, keyboard, screen, clipboard, tunnel, mobile, terminal, window, Localhost)
- Add migration guide from cua-computer to cua-sandbox
- Deprecate Computer SDK page with red callout + migration link
- Update quickstart with local Docker no-account path
- Update what-is-cua to reference Sandbox SDK instead of Computer Framework
- Wire all new pages into nav meta.json files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): clear Chrome data to bypass first-run wizard on emulator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(pwa_install): accept keystore param, auto-configure bubblewrap, return fingerprint

- Image.android().pwa_install() now accepts keystore, keystore_alias, keystore_password params — pass the keystore bundled in your PWA repo for deterministic fingerprints baked into assetlinks.json
- _build_pwa_apk auto-installs @bubblewrap/cli via npm if not on PATH
- _build_pwa_apk auto-writes ~/.bubblewrap/config.json from known JDK/SDK paths — no manual interactive setup required
- Returns (apk_path, sha256_fingerprint) tuple
- _bw_init.js accepts keystore path/alias/password as positional args
- Remove get_pwa_keystore_fingerprint (keystore in repo is the pattern)
- test_android_local_gym_pwa uses Sandbox.ephemeral + pwa_install with the committed android.keystore; launches TWA app instead of Chrome

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): rewrite quickstart to fix broken ephemeral/CLI flow

- Remove pre-create sandbox step — Sandbox.ephemeral manages its own lifecycle
- Remove outdated cua sandbox create --os/--size CLI usage
- Add local Docker path (no account needed) as primary hello world
- Fix VNC step to use Sandbox.create so sandbox is alive to open
- Clean up CLI reference to only show commands that are correct

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): update CLI reference to real cua-cli sandbox commands

- Rewrite cli/commands.mdx with actual cua sb launch <image> syntax (image as positional arg, --cpu/--memory/--disk/--region as options)
- Document all image shorthands (macos, ubuntu:24.04, windows, android)
- Fix quickstart VNC/cleanup steps to use cua sb vnc / cua sb ls / cua sb delete
- Fix using-sandbox-sdk.mdx CLI comment to show correct launch syntax
- Remove libs/python/cli (old mock CLI replaced by cua-cli)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): use Modal-hosted gym-pwa, fix 10.0.2.2 manifest fetch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples): add sandboxes section from examples/sandboxes/

One page per OS family (linux, macos, windows, android, custom-images), each showing cloud + local variants with runnable code.
Every code block carries a `# source:` comment pointing to the corresponding test file in examples/sandboxes/ so a future CI workflow can verify that every doc example has a live test case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): auto-install Android build-tools required by bubblewrap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(what-is-cua): rewrite as concise Modal-style intro with full example

- Lead with a complete sandbox + agent snippet instead of graphics/diagrams
- Show the full API surface inline (shell, screenshot, mouse, keyboard, mobile, tunnel)
- Show the image builder pattern
- Remove ASCII diagrams and redundant explanation prose
- Keep use cases and next-steps links

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(quickstart): use sb.get_display_url(share=True) for live view

Replace persistent sandbox + CLI vnc with get_display_url(share=True) inline in the agent script — simpler, no CLI needed, works with ephemeral.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: remove self-hosted sandboxes page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: rename Fundamentals section to Agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): prefer Java 21, fix jdkPath bundle format, create tools/ stub

- Auto-detect openjdk@21 (Gradle 8.x requires Java ≤ 21; openjdk@25 breaks)
- bubblewrap jdkPath must be .jdk bundle root (it appends Contents/Home)
- JAVA_HOME for gradle resolves to Contents/Home from the bundle
- Create sdk/tools/ stub so bubblewrap SDK validation passes
- Install build-tools;34.0.0 if missing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): suppress Chrome FRE via set-debug-app + command-line flags

After installing the TWA APK, use adb to:
1. am set-debug-app --persistent com.android.chrome (enables flag file)
2. Write chrome-command-line with --no-first-run --disable-fre

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): don't override startUrl with manifest path in _bw_init.js

twa.startUrl should come from the Web App Manifest's start_url field, not from the manifest file URL's pathname (which was /manifest.json).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): add android local gym-pwa e2e test

End-to-end test for the gym-pwa PWA running as a TWA on a local Android emulator. Uses the Modal-hosted gym at cuaai--todo-gym-web.modal.run.

Flow:
- POST /api/gym/start/add_item → fresh session + task prompt
- Launch TWA, warm-up, re-launch to pick up session
- Agent taps input, types "Buy groceries", taps Add
- GET /api/gym/evaluate (x-session-id header) → reward == 1.0
- CDP verification: query li span text via Chrome DevTools Remote

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): gym-pwa test uses ?session= URL + bgColor for session isolation

- POST /api/gym/start with bgColor; get back sessionId
- CDP Page.navigate to /?session=<id>&bg=<color> after TWA warm-up
- All API calls pass x-session-id header; no shared server state needed
- Pre-agent screenshot saved to /tmp/gym_pwa_pre_agent.png

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(examples): use lighter bg color for gym-pwa test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: expand pwa_install docs with full params, signing flow, and requirements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): auto-generate sandbox SDK reference from source

Add cua-sandbox to SDK_CONFIGS in python-sdk.ts generator so the reference page is generated from docstrings via griffe, matching the format of computer-sdk and agent-sdk reference pages.


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): replace cua-computer imports with cua-sandbox across guide and examples

Update all code blocks referencing the deprecated cua-computer SDK to use cua-sandbox equivalents (Sandbox, Image) across guide, examples, and reference pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): move interactive-shell + add tunneling to Sandbox section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(examples): move test_android_local_gym_pwa to examples/sandboxes/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: add cua-sandbox to bump/publish pipeline

- Add .bumpversion.cfg for sandbox-v* tag format
- Add cd-py-sandbox.yml workflow triggered by sandbox-v* tags
- Add pypi/sandbox option to release-bump-version.yml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>