255 Commits

Author SHA1 Message Date
ddupont
1b724fa92a fix: upload swift build log as artifact on failure (#1231)
swift build output was silently redirected to build.log with no way to
inspect it on failure. Upload it as an artifact (7-day retention) so
build errors are accessible without changing the job log output.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:14:57 -07:00
ddupont
547752b6fa docs: update imports and install commands to use cua metapackage (#1228)
* docs: update imports and install commands to use cua metapackage

- Replace `from cua_sandbox import` / `from agent import` / `from agent.tools import` / `from agent.callbacks import` with `from cua import`
- Replace `pip install cua-sandbox` / `pip install cua-agent[...]` with `pip install cua[...]`
- Replace all-caps CUA (brand name) with Cua in prose (preserving env vars and CUA-Bench)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(readme): add pip install cua snippet, update Python version, fix imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): revert versioned cua-agent==X.Y.Z references back to cua-agent

Those versions were published as cua-agent, not cua.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua): expose cua_sandbox.runtime classes from cua metapackage

Allows `from cua import QEMURuntime, TartRuntime` etc.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua): add cua.runtime and cua.tools submodules; fix docs imports

- cua.runtime re-exports all cua_sandbox.runtime classes (QEMURuntime, TartRuntime, etc.)
- cua.tools re-exports agent.tools + ToolError/IllegalArgumentError from agent.types
- Remove runtime symbols from cua top-level __init__
- Fix docs: from cua import BrowserTool/BaseTool/ToolError → from cua.tools import ...
- Fix docs: from cua import TartRuntime/QEMURuntime → from cua.runtime import ...

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua): add cua.callbacks submodule; fix callbacks.mdx imports

- cua.callbacks re-exports all agent.callbacks handlers
- Fix docs/cua/guide/fundamentals/callbacks.mdx to use from cua.callbacks import ...

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace pip install "cua[all]" with pip install cua

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples): replace cua-agent/cua-sandbox in requirements.txt blocks with cua

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace 'import cua_sandbox as cua' with 'import cua'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: consolidate consecutive 'from cua import' lines into single imports

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 16:11:58 -07:00
ddupont
d1bc0764f6 feat: add cua meta-package and unify telemetry opt-out (#1225)
* feat: add cua meta-package and unify telemetry opt-out

Meta-package (pip install cua):
- New libs/python/cua package exposing unified API:
  from cua import Sandbox, Image, ComputerAgent
- Depends on cua-sandbox, cua-agent[cloud], cua-cli
- cua-agent surface uses lazy __getattr__ imports to avoid
  import-time side effects when only sandbox symbols are needed
- .bumpversion.cfg, cd-py-cua.yml publish workflow, and
  pypi/cua entry in release-bump-version.yml

Telemetry:
- Unify opt-out: CUA_TELEMETRY_ENABLED=false is now canonical
  for both PostHog and OTEL; CUA_TELEMETRY_DISABLED emits a
  DeprecationWarning and is honoured for backwards compatibility
- Move installation ID from site-packages to ~/.config/cua/
  so it survives upgrades and is shared across venvs
- Add cua-core dep to cua-sandbox; instrument sandbox lifecycle
  with sandbox_create and sandbox_destroy PostHog events;
  add telemetry_enabled param to create/connect/ephemeral
- Instrument cua-cli: cli_command event on every invocation
  via try/finally (command, subcommand, status, duration_seconds)
- Fix TESTING.md to use CUA_TELEMETRY_ENABLED=false
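
The opt-out precedence described above could look roughly like the following. The helper name `telemetry_enabled`, the set of accepted "false" spellings, and the exact precedence are assumptions for illustration; only the two environment variable names and the DeprecationWarning come from this commit.

```python
import os
import warnings

# Assumed spellings that count as "off"; the real parser may differ.
_FALSY = {"0", "false", "no", "off"}


def telemetry_enabled(env=None):
    """Return True unless telemetry is opted out of via the environment."""
    env = os.environ if env is None else env

    # Canonical switch wins when it is present.
    if "CUA_TELEMETRY_ENABLED" in env:
        return env["CUA_TELEMETRY_ENABLED"].strip().lower() not in _FALSY

    # Deprecated switch: still honoured, but it emits a warning.
    if "CUA_TELEMETRY_DISABLED" in env:
        warnings.warn(
            "CUA_TELEMETRY_DISABLED is deprecated; use CUA_TELEMETRY_ENABLED=false",
            DeprecationWarning,
            stacklevel=2,
        )
        return env["CUA_TELEMETRY_DISABLED"].strip().lower() in _FALSY

    return True
```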

* fix: correct docstring/README sandbox scope and Windows telemetry env var

* fix(core): fix telemetry tests failing when CUA_TELEMETRY_DISABLED is set in CI

- Clear CUA_TELEMETRY_DISABLED env var in tests that assert telemetry is enabled
- Fix Path.home() mock chain to match actual usage pattern
- Fix read_text().strip() mock to return string instead of MagicMock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: replace deprecated CUA_TELEMETRY_DISABLED with CUA_TELEMETRY_ENABLED=false

Update CI workflows, all test conftest.py fixtures, and comments to use
the current CUA_TELEMETRY_ENABLED=false env var instead of the deprecated
CUA_TELEMETRY_DISABLED=1, eliminating DeprecationWarnings that were
causing test failures.

Also fix isort import ordering in cua-sandbox and computer-server files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): apply black formatting to 14 files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): fix 4 ruff errors (unused vars, ambiguous name)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lint): fix remaining isort and black issues

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add SandboxComputerHandler and linux agent example

- Add SandboxComputerHandler in agent/computers/sandbox.py that adapts
  cua_sandbox.Sandbox to the AsyncComputerHandler protocol
- Wire Sandbox recognition into is_agent_computer() and make_computer_handler()
  so tools=[sb] works the same as the old Computer wrapper
- Normalize Anthropic/X11 key names (e.g. Return → enter) to pynput names
  used by computer-server's linux handler
- Add examples/agents/test_linux_agent.py demonstrating Sandbox.ephemeral
  with ComputerAgent using an Anthropic model
- Lower cua-agent requires-python to >=3.11 for broader compatibility
- Add cua-agent as editable dev dep in cua-sandbox for testing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: support Python 3.11 in cua-agent and cua-core

- Drop cua-computer from cua-agent required deps (move to optional 'computer' extra)
- Make all 'from computer import' usages in agent optional (try/except)
- Fix typing.override import for Python <3.12 (use typing_extensions fallback)
- Lower requires-python to >=3.11 in cua-agent, cua-core, and cua-sandbox

Tested: 3.11 ✓  3.12 ✓  3.13 ✓

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): update Sandbox API usage from deprecated os_type/provider_type to Image API

Replace all occurrences of the old cua-computer style parameters
(os_type=, provider_type=VMProviderType.*) with the correct cua-sandbox
Image API (Image.linux(), Image.macos(), Image.windows(), local=True).

Also remove VMProviderType from imports where no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: add Pillow to cua-agent core deps after dropping cua-computer

Pillow was previously a transitive dependency via cua-computer.
After making cua-computer optional, PIL imports fail. Add Pillow
directly to required dependencies.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 15:32:13 -07:00
ddupont
7d1fa31fb6 feat(sandbox-sdk): Cua Sandbox SDK — unified API for Linux, macOS, Windows, Android (#1218)
* feat(cua-sandbox): Add sandbox SDK with QEMU WSL2/KVM, Hyper-V, and Docker runtimes

- New cua-sandbox package: declarative Image API, layered disk caching, multi-runtime support
- QEMU WSL2 runtime: runs QEMU inside WSL2 with KVM hardware acceleration on Windows
- Hyper-V runtime: builds Windows images from ISO with native Hyper-V Gen2 VMs
- Shared Windows unattended install (builder/windows_unattend.py): Autounattend.xml, ISO creation
- OCI registry push/pull for QEMU disk images
- Computer-server setup script installs cua-computer-server only (no PyTorch/agent)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): Add usage examples to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add cloud transport with ephemeral VM support

Cloud sandboxes are now the default path — sandbox() connects to the
CUA platform API, provisions VMs, and delegates control via HTTPTransport.
Ephemerality is inferred from the arguments: image= creates the VM and later destroys it, while name= only connects to an existing VM.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Android emulator runtime, transports, and example sandboxes

Adds AndroidEmulatorRuntime with headless toggle, ADB/VNC/SSH/QMP transports,
cloud transport timeout increase (10min), and example sandbox scripts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add ephemeral cloud sandbox example

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): Remove name from ephemeral cloud example to trigger VM creation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): Add Mobile interface for Android touch, gestures, and hardware keys

Adds sb.mobile.* methods (tap, swipe, scroll, pinch, home, back, etc.)
backed by ADB shell commands, and an ephemeral Android example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(ci): pass SLACK_WEBHOOK to cold start benchmark step

* add benchmark script

* feat(android): true MT Protocol B multitouch, gesture() API, auto port detection

- mobile.py: replace asyncio.gather pinch with single-shell MT Protocol B
  sendevent script; add gesture(*finger_paths) primitive; pinch_in/pinch_out
  delegate to gesture()
- android_emulator.py: make adb_port Optional[int]=None; add
  _find_free_emulator_port() scanning even console ports 5554-5682 via
  socket.bind
- examples/touch_test_app/: Android APK logging every MotionEvent as JSON
  to Logcat under tag "TouchTest"; supports RESET_LOG broadcast
- tests/test_android_multitouch.py: integration test suite using sandbox()
  context manager; Local/Cloud split (Cloud skipped without CUA_TEST_API_KEY)
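
The port scan described for `_find_free_emulator_port()` can be sketched as below. This is an illustrative stand-in, not the committed implementation: the function name and defaults are assumptions, and it only probes the even console port, not the adjacent odd ADB port.

```python
import socket


def find_free_emulator_port(start=5554, end=5682):
    """Return the first even port in [start, end] that can be bound, else None."""
    for port in range(start, end + 1, 2):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is in use; try the next even port
            return port
    return None
```

The socket is closed again before returning, so another process could still claim the port before the emulator starts; callers should be ready to retry.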

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add get_display_url(share=False) across transports

share=False → vnc://localhost:{port} for local VNC runtimes,
              https://cua.ai/connect/incus/{name} for cloud (auth-gated)
share=True  → noVNC/ws-scrcpy URL with embedded password (cloud only)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add ephemeral android test

* refactor(tests): move TouchTest APK to standalone repo; download from releases

- Remove examples/touch_test_app — now lives at
  https://github.com/trycua/android-touch-test-app
- test_android_multitouch.py: download APK from GitHub Releases by default
  (latest release URL) instead of building from source
- CUA_ANDROID_TEST_APK can still be set to a local path to override

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): implement cloud Android multitouch tests

Extract shared test logic into _MultitouchTests mixin so Local and Cloud
classes run identical assertions. Add cloud_android_sb session fixture that
spins up an ephemeral cloud Android VM, installs the TouchTest APK via
curl + pm install, and yields the ready sandbox.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): implement apk_install for cloud transport; simplify root escalation

- CloudTransport._apply_image_layers: applies apk_install/run layers after
  server is ready (curl + pm install on device)
- Replace transport._adb_cmd("root") with sb.shell.run("su root id") in
  local fixture for consistency with cloud
- Cloud fixture now uses Image.android("14").apk_install(url) same as local

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add multitouch_gesture server action; fix cloud multi-touch injection

Move MT Protocol B sendevent injection to a server-side `multitouch_gesture`
action so that `adb root` can be called before injecting events. This fixes
cloud Android VMs where `su root sendevent` runs silently but events are not
delivered to the app (likely SELinux blocking kernel input injection from the
su context).

Changes:
- computer-server: add `multitouch_gesture` to AndroidAutomationHandler — calls
  `adb root`, detects touch device + axis range via `getevent -p`, builds and
  runs MT Protocol B sendevent script as root adbd
- computer-server/main.py: register `multitouch_gesture` in handlers map
- mobile.py: `gesture()` now sends the `multitouch_gesture` action with
  structured JSON params instead of building a shell script client-side;
  remove `_build_two_finger_script` and MT Protocol B helpers (logic in server)
- adb.py: handle `multitouch_gesture` via `adb root` + sendevent (local path)
- tests: `test_true_multitouch_*` use `sb.mobile.gesture()` instead of manual
  sendevent scripts; remove `su root id` escalation from fixtures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox): add _apply_image_layers to CloudTransport for apk_install support

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): add missing logger in android handler

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(computer-server): fix duplicate logger definition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): replace sandbox()/close() with Sandbox.create/connect/ephemeral + disconnect/destroy

- Sandbox.create(image) — provision a persistent sandbox
- Sandbox.connect(name) — attach to an existing sandbox
- Sandbox.ephemeral(image) — async context manager, auto-destroys on exit
- Sandbox.disconnect() — drop connection, sandbox keeps running
- Sandbox.destroy() — disconnect + permanently delete
- Localhost.close() renamed to disconnect()
- sandbox() module-level function kept as deprecated shim
- Updated all tests, examples, conftest, agent docstring, and README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(cua-sandbox): add Localhost.connect() and make Sandbox.connect() dual-mode

- _ConnectResult supports both await and async with on connect()
- Sandbox.connect("name") works as plain await or context manager (disconnects on exit)
- Localhost.connect() mirrors the same pattern
- localhost() module-level function kept as deprecated shim
- conftest fixtures updated to use Localhost.connect()
- README updated
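
A minimal sketch of how one object can support both `await` and `async with`, as described for `_ConnectResult`. `FakeSandbox`, `ConnectResult`, and `connect` here are hypothetical stand-ins and do not reproduce the real classes.

```python
import asyncio


class FakeSandbox:
    """Stand-in for a connected sandbox; it only tracks connection state."""

    def __init__(self, name):
        self.name = name
        self.connected = True

    async def disconnect(self):
        self.connected = False


class ConnectResult:
    """Wraps a connect coroutine so it works with `await` and `async with`."""

    def __init__(self, coro):
        self._coro = coro
        self._sandbox = None

    def __await__(self):
        # Plain `await connect(...)` just drives the wrapped coroutine.
        return self._coro.__await__()

    async def __aenter__(self):
        self._sandbox = await self._coro
        return self._sandbox

    async def __aexit__(self, exc_type, exc, tb):
        # Leaving the block drops the connection.
        await self._sandbox.disconnect()


async def _open(name):
    await asyncio.sleep(0)  # placeholder for real connection work
    return FakeSandbox(name)


def connect(name):
    return ConnectResult(_open(name))


async def demo():
    kept = await connect("plain")  # stays connected
    async with connect("scoped") as scoped:
        inside = scoped.connected
    return kept.connected, inside, scoped.connected
```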

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(cua-sandbox): update README with new API and connect() dual-mode examples

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat: add JPEG screenshot support and Android RL fleet benchmark

computer-server: add format/quality params to screenshot() on all handlers
(android, linux, macos, windows, base). Defaults to PNG for backwards compat;
pass format="jpeg" to get ~5-10x smaller payloads for RL workloads.
The existing inspect.signature dispatch picks up the new params automatically.

cua-sandbox: thread format/quality through Transport.screenshot(),
HTTPTransport, CloudTransport, Screen interface, and Sandbox.screenshot()
so callers can do sb.screenshot(format="jpeg", quality=85).

tests: add android_rps_benchmark.py — provisions N Android sandboxes in
parallel and drives them at a target aggregate RPS with per-command latency
logging, p50/p95/p99 reporting, and PASS/FAIL verdict for RL infra validation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): update default screenshot quality to 95

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add pwa_install — build & install TWA APK from a PWA manifest URL

- Image.pwa_install(manifest_url) — new Android-only chainable layer that uses
  Bubblewrap to generate a signed debug APK from a Web App Manifest URL and
  install it via adb
- _bw_init.js — Node.js helper that calls @bubblewrap/core directly to generate
  twa-manifest.json non-interactively (bypasses the interactive CLI)
- AndroidEmulatorRuntime._apply_layers: handle pwa_install layer (init → update
  → build → adb install); auto-creates debug keystore; passes passwords via env
  vars; caches built APKs by manifest URL hash
- transport/*: add format/quality params to all screenshot() implementations;
  add convert_screenshot() helper in base.py for png→jpeg conversion
- examples/pwa_install_test.py: end-to-end test — installs Starbucks PWA,
  resolves launcher activity dynamically, launches and screenshots

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): refactor android benchmark to measure max RPS

Remove --target-rps / _TokenBucket / PASS-FAIL verdict; workers now loop
as fast as possible so the run measures achievable throughput. Add flush=True
globally for real-time log output, and use JPEG screenshots.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): validate screenshot magic bytes match requested format

Raise ValueError if the returned image magic bytes don't match the requested
format, e.g. requested 'jpeg' but got 'png' (magic bytes: 89504e47).
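
The check can be sketched as follows. The function name and error wording are assumptions; the PNG and JPEG signatures are the standard file magic numbers.

```python
# Leading bytes that identify each supported format.
_MAGIC = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
}


def check_image_format(data: bytes, requested: str) -> None:
    """Raise ValueError if `data` does not start with the requested format's magic."""
    fmt = requested.lower()
    if fmt == "jpg":
        fmt = "jpeg"
    if fmt not in _MAGIC:
        raise ValueError(f"unsupported format: {requested!r}")
    if data.startswith(_MAGIC[fmt]):
        return
    # Identify what was actually returned, if it is a known format.
    actual = next((n for n, m in _MAGIC.items() if data.startswith(m)), "unknown")
    raise ValueError(
        f"requested {fmt!r} but got {actual!r} (magic bytes: {data[:4].hex()})"
    )
```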

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(benchmark): add local android benchmark using AndroidEmulatorRuntime

Mirror of android_rps_benchmark.py but uses local=True + AndroidEmulatorRuntime
for baremetal comparison against cloud.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add JPEG conversion to ADBTransport.screenshot

ADBTransport always returned PNG regardless of the format parameter.
Now converts to JPEG via Pillow when format='jpeg'/'jpg', matching
the behaviour of the server-side android handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): run ADB subprocess calls in thread executor

_adb_cmd was a synchronous subprocess.run that blocked the event loop,
preventing asyncio.sleep timers and task cancellation from firing on time.
Add _adb_cmd_async which runs _adb_cmd via loop.run_in_executor, and switch
screenshot, get_screen_size, and send to use it.
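
A minimal sketch of the pattern, assuming a generic blocking helper. `run_cmd` and `run_cmd_async` are illustrative names, not the real `_adb_cmd` / `_adb_cmd_async`, and the demo runs a short Python subprocess in place of `adb`.

```python
import asyncio
import functools
import subprocess
import sys


def run_cmd(args, timeout=30):
    """Blocking helper: run a command and return its stdout bytes."""
    result = subprocess.run(args, capture_output=True, check=True, timeout=timeout)
    return result.stdout


async def run_cmd_async(args, timeout=30):
    """Run the blocking helper in the default thread pool so the loop stays free."""
    loop = asyncio.get_running_loop()
    call = functools.partial(run_cmd, args, timeout=timeout)
    return await loop.run_in_executor(None, call)


async def demo():
    ticks = 0

    async def heartbeat():
        # Keeps ticking only if the event loop is not blocked.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    task = asyncio.create_task(heartbeat())
    script = "import time; time.sleep(0.2); print('ok')"
    out = await run_cmd_async([sys.executable, "-c", script])
    task.cancel()
    return out.strip(), ticks
```

If `run_cmd` were called directly inside `demo()`, the heartbeat would not advance until the subprocess finished.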

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): use raw RGBA screencap + simplejpeg for faster JPEG screenshots

Replace PNG screencap + PIL JPEG encode with raw RGBA screencap + simplejpeg
(libjpeg-turbo, fastdct=True). This skips the emulator-side PNG encode entirely
and uses a faster JPEG encoder on the host.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* perf(cua-sandbox): revert to PNG screencap, keep simplejpeg for host-side encode

Raw RGBA screencap transfers ~10MB over ADB vs ~1-2MB for PNG (emulator
compresses before sending). Revert to -p PNG screencap, but use simplejpeg
(libjpeg-turbo, fastdct) instead of PIL for the host-side JPEG encode.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* revert(cua-sandbox): revert simplejpeg, back to PIL for JPEG encode

simplejpeg showed no measurable improvement over PIL (p50 507ms vs 519ms,
within noise). The bottleneck is ADB transfer (~400ms), not encode time.
PIL produces smaller output (219KB vs 305KB) due to 4:2:0 vs 4:4:4 subsampling.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): add GRPCEmulatorTransport for fast Android screenshots

The Android emulator's gRPC service (EmulatorController) bypasses ADB entirely,
reducing screenshot latency from ~500ms to ~50ms. Changes:

- Add GRPCEmulatorTransport using getScreenshot(RGB888) + PIL JPEG encode
- Generate protobuf stubs from emulator_controller.proto into transport/_grpc_emulator/
- AndroidEmulatorRuntime now launches with -grpc <port> and sets grpc_port in RuntimeInfo
- sandbox._create picks GRPCEmulatorTransport when grpc_port is set, else falls back to ADB
- Add grpcio>=1.60.0 to cua-sandbox dependencies

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add protobuf dependency for gRPC emulator stubs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): fix gRPC stubs and increase max message size to 32MB

- Regenerate emulator_controller stubs with grpcio-tools/_proto include path
  to resolve 'google/protobuf/empty.proto not loaded' error
- Fix relative import in generated grpc stub (bare import → from . import)
- Increase gRPC channel max_receive/send_message_length to 32MB
  (RGB888 screenshot is ~6MB, exceeding the 4MB default)
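
The size arithmetic behind the limit change, as a rough check. The 1080x1920 display size is an assumption for illustration (the commit only states about 6MB); the two option keys are standard gRPC channel arguments.

```python
# Assumed display size; RGB888 uses 3 bytes per pixel.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1080, 1920, 3
RGB888_BYTES = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 6,220,800 bytes

DEFAULT_LIMIT = 4 * 1024 * 1024  # gRPC default max receive message size
NEW_LIMIT = 32 * 1024 * 1024     # limit chosen in this commit

# Channel options in the form accepted by grpc.insecure_channel(target, options=...).
CHANNEL_OPTIONS = [
    ("grpc.max_receive_message_length", NEW_LIMIT),
    ("grpc.max_send_message_length", NEW_LIMIT),
]

exceeds_default = RGB888_BYTES > DEFAULT_LIMIT
fits_new_limit = RGB888_BYTES <= NEW_LIMIT
```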

Result: gRPC screenshot transport now fully functional.
Benchmark: 48.90 RPS / p50=20ms vs ADB baseline 1.80 RPS / p50=519ms (27x faster)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(computer-server): note Android emulator gRPC interface and GRPCEmulatorTransport

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): implement touch/click and fix screen_size in GRPCEmulatorTransport

- send() now handles left_click, right_click, double_click, mouse_down, mouse_up
  via EmulatorController.sendTouch() (press + release TouchEvent pair)
- move_cursor is a no-op (no hover concept on Android)
- Fix get_screen_size(): was requesting 1x1 thumbnail which returned 1080x1;
  now requests full PNG so emulator returns native display dimensions
- Regenerate _grpc_emulator stubs with grpcio-tools/_proto include path

Benchmark (--action step = screen_size + tap + screenshot):
  42.2 RPS / p50=22ms / p95=32ms

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(cua-sandbox): full gRPC transport — multitouch, shell fallback, sync channel

- Switch grpc.aio → sync grpc channel + run_in_executor
  Avoids "Future attached to a different loop" in pytest session fixtures
- Add shell/run_command handler (ADB fallback via _find_adb)
- Add multitouch_gesture: interpolated N-finger sendTouch frames sent
  simultaneously per frame — passes all 17 multitouch tests
- Pass serial + sdk_root to GRPCEmulatorTransport from sandbox._create
- Regenerate _grpc_emulator stubs
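
The frame interpolation for N-finger gestures can be sketched as below. This is a simplified illustration, assuming each finger path is a straight `(start, end)` pair; the real `multitouch_gesture` handler may accept richer paths, and the function name here is hypothetical.

```python
def interpolate_gesture(finger_paths, steps=10):
    """Build frames; each frame lists (finger_id, x, y) for every finger at once."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at the start of the gesture, 1.0 at the end
        frame = []
        for finger_id, ((x0, y0), (x1, y1)) in enumerate(finger_paths):
            x = round(x0 + (x1 - x0) * t)
            y = round(y0 + (y1 - y0) * t)
            frame.append((finger_id, x, y))
        frames.append(frame)
    return frames
```

Sending every finger's position within the same frame is what makes the gesture register as simultaneous touches, not as sequential swipes.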

All 17 TestAndroidMultitouchLocal tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): pin grpcio==1.78.0 and protobuf==6.31.1

Generated stubs require exact versions — grpcio-tools 1.78.0 was used to
regenerate and emulator_controller_pb2.py calls ValidateProtobufRuntimeVersion
with 6.31.1. Pinning eliminates stub regeneration on venv recreation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): add agoda media type backward-compat aliases for ghcr.io images

Existing images on ghcr.io still use vnd.agoda.macosvz.* types. Keep them
as OCI_VM_{CONFIG,DISK,AUX}_LEGACY constants, include in VM_MEDIA_TYPES,
and match them in detect_format/detect_os so pulling those images still works.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): fix VNC backend, port, and pull ref for macos-tahoe-cua

- Change LUME_API_PORT from 8000 to 8443 (setup-cua.sh uses port 8443)
- Fix ConnectTimeout not caught in is_ready — was propagating immediately instead of retrying
- Fix pull payload: split full OCI ref (e.g. ghcr.io/trycua/img:tag) into registry/organization/image components to avoid lume API double-prefixing the org
- Install cua-computer-server[vnc] (includes vncdotool/twisted) in setup-cua.sh — required for VNC backend screenshots
- Add test_lume_macos_tahoe_cua test using Image.from_registry with LumeRuntime
- Replace vnd.agoda.macosvz media types with vnd.trycua.lume, keep legacy as backward-compat constants
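
Splitting a full OCI ref into components, as described in the pull-payload fix, might look like this. The function name and the returned keys are assumptions, not the actual lume payload schema, and digest references are deliberately left out of the sketch.

```python
def split_oci_ref(ref, default_tag="latest"):
    """Split 'registry/organization/image[:tag]' into its components."""
    if "@" in ref:
        raise ValueError("digest references are not handled in this sketch")
    name, _, tag = ref.rpartition(":")
    # No colon at all, or the colon belongs to a registry port (host:5000/...).
    if not name or "/" in tag:
        name, tag = ref, default_tag
    parts = name.split("/")
    if len(parts) < 3:
        raise ValueError(f"expected registry/organization/image, got {ref!r}")
    return {
        "registry": parts[0],
        "organization": parts[1],
        "image": "/".join(parts[2:]),
        "tag": tag,
    }
```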

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cua-sandbox): auto-runtime, transport selection, macOS versions, error handling

- Fix local=True with no runtime not calling _auto_runtime — now auto-selects
  DockerRuntime/QEMURuntime/LumeRuntime/AndroidEmulatorRuntime/HyperVRuntime
- Fix transport selection preferring VNCTransport over HTTPTransport when both
  api_port and vnc_port are set (e.g. Docker containers, Lume VMs)
- Add MACOS_VERSION_IMAGES dict mapping version strings to OCI refs
  ("15"/"sequoia" → macos-sequoia-cua, "26"/"tahoe" → macos-tahoe-cua)
- Image.macos() now validates version and errors with supported list; default "26"
- LumeRuntime: handle async pull (ReadError on connection close), bump
  _wait_for_ip timeout to 3600s for large image pulls, use version map
- Add httpx.ReadError to is_ready exception handlers in docker/hyperv/lume
- Add auto-runtime tests (linux container, linux vm, macos, android, windows)
- Add cloud ephemeral tests (linux, android) and Sandbox.create persistent tests
- Fix test_macos_vm hardcoded api_port=18005 → LumeRuntime() with default port

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): replace legacy computer SDK examples with Cua Sandbox SDK

- Remove all examples using the old computer/agent SDK imports
- Add 11 new pytest-compatible examples covering all supported runtimes:
  linux/macos/windows/android × local/cloud × container/vm
- Each example is both runnable (if __name__ == "__main__") and a pytest test
- Docstrings optimized for answer engine discoverability
- Wire examples/sandboxes/ into pytest testpaths in pyproject.toml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(sandbox-sdk): persistent sandboxes, auto-ports, pull progress, lume async pull

Python SDK:
- Add random two-word sandbox names (_random_name) instead of "cua-sandbox" fallback
- Add _find_free_port() to docker/qemu runtimes to avoid port conflicts
- Add AndroidEmulatorRuntime with list/stop support, wired into _list_local
- Parallelize cua sb ls across Docker/Lume/QEMU/Android runtimes
- Fix UnboundLocalError for conditional HTTPTransport import
- Fix sandbox name resolution after runtime start (resolved_name)
- Fix Android reconnect to use GRPCEmulatorTransport
- Fix cua sb delete to skip confirmation prompt in non-interactive mode
- Add sandbox_state.py with grpc_port/adb_serial/sdk_root params
- Suppress httpx/cua_sandbox INFO logs in CLI output

Lume:
- Add POST /lume/pull/start async endpoint (202 immediately, polls via GET /lume/vms/{name})
- Add PullProgressTracker actor tracking download % per VM name
- Add downloadProgress field to GET /lume/vms/{name} during pulls
- Fix setProgress to clear stale errors so retries work
- Add progressHandler to pullImage(), handlePull, and lume pull CLI
- Add setTotal() in pullOCI so progress % is accurate (was always 0%)
- Unify /lume/pull and /lume/pull/start to both use progressHandler
- Add diagnostic logging for OCI config/nvram layer parsing
- Fix _wait_for_ip to raise immediately if VM status is "stopped"
- Reduce _wait_for_ip timeout from 3600s to 300s

Examples:
- Add examples/sandboxes-cli/ with CLI-based persistent sandbox tests
- Tests assert VM appears in cua sb ls --all after launch and disappears after delete

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on sync pull fallback with helpful auth hint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x connection drop on sync pull — check VM exists after ReadError

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): catch ReadError on /pull/start for lume v0.3.x compat

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): poll VM status after /pull/start connection drop (lume v0.3.x)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): handle lume v0.3.x compat — sync pull + connection drop

lume v0.3.4 doesn't have /pull/start (drops connection immediately)
and also drops the connection on /lume/pull when done. Fall back to
sync /pull, handle the ReadError by verifying VM was created, then
run the VM and return directly instead of falling through to the
async poll path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): find lume binary in ~/.local/bin when not on PATH

lume installs to ~/.local/bin which may not be in PATH for non-interactive
shells (e.g. SSH sessions, LaunchAgents). Fall back to checking the
common install location directly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume,tests): redirect progress to stderr; add ~/.local/bin to PATH in tests

- lume.py: all pull progress prints go to sys.stderr so --json output
  is clean JSON on stdout (fixes JSONDecodeError in test_macos_local_vm)
- conftest.py: pytest_configure adds ~/.local/bin to PATH so cua/lume
  binaries installed there are found in non-interactive shells

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): pin macos-tahoe-cua to known-good sha256 digest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): wait for VNC readiness in is_ready(), not just HTTP /status

macOS VNC (Screen Sharing) starts after the HTTP computer-server, so
screenshot() fails immediately after launch. is_ready() now polls
POST /cmd screenshot until VNC accepts connections before returning.
Timeout extended to 180s to cover both phases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): deliver VNC config to VM before is_ready check

lume v0.3.x doesn't push VNC port/password to the VM via VirtioFS,
so the computer-server uses a stale ~/.vnc.env from a previous run.
After _wait_for_ip, query the lume API for the current vncUrl, parse
port and password, write ~/.vnc.env via `lume ssh`, and restart the
computer-server LaunchAgent. This makes VNC available immediately.
Also reverts is_ready to HTTP-only check (no VNC phase needed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): use pkill to restart computer-server after VNC config update

launchctl kickstart -k fails silently from a non-GUI SSH session.
Kill the python computer_server process directly so launchd revives
it with the new ~/.vnc.env config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lume): actually delete VM on Sandbox.delete() instead of just stopping

_delete_local called LumeRuntime().suspend() which only stops the VM,
leaving it in lume's registry as 'stopped'. Add LumeRuntime.delete()
which stops then DELETEs via the lume API, and use it in _delete_local.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): use :latest tag for macos-tahoe-cua (lume v0.3.4 can't pull by digest)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): check Homebrew/MacPorts paths on macOS; improve error message

qemu-system-x86_64 may be installed to /opt/homebrew/bin (Apple Silicon)
or /usr/local/bin (Intel) or /opt/local/bin (MacPorts) without those dirs
being on PATH in subprocess envs. Check known locations before failing.
Error message now also mentions MacPorts as an alternative.
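
A minimal sketch of the lookup order described above: PATH first, then known install directories. `find_binary` is an illustrative name, not the function changed in this commit; the default directories are the three named in the message.

```python
import os
import shutil

# Install locations named in the commit message.
KNOWN_DIRS = ("/opt/homebrew/bin", "/usr/local/bin", "/opt/local/bin")


def find_binary(name, extra_dirs=KNOWN_DIRS):
    """Return the path to `name` from PATH or a known directory, else None."""
    found = shutil.which(name)
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

For example, `find_binary("qemu-system-x86_64")` returns `None` when QEMU is not installed in any of these places, which is the point at which an error message can list the install options.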

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): remove Windows-host-only guard from windows local VM test

QEMU is cross-platform; the test should run on any host where qemu-system-x86_64 is available.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): fall back to bare-metal QEMU for Windows when Docker unavailable

When Docker is not installed or not running, and the image is a Windows VM,
use bare-metal QEMU mode instead of failing with "Docker is not installed".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(bench): add --provision/--continue/--delete modes to android benchmark

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(sandbox): add pycdlib as a required dependency

pycdlib is used by the Windows ISO builder (windows_unattend.py) to create
the unattended install ISO. Without it, bare-metal Windows VM creation fails
with ModuleNotFoundError.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): find OVMF firmware in Homebrew's share/qemu/ layout

When QEMU is installed via Homebrew, the binary is at /opt/homebrew/bin/qemu-system-x86_64
but firmware files are at /opt/homebrew/share/qemu/. The previous search only looked
in <bin_dir>/share/ which doesn't exist. Add <bin_dir>/../share/qemu/ to the search path.
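
As a sketch of the extended search path, assuming hypothetical helper names and leaving the firmware filename as a parameter (the exact OVMF filename the code looks for is not stated here):

```python
from pathlib import Path
from typing import List, Optional


def firmware_search_dirs(qemu_binary: str) -> List[Path]:
    """Directories to search for firmware, derived from the QEMU binary path."""
    bin_dir = Path(qemu_binary).parent
    return [
        bin_dir / "share",                  # location searched before this fix
        bin_dir.parent / "share" / "qemu",  # <bin_dir>/../share/qemu/ (Homebrew layout)
    ]


def find_firmware(qemu_binary: str, filename: str) -> Optional[Path]:
    """Return the first existing firmware file, or None if it is not found."""
    for directory in firmware_search_dirs(qemu_binary):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

For `/opt/homebrew/bin/qemu-system-x86_64` the second entry resolves to `/opt/homebrew/share/qemu`.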

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(qemu): increase bare-metal boot timeout to 600s for Windows/Android

Windows and Android VMs need 3-10 minutes to boot. The previous 120s default
was causing launch to time out before the OS was ready.
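
To illustrate why the larger budget matters, here is a generic polling loop with a deadline. It is a sketch only: `wait_until_ready` and its parameters are hypothetical, and the real readiness probe is not described in this log.

```python
import time
from typing import Callable

BOOT_TIMEOUT_S = 600  # new default from the commit message (previously 120)


def wait_until_ready(is_ready: Callable[[], bool],
                     timeout_s: float = BOOT_TIMEOUT_S,
                     interval_s: float = 2.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `is_ready` until it returns True or `timeout_s` seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A guest that needs about three minutes to boot would exhaust a 120 s budget but succeed within 600 s.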

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(benchmark): add provision resume + lower default parallel to 4

--provision now reads the existing state file and only provisions the
remaining sandboxes to reach --sandboxes N, appending new names.
Default --parallel lowered to 4 (fewer concurrent provisions
to reduce kopf event-loop overload at scale).
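
The resume behaviour can be sketched as follows. The state-file layout (`{"sandboxes": [...]}`) and the `provision_one` callback are assumptions made for illustration; the benchmark's actual file format is not shown here.

```python
import json
from pathlib import Path
from typing import Callable, List


def resume_provision(state_path: Path, target: int,
                     provision_one: Callable[[], str]) -> List[str]:
    """Provision only the sandboxes still missing to reach `target`."""
    names: List[str] = []
    if state_path.exists():
        names = json.loads(state_path.read_text()).get("sandboxes", [])
    remaining = max(0, target - len(names))
    for _ in range(remaining):
        names.append(provision_one())
        # Persist after each sandbox so an interrupted run can resume again.
        state_path.write_text(json.dumps({"sandboxes": names}))
    return names
```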

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): add OrbStack and Homebrew to PATH in conftest

Ensures docker (OrbStack) and qemu (Homebrew) are found in subprocess calls
during pytest collection and test execution on macOS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): use Sandbox.connect(name=) for --continue reconnect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(windows): skip test on macOS ARM; validate cached base image size

- Skip Windows local VM test on macOS Apple Silicon: x86_64 Windows via
  QEMU TCG (no hardware accel) would take hours to install and boot.
- Add minimum size check in ensure_base_image to detect and rebuild
  incomplete/corrupt base images left behind by failed builds.
- Remove unused QEMUBaremetalRuntime assignment in _build_windows_base.
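
A minimal sketch of the size check, assuming a hypothetical threshold and a `build` callback. The real `ensure_base_image` signature and minimum size are not given in this message.

```python
from pathlib import Path
from typing import Callable

MIN_BASE_IMAGE_BYTES = 1024 ** 3  # hypothetical threshold (1 GiB)


def ensure_base_image(path: Path, build: Callable[[Path], None],
                      min_bytes: int = MIN_BASE_IMAGE_BYTES) -> Path:
    """Return a usable base image, rebuilding it if the cached file is too small."""
    if path.exists() and path.stat().st_size < min_bytes:
        # A too-small file is treated as the leftover of a failed build.
        path.unlink()
    if not path.exists():
        build(path)
    return path
```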

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(cloud-transport): fail fast on 4xx in _wait_for_server_ready + add debug logging

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): skip 401 sandboxes in --continue, reconnect concurrently

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): --continue no longer deletes sandboxes, use --delete explicitly

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(benchmark): fix --delete to use CloudTransport instead of broken Sandbox(name=)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(grpc-emulator): pre-register empty_pb2 to fix protobuf 6.x descriptor load

AddSerializedFile fails on protobuf 6.33+ if google/protobuf/empty.proto
hasn't been loaded yet. Import empty_pb2 before the serialized file to
pre-register it in the descriptor pool.

Also add demo/ scripts for fleet throughput and ephemeral F-Droid.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: replace Computer SDK references with Sandbox SDK throughout

- README.md: update packages table and hero code example to use cua-sandbox
- quickstart.mdx: install cua-sandbox instead of cua-computer; update hello/agent examples
- using-computer-sdk.mdx → using-sandbox-sdk.mdx: new doc with Sandbox SDK API
- using-agent-sdk.mdx: update Python examples to use Sandbox instead of Computer
- reference/sandbox-sdk/: new reference page for cua-sandbox API
- reference/meta.json + get-started/meta.json: update nav to sandbox-sdk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(readme): unified API example + platform support matrix

* docs(readme): replace iOS with BYOI (.qcow2, .iso) in platform matrix

* docs(readme): move Cua SDK section above CuaBot

* docs(readme): new header + add sb.mobile.gesture() to example

* feat(sandbox): add sb.tunnel.forward() port-forwarding interface

Adds Tunnel interface with forward() supporting ADB (Android), gRPC
emulator, and SSH transports. Includes CDP-over-ADB test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): gate android tests on Java only, not pre-installed SDK

SDK auto-installs on first run; only Java is a hard prereq.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(android): check java returncode in _java_env() — macOS stub exits non-zero

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tunnel): support abstract socket forwarding for Chrome DevTools on Android

adb forward tcp:0 localabstract:chrome_devtools_remote instead of tcp:9222.
Update test to use socket name and tunnel.port for all CDP URLs.
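
The forwarding call can be sketched like this. The helper names are hypothetical; the sketch also assumes that `adb forward tcp:0 ...` prints the allocated host port on stdout, which is how the port is recovered here.

```python
import subprocess
from typing import Callable, List, Optional


def adb_forward_args(socket_name: str = "chrome_devtools_remote",
                     serial: Optional[str] = None) -> List[str]:
    """Build the adb command that forwards a free host port to an abstract socket."""
    args = ["adb"]
    if serial:
        args += ["-s", serial]
    return args + ["forward", "tcp:0", "localabstract:" + socket_name]


def forward_abstract_socket(socket_name: str = "chrome_devtools_remote",
                            serial: Optional[str] = None,
                            run: Callable = subprocess.run) -> int:
    """Run the forward command and return the host port adb reports."""
    result = run(adb_forward_args(socket_name, serial),
                 capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

The returned port is what a CDP client would then use in URLs such as `http://127.0.0.1:<port>/json`.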

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tests): add gym-pwa end-to-end Android test with CDP bonus

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): disable Chrome FRE before launching gym-pwa

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(sandbox-sdk): fill documentation gaps vs Modal-style DX

- Add Sandbox section to guide: lifecycle, images, secrets, scale-out
- Add full sub-interface reference (shell, mouse, keyboard, screen,
  clipboard, tunnel, mobile, terminal, window, Localhost)
- Add migration guide from cua-computer to cua-sandbox
- Deprecate Computer SDK page with red callout + migration link
- Update quickstart with local Docker no-account path
- Update what-is-cua to reference Sandbox SDK instead of Computer Framework
- Wire all new pages into nav meta.json files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tests): clear Chrome data to bypass first-run wizard on emulator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(pwa_install): accept keystore param, auto-configure bubblewrap, return fingerprint

- Image.android().pwa_install() now accepts keystore, keystore_alias,
  keystore_password params — pass the keystore bundled in your PWA repo
  for deterministic fingerprints baked into assetlinks.json
- _build_pwa_apk auto-installs @bubblewrap/cli via npm if not on PATH
- _build_pwa_apk auto-writes ~/.bubblewrap/config.json from known JDK/SDK
  paths — no manual interactive setup required
- Returns (apk_path, sha256_fingerprint) tuple
- _bw_init.js accepts keystore path/alias/password as positional args
- Remove get_pwa_keystore_fingerprint (keystore in repo is the pattern)
- test_android_local_gym_pwa uses Sandbox.ephemeral + pwa_install with
  the committed android.keystore; launches TWA app instead of Chrome

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): rewrite quickstart to fix broken ephemeral/CLI flow

- Remove pre-create sandbox step — Sandbox.ephemeral manages its own lifecycle
- Remove outdated cua sandbox create --os/--size CLI usage
- Add local Docker path (no account needed) as primary hello world
- Fix VNC step to use Sandbox.create so sandbox is alive to open
- Clean up CLI reference to only show commands that are correct

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): update CLI reference to real cua-cli sandbox commands

- Rewrite cli/commands.mdx with actual cua sb launch <image> syntax
  (image as positional arg, --cpu/--memory/--disk/--region as options)
- Document all image shorthands (macos, ubuntu:24.04, windows, android)
- Fix quickstart VNC/cleanup steps to use cua sb vnc / cua sb ls / cua sb delete
- Fix using-sandbox-sdk.mdx CLI comment to show correct launch syntax
- Remove libs/python/cli (old mock CLI replaced by cua-cli)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): use Modal-hosted gym-pwa, fix 10.0.2.2 manifest fetch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(examples): add sandboxes section from examples/sandboxes/

One page per OS family (linux, macos, windows, android, custom-images),
each showing cloud + local variants with runnable code.

Every code block carries a `# source:` comment pointing to the corresponding
test file in examples/sandboxes/ so a future CI workflow can verify that
every doc example has a live test case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): auto-install Android build-tools required by bubblewrap

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(what-is-cua): rewrite as concise Modal-style intro with full example

- Lead with a complete sandbox + agent snippet instead of graphics/diagrams
- Show the full API surface inline (shell, screenshot, mouse, keyboard, mobile, tunnel)
- Show the image builder pattern
- Remove ASCII diagrams and redundant explanation prose
- Keep use cases and next-steps links

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs(quickstart): use sb.get_display_url(share=True) for live view

Replace persistent sandbox + CLI vnc with get_display_url(share=True)
inline in the agent script — simpler, no CLI needed, works with ephemeral.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: remove self-hosted sandboxes page

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: rename Fundamentals section to Agent

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): prefer Java 21, fix jdkPath bundle format, create tools/ stub

- Auto-detect openjdk@21 (Gradle 8.x requires Java ≤ 21; openjdk@25 breaks)
- bubblewrap jdkPath must be .jdk bundle root (it appends Contents/Home)
- JAVA_HOME for gradle resolves to Contents/Home from the bundle
- Create sdk/tools/ stub so bubblewrap SDK validation passes
- Install build-tools;34.0.0 if missing
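
The two path conventions above (bundle root for bubblewrap, `Contents/Home` for Gradle) can be sketched as below. The helper names are hypothetical and the paths in the usage note are illustrative only.

```python
from pathlib import Path
from typing import Dict


def to_bundle_root(path: str) -> Path:
    """Normalise a JDK path to its .jdk bundle root."""
    p = Path(path)
    if p.parts[-2:] == ("Contents", "Home"):
        p = p.parent.parent
    if p.suffix != ".jdk":
        raise ValueError("expected a .jdk bundle, got: %s" % path)
    return p


def jdk_settings(path: str) -> Dict[str, str]:
    """Return the bubblewrap jdkPath and the JAVA_HOME used for Gradle."""
    root = to_bundle_root(path)
    return {
        "jdkPath": str(root),                            # bubblewrap appends Contents/Home
        "JAVA_HOME": str(root / "Contents" / "Home"),    # Gradle needs the real home
    }
```

Passing either `/x/openjdk.jdk` or `/x/openjdk.jdk/Contents/Home` yields the same pair of values.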

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): suppress Chrome FRE via set-debug-app + command-line flags

After installing the TWA APK, use adb to:
1. am set-debug-app --persistent com.android.chrome (enables flag file)
2. Write chrome-command-line with --no-first-run --disable-fre

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(pwa_install): don't override startUrl with manifest path in _bw_init.js

twa.startUrl should come from the Web App Manifest's start_url field,
not from the manifest file URL's pathname (which was /manifest.json).
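
The actual fix lives in `_bw_init.js`; the Python sketch below only illustrates the difference between the two derivations. The function names are hypothetical, and the `"/"` default for a missing `start_url` is an assumption.

```python
from urllib.parse import urljoin, urlsplit


def start_url_from_manifest_path(manifest_url: str) -> str:
    """The buggy derivation: the pathname of the manifest file itself."""
    return urlsplit(manifest_url).path


def resolve_start_url(manifest_url: str, manifest: dict) -> str:
    """Resolve the manifest's start_url field relative to the manifest URL."""
    return urljoin(manifest_url, manifest.get("start_url", "/"))
```

With a manifest at `https://example.com/manifest.json` and `"start_url": "/?source=pwa"`, the first function returns `/manifest.json` while the second returns `https://example.com/?source=pwa`.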

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): add android local gym-pwa e2e test

End-to-end test for the gym-pwa PWA running as a TWA on a local Android
emulator. Uses the Modal-hosted gym at cuaai--todo-gym-web.modal.run.

Flow:
- POST /api/gym/start/add_item → fresh session + task prompt
- Launch TWA, warm-up, re-launch to pick up session
- Agent taps input, types "Buy groceries", taps Add
- GET /api/gym/evaluate (x-session-id header) → reward == 1.0
- CDP verification: query li span text via Chrome DevTools Remote

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(examples): gym-pwa test uses ?session= URL + bgColor for session isolation

- POST /api/gym/start with bgColor; get back sessionId
- CDP Page.navigate to /?session=<id>&bg=<color> after TWA warm-up
- All API calls pass x-session-id header; no shared server state needed
- Pre-agent screenshot saved to /tmp/gym_pwa_pre_agent.png

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(examples): use lighter bg color for gym-pwa test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: expand pwa_install docs with full params, signing flow, and requirements

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): auto-generate sandbox SDK reference from source

Add cua-sandbox to SDK_CONFIGS in python-sdk.ts generator so the
reference page is generated from docstrings via griffe, matching the
format of computer-sdk and agent-sdk reference pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(docs): replace cua-computer imports with cua-sandbox across guide and examples

Update all code blocks referencing the deprecated cua-computer SDK to
use cua-sandbox equivalents (Sandbox, Image) across guide, examples,
and reference pages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(docs): move interactive-shell + add tunneling to Sandbox section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(examples): move test_android_local_gym_pwa to examples/sandboxes/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ci: add cua-sandbox to bump/publish pipeline

- Add .bumpversion.cfg for sandbox-v* tag format
- Add cd-py-sandbox.yml workflow triggered by sandbox-v* tags
- Add pypi/sandbox option to release-bump-version.yml

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 17:53:18 -07:00
Harsh Verma
b2b88ec6bb ci: run model tests on weekly schedule instead of per-PR (#1180) 2026-03-16 14:13:24 -07:00
Francesco Bonacci
489302a0c1 fix(lume): fix pause() and resume() incorrectly calling start() (#1130)
* fix(lume): fix pause() and resume() calling start() instead of correct VZVirtualMachine methods

Both `pause()` and `resume()` in `BaseVirtualizationService` were
incorrectly calling `virtualMachine.start` instead of
`virtualMachine.pause()` and `virtualMachine.resume()` respectively.

This meant pausing or resuming a VM would restart it instead.
Simplified both methods to use the async VZVirtualMachine APIs directly.

* fix(lume): avoid actor-isolation error in pause/resume

* docs(lume): sync generated reference docs

* ci(docs): configure lychee root dir for absolute links

* docs(cua): fix broken HUD environments link
2026-02-27 21:38:26 +01:00
f-trycua
fd51bacb2e fix(ci): use GitHub App token in auto-release workflow
The release-on-merge workflow was using secrets.GITHUB_TOKEN which
lacks permission to dispatch other workflows. Switch to the same
GitHub App token (RELEASE_APP_ID/RELEASE_APP_PRIVATE_KEY) used by
release-bump-version.yml so gh workflow run succeeds.
2026-02-26 17:24:37 +01:00
ddupont
d6298ebc9d filter by 4xx/5xx for link CI slack summary (#1126) 2026-02-26 10:29:31 -05:00
ddupont
11d2b9de91 Fix CI: Link Checker errors & webhook bugs (#1124)
* fix errors caught by link checker

* fix slack always showing "Link checker failed to run"

* include compact summary in slack webhook

* fix ci check links again

* add .lycheeignore, prevent fail message from showing if summary was generated

* fix incorrect links

* debug prints in link check

* link checker checking wrong branch

* revert slack fire condition
2026-02-26 10:26:04 -05:00
Francesco Bonacci
49ee6d45cb feat(lume): restructure release as .app bundle with bridged networking (#1122)
* fix(lume): update test mocks to match current API

- Update MockVM.run() signature to include networkMode and clipboard
  parameters added to the VM base class
- Update VMDetailsPrinterTests to expect the network column in table
  output

* feat(lume): restructure release binary as .app bundle for bridged networking

Restructure the lume release artifact from a standalone CLI binary into a
macOS .app bundle so that a provisioning profile can be loaded by the OS,
enabling the com.apple.vm.networking restricted entitlement for bridged
networking support in release builds.

Closes #1076

* fix(lume): fetch notarization log on failure for debugging

* fix(lume): fix codesign for notarization - add timestamp, fix entitlements flag, show errors

* fix(lume): add codesign verification and use ditto for signature-safe copy

* fix(lume): add keychain to search list and pass --keychain to codesign

* fix(lume): sign resource bundle before binary (inside-out signing order)

* fix(lume): use --deep codesign, move resource bundle to Resources/

The lume_lume.bundle is a flat SPM resource directory (no Info.plist),
not a proper macOS bundle. codesign was failing with "bundle format
unrecognized" which caused silent fallback to adhoc signing.

Fix: use --deep on the .app bundle so codesign handles nested code
automatically and seals flat resource directories properly.

* fix(lume): remove resource bundle from Contents/MacOS to fix codesign

The lume_lume.bundle is a flat SPM resource directory without Info.plist.
When placed in Contents/MacOS/, codesign fails with "bundle format
unrecognized" and silently falls back to adhoc signing.

Move it to Contents/Resources/ only, which codesign seals as data.

* fix(lume): update install-local.sh and build-release.sh to match resource bundle fix

Move lume_lume.bundle to Contents/Resources/ instead of Contents/MacOS/
to avoid codesign "bundle format unrecognized" errors. Also fix
--entitlement -> --entitlements typo in build-release.sh.

* fix(lume): place SPM resource bundle at .app root for Bundle.module resolution

SPM's auto-generated Bundle.module looks up resources via
Bundle.main.bundleURL (the .app root), NOT Bundle.main.resourceURL
(Contents/Resources/). Placing lume_lume.bundle in Contents/Resources/
would cause a fatal crash at runtime when Bundle.module tries to load it.

Move the resource bundle to the .app root level across all three build
scripts (build-release-notarized.sh, build-release.sh, install-local.sh).
This keeps it out of Contents/MacOS/ (which breaks codesign) while
ensuring SPM can find it at runtime.

Also adds *.provisionprofile to .gitignore.

* fix(lume): fix Bundle.module resolution for .app bundle resource loading

SPM's auto-generated Bundle.module looks up resources via
Bundle.main.bundleURL (the .app root), but codesign rejects content
at the .app root ("unsealed contents") and in Contents/MacOS/
("bundle format unrecognized"). The only valid location for codesign
is Contents/Resources/, but Bundle.module doesn't check there.

Solution:
- Add Bundle.lumeResources custom accessor that checks resourceURL
  first (for .app bundles) then bundleURL (for standalone binaries)
- Replace all Bundle.module usages in UnattendedConfig.swift
- Revert build scripts to place lume_lume.bundle in Contents/Resources/

The unused SPM-generated Bundle.module is never accessed, so its
fatalError path is never triggered.
2026-02-26 12:39:30 +01:00
ddupont
ebe9f88097 feat: Add interactive terminal (PTY) support w/ tests to cua-auto, computer-server, computer, and the cua CLI (#1114)
* add agent-computer style usage to cua-cli, refactor pyautogui-like handlers from computer-server into its own SDK for reuse by our various SDKs

* address CR comments, add auto-focus when zooming to windows on the host

* Add cua-auto to pypi workflow

* Bump cua-cli requirements

* default `cua do ls` to listing all sandboxes

* Fix linting error

* fix linting

* Add trajectory recording to cua do CLI (#1110)

Every cua do action is now automatically recorded to a replayable
trajectory at ~/.cua/trajectories/{machine}/{session}/. Viewing opens
cua.ai/trajectory-viewer via a local CORS-enabled file server.

New files:
- trajectory_recorder.py: session management, turn writing, zip, clean
- trajectory.py: cua trajectory ls/view/stop/clean commands

Modified:
- do.py: --no-record flag, post-action screenshot recording in all handlers,
  session reset on switch
- main.py, __init__.py: register trajectory command
- SKILL.md: document trajectory recording

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: ddupont <3820588+ddupont808@users.noreply.github.com>

* add interactive terminal support

* add try/except around imports that require X display server

* address CR comments

* lint & isort

* bump cua-core dep

* add tests for windows

* fix cua do shell unknown command error

* reorder imports

* add interactive shell to docs

* update computer sdk reference

* lint windows tests

* fix external link checking lychee workflow checking internal links

* attempt to fix lychee workflow again

* fix external links

---------

Co-authored-by: Sarina Li <sarinajin.li@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 16:58:14 -05:00
ddupont
90278dd730 Add agent-computer/Peekaboo-style CLI group to cua-cli, add cua do SKILL.md, and cua-auto package (#1107)
* add agent-computer style usage to cua-cli, refactor pyautogui-like handlers from computer-server into its own SDK for reuse by our various SDKs

* address CR comments, add auto-focus when zooming to windows on the host

* Add cua-auto to pypi workflow

* Bump cua-cli requirements

* default `cua do ls` to listing all sandboxes

* Fix linting error

* fix linting
2026-02-25 09:44:25 -05:00
asaf-genie
3030c46b17 fix(agent): use Opus 4.6/4.5 computer-use beta (#1090)
* fix(agent): use Opus 4.6/4.5 computer-use beta

* add sonnet 4.6

* add models to test matrix
2026-02-25 11:53:51 +05:30
f-trycua
f48d10365c Revert "feat(lume): restructure release binary as .app bundle for bridged networking (#1080)"
This reverts commit 649a873d0c.
2026-02-12 18:41:37 -08:00
Francesco Bonacci
649a873d0c feat(lume): restructure release binary as .app bundle for bridged networking (#1080)
* feat(lume): restructure release binary as .app bundle for bridged networking

Restructure the lume release artifact from a standalone CLI binary into a
macOS .app bundle so that a provisioning profile can be loaded by the OS,
enabling the com.apple.vm.networking restricted entitlement for bridged
networking support in release builds.

Closes #1076

* fix(lume): update MockVM.run() signature to match base class

Add missing networkMode and clipboard parameters that were added to
VM.run() but not reflected in the test mock.
2026-02-13 03:17:53 +01:00
r33drichards
e22eacae18 Add documentation link checker CI workflow (#1034)
* Fix CuaBench docs 404 by correcting broken example links

The examples section is a sibling of guide, not nested under it.
All links using /cuabench/guide/examples/ were incorrect and should
use /cuabench/examples/ instead. Fixed in navigation header and
4 MDX content files.

https://claude.ai/code/session_01Q3U3p5HjFJfQRjicuhPEpW

* Add documentation link checker for internal and external links

Adds a TypeScript script that scans all MDX and TSX files for broken
links. Validates internal links against the docs page tree and
optionally checks external URLs with HTTP requests.

- `pnpm docs:check-links` for internal links only
- `pnpm docs:check-links:external` for internal + external
- CI workflow triggers on docs content/src changes
- Skips static assets (images, fonts, etc.)

https://claude.ai/code/session_01Q3U3p5HjFJfQRjicuhPEpW

* Replace custom link checker with next-validate-link + lychee

Use next-validate-link (by the Fumadocs author) for internal link
validation with proper MDX parsing via remark. Use lychee GitHub
Action for external link checking in CI.

- next-validate-link: validates internal cross-references in MDX
- lychee: fast Rust-based external URL checker with caching
- CI runs both checks in parallel on docs changes

https://claude.ai/code/session_01Q3U3p5HjFJfQRjicuhPEpW

* Fix 15 broken documentation links across 8 files (#1045)

- Remove trailing slashes from /cua/guide/fundamentals/vlms/ links
- Remove incorrect /docs prefix from internal links in vnc-recorder and
  demonstration-guided-skills
- Fix cli-playbook → cloud-cli path in cloud-cli reference
- Replace /lume with /lume/guide/getting-started/introduction (no index page)
- Fix /lume/guide/advanced/unattended-setup → /lume/guide/fundamentals/unattended-setup
- Remove links from Linux Container and QEMU Container headings (no index pages)

https://claude.ai/code/session_01AsxA3MLUo6xMgFUD5e43Dv

Co-authored-by: Claude <noreply@anthropic.com>

* fix: remove deprecated --exclude-mail flag from lychee action (#1046)

Lychee v0.21.0 removed the --exclude-mail flag. Mail addresses are now
excluded by default, so the flag is no longer needed.

https://claude.ai/code/session_01SGxvjUaBKjgwQuLyDQz1HS

Co-authored-by: Claude <noreply@anthropic.com>

* fix: update broken domain links (cua.dev -> cua.ai, openclaw.dev -> openclaw.ai) (#1050)

https://claude.ai/code/session_01UUas213X98Fr6xKRarmvRu

Co-authored-by: Claude <noreply@anthropic.com>

* fix: resolve broken documentation links across docs (#1051)

- Create index pages for linux-container and qemu-container directories
  to fix internal link checker errors
- Replace broken external cua.ai/docs/cuabot/* links with internal paths
  in cuabot changelog
- Replace broken docs-woad-phi.vercel.app migration guide link with
  inline heading in agent-sdk changelog
- Convert external cua.ai/docs/get-started/quickstart to internal link
  in post-event-contact-export example
- Remove dead openclaw.ai/docs/vps-hosting link from openclaw example

https://claude.ai/code/session_012v9UqLpLqpjoJy7XXnjwL2

Co-authored-by: Claude <noreply@anthropic.com>

* Exclude file:// URLs from lychee docs link checker (#1056)

The lychee external link checker was resolving relative markdown
links (e.g., ./screenspot-v2, ../hud) as file:// URLs and failing
because the target files use .mdx extensions not present in the links.
Internal/relative links are already validated by the next-validate-link
job, so lychee should only check external HTTP(S) links.

This matches the approach already used in ci-check-links.yml.

https://claude.ai/code/session_01TCohF6h4raHA1o4eQjhsdP

Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-11 11:47:09 -08:00
Francesco Bonacci
512e2a30b3 fix: free disk space before docker builds (#1060)
Large images like QEMU Android consistently fail with "No space left
on device" on standard GitHub runners. Remove pre-installed .NET,
Android SDK, GHC, and toolcache (~25GB) before building.
2026-02-10 06:54:54 +01:00
Francesco Bonacci
e368b29cd0 fix: release workflow bugs (sed delimiter + pnpm --if-present) (#1059)
* fix: use ~ as sed delimiter in release notes to handle / in commit messages

The sed command for linking PR numbers used / as delimiter, which
broke when commit subjects contained / characters (e.g. paths).

* fix: pnpm --if-present flag position in ts-reusable-publish

pnpm run build --if-present passes --if-present as an arg to the
build script (tsc), causing TS5023. The correct syntax is
pnpm run --if-present build.
2026-02-10 05:16:05 +01:00
Francesco Bonacci
053f7b3668 fix: remove cascade bumps from release pipeline (#1058)
Each package is now bumped independently. Dependent packages use
version ranges (e.g. cua-computer>=0.4.0) and pip resolves the
latest at install time.

Changes:
- Remove cascade bump steps (core→computer→agent, som→agent, npm/core→npm/computer)
- Simplify tag collection to HEAD only (single tag per bump)
- Simplify version capture conditions to match only own service
- Remove cascade dedup from release-on-merge.yml
2026-02-10 04:46:24 +01:00
Francesco Bonacci
76dcbfbd07 feat: auto-release on PR merge with required release labels (#1055)
* feat: auto-release on PR merge with required release labels

- Add CI check that requires a release label (release:patch, release:minor,
  release:major, or no-release) on PRs that change publishable packages
- Add workflow that triggers release-bump-version on merge based on label
- Cascade dedup prevents double-bumping (e.g., core change won't also
  separately bump computer/agent since the cascade handles it)

* refactor: use per-package release labels instead of global bump labels

- release:pypi/cli, release:pypi/agent, etc. for each service
- bump:minor / bump:major modifiers (default patch)
- no-release covers all packages as opt-out
- CI check verifies each affected package has its own label
- Merge workflow reads release:<service> labels and triggers bumps

* refactor: drop blocking CI check, add Slack reminder for unreleased packages

- Remove ci-require-release-label.yml (no longer blocks merge)
- Release labels are now opt-in: add release:<service> to auto-publish on merge
- Unlabeled affected packages trigger a Slack alert via AlertManager (am.cua.ai)
- no-release label still skips everything

* feat: daily Slack digest for unreleased package changes

Runs Mon-Fri at 9am UTC. For each package, compares latest tag to HEAD
and counts commits in the package directory. Posts a summary to Slack
via AlertManager listing packages with unreleased changes.

Also supports manual trigger via workflow_dispatch.
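
The per-package count could be computed roughly as follows. This is a sketch: the helper names are hypothetical, and the workflow may well use a shell step rather than Python. `git rev-list --count <tag>..HEAD -- <dir>` counts commits since the tag that touch the directory.

```python
import subprocess
from typing import Callable, List


def unreleased_count_args(latest_tag: str, package_dir: str) -> List[str]:
    """Build the git command counting commits in package_dir since latest_tag."""
    return ["git", "rev-list", "--count", latest_tag + "..HEAD", "--", package_dir]


def count_unreleased(latest_tag: str, package_dir: str,
                     run: Callable = subprocess.run) -> int:
    """Return how many commits touched package_dir after latest_tag."""
    result = run(unreleased_count_args(latest_tag, package_dir),
                 capture_output=True, text=True, check=True)
    return int(result.stdout.strip())
```

Packages with a count above zero would be listed in the digest.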

* chore: change unreleased digest schedule to 8pm PT

* feat: non-blocking CI check that reminds about publishable package changes

Posts a PR comment listing affected packages as a checklist.
Packages with release:<service> labels show as checked.
Updates on each push and label change. Never blocks merge.

* refactor: remove per-merge Slack alert, rely on daily digest instead
2026-02-10 04:08:55 +01:00
Francesco Bonacci
e053147a82 feat: add pypi/cli option to version bump workflow (#1052)
The cua-cli Python package had a CD workflow (cd-py-cli.yml) and
bumpversion config but was missing from the release-bump-version.yml
dropdown, making it impossible to trigger a release from the UI.
2026-02-09 19:02:56 +01:00
r33drichards
dd9aedcdcd feat: add Claude auto-fix CI workflow (#1048)
* fix: remove message filtering that was causing variable reference error

Removed filterEmptyMessages call and use messages directly. This fixes
the build error where filteredMessages was undefined after previous
partial changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat: add Claude auto-fix CI workflow

Adds a GitHub Actions workflow that automatically attempts to fix CI
failures on PRs labeled with "auto-fix". Uses Claude Code with sandbox
runtime to analyze failure logs and push fixes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: only trigger auto-fix on pull_request labeled event

Remove the workflow_run trigger that fired on every CI failure.
The workflow now only runs when the 'auto-fix' label is added to a PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* change back

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-09 09:56:08 -08:00
Francesco Bonacci
174ae253ba feat: auto-generated SDK docs, Python CLI, and docs improvements (#1040)
* feat: auto-generated SDK docs, Python CLI, and docs improvements

- Add auto-generated SDK reference pages (computer-sdk, agent-sdk) with version selector
- Add Python CLI package (cua-cli) with auth, sandbox, image, MCP commands
- Deprecate TypeScript CLI in favor of Python CLI
- Add versioned docs (agent-sdk v0.3-v0.7, computer-sdk v0.3-v0.5)
- Rename cloud-cli to cli in docs
- Add mobile header fix with sidebar toggle
- Restructure guide pages (quickstart, self-hosted-sandboxes)
- Add redirects for old /api URLs
- Update workflows, lume docs, cuabench docs, desktop sandbox docs

* refactor: auto-generate CLI index page like computer/agent SDKs

Change CLI docs to use the same auto-generated index.mdx pattern as
computer-sdk and agent-sdk. Removes hand-written index page that could
become stale, and deletes the separate api.mdx.

* fix: rename "Cua Bench API Reference" to "API Reference" in menu

* fix: update lume examples to macos-tahoe-vanilla and shorten page titles

- Replace macos-sequoia-vanilla:latest with macos-tahoe-vanilla:latest
  in lume docs and generator
- Rename "Lume CLI Reference" to "CLI Reference"
- Rename "Lume HTTP API Reference" to "API Reference"

* feat: rename CuaBot to Cua-Bot and add to dropdown selector

- Rename CuaBot to Cua-Bot in docs meta.json and content pages
- Add Cua-Bot entry to the header dropdown selector

* refactor: restructure Cua-Bot docs to match Cua/Cua-Bench pattern

Reorganize cuabot docs from flat structure into guide/getting-started/
hierarchy matching other collections:
- cuabot.mdx → guide/getting-started/introduction.mdx
- install.mdx → guide/getting-started/installation.mdx
- Add meta.json files with proper icons and structure
- Update dropdown selector href to new path

* feat(docs): add auto-generated API reference, changelog, and versioning for Cua-Bot

Add TypeScript SDK doc generator (regex-based, no compiler dependency) and
configure cuabot for changelog generation and versioned docs snapshots.

* feat(ci): add cuabot to docs drift check and improve failure message

Wire cuabot into CI path triggers, runner config, and changed-file
detection. Add --check mode to typescript-sdk.ts for drift comparison.
Update failure banner with per-library and versioning commands.

* fix: resolve Python lint issues (black, ruff)

Run black formatting on 12 files, fix ruff F841 (unused variables) in
tests, and add TYPE_CHECKING import for FastMCP forward references.

* fix: resolve TS typecheck and Lume Swift 6 CI failures

- typescript-typecheck.js: build @trycua/core before running typecheck
  so its dist/ type declarations are available for @trycua/computer
- SSHClient.swift: avoid crossing Sendable boundary with NIOSSHHandler
  by keeping handler access + createChannel within flatMap on the event
  loop, fixing Swift 6 strict concurrency errors

* fix: TS typecheck pnpm version strict mode and Lume mock conformance

- Set COREPACK_ENABLE_STRICT=0 in typecheck script to allow pnpm 9.x
  to run commands in workspace packages declaring pnpm 10.x
- Update MockVNCService.sendText signature to match protocol (add
  delayMs parameter)

* fix: run prettier formatting and ignore auto-generated docs files

Format all files to pass prettier 3.8.1 check. Add docs/.source/ and
docs/next-env.d.ts to .prettierignore (auto-generated, not editable).

* fix: restore MDX comment syntax broken by prettier

Prettier 3.8.1 converts {/* */} to {/_ _/} in MDX files, which breaks
the acorn parser. Restore all comments and add *.mdx to .prettierignore.
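
The commit does not say how the comments were restored. As a minimal sketch, assuming the damage is limited to the `{/_ _/}` pattern quoted above, a single regex pass could undo it. The name `restore_mdx_comments` is hypothetical, and the pattern would also rewrite any legitimate text that happens to match, so the result would need review.

```python
import re

# Assumption: every broken comment looks like "{/_ ... _/}", as described
# in the commit message. DOTALL lets a comment span several lines.
BROKEN_COMMENT = re.compile(r"\{/_(.*?)_/\}", re.DOTALL)


def restore_mdx_comments(text: str) -> str:
    """Turn '{/_ ... _/}' back into the MDX comment form '{/* ... */}'."""
    return BROKEN_COMMENT.sub(lambda m: "{/*" + m.group(1) + "*/}", text)


if __name__ == "__main__":
    print(restore_mdx_comments("Intro {/_ TODO: expand _/} text"))
```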

* fix: regenerate docs to pass drift check after prettier revert

* fix: CI docs check fetch-depth, regenerate Lume docs, fix header layout shift

- Use fetch-depth: 0 in CI checkout so git tags are available for
  version discovery (was using fetch-depth: 2, causing version fallback)
- Regenerate Lume docs from local Swift build (0.2.75 → 0.2.76)
- Fix header product selector layout shift with consistent icon/text sizing

* fix: format custom-header.tsx with prettier

* fix: use arch-agnostic JAVA_HOME for arm64 Docker build

The openjdk package writes the arch-specific path (e.g. java-17-openjdk-amd64)
to /etc/environment, which sdkmanager sources, overriding the Dockerfile ENV.
Create an arch-agnostic symlink and re-export JAVA_HOME in the sdkmanager RUN
step to ensure it works on both amd64 and arm64.

* fix: skip emulator package on arm64 (not available for that arch)

The Android emulator SDK package is only published for amd64.
Conditionally install it based on dpkg --print-architecture.

* ci: retrigger cuabot docker build
2026-02-09 08:54:11 +01:00
ddupont
4484230b52 Fix npm/playground release-bump-version.yml (#1024)
* add cuabot screenshot

* Simplify cuabot system prompt, add npx skills for agent-browser and agent-device, add lazy installation and caching of android images

* fix missing line in bump workflow
2026-02-05 12:21:14 -05:00
Sarina Li
48ca733da4 feat(playground): extract playground UI into reusable @trycua/playground package (#1013)
* feat(playground): add package foundation and type definitions

Set up @trycua/playground package structure with:
- package.json with React peer deps and build tooling
- tsconfig.json and tsdown.config.ts for TypeScript/bundling
- Type definitions copied from cloud repo (Chat, Computer, etc.)
- Adapter interface contracts (PlaygroundAdapters, PersistenceAdapter,
  ComputerAdapter, InferenceAdapter)
- Re-exports of message types from @trycua/agent

This establishes the foundation for the playground migration,
enabling Agents B, C, D to build adapters, components, and hooks.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(playground): implement local and cloud adapters

Add adapter implementations for the playground package:

Local adapter (src/adapters/local.ts):
- LocalPersistenceAdapter: localStorage-based chat persistence
- LocalComputerAdapter: user-provided computer URLs with health checks
- LocalInferenceAdapter: user-provided API keys (Anthropic, OpenAI)
- createLocalAdapter() factory function

Cloud adapter (src/adapters/cloud.ts):
- CloudPersistenceAdapter: CUA API calls to /v1/playground/*
- CloudComputerAdapter: CUA API calls to /v1/vms
- CloudInferenceAdapter: cloud-managed API keys with /v1/models
- createCloudAdapter() factory function

Also includes:
- localStorage utilities copied from cloud repo (verbatim)
- Barrel exports for adapters and utilities

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(playground): add primitive UI components

Add reusable primitive components copied from cloud repo:
- ChatMessage: renders user/assistant messages with tool call display
- ChatInput: textarea with model/computer selectors
- ToolCallsGroup: expandable tool call viewer with screenshots
- VNCViewer: memoized iframe for VNC display
- ThinkingIndicator: animated thinking state with scrambled text
- ThinkingCompleteAnimation: completion animation
- ScrambledText: typewriter effect component

Also add utilities:
- cn: tailwind class merging (clsx + tailwind-merge)
- processMessagesForRendering: groups messages and tool calls

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(playground): add state management, hooks, and composed components

Add context providers, hooks, and composed components for the playground:

Context:
- PlaygroundContext: Global state types and context definitions
- PlaygroundProvider: Manages chats, computers, models via adapters
- ChatProvider: Per-chat state with message processing

Hooks:
- usePlayground: Access playground context (adapters, state, dispatch)
- useChat/useChatDispatch: Per-chat state access
- useAgentRequest: Agent loop with abort/retry handling
- Helper hooks: useActiveChat, useIsChatGenerating, etc.

Composed Components:
- ChatPanel: Main chat interface with model/computer selection
- ChatList: Chat sidebar with create/delete functionality
- ComputerList: Computer sidebar with status display

Modals:
- SettingsModal: Placeholder settings dialog
- CustomComputerModal: Add custom computer dialog

Main Component:
- Playground: Composition component with slot support for customization

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(playground): add link:global convenience script

Adds a convenience script for developers to register the playground
package globally for cross-repo development with the cloud repo.

Usage: pnpm link:global

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(playground): add trajectory viewer, modals, and telemetry

Add components for viewing and exporting agent trajectories:
- TrajectoryViewer: Replay agent actions with cursor animations
- ExportTrajectoryModal: Export trajectories as ZIP files
- ReplayTrajectoryModal: In-browser trajectory replay
- Modal and Button UI components

Add telemetry integration:
- TelemetryProvider with PostHog
- usePlaygroundTelemetry hook for tracking events

Add trajectory utilities:
- inferRuns: Extract runs from message arrays
- TrajectoryRun type definitions

New dependencies: posthog-js, @radix-ui/react-slot, class-variance-authority

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* feat(playground): add cloud-compatible components and local example

Components:
- Add PlaygroundContent and PlaygroundLayout for cloud integration
- Add ChatContent, ChatArea, ChatSidebar components
- Add VMStatusBanner, VNCOverlayPanel, DeferredChatsLoader
- Add UI primitives: dialog, dropdown-menu, select, tooltip, skeleton
- Add CountdownTimer for request timing display

Local example:
- Add examples/local with Vite setup for standalone testing
- Support API key configuration via settings modal
- Enable local development without cloud infrastructure

Improvements:
- Export ChatProvider for nested usage in cloud route
- Add proper TypeScript exports for all new components
- Update telemetry provider with simplified interface

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>

* ci(playground): add npm release automation

Add release infrastructure for @trycua/playground:
- .bumpversion.cfg for version management
- cd-ts-playground.yml workflow triggered by npm-playground-v* tags
- Add npm/playground to release-bump-version.yml options

To publish:
1. First release: manually trigger cd-ts-playground workflow
2. Future releases: use bump version workflow with npm/playground

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* fix api key issues, toast errors, improve telemetry callbacks

* fix(playground): fix VNC iframe not updating when VM changes

The VNC iframe was not updating when switching VMs from the dropdown due to
three issues:

1. Iframe not remounting: Browsers don't always reload iframe content when
   only the src attribute changes. Added key={src} to force React to remount
   the iframe when the URL changes.

2. State sync mismatch: The Select dropdown used selectedComputer from chat
   state while the VNC viewer used currentComputerId from playground state.
   Updated ChatContent to prefer currentComputerId from playground state as
   the source of truth, ensuring both stay in sync.

3. Missing state dispatch: ChatPanel and EmptyState weren't dispatching
   SET_CURRENT_COMPUTER to playground state when the computer changed.
   Added the dispatch calls to keep playground state updated.

Changes:
- Add key={src} to iframe elements in VNCIframe and VNCViewer
- Sync Select dropdown with playground state's currentComputerId
- Dispatch SET_CURRENT_COMPUTER in ChatPanel and EmptyState handlers

* feat(playground): add renderThemeToggle prop for animated theme switching

- Add renderThemeToggle prop to PlaygroundLayout and PlaygroundContent
- Allows consumers to provide custom animated theme toggle buttons
- Falls back to default button if not provided

Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-05 09:47:18 -05:00
r33drichards
8d90d142b3 Delete .github/workflows/modal-deploy-docs.yml 2026-02-04 16:02:56 -08:00
Francesco Bonacci
29c2705aa4 Add Docker image release pipeline for cuabot (#1006)
* Add Docker image release pipeline for cuabot and Show HN draft

- Add .bumpversion.docker.cfg with docker-cuabot-v* tag format
- Add VERSION file for Docker bumpversion tracking
- Update release-bump-version.yml to use Docker-specific config
  when bumping docker/cuabot (separate from npm/cuabot)
- Add SHOW_HN.md draft

* Remove SHOW_HN.md from tracking

* Restore cuabot .gitignore
2026-02-04 09:25:47 +01:00
ddupont
03d7c1c105 fix cuabot publish 3 (#1003)
* Fix package names

* update cuabot metadata

* Add `cuabot` alias to onboarding, change server to pull instead of build image by default, add telemetry event for default agent selection
2026-02-04 01:14:28 -05:00
ddupont
514199d75a fix cuabot publish (#1001)
* Fix package names

* fix publish workflow
2026-02-04 00:39:06 -05:00
ddupont
f2f97677ee Update readme & docs (#997)
* update readme & docs

* update readme desc

* add cuabot src
2026-02-04 05:51:19 +01:00
r33drichards
1b932bbb4d Add Docs MCP Server with vector and SQL query capabilities (#969)
* feat(docs-mcp-server): add standalone Docker image with ECR build workflow

Refactor MCP server from modal_app.py into a standalone containerized service:

- services/docs-mcp-server/main.py: Standalone MCP server for CUA docs and code search
- services/docs-mcp-server/pyproject.toml: Dependencies managed with uv
- services/docs-mcp-server/Dockerfile: Multi-stage build with Python 3.12

GitHub Actions workflow (.github/workflows/docs-mcp-server-build-push.yml):
- Multi-arch builds (linux/amd64, linux/arm64) running in parallel
- Push-by-digest pattern for efficient multi-arch manifest creation
- OIDC authentication for AWS ECR push
- GHA cache for faster builds
- Triggers on push/PR to main, manual dispatch with force_push option
- Tags: git SHA, branch name, PR number, latest (for main)

https://claude.ai/code/session_0168Bv3yjSKkrUbGZtVMyNG4

* feat(docs-mcp-server): add main-{timestamp} tag for Flux deployments

Add timestamped tag (main-YYYYMMDDHHmmss) when pushing to main branch,
enabling Flux to track and deploy specific image versions.

https://claude.ai/code/session_0168Bv3yjSKkrUbGZtVMyNG4
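
The `main-YYYYMMDDHHmmss` format described above can be sketched as follows. This is an illustration, not the workflow's actual step: the function name `timestamped_tag` is hypothetical, and the use of UTC is an assumption, since the commit does not name a timezone.

```python
from datetime import datetime, timezone
from typing import Optional


def timestamped_tag(branch: str, now: Optional[datetime] = None) -> str:
    """Build a '<branch>-YYYYMMDDHHmmss' image tag.

    Assumption: timestamps are taken in UTC.
    """
    moment = now if now is not None else datetime.now(timezone.utc)
    return f"{branch}-{moment.strftime('%Y%m%d%H%M%S')}"


if __name__ == "__main__":
    print(timestamped_tag("main"))
```

Because every field is zero-padded to a fixed width, sorting such tags as strings also sorts them chronologically, which is what lets a deployment tool pick the newest image by tag name.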

* refactor(docs-mcp-server): move to docs/scripts directory

Move docs-mcp-server from services/ to docs/scripts/ to keep
documentation-related scripts together.

https://claude.ai/code/session_0168Bv3yjSKkrUbGZtVMyNG4

* refactor(modal_app): remove MCP server, keep only crawling and DB generation

The MCP server has been extracted to a standalone containerized service
at docs/scripts/docs-mcp-server/. The modal_app now only handles:
- Documentation crawling (crawl_docs, scheduled_crawl)
- Database generation (generate_vector_db, generate_sqlite_db)
- Code indexing (generate_code_index_parallel, aggregate_code_databases)

Removed MCP-related dependencies (fastapi, fastmcp, opentelemetry) from
the Modal image since they're no longer needed.

https://claude.ai/code/session_0168Bv3yjSKkrUbGZtVMyNG4

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-02-02 00:34:12 -08:00
Francesco Bonacci
bb83a83f65 fix(lume): use Xcode 16.2 (Swift 6.2) in CI workflows (#948)
* fix(lume): use Xcode 16.2 (Swift 6.2) in CI workflows

Swift 6.0 treats NIOSSHHandler Sendable conformance violations as errors,
while Swift 6.2 treats them as warnings. This fixes the CI build failure
for lume v0.2.53.

* fix(lume): add @preconcurrency imports to fix Swift 6 concurrency errors

Add @preconcurrency import for NIOCore and NIOSSH to suppress
actor boundary crossing errors with non-Sendable types like
NIOSSHHandler and ChannelHandlerContext.

* docs(lume): update uninstall instructions to use uninstall script

Replace manual uninstall commands with the one-liner uninstall script.
The script handles stopping services, removing binaries, and cleanup
automatically. Added --purge flag documentation for complete removal.

* fix(lume): suppress Swift 6 concurrency warnings in SSHClient

- Mark handler classes as @unchecked Sendable (safe: single event loop)
- Use nonisolated(unsafe) for ChannelHandlerContext captures in closures
- Add @preconcurrency imports for NIO modules

Reduces warnings from 9+ to 2 (remaining are unavoidable library limitations
from NIOSSHHandler's explicitly unavailable Sendable conformance).

* fix(lume): fix NIOLoopBound crash in SSH interactive mode and add docs

- Fix NIOLoopBound precondition failure by capturing channel/eventLoop
  directly instead of using NIOLoopBound (which requires being on the
  event loop to access .value)
- Ensure whenComplete callback runs on event loop before calling
  setupTerminalAndStdin
- Add SSH command documentation to CommandDocExtractor
- Add "Remote Access" section to docs generator
- Regenerate CLI reference docs

Requires Remote Login enabled on VM (automatic with --unattended).
2026-02-01 03:44:31 +01:00
Harsh Verma
d8189043b8 [CI] Add Cua VLM Router Models to Test Harness (#931)
* test(ci): add CUA VLM router models to test harness

* test: add playwright_exec to MockComputer for Fara model support

Add playwright_exec method to MockComputer to support BrowserTool
compatibility, enabling testing of Microsoft Fara-7B model in CI.

* test: fix playwright_exec to return screenshot data

Mock playwright_exec now returns proper response structure with
screenshot base64 data and handles get_current_url command for
BrowserTool compatibility.
2026-01-30 12:55:54 +05:30
ddupont
23a2966230 Set runner CPU arch to match build matrix (#902) 2026-01-23 12:53:01 -08:00
Sarina Li
70f2713e92 docs: fix broken documentation links (#897)
* docs: fix broken documentation links

Update links to match current sitemap structure:
- cli-playbook → cloud-cli
- guide/examples → examples
- reference/computer-server → reference/computer-sdk
- reference/lumier → guide/advanced/lumier

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* ci: exclude news.ycombinator.com from link checker

Hacker News returns 503 for automated requests as an anti-bot
protection measure. This causes false positives in the link checker.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 00:15:51 -08:00
Harsh Verma
52d52a20f8 fix(ci): use GitHub usernames instead of display names in release notes attribution (#863) 2026-01-21 11:06:14 +05:30
Francesco Bonacci
1f27e4eafd Remove all pylume and is_lume_package references from codebase (#840)
* Remove all pylume references from codebase

The pylume package no longer exists in libs/python/, so this removes
all stale references to it across the codebase.

* Remove is_lume_package input from all CD workflows

All package CD workflows were passing is_lume_package: false to the
reusable publish workflow. Since the input was removed, these need
to be updated as well.
2026-01-18 08:49:38 +01:00
Francesco Bonacci
13ed4f98d5 Remove all pylume references from codebase (#833)
The pylume package no longer exists in libs/python/, so this removes
all stale references to it across the codebase.
2026-01-18 02:46:39 +01:00
Francesco Bonacci
a1c4f17ed5 Fix PyPI pipeline triggers for cascade version bumps (#823)
When bumping cua-core, cua-computer, or other packages with dependencies, the workflow was only pushing the last created tag instead of all cascade tags. This prevented dependent packages (e.g., cua-computer and cua-agent) from being published to PyPI.

Changes:
- Collect all tags created during cascade bumps, not just the last one
- Push all collected tags to GitHub to trigger corresponding CD workflows
- Update tag references after rebase to maintain correct commit mappings
- Handle cascade scenarios for pypi/core, pypi/computer, pypi/som, and npm/core
2026-01-17 19:26:06 +01:00
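
The "collect all tags" change in a1c4f17ed5 can be sketched as a walk over the dependency graph. Everything concrete here is an assumption made for illustration: the package names, the `<package>-v<version>` tag format, and the function name `cascade_tags` do not come from the workflow.

```python
from collections import deque
from typing import Dict, List


def cascade_tags(
    start: str,
    dependents: Dict[str, List[str]],
    next_version: Dict[str, str],
) -> List[str]:
    """Return one tag per package bumped in a cascade, in bump order.

    `dependents` maps a package to the packages that depend on it.
    The tag format is hypothetical.
    """
    tags: List[str] = []
    seen = {start}
    queue = deque([start])
    while queue:
        package = queue.popleft()
        # Record every tag, not only the last one created.
        tags.append(f"{package}-v{next_version[package]}")
        for dependent in dependents.get(package, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return tags


if __name__ == "__main__":
    graph = {"core": ["computer"], "computer": ["agent"]}
    versions = {"core": "0.1.1", "computer": "0.2.1", "agent": "0.3.1"}
    print(cascade_tags("core", graph, versions))
```

Pushing the whole returned list, rather than its final element, is what triggers a CD workflow for each dependent package.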
Francesco Bonacci
5104ad5cb3 fix(ci): use dynamic matrix for docker build platforms (#807)
Fix GitHub Actions workflow by moving platform selection logic out of
job-level if condition (where matrix context is unavailable) into a
separate setup job that outputs the platform list based on skip_arm64 input.
2026-01-16 05:39:15 +01:00
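
The setup job described in 5104ad5cb3 computes the platform list before the matrix is expanded. A minimal sketch of that computation, assuming the two platforms are `linux/amd64` and `linux/arm64` and that the output is a JSON object with a `platform` key (both are assumptions; the commit names only the `skip_arm64` input):

```python
import json


def platform_matrix(skip_arm64: bool) -> str:
    """Return a JSON matrix string for a later job to consume."""
    platforms = ["linux/amd64"]
    if not skip_arm64:
        platforms.append("linux/arm64")
    return json.dumps({"platform": platforms})


if __name__ == "__main__":
    print(platform_matrix(skip_arm64=True))
    print(platform_matrix(skip_arm64=False))
```

Deciding the list in a separate job sidesteps the original problem: a job-level `if` is evaluated before the matrix exists, so it cannot read `matrix.platform`.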
r33drichards
0e5ae04463 feat(ci): add GitHub Actions workflow for Modal docs MCP server deplo… (#740)
* feat(ci): add GitHub Actions workflow for Modal docs MCP server deployment

Add automated deployment workflow for the CUA documentation MCP server
running on Modal. The workflow deploys on push to main when modal_app.py
or the workflow file changes, and supports manual triggering with an
optional initial crawl step.

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add concurrency control to Modal deployment workflow (#741)

* Initial plan

* Add concurrency control to Modal deployment workflow

Co-authored-by: r33drichards <57335981+r33drichards@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: r33drichards <57335981+r33drichards@users.noreply.github.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
2026-01-13 15:18:00 -08:00
ddupont
f6b18d9b8b Fix broken docs formatting for CLI, add CLI completions (#790)
* Fix broken docs formatting

* Add CLI completions

* Update CLI install script to include completions script

* Fix cli.ts lint

* Fix other linting errors
2026-01-13 12:42:52 -05:00
Francesco Bonacci
5f67844521 fix(ci): sync docker versions & add retry logic for concurrent bumps (#786)
* chore: sync docker container versions with published tags

Sync version files to match already-published docker tags:
- lumier: 0.1.0 → 0.1.1
- qemu-android: 0.1.0 → 0.1.1
- xfce: 0.1.3 → 0.1.4

These versions were published successfully but the main branch
update failed due to concurrent bump-version runs.

* fix(ci): add retry logic with rebase for concurrent bumps

When multiple bump-version workflows run concurrently, the first one
to complete moves main forward, causing subsequent runs to fail with
"Update is not a fast forward".

This fix adds:
- Retry loop with up to 5 attempts
- Automatic rebase onto latest main when fast-forward fails
- Reordered operations: update main BEFORE creating tag (prevents orphan tags)
- Only create tag after main is successfully updated
2026-01-12 19:16:43 +01:00
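
The retry behaviour described in 5f67844521 can be sketched as a small loop. The three callables are hypothetical stand-ins for the workflow's real git and API operations, which the commit does not show; only the limit of 5 attempts, the rebase on failure, and the "main before tag" ordering come from the message.

```python
from typing import Callable


def update_main_then_tag(
    push_main: Callable[[], bool],
    rebase_onto_main: Callable[[], None],
    create_tag: Callable[[], None],
    max_attempts: int = 5,
) -> bool:
    """Update main first, and create the tag only once that succeeds.

    `push_main` returns False when the update is not a fast-forward.
    """
    for attempt in range(1, max_attempts + 1):
        if push_main():
            # Main moved forward, so the tag cannot end up orphaned.
            create_tag()
            return True
        if attempt < max_attempts:
            # Another bump landed first: replay our commit on the new main.
            rebase_onto_main()
    return False


if __name__ == "__main__":
    outcomes = iter([False, False, True])  # fails twice, then succeeds
    ok = update_main_then_tag(
        push_main=lambda: next(outcomes),
        rebase_onto_main=lambda: print("rebase"),
        create_tag=lambda: print("tag"),
    )
    print(ok)
```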
Francesco Bonacci
3166b8032a fix(ci): use type=match for prefixed docker tags (#785)
The docker/metadata-action's type=semver expects clean semver tags like
v0.1.3, but our tags have prefixes like docker-xfce-v0.1.3. This caused
warnings and only produced `latest` tag instead of version tags.

Changed to type=match with regex patterns to extract version numbers
from prefixed tags, producing tags like 0.1.3, 0.1, 0, and latest.
2026-01-12 18:57:42 +01:00
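
The extraction described in 3166b8032a can be illustrated with a plain regex. This is a sketch of the idea rather than the workflow's configuration: the pattern, the function name `image_tags`, and the fallback to `latest` alone for non-matching tags are assumptions.

```python
import re
from typing import List

# Assumption: tags look like "docker-<name>-v<major>.<minor>.<patch>".
TAG_PATTERN = re.compile(r"^docker-[a-z0-9-]+-v(\d+)\.(\d+)\.(\d+)$")


def image_tags(git_tag: str) -> List[str]:
    """Derive image tags from a prefixed git tag."""
    match = TAG_PATTERN.match(git_tag)
    if match is None:
        # Hypothetical fallback for tags that do not fit the pattern.
        return ["latest"]
    major, minor, patch = match.groups()
    return [f"{major}.{minor}.{patch}", f"{major}.{minor}", major, "latest"]


if __name__ == "__main__":
    print(image_tags("docker-xfce-v0.1.3"))
```

For `docker-xfce-v0.1.3` this yields `0.1.3`, `0.1`, `0`, and `latest`, the same set the commit message lists.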
Francesco Bonacci
078e131606 fix(ci): fix matrix context in docker-reusable-publish.yml (#784)
* chore: add workflow_dispatch to cd-container-xfce.yml

This forces GitHub to register the workflow and allows manual triggering.

* fix(ci): fix matrix context in docker-reusable-publish.yml

The job-level `if` condition was using `matrix.platform` which is not
available at the job level. Changed to use matrix exclude instead.

* fix(ci): push tags via git instead of API to trigger workflows

Tags created via GitHub API don't reliably trigger workflows for
unregistered workflow files. Using git push ensures tag-based
workflows are triggered even if not yet registered.

* revert: keep tag creation via API

Revert the git push change - API approach is needed to bypass branch
protection via GitHub App. The real fix is to register the workflows.
2026-01-12 18:47:57 +01:00
Francesco Bonacci
1180ef8b54 chore: add workflow_dispatch to cd-container-xfce.yml (#783)
This forces GitHub to register the workflow and allows manual triggering.
2026-01-12 18:39:55 +01:00
Francesco Bonacci
fb309d0868 fix(ci): remove duplicate publish jobs from bump-version workflow (#782)
The tag-triggered CD workflows (cd-py-*.yml, cd-ts-*.yml, cd-swift-lume.yml)
already handle publishing when a tag is pushed. Having publish-* jobs in
the bump-version workflow caused duplicate publish attempts, resulting in
"file already exists" errors on PyPI/npm.

Now the workflow only bumps versions and pushes tags. Publishing is handled
entirely by the tag-triggered workflows.
2026-01-12 18:24:35 +01:00
Francesco Bonacci
07fe5d5ff2 fix(ci): use env vars to avoid backtick interpretation in release notes (#778) 2026-01-12 13:25:55 +01:00
Francesco Bonacci
b48a69678c fix(ci): check inputs.version first in all PyPI workflows (#775) 2026-01-12 13:17:31 +01:00
Francesco Bonacci
23ffb9f9fe fix(ci): check inputs.version first for workflow_call (#773) 2026-01-11 23:14:24 +01:00