runanywhere-sdks/.github/pull_request_template.md at main

Sanchit Monga c4c2bb2529 Android Use agent update (#361)

* Add Android in-app benchmarking implementation prompt (iOS-parity spec)

* Add Android-parity in-app benchmarks implementation prompt

* Add core agent LLM and vision provider abstractions

- Introduced `AgentLLMProvider` interface for LLM reasoning, including methods for action decision-making and planning.
- Implemented `OnDeviceLLMProvider` for on-device LLM operations using the RunAnywhere SDK, supporting utility tool registration and model management.
- Added `VisionProvider` interface for screen analysis, with a `TextOnlyVisionProvider` fallback.
- Registered various utility tools (e.g., time, weather, battery level) to enhance agent capabilities.

These changes lay the groundwork for improved decision-making and context understanding in the agent's operations.
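The provider split described above can be pictured with a minimal Java sketch. Only the type names `AgentLLMProvider`, `VisionProvider`, and `TextOnlyVisionProvider` come from the commit message. Every method name and signature (`decideNextAction`, `plan`, `analyzeScreen`, `isAvailable`) and the helper `VisionProviders.select` are hypothetical illustrations, not the agent's actual API.

```java
import java.util.List;

// Hypothetical signatures: only the type names come from the commit message.
interface AgentLLMProvider {
    /** Chooses the next UI action for the goal, given a screen summary and recent history. */
    String decideNextAction(String goal, String screenSummary, List<String> history);

    /** Splits a goal into ordered sub-steps before execution starts. */
    List<String> plan(String goal);
}

interface VisionProvider {
    /** Describes the current screen for the LLM prompt. */
    String analyzeScreen(byte[] screenshotPng, String accessibilityText);

    /** True when the provider can be used right now (e.g. its model is loaded). */
    boolean isAvailable();
}

/** Fallback that needs no model: it passes the accessibility text through unchanged. */
final class TextOnlyVisionProvider implements VisionProvider {
    @Override
    public String analyzeScreen(byte[] screenshotPng, String accessibilityText) {
        // The screenshot is ignored; only the accessibility dump reaches the LLM.
        return accessibilityText == null ? "" : accessibilityText;
    }

    @Override
    public boolean isAvailable() {
        return true;
    }
}

/** Hypothetical helper that prefers a loaded provider and otherwise falls back to text only. */
final class VisionProviders {
    private VisionProviders() {}

    static VisionProvider select(VisionProvider preferred) {
        if (preferred != null && preferred.isAvailable()) {
            return preferred;
        }
        return new TextOnlyVisionProvider();
    }
}
```

Keeping the fallback behind the same interface means the agent loop never has to branch on whether a vision model is present.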

* cleanup

* Refactor build configuration and enhance VLM support

- Updated build.gradle.kts to use plugin aliases for better readability.
- Modified settings.gradle.kts to include specific content filters for Google and AndroidX repositories.
- Increased compileSdk and targetSdk versions to 35.
- Replaced local AAR dependencies with library references for the RunAnywhere SDK and other dependencies.
- Implemented OnDeviceVisionProvider for local VLM model analysis, including model registration and loading.
- Enhanced AgentViewModel to manage VLM model state and loading.
- Added UI components for VLM model status and provider mode display.
- Removed obsolete local AAR files for RunAnywhere SDK components.
- Introduced a new ProviderBadge component to indicate the current provider mode in the UI.

* Add on-device LLM benchmarking documentation and enhance agent functionality

- Introduced ASSESSMENT.md, a benchmarking study of four on-device LLMs on a Samsung Galaxy S24.
- Updated README.md to reflect new features and architecture, emphasizing fully on-device AI capabilities.
- Added the permissions and service declarations in AndroidManifest.xml that the foreground service needs to keep the agent active.
- Implemented AgentForegroundService to prevent process termination during background operation.
- Modified AgentViewModel to start and stop the foreground service appropriately.
- Enhanced ActionExecutor to support opening notes apps and setting alarms.
- Updated ActionHistory to provide a compact format for local models.
- Improved ScreenParser to include foreground package information for better context during agent operation.
- Adjusted AgentKernel to handle app navigation and pre-launch logic more effectively, for a smoother user experience.
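One plausible reading of a "compact format for local models" is a history that keeps only the last few steps, one short line each, so small models spend fewer prompt tokens. The sketch below assumes that reading. The class layout, `record`, `toCompactString`, and the `n. action(target) ok|FAIL` line format are hypothetical, not taken from the real `ActionHistory`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical compact history: last N steps, one short line per step. */
final class ActionHistory {
    private static final class Entry {
        final String action;
        final String target;
        final boolean success;

        Entry(String action, String target, boolean success) {
            this.action = action;
            this.target = target;
            this.success = success;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    void record(String action, String target, boolean success) {
        entries.add(new Entry(action, target == null ? "" : target, success));
    }

    /**
     * Renders at most maxEntries recent steps as "n. action(target) ok|FAIL",
     * truncating long targets so the prompt stays short for small models.
     */
    String toCompactString(int maxEntries, int maxTargetChars) {
        StringBuilder sb = new StringBuilder();
        int start = Math.max(0, entries.size() - maxEntries);
        for (int i = start; i < entries.size(); i++) {
            Entry e = entries.get(i);
            String target = e.target.length() > maxTargetChars
                    ? e.target.substring(0, maxTargetChars) + "..."
                    : e.target;
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(i + 1).append(". ").append(e.action)
              .append('(').append(target).append(')')
              .append(e.success ? " ok" : " FAIL");
        }
        return sb.toString();
    }
}
```

For example, after recording `open(X)`, `tap(Compose post button)`, and a failed `type(Hi)`, `toCompactString(2, 7)` keeps only the last two steps and shortens the long target to `Compose...`.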

* minor updates

* update

* updating assessment

* Add X Compose Shortcut Implementation and UI Enhancements

- Introduced a three-piece solution for the X (Twitter) compose flow, using deep links and foreground activity management to improve navigation speed and reliability.
- Enhanced AgentViewModel to manage live LLM streaming text and clipboard functionality for log exports.
- Updated AgentAccessibilityService to filter out unlabeled container classes, improving navigation accuracy.
- Implemented a new ThinkingPanel UI component to display streaming tokens during LLM reasoning.
- Enhanced ActionExecutor to support direct clicks on elements, bypassing gesture interceptors for improved interaction.
- Added structured logging for agent steps to facilitate better performance tracking and export capabilities.
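Filtering unlabeled containers can be sketched as one pass over flattened accessibility nodes that drops layout containers carrying neither text nor a content description. `UiNode`, `NodeFilter`, and the specific container class list are assumptions for illustration; the real `AgentAccessibilityService` may filter a different set of classes or use other criteria.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical flattened accessibility node. */
final class UiNode {
    final String className;
    final String text;
    final String contentDescription;

    UiNode(String className, String text, String contentDescription) {
        this.className = className;
        this.text = text;
        this.contentDescription = contentDescription;
    }

    boolean hasLabel() {
        return notBlank(text) || notBlank(contentDescription);
    }

    private static boolean notBlank(String s) {
        return s != null && !s.trim().isEmpty();
    }
}

final class NodeFilter {
    // Assumed container list; the real service may filter a different set of classes.
    private static final Set<String> CONTAINER_CLASSES = Set.of(
            "android.widget.FrameLayout",
            "android.widget.LinearLayout",
            "android.widget.RelativeLayout",
            "android.view.ViewGroup",
            "androidx.recyclerview.widget.RecyclerView");

    private NodeFilter() {}

    /** Drops layout containers with no text or content description; keeps everything else. */
    static List<UiNode> dropUnlabeledContainers(List<UiNode> nodes) {
        List<UiNode> kept = new ArrayList<>();
        for (UiNode node : nodes) {
            if (CONTAINER_CLASSES.contains(node.className) && !node.hasLabel()) {
                continue; // no text for the model to act on
            }
            kept.add(node);
        }
        return kept;
    }
}
```

Labeled containers (for example a `FrameLayout` whose content description is "Home") survive, since they still give the model something meaningful to select.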

* Add Approach 3 X compose flow and document live test results

Implemented the fully assisted X posting flow (Approach 3) with LFM2.5-1.2B:
- Restored xComposeMessage field to track compose state
- Extended extractTweetText() with Pattern 3: "post saying <text>" (no quotes needed)
- Restored openXCompose() deep link in preLaunchApp() for X goals
- Restored ComposerActivity + SINGLE_TOP in bringAppToForeground() to preserve compose during inference
- Restored findPostButtonIndex() quick-tap block for zero-LLM-step POST
- Kept X-FAB keyword FAB tap as fallback
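A zero-LLM-step POST presumably works by locating the POST button directly in the parsed screen and tapping it without querying the model. A minimal sketch, assuming a flattened element list with a label and a clickable flag; `ScreenElement`, `XQuickTap`, and the matching rule (first clickable element labelled "Post", case-insensitive) are hypothetical.

```java
import java.util.List;

/** Hypothetical parsed screen element, as the agent might list it for the model. */
final class ScreenElement {
    final String label;
    final boolean clickable;

    ScreenElement(String label, boolean clickable) {
        this.label = label;
        this.clickable = clickable;
    }
}

final class XQuickTap {
    private XQuickTap() {}

    /** Index of the first clickable element labelled "Post" (case-insensitive), or -1. */
    static int findPostButtonIndex(List<ScreenElement> elements) {
        for (int i = 0; i < elements.size(); i++) {
            ScreenElement e = elements.get(i);
            if (e.clickable && e.label != null && e.label.trim().equalsIgnoreCase("post")) {
                return i;
            }
        }
        return -1;
    }
}
```

Under this sketch, a non-negative index would be tapped directly, skipping inference for that step; `-1` would leave the decision to the normal LLM path.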

Live test results documented in ASSESSMENT.md:
- Approach 1 (pure LLM): FAIL — 1.2B always picks index 0, stuck in nav drawer
- Approach 2 (keyword FAB): FAIL — opens compose correctly but compose destroyed during inference
- Approach 3 (fully assisted): PASS — tweet posted in ~20s, 0 LLM inference steps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add X post demo mode with X-FAB + X-TYPE + X-POST flow and report

- X-TYPE block: auto-types tweet text into compose field when blank
- X-POST block: updated to check text is present before tapping POST
- findComposeTextFieldIndex(): finds [tap,edit] EditText in accessibility tree
- preLaunchApp(): always opens X home feed (no deep link) for visible navigation
- extractTweetText() Pattern 3: matches "post/tweet saying <text>" without quotes
- X_POST.md: full implementation report with live logcat trace and proof of tweet
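Pattern 3 of `extractTweetText()` can be approximated with a case-insensitive regular expression keyed on "post saying" or "tweet saying". The quoted-text pattern checked first is an assumption (the commit message only implies that the earlier patterns needed quotes), and `TweetTextExtractor` is a hypothetical wrapper, not the agent's actual class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TweetTextExtractor {
    // Assumed earlier pattern: text inside double quotes anywhere in the goal.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");
    // Pattern 3 as described: "post/tweet saying <text>" with no quotes needed.
    private static final Pattern SAYING = Pattern.compile(
            "\\b(?:post|tweet)\\s+saying\\s+(.+)$", Pattern.CASE_INSENSITIVE);

    private TweetTextExtractor() {}

    /** Returns the tweet body, or null if the goal contains none. */
    static String extractTweetText(String goal) {
        if (goal == null) {
            return null;
        }
        Matcher quoted = QUOTED.matcher(goal);
        if (quoted.find()) {
            return quoted.group(1).trim();
        }
        Matcher saying = SAYING.matcher(goal.trim());
        if (saying.find()) {
            String text = saying.group(1).trim();
            return text.isEmpty() ? null : text;
        }
        return null;
    }
}
```

For example, the goal `Open X and post saying Hi from RunAnywhere Android agent` yields `Hi from RunAnywhere Android agent`.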

Tweet posted live: "Hi from RunAnywhere Android agent" — @RunAnywhereAI, Feb 19 2026
27s execution time, 0 LLM inference calls, full navigation steps visible on screen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* adding stuff

---------

Co-authored-by: runanywhere <runanywhere@runanywheres-MacBook-Pro.local>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Siddhesh <siddheshsonar2377@gmail.com>

2.2 KiB


Description

Type of Change

Testing

Platform-Specific Testing (check all that apply)

Labels

Checklist

Screenshots
