Overview

magicClicker() is a vision-based automation function that clicks a UI element you describe in natural language. It sends a screenshot to the Moondream API, which locates the described element on screen.

Function Signature

fun magicClicker(description: String)

Parameters

description
String
required
Natural language description of the UI element you want to click. Be specific about the element’s appearance and location.
Examples:
  • “Profile button in bottom right corner”
  • “Search icon at the top”
  • “Red notification bell”
  • “Submit button”

How It Works

  1. Screenshot Capture: Takes a screenshot of the current screen using the ScreenCaptureService
  2. Image Processing: Converts the screenshot to base64-encoded JPEG format
  3. Vision AI Analysis: Sends the image and description to Moondream API’s /v1/point endpoint
  4. Coordinate Detection: Receives normalized coordinates (x, y) of the target element
  5. Click Execution: Converts normalized coordinates to pixel coordinates and performs the click using AccessibilityService
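
The five steps can be read as a short pipeline with two early exits. The following is a minimal sketch in JavaScript, not the actual implementation: every function passed in through `deps` (`captureScreenshot`, `encodeJpegBase64`, `locatePoint`, `toPixels`, `click`) is a hypothetical stand-in for the corresponding step, and the calls are synchronous only to keep the control flow visible.

```javascript
// Illustrative control flow only; all names in `deps` are hypothetical stand-ins.
function magicClickPipeline(description, deps) {
  // 1. Screenshot capture
  const screenshot = deps.captureScreenshot();
  if (screenshot == null) {
    return { clicked: false, reason: "no-screenshot" };
  }

  // 2. Image processing: base64-encoded JPEG wrapped in a data URL
  const imageUrl = "data:image/jpeg;base64," + deps.encodeJpegBase64(screenshot);

  // 3 and 4. Vision analysis returns a normalized point, or null if nothing matched
  const point = deps.locatePoint(imageUrl, description);
  if (point == null) {
    return { clicked: false, reason: "not-found" };
  }

  // 5. Convert to pixels and click
  const pixel = deps.toPixels(point);
  deps.click(pixel.x, pixel.y);
  return { clicked: true, x: pixel.x, y: pixel.y };
}

// Example run with stubbed steps
const clicks = [];
const result = magicClickPipeline("Login button", {
  captureScreenshot: () => "raw-bytes",
  encodeJpegBase64: () => "QUJD",
  locatePoint: () => ({ x: 0.5, y: 0.25 }),
  toPixels: (p) => ({ x: p.x * 720, y: p.y * 1600 }),
  click: (x, y) => clicks.push([x, y]),
});
console.log(result);
```

Passing the steps in as functions is a choice made for this sketch: it lets the two failure paths (no screenshot, element not found) be exercised without a device or network access.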

Code Examples

Basic Usage

// Click on a specific button
Android.magicClicker("Login button");

// Click on an icon in a specific location
Android.magicClicker("Settings gear icon in top right");

// Click on a colored element
Android.magicClicker("Blue plus button at the bottom");

Real-World Examples

// Navigate to profile
Android.magicClicker("Profile button in bottom right corner");
delay(2000);

// Open notifications
Android.magicClicker("Notification bell icon");
delay(1500);

// Submit a form
Android.magicClicker("Green submit button");

Email Automation Example

// Open Gmail and compose
Android.launchGmail();
delay(3000);

Android.magicClicker("Compose button in the bottom right corner");
delay(2000);

// Use with other functions to complete the task
Android.simulateTypeInSecondEditableField("user@example.com");

Vision API Details

Moondream API Integration

The function uses Moondream’s point detection API.
Endpoint: https://api.moondream.ai/v1/point
Request Format:
{
  "image_url": "data:image/jpeg;base64,<base64_image>",
  "object": "<your_description>"
}
Response Format:
{
  "request_id": "abc-123",
  "points": [
    {
      "x": 0.75,
      "y": 0.85
    }
  ]
}
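
To make the request and response shapes concrete, here is a small JavaScript sketch that builds the request body and pulls the first point out of a response. The helper names `buildPointRequest` and `firstPoint` are illustrative and not part of magicClicker. Treating an empty or missing `points` array as "not found" is an assumption, and authentication headers are left out because this page does not describe them.

```javascript
// Hypothetical helpers; field names follow the request and response formats above.
function buildPointRequest(base64Jpeg, description) {
  return JSON.stringify({
    image_url: "data:image/jpeg;base64," + base64Jpeg,
    object: description,
  });
}

function firstPoint(responseText) {
  const body = JSON.parse(responseText);
  // Assumption: no match is reported as an empty or missing `points` array.
  if (!Array.isArray(body.points) || body.points.length === 0) {
    return null;
  }
  const { x, y } = body.points[0];
  return { x, y };
}

const requestBody = buildPointRequest("QUJD", "Submit button");
const point = firstPoint('{"request_id":"abc-123","points":[{"x":0.75,"y":0.85}]}');
console.log(requestBody);
console.log(point);
```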

Coordinate Transformation

The API returns normalized coordinates (0.0 to 1.0). These are converted to screen pixels:
val pixelX = (coordinates.x * 720).toFloat() + 50f
val pixelY = (coordinates.y * 1600).toFloat()
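Note: 720 and 1600 are the screen width and height in pixels of the target device and are hardcoded, so the conversion assumes a screen of that size. The snippet also adds a fixed 50-pixel offset to the x coordinate.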
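
Because 720 and 1600 are fixed values, the same arithmetic can be written with the screen size passed in. The JavaScript sketch below is one way to do that. `toPixels`, its parameters, the rounding to whole pixels, and the clamping to the screen bounds are assumptions made for illustration; the 50-pixel x offset is carried over from the snippet above as a default.

```javascript
// Hypothetical helper: maps a normalized point (0.0 to 1.0) to pixel coordinates.
function toPixels(point, screenWidth, screenHeight, xOffset = 50) {
  const clamp = (value, max) => Math.min(Math.max(value, 0), max);
  return {
    // Clamping is an addition here so the offset never pushes the click off screen.
    x: clamp(Math.round(point.x * screenWidth + xOffset), screenWidth - 1),
    y: clamp(Math.round(point.y * screenHeight), screenHeight - 1),
  };
}

// The sample response (0.75, 0.85) on a 720 x 1600 screen
console.log(toPixels({ x: 0.75, y: 0.85 }, 720, 1600)); // { x: 590, y: 1360 }
```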

Best Practices

Be Specific: Include position hints like “top left”, “bottom right”, or “in the center” for better accuracy.
Wait for UI: Ensure the target element is visible on screen before calling magicClicker(). Use delay() after navigation or animations.
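
One way to keep click-then-wait sequences readable is to list the steps as data and run them in order. This is a sketch, assuming the scripting environment provides `Android.magicClicker` and a blocking `delay(ms)` as in the examples on this page. `runSteps` is a hypothetical helper, and the click and wait functions are passed in so the sequence can be dry-run without a device.

```javascript
// Hypothetical helper: click each described element, then wait for the UI to settle.
function runSteps(steps, click, wait) {
  for (const step of steps) {
    click(step.target);
    if (step.waitMs > 0) {
      wait(step.waitMs);
    }
  }
}

const steps = [
  { target: "Profile button in bottom right corner", waitMs: 2000 },
  { target: "Notification bell icon", waitMs: 1500 },
  { target: "Green submit button", waitMs: 0 },
];

// Dry run with recording stubs. On a device this would presumably be:
//   runSteps(steps, (t) => Android.magicClicker(t), (ms) => delay(ms));
const log = [];
runSteps(steps, (t) => log.push("click: " + t), (ms) => log.push("wait: " + ms));
console.log(log.join("\n"));
```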

Writing Good Descriptions

Good descriptions:
  • “Blue send button in bottom right”
  • “Profile icon with circular avatar”
  • “Red close X button at top”
Poor descriptions:
  • “button” (too vague)
  • “the thing” (not descriptive)
  • “click here” (no visual information)

Error Handling

The function includes built-in error handling:
  • No Screenshot Available: Returns early with voice feedback
  • Element Not Found: Moondream returns null, and the user is notified via speech
  • Activity Destroyed: Safely exits if the activity is no longer active
  • API Errors: Logged to console with error messages
if (coordinates != null) {
    // Click performed successfully
    speakText("Clicked on $description")
} else {
    speakText("Could not find $description on screen")
}

Performance Considerations

  • Asynchronous Execution: Runs in a coroutine to avoid blocking the UI thread
  • Image Compression: Screenshots are compressed to 85% JPEG quality for faster transmission
  • Timeout: Network requests time out after 30 seconds
  • Memory Management: Bitmaps are properly recycled after use

Tracking and Analytics

Each magic click is logged to Firebase for analytics:
trackMagicRun(
    "magicClicker", 
    description,
    "{\"x\": ${pixelX.toInt()}, \"y\": ${pixelY.toInt()}}"
)

See Also

  • magicScraper() - Extract text/data from screen using natural language
  • simulateClick() - Direct coordinate-based clicking without vision AI
  • launchGmail() - Launch specific apps before using magicClicker