
4. Image Automation: Giving Your Workflow "Eyes"

Navigating stubborn interfaces by seeing and acting on visual cues

Written by Sophie
Updated yesterday

In standard RPA, your workflow usually finds elements by querying the application's underlying structure with selectors such as XPaths. However, when you encounter remote desktops, legacy software, or custom graphical interfaces, that internal structure is often hidden or inaccessible, so selectors have nothing to target. Image Automation solves this by letting your workflow "see" the screen, identifying buttons and icons by their appearance rather than by how they are defined in the application.

Defining Image-Based Automation

At its core, this technology guides your automation by visually matching patterns on the display. Instead of looking for a code-based label, the automation engine scans the screen for a specific image you’ve captured. Think of it as teaching your workflow to find a "Submit" button by its picture. This allows you to interact with almost any software, regardless of how it was built or how its code is structured.
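To make the idea of visual matching concrete, here is a minimal sketch in plain Python. It is not how Octoparse AI implements detection; it assumes the screen and the captured image are small grayscale grids and looks for an exact pixel-for-pixel match. The names `find_template`, `screen`, and `button` are illustrative only.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels (0-255), stored row by row


def find_template(screen: Image, template: Image) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the top-left corner of the first exact match, or None."""
    screen_h, screen_w = len(screen), len(screen[0])
    tmpl_h, tmpl_w = len(template), len(template[0])
    # Slide the template over every position where it fits entirely on screen.
    for y in range(screen_h - tmpl_h + 1):
        for x in range(screen_w - tmpl_w + 1):
            if all(screen[y + dy][x:x + tmpl_w] == template[dy]
                   for dy in range(tmpl_h)):
                return (x, y)
    return None


# A toy 5x4 "screen" containing a 2x2 "button" at column 1, row 1.
screen = [
    [0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0],
    [0, 7, 6, 0, 0],
    [0, 0, 0, 0, 0],
]
button = [[9, 8], [7, 6]]

print(find_template(screen, button))  # (1, 1)
```

Real engines typically score similarity instead of demanding an exact match, but the sliding-window search is the same basic idea.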

The Power of Visual Adaptability

The primary benefit of this approach is its ability to handle "un-scannable" environments. Whether it’s a custom-built accounting tool or a visually rich game interface, image recognition lets your workflow interact with whatever is visible on screen. Furthermore, Octoparse AI can be configured to recognize multiple variations of the same image, which helps your workflow stay resilient when colors or brightness change slightly during execution.
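The idea of recognizing several variations of one image can be sketched as a loop that tries each captured variant in turn. This is an illustration under simple assumptions (small grayscale grids, exact matching), not Octoparse AI's configuration format; `find_exact`, `find_any_variant`, and the "light" and "dark" variants are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, stored row by row
Position = Tuple[int, int]


def find_exact(screen: Image, template: Image) -> Optional[Position]:
    """Return (x, y) of the first exact match of template in screen, or None."""
    tmpl_h, tmpl_w = len(template), len(template[0])
    for y in range(len(screen) - tmpl_h + 1):
        for x in range(len(screen[0]) - tmpl_w + 1):
            if all(screen[y + dy][x:x + tmpl_w] == template[dy]
                   for dy in range(tmpl_h)):
                return (x, y)
    return None


def find_any_variant(
    screen: Image, variants: Dict[str, Image]
) -> Optional[Tuple[str, Position]]:
    """Try each named variant in insertion order; return the first one found."""
    for name, template in variants.items():
        position = find_exact(screen, template)
        if position is not None:
            return (name, position)
    return None


# The same hypothetical button captured under a light and a dark theme.
variants = {
    "light": [[200, 210], [220, 230]],
    "dark": [[20, 30], [40, 50]],
}
dark_screen = [
    [0, 0, 0, 0],
    [0, 0, 20, 30],
    [0, 0, 40, 50],
]

print(find_any_variant(dark_screen, variants))  # ('dark', (2, 1))
```

Returning the variant name along with the position makes it easy to log which capture actually matched, which helps when deciding whether an old variant can be retired.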

Mastering the Move and Click Commands

Once a visual match is confirmed, you drive the interaction using two primary actions. The Move Mouse to Image command moves the cursor to the center of the detected image, which is useful for triggering tooltips or hover effects. The Click Image action then performs the click itself, whether that is a single click to select an item or a double-click to launch an application. By combining these, you can replicate most manual mouse interactions based entirely on what the workflow sees on the screen.
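The relationship between a detected image and the two actions can be sketched as follows. This is a hypothetical model, not Octoparse AI's API: `Match`, `RecordingMouse`, `move_mouse_to_image`, and `click_image` are invented for illustration, and the mouse only records what it would do instead of sending real input.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Match:
    """Bounding box of a detected image, in screen pixels."""
    x: int
    y: int
    width: int
    height: int

    def center(self) -> Tuple[int, int]:
        return (self.x + self.width // 2, self.y + self.height // 2)


@dataclass
class RecordingMouse:
    """Hypothetical stand-in for an input driver: logs actions only."""
    log: List[str] = field(default_factory=list)

    def move_to(self, x: int, y: int) -> None:
        self.log.append(f"move {x},{y}")

    def click(self, x: int, y: int, count: int = 1) -> None:
        self.log.append(f"click {x},{y} x{count}")


def move_mouse_to_image(mouse: RecordingMouse, match: Match) -> None:
    """Hover over the center of the match without clicking."""
    mouse.move_to(*match.center())


def click_image(mouse: RecordingMouse, match: Match, double: bool = False) -> None:
    """Move to the center of the match, then single- or double-click there."""
    x, y = match.center()
    mouse.move_to(x, y)
    mouse.click(x, y, count=2 if double else 1)


mouse = RecordingMouse()
icon = Match(x=100, y=40, width=80, height=20)
move_mouse_to_image(mouse, icon)       # hover only, e.g. to show a tooltip
click_image(mouse, icon, double=True)  # hover, then double-click to launch
print(mouse.log)  # ['move 140,50', 'move 140,50', 'click 140,50 x2']
```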

Best Practices for Professional Workflows

To improve reliability, choose distinct, high-contrast images that won't be confused with other on-screen elements. It is also important to set an appropriate "tolerance" level; this lets the workflow accept a match despite minor rendering differences, such as small shifts in screen resolution or anti-aliasing. Most importantly, always design a fallback path. If the image cannot be found within a set time limit, your automation should refresh the screen or send a notification rather than stopping without explanation.
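Tolerance and a fallback path can be sketched together in plain Python. This is an assumption-laden illustration, not Octoparse AI's matching algorithm: tolerance is modeled here as the maximum allowed mean absolute pixel difference, and `find_with_tolerance`, `wait_for_image`, and `on_timeout` are hypothetical names.

```python
import time
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]  # grayscale pixels, stored row by row
Position = Tuple[int, int]


def mean_abs_diff(screen: Image, template: Image, x: int, y: int) -> float:
    """Average per-pixel difference between template and the region at (x, y)."""
    tmpl_h, tmpl_w = len(template), len(template[0])
    total = sum(abs(screen[y + dy][x + dx] - template[dy][dx])
                for dy in range(tmpl_h) for dx in range(tmpl_w))
    return total / (tmpl_h * tmpl_w)


def find_with_tolerance(screen: Image, template: Image,
                        tolerance: float) -> Optional[Position]:
    """Return the best-scoring position whose difference is within tolerance."""
    best, best_score = None, None
    for y in range(len(screen) - len(template) + 1):
        for x in range(len(screen[0]) - len(template[0]) + 1):
            score = mean_abs_diff(screen, template, x, y)
            if score <= tolerance and (best_score is None or score < best_score):
                best, best_score = (x, y), score
    return best


def wait_for_image(locate: Callable[[], Optional[Position]], timeout_s: float,
                   poll_s: float, on_timeout: Callable[[], None]) -> Optional[Position]:
    """Poll locate() until it succeeds; run the fallback once time is up."""
    deadline = time.monotonic() + timeout_s
    while True:
        position = locate()
        if position is not None:
            return position
        if time.monotonic() >= deadline:
            on_timeout()  # e.g. refresh the screen or send a notification
            return None
        time.sleep(poll_s)


template = [[100, 100], [100, 100]]
screen = [  # the button rendered with slightly different pixel values
    [0, 0, 0, 0],
    [0, 103, 98, 0],
    [0, 101, 100, 0],
]
print(find_with_tolerance(screen, template, tolerance=5.0))  # (1, 1)
print(find_with_tolerance(screen, template, tolerance=1.0))  # None

notifications = []
result = wait_for_image(
    locate=lambda: find_with_tolerance(screen, template, tolerance=1.0),
    timeout_s=0.05, poll_s=0.01,
    on_timeout=lambda: notifications.append("image not found"),
)
print(result, notifications)  # None ['image not found']
```

A tolerance that is too loose will match unrelated regions, so it is worth testing the chosen value against screens where the image is known to be absent.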

Summary: Accuracy Beyond the Code

Image-based automation extends your capabilities beyond traditional UI detection. By enabling your workflow to recognize and act on visual cues, you can automate graphical interfaces and custom environments that code-based selectors cannot reach. With careful image selection, a sensible tolerance level, and a fallback path for missed matches, you can build automation that navigates even challenging software scenarios.
