Bots

How Pie’s Agents Work

Pie’s automation agents are fully autonomous, visual-first systems designed to test applications without human intervention. Unlike traditional DOM-based tools, our agents rely entirely on visual data to understand and interact with the interface.

Functional Model: Visual Cognition

The agent operates using a pure computer-vision approach, independent of the underlying code structure.

  • Screenshot-Based Intelligence: The agent does not read the DOM tree. Instead, it captures high-fidelity screenshots of the interface and analyzes pixel data to identify buttons, forms, and navigation paths (see the sketch after this list).

  • Visual Inference: By processing these screenshots, the agent infers the state of the application and determines the next logical action based on visual cues, just as a human user would.
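
The sketch below illustrates this observe-and-infer step. It assumes Playwright for browser control, and the propose_action function is a hypothetical stand-in for the vision model; neither is confirmed as Pie’s actual implementation.

```python
# Minimal sketch of the observe-and-infer step, assuming Playwright for browser
# control. `propose_action` is a hypothetical stand-in for the vision model;
# Pie's real model interface is not public.
from dataclasses import dataclass

from playwright.sync_api import sync_playwright


@dataclass
class Action:
    kind: str          # "click" or "type"
    x: int = 0         # coordinates inferred from pixels, not from the DOM
    y: int = 0
    text: str = ""


def propose_action(screenshot_png: bytes) -> Action:
    """Placeholder for the vision model: maps raw pixels to the next action."""
    return Action(kind="click", x=640, y=360)   # dummy decision for illustration


with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page(viewport={"width": 1280, "height": 720})
    page.goto("https://example.com")

    screenshot = page.screenshot()        # Observe: the agent sees only pixels
    action = propose_action(screenshot)   # Infer: decide the next step visually
    if action.kind == "click":
        page.mouse.click(action.x, action.y)
    elif action.kind == "type":
        page.keyboard.type(action.text)

    browser.close()
```

Note that nothing in the loop inspects selectors or element handles; the only input to the decision is the screenshot itself.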

Operating Modes

The agent cycles through distinct phases to ensure robust testing:

  • Observe: Captures visual snapshots of the current state to understand the UI layout and context.

  • Act: Performs UI interactions (clicks, typing, gestures) based on the visual analysis.

  • Analyze: Visually verifies the outcome of actions, detecting regressions or errors based on visual changes rather than code exceptions.

  • Handoff: Uploads results and visual artifacts to Pie for review.
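
One way to picture how these four modes fit together is the loop below. The VisionModel, UIAction, and Reporter interfaces are assumptions made for this sketch, not Pie’s published APIs.

```python
# Illustrative Observe -> Act -> Analyze -> Handoff loop. The interfaces here
# are assumptions for the sketch, not Pie's published APIs.
from typing import Optional, Protocol


class UIAction(Protocol):
    def perform(self, page) -> None: ...            # click, type, gesture, etc.


class VisionModel(Protocol):
    def propose(self, screenshot: bytes) -> Optional[UIAction]: ...
    def verify(self, before: bytes, after: bytes) -> dict: ...


class Reporter(Protocol):
    def upload(self, artifacts: list) -> None: ...


def run_session(page, model: VisionModel, reporter: Reporter, max_steps: int = 50) -> None:
    artifacts = []
    for _ in range(max_steps):
        before = page.screenshot()              # Observe: snapshot the current UI state
        action = model.propose(before)          # decide the next step from pixels alone
        if action is None:                      # the model judges the flow complete
            break
        action.perform(page)                    # Act: click / type / gesture
        after = page.screenshot()
        verdict = model.verify(before, after)   # Analyze: visual check, not code exceptions
        artifacts.append({"before": before, "after": after, "verdict": verdict})
    reporter.upload(artifacts)                  # Handoff: send results to Pie for review
```

Each iteration yields a before/after screenshot pair plus a verdict, which is the kind of visual evidence Handoff uploads for review.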

Key Behaviors

End-to-End Autonomous Control

The system is designed for fully autonomous execution.

  • No Human in the Loop: Once a session begins, the agent takes full control of the environment. It manages the browser context independently, ensuring that the test proceeds from start to finish without any manual input or supervision.
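
As a rough illustration of what unattended environment control can look like, the sketch below spins up an isolated, headless browser context with Playwright and tears it down without any prompts; Pie’s internal environment management may differ.

```python
# Sketch of an unattended session setup, assuming Playwright; Pie's actual
# environment management is internal and may differ.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)      # no visible window, no prompts
    context = browser.new_context(                  # isolated profile per session, so
        viewport={"width": 1280, "height": 720},    #  cookies and storage never leak
    )
    page = context.new_page()
    page.goto("https://example.com")

    # run_session(page, model, reporter)   # the Observe/Act/Analyze/Handoff loop above

    context.close()                                 # tear down without user input
    browser.close()
```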

Agentic & Probabilistic Execution

Unlike rigid, script-based automation, Pie’s agents are probabilistic and agentic.

  • Dynamic Paths: Because the agent makes decisions in real time based on visual input, two consecutive runs may not be identical. The agent adapts to slight variations in the UI or timing, finding the best path forward dynamically rather than following a hard-coded sequence.

  • Resilience: This non-deterministic approach lets the agent work past unexpected pop-ups or layout shifts that would typically break a rigid, deterministic script.
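
To make the probabilistic behaviour concrete, the sketch below samples the next step from a set of ranked candidates rather than replaying a fixed script. The model.rank_actions call and the (action, confidence) candidate format are assumptions for illustration only.

```python
# Sketch of probabilistic action selection; the model interface is assumed.
import random


def choose_action(candidates):
    """Sample the next step instead of replaying a fixed script.

    `candidates` is a list of (action, confidence) pairs derived from the
    current screenshot. Weighting by confidence keeps runs sensible while
    still allowing two runs to diverge when the UI or timing differs slightly.
    """
    actions, weights = zip(*candidates)
    return random.choices(actions, weights=weights, k=1)[0]


def step(page, model):
    screenshot = page.screenshot()
    candidates = model.rank_actions(screenshot)   # e.g. [(dismiss_popup, 0.9), (click_next, 0.6)]
    if not candidates:
        return False                              # nothing sensible left to do -> stop
    action = choose_action(candidates)            # an unexpected cookie banner simply becomes
    action.perform(page)                          #  the highest-ranked candidate this turn
    return True
```

Because an unexpected dialog or layout shift just produces a different set of candidates on the next screenshot, the run adapts instead of failing.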