Bots
How Pie’s Agents Work
Pie’s automation agents are fully autonomous systems that test applications without human intervention. Unlike traditional DOM-based tools, they rely entirely on visual data to understand and interact with the interface.
Functional Model: Visual Cognition
The agent operates using a pure computer-vision approach, independent of the underlying code structure.
Screenshot-Based Intelligence: The agent does not read the DOM tree. Instead, it captures high-fidelity screenshots of the interface and analyzes pixel data to identify buttons, forms, and navigation paths.
Visual Inference: By processing these screenshots, the agent infers the state of the application and determines the next logical action based on visual cues, just as a human user would.
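The idea of locating a control from pixels instead of from the DOM can be illustrated with a deliberately tiny sketch. It is not Pie’s vision pipeline: the screenshot is modeled as a grid of RGB tuples, the "button" is simply a block of one color, and `find_region` and `click_point` are hypothetical helper names chosen for this example.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # left, top, right, bottom (inclusive)


def find_region(screenshot: List[List[Pixel]], target: Pixel) -> Optional[Box]:
    """Return the bounding box of all pixels equal to `target`, or None."""
    xs: List[int] = []
    ys: List[int] = []
    for y, row in enumerate(screenshot):
        for x, pixel in enumerate(row):
            if pixel == target:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def click_point(box: Box) -> Tuple[int, int]:
    """Pick the (integer) center of the box as the place to click."""
    left, top, right, bottom = box
    return (left + right) // 2, (top + bottom) // 2


# Toy 6x4 "screenshot": white background with a 3x2 blue button.
WHITE: Pixel = (255, 255, 255)
BLUE: Pixel = (0, 0, 255)
screenshot = [[WHITE] * 6 for _ in range(4)]
for y in (1, 2):
    for x in (2, 3, 4):
        screenshot[y][x] = BLUE

box = find_region(screenshot, BLUE)
if box is not None:
    print(box, click_point(box))
```

A real agent would presumably rely on a far more capable vision model, but the shape of the step is the same: pixels go in, and a target location on screen comes out, with no reference to element IDs or selectors.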
Operating Modes
The agent cycles through four modes during a test session:
| Mode | Description |
|---|---|
| Observe | Captures visual snapshots of the current state to understand the UI layout and context. |
| Act | Performs UI interactions (clicks, typing, gestures) based on the visual analysis. |
| Analyze | Visually verifies the outcome of actions, detecting regressions or errors based on visual changes rather than code exceptions. |
| Handoff | Uploads results and visual artifacts to Pie for review. |
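
The cycle in the table can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not Pie’s implementation: it assumes Observe, Act, and Analyze repeat until a goal is reached or a step budget runs out, and that Handoff happens once at the end. `Mode`, `Session`, `run_session`, and `check_goal` are hypothetical names, and the screenshot, interaction, and verification steps are stubbed out.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Mode(Enum):
    OBSERVE = "observe"
    ACT = "act"
    ANALYZE = "analyze"
    HANDOFF = "handoff"


@dataclass
class Session:
    """Hypothetical record of one run; not Pie's actual data model."""
    goal_reached: bool = False
    steps: int = 0
    history: List[Mode] = field(default_factory=list)


def run_session(check_goal: Callable[[int], bool], max_steps: int = 10) -> Session:
    """Repeat Observe -> Act -> Analyze, then hand off once at the end."""
    session = Session()
    while not session.goal_reached and session.steps < max_steps:
        session.history.append(Mode.OBSERVE)  # capture a screenshot (stubbed)
        session.history.append(Mode.ACT)      # click, type, or gesture (stubbed)
        session.history.append(Mode.ANALYZE)  # verify the visual outcome (stubbed)
        session.steps += 1
        session.goal_reached = check_goal(session.steps)
    session.history.append(Mode.HANDOFF)      # upload results and artifacts (stubbed)
    return session


# Pretend the goal is visually confirmed after the second step.
result = run_session(lambda step: step >= 2)
print([mode.value for mode in result.history])
```

The step budget (`max_steps`) is an assumption added so the loop always terminates; the source does not say how a session decides to stop.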
Key Behaviors
End-to-End Autonomous Control
The system is designed for fully autonomous execution.
No Human in the Loop: Once a session begins, the agent takes full control of the environment and manages the browser context on its own, so the test runs from start to finish without manual input or supervision.
Agentic & Probabilistic Execution
Unlike rigid, script-based automation, Pie’s agents are probabilistic and agentic.
Dynamic Paths: Because the agent makes decisions in real time based on visual input, two consecutive runs may not be identical. The agent adapts to small variations in the UI or in timing, choosing its path dynamically rather than following a hard-coded sequence.
Resilience: This non-deterministic approach allows the agent to navigate through unexpected pop-ups or layout shifts that would typically break standard “deterministic” scripts.
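One way to picture probabilistic, adaptive execution is a planner that samples its next action from weighted candidates and reacts to whatever is currently on screen. The sketch below is an assumption-laden illustration, not how Pie’s agents actually decide: the action names, the weights, the `popup_visible` flag, and the functions `choose_action` and `plan_step` are all hypothetical.

```python
import random
from typing import Dict


def choose_action(candidates: Dict[str, float], rng: random.Random) -> str:
    """Sample one action, weighted by the agent's confidence in each."""
    actions = list(candidates)
    weights = list(candidates.values())
    return rng.choices(actions, weights=weights, k=1)[0]


def plan_step(screen: Dict[str, bool], rng: random.Random) -> str:
    """Decide the next action from what is currently visible."""
    # An unexpected pop-up is handled first instead of failing the run.
    if screen.get("popup_visible"):
        return "dismiss_popup"
    # Otherwise choose among plausible ways forward; the weights are made up.
    return choose_action({"click_nav_link": 0.6, "use_search_box": 0.4}, rng)


rng = random.Random()  # unseeded, so repeated runs may choose differently
print(plan_step({"popup_visible": True}, rng))
print(plan_step({"popup_visible": False}, rng))
```

A fixed script would fail at the pop-up because its next selector or step no longer matches, whereas a planner of this shape re-evaluates the screen on every step.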