Introducing DevAssure O2 | The Autonomous Testing Agent for Every Pull Request
Software testing has changed. Your tools should too.
Two years ago, the best we could offer developers was a faster way to write test scripts. Record your actions, generate a script, maintain it when the UI changes, debug it when it breaks.
That model is reaching its limits.
In 2026, AI coding tools write 20% or more of new code at companies like Google and Microsoft. Developers ship multiple PRs per day. Release cycles have compressed from weeks to days. And the old approach — write scripts, maintain scripts, fix flaky scripts — can't keep pace with how fast code moves.
DevAssure started as a low-code test automation platform. We built a recorder, a visual test builder, self-healing locators, and test data management. Hundreds of teams used it to automate faster than they could with Selenium or Playwright alone.
But we kept hearing the same thing from our users:
"The automation is faster, but we're still spending 30–40% of our time maintaining tests."
That feedback led us to rethink the problem entirely. The result is O2 Agent — an autonomous testing agent that reads your code changes, generates targeted tests, and executes them in a real browser. No scripts. No selectors. No maintenance.
This post introduces what DevAssure O2 is, how it works, and why we built it.
What is DevAssure O2?
DevAssure O2 is an autonomous testing agent that tests your web application through the browser — the same way a human tester would — but driven by AI instead of scripts.
You write test cases in plain English. O2 reads them, navigates your application, interacts with UI elements, and validates outcomes. It finds functional bugs (features that don't work correctly) and usability bugs (UI issues that confuse or block users).
O2 is available as:
- A CLI tool (@devassure/cli on npm): run tests from your terminal or CI/CD pipeline
- A VS Code extension: write and execute tests directly from your editor
- A GitHub Action (devassure-ai/devassure-action@v1): test every PR automatically before merge
Why we moved beyond scripts
Traditional test automation — whether Selenium, Playwright, Cypress, or our own original platform — follows a common pattern:
- A human writes a test script with specific selectors (CSS, XPath, test IDs)
- The script runs against the application
- The UI changes → selectors break → a human fixes the script
- Repeat forever
This creates a second codebase — a body of test scripts that mirrors your application but doesn't ship features, doesn't generate revenue, and requires constant maintenance.
Our data from early DevAssure customers showed:
- 30–40% of QA team time went to maintaining existing test scripts, not writing new ones
- ~15% of E2E tests were flaky — failing not because of real bugs, but because of stale selectors, slow page loads, or environment inconsistencies
- Teams learned to ignore red pipelines because flaky tests eroded trust in the CI process
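The ~15% figure depends on what counts as flaky. One common definition is a test that both passes and fails on the same commit. That definition is an assumption here, not necessarily the one behind DevAssure's numbers. Below is a minimal TypeScript sketch that computes a flaky rate under it. The run-history shape (TestRun) and the flakyRate function are hypothetical.

```typescript
// Hypothetical shape of one recorded test result; not a DevAssure format.
interface TestRun {
  testId: string;
  commit: string;
  passed: boolean;
}

// A test is flaky if, on at least one commit, it both passed and failed.
// Returns the fraction of distinct tests that are flaky (0 for an empty history).
function flakyRate(runs: TestRun[]): number {
  // testId -> commit -> outcomes observed for that commit
  const outcomes = new Map<string, Map<string, Set<boolean>>>();
  for (const run of runs) {
    const byCommit = outcomes.get(run.testId) ?? new Map<string, Set<boolean>>();
    const seen = byCommit.get(run.commit) ?? new Set<boolean>();
    seen.add(run.passed);
    byCommit.set(run.commit, seen);
    outcomes.set(run.testId, byCommit);
  }
  if (outcomes.size === 0) return 0;

  let flaky = 0;
  outcomes.forEach((byCommit) => {
    // Seeing both true and false for one commit means the outcome was not deterministic.
    if (Array.from(byCommit.values()).some((seen) => seen.size === 2)) flaky++;
  });
  return flaky / outcomes.size;
}

const history: TestRun[] = [
  { testId: "checkout", commit: "a1", passed: true },
  { testId: "checkout", commit: "a1", passed: false }, // same commit, different outcome
  { testId: "login", commit: "a1", passed: true },
  { testId: "login", commit: "b2", passed: false }, // different commits: not flaky by this definition
];

console.log(flakyRate(history)); // 0.5
```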
The scripts themselves became the problem. Not because they were poorly written, but because scripts are the wrong abstraction for a codebase that changes continuously.
O2 takes a different approach. Instead of scripts that encode how to interact with the UI (click this selector, wait for this element, assert this text), O2 uses intent-driven testing: you describe what the user does, and the agent figures out how to execute it.
How O2 works
Step 1: Install and initialize
```bash
npm install -g @devassure/cli
devassure login
devassure init
```
This creates a .devassure/ folder in your project with configuration files.
Step 2: Configure your application
Tell O2 where your app lives and how to authenticate:
```yaml
# .devassure/test_data.yaml
default:
  url: 'https://staging.yourapp.com'
  users:
    default:
      user_name: 'testuser@yourapp.com'
      password: 'test-password'
```
Describe your application so O2 understands the context:
```yaml
# .devassure/app.yaml
description: >
  A project management SaaS app. Users can create projects,
  add tasks, assign team members, track progress, and generate
  reports. Key flows include signup, project creation, task
  management, team invitations, and billing.
rules:
  - All projects must have a name and at least one member
  - Tasks require a title and an assigned user
  - Billing page should display the current plan and usage
```
Step 3: Write tests in plain English
```yaml
# .devassure/tests/task-management/create_task.yaml
summary: Create a new task and verify it appears in the project board
steps:
  - Log in with default credentials
  - Navigate to the "Marketing Campaign" project
  - Click "Add Task"
  - Enter "Design landing page mockup" as the task title
  - Set the priority to "High"
  - Assign the task to "Priya Sharma"
  - Set the due date to next Friday
  - Click "Create"
  - Verify that the task "Design landing page mockup" appears on the board
  - Verify that the task shows "High" priority and is assigned to "Priya Sharma"
priority: P0
tags:
  - task-management
  - smoke
  - regression
```
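Test files are plain YAML, so it can help to sanity-check their structure before committing them. The following is a minimal sketch, not DevAssure's own schema. It assumes the file has already been parsed into a plain object, for example with a YAML parsing library. It covers only the four fields used above and assumes priorities follow the P0, P1, ... pattern seen in this post. The TestCase interface and the validateTestCase function are hypothetical.

```typescript
// Hypothetical shape mirroring the fields used in create_task.yaml.
interface TestCase {
  summary: string;
  steps: string[];
  priority: string;
  tags?: string[];
}

// Returns a list of problems; an empty list means the object looks well-formed.
function validateTestCase(obj: unknown): string[] {
  if (typeof obj !== "object" || obj === null) return ["test case must be a mapping"];
  const tc = obj as Partial<Record<keyof TestCase, unknown>>;
  const problems: string[] = [];

  if (typeof tc.summary !== "string" || tc.summary.trim() === "") {
    problems.push("summary must be a non-empty string");
  }
  if (!Array.isArray(tc.steps) || tc.steps.length === 0) {
    problems.push("steps must be a non-empty list");
  } else if (!tc.steps.every((s) => typeof s === "string" && s.trim() !== "")) {
    problems.push("every step must be a non-empty plain-English string");
  }
  // Assumption: priorities look like P0, P1, P2, ...
  if (typeof tc.priority !== "string" || !/^P\d+$/.test(tc.priority)) {
    problems.push("priority must look like P0, P1, ...");
  }
  if (tc.tags !== undefined && !(Array.isArray(tc.tags) && tc.tags.every((t) => typeof t === "string"))) {
    problems.push("tags, if present, must be a list of strings");
  }
  return problems;
}

const parsed = {
  summary: "Create a new task and verify it appears in the project board",
  steps: ["Log in with default credentials", 'Click "Create"'],
  priority: "P0",
  tags: ["task-management", "smoke", "regression"],
};

console.log(validateTestCase(parsed)); // []
```

Returning a list of problems, rather than throwing on the first one, lets a pre-commit hook report every issue in a file at once.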
No selectors. No locators. No page.click('#btn-create-task-v2'). Just plain English that describes what a user does.
Step 4: Run
```bash
# Run all tests
devassure run-tests

# Run by tag
devassure run-tests --tag=smoke

# Run by priority
devassure run-tests --priority=P0

# Run a specific folder
devassure run-tests --folder=task-management
```
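A pipeline script that decides which subset to run can build the CLI arguments programmatically. This minimal sketch uses only flags that appear in this post: --tag, --priority, and --folder, plus --archive from the CI example. It applies one selector per run and does not assume that selectors can be combined. The Selection type and the buildRunTestsArgs function are hypothetical.

```typescript
// Hypothetical selection type; each variant maps to one flag shown in this post.
type Selection =
  | { kind: "all" }
  | { kind: "tag"; value: string }
  | { kind: "priority"; value: string }
  | { kind: "folder"; value: string };

// Builds the argument list for `devassure run-tests`.
// `archive` mirrors the --archive flag used in the CI example.
function buildRunTestsArgs(selection: Selection, archive?: string): string[] {
  const args = ["run-tests"];
  if (selection.kind !== "all") {
    // Produces --tag=..., --priority=..., or --folder=...
    args.push(`--${selection.kind}=${selection.value}`);
  }
  if (archive) args.push(`--archive=${archive}`);
  return args;
}

console.log(buildRunTestsArgs({ kind: "tag", value: "smoke" }).join(" "));
// run-tests --tag=smoke
```

You can pass the resulting array to Node's spawnSync("devassure", args) from node:child_process. This keeps each value a separate argument instead of interpolating it into a shell string.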
Step 5: Review results
```bash
devassure open-report --last
```
This opens a detailed report showing what passed and what failed, with session recordings of each test execution.
What makes O2 different from scripted automation
1. No selectors, no locators, no maintenance
Scripted tools require you to tell them exactly which DOM element to interact with: #checkout-btn, .cart-item >> nth=0, [data-testid="submit"]. When the UI changes — a CSS class is renamed, a component is refactored, a design system is updated — these selectors break.
O2 uses visual reasoning to identify elements. When your test says "Click the Submit button," O2 finds the Submit button the same way a human would: by looking at the rendered page. If the button moves, changes color, or gets a new CSS class, O2 still finds it, because the intent hasn't changed.
This is why O2 works on platforms that are notoriously hard to automate with traditional tools:
- Salesforce Lightning: Shadow DOM, dynamic IDs, Aura/LWC hybrid rendering
- Flutter Web: the entire UI is pixels inside a <canvas>, with no DOM elements to select
- Canvas-heavy applications: dashboards, charts, drag-and-drop interfaces
2. Intent-driven, not instruction-driven
A Playwright test says:
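```typescript
import { test, expect } from '@playwright/test';

test('applies the SAVE20 coupon', async ({ page }) => {
  // Assumes the page is already on the checkout screen.
  await page.locator('#coupon-input').fill('SAVE20');
  await page.locator('#apply-btn').click();
  await expect(page.locator('#total')).toHaveText('$80.00');
});
```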
An O2 test says:
- Apply coupon code "SAVE20"
- Verify the discount is applied
- Verify the total shows the correct amount after discount
Both test the same thing. But when the coupon input's selector changes from the #coupon-input ID to a .promo-field class, the Playwright test breaks. The O2 test doesn't, because "Apply coupon code" is still "Apply coupon code" regardless of the DOM structure.
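To make the difference concrete, here is a deliberately simplified TypeScript toy. It is not how O2 works, since the post describes O2 as reasoning visually over the rendered page. The toy matches on role and visible label text instead. That is enough to show why a step phrased as intent survives the #coupon-input to .promo-field rename while a hard-coded selector does not. The PageElement shape, both lookup functions, and the page snapshots are hypothetical.

```typescript
// Hypothetical, simplified view of an element.
interface PageElement {
  selector: string; // implementation detail that changes between releases
  label: string; // what the user actually sees
  role: "textbox" | "button";
}

// Script-style lookup: depends on the exact selector.
function findBySelector(page: PageElement[], selector: string): PageElement | undefined {
  return page.find((el) => el.selector === selector);
}

// Intent-style lookup (toy): depends only on role and visible label.
function findByIntent(page: PageElement[], role: PageElement["role"], labelHint: string): PageElement | undefined {
  const hint = labelHint.toLowerCase();
  return page.find((el) => el.role === role && el.label.toLowerCase().includes(hint));
}

const before: PageElement[] = [
  { selector: "#coupon-input", label: "Coupon code", role: "textbox" },
  { selector: "#apply-btn", label: "Apply", role: "button" },
];
// Same page after a refactor: only the input's selector changed.
const after: PageElement[] = [
  { selector: ".promo-field", label: "Coupon code", role: "textbox" },
  { selector: "#apply-btn", label: "Apply", role: "button" },
];

console.log(findBySelector(before, "#coupon-input")?.label); // Coupon code
console.log(findBySelector(after, "#coupon-input")); // undefined: the script breaks
console.log(findByIntent(after, "textbox", "coupon")?.selector); // .promo-field: the intent still resolves
```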
3. Finds functional and usability bugs
O2 interacts with your application the way a real user does — clicking buttons, filling forms, scrolling, navigating, and resizing viewports. This means it catches bugs that scripted tests miss:
- A button that renders but doesn't respond to clicks (event handler not wired up)
- A form that submits without required field validation (missing error message)
- A layout that breaks on mobile viewports (text overflow, hidden buttons)
- A loading state that never resolves (infinite spinner after form submission)
- A success message that shows when the action actually failed (UI not reflecting server state)
These are the bugs your users hit on day one. O2 finds them before merge.
4. Works with TestRail
If your team manages test cases in TestRail, O2 integrates directly:
```bash
devassure run-tests --provider testrail --post-results --add-defects --attach-videos
```
O2 fetches test cases from your TestRail run, executes them in a real browser, posts results back, creates defects for failures with full context, and attaches session recordings. Your workflow in TestRail doesn't change — it just gets an execution engine.
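Posting a result to TestRail goes through TestRail's public REST API. The sketch below builds, but does not send, an add_result_for_case request. It uses TestRail's built-in status IDs (1 = Passed, 5 = Failed) and HTTP Basic auth with a username and API key. It illustrates the API shape only and is not DevAssure's implementation. The host, run and case IDs, and credentials are placeholders, and buildAddResultRequest is a hypothetical helper.

```typescript
// TestRail's built-in status IDs for results.
const TESTRAIL_STATUS = { passed: 1, failed: 5 } as const;

interface ResultRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds an add_result_for_case request for one test case in one run.
function buildAddResultRequest(
  baseUrl: string, // your TestRail instance (placeholder below)
  runId: number,
  caseId: number,
  passed: boolean,
  comment: string,
  user: string,
  apiKey: string,
): ResultRequest {
  // TestRail accepts HTTP Basic auth with the username and an API key.
  const auth = Buffer.from(`${user}:${apiKey}`).toString("base64");
  return {
    url: `${baseUrl}/index.php?/api/v2/add_result_for_case/${runId}/${caseId}`,
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Basic ${auth}` },
    body: JSON.stringify({
      status_id: passed ? TESTRAIL_STATUS.passed : TESTRAIL_STATUS.failed,
      comment,
    }),
  };
}

const req = buildAddResultRequest(
  "https://example.testrail.io", 12, 345, false,
  "Total did not update after applying coupon", "qa@example.com", "API_KEY",
);
console.log(req.url);
// https://example.testrail.io/index.php?/api/v2/add_result_for_case/12/345
```

In Node 18 or later, you could send the request with fetch(req.url, req).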
5. CI/CD native
For teams using GitHub Actions:
```yaml
# .github/workflows/ci.yml
name: CI
on: [pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: devassure-ai/devassure-action@v1
```
One line. Every PR tested automatically before merge. Results appear as a GitHub status check. O2 is also available on the GitHub Marketplace.
For other CI systems (Jenkins, CircleCI, GitLab CI, TeamCity):
```bash
devassure add-token $DEVASSURE_TOKEN
devassure run-tests --tag=regression --archive=./reports
```
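In CI systems without a dedicated integration, these two commands are often wrapped in a small script. The wrapper reads the token from the CI secret store and fails the job when the tests fail. Below is a minimal Node/TypeScript sketch. planCiCommands and runPlan are hypothetical helpers. The sketch assumes that devassure run-tests exits with a non-zero status when tests fail, which this post does not state.

```typescript
import { spawnSync } from "node:child_process";

// One command in the CI plan: the executable and its arguments.
interface Command {
  cmd: string;
  args: string[];
}

// Builds the two commands from the example above. The token is read from the
// environment so it never appears in the pipeline definition itself.
function planCiCommands(env: Record<string, string | undefined>): Command[] {
  const token = env["DEVASSURE_TOKEN"];
  if (!token) throw new Error("DEVASSURE_TOKEN is not set");
  return [
    { cmd: "devassure", args: ["add-token", token] },
    { cmd: "devassure", args: ["run-tests", "--tag=regression", "--archive=./reports"] },
  ];
}

// Runs the plan in order and returns the first non-zero exit code, or 0.
// Assumption: run-tests signals failing tests through a non-zero exit code.
function runPlan(plan: Command[]): number {
  for (const { cmd, args } of plan) {
    const result = spawnSync(cmd, args, { stdio: "inherit" });
    if (result.error) throw result.error; // e.g. the CLI is not installed
    if (result.status !== 0) return result.status ?? 1; // null means killed by a signal
  }
  return 0;
}

// Usage in a CI step:
// process.exit(runPlan(planCiCommands(process.env)));
```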
Who is O2 for?
For developers
You shouldn't have to write and maintain Playwright scripts alongside your production code. O2 lets you write test intent in plain English. When your code changes, the tests don't break — because there are no selectors to go stale.
Add O2 to your GitHub Actions workflow and every PR gets tested before merge. You review O2's results instead of manually clicking through features.
For QA engineers
O2 doesn't replace QA — it replaces the repetitive parts of QA. Instead of spending 40% of your time maintaining test scripts and debugging flaky selectors, you spend that time on:
- Quality strategy — deciding what matters most to test and why
- Exploratory testing — the creative, curiosity-driven testing that finds bugs no automated tool would think to look for
- Domain expertise — understanding that a rounding error in a fintech app isn't just a bug, it's a compliance violation
Your role shifts from script maintenance to quality leadership.
For engineering leaders
O2 gives you a testing layer that scales with your team's output. As your developers adopt AI coding tools and ship more PRs, O2 keeps up — testing each PR without requiring additional QA headcount or a growing test maintenance backlog.
Key metrics from early O2 deployments:
- Test creation time: Days → automated per PR
- Maintenance overhead: Down 80%+
- Production incidents from missed regressions: Reduced significantly within the first month
- QA time reallocation: From 40% maintenance to 95% strategic quality work
Supported platforms and integrations
| Platform | How O2 works with it |
|---|---|
| Any web application | Browser-based testing via CLI or VS Code extension |
| GitHub Actions | One-line YAML integration, PR-native testing |
| TestRail | Fetch cases, execute, post results, log defects, attach videos |
| Salesforce | Flows and LWC tested through the UI — handles Shadow DOM natively |
| Flutter Web | Visual reasoning on canvas-rendered UI — no DOM needed |
| Jenkins / CircleCI / GitLab CI | CLI integration via devassure run-tests |
| VS Code | Write and run tests from your editor |
Security and compliance
DevAssure is SOC 2 Type II certified. We take data security seriously:
- Test credentials are stored encrypted and never leave your environment
- Session recordings are stored in your configured archive location
- O2 runs in an isolated browser context per test session
- No application data is sent to DevAssure's servers beyond what's required for AI reasoning
Getting started
Option 1: CLI (developers)
```bash
npm install -g @devassure/cli
devassure login
devassure init
# Write your first test in .devassure/tests/
devassure run-tests
```
Option 2: GitHub Action (teams)
```yaml
- uses: devassure-ai/devassure-action@v1
```
Option 3: VS Code Extension (visual workflow)
Install the DevAssure extension from the VS Code marketplace. Write tests in YAML, run them from the editor, view results inline.
Try O2 free
We're offering $50 in free credits for 30 days — no credit card required. That's enough to test your critical flows across multiple runs.
DevAssure is built in Chennai, India, by a team that spent a decade fighting the same testing problems we now solve. We're backed by Eximius Ventures and trusted by engineering teams across fintech, SaaS, and enterprise.
Questions? Email me at divya@devassure.io — I read every message.
Frequently asked questions
What is DevAssure O2?
DevAssure O2 is an autonomous testing agent that tests your web application through the browser, the same way a human tester would, but driven by AI instead of scripts. You write test cases in plain English; O2 navigates your application, interacts with UI elements, and validates outcomes.
