2 posts tagged with "PR Testing"

Microsoft Just Built a Framework to Test AI Agents.

Divya Manohar
Co-Founder and CEO, DevAssure

Short answer

At Microsoft Build 2026, Microsoft shipped ASSERT (policy-driven agent evaluation) and ACS (runtime agent governance) — because the agent that writes the code cannot be the agent that grades the code. That is the same principle behind DevAssure O2: independent, browser-based testing on every PR, written in plain English, with no scripts to maintain.

At Microsoft Build 2026, Microsoft announced something that quietly confirms the core thesis behind DevAssure: as AI agents take over more of the software development lifecycle, the agent that writes the code cannot be the agent that grades the code.

The announcement was a pair of open-source projects — ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) and the Agent Control Specification (ACS) — designed to give developers a portable, framework-agnostic way to evaluate and govern AI agents before their behavior ships to production. Coming from the company now positioning itself as the "agent-first" platform for enterprise development, this is a meaningful signal about where the industry is heading.

I want to walk through what Microsoft actually shipped, why it matters beyond agent safety, and what it means for teams where 30–40% of code is already AI-generated — because the validation gap Microsoft just named at the agent layer is the same gap most engineering teams still have at the application layer.

Introducing DevAssure O2 | The Autonomous Testing Agent for Every Pull Request

Divya Manohar
Co-Founder and CEO, DevAssure

Software testing has changed. Your tools should too.

Two years ago, the best we could offer developers was a faster way to write test scripts. Record your actions, generate a script, maintain it when the UI changes, debug it when it breaks.

That model is reaching its limits.

In 2026, AI coding tools write 20% or more of new code at companies like Google and Microsoft. Developers ship multiple PRs per day. Release cycles have compressed from weeks to days. And the old approach — write scripts, maintain scripts, fix flaky scripts — can't keep pace with how fast code moves.

DevAssure started as a low-code test automation platform. We built a recorder, a visual test builder, self-healing locators, and test data management. Hundreds of teams used it to automate faster than they could with Selenium or Playwright alone.

But we kept hearing the same thing from our users:

"The automation is faster, but we're still spending 30–40% of our time maintaining tests."

That feedback led us to rethink the problem entirely. The result is DevAssure O2, an autonomous testing agent that reads your code changes, generates targeted tests, and executes them in a real browser. No scripts. No selectors. No maintenance.

This post introduces what DevAssure O2 is, how it works, and why we built it.