Microsoft Just Built a Framework to Test AI Agents.
Short answer
At Microsoft Build 2026, Microsoft shipped ASSERT (policy-driven agent evaluation) and ACS (runtime agent governance). Both rest on one principle: the agent that writes the code cannot be the agent that grades it. That is the same principle behind DevAssure O2: independent, browser-based testing on every PR, written in plain English, with no scripts to maintain.
At Microsoft Build 2026, Microsoft made an announcement that quietly confirms the core thesis behind DevAssure: as AI agents take over more of the software development lifecycle, the agent that writes the code cannot be the agent that grades it.
The announcement covered a pair of open-source projects, ASSERT (Adaptive Spec-driven Scoring for Evaluation and Regression Testing) and the Agent Control Specification (ACS), designed to give developers a portable, framework-agnostic way to evaluate and govern AI agents before their behavior ships to production. Coming from the company now positioning itself as the "agent-first" platform for enterprise development, this is a meaningful signal about where the industry is heading.
I want to walk through what Microsoft actually shipped, why it matters beyond agent safety, and what it means for teams where 30–40% of code is already AI-generated — because the validation gap Microsoft just named at the agent layer is the same gap most engineering teams still have at the application layer.
