From Release QA to Autonomous QA: Why We Built DevAssure O2 After Years of Fighting PR Bottlenecks

Badri Varadarajan
Co-Founder and COO, DevAssure

Short answer

Engineering velocity is rarely limited by how fast people write code. It is limited by testing friction on pull requests — flaky suites, run-everything CI, and release gates that force a choice between speed and quality. DevAssure O2 was built to validate every PR from intent and impact inside the developer workflow, without maintaining brittle test scripts.

For years, I led teams focused on engineering productivity and release management across startups and enterprises. My primary job was not just shipping features — it was ensuring release trains moved predictably, developers remained productive, and quality gates did not become velocity killers.

One thing became very clear over time:

Engineering velocity was rarely limited by coding speed. It was limited by testing friction.

At leadership reviews, the metrics CTOs consistently wanted visibility into were not just release dates or sprint velocity. The focus was increasingly shifting to developer productivity metrics:

  • PR velocity per developer
  • Bottlenecks causing PR slowdowns
  • Percentage of pull requests passing validation on the first attempt
  • Green build rates at first pass
  • Release cycle delays caused by quality issues
  • Patch requests post feature freeze

These metrics told the real story behind engineering efficiency. They also pointed to a gap between how we talked about quality (automation, coverage, gates) and how quality behaved in the pipeline (queues, reruns, exceptions).

That gap is why we built DevAssure O2 — to move from release QA as a late, heavy checkpoint to autonomous QA at every pull request.

The productivity gap hidden inside PR metrics

One observation repeatedly stood out.

Developers who had strong testing discipline and understood quality early were able to consistently push 2–3 production-ready PRs every week.

Others struggled.

Not because they wrote worse code.

But because they spent enormous time fighting unstable validation pipelines.

A simple PR update often triggered:

  • Flaky UI tests
  • Full regression reruns
  • Broken locators
  • Environment instability
  • CI failures unrelated to the code change itself

Instead of coding, developers became part-time test maintainers.

The result?

  • PR queues slowed down
  • Quarterly goals slipped
  • Patch requests increased to hit deadlines
  • Eventually, release trains suffered

If you are measuring PR health today, compare first-pass green rate and time-to-merge across developers working on changes of similar scope. The spread often reflects testing-pipeline debt, not skill.

Symptom in PR metrics | Likely root cause
Low first-pass validation success | Flaky or over-broad CI
Long time in “waiting for CI” | Full-suite reruns on small diffs
High patch rate after freeze | Gates bypassed under deadline pressure
Uneven PR velocity across team | Uneven exposure to test maintenance load
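
The comparison suggested above can be sketched in a few lines of Python. This is only an illustration: the `PullRequest` record and its field names are hypothetical, and in practice the data would come from whatever your Git host or CI system exports. It groups PRs by author and reports the first-pass green rate and the median time-to-merge.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record shape; adapt to your own PR/CI export.
    author: str
    ci_runs_until_green: int  # 1 means the first CI run passed
    hours_to_merge: float


def pr_health(prs):
    """Return per-author first-pass green rate and median hours to merge."""
    by_author = defaultdict(list)
    for pr in prs:
        by_author[pr.author].append(pr)

    report = {}
    for author, items in by_author.items():
        first_pass = sum(1 for pr in items if pr.ci_runs_until_green == 1)
        report[author] = {
            "first_pass_green_rate": first_pass / len(items),
            "median_hours_to_merge": median(pr.hours_to_merge for pr in items),
        }
    return report


# Made-up sample data, for illustration only.
sample = [
    PullRequest("dev_a", 1, 4.0),
    PullRequest("dev_a", 3, 20.0),
    PullRequest("dev_b", 1, 2.0),
]
for author, stats in pr_health(sample).items():
    print(author, stats)
```

A wide gap between authors with comparable scope is a prompt to look at the pipeline they are running through, not a ranking of the people.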

Related: How to automatically test every pull request — without adding another script library to own.

The cost of “run everything” testing

One of the biggest frustrations I experienced as a release leader was the lack of impact awareness.

A developer modifies a small frontend component.

The system responds by rerunning 10,000 tests.

Why?

Because nobody knows the impact radius.

The engineering system defaults to safety:

Run everything. Hope nothing breaks.

This created multiple problems:

1. Expensive CI costs

Thousands of tests rerunning for every change meant huge compute consumption and long execution windows. Finance sees “CI bill up 40%”; engineering sees “same defects in prod.”

2. Slower developer feedback

Developers waited hours for results on changes affecting only a tiny portion of the application. Feedback arrived after they had already switched context, and after three other PRs had stacked up behind the same queue.

3. Reduced feature velocity

Long PR validation cycles meant slower merges and delayed releases. Sprint points closed; release train dates slipped anyway.

4. More deadline pressure

As timelines tightened, teams inevitably requested exceptions:

  • Bypass PR rules
  • Skip validations
  • Reduce quality gates
  • Patch after release

As release managers, we all know where this ends:

Production defects, late nights, customer escalations, and a hidden engineering tax.

Impact-based validation — running what the change can break, not the entire catalog — is the architectural opposite of run-everything. That is the model O2 uses on each PR.
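
To make the contrast with run-everything concrete, here is a minimal sketch of impact-based selection. It is not O2's implementation, which this post does not describe at that level. It assumes you already have two hypothetical maps: `imported_by` (reverse dependency edges between modules) and `tests_for` (which tests cover which module). The function walks the reverse edges from the changed modules and selects only the tests attached to what it reaches.

```python
from collections import deque


def impacted_modules(changed, imported_by):
    """Breadth-first walk over reverse dependency edges from the changed modules."""
    seen = set(changed)
    queue = deque(changed)
    while queue:
        module = queue.popleft()
        for dependant in imported_by.get(module, ()):
            if dependant not in seen:
                seen.add(dependant)
                queue.append(dependant)
    return seen


def select_tests(changed, imported_by, tests_for):
    """Return the sorted names of tests covering any impacted module."""
    selected = set()
    for module in impacted_modules(changed, imported_by):
        selected.update(tests_for.get(module, ()))
    return sorted(selected)


# Hypothetical module graph and test map, for illustration only.
IMPORTED_BY = {
    "ui/button": ["ui/checkout_form"],
    "ui/checkout_form": ["pages/checkout"],
    "api/billing": ["pages/checkout"],
}
TESTS_FOR = {
    "ui/button": ["test_button_renders"],
    "pages/checkout": ["test_checkout_flow"],
    "pages/search": ["test_search_flow"],
    "api/billing": ["test_invoice_totals"],
}

print(select_tests(["ui/button"], IMPORTED_BY, TESTS_FOR))
# -> ['test_button_renders', 'test_checkout_flow']
```

The search and billing tests are never scheduled for a button change, which is the whole point: the blast radius, not the size of the catalog, decides what runs.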

Release management became a trade-off between speed and quality

Keeping release trains moving often meant making uncomfortable decisions.

  • Do we hold the release because tests are flaky?
  • Do we bypass failing checks?
  • Do we accept risk and patch later?
  • Do we rerun everything again?

These were not isolated incidents.

This was operational reality.

Ironically, the automation frameworks designed to improve quality sometimes became the bottleneck themselves.

The maintenance overhead kept growing:

  • Fragile scripts
  • Brittle locators
  • Framework upgrades
  • Flaky executions
  • Endless test maintenance cycles

Engineering teams spent more effort maintaining tests than validating outcomes.

Shift-left was supposed to fix this. In practice, many organizations shifted when tests ran but not how much friction they added per PR. The suite still grew. The blast radius of each run did not shrink.

That frustration eventually became the foundation for O2.

Why DevAssure O2 was born

DevAssure O2 was born from years of dealing with release bottlenecks, PR delays, and the constant tension between velocity and quality.

The goal was simple:

  • What if developers could validate every PR directly inside their workflow without carrying the burden of test frameworks?
  • What if validation happened based on intent and impact, not scripts?
  • What if teams could ship confidently without maintaining thousands of brittle tests?

That became the vision behind O2.

What O2 does on every pull request

O2 enables teams to validate and ship every PR within the development workflow — without:

  • Test frameworks you must standardize across repos
  • Test scripts that break when design moves a button
  • Brittle locators to update every sprint
  • Massive regression reruns “just in case”

Instead, it focuses on:

  1. Understanding code changes — the diff and what it touches
  2. Identifying impact areas — flows and components in the blast radius
  3. Executing only what matters — behavioural checks scoped to that change
  4. Catching functional issues early — in a real browser, before merge
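
The four steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration, not a description of how O2 works internally: the path-prefix rules in `FLOW_RULES`, the flow names, and the plain Python callables standing in for behavioural checks that would really run in a browser.

```python
def changed_paths(diff_text):
    """Step 1: collect file paths from the '+++ b/<path>' headers of a git diff.

    Simplification: an added content line that itself begins with '++ b/'
    would be misread as a header; a real parser would track hunk boundaries.
    """
    prefix = "+++ b/"
    return [
        line[len(prefix):]
        for line in diff_text.splitlines()
        if line.startswith(prefix)
    ]


def impacted_flows(paths, flow_rules):
    """Step 2: map changed paths to user flows by path prefix."""
    flows = set()
    for path in paths:
        for prefix, names in flow_rules.items():
            if path.startswith(prefix):
                flows.update(names)
    return flows


def validate(diff_text, flow_rules, checks):
    """Steps 3 and 4: run only the checks for impacted flows and collect results."""
    flows = impacted_flows(changed_paths(diff_text), flow_rules)
    results = {}
    for flow in sorted(flows):
        check = checks.get(flow)
        if check is not None:
            results[flow] = bool(check())
    return results


# Hypothetical inputs, for illustration only.
DIFF = """\
diff --git a/src/cart/total.py b/src/cart/total.py
--- a/src/cart/total.py
+++ b/src/cart/total.py
@@ -1 +1 @@
-old_line
+new_line
"""
FLOW_RULES = {"src/cart/": ["checkout"], "src/search/": ["search"]}
CHECKS = {"checkout": lambda: True, "search": lambda: False}

print(validate(DIFF, FLOW_RULES, CHECKS))
# -> {'checkout': True}
```

The search check is defined but never executed, because nothing in the diff touches a path mapped to that flow.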

The objective is not “more test automation.”

The objective is development acceleration with quality built into every PR.

# Minimal GitHub Actions gate: autonomous agent on each PR
name: O2 PR validation

on:
  pull_request:

jobs:
  o2:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: devassure-ai/devassure-action@v1
        with:
          token: ${{ secrets.DEVASSURE_TOKEN }}
          # Point at your PR preview / staging URL
          app_url: ${{ vars.STAGING_URL }}

O2 posts pass/fail and behavioural findings on the pull request — the same place developers already negotiate review. Release managers get signal without owning a separate QA queue for every train.

For a full setup walkthrough, see DevAssure O2 on GitHub Marketplace and GitHub Actions for faster release velocity.

Release QA vs autonomous QA (at a glance)

Dimension | Traditional release QA | Autonomous QA (O2 on PR)
When validation runs | Late cycle, batch regression | Every pull request
Scope | Often entire suite | Impact-mapped to the diff
Who maintains coverage | QA + SDET script estate | Agent generates per change
Failure mode under deadline | Skip gates, patch later | Narrower runs finish faster with meaningful signal
Success metric | “Suite green before ship” | First-pass PR validation, merge confidence

This is the same shift described in The quiet death of the test script — from owning files to owning outcomes.

The future of engineering productivity is quality-native development

Engineering leaders increasingly track developer velocity.

But velocity cannot improve if quality systems slow developers down.

The next evolution is moving from:

From | To
Script maintenance | Outcome validation
Run all tests | Run impacted validations
Post-development QA | In-development quality

The future release manager should not have to choose between speed and quality.

That was the problem we lived with. And that is why O2 was born.

Because engineering productivity is not just about writing code faster. It is also about helping developers ship confidently, continuously, and without friction.

When AI-assisted development increases PR size and frequency — vibe-coded changes included — the release train depends even more on PR-level behavioural gates, not heavier end-of-cycle QA. Pair O2 on PRs with your existing CI; many teams keep nightly legacy suites while the agent covers what changed this week.

Frequently asked questions

What is the difference between release QA and autonomous QA?

Release QA typically validates software late in the cycle, often with full regression suites, manual sign-off, and exceptions when deadlines slip. Autonomous QA validates each pull request inside the development workflow: an agent like DevAssure O2 reads the diff, maps impact, runs only relevant behavioural checks, and reports on the PR, without requiring the team to maintain thousands of scripts.