10 posts tagged with "CI/CD"

How to Set Up Vibe Testing on Every Pull Request: A Step-by-Step Guide

Divya Manohar
Co-Founder and CEO, DevAssure

TL;DR

You can add agent-driven E2E testing to your repository in under two minutes by dropping one GitHub Actions workflow file into .github/workflows/. Once it is in, every PR triggers an AI agent that reads the diff, generates targeted end-to-end tests, runs them on real browsers, and posts results back as a GitHub check. No Playwright scripts, no Cypress maintenance, no QA bottleneck. This guide walks through the exact setup, what each stage does, and how to verify it is working.

Every team that adopts vibe coding eventually hits the same wall: code ships faster, but validation does not keep up. This post is the implementation manual for closing that gap on every pull request.
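
For orientation, a pull-request-triggered workflow has roughly the shape below. This is a minimal sketch, not DevAssure's actual workflow: the file name, job name, and the final placeholder step are assumptions made for illustration, and the real agent step would come from the vendor's documentation. Only the trigger, the checkout action, and the git command are standard.

```yaml
# .github/workflows/e2e-on-pr.yml  (file name is an assumption)
name: E2E tests on pull request

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  checks: write

jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      # Fetch full history so the PR diff against the base branch is available.
      - name: Check out the pull request
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      # List the files the PR touches; a diff-aware agent would start from this.
      - name: Show changed files
        run: git diff --name-only "origin/${{ github.base_ref }}...HEAD"

      # Hypothetical placeholder: swap in the testing agent's action or CLI here.
      - name: Run the testing agent (placeholder)
        run: echo "Replace this step with your testing agent"
```

Because the job runs on the pull_request event, its pass or fail status appears on the PR as a check without any extra wiring.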

How to Test Cursor-Generated Code: A Developer's Guide to Catching AI-Written Bugs Before They Ship

Divya Manohar
Co-Founder and CEO, DevAssure

TL;DR

Cursor lets you ship features roughly 5× faster, but AI-generated code contains 1.7× more major issues than hand-written code, and 63% of developers using AI tools now spend more time debugging. The fix is automated end-to-end (E2E) testing that runs inside your IDE and on every pull request. DevAssure's Cursor extension plus GitHub Action gives you both — with zero test scripts to maintain.

Cursor changed the default for how features get built. What has not changed is that untested code still breaks production — it just gets there faster.

The Quiet Death of the Test Script

Divya Manohar
Co-Founder and CEO, DevAssure

For twenty years, automated testing meant writing more code. That era is ending — and most teams haven't noticed yet.

The first automated test I ever wrote was in Selenium. It was 2012. It launched a browser, filled in a login form, and checked that the dashboard loaded. It passed. I felt like a wizard.

More than a decade later, the fundamental contract hasn't changed. To test software, you write more software. You describe, in code, what your code is supposed to do. Then you maintain that second codebase forever.

We've built entire careers, conferences, certifications, and consultancies on this premise. Selenium. Cypress. Playwright. Test pyramids. BDD. Page object models. The whole apparatus rests on a single assumption: humans must specify, in writing, what to test.

That assumption is quietly dying.

The Hidden Bill - What It Actually Costs to Use Your Coding Agent as Your Testing Agent

Divya Manohar
Co-Founder and CEO, DevAssure

A CTO told me last month, very pleased with himself:

"We're already paying $200/month per dev for Claude. Testing is basically free now — we just ask Claude to also write the tests."

I asked him to pull up his Anthropic bill. The number was 14x what he'd budgeted at the start of the quarter. And his team still hadn't shipped the regression suite.

This is the most expensive trap in the AI tooling stack right now, and it's expensive precisely because it looks free. If you've already bought a coding agent, asking it to do double duty as a testing agent feels like the obvious move. One subscription, one workflow, one bill.

Except there is no "one bill." There are six.

Why Your Coding Agent Can't Be Your Testing Agent

Divya Manohar
Co-Founder and CEO, DevAssure

Last week, on a customer call, a CTO asked me the question I now get every single week:

"I'm already using Claude to write my code. Why can't I just point the same agent at the code and have it test itself?"

It's a fair question. If one AI can write a React component, surely it can write the test for that component too. The economics look seductive — one tool, one workflow, one bill.

But here's the key insight:

Testing your own PR is like proofreading your own essay. You'll read it 10 times. You'll miss the same typo 10 times. Because your brain autocorrects what it wrote.

That insight is what this post is about. It explains why a coding agent, no matter how capable, is structurally the wrong tool to verify its own work.