How to Automatically Test Every Pull Request in 2026 (Without Writing a Single Test)

Divya Manohar
Co-Founder and CEO, DevAssure

Short answer

To automatically test every pull request on GitHub in 2026, add a pull_request workflow in GitHub Actions. The usual path is Playwright or Cypress plus tests you write and maintain. The alternative is an autonomous testing agent that reads the PR diff, generates E2E tests for that change, runs them, and posts a check — with zero test files in your repo.

Every team wants E2E tests on every PR. Almost none actually has them — because someone has to write those tests, fix them when the UI changes, and defend a thirty-minute CI job that still flakes.

If you are the developer opening the PR, that someone is often you, after hours, clicking re-run on a red check you do not trust.

This guide is for engineers who want automatic tests before merge without turning into the team's unpaid test maintainer. We will walk through the standard GitHub Actions setup for running tests on pull requests (Playwright and Cypress), name the real cost honestly, then cover the path where you do not write a single test and validation still runs on every PR.

Why PR testing breaks down in practice

The policy is simple: nothing merges without green checks. The reality is messier.

Flaky selectors and false reds

Playwright and Cypress are solid tools. Your tests are not flaky because the framework is bad. They are flaky because they encode today's DOM — and tomorrow's PR changes the DOM.

A renamed data-testid, a loading spinner that sometimes takes 200ms longer, a modal that moved behind a feature flag — the suite goes red. You did not break checkout. The test did.

Developers learn to ignore the check or hit Re-run jobs until it passes. That is worse than no gate.

Maintenance is a second job

E2E tests on every PR only work if the suite stays green. When product ships three UI tweaks a week, QA (or you) spend more time updating locators than writing features.

The backlog looks like this:

  • 14 tests skipped with fixme
  • 6 tests quarantined in a non-blocking job
  • 1 required check everyone hates but leadership will not remove

Slow CI blocks merges

Full regression on every PR does not scale. Teams slice the problem:

  • Run smoke on PR, full suite nightly
  • Run E2E only on main
  • Run tests only when someone touches /e2e

Each compromise means the PR you are merging was not fully validated.
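As a minimal sketch of the third compromise, a path filter on the trigger limits the workflow to PRs that touch the test directory. The e2e/ directory name is an assumption taken from the bullet above; adjust it to your repo.

```yaml
# Sketch: run the E2E workflow only when files under e2e/ change.
# The e2e/ path is an assumption; substitute your own test directory.
on:
  pull_request:
    branches: [main]
    paths:
      - 'e2e/**'
```

A PR that only changes application code never triggers this workflow, which is exactly the validation gap described above. One more caveat: if a path-filtered workflow is also a required check, GitHub leaves the check pending on PRs where the workflow is skipped, and the merge stays blocked.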

That is the gap. Not missing GitHub Actions — missing sustainable coverage at PR time.

The traditional approach: GitHub Actions + Playwright or Cypress

Tool-agnostic setup guides (including strong posts from teams like Shiplight and OneUptime) show how to wire browsers into Actions. That is worth doing if you are committed to owning a script suite.

Below is a minimal, production-shaped pattern you can paste today.

Prerequisites

  • A GitHub repo with pull requests enabled
  • A deploy preview or staging URL the runner can reach (or a service container running your app in CI)
  • Node.js project with Playwright or Cypress already scaffolded

Playwright: test every PR on GitHub Actions

Create .github/workflows/e2e-playwright.yml:

name: E2E on pull request

on:
  pull_request:
    branches: [main]

concurrency:
  group: e2e-${{ github.head_ref }}
  cancel-in-progress: true

jobs:
  playwright:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Install Playwright browsers
        run: npx playwright install --with-deps

      - name: Run Playwright tests
        run: npx playwright test
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
          CI: true

      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/
          retention-days: 7

What this gives you: real browsers, artifacts on failure, PR-triggered runs. What it does not give you: tests. You still author *.spec.ts files, maintain selectors, and decide what belongs in the PR slice versus nightly.
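One common way to carve out a PR slice is to filter with Playwright's --grep option in the run step. The fragment below is a sketch that assumes your fast tests carry an @smoke tag in their titles; the tag name is hypothetical.

```yaml
      # Replaces the "Run Playwright tests" step in the workflow above.
      # Assumes smoke tests are tagged @smoke in their titles (hypothetical tag).
      - name: Run Playwright smoke tests
        run: npx playwright test --grep @smoke
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
```

Untagged tests then need another home, such as a nightly run, or they quietly stop running.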

Cypress: run tests on pull request (GitHub Actions)

name: Cypress E2E on PR

on:
  pull_request:
    branches: [main]

jobs:
  cypress:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: cypress-io/github-action@v6
        with:
          build: npm run build
          start: npm run start
          wait-on: 'http://localhost:3000'
          browser: chrome
        env:
          CYPRESS_baseUrl: http://localhost:3000

      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: cypress-screenshots
          path: cypress/screenshots

Cypress can boot your app inside the job (start + wait-on) — useful for Next.js / React / Vue SPAs when you do not have an external preview URL. You still own cypress/e2e/** forever.
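The same boot-in-the-job idea can be approximated for Playwright with plain shell steps. This is a sketch, not a canonical recipe: it assumes the same npm run build and npm run start scripts and port 3000 as the Cypress example, and it fetches the wait-on package through npx.

```yaml
      # Sketch: start the app inside the job before the Playwright step.
      # Assumes `build` and `start` npm scripts and an app listening on port 3000.
      - name: Build and start the app
        run: |
          npm run build
          npm run start &
          npx --yes wait-on http://localhost:3000

      - name: Run Playwright tests
        run: npx playwright test
        env:
          BASE_URL: http://localhost:3000
```

Playwright's own webServer option in playwright.config.ts is an alternative if you would rather keep the startup logic out of the workflow file.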

Making the check required

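In Settings → Branches → Branch protection for main, enable Require status checks to pass before merging and select playwright or cypress (the job name from your workflow). A check only shows up in the list after it has run at least once recently. That is how automatic testing before merge becomes policy.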
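Because the required check is matched by name, it can help to pin a display name on the job, so that renaming the job key later does not silently change what branch protection is looking for. A sketch, with e2e as a hypothetical name:

```yaml
jobs:
  playwright:
    # Branch protection matches this display name; "e2e" is a hypothetical choice.
    name: e2e
    runs-on: ubuntu-latest
```

With a name set, the entry to select in branch protection is e2e rather than the job key.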

For architecture background, see GitHub Actions for faster releases.

The real cost: who writes these tests?

YAML is the easy part. The expensive part is organizational.

Question | What usually happens
Who writes the first 50 E2E tests? | Whoever had time, often a senior dev or a QA hire
Who updates them when Settings moves to a drawer? | Same person, between feature work
Who triages flaky reds at 4 p.m. Friday? | Whoever merged last
Who adds coverage for the PR shipping Monday? | Frequently nobody; merge with fingers crossed

“Automatically test pull requests” sounds like a CI problem. It is a headcount and attention problem.

Teams hit one of three walls:

  1. No suite — PRs merge with unit tests only; production catches regressions.
  2. Stale suite — tests exist but nobody trusts them; required checks get disabled.
  3. Growing suite — coverage improves until maintenance eats the team; velocity drops.

Posts that stop at “add this workflow” skip the wall every staff engineer has already hit.

If your interview answer for “how do you test PRs?” is still “we should add Playwright,” you are describing wall #1. If your answer is “we have Playwright but it is always red,” you are on wall #2.

The useful question for 2026: what if the tests were generated from the PR itself — and nothing lived in the repo to rot?

The autonomous alternative: tests from the change, not from the repo

Autonomous testing means an agent reads the diff, decides what user-visible behavior could break, generates tests for that scope, runs them in a real browser, and posts results on the PR. No tests/e2e/checkout.spec.ts to update when marketing changes a headline.

That is different from:

  • Running existing tests in Actions (Playwright/Cypress guides)
  • Recording tests in a low-code tool (still maintenance)
  • Asking Copilot to write a spec (same blind spots as the code it helped write)

The agent is independent of the author. It does not share your session context or assumptions. It validates the diff the way a careful reviewer would — by exercising behavior, not by re-reading your TypeScript.

This model is what we call shift smart: keep validation at the PR, and remove the maintenance tax that shift-left accidentally dumped on developers.

For a deeper product overview, see the O2 testing agent. Below is the implementation path on GitHub.

Step-by-step: automatically test PRs with DevAssure O2

DevAssure O2 is an autonomous testing agent built for test-every-PR workflows on GitHub. You add one Actions file and a secret; you do not add a test framework to the repo.

1. Create a DevAssure token

Sign up at app.devassure.io, generate an API token, and add it in GitHub:

Settings → Secrets and variables → Actions → New repository secret

  • Name: DEVASSURE_TOKEN
  • Value: your token

2. Add the workflow (real YAML)

Create .github/workflows/devassure-o2.yml:

name: DevAssure O2

on:
  pull_request:
    branches: [main]

concurrency:
  group: devassure-o2-${{ github.head_ref }}
  cancel-in-progress: true

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 45
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: devassure-ai/devassure-action@v1
        env:
          DEVASSURE_TOKEN: ${{ secrets.DEVASSURE_TOKEN }}

Three lines matter for testing pull requests correctly:

  • fetch-depth: 0 — full git history so the agent can diff against main
  • pull_request — fires when you push to an open PR, not only on open
  • DEVASSURE_TOKEN — never hard-code; use the secret
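For reference, a pull_request trigger without a types key behaves as if the three default activity types were listed. Spelling them out is optional, but it makes the re-run-on-push behavior visible in the file:

```yaml
on:
  pull_request:
    branches: [main]
    # Defaults when `types` is omitted:
    # opened (new PR), synchronize (new commits pushed), reopened
    types: [opened, synchronize, reopened]
```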

Point the agent at a reachable app via repository configuration or environment variables your team already uses for preview deploys (Vercel, Netlify, custom staging). The agent exercises a running URL — same requirement as Playwright with BASE_URL.

Marketplace listing: devassure-ai/devassure-action.

3. What runs when you push

On each PR update, the agent typically:

  1. Reads the diff — files, functions, downstream impact
  2. Maps affected flows — checkout touched, settings probably safe
  3. Generates tests — plain-English scenarios scoped to the change (optional YAML in .devassure/tests/ if you want explicit control)
  4. Executes in headless Chrome — semantic element resolution, not a brittle selector file in your repo
  5. Posts a GitHub check — pass/fail, failure detail, session replay links

Coverage is change-scoped, which is why PR feedback often lands faster than “run all 400 tests.”

For a longer walkthrough with screenshots, see How to set up vibe testing on every pull request.

4. Make the check required

Branch protection → require status check DevAssure O2. Now automatic testing before merge is enforced the same way you would enforce Playwright, except nobody is on the hook to fix getByRole('button', { name: /Submit/i }) when the label changes to Continue.

5. Optional: test before push

Install the Invisible (QA) Agent in VS Code or Cursor, or run npm i -g @devassure/cli. Same agent, local feedback, same logic as CI.

What happens on a failing PR

When the agent finds a real regression, the developer sees a failed required check on the PR — same mental model as a broken unit test or Playwright run.

Typical experience in GitHub:

  • Checks tab: DevAssure O2 — failed, with a link to logs
  • Summary: how many scenarios ran, which flow failed, plain-language step that did not pass
  • Artifacts / session: screenshot or replay of the browser state at failure (so you debug behavior, not a stack trace from line 42 of a spec you did not write)

Example failure you might see annotated on the PR:

DevAssure O2 — 26 passed, 1 failed

✗ checkout.apply_promo_on_retry
Expected: discount visible after payment retry
Observed: total unchanged after retry path

You fix the product bug, push again, the agent re-runs on the new diff. You do not “update the test” because there is no permanent test file for that promo edge case — the agent regenerates from the new code.

That loop is what makes e2e tests on every PR viable for small teams: the cost of adding coverage for a new edge case is not “create a Jira for QA.”

Contrast that with the typical Playwright failure mode: tests/checkout.spec.ts, line 88, selector timeout, and you are not sure whether the app or the test is wrong.

Playwright/Cypress vs autonomous PR testing

Aspect | Playwright/Cypress on PR | Autonomous agent (O2)
Tests in repo | Yes, you own every file | No, generated per PR
Maintenance on UI change | High: locators break | Low: semantic execution
Who authors coverage | Your team | Agent from diff
Setup complexity | Framework + browsers + specs | Workflow + secret
Best when | You need pixel-perfect custom specs | You need reliable PR gates fast

Many teams run both during migration: agent on every PR, legacy suite nightly.
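A sketch of the nightly half of that split, assuming the legacy suite is the Playwright setup from earlier. The cron time and the reuse of the STAGING_URL secret are assumptions.

```yaml
name: Legacy E2E nightly

on:
  schedule:
    - cron: '0 2 * * *'   # 02:00 UTC every day; the time is an arbitrary choice
  workflow_dispatch:       # allows manual runs from the Actions tab

jobs:
  playwright:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npx playwright test
        env:
          BASE_URL: ${{ secrets.STAGING_URL }}
```

Scheduled workflows run against the default branch, so this job validates what has already merged rather than any open PR.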

Frequently asked questions

Does this work in a monorepo?

Yes. The agent reads the PR diff, so only packages and paths touched in that change are in scope. In a Turborepo or Nx monorepo, a change limited to apps/web still maps to web flows, so you are not forced to run the entire company-wide suite on every PR.

Ready to gate merges without owning another test repo?

See how DevAssure does this in 5 minutes →