Why developers should stop writing Playwright tests

Co-Founder and CTO, DevAssure

Playwright is one of the best browser automation frameworks ever built. It is fast, reliable, developer-friendly, and powerful. If your team knows exactly what to test, how the UI is structured, which locators are stable, and who will maintain the tests forever, Playwright is an excellent choice.

But that is also the problem.

Most product teams do not struggle because Playwright is weak. They struggle because writing and maintaining browser tests is still manual engineering work:

Every test needs scripts.
Every script needs locators.
Every locator can break.
Every UI change needs test maintenance.
Every failure needs someone to investigate whether the app is broken or the test is broken.

At some point, the test suite becomes another codebase to maintain.

Agent-based testing changes this model. Instead of writing scripts that tell the browser exactly what to click, type, and assert, you give the agent an objective. The agent understands the application, interacts with the UI, adapts to changes, validates behavior, and reports issues.

That is why developers should start moving away from writing Playwright tests manually.

The real problem is not Playwright — it is test maintenance

A typical Playwright test looks clean when it is first written:

import { test, expect } from '@playwright/test';

test('user can create a project', async ({ page }) => {
  await page.goto('https://app.example.com');
  await page.getByLabel('Email').fill('user@example.com');
  await page.getByLabel('Password').fill('password123');
  await page.getByRole('button', { name: 'Login' }).click();
  await page.getByRole('button', { name: 'New Project' }).click();
  await page.getByLabel('Project Name').fill('Marketing Website');
  await page.getByRole('button', { name: 'Create' }).click();
  await expect(page.getByText('Marketing Website')).toBeVisible();
});

Then the UI changes. The button text changes from New Project to Create Project. The form moves into a modal. The label changes from Project Name to Name. The success message becomes a toast. The app adds an onboarding step before the dashboard.

Now the test fails. But did the product break? Maybe not — the test may simply be outdated.

This is where teams start losing trust in browser tests. For a deeper look at why scripted suites rot over time, see The Quiet Death of the Test Script.

Scripted tests are not adaptive

Traditional Playwright tests are deterministic scripts. They do exactly what they are told:

await page.getByRole('button', { name: 'New Project' }).click();

This is precise, but it is not intelligent. If the button changes to Create Project, the test fails. If the button moves to a menu, the test fails. If the flow adds an extra confirmation step, the test fails.

A human tester would adapt. They would understand that Create Project and New Project probably mean the same thing, open the menu, complete the confirmation step, and still validate whether the user can create a project.

Agent-based testing works more like that. Instead of fixed commands, you give the agent intent:

Login as a valid user.
Create a new project called "Marketing Website".
Verify that the project is created successfully and is visible in the project list.

The agent inspects the page, understands available actions, chooses the right controls, and continues even if the UI structure changes.

Scripted Playwright	Agent-based
`await page.getByRole('button', { name: 'New Project' }).click();`	Create a new project named "Marketing Website" and confirm it appears in the list.
Developer encodes every UI interaction	Developer defines outcome and validation intent

The developer no longer needs to encode every UI interaction as a brittle script.

No scripts, no locators, no constant rewriting

Most browser tests eventually become locator maintenance work:

await page.getByRole('button', { name: 'Submit' }).click();
// later...
await page.getByTestId('submit-button').click();
// later...
await page.locator('[data-testid="submit-button"]').nth(1).click();

The test becomes harder to understand than the feature it covers.

Agent-based testing removes this dependency. You do not need:

await page.locator('#checkout-form > div:nth-child(3) > button').click();

You can say:

Complete checkout using the default shipping address.
Apply the available coupon if shown.
Place the order.
Verify that the order confirmation page is displayed.

Locators are still useful internally for execution. Developers should not have to write and maintain them manually for every test case.

UI changes should not break functional testing

In a healthy product team, UI changes happen frequently: buttons move, layouts change, forms are redesigned, tables become cards, modals become side panels.

With scripted tests, even harmless UI changes can break tests. A Playwright test may depend on a specific table structure:

await expect(page.locator('table tbody tr').first()).toContainText('Invoice #1001');

If the table becomes a card layout, the functionality may still work — but the test fails.

An agent-based test can be written at the functional level:

Open the invoices page.
Verify that invoice #1001 is present.
Open the invoice and confirm the amount, customer name, and payment status are correct.

The test validates behavior, not whether the invoice is shown in a table, card, list, or searchable view.

Agents can validate non-deterministic outputs

Modern applications include AI-generated text, summaries, recommendations, charts, and natural language responses. Traditional Playwright assertions are not designed for this:

await expect(
  page.getByText('Your project risk is high because of missing test coverage')
).toBeVisible();

This works only if the exact text appears. AI-generated content may say:

This project has a high risk score because several critical flows are not covered by tests.

The meaning is the same; the text is different. A scripted test may fail. An agent can validate meaning:

Ask the AI assistant to summarize the project quality.
Verify that the response mentions high risk and explains that missing test coverage is one of the reasons.
The wording does not need to match exactly, but the meaning should be correct.

Agents can validate images, charts, and visual output

Many products depend on visual correctness: revenue charts, dashboard warnings, AI-generated images, PDF previews, design layouts.

With Playwright alone, you often end up with screenshot comparisons:

await expect(page).toHaveScreenshot('dashboard.png');

Pixel comparison is useful but fragile. A small font or spacing change can fail the test even when the product is correct.

Agent-based validation can be semantic:

Open the analytics dashboard.
Verify that the revenue chart shows an upward trend for the last 6 months.
Verify that the failed test count is clearly visible.
Verify that a warning is shown if the failure rate is above 10%.

For an AI image app:

Generate an image with the prompt: "A red sports car parked near a beach during sunset."
Verify that the image contains a red car, beach-like background, and sunset lighting.

This kind of validation is difficult to express with traditional browser automation alone.

Playwright tests need ownership — and that is where teams fail

When production code breaks, ownership is usually obvious. When Playwright tests fail, it is often unclear:

Is it a product bug, a flaky test, a changed locator, or an environment issue?
Is the test outdated?
Should QA, frontend, backend, or the original author fix it?

Failures pile up. CI turns red often. Developers ignore failures. Tests get disabled, then skipped, then the whole suite becomes unreliable.

Teams do not abandon Playwright because it is bad. They abandon it because maintaining a large suite requires continuous ownership — and most teams lack a clean ownership model for test code.

Agent-based testing reduces this burden. Instead of hundreds of brittle scripts, teams maintain high-level intent:

Validate that a user can sign up, create an organization, add a project, and see the dashboard.

The agent handles execution. The test stays closer to product behavior than to another fragile implementation layer.

Playwright is not intelligent about test scope

If a developer changes one line of code, teams often still run a large suite because the runner does not understand product impact:

function calculateDiscount(total: number, coupon: Coupon) {
  return total * coupon.percentage / 100;
}

A traditional CI setup may still run:

npx playwright test

That can execute login, dashboard, checkout, invoice, admin, and notification tests — even when only discount logic changed. Playwright does not know which tests are relevant; it only runs what you ask.

Teams can tag tests manually:

test.describe('@checkout', () => {
  test('user can apply coupon', async ({ page }) => {
    // test logic
  });
});

Then run npx playwright test --grep @checkout. Someone still has to maintain the mapping between code changes and tags.

An agent can inspect the diff, understand that checkout discounts changed, and focus on relevant flows:

Code change detected in discount calculation.
Test the checkout flow with:
- valid coupon
- expired coupon
- percentage discount
- fixed amount discount
- cart total below coupon minimum
Validate that the final payable amount is correct.

Instead of running everything for every small change, agents can identify affected scope and test what matters. This is the same PR-native model described in why your coding agent can't be your testing agent.

Examples: Playwright vs agent instructions

Checkout with a coupon

Playwright:

import { test, expect } from '@playwright/test';

test('user can apply coupon during checkout', async ({ page }) => {
  await page.goto('https://shop.example.com');
  await page.getByRole('link', { name: 'Products' }).click();
  await page.getByText('Wireless Keyboard').click();
  await page.getByRole('button', { name: 'Add to Cart' }).click();
  await page.getByRole('link', { name: 'Cart' }).click();
  await page.getByLabel('Coupon Code').fill('SAVE10');
  await page.getByRole('button', { name: 'Apply' }).click();
  await expect(page.getByText('Coupon applied')).toBeVisible();
  await expect(page.getByText('$90.00')).toBeVisible();
  await page.getByRole('button', { name: 'Checkout' }).click();
  await page.getByLabel('Address').fill('123 Main Street');
  await page.getByLabel('City').fill('Chennai');
  await page.getByLabel('Zip Code').fill('600001');
  await page.getByRole('button', { name: 'Place Order' }).click();
  await expect(page.getByText('Order confirmed')).toBeVisible();
});

Agent instruction:

Test the checkout coupon flow.
1. Open the shopping app and add a product to the cart.
2. Apply coupon code SAVE10 and verify the discount.
3. Complete checkout with a valid shipping address and place the order.
4. Verify order confirmation.
Validation:
- Coupon reduces total by 10%.
- Final amount is correct.
- User sees a clear order confirmation.

The Playwright test tells the browser exactly what to do. The agent instruction defines what outcome to validate. The first is automation code; the second is testing intent.

AI-generated content

Playwright-style assertion:

await page.getByLabel('Message').fill('Summarize this customer complaint');
await page.getByRole('button', { name: 'Generate' }).click();
await expect(
  page.getByText('The customer is unhappy with delayed delivery')
).toBeVisible();

This can fail even when the AI response is valid.

Agent instruction:

Ask the AI assistant to summarize the customer complaint.
Complaint: "The customer ordered a laptop two weeks ago. Delivery was promised
within 5 days, but the order has not arrived. The customer contacted support
twice without a clear answer."
Validate that the summary includes:
- delayed delivery
- customer frustration
- lack of clear support response
Exact wording can differ; meaning should be preserved.

Testing after a code diff

A pull request changes src/features/billing/couponCalculator.ts. A scripted setup may not know what to run unless tests are manually tagged.

An agent can reason from the change:

The code change affects coupon calculation in billing.
Test:
- applying a valid coupon
- applying an expired coupon
- coupon below minimum cart value
- coupon on a multi-item cart
- final invoice amount
Do not run unrelated flows (profile update, password reset, dashboard filtering)
unless they are impacted.

This is where agent-based testing becomes especially useful in CI: focus on the impact of the change instead of blindly running a large static suite.

Cost: development + maintenance vs execution

Playwright tests have two major costs:

Development — writing tests, fixtures, selectors, test data, auth, mocks, and suite structure.
Maintenance — every UI, locator, flow, and redesign change creates more work.

In many teams, maintenance exceeds original development cost.

Agent-based testing shifts the cost structure. You may spend more per execution because agents use AI models and reasoning, but you save on:

Writing and rewriting test scripts
Maintaining locators and broken flows
Debugging false failures
Deciding what tests to run
Manually validating AI-generated or visual content

The better question is not whether one agent run is cheaper than one Playwright run. It is whether agent-based testing is cheaper than writing, maintaining, debugging, and owning a large Playwright suite over months or years. For many teams, the answer is yes.

Developers should own quality, not test scripts

Developers should care about quality. That should not mean manually maintaining hundreds of browser automation scripts.

Developers should define intent:

A user should be able to create a project.
A user should not invite an invalid email.
Billing should show the correct amount after a coupon.
An AI summary should capture the right meaning.
A user should be warned before deleting a workspace.

Agents convert intent into execution. Developers stay focused on product behavior, edge cases, and user impact — not selector maintenance.

When should you still use Playwright directly?

Playwright is still excellent for:

Low-level browser automation
Deterministic regression checks
Infrastructure-level browser testing
Highly stable flows
Custom test frameworks
Visual snapshots and network mocking
Advanced browser control

Many agent-based systems use browser automation frameworks internally. The difference is that developers do not need to manually write every Playwright script. Playwright can be the execution engine; the agent is the testing brain.

The future is intent-driven testing

Old model	New model
Write scripts → maintain locators → run full suite → debug failures → fix tests → repeat	Describe intent → agent identifies scope → executes → validates behavior → reports issues

Applications, UIs, and AI-generated content are changing faster. CI needs smarter, faster feedback. Static scripted tests cannot keep pace without a heavy maintenance burden.

Agent-based testing offers a better abstraction: not browser commands, not selectors, not fragile scripts — product behavior and validation intent.

Conclusion

Developers should stop writing Playwright tests manually because the cost of maintaining scripted browser tests is becoming too high.

The future of testing is not about writing more locators. It is about giving agents the right intent and letting them validate the product like an intelligent tester.

Agent-based testing can adapt to UI changes, validate non-deterministic outputs, understand test scope from code changes, reduce maintenance, and help teams keep quality high without building a second codebase of fragile tests.

DevAssure O2 helps teams run agent-based end-to-end testing from code changes — without writing or maintaining Playwright scripts.

The real problem is not Playwright — it is test maintenance​

Scripted tests are not adaptive​

No scripts, no locators, no constant rewriting​

UI changes should not break functional testing​

Agents can validate non-deterministic outputs​

Agents can validate images, charts, and visual output​

Playwright tests need ownership — and that is where teams fail​

Playwright is not intelligent about test scope​

Examples: Playwright vs agent instructions​

Checkout with a coupon​

AI-generated content​

Testing after a code diff​

Cost: development + maintenance vs execution​

Developers should own quality, not test scripts​

When should you still use Playwright directly?​

The future is intent-driven testing​

Conclusion​

Links​