Your Test Suite Passes. Your Users Found 6 Bugs. What Went Wrong?
Last month I had a conversation with a CTO that stuck with me.
Their team has 3,400 tests. 94% coverage. A CI pipeline that runs on every PR. Tests pass reliably — less than 2% flaky rate. By every industry metric, this is a well-tested codebase.
They also had 6 production bugs in the past 30 days. All reported by users. All missed by the test suite.
I asked him to send me the bugs. Here's what they were:
- A modal didn't close when clicking outside it. Users had to refresh the page to dismiss a confirmation dialog.
- A price displayed as $1,299 in the cart but charged $12.99. Decimal formatting inconsistency between the display component and the payment API.
- The "Export to CSV" button worked on Chrome, broke on Safari. Downloaded an empty file.
- A newly added field was editable for admins but displayed as read-only for regular users — the opposite of what it should have been. Permission logic was inverted.
- A search that returned 0 results showed the previous results instead of an empty state. Stale state from a React component not resetting.
- The onboarding flow skipped step 3 entirely when the user's timezone was UTC+0. A conditional that checked for a truthy timezone value — and 0 is falsy in JavaScript.
None of these are exotic edge cases. Every one of them is something a human using the app would hit within 5 minutes.
And none of them were caught by 3,400 tests at 94% coverage.
Why?
The gap between what tests verify and what users experience
I've been thinking about this gap for a long time — first as an engineer at Microsoft and eBay, and now as someone who builds testing tools. And I keep seeing the same pattern.
Most test suites verify that code does what the developer intended. They don't verify that the application behaves the way a user would expect.
This is a subtle but critical distinction. Let me break it down with the actual bugs from above.
Bug 1: Modal doesn't close on outside click
The team had a test for the modal:
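import { render, screen, fireEvent } from '@testing-library/react';
import '@testing-library/jest-dom'; // adds toBeVisible and toBeInTheDocument
// ConfirmationModal is the team's own component; its import is not shown here.

test('confirmation modal opens and closes', () => {
  render(<ConfirmationModal />);
  // Open the dialog through its trigger
  fireEvent.click(screen.getByText('Delete'));
  expect(screen.getByRole('dialog')).toBeVisible();
  // Close it through the Cancel button, the only dismissal path exercised
  fireEvent.click(screen.getByText('Cancel'));
  expect(screen.queryByRole('dialog')).not.toBeInTheDocument();
});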
The test verifies: modal opens on trigger, closes on Cancel button. Both pass. Coverage counts this as tested.
What the test doesn't verify: clicking outside the modal. Pressing Escape. Clicking the overlay. These are all standard modal interactions that every user expects. But the developer who wrote the modal test was thinking about the modal's API (open/close), not about every way a user might dismiss it.
Bug 3: CSV export works on Chrome, breaks on Safari
The test:
test('export generates CSV file', async () => {
  const blob = await exportToCSV(mockData);
  expect(blob.type).toBe('text/csv');
  expect(blob.size).toBeGreaterThan(0);
});
This test runs in Node.js via Jest. It tests the data transformation logic — correctly. The CSV content is valid. The test passes.
But in Safari, the browser's Blob constructor handles the type parameter slightly differently when combined with URL.createObjectURL. The download triggers, but the file is empty. This is browser-specific runtime behavior, and a unit test running in Node.js never executes Safari's engine, so it cannot catch it.
Bug 6: Timezone 0 is falsy in JavaScript
if (user.timezone) {
  showStep3(); // Timezone-specific onboarding
}
The developer's tests used the timezones 'America/New_York' and 'Asia/Kolkata', both truthy strings. The coverage tool reports this branch as tested.
But UTC+0, represented as the number 0, is falsy. The conditional skips step 3 for every user whose offset is 0. That includes Accra and Reykjavik year-round, and London and Lisbon in winter.
The test covered the branch. It didn't cover the boundary.
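Here is a minimal sketch of what covering the boundary could look like. It assumes a missing timezone arrives as null or undefined, and `shouldShowStep3` is a hypothetical helper I made up for illustration, not the team's actual code:

```javascript
// Hypothetical helper: decides whether the timezone-specific step runs.
// Assumption: a missing timezone is null or undefined; 0 is a real value (UTC+0).
function shouldShowStep3(user) {
  // Check for absence explicitly instead of relying on truthiness.
  return user.timezone !== null && user.timezone !== undefined;
}

// Boundary values that a truthiness check confuses with "missing".
const cases = [
  { timezone: 'America/New_York', expected: true },
  { timezone: 0, expected: true }, // UTC+0: falsy, but still a timezone
  { timezone: -5, expected: true },
  { timezone: null, expected: false },
  { timezone: undefined, expected: false },
];

// Collect every case where the helper disagrees with the expectation.
const failures = cases.filter(
  ({ timezone, expected }) => shouldShowStep3({ timezone }) !== expected
);
console.log(failures.length === 0 ? 'all boundary cases pass' : failures);
```

The point is the `timezone: 0` row: the original tests had nothing like it, so the branch was executed without its edge ever being checked.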
Why coverage lies
I'm not the first person to say that coverage is a misleading metric. But I want to be specific about how it misleads, because the pattern is consistent.
Coverage measures which lines of code were executed. It doesn't measure which behaviors were validated.
You can have 100% line coverage and zero behavioral coverage. Here's a function with a "perfect" coverage test that validates nothing:
function calculateDiscount(price, code) {
  if (code === 'SAVE20') return price * 0.8;
  if (code === 'HALF') return price * 0.5;
  return price;
}
// "100% coverage" test:
test('calculateDiscount', () => {
  calculateDiscount(100, 'SAVE20');
  calculateDiscount(100, 'HALF');
  calculateDiscount(100, 'NONE');
});
Every line is executed. Coverage is 100%. But there are zero assertions. The function could return undefined for every input and the test would still pass.
This is an extreme example, but the pattern shows up in subtler forms everywhere:
- Tests that assert the function "doesn't throw" but don't check the return value
- Tests that verify a component "renders" but don't check what it renders
- Tests that call an API endpoint and check the status code but don't validate the response body
- Tests that verify state changes but don't verify the UI reflects those changes
In each case, the coverage tool says the code is tested. The user says otherwise.
The three layers most test suites miss
After looking at hundreds of production bug reports across multiple companies, I've noticed that most escaped bugs fall into three categories:
Layer 1: Interaction patterns
Users don't just click buttons in sequence. They:
- Click outside modals to dismiss them
- Double-click submit buttons (especially on slow connections)
- Use keyboard navigation (Tab, Enter, Escape)
- Resize their browser window mid-flow
- Switch between tabs and come back
- Hit the back button at unexpected moments
- Copy-paste into input fields (which doesn't trigger onChange in some frameworks)
Most test suites test the "click button, see result" path. They don't test the 15 other ways a user might interact with the same element.
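To make one of these concrete, the double-click case can be sketched as a check that fires two submits before the first request resolves and counts how many requests went out. `createSubmitHandler` and `slowRequest` are hypothetical stand-ins for a real form handler and a real network call:

```javascript
// Hypothetical submit wrapper: ignores clicks while a request is in flight.
function createSubmitHandler(sendRequest) {
  let inFlight = false;
  return async function handleSubmit() {
    if (inFlight) return; // second click arrives before the first finished
    inFlight = true;
    try {
      await sendRequest();
    } finally {
      inFlight = false; // allow a new submit once the request settles
    }
  };
}

// Stand-in for a slow network call; counts how often it is invoked.
let requestCount = 0;
const slowRequest = () => {
  requestCount += 1;
  return new Promise((resolve) => setTimeout(resolve, 50));
};

// Simulate a double-click: two calls in the same tick.
const handleSubmit = createSubmitHandler(slowRequest);
handleSubmit();
handleSubmit();
console.log(requestCount); // 1
```

A test that clicks once and waits for the result never exercises the `inFlight` guard, whether or not it exists.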
Layer 2: Cross-boundary behavior
The most costly bugs live at the boundary between two systems:
- Frontend displays a price; backend charges a different amount (bug #2 above)
- API returns data in one format; frontend expects another
- Database stores a value; cache returns a stale version
- Component A updates state; Component B doesn't re-render
Unit tests, by definition, test units in isolation. Integration tests can catch these, but most integration tests still mock the boundaries — they mock the API response, mock the database, mock the external service. The mock returns what the developer expects. The real system returns something slightly different.
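A rough sketch of a cross-boundary check for the price bug: run the same stored value through both sides and assert that what the user sees matches what gets charged. Everything here is an assumption for illustration, including the integer-cents storage and the names `formatDisplayPrice` and `buildChargeRequest`; in a real suite these would be the actual display code and the actual payment client (or its sandbox), not stand-ins:

```javascript
// Assumption: prices are stored as integer cents.
const usd = new Intl.NumberFormat('en-US', { style: 'currency', currency: 'USD' });

// Hypothetical display side: what the cart renders.
function formatDisplayPrice(cents) {
  return usd.format(cents / 100);
}

// Hypothetical payment side: what is sent to the payment API.
function buildChargeRequest(cents) {
  return { amount: cents, currency: 'usd' }; // amount in cents
}

// Convert the string the user sees back into cents.
function displayedCents(text) {
  return Math.round(Number(text.replace(/[^0-9.]/g, '')) * 100);
}

// Return every value where the displayed price and the charged amount disagree.
function priceMismatches(centsValues) {
  return centsValues.filter(
    (cents) => displayedCents(formatDisplayPrice(cents)) !== buildChargeRequest(cents).amount
  );
}

console.log(priceMismatches([5, 1299, 129900, 100000]));
```

If one side treated 1299 as dollars and the other as cents, that value would show up in the returned list. A mock of either side would hide exactly that disagreement.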
Layer 3: Environment-specific behavior
Code that works in your test environment can break in production because:
- Different browser engines (Safari's Blob behavior, Firefox's date parsing)
- Different screen sizes (mobile viewport hides a button behind another element)
- Different data volumes (the page works with 10 items, hangs with 10,000)
- Different locales (number formatting, date formats, currency symbols, RTL text)
- Different timezones (the UTC+0 bug above)
Tests running in Node.js via Jest or Vitest, even with a simulated DOM such as jsdom, can't catch bugs that only appear in a real browser engine. Tests running on a developer's MacBook can't catch mobile layout issues. Tests running with seeded data can't catch pagination performance problems.
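One way to get real browser engines into the loop is an end-to-end runner that executes the same tests across several of them. As an illustration only, here is what that looks like as a Playwright config. The `./e2e` directory is a hypothetical path, and Playwright's WebKit build is close to, but not identical to, shipping Safari, so treat it as a first line of defense rather than a guarantee:

```javascript
// playwright.config.js
const { defineConfig, devices } = require('@playwright/test');

module.exports = defineConfig({
  testDir: './e2e', // hypothetical location of the end-to-end tests
  // Each project runs the full test suite in a different engine or viewport.
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
    { name: 'mobile-safari', use: { ...devices['iPhone 13'] } },
  ],
});
```

With a setup like this, a download test that passes in the chromium project and fails in the webkit project is the kind of signal the Node.js-only CSV test could never produce.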
What actually catches these bugs
I don't think the answer is "write more tests." The CTO I talked to has 3,400 tests. Adding 340 more of the same kind wouldn't have caught bugs #1–6. The kind of testing matters more than the quantity.
Here's what I've seen work:
1. Test behaviors, not implementations
Instead of testing that handleClose() sets isOpen to false, test that a user can dismiss the modal in every way they'd try: Cancel button, outside click, Escape key, overlay click. The implementation can change; the expected behavior shouldn't.
The shift is from "does the function return the right value?" to "can the user accomplish the task?"
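Here is a small sketch of that shift, written as a table of dismissal behaviors rather than a test of one function. The modal is a hypothetical in-memory model so the snippet runs on its own; in a real suite each row would drive actual DOM events against the real component:

```javascript
// Hypothetical minimal model of a modal, standing in for the real component.
function createModal() {
  let open = false;
  return {
    open() { open = true; },
    isOpen() { return open; },
    // Single entry point for user events that may dismiss the modal.
    handle(event) {
      const dismisses =
        (event.type === 'click' && (event.target === 'cancel' || event.target === 'overlay')) ||
        (event.type === 'keydown' && event.key === 'Escape');
      if (dismisses) open = false;
    },
  };
}

// One row per way a user expects to dismiss the modal.
const dismissals = [
  { name: 'Cancel button', event: { type: 'click', target: 'cancel' } },
  { name: 'outside/overlay click', event: { type: 'click', target: 'overlay' } },
  { name: 'Escape key', event: { type: 'keydown', key: 'Escape' } },
];

// Return the names of dismissal paths that leave the modal open.
function failedDismissals() {
  return dismissals
    .filter(({ event }) => {
      const modal = createModal();
      modal.open();
      modal.handle(event);
      return modal.isOpen();
    })
    .map(({ name }) => name);
}

console.log(failedDismissals());
```

The table is the useful part. When someone rewrites the modal, the rows stay the same, and a missing outside-click handler shows up as a named failure instead of a user report.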
