Flaky Tests from Race Conditions: Root Causes and Fixes
Flaky tests are a common challenge in software development, often leading to unreliable test suites and wasted debugging time. They are dangerous because they can mask real issues in your codebase, creating a false sense of security. This erodes trust in test automation and can slow down continuous integration. One of the primary causes of flaky tests is race conditions, which occur when the timing of events leads to unpredictable outcomes.
A race condition occurs when two or more processes or threads attempt to change shared data at the same time. This can lead to unpredictable behavior and bugs that are difficult to reproduce. In the context of testing, race conditions can cause tests to pass or fail intermittently, depending on the timing of events.
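The classic lost-update race can be reproduced in a few lines. This is a minimal, self-contained sketch (the function names and the artificial sleep are illustrative, not from any framework): each thread reads a shared counter, pauses, and writes the value back, so unsynchronized threads overwrite each other's updates, while a lock serializes the read-modify-write and always produces the expected result.

```python
import threading
import time

counter = 0
lock = threading.Lock()

def unsafe_increment():
    global counter
    current = counter       # read shared state
    time.sleep(0.001)       # widen the race window so the interleaving is visible
    counter = current + 1   # write back: may overwrite another thread's update

def safe_increment():
    global counter
    with lock:              # serialize the whole read-modify-write
        current = counter
        time.sleep(0.001)
        counter = current + 1

def run(worker, n=10):
    """Run `worker` on n threads and return the final counter value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run(unsafe_increment))  # usually less than 10: updates were lost
print(run(safe_increment))    # always 10
```

The unsafe version is exactly the kind of bug that surfaces as a flaky test: it may even pass on a lightly loaded machine and fail under CI load.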

What is a Race Condition in Test Automation?
In test automation, race conditions can occur when tests are not properly synchronized with the application state. This is especially common in asynchronous web applications, where the user interface (UI) may be updated independently of the underlying network requests. For example, a test might attempt to click a button before it is fully rendered or enabled, leading to intermittent failures.
Deterministic vs Non-Deterministic Outcomes:
- Deterministic Outcomes: These occur when tests consistently produce the same results under the same conditions. For example, if a test always passes when a button is clicked after it is fully loaded, it is considered deterministic.
- Non-Deterministic Outcomes: These occur when tests produce different results under the same conditions. For example, if a test sometimes passes and sometimes fails when a button is clicked before it is fully loaded, it is considered non-deterministic.
To address race conditions in test automation, it is essential to implement strategies that ensure tests are executed in a controlled and predictable manner.
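The most common such strategy is an explicit wait: poll a condition until it holds instead of interacting immediately or sleeping a fixed amount. Real frameworks ship this pattern (for example Selenium's WebDriverWait or Playwright's auto-waiting); below is a hedged, framework-free sketch where `wait_until` and `button_is_clickable` are illustrative names, and the "button" is simulated by a timestamp.

```python
import time

def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout}s")

# Simulated UI: the button only becomes clickable after a short delay.
ready_at = time.monotonic() + 0.3

def button_is_clickable():
    return time.monotonic() >= ready_at

wait_until(button_is_clickable, timeout=2.0)  # returns once the button is ready
```

The key design choice is polling on a condition with an upper bound, which makes the test deterministic with respect to application timing: it proceeds as soon as the state is ready and fails loudly, rather than intermittently, when it never is.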
Symptoms You’ll See
- Intermittent Test Failures: Tests that pass and fail unpredictably.
- Timing Issues: Tests that fail due to timing issues, such as waiting for elements to load or become interactive.
- Inconsistent Test Results: Tests that produce different results when run multiple times.
- Resource Contention: Tests that fail due to contention for shared resources, such as databases or file systems.
- Order-Dependent Failures: Tests that fail when run in a specific order but pass when run independently.
- Environment-Specific Failures: Tests that pass in one environment (e.g., local) but fail in another (e.g., CI/CD pipeline).
Error Messages
- Timeout 30000ms exceeded while waiting for element to be clickable
- Element not visible / not attached to the DOM
- Stale Element Reference: The element you are trying to interact with is no longer attached to the DOM.
- Element is not interactable
- Network request failed or timed out
- Database deadlock or timeout errors
- File access errors (e.g., file is locked or in use by another process)
- Unexpected application state (e.g., modal not open, form not submitted)
Common Root Causes
AJAX/Fetch Calls Still Pending
When tests interact with web applications that use AJAX or Fetch API for asynchronous data loading, they may attempt to interact with elements before the data has fully loaded. This can lead to tests failing because the expected elements are not yet present in the DOM.
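The fix is to synchronize on the request settling rather than on wall-clock time. As a sketch under stated assumptions (the in-flight AJAX call is simulated here by a thread-pool future; `fetch_user` and its latency are made up for illustration):

```python
import concurrent.futures
import time

def fetch_user():
    time.sleep(0.2)           # simulated network latency of an AJAX/Fetch call
    return {"name": "Ada"}

with concurrent.futures.ThreadPoolExecutor() as pool:
    pending = pool.submit(fetch_user)   # the "request" is now in flight

    # Flaky: asserting immediately races against the in-flight request,
    # e.g. `assert pending.done()` would fail intermittently.

    # Stable: block until the request settles, with an explicit upper bound.
    user = pending.result(timeout=2.0)

print(user["name"])
```

In a browser test the equivalent is waiting for the element the response renders (or a network-idle signal, where the framework offers one) before interacting.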
Animations/Transitions
Animations and transitions can cause elements to be in an intermediate state when tests attempt to interact with them. For example, a button may be in the process of fading in or sliding into view, making it unclickable at the moment the test tries to click it.
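One way to handle animations is to wait until the element's geometry stops changing before clicking. This is a minimal sketch, assuming a `read_value` callable that reports the animated property; here the slide-in animation is simulated by a clock-driven x-coordinate, and `wait_until_stable` is an illustrative helper, not a library API.

```python
import time

def wait_until_stable(read_value, settle_time=0.2, timeout=5.0, interval=0.05):
    """Wait until `read_value()` reports the same value for `settle_time` seconds."""
    deadline = time.monotonic() + timeout
    last = read_value()
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = read_value()
        if current != last:                 # still animating: reset the clock
            last = current
            stable_since = time.monotonic()
        elif time.monotonic() - stable_since >= settle_time:
            return current                  # settled long enough to interact
    raise TimeoutError("value never stabilized")

# Simulated slide-in: x moves toward 100, then stops.
start = time.monotonic()

def element_x():
    return min(100, int((time.monotonic() - start) * 400))

print(wait_until_stable(element_x))  # returns the final position, 100
```

Some frameworks build this in (Playwright's actionability checks wait for a stable bounding box, for instance); where yours does not, a settle check like this beats a fixed sleep.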
Stale Elements
Frameworks such as React, Vue, and Angular replace DOM nodes when they re-render, leaving previously located nodes detached. When the application state changes, element references the test captured earlier may no longer be valid, and interacting with these outdated references fails with "stale element" errors.
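The standard remedy is to re-locate the element and retry the action when it goes stale, instead of holding one reference across re-renders. A hedged sketch: `StaleElementError`, `retry_on_stale`, and the dict-based "elements" below are stand-ins for a real driver's exception (e.g. Selenium's StaleElementReferenceException) and element objects.

```python
import time

class StaleElementError(Exception):
    """Stand-in for a framework's stale-element exception."""

def retry_on_stale(action, locate, attempts=3, delay=0.1):
    """Re-locate the element and retry `action` whenever the reference goes stale."""
    for attempt in range(attempts):
        element = locate()              # always fetch a fresh reference
        try:
            return action(element)
        except StaleElementError:
            if attempt == attempts - 1:
                raise                   # out of retries: surface the real failure
            time.sleep(delay)           # give the re-render a moment to finish

# Simulation: the first located reference is detached by a re-render.
calls = {"n": 0}

def locate():
    calls["n"] += 1
    return {"fresh": calls["n"] > 1}    # first lookup returns a stale node

def click(element):
    if not element["fresh"]:
        raise StaleElementError()
    return "clicked"

print(retry_on_stale(click, locate))    # succeeds on the second attempt
```

Note that the locator (the recipe for finding the element) is retried, never the cached element itself; caching the element is precisely what makes the reference stale.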
