| Reason for Flakiness | Tips for Triaging | Type of Remedy |
|---|---|---|
| Improper initialization or cleanup. | Look for compiler warnings about uninitialized variables. Inspect initialization and cleanup code. Check that the environment is set up and torn down correctly. Verify that test data is correct. | Explicitly initialize all variables with proper values before their use. Properly set up and tear down the testing environment. Consider an initial test that verifies the state of the environment. |
| Invalid assumptions about the state of test data. | Rerun test(s) independently. | Make tests independent of any state from other tests and previous runs. |
| Invalid assumptions about the state of the system, such as the system time. | Explicitly check for system dependency assumptions. | Remove or isolate the SUT dependencies on aspects of the environment that you do not control. |
| Dependencies on execution time, expecting asynchronous events to occur in a specific order, waiting without timeouts, or race conditions between the tests and the application. | Log the times when accesses to the application are made. As part of debugging, introduce delays in the application to check for differences in test results. | Add synchronization elements to the tests so that they wait for specific application states. Disable unnecessary caching to have a predictable timeline for the application responses. Note: Do NOT add arbitrary delays as these can become flaky again over time and slow down the test unnecessarily. |
| Dependencies on the order in which the tests are run. (Similar to the second case above.) | | Make tests independent of each other and of any state from previous runs. |
| Failure to allocate enough resources for the SUT, thus preventing it from running. | Check logs to see if SUT came up. | Allocate sufficient resources. |
| Improper scheduling of the tests so they "collide" and cause each other to fail. | Explicitly run tests independently in different order. | Make tests runnable independently of each other. |
| Insufficient system resources to satisfy the test requirements. (Similar to the first case but here resources are consumed while running the workflow.) | Check system logs to see if SUT ran out of resources. | Fix memory leaks or similar resource "bleeding." Allocate sufficient resources to run tests. |
| Race conditions. | Log accesses of shared resources. | Add synchronization elements to the tests so that they wait for specific application states. Note: Do NOT add arbitrary delays as these can become flaky again over time. |
| Uninitialized variables. | Look for compiler warnings about uninitialized variables. | |
| Being slow to respond or being unresponsive to the stimuli from the tests. | Log the times when requests and responses are made. | Check and remove any causes for delays. |
| Memory leaks. | Look at memory consumption during test runs. Use tools such as Valgrind to detect. | Fix programming error causing memory leak. This Wikipedia article has an excellent discussion on these types of errors. |
| Oversubscription of resources. | | |
| Changes to the application (or dependent services) out of sync with the corresponding tests. | Examine revision history. | Institute a policy requiring code changes to be accompanied by tests. |
| Networking failures or instability. | Check for hardware errors in system logs. | Fix hardware errors or run tests on different hardware. |
| Disk errors. | | |
| Resources being consumed by other tasks/services not related to the tests being run. | Examine system process activity. | Reduce activity of other processes on test system(s). |
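
Several remedies above recommend waiting for a specific application state instead of adding arbitrary delays. A minimal sketch of that idea follows, assuming a hypothetical helper named WaitFor (not part of any framework mentioned here) that polls a condition until it holds or a timeout expires:

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls until a condition holds or a deadline passes. */
final class WaitFor {
    private WaitFor() {}

    /**
     * Returns true as soon as the condition holds, or false once the timeout elapses.
     * The poll interval only controls how often the state is checked; it is not a
     * fixed delay that the test has to sit through.
     */
    static boolean condition(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison, as recommended for System.nanoTime().
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting", e);
            }
        }
    }
}
```

A test might call `WaitFor.condition(() -> page.isLoaded(), Duration.ofSeconds(10), Duration.ofMillis(50))`, where `page.isLoaded()` stands in for whatever state check the application exposes. The test proceeds as soon as the state is reached and fails with a clear timeout otherwise, rather than hanging or racing the application.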
This article was adapted from a Google Testing on the Toilet (TotT) episode.
SpeedyImgImage decodeImage(List<SpeedyImgDecoder> decoders, byte[] data) {
  SpeedyImgOptions options = getDefaultConvertOptions();
  for (SpeedyImgDecoder decoder : decoders) {
    SpeedyImgResult decodeResult = decoder.decode(decoder.formatBytes(data));
    SpeedyImgImage image = decodeResult.getImage(options);
    if (validateGoodImage(image)) {
      return image;
    }
  }
  throw new RuntimeException();
}

Image decodeImage(List<ImageDecoder> decoders, byte[] data) {
  for (ImageDecoder decoder : decoders) {
    Image decodedImage = decoder.decode(data);
    if (validateGoodImage(decodedImage)) {
      return decodedImage;
    }
  }
  throw new RuntimeException();
}
“Separation of Concerns” in the context of external APIs is also described by Martin Fowler in his blog post, Refactoring code that accesses external services.
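
One way to get from the SpeedyImg-specific method to the library-neutral one is an adapter that confines every SpeedyImg type to a single class. The sketch below is only illustrative: the SpeedyImg classes are minimal stand-ins written for this example, and the fields on Image and the name SpeedyImgDecoderAdapter are assumptions, not part of the original code.

```java
// --- Stand-ins for the third-party library (hypothetical, for a self-contained sketch) ---
final class SpeedyImgOptions {}

final class SpeedyImgImage {
    final int width;
    final int height;

    SpeedyImgImage(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

final class SpeedyImgResult {
    private final SpeedyImgImage image;

    SpeedyImgResult(SpeedyImgImage image) {
        this.image = image;
    }

    SpeedyImgImage getImage(SpeedyImgOptions options) {
        return image;
    }
}

interface SpeedyImgDecoder {
    byte[] formatBytes(byte[] data);

    SpeedyImgResult decode(byte[] formatted);
}

// --- Application-owned types: the only ones decodeImage() needs to know about ---
final class Image {
    final int width;
    final int height;

    Image(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

interface ImageDecoder {
    Image decode(byte[] data);
}

/** Adapter: the single place where SpeedyImg types are visible. */
final class SpeedyImgDecoderAdapter implements ImageDecoder {
    private final SpeedyImgDecoder delegate;
    private final SpeedyImgOptions options;

    SpeedyImgDecoderAdapter(SpeedyImgDecoder delegate, SpeedyImgOptions options) {
        this.delegate = delegate;
        this.options = options;
    }

    @Override
    public Image decode(byte[] data) {
        // Library-specific steps (byte formatting, options) stay inside the adapter.
        SpeedyImgResult result = delegate.decode(delegate.formatBytes(data));
        SpeedyImgImage raw = result.getImage(options);
        // Convert to the application's own type before returning.
        return new Image(raw.width, raw.height);
    }
}
```

With an adapter like this, swapping the decoding library would mean writing another ImageDecoder implementation, while decodeImage and its callers stay unchanged.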
describe('Terms of service are handled', () => {
  it('accepts terms of service', async () => {
    const user = getUser('termsNotAccepted');
    await login(user);
    await see(termsOfServiceDialog());
    await click('Accept');
    await logoff();
    await login(user);
    await not.see(termsOfServiceDialog());
  });
});

describe('Terms of service are handled', () => {
  it('accepts terms of service', async () => {
    const user = getUser('someUser');
    await hook('TermsOfService.Get()', true);
    await login(user);
    await see(termsOfServiceDialog());
    await click('Accept');
    await logoff();
    await hook('TermsOfService.Get()', false);
    await login(user);
    await not.see(termsOfServiceDialog());
  });
});
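import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class FakeTermsOfService implements TermsOfService.Service {
  // Remembers, per user ID, whether the terms have been accepted.
  private static final Map<String, Boolean> accepted = new ConcurrentHashMap<>();

  @Override
  public TosGetResponse get(TosGetRequest req) {
    // Simplified: a real implementation would wrap this Boolean in a TosGetResponse.
    return accepted.getOrDefault(req.UserID(), Boolean.FALSE);
  }

  @Override
  public void accept(TosAcceptRequest req) {
    accepted.put(req.UserID(), Boolean.TRUE);
  }
}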
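
To show how such a fake behaves on its own, here is a self-contained variation under simplifying assumptions: the TermsOfServiceApi interface and the InMemoryTermsOfService class are hypothetical names, plain String and boolean values replace the request and response types, and the map is an instance field so that each test can start from a fresh fake.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified, hypothetical service contract: plain values instead of request/response types. */
interface TermsOfServiceApi {
    boolean isAccepted(String userId);

    void accept(String userId);
}

/** In-memory fake: remembers acceptance per user, much as a real service would. */
final class InMemoryTermsOfService implements TermsOfServiceApi {
    // Instance state (not static), so no acceptance leaks from one test into another.
    private final Map<String, Boolean> accepted = new ConcurrentHashMap<>();

    @Override
    public boolean isAccepted(String userId) {
        // Users who never accepted are reported as not accepted.
        return accepted.getOrDefault(userId, Boolean.FALSE);
    }

    @Override
    public void accept(String userId) {
        accepted.put(userId, Boolean.TRUE);
    }
}
```

Because the fake keeps real state, a test can call `accept("alice")` and then observe `isAccepted("alice")` change from false to true without hooking individual calls, while other users remain unaffected.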
describe('Terms of service are handled', () => {
  it('accepts terms of service', async () => {
    const user = getUser('termsNotAccepted');
    await login(user);
    await see(termsOfServiceDialog());
    await click('Accept');
    await logoff();
    await login(user);
    await not.see(termsOfServiceDialog());
  });
});
<button disabled click="$handleBuyClick(data)">Buy</button>
it('submits purchase request', () => {
  const controller = new PurchasePage();
  // Call the method that handles the "Buy" button click
  controller.handleBuyClick(data);
  expect(service).toHaveBeenCalledWith(expectedData);
});
it('submits purchase request', () => {
  // Renders the page with the "Buy" button and its associated code.
  render(PurchasePage);
  // Tries to click the button, fails the test, and catches the bug!
  buttonWithText('Buy').dispatchEvent(new Event('click'));
  expect(service).toHaveBeenCalledWith(expectedData);
});