Google Testing Blog: March 2021

Test Flakiness - One of the main challenges of automated testing (Part II)

Wednesday, March 24, 2021

This is part two of a series on test flakiness. The first article described the four components involved in running tests and the possible reasons for flakiness in each. This article covers triage tips and remedies for each of those reasons.

Components

To review, flakiness can occur in four components:

The tests themselves
The test-running framework
The application or system under test (SUT) and the services and libraries that the SUT and testing framework depend upon
The OS, hardware, and network that the SUT and testing framework depend upon

This was captured and summarized in the following diagram.

The reasons, triage tips, and remedies for flakiness are discussed below, by component.

The tests themselves

The tests themselves can introduce flakiness. This can include test data, test workflows, initial setup of test prerequisites, and initial state of other dependencies.

Reason for Flakiness	Tips for Triaging	Type of Remedy
Improper initialization or cleanup.	Look for compiler warnings about uninitialized variables. Inspect initialization and cleanup code. Check that the environment is set up and torn down correctly. Verify that test data is correct.	Explicitly initialize all variables with proper values before their use. Properly set up and tear down the testing environment. Consider an initial test that verifies the state of the environment.
Invalid assumptions about the state of test data.	Rerun test(s) independently.	Make tests independent of any state from other tests and previous runs.
Invalid assumptions about the state of the system, such as the system time.	Explicitly check for system dependency assumptions.	Remove or isolate the SUT dependencies on aspects of the environment that you do not control.
Dependencies on execution time, expecting asynchronous events to occur in a specific order, waiting without timeouts, or race conditions between the tests and the application.	Log the times when accesses to the application are made. As part of debugging, introduce delays in the application to check for differences in test results.	Add synchronization elements to the tests so that they wait for specific application states. Disable unnecessary caching to have a predictable timeline for the application responses. Note: Do NOT add arbitrary delays, as tests that rely on them can become flaky again over time and run more slowly than necessary.
Dependencies on the order in which the tests are run. (Similar to the second case above.)	Rerun test(s) independently.	Make tests independent of each other and of any state from previous runs.

Table 1 - Reasons, triage tips, and remedies for flakiness in the tests themselves
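
The synchronization remedy in Table 1 can be sketched as a small polling helper that waits for a specific application state and gives up after a timeout, instead of sleeping for a guessed amount of time. This is a minimal illustration under stated assumptions, not a framework API: wait_until, its parameters, and the FakeApp stand-in are hypothetical names invented for the example.

```python
import threading
import time


def wait_until(condition, timeout_s=5.0, poll_interval_s=0.05):
    """Poll `condition` until it returns a truthy value or the timeout expires.

    Returns the truthy value, or raises TimeoutError so the test fails loudly
    instead of hanging forever.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        time.sleep(poll_interval_s)


class FakeApp:
    """Hypothetical stand-in for an application that becomes ready asynchronously."""

    def __init__(self, startup_delay_s):
        self.ready = False
        self._timer = threading.Timer(startup_delay_s, self._mark_ready)
        self._timer.start()

    def _mark_ready(self):
        self.ready = True


app = FakeApp(startup_delay_s=0.2)
# Wait for the state the test needs. The test proceeds as soon as the
# application is ready, and the timeout bounds the total wait.
wait_until(lambda: app.ready, timeout_s=2.0)
```

The short sleep between polls only limits CPU use; the test's correctness depends on the observed state, not on the length of any delay.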

The test-running framework

An unreliable test-running framework can introduce flakiness.

Reason for Flakiness	Tips for Triaging	Type of Remedy
Failure to allocate enough resources for the SUT, thus preventing it from running.	Check logs to see if SUT came up.	Allocate sufficient resources.
Improper scheduling of the tests so they “collide” and cause each other to fail.	Explicitly run tests independently in different order.	Make tests runnable independently of each other.
Insufficient system resources to satisfy the test requirements. (Similar to the first case but here resources are consumed while running the workflow.)	Check system logs to see if SUT ran out of resources.	Fix memory leaks or similar resource “bleeding.” Allocate sufficient resources to run tests.

Table 2 - Reasons, triage tips, and remedies for flakiness in the test-running framework
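
The triage tip of running tests independently and in a different order can be sketched with Python's standard unittest module. The helper below shuffles the test methods of one class with a seed, so an order that fails can be logged and replayed. The names run_in_shuffled_order and CartTest are hypothetical, and this is only one possible approach rather than a feature of any particular test runner.

```python
import io
import random
import unittest


def run_in_shuffled_order(test_case_class, seed):
    """Run the test methods of one TestCase class in a seeded random order.

    Returns (order, result) so that a failing order can be logged and replayed.
    """
    names = list(unittest.TestLoader().getTestCaseNames(test_case_class))
    random.Random(seed).shuffle(names)
    suite = unittest.TestSuite(test_case_class(name) for name in names)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return names, result


class CartTest(unittest.TestCase):
    """Hypothetical tests; each one builds its own state in setUp."""

    def setUp(self):
        self.cart = []  # fresh state per test, nothing shared between tests

    def test_add_item(self):
        self.cart.append("apple")
        self.assertEqual(self.cart, ["apple"])

    def test_starts_empty(self):
        self.assertEqual(self.cart, [])


for seed in range(5):
    order, result = run_in_shuffled_order(CartTest, seed)
    print(seed, order, "ok" if result.wasSuccessful() else "FAILED")
```

If a test passes in one order and fails in another, it likely depends on state left behind by a neighbor, which points back to the remedy of making tests independent.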

The application or SUT and the services and libraries that the SUT and testing framework depend upon

Of course, the application itself (or the SUT) could be the source of flakiness.

An application can also have numerous dependencies on other services, and each of those services can have their own dependencies. In this chain, each of the services can introduce flakiness.

Reason for Flakiness	Tips for Triaging	Type of Remedy
Race conditions.	Log accesses of shared resources.	Add synchronization elements to the tests so that they wait for specific application states. Note: Do NOT add arbitrary delays, as tests that rely on them can become flaky again over time.
Uninitialized variables.	Look for compiler warnings about uninitialized variables.	Explicitly initialize all variables with proper values before their use.
Being slow to respond or being unresponsive to the stimuli from the tests.	Log the times when requests and responses are made.	Check and remove any causes for delays.
Memory leaks.	Look at memory consumption during test runs. Use tools such as Valgrind to detect.	Fix the programming error causing the memory leak. Wikipedia's article on memory leaks has an excellent discussion of these types of errors.
Oversubscription of resources.	Check system logs to see if SUT ran out of resources.	Allocate sufficient resources to run tests.
Changes to the application (or dependent services) out of sync with the corresponding tests.	Examine revision history.	Institute a policy requiring code changes to be accompanied by tests.

Table 3 - Reasons, triage tips, and remedies for flakiness in the application or SUT
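
For the memory leak row in Table 3, the tip to look at memory consumption during test runs can be sketched for Python code with the standard tracemalloc module. This is a swapped-in technique for illustration: Valgrind, named in the table, targets native binaries, while tracemalloc only tracks allocations made by the Python interpreter. The functions memory_growth_bytes, leaky_request, and clean_request are hypothetical.

```python
import gc
import tracemalloc


def memory_growth_bytes(action, iterations=50):
    """Return how many traced bytes remain allocated after repeating `action`."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            action()
        gc.collect()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


_cache = []  # hypothetical leak: entries are added and never evicted


def leaky_request():
    _cache.append(bytearray(10_000))


def clean_request():
    buffer = bytearray(10_000)  # released when the function returns
    return len(buffer)


print("leaky growth:", memory_growth_bytes(leaky_request))
print("clean growth:", memory_growth_bytes(clean_request))
```

Growth that scales with the number of iterations suggests that something is retaining memory between requests; growth that stays near zero suggests the action cleans up after itself.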

The OS and hardware that the SUT and testing framework depend upon

Finally, the underlying hardware and operating system can be sources of test flakiness.

Reason for Flakiness	Tips for Triaging	Type of Remedy
Networking failures or instability.	Check for hardware errors in system logs.	Fix hardware errors or run tests on different hardware.
Disk errors.	Check for hardware errors in system logs.	Fix hardware errors or run tests on different hardware.
Resources being consumed by other tasks/services not related to the tests being run.	Examine system process activity.	Reduce activity of other processes on test system(s).

Table 4 - Reasons, triage tips, and remedies for flakiness in the OS and hardware of the SUT
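
The tip to examine system process activity can be partly automated with a check that runs before the tests start. The sketch below uses os.getloadavg() from the Python standard library, which exists on Unix-like systems only. The function names and the 0.7 load-per-CPU threshold are assumptions chosen for illustration, not recommended values.

```python
import os


def host_is_busy(load_1min, cpu_count, max_load_per_cpu=0.7):
    """Return True when the 1-minute load average per CPU exceeds the threshold."""
    return (load_1min / max(cpu_count, 1)) > max_load_per_cpu


def check_host_before_tests():
    """Report whether unrelated activity may be competing with the tests.

    Returns None when the load average is unavailable, instead of guessing.
    """
    if not hasattr(os, "getloadavg"):
        return None  # for example, on Windows
    try:
        load_1min, _load_5min, _load_15min = os.getloadavg()
    except OSError:
        return None  # load average could not be obtained
    return host_is_busy(load_1min, os.cpu_count() or 1)


status = check_host_before_tests()
if status is None:
    print("load average unavailable on this platform")
elif status:
    print("warning: host is busy; test timing may be unreliable")
else:
    print("host load looks acceptable")
```

A check like this does not fix anything by itself, but recording its result next to each test run makes it easier to correlate flaky failures with a busy host.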

Conclusion

As the wide variety of failures shows, keeping flakiness low in automated testing can be quite a challenge. This article has outlined the components involved in running tests and the types of flakiness that can occur in each, and can serve as a cheat sheet when triaging and fixing flaky tests.

References

Where do our flaky tests come from?
Flaky Tests at Google and How We Mitigate Them
My Selenium Tests Aren't Stable!
TotT: Avoiding Flakey Tests
Test Flakiness - One of the main challenges of automated testing

Labels: George Pirocanac, Test Flakiness
