What I would like to see is a breakdown of how many failures fall into which category.
And while the above categories are useful for root-cause analysis and an eventual fix, for the sake of triaging results and deciding what to do when hitting such a failure, whether or not the failure is a true product failure is a HUGE difference from the other three.
For the other three, the major risk is cost in time and compute. For product failure, the major risk is that the bug might escape if it is ignored, misunderstood, or dismissed.
This suggests that if we could be very good at determining whether a failure is in the product versus one of the other three categories, we could respond differently when we see it. For the non-product categories, run the test again; if it passes, consider the result passing. Capture the issue for the sake of engineering-system cost and capacity, but at least you have saved an engineer some confusion and time. For the product category, running the test again gives us no assurance beyond an understanding of its intermittent nature.
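A minimal sketch of that disposition policy, assuming a hypothetical `run_test` callable that returns an object with a `passed` flag and a hypothetical `classify_failure` helper that tags a failure as product or non-product (neither name comes from the original comment):

```python
def log_flake(result, category):
    """Record the flaky occurrence so engineering-system cost and capacity can still be tracked."""
    print(f"flaky ({category}): {result}")

def disposition(run_test, classify_failure, max_reruns=1):
    """Rerun non-product failures; surface possible product failures immediately."""
    result = run_test()
    if result.passed:
        return "pass"
    category = classify_failure(result)  # e.g. "product", "test", "environment", "infrastructure"
    if category == "product":
        # Rerunning a product failure tells us nothing beyond its intermittent nature.
        return "fail: possible product bug"
    for _ in range(max_reruns):
        if run_test().passed:
            log_flake(result, category)  # keep the data, but unblock the engineer
            return "pass: flaky, non-product"
    return "fail: non-product, persistent"
```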
The trick, I believe, is getting very good at detecting the difference between product failure and non-product failure. If we could do that with high confidence, we would have a way of saving considerable engineer time.
> This article has both outlined the areas and the types of flakiness that can occur in those areas, so it can serve as a cheat sheet when triaging flaky tests.
Speaking of triaging, as the number of flaky tests in the code base grows, I've often found it becomes laborious to reliably keep track of them. And even if you can keep track of them, you then need to determine which ones are causing the biggest problems so you can focus on them first. Teams frequently start by trying to track this info in issues or a spreadsheet, but nobody really _wants_ to do that, so (in my experience) everyone eventually stops doing it, and now you're back to square one.
After experiencing this too many times, I set out to offer a way for teams to automatically detect, track, and rank flaky tests: https://buildpulse.io
Sharing here in case any other readers are in the same boat that I've found myself in so many times in the past.
Thank you for sharing your article, George. In my experience, flaky tests originate from three fundamental problems.
1) Synchronization issues - A synchronization issue comes from not having a precise understanding of the environment's state. Most automation engineers could eliminate the bulk of their flaky tests by mastering the following four synchronizations (translated into automation code):
1) Does an object exist at this exact moment in time?
2) Does an object not exist at this exact moment in time?
3) Does an object exist within this maximum amount of time, rechecking on this interval of time?
4) Does an object not exist within this maximum amount of time, rechecking on this interval of time?
These four fundamental synchronization methods will make a significant difference in reducing the flakiness of automation.
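For a web UI, those four checks map naturally onto immediate lookups and explicit waits. A minimal Selenium/Python sketch, assuming a locally available ChromeDriver; the page URL, locator, and timings are placeholders:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()               # assumes a local ChromeDriver
driver.get("https://example.com")         # placeholder page
locator = (By.ID, "save-button")          # placeholder locator

# 1) Does the object exist at this exact moment in time?
exists_now = len(driver.find_elements(*locator)) > 0

# 2) Does the object not exist at this exact moment in time?
absent_now = len(driver.find_elements(*locator)) == 0

# 3) Does the object appear within a maximum time, rechecking on an interval?
WebDriverWait(driver, timeout=10, poll_frequency=0.5).until(
    EC.presence_of_element_located(locator))

# 4) Does the object disappear within a maximum time, rechecking on an interval?
WebDriverWait(driver, timeout=10, poll_frequency=0.5).until(
    EC.invisibility_of_element_located(locator))
```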
2) Object Locator Strategy - There are so many different ways to identify an object. Each test engineer generally has their favorites, and I have mine as well. Regardless of approach, your locator strategy should be testable, without the need to see it work in the running automation. Chrome Developer Tools provides an excellent way to debug locators, allowing the automation engineer to tune and refine them before implementing them within the automation.
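Beyond the DevTools console, one way to keep a locator strategy testable on its own is a tiny standalone check that a locator matches exactly one element before it is wired into the suite. A hedged Python/Selenium sketch; the URL and selector are placeholders, not from the comment:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

def assert_unique_locator(driver, by, value):
    """Fail fast if a locator matches zero elements or more than one."""
    matches = driver.find_elements(by, value)
    assert len(matches) == 1, f"{by}={value!r} matched {len(matches)} element(s)"

driver = webdriver.Chrome()                           # assumes a local ChromeDriver
driver.get("https://example.com")                     # placeholder page
assert_unique_locator(driver, By.CSS_SELECTOR, "h1")  # placeholder locator
driver.quit()
```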
3) Automation Evaluation - The automation engineer should have an approach for evaluating what should and should not be automated. Developing automation can be like playing with Legos: an undisciplined automation engineer can be tempted to automate everything in sight, regardless of whether it is the right candidate for automation.
Here's a good article on these topics - https://testguild.com/podcast/automation/a278-greg-max/
Great points there, Greg!
In my experience, test flakiness is mostly due to:
a) the application under test not being able to handle concurrent hits or a series of very frequent hits, which certainly raises a concern about the application's performance;
b) an unstable or weak network connection.
If your application is not designed to scale properly, your tests will eventually fail.
> What I would like to see is a breakdown of how many failures fall into which category.
I conducted a small survey, which partially answers the question. The interesting part is that the main cause of instability is often environment issues. The survey results can be viewed at this link: https://docs.google.com/forms/d/1yMedOYcnA8VBuL-ROfcimv9v21wmiRB8xfjN-np4UHM/viewanalytics
This is very nicely written and includes everything that we encounter on a typical test automation day! I am looking forward to an equally nice article on fixing these causes of flakiness, or at least some creative workarounds to prevent them. I would also like to share, from my own experience, that automation tools themselves can be intrusive and induce flakiness as well.
Thank you, George, for your detailed description and categorization of the causes of flaky tests.
On a separate note, I do have a request - it would be very helpful if you could share your thoughts on the shift-left paradigm of software testing, where we try to emphasize unit and integration tests. In particular, I'm very interested in your thoughts on the value and defect coverage that unit and integration tests bring to the product.
Dealing with test flakiness is one of the main challenges of automated testing, and a critical skill, because automated tests that do not provide a consistent signal will slow down the entire development process... check here https://www.h2kinfosys.com/blog/quality-assurance-tutorials/