A few questions:
1) Does Google have a manual testing team, or is everything automated? If Google does have a manual testing team, would it be possible to write a blog post on how manual testing is managed at Google?
2) Also, how do you measure an automation team's performance at Google? More specifically, how do you count the percentage of testing that is automated for a project at Google?
1) Google does have some degree of manual testing, though we try to focus human effort on finding interesting problems with our products. We try to avoid using manual testing as a way to catch regressions or verify basic functionality. You can probably imagine what kind of effort it would take to exhaustively test something like Google Search. For us, more automation is always the goal.
2) There are a lot of ways to measure how well automated your testing effort is, and which ones matter most varies from team to team. Though I should mention that we also try to avoid large 'test automation teams' - we treat testing as a shared responsibility and wherever possible ask the developers building the products to be involved in developing the associated automated tests.
In any case, here are a few ways I think we look at automation performance:
* How often are you finding bugs in manual testing, if you still do it? More bugs found during manual testing suggests opportunities for improved automation.
* How many defect reports are you getting from the field? Same logic as the above measure.
* If you have identified critical use cases, to what degree is each use case covered by automation (unit, integration or functional tests)? Some things make more sense to invest in heavily.
* How long does it take to release your product with a full test cycle? Longer releases are often bottlenecked by slow manual testing; replacing it with automation will allow you to release faster.
* How quickly and confidently are you able to make large scale changes? The more fear you have about change, the more likely you need additional automation!
Good information. I have one question: how do you track how much of a critical use case is automated, at which test level, and what part is manual? And how are the results consolidated to gain confidence?
First, we expect all code to be thoroughly unit tested, so hopefully when it comes to small, fast tests we can assume that individual code paths have been covered completely. We can use coverage analysis to verify whether or not that is true.
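To make that concrete, here is a minimal sketch - in Python, with made-up names, not our internal tooling - of what 'unit tests plus coverage analysis' can look like in practice:

```python
# Illustrative only: a tiny function, a unit test for each code path, and the
# open-source coverage.py commands one might use to confirm both paths are hit.
import unittest


def normalize_username(raw: str) -> str:
    """Trim whitespace and lower-case a username; reject empty input."""
    cleaned = raw.strip().lower()
    if not cleaned:
        raise ValueError("username must not be empty")
    return cleaned


class NormalizeUsernameTest(unittest.TestCase):
    def test_trims_and_lowercases(self):
        self.assertEqual(normalize_username("  Alice "), "alice")

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            normalize_username("   ")


if __name__ == "__main__":
    unittest.main()

# Coverage analysis could then verify that both branches are exercised, e.g.:
#   coverage run -m unittest discover && coverage report -m
```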
Unfortunately, at this level there is a bit of a 'forest for the trees' problem, as many lines of code will be part of more than one feature. We don't usually make an effort to connect unit tests to critical feature coverage.
When tests get larger and more distinct, we have some internal test-tracking tools that we can use to define user-level behaviors and then record whether each one is tested and whether it is automated.
When it comes to confidence, teams I have worked with tend to focus on things like defect rate, release failure rate and release time as high level metrics indicating overall code health.
Hi Adam, can you provide more details on how you measure defect rate, release failure rate and release time?
Sure, but I don't think we are doing anything special here.
Defect rate: The rate of new bugs being filed, either from users or in response to our own monitoring and quality processes.
Release failure rate: Of the builds we intend to land in production, what percentage actually make it and stick. Common causes of release failure include: pre-release integration/E2E tests failing, automatic monitoring detecting an issue, or an uptick in user-reported bugs. There is some nuance here because we (or you) can count things like cherry-picks into a prod release as a failure or as something else.
Release time: How long does it take for your release process to run? Big complex tests tend to take a lot longer and that can become problematic over time. If it gets really bad the release time can actually prevent you from releasing as fast as you want. Imagine you have a test that takes 6 hours to run. If you assume a normal 8 hour day and a team that is located in a single timezone, the length of that one test means you can only release once per business day. Maybe you don't want to go faster on principle, but you almost certainly don't want one test to foreclose the opportunity to go faster.
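If it helps, here is a rough sketch - with hypothetical numbers, not our actual tooling - of how you might compute these from counts your bug tracker and release process already give you:

```python
# Illustrative helpers only; the inputs are made up for the example.

def defect_rate(new_bugs_filed: int, days: int) -> float:
    """New bugs filed per day over a window (from users or from monitoring)."""
    return new_bugs_filed / days


def release_failure_rate(attempted_releases: int, failed_releases: int) -> float:
    """Share of builds intended for production that did not make it and stick."""
    return failed_releases / attempted_releases


def max_releases_per_day(workday_hours: float, release_test_hours: float) -> int:
    """Upper bound on daily releases if the test cycle gates every release."""
    return int(workday_hours // release_test_hours)


print(defect_rate(new_bugs_filed=21, days=7))                          # 3.0 per day
print(release_failure_rate(attempted_releases=20, failed_releases=3))  # 0.15
# The 6-hour example above: an 8-hour day caps you at one release per day.
print(max_releases_per_day(workday_hours=8, release_test_hours=6))     # 1
```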
> Be prepared to allocate at least one week a quarter per test to keep your end-to-end tests stable
Could you please elaborate on this? If it takes a month of person-time per year per test, do you simply have very few (but very large?) E2E tests?
First, consider that at our scale, Google's products are the result of many, many services working together. Keeping an E2E test stable when it involves 50+ service dependencies, many of which are being modified on a daily basis, is quite a task. It can take real engineering effort and often management support to ensure everyone responsible for those services is committed to enabling automated testing. Even if everyone involved is aiming for the same goal there is still going to be time spent making sure all this stuff actually works.
As a result, we do often recommend that teams employ full E2E tests sparingly. We encourage teams to use unit and smaller integration testing much more heavily than E2E testing. As the number of systems involved in a test goes down, tests become faster, more reliable, easier to debug, and less costly to maintain.
If your products have fewer or slower-changing dependencies, you may find that a week per quarter is too high. However, a complex test is still software and will require maintenance effort proportional to its complexity.
Good info.
When do you run your E2E tests? Are they triggered by a deployment, or are they scheduled runs?
Each team will vary a little bit here based on the runtime of the test. Ideally, we would like to run E2E tests before every commit. However, because E2E tests are often quite slow we have to run them at a slightly lower frequency. Many teams will run these tests triggered by a commit, but after the code lands in the repo, waiting no more than a couple hours to trigger a run. Other teams will run a scheduled job every N hours to run all the known E2E tests.
We tend to decouple the testing from the deployment and instead wait for a signal from our continuous integration tool letting us know that a particular commit is safe for deployment because all applicable tests have been run against it.
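As a sketch of that decoupling - the ci_client and deployer APIs below are made up, since the real signal depends on whichever CI system you use - the deploy step might look something like this:

```python
# Hypothetical example: the deploy step runs no tests itself; it only deploys a
# commit that the CI system has already marked green for every required suite.

REQUIRED_SUITES = ("unit", "integration", "e2e")


def is_safe_to_deploy(ci_client, commit_sha: str) -> bool:
    """True only if every required suite has a passing result for this commit."""
    results = ci_client.get_results(commit_sha)  # e.g. {"unit": "PASS", ...}
    return all(results.get(suite) == "PASS" for suite in REQUIRED_SUITES)


def deploy_latest_safe_commit(ci_client, deployer, candidate_shas):
    """Walk candidates from newest to oldest and roll out the first safe one."""
    for sha in candidate_shas:
        if is_safe_to_deploy(ci_client, sha):
            deployer.roll_out(sha)
            return sha
    return None  # nothing is green yet; try again on the next schedule
```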
Could you provide more information about "documenting common test failure modes"? Please give some examples.
When a unit test fails you can look at the failure and often hop right to the exact line of code causing the problem. Unfortunately, in an E2E test there are often multiple things conspiring to fail your test, so such precision in fault-finding is usually unachievable. We often encourage those who write the tests to include human-readable documentation, either in test comments or on the internal team wiki, documenting known reasons for a test to fail.
For example, in the system outlined above we are dependent on an authentication system to complete our test. In some cases the authentication system may report an error authenticating a user because of a timeout in one of its upstream dependencies. If this happens more than once, the engineer working on the test might write something like this on the internal team page:
"If you see error logs indicating a timeout in the auth system, you can assume this was down to quota issues in bigtable storage."
From then on, anyone tasked with fixing the tests can refer to this note when attempting to diagnose a failure.
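The same note can also live right next to the test. Here is a sketch of what that might look like; the class and helper names are invented, and the point is the docstring that records known failure modes:

```python
# Hypothetical E2E test whose docstring documents common, known failure modes.
import unittest


@unittest.skip("sketch only: depends on hypothetical auth_client/checkout_flow helpers")
class CheckoutE2ETest(unittest.TestCase):
    """End-to-end checkout test.

    Known failure modes (see the team wiki for details):
      * Timeout errors from the auth system usually trace back to quota issues
        in its storage backend, not to a regression in checkout itself.
      * A 503 from the payments sandbox usually means its weekly reset is
        running; re-run after 30 minutes before filing a bug.
    """

    def test_signed_in_user_can_complete_checkout(self):
        session = auth_client.sign_in("test-user@example.com")
        receipt = checkout_flow.purchase(session, item_id="sku-123")
        self.assertTrue(receipt.confirmed)
```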
Great article! Quick question: does Google have automated UI tests? If yes, how is it decided which UI tests to automate, when they are run, etc.? Thanks!
Hey Maki, glad you enjoyed it. Yes, Google has lots of automated UI tests. We use a variety of different tools depending on the platforms we are targeting (Android, iOS, desktop, or mobile web). To decide which things to test via the UI we often look at a combination of things:
* Can this test be written as a unit test and be just as effective? If so, write it as a unit test. The bigger a test, and the more infrastructure involved, the harder it will be to make the test reliable.
* Do I care how the UI looks (pixel for pixel) as a result of this one behavior? If so, use a screenshot-based testing tool, but constrain the viewport as much as possible to focus only on the elements that are relevant to the specific behavior.
* If I don't care about pixel-for-pixel looks then I might use something like Selenium/WebDriver to drive the test. These tests take about as long as a screenshot test but are often more robust against change.
With those last two bullets it pays to be a little careful: it can be tempting to throw everything into a UI test, but those tests are often very slow and can easily become flaky. Make sure you use good practices like 'waiting' instead of just asserting, and avoid testing every possible behavior with a UI test. In many cases different behaviors only affect the UI minimally, and sometimes a single test can tell you how a whole class of behaviors will look (think password validation rules). Finally, we run these tests all the time, ideally with each change to the code base.
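To illustrate the 'waiting' point, here is a small hypothetical Selenium WebDriver example in Python; the URL and element IDs are placeholders:

```python
# Sketch: explicitly wait for the UI to reach the expected state instead of
# asserting immediately, which absorbs normal rendering delays and reduces flakes.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/login")
    driver.find_element(By.ID, "username").send_keys("test-user")
    driver.find_element(By.ID, "password").send_keys("not-a-real-password")
    driver.find_element(By.ID, "submit").click()

    # Wait up to 10 seconds for the post-login element to appear.
    greeting = WebDriverWait(driver, timeout=10).until(
        EC.visibility_of_element_located((By.ID, "greeting"))
    )
    assert "Welcome" in greeting.text
finally:
    driver.quit()
```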
Hope that helps!
Hi Adam,
Good stuff, thanks.
We're having a debate on testing error messages. I say no: a waste of time and brittle. Perhaps check on the 'presence' of a message, but certainly not the text itself. Any thoughts?
Thanks,
James
Hey James, I'm glad you enjoyed the article!
I've learned to avoid absolutes, so when it comes to 'testing error messages' I would say "it depends" :)
First of all, we need to be clear about who the intended audience of the error message is. If it is an end user, then I might want to test it precisely, because that message may contain information to be passed on to customer support. Even more so if i18n and a11y are required.
If the audience is developers (command line or debug console), then unless the error message format is critical to being able to diagnose and troubleshoot a problem, I'm with you: confirming the existence of the message may be enough.
Regardless, when checking strings it is best to be precise about what you actually care about. Instead of trying to match entire strings, check that the string contains the specific information (error code, account #, etc.). Normally, checking this kind of data amounts to validating that a template was filled in correctly, and it allows the test to be made fairly resilient to formatting and wording changes.
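For example, a test along these lines - where the message format and the build_quota_error helper are made up for illustration - stays stable across wording tweaks while still catching a broken template:

```python
# Sketch: assert on the data the message must carry, not on the exact prose.
import unittest


def build_quota_error(error_code: str, account_id: str) -> str:
    """Stand-in for whatever template rendering a real application does."""
    return (f"Request failed ({error_code}): account {account_id} is over its "
            f"storage quota. Please contact support and quote the code above.")


class QuotaErrorMessageTest(unittest.TestCase):
    def test_message_carries_the_data_support_needs(self):
        msg = build_quota_error("ERR-1042", "acct-77")
        # Check the pieces a user must relay to support; wording can change freely.
        self.assertIn("ERR-1042", msg)
        self.assertIn("acct-77", msg)


if __name__ == "__main__":
    unittest.main()
```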
Cheers Adam, appreciate your timely reply.
Hi Adam,
I am trying to figure out how to integrate a testing tool into my CI/CD pipeline to automate tests for my project. Currently we do all of our tests manually: unit, service, and UI tests.
So my questions are:
What are the best ways to automate E2E testing?
What are the main tips and tricks?
Which tools are the best to integrate into a CI/CD pipeline?
Is it even possible to automate testing completely in my CI/CD?
Cheers,
Stefan
Hey Stefan, apologies for the delayed response here! You asked a lot of really good questions; unfortunately, I'm probably not going to have a satisfying answer. There are whole virtual bookshelves on Amazon that try to answer these questions. Choosing technologies, adoption strategies, and trading off on test coverage are all hard problems that are often very specific to your team or company. So my first piece of advice is to go pick up some books and hunt around on Twitter or the rest of the web for people who've written about this subject.
One thing to keep in mind is that you want lots of opinions. There isn't one right way to tackle these kinds of problems, so make sure to seek out sources that seem to disagree with each other. Figure out why, then think about how that might apply to your situation.
Next, never lose track of the ultimate goal here. Your job is probably to ship working software. E2E tests, CI, CD - they are tools that can help you with that goal, but you need to define what "working" means, and then you need to choose strategies that make that happen. It could be SLOs, it could be counting bug reports, or performance metrics - whatever it is, start by defining "working" for your team and your company. Then use automated testing to help get there.
After that, keep trade-offs in mind - given enough time you could conceivably test every behavior in the system. However, you will eventually need to ship something. You need to figure out the right balance of different test types, including E2E tests, unit tests, exploratory tests - or whatever. That balance will be tuned to your company's needs. Again, keep in mind the goal to ship software, and balance accordingly.
Last, don't be afraid to experiment. Try some ideas, see what works well and what doesn't, stop doing things that aren't working, and make time to try new stuff. As you move through your experiments, communicate actively with your team and your management. You don't need to change everything all at once; plan on running lots of little experiments over time.
Getting all of this up and running took Google years and years. We may have been bigger but you will face a lot of the same challenges. You should plan to spend a long time trying to work out the right approach for you and your products. That's ok, sometimes doing good things takes time :)
Oh and one more thing, we do cover this a fair amount in the book Software Engineering at Google, which you can read for free here: https://abseil.io/resources/swe_at_google.2.pdf
Or you can buy a copy on Amazon.
Hope that helps!