Testing Blog

Testing on the Toilet: What Makes a Good End-to-End Test?

Wednesday, September 21, 2016
Google
Labels: Adam Bender , TotT

20 comments :

  1. Sachin Sabale, September 22, 2016 at 7:18:00 AM PDT

    A few questions:
    1) Does Google have a manual testing team, or is everything automated?
    If Google has a manual testing team, would it be possible to have a blog post on how Google manages manual testing?

    2) Also, how is the automation team's performance measured at Google?
    More specifically, how do you count the percentage of testing that is automated for a project at Google?

    1. Adam Bender, September 22, 2016 at 8:10:00 AM PDT

      1) Google does have some degree of manual testing, though we try to focus human effort on finding interesting problems with our products. We try to avoid using manual testing as a way to catch regressions or verify basic functionality. You can probably imagine what kind of effort it would take to exhaustively test something like Google Search. For us, more automation is always the goal.

      2) There are a lot of ways to measure how well automated your testing effort is, and which ones matter most varies from team to team. I should mention, though, that we also try to avoid large 'test automation teams' - we treat testing as a shared responsibility and wherever possible ask the developers building the products to be involved in developing the associated automated tests.

      In any case, here are a few ways I think we look at automation performance:

      * How often are you finding bugs in manual tests if you have them? More bugs during manual testing suggests opportunities for improved automation.

      * How many defect reports are you getting from the field? Same logic as the above measure.

      * If you have identified critical use cases, to what degree is each use case covered by automation (unit, integration or functional tests)? Some things make more sense to invest in heavily.

      * How long does it take to release your product with a full test cycle? Long releases are often bottlenecked by slow manual testing; replacing it with automation will allow you to release faster.

      * How quickly and confidently are you able to make large scale changes? The more fear you have about change, the more likely you need additional automation!

    2. Vaibhav Zodge, September 22, 2016 at 12:34:00 PM PDT

      Good information. I have one question: for a critical use case, how do you track how much is automated at each test level and how much is manual? And how are the results consolidated to gain confidence?

    3. Adam Bender, September 22, 2016 at 4:40:00 PM PDT

      First, we expect all code to be thoroughly unit tested, so when it comes to small, fast tests we can hopefully assume that individual code paths have been covered completely. We can use coverage analysis to verify whether or not that is true.

      Unfortunately, at this level there is a bit of a 'forest-for-the-trees' problem, as many lines of code will be part of more than one feature. We don't usually make an effort to connect unit tests to critical feature coverage.

      When tests get larger and more distinct we have some internal test tracking tools that we can use to define user-level behaviors and then indicate tested or not and automated or not.
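
As a rough illustration of the kind of tracking described above, here is a minimal sketch in Python. It is not Google's internal tool; the `Behavior` record, its field names and the sample behaviors are all hypothetical, chosen only to show how "tested or not" and "automated or not" could be recorded per user-level behavior and then summarized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    """One user-level behavior and its test status (hypothetical fields)."""
    name: str
    tested: bool
    automated: bool


def automation_summary(behaviors):
    """Return the fraction of behaviors that are tested and that are automated."""
    total = len(behaviors)
    if total == 0:
        return {"tested": 0.0, "automated": 0.0}
    tested = sum(b.tested for b in behaviors)
    # A behavior only counts as automated if it is tested at all.
    automated = sum(b.tested and b.automated for b in behaviors)
    return {"tested": tested / total, "automated": automated / total}


# Illustrative data only: four made-up behaviors for an imaginary product.
behaviors = [
    Behavior("sign in", tested=True, automated=True),
    Behavior("reset password", tested=True, automated=False),
    Behavior("export report", tested=False, automated=False),
    Behavior("search", tested=True, automated=True),
]
summary = automation_summary(behaviors)
print(summary)
```

With the sample data, three of the four behaviors are tested and two are automated, so the summary reports 0.75 and 0.5.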

      When it comes to confidence, teams I have worked with tend to focus on things like defect rate, release failure rate and release time as high level metrics indicating overall code health.

    4. Anonymous, April 10, 2022 at 6:24:00 AM PDT

      Hi Adam, can you provide more details on how you measure defect rate, release failure rate and release time?

    5. Adam Bender, August 17, 2022 at 3:12:00 PM PDT

      Sure, but I don't think we are doing anything special here.

      Defect rate: The rate of new bugs being filed, either from users or in response to our own monitoring and quality processes.

      Release failure rate: Of the builds we intend to land in production, what percentage actually make it and stick. Common causes of release failure include a pre-release integration/E2E test failing, automatic monitoring detecting an issue, or an uptick in user-reported bugs. There is some nuance here because we (or you) can count things like cherry-picks into a prod release as a failure or as something else.

      Release time: How long does it take for your release process to run? Big complex tests tend to take a lot longer and that can become problematic over time. If it gets really bad the release time can actually prevent you from releasing as fast as you want. Imagine you have a test that takes 6 hours to run. If you assume a normal 8 hour day and a team that is located in a single timezone, the length of that one test means you can only release once per business day. Maybe you don't want to go faster on principle, but you almost certainly don't want one test to foreclose the opportunity to go faster.
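
The arithmetic behind these two metrics can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are made up, a "failed" release is any intended build that did not land and stick, and release cycles are assumed to run back to back within a single working day.

```python
import math


def release_failure_rate(intended, succeeded):
    """Fraction of intended production builds that did not land and stick."""
    if intended <= 0:
        raise ValueError("intended must be positive")
    return (intended - succeeded) / intended


def max_releases_per_day(workday_hours, release_hours):
    """Upper bound on back-to-back release cycles that fit in one working day."""
    if release_hours <= 0:
        raise ValueError("release_hours must be positive")
    return math.floor(workday_hours / release_hours)


# The example from the comment: a 6 hour test in an 8 hour day.
print(max_releases_per_day(8, 6))
# A hypothetical quarter in which 17 of 20 intended releases stuck.
print(release_failure_rate(20, 17))
```

With a 6 hour test and an 8 hour day only one cycle fits; cutting the test to 2 hours would raise the ceiling to four.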

  2. Anonymous, September 22, 2016 at 7:53:00 AM PDT

    > Be prepared to allocate at least one week a quarter per test to keep your end-to-end tests stable

    Could you please elaborate on this? If it takes a month of person-time per year per test, do you simply have very few (but very large?) E2E tests?

    1. Adam Bender, September 22, 2016 at 8:27:00 AM PDT

      First, consider that at our scale, Google's products are the result of many, many services working together. Keeping an E2E test stable when it involves 50+ service dependencies, many of which are being modified on a daily basis, is quite a task. It can take real engineering effort and often management support to ensure everyone responsible for those services is committed to enabling automated testing. Even if everyone involved is aiming for the same goal there is still going to be time spent making sure all this stuff actually works.

      As a result, we often recommend that teams employ fully E2E tests sparingly. We encourage teams to use unit and smaller integration testing much more heavily than E2E testing. As the number of systems involved in a test goes down, the test becomes faster, more reliable, easier to debug and less costly to maintain.

      If your products have fewer or slower-changing dependencies, you may find that a week per quarter is too high. However, a complex test is still software and will require maintenance effort proportional to its complexity.

  3. Anonymous, September 25, 2016 at 2:28:00 AM PDT

    Good info.
    When do you run your E2E tests? Are they triggered by a deployment, or do they run on a schedule?

    1. Adam Bender, September 26, 2016 at 6:09:00 PM PDT

      Each team will vary a little bit here based on the runtime of the test. Ideally, we would like to run E2E tests before every commit. However, because E2E tests are often quite slow we have to run them at a slightly lower frequency. Many teams will run these tests triggered by a commit, but after the code lands in the repo, waiting no more than a couple hours to trigger a run. Other teams will run a scheduled job every N hours to run all the known E2E tests.

      We tend to decouple the testing from the deployment and instead wait for a signal from our continuous integration tool letting us know that a particular commit is safe for deployment because all applicable tests have been run against it.
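
One way to picture that signal is as a simple gate over test results. The sketch below is an assumption-laden illustration, not a description of Google's continuous integration tool: the function name, the commit identifiers and the shape of the `results` mapping are all hypothetical.

```python
def safe_to_deploy(commit, applicable_tests, results):
    """Return True only if every applicable test has passed against this commit.

    `results` maps (commit, test_name) to True (passed) or False (failed).
    A missing entry means the test has not run against the commit yet,
    which is treated the same as a failure.
    """
    return all(results.get((commit, test), False) for test in applicable_tests)


# Hypothetical results: the E2E test has only run against the first commit.
results = {
    ("abc123", "unit"): True,
    ("abc123", "e2e_checkout"): True,
    ("def456", "unit"): True,
}
tests = ["unit", "e2e_checkout"]
print(safe_to_deploy("abc123", tests, results))
print(safe_to_deploy("def456", tests, results))
```

Treating "not yet run" as "not safe" is what lets slow E2E tests run after the commit lands without letting an unverified commit reach deployment.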

  4. Unknown, September 25, 2016 at 5:53:00 PM PDT

    Could you provide more information about "documenting common test failure modes"? Please give some examples.

    1. Adam Bender, September 26, 2016 at 6:18:00 PM PDT

      When a unit test fails you can look at the failure and often hop right to the exact line of code causing the problem. Unfortunately, in an E2E test there are often multiple things conspiring to fail your test, so such precision in fault-finding is usually unachievable. We often encourage those who write the tests to include human-readable documentation, either in test comments or on the internal team wiki, describing known reasons for a test to fail.

      For example, in the system outlined above we are dependent on an authentication system to complete our test. In some cases the authentication system may report an error authenticating a user because of a timeout in one of its upstream dependencies. If this happens more than once the engineer working on the test might write something like this on the internal team page:

      "If you see error logs indicating a timeout in the auth system, you can assume this was down to quota issues in Bigtable storage."

      From then on, anyone tasked with fixing the tests can refer to this note when attempting to diagnose a failure.
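
A note like this can also live next to the test as data, so that a failing run can point at it directly. The following is a minimal sketch of that idea, assuming the team keeps a small table of log patterns and notes; the pattern, the sample log line and the `diagnose` helper are hypothetical, and only the auth-timeout note itself comes from the example above.

```python
import re

# Documented failure modes: (pattern to look for in the logs, note for the reader).
# The regular expression is an illustrative guess at what such a log might contain.
KNOWN_FAILURE_MODES = [
    (
        re.compile(r"auth.*timeout", re.IGNORECASE),
        "Timeout in the auth system: assume quota issues in Bigtable storage.",
    ),
]


def diagnose(log_text):
    """Return the documented notes whose pattern matches the failure log."""
    return [note for pattern, note in KNOWN_FAILURE_MODES if pattern.search(log_text)]


# A made-up log line of the kind the note describes.
notes = diagnose("ERROR AuthService: timeout waiting for upstream")
for note in notes:
    print(note)
```

Whether the notes live in code, in test comments or on a wiki matters less than keeping them where the person debugging the failure will actually look.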

    2. Adam Bender, September 26, 2016 at 6:32:00 PM PDT

      Small typo above - should read "When a unit test fails you can look..."

  5. Anonymous, September 12, 2017 at 12:34:00 PM PDT

    Great article! Quick question: does Google have automated UI tests? If yes, how is it decided which UI tests to automate, when they are run, etc.? Thanks!

    1. Adam Bender, September 18, 2017 at 11:30:00 AM PDT

      Hey Maki, glad you enjoyed it. Yes, Google has lots of automated UI tests. We use a variety of different tools depending on the platforms we are targeting (Android, iOS, desktop, or mobile web). To decide which things to test via the UI we often look at a combination of things:

      * Can this test be written as a unit test and be just as effective? If so, write it as a unit test. The bigger a test, and the more infrastructure involved, the harder it will be to make the test reliable.

      * Do I care how the UI looks (pixel for pixel) as a result of this one behavior? If so then use a screenshot-based testing tool, but constrain the viewport as much as possible to only focus on the elements that are relevant to the specific behavior.

      * If I don't care about pixel-for-pixel looks then I might use something like Selenium/WebDriver to drive the test. These tests take about as long as a screenshot test but are often more robust against change.

      With those last two bullets it pays to be a little careful: it can be tempting to throw everything into a UI test, but many times those tests are very slow and can become flaky easily. Make sure you use good practices like 'waiting' instead of just asserting, and avoid testing every possible behavior with a UI test. In many cases different behaviors only affect the UI minimally, and sometimes a single test can tell you about how a whole class of behaviors will look (think password validation rules). Finally, we run these tests all the time, ideally with each change to the code base.
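
To make 'waiting' instead of just asserting concrete, here is a minimal polling helper in plain Python. It is a sketch rather than any particular tool's API: `wait_until` and the `FakePage` stand-in are hypothetical names, and a real UI test would poll the actual page instead. Selenium's `WebDriverWait` offers the same kind of explicit wait for WebDriver-based tests.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.1):
    """Poll condition() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class FakePage:
    """Stand-in for a UI whose banner only appears after a few polls."""

    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after

    def banner_text(self):
        self.polls += 1
        return "Saved" if self.polls >= self.ready_after else None


page = FakePage(ready_after=3)
# Asserting immediately would see None and fail; waiting gives the UI time to settle.
text = wait_until(page.banner_text, timeout=2.0, interval=0.01)
print(text)
```

The timeout still bounds how long a genuinely broken UI can stall the test, so the wait trades a little runtime for far fewer flaky failures.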

      Hope that helps!

  6. James, March 1, 2018 at 2:56:00 AM PST

    Hi Adam,

    Good stuff tx

    Having a debate on testing error messages.. I say no. Waste of time and brittle. Perhaps check on the 'presence' of a message but certainly not the text itself. Any thoughts?

    Thanks,

    James

    1. Adam Bender, March 2, 2018 at 3:43:00 PM PST

      Hey James, I'm glad you enjoyed the article!

      I've learned to avoid absolutes so when it comes to 'testing error messages' I would say "it depends" :)

      First of all we need to be clear who the intended audience of the error message is. If it is an end user, then I might want to test it precisely because that message may contain information to be passed onto customer support. Even more so if i18n & a11y are required.

      If the audience is developers (command line or debug console), then unless the error message format is critical to being able to diagnose and troubleshoot a problem, I'm with you: confirming the existence of the message may be enough.

      Regardless, when checking strings it is best to be very precise. Instead of trying to match entire strings, you should check that the string contains the specific information (error code, account #, etc.). Normally checking this kind of data amounts to validating that a template was filled in correctly, and it allows the test to be made fairly resilient to formatting and wording changes.
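
A small Python sketch of that approach follows. The message template, the error code and the account number are invented for illustration; the point is only that the check looks for the filled-in details rather than comparing the whole string.

```python
def render_error(code, account):
    """Hypothetical end-user message template; the wording may change over time."""
    return (
        f"Sorry, something went wrong (error {code}). "
        f"Please contact support and quote account #{account}."
    )


def missing_details(message, expected_details):
    """Return the expected details that do not appear anywhere in the message."""
    return [detail for detail in expected_details if detail not in message]


message = render_error("E1234", "998877")

# Brittle: an exact match would break as soon as the wording is edited.
# Resilient: only require that the template was filled in with the right data.
assert missing_details(message, ["E1234", "998877"]) == []
print("error message contains the expected details")
```

If the copy is later reworded, this check keeps passing as long as the error code and account number still appear.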

    2. James, March 3, 2018 at 12:16:00 AM PST

      Cheers Adam.. appreciate your timely reply

  7. Stefan, September 10, 2021 at 6:05:00 AM PDT

    Hi Adam,

    I am trying to figure out how to integrate a testing tool into my CI/CD pipeline to automate tests for my project. Currently we run all of our tests manually: unit, service and UI tests.

    So my questions are:
    What are the best ways to automate E2E testing?
    What are the main tips and tricks?
    Which tools are the best to integrate in CI/CD pipeline?
    Is it even possible to automate testing completely in my CI/CD?

    Cheers,

    Stefan

  8. Adam Bender, February 18, 2022 at 5:13:00 PM PST

    Hey Stefan, apologies for the delayed response here! You asked a lot of really good questions; unfortunately, I'm probably not going to have a satisfying answer. There are whole virtual bookshelves on Amazon that try to answer these questions. Choosing technologies, adoption strategies, and trading off on test coverage are all hard problems that are often very specific to your team or company. So my first piece of advice is to go pick up some books and hunt around on Twitter or the rest of the web for people who've written about this subject.

    One thing to keep in mind is that you want lots of opinions. There isn't one right way to tackle these kinds of problems, so make sure to seek out sources that seem to disagree with each other. Figure out why, then think about how that might apply to your situation.

    Next, never lose track of the ultimate goal here. Your job is probably to ship working software. E2E tests, CI, CD - they are tools that can help you with that goal, but you need to define what "working" means, and then you need to choose strategies that make that happen. It could be SLOs, it could be counting bug reports, or performance metrics - whatever it is, start by defining "working" for your team and your company. Then use automated testing to help get there.

    After that, keep trade-offs in mind. Given enough time you could conceivably test every behavior in the system. However, you will eventually need to ship something. You need to figure out the right balance of different test types including E2E tests, unit tests, exploratory tests - or whatever. That balance will be tuned to your company's needs. Again, keep in mind that the goal is to ship software, and balance accordingly.

    Last, don't be afraid to experiment. Try some ideas, see what works well and what doesn't, stop doing things that aren't working, make time to try new stuff. As you move through your experiments, communicate actively with your team and your management. You don't need to change everything all at once; plan on running lots of little experiments over time.

    Getting all of this up and running took Google years and years. We may have been bigger but you will face a lot of the same challenges. You should plan to spend a long time trying to work out the right approach for you and your products. That's ok, sometimes doing good things takes time :)

    Oh and one more thing, we do cover this a fair amount in the book Software Engineering at Google, which you can read for free here: https://abseil.io/resources/swe_at_google.2.pdf

    Or you can buy a copy on Amazon.

    Hope that helps!


The comments you read and contribute here belong only to the person who posted them. We reserve the right to remove off-topic comments.

  
