We thought you might be interested in another article from our internal monthly testing newsletter called CODE GREEN... Originally titled: "Opinion: But it works on my machine!"
We spend so much time hearing about "make your tests small and run fast." While this is important for quick CL verification, system level testing is important, too, and it doesn't get enough air time.
You write cool features. You write lots and lots of unit tests to make sure your features work. You make sure the unit tests run as part of your project's continuous build. Yet when the QA engineer tries out a few user scenarios, she finds many defects. She logs them as bugs. You try to reproduce them, but ... you can't! Sound familiar? It might to a lot of you who deal with complex systems that touch many other dependent systems. Want to test a simple service that just talks to a database? Simple: write a few unit tests with a mocked-out database. Want to test a service that connects to an authentication service to manage user accounts, talks to a risk engine, a biller, and a database? Now that's a different story!
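For the "simple service that just talks to a database" case, a unit test with the database mocked out might look something like the sketch below. The UserService and UserDatabase names are invented for illustration, and it assumes JUnit 4 and Mockito are available:

  import static org.junit.Assert.assertEquals;
  import static org.mockito.Mockito.mock;
  import static org.mockito.Mockito.when;

  import org.junit.Test;

  public class UserServiceTest {
    // Hypothetical collaborator interface the service talks to instead of a real database.
    interface UserDatabase {
      String lookupDisplayName(String userId);
    }

    // Hypothetical service under test: pure logic on top of the database interface.
    static class UserService {
      private final UserDatabase db;
      UserService(UserDatabase db) { this.db = db; }
      String greet(String userId) {
        String name = db.lookupDisplayName(userId);
        return name == null ? "Hello, guest!" : "Hello, " + name + "!";
      }
    }

    @Test
    public void greetsKnownUserByName() {
      UserDatabase db = mock(UserDatabase.class);
      when(db.lookupDisplayName("u123")).thenReturn("Ada");
      assertEquals("Hello, Ada!", new UserService(db).greet("u123"));
    }

    @Test
    public void greetsUnknownUserAsGuest() {
      UserDatabase db = mock(UserDatabase.class);
      when(db.lookupDisplayName("nobody")).thenReturn(null);
      assertEquals("Hello, guest!", new UserService(db).greet("nobody"));
    }
  }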
System level tests to the rescue!

So what are system level tests again? They're also referred to as integration tests, scenario tests, and end-to-end tests. No matter what they're called, these tests are a vital part of any test strategy. They wait for screen responses, they punch in HTML form fields, they click on buttons and links, they verify text on the UI (sometimes in different languages and locales). Heck, sometimes they even poke open inboxes and verify email content!

But I have a gazillion unit tests and I don't need system level tests! Sure you do. Unit tests are an invaluable part of the agile developer's tool kit: they help you quickly verify that your latest code changes haven't caused your existing code to regress. But when code is finally packaged and deployed, it can look and behave very differently. And no amount of unit tests can tell you whether that awesome UI feature you designed works the way it was intended, or whether one of the services your feature depends on is broken or misbehaving. If you think of a "testing diet," system level tests are like carbohydrates -- they are a crucial part of your diet, but only in the right amount! System level tests provide that sense of comfort that everything works the way it should when it lands in the customer's hands. In short, they're the closest thing to simulating your customers. And that makes them pretty darn valuable.

Wait a minute -- how stable are these tests? Very good question. It should be pretty obvious that if you test a full-blown deployment of any large, complex system, you're going to run into some stability issues, especially since large, complex systems consist of components that talk to many other components, sometimes asynchronously. And real-world systems aren't perfect. Sometimes the database doesn't respond at all, sometimes the web server responds a few seconds late, and sometimes a simple confirmation message takes forever to reach an email inbox! Automated system level tests are sensitive to such issues, and sometimes report false failures. The key is utilizing them effectively, quickly identifying and fixing false failures, and pairing them up with the right set of small, fast tests.
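To make "punch in form fields, click buttons, verify text" concrete, here is a minimal sketch of what one such test might look like with Selenium WebDriver. The URL, element names, and confirmation text are hypothetical placeholders for a real application:

  import static org.junit.Assert.assertTrue;

  import org.junit.Test;
  import org.openqa.selenium.By;
  import org.openqa.selenium.WebDriver;
  import org.openqa.selenium.firefox.FirefoxDriver;

  public class SignupScenarioTest {
    @Test
    public void newUserSeesConfirmationMessage() {
      // Drives a real browser against a deployed instance; the URL and element
      // names below are hypothetical placeholders for your own application.
      WebDriver driver = new FirefoxDriver();
      try {
        driver.get("https://staging.example.com/signup");
        driver.findElement(By.name("email")).sendKeys("test-user@example.com");
        driver.findElement(By.name("password")).sendKeys("not-a-real-password");
        driver.findElement(By.id("submit")).click();

        // Verify text on the UI, just as a user would see it.
        assertTrue(driver.getPageSource().contains("Thanks for signing up"));
      } finally {
        driver.quit();
      }
    }
  }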
[Side note: Even before his first day at the Googleplex, Jason showed an amazing dedication to Google. After leaving Chicago with the big moving truck, he and his family had to stop after just a few hours because of an ice storm. Cars sliding off the road left and right. Further along, in Kansas, one of his kids caught pneumonia. His family stayed in the local hospital while Jason drove on. Heading west, there was a big snow storm in Colorado, which he wanted to avoid. He drove further south and ended up in a rare (but really real) dust storm over the Border States. He promised us some great video footage of his drive through tumbleweeds. He finally arrived and has settled in and is looking forward to calmer times in the Bay Area. After that trip, he isn't even worried about earthquakes, fires, mud slides, or traffic on the 101.]
A couple of GTAC videos with Jason: Selenium RC, Selenium vs. WebDriver
CG: Have you discovered any interesting teams using Selenium in ways you'd never intended?
Huggins: At my previous company, I did see some developers write Selenium scripts to create their time and expense reports for them from YAML or XLS files. Since we hadn't exposed a back-end API, automating the browser for data entry was the next best thing. Selenium was never designed for that purpose either, but I also started (ab)using it for coded bug reports. Asking users for steps on how to reproduce a bug naturally lends itself to looking like a Selenium test for that bug. Also, I've used the Selenium IDE Firefox plug-in to enter NBC's "Deal or No Deal" contest on their website from home, but I stopped doing that when I read in the fine print that the use of automation tools to enter their contest was grounds for disqualification.
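The coded-bug-report idea translates almost directly into a script: instead of prose repro steps, the bug carries a small test that walks the exact steps and asserts the behavior the reporter expected. A hypothetical sketch, with the bug number, URL, selectors, and expected total all invented for illustration:

  import static org.junit.Assert.assertEquals;

  import org.junit.Test;
  import org.openqa.selenium.By;
  import org.openqa.selenium.WebDriver;
  import org.openqa.selenium.firefox.FirefoxDriver;

  public class Bug1234ReproTest {
    // Coded bug report: the repro steps from the bug live here instead of in prose.
    @Test
    public void discountIsAppliedOnceNotTwice() {
      WebDriver driver = new FirefoxDriver();
      try {
        driver.get("https://staging.example.com/cart");
        driver.findElement(By.id("promo-code")).sendKeys("SAVE10");
        driver.findElement(By.id("apply")).click();
        driver.findElement(By.id("apply")).click();  // The step that triggers the bug.

        // Expected behavior; fails until the bug is fixed, then serves as a regression test.
        assertEquals("$90.00", driver.findElement(By.id("total")).getText());
      } finally {
        driver.quit();
      }
    }
  }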
CG: What advice do you have to offer Google groups interested in Selenium?
Huggins: Well, one of the biggest drawbacks with user interface testing tools is that they're slow for various reasons. One way to bring the test run times down is to run them in parallel on a grid of servers, instead of sequentially. Of course, that isn't news to your average Googler. Engineers would be more likely to run automated browser UI tests if they could run 1000 tests in 1 minute total time on 1000 machines instead of 1000 tests in 1000 minutes on 1 machine. Sadly, though, most projects allocate only one machine, maybe two, to browser testing. I'm really excited to come to Google with the resources, the corporate interest, and the internal client base to make a large scale Selenium UI test farm possible. Eventually, I'd like to take Selenium in some new directions that we'll talk about in later blog posts. But I'm getting ahead of myself. I have to survive Noogler training first.
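As a rough sketch of the grid idea: each test asks a shared hub for a browser instead of starting one locally, and a test runner configured for parallel execution (for example, forking JUnit classes across workers) fans the tests out over the grid's machines. The hub URL below is a hypothetical placeholder:

  import java.net.URL;

  import org.openqa.selenium.WebDriver;
  import org.openqa.selenium.firefox.FirefoxOptions;
  import org.openqa.selenium.remote.RemoteWebDriver;

  public class GridDriverFactory {
    // Points each test at a shared Selenium Grid hub instead of a local browser,
    // so many tests can run in parallel on the grid's worker machines.
    static WebDriver newGridDriver() throws Exception {
      return new RemoteWebDriver(
          new URL("http://selenium-hub.example.com:4444/wd/hub"),
          new FirefoxOptions());
    }
  }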
Posted by George Pirocanac, Test Engineering Manager

Earlier blog entries described the strategy and methodology for testing the functionality of various kinds of applications. The basic approach is to isolate the logic of the application from the external API calls it makes, through the use of various constructs called mocks, fakes, dummy routines, etc. Depending on how the application is designed and written, this can lead to smaller, simpler tests that cover more of the code, execute more quickly, and lead to quicker diagnosis of problems than the larger end-to-end or system tests. On the other hand, they are not a complete replacement for end-to-end testing. By their very nature, the small tests don't test the assumptions and interactions between the application and the APIs that it calls. As a result, a diversified application testing strategy includes small, medium, and large tests. (See Copeland's GTAC video; fast forward about 5 minutes in to hear a brief description of developer testing and small, medium, and large tests.)
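As a tiny illustration of one of those constructs, a fake is a lightweight, working stand-in for the external dependency, as opposed to a mock that only replays canned answers. The UserStore interface here is hypothetical; small tests would wire the application to the in-memory fake instead of the real backend:

  import java.util.HashMap;
  import java.util.Map;

  // Hypothetical external API the application depends on.
  interface UserStore {
    void save(String userId, String displayName);
    String load(String userId);
  }

  // A "fake": an in-memory implementation with real (if simplified) behavior,
  // suitable for small tests that must not touch the production backend.
  class InMemoryUserStore implements UserStore {
    private final Map<String, String> users = new HashMap<>();

    @Override public void save(String userId, String displayName) {
      users.put(userId, displayName);
    }

    @Override public String load(String userId) {
      return users.get(userId);
    }
  }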
What about testing the APIs themselves? What, if anything, is different? The first approach mirrors the small test approach. Each of the API calls is exercised with a variety of inputs, and the outputs are verified against the specification. For isolated, stateless APIs (math library functions come to mind), this can be very effective by itself. However, many APIs are not isolated or stateless, and their results can vary according to the *combinations* of calls that were made. One way to deal with this is to analyze the dependencies between the calls and create mini-applications to exercise and verify these combinations of calls. Often these fall into the so-called typical usage patterns or user scenarios. While good, this first approach only gives us limited confidence. We also need to test what happens when not-so-typical sets of calls are made. Often application writers introduce usage patterns that the spec didn't anticipate.
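For a stateful API, "testing combinations of calls" can be as simple as small scenario tests that exercise sequences rather than single calls. A minimal, self-contained sketch with an invented Counter API and JUnit 4:

  import static org.junit.Assert.assertEquals;

  import org.junit.Test;

  public class CounterApiCombinationTest {
    // Hypothetical stateful API: results depend on the sequence of prior calls,
    // so exercising each call in isolation is not enough.
    static class Counter {
      private int value;
      void increment() { value++; }
      void reset() { value = 0; }
      int value() { return value; }
    }

    @Test
    public void incrementAloneIsCorrect() {
      Counter c = new Counter();
      c.increment();
      assertEquals(1, c.value());
    }

    @Test
    public void resetAfterIncrementsClearsState() {
      // A "mini-application": verifies a combination of calls, not a single call.
      Counter c = new Counter();
      c.increment();
      c.increment();
      c.reset();
      c.increment();
      assertEquals(1, c.value());
    }
  }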
Another approach is to capture the API calls made by real applications under controlled situations and then replay just those calls under the same controlled situations. These types of tests fall into the medium category. Again, the idea is to test series and combinations of calls, but the difficulty can lie in recreating the exact environment. In addition, this approach is vulnerable to building tests that traverse the same paths in the code. Adding fuzz to the parameters and call patterns can mitigate, but not eliminate, this problem.
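In outline, such a capture-and-replay harness might look like the sketch below: one method replays a recorded trace of calls as-is, and a second perturbs ("fuzzes") the recorded parameters so the replay doesn't simply retrace the same code paths. The Call and TargetApi types are invented for illustration:

  import java.util.List;
  import java.util.Random;

  public class RecordedCallReplayer {
    // Hypothetical shape of a captured API call: a method name plus one int argument.
    static class Call {
      final String method;
      final int argument;
      Call(String method, int argument) { this.method = method; this.argument = argument; }
    }

    // Hypothetical API under test.
    interface TargetApi {
      void put(int value);
      int get();
    }

    // Replays a captured trace against the API in the same controlled environment.
    static void replay(List<Call> trace, TargetApi api) {
      for (Call call : trace) {
        if (call.method.equals("put")) {
          api.put(call.argument);
        } else if (call.method.equals("get")) {
          api.get();
        } else {
          throw new IllegalArgumentException("unknown call: " + call.method);
        }
      }
    }

    // The same trace with small random perturbations on the parameters, to push
    // the replay off the exact code paths the original recording already covered.
    static void replayWithFuzz(List<Call> trace, TargetApi api, long seed) {
      Random random = new Random(seed);
      for (Call call : trace) {
        if (call.method.equals("put")) {
          api.put(call.argument + random.nextInt(11) - 5);
        } else if (call.method.equals("get")) {
          api.get();
        }
      }
    }
  }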
The third approach is to pull out the big hammer. Does it make sense to test the APIs with large applications? After all, if something goes wrong, you have to have specific knowledge about the application to triage the problem. You also have to be familiar with the techniques for testing the application. Testing a map-based application can be quite different from testing a calendar-based one, even if they share a common subset of APIs. The strongest case for testing APIs with large applications is compatibility testing. APIs not only have to return correct results, they have to do it in the same manner from revision to revision. It's a sort of contract between the API writer and the application writer. When the API is private, only a relatively small number of parties have to agree on a change to the contract, but when it is public, even a small change can break a lot of applications.
So when it comes to API testing, it seems we are back to small, medium, and large approaches after all. Just as in application testing where you can't completely divorce the API from the application, we cannot completely divorce the application from API testing.
Posted by Patrick Copeland, Test Engineering Director
I visited the University of Arizona last night, along with several Googlers, as part of a series of Tech Talks being given at selected schools. The talk was about GFS (Google File System). The auditorium was standing room only, with a turnout of over 150 computer science and computer engineering students (probably enticed with the free pizza and t-shirts :^). We really appreciated the turnout and the enthusiasm for Google. The questions following the talk were great, and we probably could have gone on for several hours. I had a chance to talk to a few folks afterwards about their projects, ranging from security research to traffic simulation. I also met one of the professors and discussed the potential of doing some joint research.
During the trip I also visited my grandmother, who lives in Tucson. I was showing her a picture of my son on a BlackBerry. She was fascinated and asked me how it worked. I started to think about how to explain it, and in that moment it humbled me to consider the number of complex systems employed to do such a simple thing. Displaying a photo from a web page involves device-side operating systems, runtime languages, cell technology, network stacks, cell receivers, routers, serving front- and back-ends, and more. An interesting side note: the bits for my jpg file ultimately get stored at Google in GFS, the topic of my talk that night. Obviously, each part of that chain is complex and important to getting my simple scenario to work. I started to explain it in plain language, and she quickly stopped me and said that when she was a child her family had a crank-operated "party-line" phone, where multiple families shared a single line. What hit me was that even though technology has gotten more complex, the scenarios we are enabling are still very basic and human: communicating, sharing, and connecting people. Even with all of the automated testing that we do, deep testing done from the perspective of customers is still absolutely critical.
Again, thanks to the students at the University of Arizona. Bear Down! We’re looking forward to visiting all of the schools on our list this season.
Here's some unsolicited feedback from Kurt Kluever, who ended up on his school's web page (Rochester Institute of Technology). While he was at Google this summer he tested JotSpot. Kurt's project included enhancing a Selenium framework for testing JotSpot. He added automated tests to an existing suite and refactored code to make the framework more extensible. In addition, he fixed several issues with Selenium RC that allow for better concurrent execution of tests within a browser's object model. The scope of the project was focused on one application, but he looked through code from various other projects and applied the design patterns to the JotSpot framework.
His experience in his own words…link to RIT site pdf.