We thought you might be interested in another article from our internal monthly testing newsletter called CODE GREEN... Originally titled: "Opinion: But it works on my machine!"
We spend a lot of time hearing "make your tests small and fast." While that advice is important for quick CL verification, system level testing is important too, and it doesn't get enough air time.
You write cool features. You write lots and lots of unit tests to make sure your features work. You make sure the unit tests run as part of your project's continuous build. Yet when the QA engineer tries out a few user scenarios, she finds many defects. She logs them as bugs. You try to reproduce them, but ... you can't! Sound familiar? It might to a lot of you who deal with complex systems that touch many other dependent systems. Want to test a simple service that just talks to a database? Simple: write a few unit tests with a mocked-out database. Want to test a service that connects to an authentication service to manage user accounts, talks to a risk engine, a biller, and a database? Now that's a different story!
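For the simple case, a mocked-out database really is all a unit test needs. Here is a minimal sketch in Python using unittest.mock; the ProfileService class and its fetch_user call are hypothetical stand-ins for whatever your service and its database client actually look like.

```python
import unittest
from unittest import mock

class ProfileService:
    # Hypothetical service under test: looks up a user profile by id.
    def __init__(self, db):
        self.db = db  # injected database client

    def display_name(self, user_id):
        row = self.db.fetch_user(user_id)
        return row["name"].title() if row else "Unknown"

class ProfileServiceTest(unittest.TestCase):
    def test_display_name_formats_stored_name(self):
        fake_db = mock.Mock()  # stands in for the real database client
        fake_db.fetch_user.return_value = {"name": "ada lovelace"}
        service = ProfileService(fake_db)
        self.assertEqual("Ada Lovelace", service.display_name(42))
        fake_db.fetch_user.assert_called_once_with(42)

    def test_display_name_handles_missing_user(self):
        fake_db = mock.Mock()
        fake_db.fetch_user.return_value = None  # simulate "no such user"
        self.assertEqual("Unknown", ProfileService(fake_db).display_name(7))

if __name__ == "__main__":
    unittest.main()
```

Tests like these are small and fast, but notice what they never exercise: the real database, the real authentication service, the real biller.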
System level tests to the rescue. So what are system level tests again? They're also referred to as integration tests, scenario tests, and end-to-end tests. No matter what they're called, these tests are a vital part of any test strategy. They wait for screen responses, they punch in HTML form fields, they click on buttons and links, they verify text on the UI (sometimes in different languages and locales). Heck, sometimes they even poke open inboxes and verify email content!

But I have a gazillion unit tests and I don't need system level tests! Sure you do. Unit tests are useful for quickly telling you whether your latest code changes have caused your existing code to regress. They are an invaluable part of the agile developers' tool kit. But when code is finally packaged and deployed, it can look and behave very differently, and no amount of unit tests can tell you whether that awesome UI feature you designed works the way it was intended, or whether one of the services your feature depends on is broken or not behaving as expected. If you think of a "testing diet," system level tests are like carbohydrates -- they are a crucial part of your diet, but only in the right amount! System level tests provide that sense of comfort that everything works the way it should when it lands in the customer's hands. In short, they're the closest thing to simulating your customers, and that makes them pretty darn valuable.

Wait a minute -- how stable are these tests? Very good question. It should be pretty obvious that if you test a full-blown deployment of any large, complex system, you're going to run into some stability issues, especially since large, complex systems consist of components that talk to many other components, sometimes asynchronously. And real-world systems aren't perfect. Sometimes the database doesn't respond at all, sometimes the web server responds a few seconds late, and sometimes a simple confirmation message takes forever to reach an email inbox! Automated system level tests are sensitive to such issues and sometimes report false failures. The key is using them effectively, quickly identifying and fixing false failures, and pairing them with the right set of small, fast tests.
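To make "punching in form fields and clicking buttons" concrete, here is a minimal sketch using the Selenium WebDriver Python bindings (a later descendant of the Selenium RC tooling discussed in the interview below). The URL, element ids, and expected text are made up for illustration.

```python
# A sketch of a system level test: drive a real browser against a real
# deployment. The URL, element ids, and confirmation text are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
try:
    driver.get("https://example.com/expenses")  # the deployed application
    driver.find_element(By.ID, "description").send_keys("Taxi to airport")
    driver.find_element(By.ID, "amount").send_keys("42.50")
    driver.find_element(By.ID, "add-row").click()

    # Wait for the asynchronous confirmation, just as a user would.
    message = WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.ID, "confirmation"))
    )
    assert "Expense saved" in message.text
finally:
    driver.quit()
```

The waiting is the point: the test exercises the real, asynchronous system end to end, which is exactly what makes it valuable and, occasionally, flaky.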
[Side note: Even before his first day at the Googleplex, Jason showed an amazing dedication to Google. After leaving Chicago with the big moving truck, he and his family had to stop after just a few hours because of an ice storm. Cars sliding off the road left and right. Further along, in Kansas, one of his kids caught pneumonia. His family stayed in the local hospital while Jason drove on. Heading west, there was a big snow storm in Colorado, which he wanted to avoid. He drove further south and ended up in a rare (but really real) dust storm over the Border States. He promised us some great video footage of his drive through tumbleweeds. He finally arrived and has settled in and is looking forward to calmer times in the Bay Area. After that trip, he isn't even worried about earthquakes, fires, mud slides, or traffic on the 101.]
A couple of GTAC videos with Jason: Selenium RC, Selenium vs WebDriver
CG: Why did you invent Selenium? What was the motivation? Huggins: Selenium was extracted from a web-based (Python + Plone!) time-and-expense (T&E) application my team and I were building for my previous employer. One of the mandates for the new T&E app was that it needed to be "fast, fast, fast." The legacy application was a client-server Lotus Notes application and wasn't scalable to the current number of offices and employees at the time. To live up to the "fast, fast, fast" design goal, we tried to improve and speed up the user experience as much as possible. For example, expense reports can get pretty long for people who travel a lot. No matter how many default rows we put in a blank expense form, people often needed to add more rows of expense items to their reports. So we added an "Add row" button to the expense report form. To make this "fast, fast, fast," I decided to use a button that triggered JavaScript to dynamically add one blank row to the form. At the time (Spring 2004), however, JavaScript was considered buggy and evil by most web developers, so I caught a lot of flak for not going with the classic approach of POSTing the form and triggering a complete page refresh with the current form data, plus one blank row.
Going down the road of using JavaScript had consequences. For one, we had a really, really difficult time testing that little "Add row" button. And sadly, it broke often. One week "Add row" would be working in Mozilla (Firefox was pre-1.0), but broken in Internet Explorer. And of course, nothing was working in Safari since few developers were allowed to have Macs. ;-) The following week, we'd open a bug saying "'Add row' is broken in IE!!" The developer assigned to the issue would fix and test it in IE, but not test for regressions in Mozilla. So, "Add row" would now be broken in Mozilla, and I'd have to open a ticket saying "'Add row' is now broken in Mozilla!!!". Unlike most other corporate IT shops, we didn't have the luxury of telling all employees to use a single browser, and developers didn't want to manually test every feature in all supported browsers every time. Also, we had a very tiny budget and commercial testing tools were -- and still are -- ridiculously over-priced on a per-seat basis. The T&E project was done the "Agile Way" -- every developer does testing -- so shelling out thousands of dollars per developer for a testing tool wasn't going to happen. Never mind the fact that there were no commercial tools that did what we needed anyway!
After many months of trial and error and various code workarounds, I came to the conclusion I needed a testing tool that would let me functional-test JavaScript-enhanced web user interfaces (aka "DHTML" or now "Ajax"). More specifically, I needed to test the browser UIs: Internet Explorer, Mozilla Firefox, and Safari. There were no commercial apps at the time that could do this, and the only open source option was JsUnit, which was more focused on unit testing pure JavaScript functions, rather than being a higher-level black-box/smoke-test walk through a web app. So we needed to write our own tool. Our first attempt was a Mozilla extension called "driftwood" (never released), coded by Andrew McCormick, another co-worker of mine at the time. It did make testing the "Add row" button possible, but since it was Mozilla-only, it wasn't what we needed for testing in all browsers. Paul Gross and I started over, and I started to educate myself on functional testing tools and techniques and stumbled upon Ward Cunningham's Framework for Integrated Testing (FIT). I originally set out to implement "FIT for JavaScript," but quickly realized we were drifting away from the FIT API, so Selenium became its own thing.
CG: Why does the world need another test tool? Huggins: At the time I created Selenium, had there been another testing tool that could test JavaScript UI features in all browsers on all platforms, believe me, I would have saved lots of time *not* writing my own tool.
CG: What's special about it? Huggins: Well, maybe the right question is, "What's 'lucky' about it?" Selenium was created at a time when JavaScript was considered "bad" and generally avoided by most professional web developers. Then Google Maps hit the scene a year later, the term "Ajax" was coined, and BOOM! Overnight, JavaScript became "good." Also, Firefox started stealing market share from IE. The combination of needing to test 1) JavaScript features 2) in several browsers (not just IE) was a "right place, right time" moment for Selenium.
CG: When did you realize that Selenium was a big deal? What was the tipping point? Huggins: When people I didn't know started asking me to give presentations or write about Selenium. Technically, the tipping point for Selenium relied on two things: 1) the Selenium IDE for Firefox, written by Shinya Kasatani, which made installation and the first-hour experience tons better for new users, and 2) Selenium Remote Control (RC), created by Paul Hammant and extended by Dan Fabulich, Nelson Sproul, and Patrick Lightbody, which let developers write their tests in Java, C#, Perl, Python, or Ruby rather than writing them all in the original FIT-like HTML tables. Socially, if Google Maps or Gmail (and thus the whole Ajax gold rush) had never existed, I wonder whether JavaScript would still be considered "bad," with a similar "why bother?" attitude toward testing it.
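For readers who never saw the FIT-like HTML tables, here is a rough sketch of the same kind of test written against the Selenium RC API in Python. It assumes a Selenium RC server listening on localhost:4444; the application URL and locators are hypothetical.

```python
# Sketch of a Selenium RC test written in Python instead of an HTML table.
# Assumes a Selenium RC server on localhost:4444; the URL and element
# locators are made up for illustration.
from selenium import selenium  # the classic RC client, not WebDriver

sel = selenium("localhost", 4444, "*firefox", "https://example.com/")
sel.start()
try:
    sel.open("/login")
    sel.type("id=username", "testuser")
    sel.type("id=password", "secret")
    sel.click("id=submit")
    sel.wait_for_page_to_load("30000")
    assert sel.is_text_present("Welcome, testuser")
finally:
    sel.stop()
```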
CG: Have you discovered any interesting teams using Selenium in ways you'd never intended? Huggins: At my previous company, I did see some developers write Selenium scripts to create their time and expense reports for them from YAML or XLS files. Since we hadn't exposed a back-end API, automating the browser for data entry was the next best thing. It was never designed for this purpose, but I started (ab)using it as coded bug reports. Asking users for steps on how to reproduce a bug naturally lends itself to looking like a Selenium test for that bug. Also, I've used the Selenium IDE Firefox plug-in to enter NBC's "Deal or No Deal" contest on their website from home, but I stopped doing that when I read in the fine print that the use of automation tools to enter their contest was grounds for disqualification.
CG: What advice do you have to offer Google groups interested in Selenium? Huggins: Well, one of the biggest drawbacks with user interface testing tools is that they're slow for various reasons. One way to bring the test run times down is to run them in parallel on a grid of servers, instead of sequentially. Of course, that isn't news to your average Googler. Engineers would be more likely to run automated browser UI tests if they could run 1000 tests in 1 minute total time on 1000 machines instead of 1000 tests in 1000 minutes on 1 machine. Sadly, though, most projects allocate only one machine, maybe two, to browser testing. I'm really excited to come to Google with the resources, the corporate interest, and the internal client base to make a large scale Selenium UI test farm possible. Eventually, I'd like to take Selenium in some new directions that we'll talk about in later blog posts. But I'm getting ahead of myself. I have to survive Noogler training first.
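Selenium Grid later made exactly this kind of fan-out a first-class feature. As a toy illustration of the idea, here is a sketch that shards a suite across local worker processes; on a real grid the workers would be separate machines, each driving its own browser, and run_browser_test is just a placeholder.

```python
# Toy sketch of sharding a slow browser suite across workers instead of
# running it sequentially. run_browser_test is a placeholder; on a real
# grid it would dispatch to Selenium running on a remote machine.
from concurrent.futures import ProcessPoolExecutor

def run_browser_test(test_name):
    # Placeholder: launch a browser, run one UI scenario, report the result.
    return (test_name, "PASS")

test_names = [f"test_scenario_{i:04d}" for i in range(1000)]

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=32) as pool:
        results = list(pool.map(run_browser_test, test_names))
    failures = [name for name, status in results if status != "PASS"]
    print(f"{len(results) - len(failures)} passed, {len(failures)} failed")
```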
Posted by George Pirocanac, Test Engineering Manager

Earlier blog entries described the strategy and methodology for testing the functionality of various kinds of applications. The basic approach is to isolate the logic of the application from the external API calls it makes, using constructs variously called mocks, fakes, dummy routines, and so on. Depending on how the application is designed and written, this can lead to smaller, simpler tests that achieve more coverage, execute more quickly, and allow quicker diagnosis of problems than the larger end-to-end or system tests. On the other hand, they are not a complete replacement for end-to-end testing. By their very nature, the small tests don't exercise the assumptions and interactions between the application and the APIs it calls. As a result, a diversified application testing strategy includes small, medium, and large tests. (See Copeland's GTAC video; fast forward about 5 minutes in for a brief description of developer testing and small, medium, and large tests.)
What about testing the APIs themselves? What, if anything, is different? The first approach mirrors the small test approach: each API call is exercised with a variety of inputs, and the outputs are verified against the specification. For isolated, stateless APIs (math library functions come to mind), this can be very effective by itself. However, many APIs are not isolated or stateless, and their results can vary according to the *combinations* of calls that were made. One way to deal with this is to analyze the dependencies between the calls and create mini-applications to exercise and verify these combinations of calls. Often these combinations fall into so-called typical usage patterns or user scenarios. While good, this first approach only gives us limited confidence. We also need to test what happens when not-so-typical sets of calls are made, since application writers often introduce usage patterns that the spec didn't anticipate.
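Here is a sketch of that mini-application style, assuming a hypothetical stateful kvstore module: each test exercises a sequence of calls and checks every result against the spec, including an ordering the spec may not have anticipated.

```python
import unittest

import kvstore  # hypothetical stateful API under test

class KvStoreScenarioTest(unittest.TestCase):
    def test_typical_usage_pattern(self):
        # A "mini-application": open, write, read back, delete, close.
        handle = kvstore.open("scratch")
        kvstore.put(handle, "color", "green")
        self.assertEqual("green", kvstore.get(handle, "color"))
        kvstore.delete(handle, "color")
        self.assertIsNone(kvstore.get(handle, "color"))
        kvstore.close(handle)

    def test_not_so_typical_pattern(self):
        # Combinations the spec may not have anticipated: a read before any
        # write, and a write after close.
        handle = kvstore.open("scratch")
        self.assertIsNone(kvstore.get(handle, "missing"))
        kvstore.close(handle)
        with self.assertRaises(kvstore.ClosedHandleError):
            kvstore.put(handle, "color", "green")

if __name__ == "__main__":
    unittest.main()
```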
Another approach is to capture the API calls made by real applications under controlled situations and then replay just those calls under the same controlled situations. These types of tests fall into the medium category. Again, the idea is to test series and combinations of calls, but the difficulty can lie in recreating the exact environment. In addition, this approach is vulnerable to building tests that traverse the same paths in the code. Adding fuzz to the parameters and call patterns can help, but not eliminate, this problem.
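A sketch of the capture-and-replay idea with light fuzzing of the recorded parameters follows; the JSON-lines log format and the api module are assumptions for illustration, not a real tool.

```python
# Replay a recorded log of API calls, optionally perturbing numeric
# arguments so the test strays off the originally recorded code paths.
# The log format and the `api` module are hypothetical.
import json
import random

import api  # hypothetical wrapper around the API under test

def replay(log_path, fuzz=False, seed=0):
    rng = random.Random(seed)  # fixed seed keeps fuzzed runs reproducible
    with open(log_path) as f:
        for line in f:
            call = json.loads(line)  # e.g. {"method": "resize", "args": [640, 480]}
            args = list(call["args"])
            if fuzz:
                args = [a + rng.randint(-2, 2) if isinstance(a, int) else a
                        for a in args]
            result = getattr(api, call["method"])(*args)
            print(call["method"], args, "->", result)

if __name__ == "__main__":
    replay("recorded_calls.jsonl", fuzz=True)
```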
The third approach is to pull out the big hammer: does it make sense to test the APIs with large applications? After all, if something goes wrong, you need specific knowledge about the application to triage the problem. You also have to be familiar with the techniques for testing that application. Testing a map-based application can be quite different from testing a calendar-based one, even if they share a common subset of APIs. The strongest case for testing APIs with large applications is compatibility testing. APIs not only have to return correct results, they have to do so in the same manner from revision to revision. It's a sort of contract between the API writer and the application writer. When the API is private, only a relatively small number of parties have to agree on a change to the contract, but when it is public, even a small change can break a lot of applications.
So when it comes to API testing, it seems we are back to small, medium, and large approaches after all. Just as in application testing you can't completely divorce the API from the application, in API testing you can't completely divorce the application from the API.
Posted by Patrick Copeland, Test Engineering Director
I visited the University of Arizona last night, along with several Googlers, as part of a series of Tech Talks being given at selected schools. The talk was about GFS (Google File System). The auditorium was standing room only, with a turnout of over 150 computer science and computer engineering students (probably enticed by the free pizza and t-shirts :^). We really appreciated the turnout and the enthusiasm for Google. The questions following the talk were great, and we probably could have gone on for several hours. I had a chance to talk to a few folks afterwards about their projects, which ranged from security research to traffic simulation. I also met one of the professors and discussed the potential of doing some joint research.
During the trip I also visited my grandmother, who lives in Tucson. I was showing her a picture of my son on a BlackBerry. She was fascinated and asked me how it worked. I started to think about how to explain it, and in that moment it humbled me to think about the number of complex systems employed to do such a simple thing. Displaying a photo from a web page involves device-side operating systems, runtime languages, cell technology, network stacks, cell receivers, routers, serving front-ends and back-ends, and more. An interesting side note: the bits for my jpg file ultimately get stored at Google in GFS, the topic of my talk that night. Obviously, each part of that chain is complex and important to get my simple scenario to work. I started to explain it in plain language, and she quickly stopped me and said that when she was a child her family had a crank-operated "party-line" phone, where multiple families shared a single line. What hit me was that even though technology has gotten more complex, the types of scenarios we are enabling are still very basic and human: communicating, sharing, and connecting people. Even with all of the automated testing that we do, deep testing done from the perspective of customers is still absolutely critical.
Again, thanks to the students at the University of Arizona. Bear Down! We’re looking forward to visiting all of the schools on our list this season.
Here's some unsolicited feedback from Kurt Kluever, who ended up on his school's web page (Rochester Institute of Technology). While he was at Google this summer, he tested JotSpot. Kurt's project included enhancing a Selenium framework for testing JotSpot. He added automated tests to an existing suite and refactored code to make the framework more extensible. In addition, he fixed several issues with Selenium RC that allow for better concurrent execution of tests within a browser's object model. The scope of the project was focused on one application, but he looked through code from various other projects and applied their design patterns to the JotSpot framework.
His experience, in his own words: link to RIT site PDF.