(resuming our testing on the toilet posts...)
In a previous episode, we extracted methods to simplify testing in Python. But if these extracted methods make the most sense as private class members, how can you write your production code so it doesn't depend on your test code? In Python this is easy; but in C++, testing private members requires more friend contortions than a game of Twister®.
// my_package/dashboard.h
class Dashboard {
 private:
  scoped_ptr<Database> database_;  // instantiated in constructor

  // Declaration of functions GetResults(), GetResultsFromCache(),
  // GetResultsFromDatabase(), CountPassFail()

  friend class DashboardTest;  // one friend declaration per test fixture
};
You can apply the Extract Class and Extract Interface refactorings to create a new helper class containing the implementation. Forward declare the new interface in the .h of the original class, and have the original class hold a pointer to the interface. (This is similar to the Pimpl idiom.) You can distinguish between the public API and the implementation details by separating the headers into different subdirectories (/my_package/public/ and /my_package/ in this example):
// my_package/public/dashboard.h
class ResultsLog;  // extracted helper interface

class Dashboard {
 public:
  explicit Dashboard(ResultsLog* results) : results_(results) { }
 private:
  scoped_ptr<ResultsLog> results_;
};

// my_package/results_log.h
class ResultsLog {
 public:
  // Declaration of functions GetResults(), GetResultsFromCache(),
  // GetResultsFromDatabase(), CountPassFail()
};

// my_package/live_results_log.h
class LiveResultsLog : public ResultsLog {
 public:
  explicit LiveResultsLog(Database* database) : database_(database) { }
};
Now you can test LiveResultsLog without resorting to friend declarations. This also enables you to inject a MockResultsLog instance when testing the Dashboard class. The functionality is still private to the original class, and the use of a helper class results in smaller classes with better-defined responsibilities.
Remember to download this episode of Testing on the Toilet and post it in your office.
Posted by Goranka Bjedov, Senior Test Engineer

This post is my best shot at explaining what I do, why I do it, and why I think it is the right thing to do. Performance testing is a category of testing that seems to evoke strong feelings in people: feelings of fear (Oh, my God, I have no idea what to do because performance testing is so hard!), feelings of inadequacy (We bought this tool that does every aspect of performance testing, we paid so much for it, and we are not getting anything done!), and feelings of confusion (So, what the heck am I supposed to be doing again?). I don't think any of this is necessary.

Think of performance testing as another tool in your testing arsenal - something you will do when you need to. It explores several system qualities that can be simplified to:
So, I do performance testing of a service when risk analysis indicates that failing in any of the above categories would be more costly to the company than performing the tests. (Which, if your name is Google and you care about your brand, happens with any service you launch.) Note that I am talking about services - I work almost exclusively with servers and spend no time worrying about client-side rendering/processing issues. While those are becoming increasingly important, and have always been more complex than my work, I consider them part of functionality testing, designed, created, and executed by functional testing teams.

Another interesting thing about performance testing is that you will never be 100% "right" or 100% "done." Accept it, deal with it, and move on. Any system in existence today depends on thousands of different parameters, and if I spent the time analyzing each one of them, understanding the relationships between each two or three, graphing their impact curves, and trying to non-dimensionalize them, I would still be testing my first service two years later. The thought of doing anything less used to fill me with horror (They cannot seriously expect me to provide meaningful performance results in less than a year, can they?), but I have since learned that I can provide at least 90% of the meaningful information my customers need by applying only 10% of my total effort and time. And 90% is more than enough for the vast majority of problems.

So, here is what I really do: I create benchmarks. If I am lucky and have good information about the current usage patterns of a particular product (which I usually do), I make sure the benchmark covers the operations that are the top resource hogs (either per single use or cumulatively). I run this benchmark with different loads (numbers of virtual users) against a loosely controlled system (it would be nice to have 100 machines all to myself for every service we have, which I could use once a day or once a week, but that would be expensive and unrealistic) and investigate its behavior. Which transactions are taking the most time? Which transactions seem to get progressively worse with increasing load? Which transactions seem unstable (I cannot explain their behavior)? I call this exploratory performance testing, and I repeat my tests until I am convinced I am observing real system behavior. While I am doing this, I make sure I am not getting biased by investigating the code. If I have questions, I ask programmers, but I know they are biased, and I will avoid getting biased myself!

Once I have my graphs (think interesting transaction latencies and throughput vs. load here), I meet with the development team and discuss the findings. Usually, there are one or two things they know about and have been working on, and a few more they were unaware of. Sometimes they look over my benchmark and suggest changes (could you make the ratio 80:20, and not 50:50?). After this meeting, we create our final benchmark, I modify the performance testing scripts, and from then on the benchmark runs as often as possible, hopefully at least once a night. And here is the biggest value of this effort: if a code change has impacted performance in an unacceptable way, you will find out about it the next day, not a week or a month later. (How many of us remember what we did in the last month? So why expect our developers to?)

Here is why I think this is the right thing to do: I have seen too much bad code written as a result of premature performance optimizations, made before the team even knew they had a problem! Please don't do that. Develop your service in a clean, maintainable, and extensible manner. Let me test it, and keep regression testing it. If we find we have a problem in a particular area, we can then address that problem easily, because our code is not obfuscated with performance optimizations that improve code paths that execute once a month by 5%.

I can usually do this in two to four weeks, depending on the complexity of the project. Occasionally, we will find an issue that cannot be explained or understood with performance tests. At that point, we look under the hood. This is where performance profiling and performance modeling come in, and both of those are considerably more complex than performance testing. Both are great tools, but they should be used only when the easy tool fails.

Tools, tools, tools... so, what do we use? I gave a presentation at the Google Test Automation Conference in London on exactly this topic. I use open source tools, and I discuss the reasons why in the presentation. In general, even if you have decided to go one of the other two routes (vendor tools or developing your own), check out what is available. You may find that you can get a lot of information about your service using JMeter and spending some time playing around with it. Sure, you can also spend $500K and get similar information, or you can spend two years developing "the next best performance testing tool ever," but before you are certain free is not good enough, why would you?

Final word: monitor your services during performance tests. If you do not have service-related monitoring developed and set up for use during live operations, you do not need performance testing. If the risks of your service failing are not important enough that you would want to know about it *before* it happens, then you should not be wasting time or money on performance testing. I am incredibly lucky in this area - Google's infrastructure is developed by a bunch of people who, if they had held a meeting titled "How to make Goranka's life easy?", could not have done better. I love them - they make my job trivial. At a minimum, I monitor CPU, memory, and I/O usage. I cannot see a case where you would want to do less, but you may want to do a lot more on occasion.
We thought you might be interested in another article from our internal monthly testing newsletter called CODE GREEN... Originally titled: "Opinion: But it works on my machine!"
We've spent so much time hearing about "make your tests small and run fast." While this is important for quick CL verification, system-level testing is important, too, and it doesn't get enough air time.
You write cool features. You write lots and lots of unit tests to make sure your features work. You make sure the unit tests run as part of your project's continuous build. Yet when the QA engineer tries out a few user scenarios, she finds many defects. She logs them as bugs. You try to reproduce them, but ... you can't! Sound familiar? It might to a lot of you who deal with complex systems that touch many other dependent systems. Want to test a simple service that just talks to a database? Simple: write a few unit tests with a mocked-out database. Want to test a service that connects to authentication to manage user accounts, talks to a risk engine, a biller, and a database? Now that's a different story!
So what are system-level tests again? System-level tests to the rescue! They're also referred to as integration tests, scenario tests, and end-to-end tests. No matter what they're called, these tests are a vital part of any test strategy. They wait for screen responses, they punch in HTML form fields, they click on buttons and links, they verify text on the UI (sometimes in different languages and locales). Heck, sometimes they even poke open inboxes and verify email content!

But I have a gazillion unit tests and I don't need system-level tests! Sure you do. Unit tests help you quickly decide whether your latest code changes have caused your existing code to regress. They are an invaluable part of the agile developer's tool kit. But when code is finally packaged and deployed, it can look and behave very differently. And no amount of unit tests can tell you whether that awesome UI feature you designed works the way it was intended, or whether one of the services your feature depends on is broken or not behaving as expected. If you think of a "testing diet," system-level tests are like carbohydrates: a crucial part of your diet, but only in the right amount! System-level tests provide that sense of comfort that everything works the way it should when it lands in the customer's hands. In short, they're the closest thing to simulating your customers. And that makes them pretty darn valuable.

Wait a minute -- how stable are these tests? Very good question. It should be pretty obvious that if you test a full-blown deployment of any large, complex system, you're going to run into some stability issues, especially since large, complex systems consist of components that talk to many other components, sometimes asynchronously. And real-world systems aren't perfect. Sometimes the database doesn't respond at all, sometimes the web server responds a few seconds later than it should, and sometimes a simple confirmation message takes forever to reach an email inbox! Automated system-level tests are sensitive to such issues and sometimes report false failures. The key is using them effectively, quickly identifying and fixing false failures, and pairing them up with the right set of small, fast tests.
[Side note: Even before his first day at the Googleplex, Jason showed an amazing dedication to Google. After leaving Chicago with the big moving truck, he and his family had to stop after just a few hours because of an ice storm. Cars sliding off the road left and right. Further along, in Kansas, one of his kids caught pneumonia. His family stayed in the local hospital while Jason drove on. Heading west, there was a big snow storm in Colorado, which he wanted to avoid. He drove further south and ended up in a rare (but really real) dust storm over the Border States. He promised us some great video footage of his drive through tumbleweeds. He finally arrived and has settled in and is looking forward to calmer times in the Bay Area. After that trip, he isn't even worried about earthquakes, fires, mud slides, or traffic on the 101.]
A couple of GTAC videos featuring Jason: Selenium RC, Selenium vs. WebDriver
CG: Why did you invent Selenium? What was the motivation?

Huggins: Selenium was extracted from a web-based (Python + Plone!) time-and-expense (T&E) application my team and I were building for my previous employer. One of the mandates for the new T&E app was that it needed to be "fast, fast, fast." The legacy application was a client-server Lotus Notes application and wasn't scalable to the current number of offices and employees at the time. To live up to the "fast, fast, fast" design goal, we tried to improve and speed up the user experience as much as possible. For example, expense reports can get pretty long for people who travel a lot. No matter how many default rows we put in a blank expense form, people often needed to add more rows of expense items to their reports. So we added an "Add row" button to the expense report form. To make this "fast, fast, fast," I decided to use a button that triggered JavaScript to dynamically add one blank row to the form. At the time (Spring 2004), however, JavaScript was considered buggy and evil by most web developers, so I caught a lot of flak for not going with the classic approach of POSTing the form and triggering a complete page refresh with the current form data, plus one blank row.
Going down the road of using JavaScript had consequences. For one, we had a really, really difficult time testing that little "Add row" button. And sadly, it broke often. One week "Add row" would be working in Mozilla (Firefox was pre-1.0), but broken in Internet Explorer. And of course, nothing was working in Safari since few developers were allowed to have Macs. ;-) The following week, we'd open a bug saying "'Add row' is broken in IE!!" The developer assigned to the issue would fix and test it in IE, but not test for regressions in Mozilla. So, "Add row" would now be broken in Mozilla, and I'd have to open a ticket saying "'Add row' is now broken in Mozilla!!!". Unlike most other corporate IT shops, we didn't have the luxury of telling all employees to use a single browser, and developers didn't want to manually test every feature in all supported browsers every time. Also, we had a very tiny budget and commercial testing tools were -- and still are -- ridiculously over-priced on a per-seat basis. The T&E project was done the "Agile Way" -- every developer does testing -- so shelling out thousands of dollars per developer for a testing tool wasn't going to happen. Never mind the fact that there were no commercial tools that did what we needed anyway!
After many months of trial and error and various code workarounds, I came to the conclusion I needed a testing tool that would let me functional-test JavaScript-enhanced web user interfaces (aka "DHTML" or now "Ajax"). More specifically, I needed to test the browser UIs: Internet Explorer, Mozilla Firefox, and Safari. There were no commercial apps at the time that could do this, and the only open source option was JsUnit, which was more focused on unit testing pure JavaScript functions, rather than being a higher-level black-box/smoke-test walk through a web app. So we needed to write our own tool. Our first attempt was a Mozilla extension called "driftwood" (never released), coded by Andrew McCormick, another co-worker of mine at the time. It did make testing the "Add row" button possible, but since it was Mozilla-only, it wasn't what we needed for testing in all browsers. Paul Gross and I started over, and I started to educate myself on functional testing tools and techniques and stumbled upon Ward Cunningham's Framework for Integrated Testing (FIT). I originally set out to implement "FIT for JavaScript," but quickly realized we were drifting away from the FIT API, so Selenium became its own thing.
CG: Why does the world need another test tool?

Huggins: At the time I created Selenium, had there been another testing tool that could test JavaScript UI features in all browsers on all platforms, believe me, I would have saved lots of time *not* writing my own tool.
CG: What's special about it?

Huggins: Well, maybe the right question is "What's 'lucky' about it?" Selenium was created at a time when JavaScript was considered "bad" and generally avoided by most professional web developers. Then Google Maps hit the scene a year later, the term "Ajax" was coined, and BOOM! Overnight, JavaScript became "good." Also, Firefox started stealing market share from IE. The combination of needing to test 1) JavaScript features 2) in several browsers (not just IE) was a "right place, right time" moment for Selenium.
CG: When did you realize that Selenium was a big deal? What was the tipping point?

Huggins: When I started being asked to give presentations or write about Selenium by people I didn't know. The tipping point for Selenium technically relied on two things: 1) the Selenium IDE for Firefox, written by Shinya Kasatani, which made installation and the first-hour experience tons better for new users, and 2) Selenium Remote Control (RC), created by Paul Hammant and extended by Dan Fabulich, Nelson Sproul, and Patrick Lightbody, which let developers write their tests in Java, C#, Perl, Python, or Ruby, and not have to write all their tests in the original FIT-like HTML tables. Socially, if Google Maps or Gmail had never existed, and thus the whole Ajax gold rush never happened, I wonder whether JavaScript would still be considered "bad," with a similar "why bother?" attitude toward testing it.
CG: Have you discovered any interesting teams using Selenium in ways you'd never intended?

Huggins: At my previous company, I did see some developers write Selenium scripts to create their time and expense reports for them from YAML or XLS files. Since we hadn't exposed a back-end API, automating the browser for data entry was the next best thing. It was never designed for this purpose, but I started (ab)using it as coded bug reports. Asking users for steps on how to reproduce a bug naturally lends itself to looking like a Selenium test for that bug. Also, I've used the Selenium IDE Firefox plug-in to enter NBC's "Deal or No Deal" contest on their website from home, but I stopped doing that when I read in the fine print that the use of automation tools to enter their contest was grounds for disqualification.
CG: What advice do you have to offer Google groups interested in Selenium?

Huggins: Well, one of the biggest drawbacks of user interface testing tools is that they're slow, for various reasons. One way to bring the test run times down is to run the tests in parallel on a grid of servers instead of sequentially. Of course, that isn't news to your average Googler. Engineers would be more likely to run automated browser UI tests if they could run 1000 tests in 1 minute of total time on 1000 machines instead of 1000 tests in 1000 minutes on 1 machine. Sadly, though, most projects allocate only one machine, maybe two, to browser testing. I'm really excited to come to Google with the resources, the corporate interest, and the internal client base to make a large-scale Selenium UI test farm possible. Eventually, I'd like to take Selenium in some new directions that we'll talk about in later blog posts. But I'm getting ahead of myself. I have to survive Noogler training first.
Posted by George Pirocanac, Test Engineering Manager

Earlier blog entries described the strategy and methodology for testing the functionality of various kinds of applications. The basic approach is to isolate the logic of the application from the external API calls it makes, through the use of various constructs called mocks, fakes, dummy routines, etc. Depending on how the application is designed and written, this can lead to smaller, simpler tests that cover more, execute more quickly, and lead to quicker diagnosis of problems than the larger end-to-end or system tests. On the other hand, they are not a complete replacement for end-to-end testing. By their very nature, the small tests don't test the assumptions and interactions between the application and the APIs that it calls. As a result, a diversified application testing strategy includes small, medium, and large tests. (See Copeland's GTAC video; fast forward about 5 minutes in to hear a brief description of developer testing and small, medium, and large tests.)
What about testing the APIs themselves? What, if anything, is different? The first approach mirrors the small-test approach: each of the API calls is exercised with a variety of inputs, and the outputs are verified against the specification. For isolated, stateless APIs (math library functions come to mind), this can be very effective by itself. However, many APIs are not isolated or stateless, and their results can vary according to the *combinations* of calls that were made. One way to deal with this is to analyze the dependencies between the calls and create mini-applications to exercise and verify these combinations of calls. Often, these fall into the so-called typical usage patterns or user scenarios. While good, this first approach gives us only limited confidence. We also need to test what happens when not-so-typical sets of calls are made, since application writers often introduce usage patterns that the spec didn't anticipate.
Another approach is to capture the API calls made by real applications under controlled situations and then replay just those calls under the same controlled situations. These types of tests fall into the medium category. Again, the idea is to test series and combinations of calls, but the difficulty can lie in recreating the exact environment. In addition, this approach is vulnerable to building tests that traverse the same paths in the code every time. Adding fuzz to the parameters and call patterns can reduce, but not eliminate, this problem.
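To make the capture-and-replay idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration rather than part of any real tool: the api object, the JSON call-log format, and the FuzzArgs() helper are all hypothetical.

import json
import random

def ReplayCalls(api, log_path, fuzz=False):
  """Replays a recorded sequence of API calls, optionally fuzzing the arguments."""
  with open(log_path) as log:
    for line in log:
      # Hypothetical record format:
      # {"method": "CreateEvent", "args": {...}, "expected": {...}}
      record = json.loads(line)
      args = FuzzArgs(record['args']) if fuzz else record['args']
      result = getattr(api, record['method'])(**args)
      if not fuzz:
        # Recorded outputs only apply when replaying verbatim.
        assert result == record['expected'], (record['method'], result)

def FuzzArgs(args):
  """Perturbs numeric and string arguments to explore not-so-typical call patterns."""
  fuzzed = dict(args)
  for key, value in list(fuzzed.items()):
    if isinstance(value, int):
      fuzzed[key] = value + random.randint(-5, 5)
    elif isinstance(value, str):
      fuzzed[key] = value + random.choice(['', ' ', '!', 'x' * 100])
  return fuzzed

When fuzzing, the recorded outputs no longer apply, so a harness like this can only check that the API doesn't crash or violate its documented invariants.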
The third approach is to pull out the big hammer: does it make sense to test the APIs with large applications? After all, if something goes wrong, you need specific knowledge about the application to triage the problem. You also have to be familiar with the techniques for testing the application; testing a map-based application can be quite different from testing a calendar-based one, even if they share a common subset of APIs. The strongest case for testing APIs with large applications is compatibility testing. APIs not only have to return correct results, they have to do so in the same manner from revision to revision. It's a sort of contract between the API writer and the application writer. When the API is private, only a relatively small number of parties have to agree on a change to the contract, but when it is public, even a small change can break a lot of applications.
So when it comes to API testing, it seems we are back to small, medium, and large approaches after all. Just as in application testing where you can't completely divorce the API from the application, we cannot completely divorce the application from API testing.
Posted by Patrick Copeland, Test Engineering Director
I visited the University of Arizona last night, along with several Googlers, as part of a series of Tech Talks being given at selected schools. The talk was about GFS (Google File System). The auditorium was standing room only, with a turnout of over 150 computer science and computer engineering students (probably enticed by the free pizza and t-shirts :^). We really appreciated the turnout and the enthusiasm for Google. The questions following the talk were great, and we probably could have gone on for several hours. I had a chance to talk to a few folks afterwards about their projects, ranging from security research to traffic simulation. I also met one of the professors and discussed the potential of doing some joint research.
During the trip I also visited my grandmother, who lives in Tucson. I was showing her a picture of my son on a BlackBerry. She was fascinated and asked me how it worked. I started to think about how to explain it, and in that moment it humbled me to think about the number of complex systems employed to do such a simple thing. Displaying a photo from a web page involves device-side operating systems, runtime languages, cell technology, network stacks, cell receivers, routers, serving front-ends and back-ends, and more. An interesting side note: the bits for my jpg file ultimately get stored at Google in GFS, the topic of my talk that night. Obviously, each part of that chain is complex and important to getting my simple scenario to work. I started to explain it in plain language, and she quickly stopped me and said that when she was a child her family had a crank-operated "party-line" phone, where multiple families shared a single line. What hit me was that even though technology has gotten more complex, the scenarios we are enabling are still very basic and human: communicating, sharing, and connecting people. Even with all of the automated testing that we do, deep testing done from the perspective of customers is still absolutely critical.
Again, thanks to the students at the University of Arizona. Bear Down! We’re looking forward to visiting all of the schools on our list this season.
Here's some unsolicited feedback from Kurt Kluever, who ended up on his school's web page (Rochester Institute of Technology). While he was at Google this summer, he tested JotSpot. Kurt's project included enhancing a Selenium framework for testing JotSpot. He added automated tests to an existing suite and refactored code to make the framework more extensible. In addition, he fixed several issues with Selenium RC that allow for better concurrent execution of tests within a browser's object model. The scope of the project was focused on one application, but he looked through code from various other projects and applied their design patterns to the JotSpot framework.
His experience in his own words…link to RIT site pdf.
Slides and follow-up material from all CAST presentations are available through the CAST 2007 wiki.
All the presentations from the conference will be posted on the Google YouTube channel within a New York minute after the speaker leaves the podium (well, maybe 72 New York minutes). We look forward to seeing you at the conference.
When a method is long and complex, it is harder to test. You can make it easier by extracting methods: finding pieces of code in existing, complex methods (or functions) that can be replaced with method calls (or function calls). Consider the following complicated method:
def GetTestResults(self):
  # Check if results have been cached.
  results = cache.get('test_results', None)
  if results is None:
    # No results in the cache, so check the database.
    results = db.FetchResults(SQL_SELECT_TEST_RESULTS)
  # Count passing and failing tests.
  num_passing = len([r for r in results if r['outcome'] == 'pass'])
  num_failing = len(results) - num_passing
  return num_passing, num_failing
This method is difficult to test because it relies not only on a database but also on a cache. In addition, it performs some post-processing of the retrieved results. The first hint that this method could use refactoring is the abundance of comments. Extracting sections of code into well-named methods reduces the original method's complexity. When complexity is reduced, comments often become unnecessary. For example, consider the following:
def GetTestResults(self):
  results = self._GetTestResultsFromCache()
  if results is None:
    results = self._GetTestResultsFromDatabase()
  return self._CountPassFail(results)

def _GetTestResultsFromCache(self):
  return cache.get('test_results', None)

def _GetTestResultsFromDatabase(self):
  return db.FetchResults(SQL_SELECT_TEST_RESULTS)

def _CountPassFail(self, results):
  num_passing = len([r for r in results if r['outcome'] == 'pass'])
  num_failing = len(results) - num_passing
  return num_passing, num_failing
Now, tests can focus on each individual piece of the original method by testing each extracted method. This has the added benefit of making the code more readable and easier to maintain.
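For example, the counting logic can now be verified with plain in-memory data, without touching a cache or a database. Here is a minimal sketch using Python's unittest module; the ResultsReporter class name is hypothetical, and only the extracted method under test is repeated here.

import unittest

class ResultsReporter(object):
  # Hypothetical home of the extracted methods shown above; only
  # _CountPassFail() is needed for this test.
  def _CountPassFail(self, results):
    num_passing = len([r for r in results if r['outcome'] == 'pass'])
    num_failing = len(results) - num_passing
    return num_passing, num_failing

class CountPassFailTest(unittest.TestCase):
  def testCountsPassingAndFailingResults(self):
    results = [{'outcome': 'pass'}, {'outcome': 'fail'}, {'outcome': 'pass'}]
    num_passing, num_failing = ResultsReporter()._CountPassFail(results)
    self.assertEqual(2, num_passing)
    self.assertEqual(1, num_failing)

if __name__ == '__main__':
  unittest.main()

The cache- and database-facing methods can be covered separately with stubbed versions of cache and db, keeping each test focused on a single concern.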
(Note: Method extraction can be done for you automatically in Python by the open-source refactoring browser BicycleRepairMan, and in Java by several IDEs, including IntelliJ IDEA and Eclipse.)
Posted by Allen Hutchison, Engineering Manager

A few months ago we announced that we would be holding a Google Test Automation Conference in New York City on August 23-24. Today, I'm happy to tell you that we've finalized the speaker list and have opened the attendee registration process. The conference is completely free; however, because we have room for only 150 people, everyone who wishes to attend must apply and be accepted. You can apply to attend via this link, and we'll let you know by June 15 if your application has been accepted.
At last year's conference, we found that the informal hallway discussions were tremendously valuable to everyone who came, and we would like to continue that custom. Therefore, the main point in the application process is to tell us how you can contribute to the conversation and what you would like to learn from the conference.
To get you started, here is what our speakers will be talking about:
I had played with UCBLogo for two weeks and hadn't made it crash once. Brian brought the whole thing down in three commands. The most telling part is that when I tried to reproduce the defect a week later, I couldn't. I issued rt with a ton of 9s and just couldn't get it to break. As it turns out, it only crashes when you omit the space, which of course I didn't think of doing. It took me more time to reproduce the defect than it took Brian to discover it.
With a good set of tests in place, refactoring code is much easier, as you can quickly gain a lot of confidence by running the tests again and making sure the code still passes.
As suites of tests grow, it's common to see duplication emerge. Like any code, tests should ideally be kept in a state that's easy to understand and maintain. So, you'll want to refactor your tests, too.
However, refactoring tests can be hard because you don't have tests for the tests.
How do you know that your refactoring of the tests was safe and you didn't accidentally remove one of the assertions?
If you intentionally break the code under test, the failing test can show you that your assertions are still working. For example, if you were refactoring methods in CombineHarvesterTest, you would alter CombineHarvester, making it return the wrong results.
Check that the tests are failing because their assertions fail in the way you'd expect. You can then (carefully) refactor the failing tests. If at any step they start passing, that immediately tells you the test is broken - undo! When you're done, remember to fix the code under test and make sure the tests pass again. (revert is your friend, but don't revert the tests!)
Let's repeat that important point:
When you're done...remember to fix the code under test!
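A rough sketch of that workflow in Python (the CombineHarvester class and its method below are hypothetical stand-ins, since the real code isn't shown here):

import unittest

class CombineHarvester(object):
  """Hypothetical code under test."""
  def BalesPerField(self, fields):
    # Step 1: before refactoring the tests, temporarily break this line
    # (e.g. multiply by 2 instead of 3) and re-run the tests.
    return [field * 3 for field in fields]

class CombineHarvesterTest(unittest.TestCase):
  def testThreeBalesPerField(self):
    # Step 2: with the code broken, this assertion must fail. If the test
    # still passes while you refactor it, the test itself is broken - undo!
    self.assertEqual([3, 6, 9], CombineHarvester().BalesPerField([1, 2, 3]))

if __name__ == '__main__':
  unittest.main()

Step 3: once each refactored test fails for the right reason, restore the production code and confirm that everything passes again.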
Zurich, Switzerland, is the location of one of Google's largest engineering offices in Europe. We like to say it is no longer necessary to live in Silicon Valley to develop great software for Google, and the Zurich office has over 100 software engineers working on Google products and infrastructure. As a result, the Zurich Test Engineering team is kept very busy.
Producing great software is the driving motivation for each member of the Test Engineering team, which is relatively small by industry standards. But we're growing quickly and are passionate about software development, software quality, and testing.
We work on several projects: some are customer-facing and some are infrastructure. We currently work on Google Maps, which has testers and developers in several of our offices around the world. Our team builds tools to check the consistency of the data feeds that support the local search features in Maps. Another project mainly developed in Zurich is Google Transit, which provides public transportation itineraries and schedule information for commuters who take trains, buses, and trams. On this project, we build tools to verify the proper alignment of the transportation layer of the map with the actual location coordinates of transit stops. We also focus on many projects related to Google’s infrastructure. For example, with Google Base API, we work with the Software Engineering team to measure response time and to track bottlenecks during large-scale bulk updates.
Our aim is to assign local test engineers to most projects developed in this office, so hiring for this team is always a priority. Candidates are from all over the world, and many different nationalities are represented in our office. Adapting to Zurich is quite easy, because it is already an international place: many companies have their headquarters here, and Zurich has been named the best city in the world for its quality of life in 2005, 2006, and 2007.
Michael Feathers defines the qualities of a good unit test as: “they run fast, they help us localize problems.” This can be hard to accomplish when your code accesses a database, hits another server, is time-dependent, etc.
By substituting custom objects for some of your module's dependencies, you can thoroughly test your code, increase your coverage, and still run in less than a second. You can even simulate rare scenarios like database failures and test your error handling code.
A variety of different terms are used to refer to these "custom objects". In an effort to clarify the vocabulary, Gerard Meszaros provides the following definitions:

Dummy objects are passed around but never actually used; they usually just fill parameter lists.
Fake objects have working implementations, but usually take some shortcut that makes them unsuitable for production (an in-memory database is a good example).
Stubs provide canned answers to the calls made during the test, usually not responding to anything outside what's programmed in for the test.
Mocks are objects pre-programmed with expectations that form a specification of the calls they are expected to receive.
For example, to test a simple method like getIdPrefix() in the IdGetter class:
public class IdGetter {
  // Constructor omitted.
  public String getIdPrefix() {
    try {
      String s = db.selectString("select id from foo");
      return s.substring(0, 5);
    } catch (SQLException e) {
      return "";
    }
  }
}
You could write:
db.execute("create table foo (id varchar(40))"); // db created in setUp(). db.execute("insert into foo (id) values ('hello world!')"); IdGetter getter = new IdGetter(db); assertEquals("hello", getter.getIdPrefix());
The test above works but takes a relatively long time to run (network access), can be unreliable (db machine might be down), and makes it hard to test for errors. You can avoid these pitfalls by using stubs:
public class StubDbThatReturnsId extends Database {
  public String selectString(String query) {
    return "hello world";
  }
}

public class StubDbThatFails extends Database {
  public String selectString(String query) throws SQLException {
    throw new SQLException("Fake DB failure");
  }
}

public void testReturnsFirstFiveCharsOfId() throws Exception {
  IdGetter getter = new IdGetter(new StubDbThatReturnsId());
  assertEquals("hello", getter.getIdPrefix());
}

public void testReturnsEmptyStringIfIdNotFound() throws Exception {
  IdGetter getter = new IdGetter(new StubDbThatFails());
  assertEquals("", getter.getIdPrefix());
}
Do you have JavaScript code that calls setTimeout()? How do you test what it does when the timer fires, without making your tests actually sleep? jsUnit includes a mock Clock whose Clock.tick() method simulates the passage of time. Consider the following function, which uses setTimeout() to update a status message over several seconds:
function showProgress(status) {
  status.message = "Loading";
  for (var time = 1000; time <= 3000; time += 1000) {
    // Append a '.' to the message every second for 3 secs.
    setTimeout(function() {
      status.message += ".";
    }, time);
  }
  setTimeout(function() {
    // Special case for the 4th second.
    status.message = "Done";
  }, 4000);
}
function testUpdatesStatusMessageOverFourSeconds() {
  Clock.reset();  // Clear any existing timeout functions on the event queue.
  var status = {};
  showProgress(status);  // Call our function.
  assertEquals("Loading", status.message);
  Clock.tick(2000);  // Call any functions on the event queue that have been
                     // scheduled for the first two seconds.
  assertEquals("Loading..", status.message);
  Clock.tick(2000);  // Same thing again, for the next two seconds.
  assertEquals("Done", status.message);
}
The mock Clock handles functions scheduled with setTimeout() and setInterval(), as well as calls to clearTimeout() and clearInterval(). It is defined in jsUnitMockTimeout.js, which ships with jsUnit alongside jsUnitCore.js.
Software testing is tough. It can be exhausting, and there is rarely enough time to find all the important bugs. Wouldn't it be nice to have a staff of tireless servants working day and night to make you look good? Well, those days are here. On Thursday, March 22, I'll give a lunchtime presentation titled "How to Build Your Own Robot Army" for the Quality Assurance SIG of the Software Association of Oregon.
Two decades ago, machine time was expensive, so test suites had to run as quickly and efficiently as possible. Today, CPUs are cheap, so it becomes reasonable to move test creation to the shoulders of a test machine army. But we're not talking about the run-of-the-mill automated scripts that only do what you explicitly told them. We're talking about programs that create and execute tests you never thought of to find bugs you never dreamed of. From Orcs to Zergs to Droids to Cyborgs, this presentation will show how to create a robot test army using tools lying around on the Web. Most importantly, it will cover how to take appropriate credit for your army's work!
The first-ever industry Developer-Tester/Tester-Developer Summit was held at the Mountain View Googleplex on Saturday, February 24th. Hosted by Elisabeth Hendrickson and Chris McMahon, the all-day workshop consisted of experience reports and lightning talks including:
Al Snow - Form Letter Generator Technique
Chris McMahon – Emulating User Actions in Random and Deterministic Modes
Dave Liebreich – Test Mozilla
David Martinez – Tk-Acceptance
Dave W. Smith – System Effects of Slow Tests
Harry Robinson – Exploratory Automation
Jason Reid – Not Trusting Your Developers
Jeff Brown – MBUnit
Jeff Fry – Generating Methods on the Fly
Keith Ray – ckr_spec
Kurman Karabukaev – Whitebox testing using Watir
Mark Striebeck – How to Get Developers and Tester to Work Closer Together
Sergio Pinon – UI testing + Cruise Control
There were also brainstorming exercises and discussions on the benefits that DT/TDs can bring to organizations and the challenges they face. Several participants have blogged about the Summit. The discussions continue at http://groups.google.com/group/td-dt-discuss.
If you spend your days coding and testing, try this opening exercise from the Summit. Imagine a spectrum that has "Tester" at one end and "Developer" at the other. Where would you put yourself, and why?