Recently, somewhere in the Caribbean Sea, you implemented the PirateShip class. You want to test the cannons thoroughly in preparation for a clash with the East India Company. This requires that you run the crucial testFireCannonDepletesAmmunition() method many times with many different inputs.

TestNG is a test framework for Java unit tests that offers additional power and ease of use over JUnit. Some of TestNG's features will help you to write your PirateShip tests in such a way that you'll be well prepared to take on the Admiral. First is the @DataProvider annotation, which allows you to add parameters to a test method and provide argument values to it from a data provider.

import static org.testng.Assert.assertEquals;

import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;

public class PirateShipTest {
  @Test(dataProvider = "cannons")
  public void testFireCannonDepletesAmmunition(int ballsToLoad,
         int ballsToFire,
         int expectedRemaining) {
    PirateShip ship = new PirateShip("The Black Pearl");
    ship.loadCannons(ballsToLoad);
    for (int i = 0; i < ballsToFire; i++) {
      ship.fireCannon();
    }
    assertEquals(ship.getBallsRemaining(), expectedRemaining);
  }
  @DataProvider(name = "cannons")
  public Object[][] getShipSidesAndAmmunition() {
    // Each 1-D array represents a single execution of a @Test that
    // refers to this provider. The elements in the array represent
    // parameters to the test call.
    return new Object[][] {
      {5, 1, 4}, {5, 5, 0}, {5, 0, 5}
    };
  }
}


Now let's focus on making the entire test suite run faster. An old, experienced pirate draws your attention to TestNG's capacity for running tests in parallel. You can do this in the definition of your test suite (described in an XML file) with the parallel and thread-count attributes.

<suite name="PirateShip suite" parallel="methods" thread-count="2">


A great pirate will realize that this parallelization can also help to expose race conditions in the methods under test.
Now you can be confident that your cannons will work correctly when fired in parallel. But you didn't get to be a Captain by slacking off! You know that it's also important for your code to fail as expected. For this, TestNG offers the ability to specify those exceptions (and only those exceptions) that you expect your code to throw.

@Test(expectedExceptions = { NoAmmunitionException.class })
public void testFireCannonEmptyThrowsNoAmmunitionException() {
  PirateShip ship = new PirateShip("The Black Pearl");
  ship.fireCannon();
}


By Sharon Zhou, Kirkland Client Test Lead

In December, Google Pack shipped 10 new languages in 10 new countries/regions, including China Pack. This was in addition to the 30 languages Pack was already available in. Localization testing for these 10 languages is not trivial: it needs to be done very quickly, by experts in each language who may never have seen the application before. Localization testing (LQA) can also be costly, since it requires multiple external vendors, and the LQA schedule is highly sensitive to changes in the product schedule.

The process followed so far has been to have an engineer document each product in detail: the workflow to navigate to each area of the UI, the inputs to enter at each step, and what to expect. The documentation time is considerable, and the documentation must change whenever the product changes. Each vendor must then absorb this documentation and become proficient enough with the product to navigate through it. There are also challenges in getting vendors the appropriate permissions to access our unreleased products, and in letting them download the builds at the site where they work.

To minimize the test cost, the Pack test team has implemented a significant amount of automation that drives the UI across the entire product. One feature of the automation harness is the ability to record movies. For the 10 new languages, the Pack team tried a new process: using the automation to drive the UI, recording movies of the product UI, and sending these movies to our vendors along with a top-level test plan. To evaluate the new approach, we also asked the vendors to fill out a survey so we could quantify how much time, and hence cost, the approach saves.

The survey results came back very positive and encouraging. We received valuable feedback on what vendors need to conduct a fast and efficient test pass. Overall, this experiment saved the vendors an estimated 25% of their time. It was just as effective as the previous process, but much simpler for them to complete. Our next step will be to drive more of the UI: if the automation can touch every page, link, and dialog, it can replace the traditional LQA method of installing and running the build while testers perform functional testing.

Posted by Antoine Picard

We have become test hoarders. Our focus on test-driven development, developer testing, and other testing practices has allowed us to accumulate a large collection of tests of various types and sizes. Although this is valiant and beneficial, it is too easy to forget that each test, whether a unit test or a manual test, has a cost as well. This cost should be balanced against the benefits of the test when deciding whether a test should be deleted, or whether it should be written in the first place.

Let's start with the benefits of a test. It is all too easy to think that the benefit of a test is to increase coverage or to satisfy an artificial policy set by the Test Certified program. Not so. Although it is difficult to measure for an individual test, a test's benefit is the number of bugs that it keeps from reaching production.

There are side benefits to well-tested code as well, such as enforcing good design practices (decomposition, encapsulation, and so on), but these are secondary to avoiding bugs.

Short examples of highly beneficial tests are hard to come by; counter-examples, however, abound. The following examples have been anonymized but were found at various times in our code tree. Take this test:

def testMyModule(self):
  mymodule.main()


Although it probably creates a lot of coverage in mymodule, this test will only fail if main throws an exception. Certainly this is a useful condition to detect but it is wasteful to consume the time of a full run of mymodule without verifying its output. Let's look at another low-value test:

def testFooInitialization(self):
  try:
    foo = Foo()
    self.assertEquals(foo.name, 'foo')
    self.assertEquals(foo.bar, 'bak')
  except:
    pass


This one probably hits the bottom of the value scale for a test. Although it exercises Foo's constructor, catching all exceptions means that the test will never fail. It creates coverage but never catches any bugs.
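
For contrast, here is a sketch of the same checks with the exception swallowing removed (it assumes the same Foo class as above). Now a broken constructor or a wrong field value actually fails the test:

def testFooInitialization(self):
  foo = Foo()
  self.assertEquals(foo.name, 'foo')
  self.assertEquals(foo.bar, 'bak')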

A fellow bottom-dweller of the value scale is the test that doesn't get fixed. This can be a broken test in a continuous build or a manual test that generates a bug that stays open: either way, it's a waste of time. If it's an automated test, it is worth deleting, and it was probably not worth writing in the first place. If it's a manual test, it's probably a sign that QA is not testing what the PM cares about. Some even apply the broken-window principle to these tests, arguing that tests which don't get fixed give the impression that testing is not valuable.

Our final specimen is slightly higher value but still not very high. Consider the function Bar:

def Bar():
  SlowBarHelper1()
  SlowBarHelper2()
  SlowBarHelper3()

We could employ stubs or mocks to write a quick unit test of Bar, but all we could assert is that the three helpers got called in the right order. Hardly a very insightful test. In non-compiled languages, this kind of test does serve as a substitute syntax-checker, but it provides little value beyond that.
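
For concreteness, such a test might look like the following sketch, written with unittest.mock and assuming that Bar and its helpers live in a module named barmodule (a made-up name). Note that the only thing it can verify is the call sequence:

# Sketch only: assumes "from unittest import mock" and a (hypothetical)
# barmodule containing Bar and its three helpers.
def testBarCallsHelpersInOrder(self):
  manager = mock.Mock()
  with mock.patch.object(barmodule, 'SlowBarHelper1', manager.helper1), \
       mock.patch.object(barmodule, 'SlowBarHelper2', manager.helper2), \
       mock.patch.object(barmodule, 'SlowBarHelper3', manager.helper3):
    barmodule.Bar()
  # All this test can assert: the three helpers ran, in this order.
  manager.assert_has_calls(
      [mock.call.helper1(), mock.call.helper2(), mock.call.helper3()])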

Let's now turn our attention to the dark side of testing: its cost. The budget of a whole testing team is easy to understand but what about the cost of an individual test?

The first such cost is the one-time cost of creating the test. Whether it is the time it takes to write down the steps to reproduce a manual test or the time it takes to code an automated one, this cost depends mostly on the testability of the system or the code. Keeping this cost down is an essential part of test-driven development: think about your tests before you start coding.

While the creation of a test has a significant cost, it can be dwarfed by the incremental cost of running it. This is the most common objection to manual testing, since the salary of the tester must be paid with every run of the test, but it applies to automated tests too. An automated test occupies a machine while it's running, and that machine and its maintenance both have a cost. If a test requires specialized hardware, those costs go up. Similarly, a test that takes 20 minutes to run will consume 20 minutes of the time of each engineer who runs it, every time they run it! If it's a test that's run before each check-in, its cost will add up rapidly. It could be worth the engineering time to reduce its run time to a more reasonable level.

There is one more incremental cost to a test: the cost of its failure. Whenever a test fails, time is spent to diagnose the failure. The reduction of this cost is the reason behind two key principles of good testing:
- don't write flaky tests: flaky tests waste time by making us investigate failures that are not really there
- write self-diagnosing tests: a test should make it clear what went wrong when it fails, so that we can move quickly toward a fix (see the sketch below)
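
As a sketch of the second principle, compare a bare boolean check with an assertion that explains itself (remaining_ammunition below is a made-up stand-in for real production code):

def remaining_ammunition(loaded, fired):
  # Made-up stand-in for production code.
  return loaded - fired

def testFiringDepletesAmmunition(self):
  remaining = remaining_ammunition(loaded=5, fired=2)
  # self.assertTrue(remaining == 3) would fail with just "False is not true".
  # The message below says what was expected and where to start looking.
  self.assertEquals(remaining, 3,
                    'firing 2 of 5 loaded cannonballs should leave 3, got %d'
                    % remaining)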

The 'economics' of testing can be used to analyze various testing methodologies. For example, true unit tests (small, isolated tests) take one approach to this problem: they minimize the repeated costs (by being cheap to run and easy to diagnose) while incurring a slightly higher creation cost (having to mock/stub, refactor, ...) and delivering slightly smaller benefits (confidence about a small piece of the system rather than the overall system). By contrast, regression tests tend to incur a greater cost (since most regression tests are large tests) but attempt to maximize their benefits by targeting areas of previous failures, under the assumption that those areas are the most likely to have bugs in the future.

So think about both the benefits and the costs of each test that you write. Weigh the one-time costs against the repeated costs that you and your team will incur and make sure that you get the benefits that you want at the least possible cost.



Code coverage (also called test coverage) measures which lines of source code have been executed by tests. A common misunderstanding of code coverage data is the belief that:

My source code has a high percentage of code coverage; therefore, my code is well-tested.


The above statement is FALSE! High coverage is a necessary, but not sufficient, condition.

Well-tested code =======> High coverage
Well-tested code <===X=== High coverage


The most common type of coverage data collected is statement coverage. It is the least expensive type to collect, and the most intuitive. Statement coverage measures whether a particular line of code is ever reached by tests. Statement coverage does not measure the percentage of unique execution paths exercised.

Limitations of statement coverage:

  • It does not take into account all the possible data inputs. Consider this code:

    int a = b / c;


    This could be covered with b = 18 and c = 6, but never tested with c = 0 (both cases are sketched as tests after this list).

  • Some tools do not provide fractional coverage. For instance, in the following code, a test that makes condition a true yields 100% statement coverage, even though condition b is never evaluated (short-circuit evaluation):

    if (a || b) {
      // do something
    }


  • Coverage analysis can only tell you how the code that exists has been exercised. It cannot tell you how code that ought to exist would have been executed. Consider the following:

    error_code = FunctionCall();
    // returns kFatalError, kRecoverableError, or kSuccess
    if (error_code == kFatalError) {
      // handle fatal error, exit
    } else {
      // assume call succeeded
    }



    This code handles only two of the three possible return values (a bug!). It is missing code to do error recovery when kRecoverableError is returned. With tests that generate only the values kFatalError and kSuccess, you will see 100% coverage. The test case for kRecoverableError does not increase coverage, and appears “redundant” for coverage purposes, but it exposes the bug!
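
To make the first limitation above concrete, here is a small sketch in Python (divide is a made-up stand-in for the b / c line). Both tests execute every statement of divide, so the second adds no coverage at all, yet it is the only one that says anything about the c = 0 case:

import unittest

def divide(b, c):
  # Made-up stand-in for the "int a = b / c;" example above.
  return b // c

class DivideCoverageTest(unittest.TestCase):
  def testTypicalInput(self):
    # After this test alone, divide already has 100% statement coverage.
    self.assertEquals(divide(18, 6), 3)

  def testDivisionByZeroIsNotHandled(self):
    # Adds no new coverage, but it is the only test that probes c = 0.
    self.assertRaises(ZeroDivisionError, divide, 18, 0)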

So the correct way to do coverage analysis is:

  1. Make your tests as comprehensive as you can, without coverage in mind. This means writing as many test cases as are necessary, not just the minimum set of test cases needed to maximize coverage.

  2. Check coverage results from your tests. Find code that's missed in your testing. Also look for unexpected coverage patterns, which usually indicate bugs.

  3. Add additional test cases to address the missed cases you found in step 2.

  4. Repeat steps 2-3 until it is no longer cost-effective. If some corner cases are too difficult to test, consider refactoring to improve testability.


Reference for this episode: How to Misuse Code Coverage by Brian Marick from Testing Foundations.