I want to edit my JavaScript code in my favorite IDE, and when I hit Ctrl-S, I want all of the tests to execute across all browsers and return the results immediately.
Part 1 and Part 2 of this series provided practical how-tos and tips for making acceptance tests for Web apps more useful. This final post reflects on some of the broader topics around our acceptance tests.
Aims and drivers of our tests
In my experience and that of my colleagues, acceptance tests have clear aims and drivers. They should act as ‘safety rails’, much like the crash barriers at the sides of roads, that keep us from straying too far from the right direction. Our tests need to ensure development doesn’t break essential functionality. The tests must also provide early warning, preferably within minutes of the relevant changes being made to the code.
My advice for developing acceptance tests for Web applications: start simple, keep them simple, and find ways to build and establish trust in your automation code. One of the maxims I use when assessing the value of a test is to think of ways to fool my test into giving erroneous results. Then I decide whether the test is good enough or whether we need to add safeguards to the test code to make it harder to fool. I’m pragmatic and realise that all my tests are imperfect; I prefer to make tests ‘good enough’ to be useful where essential preconditions are embedded into the test. Preconditions should include checking for things that invalidate assumptions for that test (for example, the logged-in account is assumed to have administrative rights) and checking for the appropriate system state (for example, to confirm the user is starting from the correct homepage and has several items in the shopping basket).
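To make this concrete, here is a minimal sketch of embedded preconditions in a WebDriver-based test; the helper methods (loginAsAdministrator, isLoggedInAsAdministrator, countItemsInBasket) and the page title are hypothetical stand-ins for whatever support code and UI your own application has.

// A sketch of embedded preconditions: fail fast, with a clear message,
// if the assumptions this test relies on do not hold.
public void testRemoveItemFromBasket() {
    WebDriver driver = createWebDriver();
    loginAsAdministrator(driver);

    assertTrue("Test requires an administrator account",
        isLoggedInAsAdministrator(driver));
    assertEquals("Test must start from the homepage",
        "Home", driver.getTitle());  // "Home" is an illustrative title
    assertTrue("Test requires at least 2 items in the basket",
        countItemsInBasket(driver) >= 2);

    // ... the actual test steps and assertions follow here ...
}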
The value of the tests, and their ability to act as safety rails, depends on how rarely a failing test turns out to be a "false positive." Too many false positives, and a team loses trust in their acceptance tests entirely.
Acceptance tests aren’t a ‘silver bullet.’ They don’t solve all our problems or provide complete confidence in the system being tested (real life usage generates plenty of humbling experiences). They should be backed up by comprehensive automated unit tests and tests for quality attributes such as performance and security. Typically, unit tests should comprise 70% of our functional tests, integration tests 20%, and acceptance tests the remaining 10%.
We need to be able to justify the benefits of the automated tests and understand both the return on investment (ROI) and Opportunity Cost – the time we spend on creating the automated tests is not available to do other things, so we need to ask whether we could spend our time better. Here, the intent is to consider the effects and costs rather than provide detailed calculations; I typically spend a few minutes thinking about these factors as I’m deciding whether to create or modify an automated test. As code spends the vast majority of time in maintenance mode, living on for a long time after active work has ceased, I recommend assessing most costs and benefits over the life of the software. However, opportunity cost must be considered within the period I’m actively working on the project, as that’s all the time I have available.
Unlike testing of traditional web sites, where the contents tend not to change once they have been loaded, tests for web applications need to cope with highly dynamic contents that may change several times a second, sometimes in hard-to-predict ways, caused by factors outside our control.
As web applications are highly dynamic, the tests need to detect relevant changes, wait until the desired behaviour has occurred, and interrogate the application state before the system state changes again. There is a window of opportunity for each test where the system is in an appropriate state to query. The changes can be triggered by many sources: by input, such as a test script clicking a button; by the clock, such as a calendar reminder that is displayed for one minute; and by the server, such as when a new chat message is received.
The tests can simply poll the application, trying to detect relevant changes or timeouts. If the test only looks for expected behaviour, it might spend a long time waiting in the event of problems. We can improve the speed and reliability of the tests by checking for problems, such as error messages.
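For instance, a polling loop along the following lines waits for the expected change but gives up as soon as a problem is reported; the isErrorDisplayed and isNewChatMessageDisplayed helpers are hypothetical, and the timings are purely illustrative.

// Poll for the expected change, but give up early if the application
// reports an error - this keeps failing tests fast as well as reliable.
private boolean waitForNewChatMessage(WebDriver driver) throws InterruptedException {
    long deadline = System.currentTimeMillis() + 10000;  // 10 second timeout
    while (System.currentTimeMillis() < deadline) {
        if (isErrorDisplayed(driver)) {
            fail("Application reported an error while waiting for a chat message");
        }
        if (isNewChatMessageDisplayed(driver)) {
            return true;  // the window of opportunity: query the state now
        }
        Thread.sleep(250);  // poll four times a second
    }
    return false;  // timed out without seeing the expected change
}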
Browser-based UI tests are relatively heavy-weight, particularly if each test has to start from a clean state, such as the login screen. Individual tests can take seconds to execute. While this is much faster than a human could execute a test, it’s much slower than a unit test (which takes milliseconds). There is a trade-off between optimizing tests by reducing the preliminary steps (such as bypassing the need to log in by using an authentication cookie) and maintaining the independence of the tests – the system or the browser may be affected by earlier tests. Fast tests make for happier developers, unless the test results prove to be erroneous.
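As an example of that trade-off, one way to bypass the login screen is to inject a previously obtained authentication cookie before opening the page under test; the cookie name, URLs, and session value below are placeholders for whatever your application actually uses.

/** Sketch: open the inbox as an already signed-in user, skipping the login page. */
public static WebDriver openInboxWithExistingSession(String sessionId) {
    WebDriver driver = new FirefoxDriver();
    // The browser must be on the target domain before cookies can be set.
    driver.get("http://www.example.com/");
    // "SID" is a placeholder for whatever authentication cookie your application uses.
    driver.manage().addCookie(new Cookie("SID", sessionId));
    driver.get("http://www.example.com/mail/inbox");  // now loads as a logged-in user
    return driver;
}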
As with other software, automated tests need ongoing nurturing to retain their utility, especially when the application code changes. If each test embeds the details of how to obtain information, such as an xpath expression to get the count of unread email, then a change to the UI can affect many tests and require each of those tests to be changed and retested. By applying good software design practices, we can separate the ‘how’ from the rest of our tests. That way, if the application changes, we only need to change how we get the email count in one place, instead of in every test that uses it.
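For example, a small helper class can own the knowledge of how to read the unread email count, so a UI change means editing one method rather than every test; the xpath shown is invented for illustration.

// All knowledge of *how* to read the unread count lives here; the tests
// only call getUnreadEmailCount() and never mention xpath at all.
public class MailHomePage {
    private final WebDriver driver;

    public MailHomePage(WebDriver driver) {
        this.driver = driver;
    }

    public int getUnreadEmailCount() {
        // If the UI changes, this is the only line that needs updating.
        WebElement unread = driver.findElement(By.xpath("//span[@id='unread-count']"));
        return Integer.parseInt(unread.getText());
    }
}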
Practical tests
Lots of bugs are discovered by means other than automated testing – they might be reported by users, for example. Once these bugs are fixed, the fixes must be tested. The tests must establish whether the problem has been fixed and, where practical, show that the root cause has been addressed. Since we want to make sure the bug doesn’t resurface unnoticed in future releases, having automated tests for the bug seems sensible. Create the acceptance tests first, and make sure they expose the problem; then fix the bug and run the tests again to ensure the fix works. Antony Marcano is one of the pioneers of acceptance tests for bugs.
Although this article focuses on acceptance tests, I’d like to encourage you to consider creating smaller tests when practical. Smaller tests are more focused, run significantly faster, and are more likely to be run sooner and more often. We sweep through our acceptance tests from time to time and replace as many as we can with small or medium tests. The remaining acceptance tests are more likely to be maintained because we know they’re essential, and the overall execution time is reduced – keeping everyone happy!
Further information
A useful tutorial on xpath (http://www.zvon.org/xxl/XPathTutorial/General/examples.html)
Google Test Automation Conference (GTAC) 2008: The value of small tests (http://www.youtube.com/watch?v=MpG2i_6nkUg)
GTAC 2008: Taming the Beast - How to Test an AJAX Application (http://www.youtube.com/watch?v=5jjrTBFZWgk)
Part 1 of this article contains an additional long list of excellent resources.
Part 1 of this series provided practical how-tos to create acceptance tests. Read on to learn how to make your tests more useful.
Once we have some automated acceptance tests, they must be run, without delay, as often as appropriate to answer the concerns of the team. We may want to run a subset of the tests after each change to the code. The process of running tests can be automated and integrated with the source control system through a continuous build that compiles the code and runs the various automated tests. If the acceptance tests are sufficiently fast and can run unattended, they should be included in the tests run by the continuous build system. One challenge for our acceptance tests at Google is to enable the web browser to run without appearing on screen; our machines don’t typically have a physical or logical GUI. Utilities such as VNC and Xvfb can host the web browser and enable us to run our acceptance tests. A useful guide on test automation is the book Pragmatic Project Automation by Mike Clark.
Fluid test automation code smooths over obstacles, coping with the interface between the web application and your code. When the application has been designed with testing in mind, hooks exist, keyboard shortcuts are provided, and debug data is available for the asking. Hooks include IDs on key elements such as the search field, enabling tests to identify the correct element quickly, unambiguously, and correctly even as the layout of the UI changes.
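For example, when the search field carries an ID, a test can locate it directly and is unaffected by the surrounding layout (the ID and search text below are illustrative):

// With an ID as a hook, the locator survives layout changes.
WebElement searchBox = driver.findElement(By.id("search-field"));
searchBox.sendKeys("flights to London");
searchBox.submit();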
Tests that repeat exactly the same steps using the same parameters tread a well-worn path through the application and may side-step some nearby bugs which we could find by changing a couple of parameters in the tests. Ways to change the tests include using external data sources and using random values for number of repetitions, sleeps, number of items to order, etc. You need to be able to distinguish between tests that fail because they are flaky and those that report valid failures in the software being tested, so make sure the tests record the parameters they used in sufficient detail to enable the test to be re-run consistently and predictably.
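One simple way to keep randomised tests reproducible is to derive every random value from a single seed and record that seed in the test output; the sketch below assumes java.util.Random and illustrative parameter ranges.

// Choose a seed, log it, then derive every random value from it so the
// exact run can be repeated later by reusing the same seed.
long seed = System.currentTimeMillis();
System.out.println("Test run using random seed: " + seed);
Random random = new Random(seed);

int itemsToOrder = 1 + random.nextInt(5);   // 1..5 items
int repetitions = 1 + random.nextInt(3);    // 1..3 repetitions
System.out.println(String.format(
    "Parameters: itemsToOrder=%d, repetitions=%d", itemsToOrder, repetitions));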
Browsers differ between one provider and another and between versions. Your application may be trouble-free in one browser, yet entirely unusable in another. Make sure your automated tests execute in each of the major browsers used by your users; our list typically includes Internet Explorer, Firefox, Safari, Opera, and Chrome. Tools such as Selenium RC (http://seleniumhq.org/) and WebDriver (http://code.google.com/p/webdriver/) support most of the browsers, and if your tests are designed to run in parallel, you may be able to take advantage of parallel test execution frameworks such as Selenium Grid.
Many web applications are now used on mobile phones such as the iPhone or G1. While there are some early versions of WebDriver for these devices, you may find emulating these devices in a desktop browser is sufficient to give you the confidence you need. Firefox’s excellent extensions and profiles make such testing easy to implement. Safari’s development tools can be used to specify the parameters you need, such as which device to emulate. Here’s an example of how to configure Firefox in WebDriver to emulate a version 1.1 iPhone.
private static final String IPHONE_USER_AGENT_V1_1 =
    "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420.1 "
    + "(KHTML; like Gecko) Version/3.0 Mobile/3B48b Safari/419.3";

/**
 * Returns a WebDriver instance with settings to emulate
 * an iPhone V1.1.
 */
public static WebDriver createWebDriverForIPhoneV1_1() {
  FirefoxProfile profile = new FirefoxProfile();
  // Blank out headers that would otherwise confuse the web server.
  profile.setPreference("general.appversion.override", "");
  profile.setPreference("general.description.override", "");
  profile.setPreference("general.platform.override", "");
  profile.setPreference("general.vendor.override", "");
  profile.setPreference("general.vendorsub.override", "");
  // Present the browser as an iPhone.
  profile.setPreference("general.appname.override", "iPhone");
  profile.setPreference("general.useragent.override", IPHONE_USER_AGENT_V1_1);
  return new FirefoxDriver(profile);
}
The user-agent string can be found online in many cases or captured from a tame web server that records the HTTP headers. I use http://www.pycopia.net/webtools/headers, which even emails the values to me in a format I can easily adapt to use in my test code.
Robust tests can continue to operate correctly even when things change in the application being tested or in the environment. Web applications use HTML, so try to add IDs and CSS classes to relevant elements of the application. Although these additions potentially increase the size of the page, they enable easier and more consistent identification, navigation, and selection of the user interface.
Try to avoid brittle identifiers, such as xpath expressions that rely on positional data. For example, /div[3]/div[1] becomes unreliable as soon as any of the positions change, and the resulting failures can be hard to diagnose because nothing obvious appears to have changed.
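For example, compare a positional expression with a locator anchored on a stable ID (the element and ID are illustrative):

// Brittle: breaks as soon as another div is added or the order changes.
WebElement unreadBrittle = driver.findElement(By.xpath("/html/body/div[3]/div[1]"));

// More robust: anchored on a stable ID, unaffected by the surrounding layout.
WebElement unreadRobust = driver.findElement(By.id("unread-count"));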
Add guard conditions that assert your assumptions are still accurate. Design the tests to fail if any of the assumptions prove false. If possible, make the tests fail at compile time to provide the earliest possible feedback.
Try to only make positive assertions. For example, if you expect an action to cause an item to be added to a list, assert that after the action the list contains the expected value, not that the list has changed size (because other functionality may affect the size). Also, if it's not something your test is concerned about, don't make assertions about it.
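For example (the helper methods and item name are invented):

// Positive assertion: check the basket now contains the item we added,
// rather than asserting that the basket size changed by one (other
// functionality, such as bundled offers, could legitimately affect the size).
addItemToBasket(driver, "Pragmatic Project Automation");
assertTrue("Basket should contain the newly added book",
    getBasketContents(driver).contains("Pragmatic Project Automation"));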
Help your tests to help others by being informative. Use a combination of meaningful error messages and more detailed logs to help people to tell whether the tests are working as intended and, if problems occur, to figure out what’s going wrong.
Taking screenshots of the UI when a problem occurs can help to debug the issue and disambiguate between mismatches in our assumptions vs. problems in the application. It’s not an exact science: screenshots are seldom recorded at exactly the same time as the interaction with the application; typically they’re recorded afterwards, and the application may have changed in the interim period, no matter how short that period is.
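With WebDriver, drivers that implement the TakesScreenshot interface can capture the page when a check fails; here is a minimal sketch, assuming the relevant Selenium and java.io imports are in place.

// Capture a screenshot when a problem occurs, to help tell mistaken
// assumptions apart from genuine application failures. Not every driver
// supports this, hence the instanceof check.
if (driver instanceof TakesScreenshot) {
    File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
    // The file is written to a temporary location; copy it or log its path
    // alongside the test results before the run finishes.
    System.out.println("Screenshot saved to: " + screenshot.getAbsolutePath());
}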
Debug traces are useful for diagnosing acute problems, and range from simple debug statements like ‘I made it to this point’ to dumps of the entire state of values returned from the application by our automation tool. In comparison, logging is intended for longer-term tracking of behaviour which enables larger-scale thinking, such as enabling a test to be reproduced reliably over time.
Good error messages should say what’s expected and include the actual values being compared. Here are two examples of combinations of tests and assert messages, the second more helpful than the first:
1. int actualResult = addTwoRandomOddNumbers();
   assertTrue("Something wrong with calculation", actualResult % 2 == 0);

2. int actualResult = addTwoRandomOddNumbers(number1, number2);
   assertTrue(
       String.format(
           "Adding two odd numbers [%d] and [%d] should return an even result. Calculated result = %d",
           number1, number2, actualResult),
       actualResult % 2 == 0);
Vint Cerf coined the phrase bit-rot to reflect the decay of usefulness or availability of software and data stored on computers. In science, half-life is a measurement of the decay of radioactivity over time, and is the period taken for the radioactivity to reduce by 50%. Similarly, our tests are likely to suffer from bit-rot and will become less useful over time as the system and its use change.
The only cure for bit-rot is prevention. Encourage the developers to adopt and own the tests.
As our tests get bigger and more complex, let’s add unit tests to help ensure our acceptance tests behave as expected. Mock objects are one practical way to reliably automate the tests, with several good and free frameworks available for common programming languages. I suggest you create unit tests for more involved support functions and for ‘driver’ code, rather than for the tests themselves.
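As an illustration, support code can often be unit tested against a small hand-written fake rather than a real browser; everything below (the UnreadCountSource interface, its fake, and parseUnreadCount) is invented for the sketch.

// A tiny seam: the support function depends on this interface, not on WebDriver.
interface UnreadCountSource {
    String readUnreadCountText();
}

// Support function under test: turns the raw UI text into a number.
static int parseUnreadCount(UnreadCountSource source) {
    String text = source.readUnreadCountText();        // e.g. "Inbox (42)"
    String digits = text.replaceAll("[^0-9]", "");
    return digits.isEmpty() ? 0 : Integer.parseInt(digits);
}

// Unit test using a hand-rolled fake - no browser needed, runs in milliseconds.
public void testParseUnreadCount() {
    UnreadCountSource fake = new UnreadCountSource() {
        public String readUnreadCountText() { return "Inbox (42)"; }
    };
    assertEquals(42, parseUnreadCount(fake));
}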
If you think creating automated tests for a web application is hard, try using the web site with accessibility software such as a screen reader to learn just how inaccessible some of our web applications are! Screen readers, like automated tests, need ways to interrogate, interact with, and interpret the contents of web applications. In general, increasing the accessibility of a site can improve testability and vice-versa. So while you’re working hard with the team to improve the testability, try to use the site with a screen reader. Here's one example: Fire Vox, a screen reader for Firefox (http://firevox.clcworld.net/about.html).
The third and final post of this series will reflect on the aims and challenges of acceptance tests.