Discomfort as a Tool for Change
Monday, February 13, 2017
by Dave Gladfelter (SETI, Google Drive)
Introduction
The SETI (Software Engineer, Tools and Infrastructure) role at Google is a strange one in that there's no obvious reason why it should exist. The SWEs (Software Engineers) on a project understand its problems best, and understanding a problem is most of the way to fixing it. How can SETIs bring unique value to a project when SWEs have more on-the-ground experience with their impediments?
The answer is scope. A SWE is rewarded for being an expert in their particular area and domain and is highly motivated to make optimizations to their carved-out space. SETIs (and Test Engineers and EngProd in general) identify and solve product-wide problems.
Product-wide problems frequently arise because local optimizations don't necessarily add up to product-wide optimizations. The reason may be the limits of attention, blind spots, or mis-aligned incentives, but a group of SWEs each optimizing for their own sub-projects will not achieve product-wide maxima.
Often SETIs and Test Engineers (TEs) know what behavior they'd like to see, such as more integration tests. We may even have management's ear and convince them to mandate such tests. However, in the absence of incentives, it's unlikely that the decisions SWEs make in response to such mandates will add up to the behavior we desire. Mandates around methods/practices are often ineffective. For example, a mandate of documentation for each public method on an interface often results in "method foo does foo."
The best way to create product-wide efficiencies is to change the way the team or process works in ways that will (initially) be uncomfortable for the engineering team, but that pay dividends that can't be achieved any other way. SETIs and TEs must work to identify the blind spots and negative interactions between engineering teams and change the environment in ways that align engineering teams' incentives. When properly incentivized, SWEs will make optimal decisions enhanced by product-wide vision rather than micro-management.
Common Product-Wide Problems
Hard-to-use APIs
One common example of local optimizations resulting in cross-team de-optimization is documentation and ease-of-use of internal APIs. The team that implements an internal API is not rewarded for making it easy to use except in the most oblique ways. Clients are compelled to use the internal APIs provided to them, so the API owner has a monopoly and will set the price of using it at "you must read all the code and debug it yourself" in the absence of incentives or (rare) heroes.
Big, slow releases
Another example is large and slow releases. Without EngProd help or external pressure, teams will gravitate to the slowest, biggest release possible.
This makes sense from the position of any individual SWE: releases are painful; you have to ensure that there are no UI and API regressions, watch traffic and error rates for some time, and re-learn and use tools and processes that are complex and specific to releases.
Multiple teams will naturally gravitate to having one big release so that all of these costs can be bundled into one operation for "efficiency." The result is that engineers don't get feedback on features for weeks and versioning of APIs and data stores is ignored (since all the parts of the system are bundled together into one big release). This greatly slows down developer and feature velocity and greatly increases risks of cascading failures when the release fails.
How EngProd fixes product-wide problems
SETIs can nibble around the edges of these kinds of problems by writing tools and automation. TEs can create easy-to-use test environments that facilitate isolating and debugging faults in integration and ambiguities in APIs. We can use fancy technologies to sample live traffic and ensure that new versions of systems behave the same as previous versions. We can review design docs to ensure that they have an appropriate test plan. Often these actions do have real value. However, these are not the best way to align incentives to create a product-wide solution. Facilitating engineering teams' fruitful collaboration (and dis-incentivizing negative interactions) gives EngProd a multiplier that is hard to achieve with only tooling and automation.
Heroes are few and far between so we must turn to incentives, which is where discomfort comes in. Continuity is comfortable and change is painful. EngProd looks at how to change the problem so that teams are incentivized to work together fruitfully and disincentivized (discomforted) to pursue local optimizations exclusively.
So how does EngProd align incentives? Certainly there is a place for making optimal behaviors easier, such as providing easy-to-use integration environments. However, incentivizing optimal behaviors via negative feedback should not be overlooked. Each problem is different, so let's look at how to address the two examples above:
Incentivizing easy-to-use APIs
Engineers will make the things they're incentivized to make. For APIs, incentivize teams to provide integration help in the form of fakes. EngProd works with team leads to ensure there are explicit objectives to provide Fakes for their APIs as part of the rollout.
Fakes are as-simple-as-possible implementations of a service that can still be used to do pre-submit testing of client interactions with the system. They don't replace integration tests, but they reduce the likelihood of finding errors in subsequent integration test runs by an order of magnitude.
Furthermore, have some subset of the same client-owned and server-owned tests run against the fakes (for quick presubmit testing) as well as the real implementation (for continuous integration testing) and work with management to make it the responsibility of the Fake owner to debug any discrepancies for either the client- or the server-owned tests.
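As a minimal sketch of this arrangement (the UserService interface, fake, and test names below are invented for illustration, not from any particular codebase), the same small contract suite can be bound once to the fake for presubmit and once to the real implementation for continuous integration:

// A client-facing API and an as-simple-as-possible fake of it (hypothetical names).
interface UserService {
  void createUser(String id, String name);
  String lookupName(String id);  // null if the user does not exist
}

// The Fake: an in-memory map instead of a real backend.
final class FakeUserService implements UserService {
  private final java.util.Map<String, String> users = new java.util.HashMap<>();
  @Override public void createUser(String id, String name) { users.put(id, name); }
  @Override public String lookupName(String id) { return users.get(id); }
}

// Contract tests shared by the fake and the real implementation.
abstract class UserServiceContractTest {
  protected abstract UserService newService();

  @org.junit.Test public void lookupReturnsWhatWasCreated() {
    UserService service = newService();
    service.createUser("u-1", "Ada");
    org.junit.Assert.assertEquals("Ada", service.lookupName("u-1"));
  }

  @org.junit.Test public void lookupOfUnknownUserReturnsNull() {
    org.junit.Assert.assertNull(newService().lookupName("nobody"));
  }
}

// Runs in presubmit: fast and hermetic.
public class FakeUserServiceTest extends UserServiceContractTest {
  @Override protected UserService newService() { return new FakeUserService(); }
}

// A RealUserServiceTest subclass would bind newService() to the production client
// and run in continuous integration; any discrepancy between the two runs is then
// the Fake owner's responsibility to debug.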
This reverses the pain! API owners, who are in a position to make APIs better, are now the ones experiencing negative incentives when APIs are not easy to use. Previously, when clients felt the pain, they had no recourse other than to file easily-ignored bugs ("Closed: working as intended") or contribute changes to the API owners' codebase, hurting their own performance with distractions.
This will incentivize API owners to design APIs to be as simple as possible with as few side-effects as possible, and to provide high-quality fakes that make it easy for clients to integrate with the API. Some teams will certainly not like this change at first, but I have seen API teams come to the realization that this is the best choice for the larger effort and implement these practices despite their cost to the team in the short run.
Helping management set engineering team objectives may not seem like a typical SETI responsibility. Management is responsible for setting performance incentives and objectives, but they are not well-positioned to see how the low-level decisions of different teams combine to create harmful interactions and lower cross-team performance, so they need SETI and TE guidance to create an environment that encourages optimal behaviors.
Fast, small releases
Being forced to release more frequently than feature deployment alone requires has many beneficial side-effects that make release velocity a goal unto itself. SETIs and TEs faced with big, slow releases work with management to mandate a move to a set of smaller, more frequent releases. As release velocity is ratcheted up, negative behaviors such as too much manual testing or too much internal coupling become more painful, and many optimal behaviors are incentivized.
Less coupling between systems
When software is released together, it is easy to treat the seams between different components as implementation details. The resulting systems become so intertwined (coupled) that responsibilities are mixed arbitrarily between them and their interactions are too complex for any one person to understand. When two components are released separately and at different times, different versions of them must be compatible with one another. Engineers who were previously complacent about this fragility will become fearful of failed releases due to implicit contract changes. They will change their behavior in beneficial ways, such as defining the contract between components explicitly and creating regression testing for it. The result is a system composed of robust, self-contained, more easily understood components.
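As a rough illustration of what "defining the contract explicitly and creating regression testing for it" can look like (the OrderEvent type and its wire format are invented for this example), each side can pin the agreed serialized form so that a release which silently changes it fails a test rather than a downstream deployment:

import static org.junit.Assert.assertEquals;

import org.junit.Test;

// The explicit, versioned contract both teams release against (hypothetical).
final class OrderEvent {
  final int version;
  final String orderId;
  final String status;

  OrderEvent(int version, String orderId, String status) {
    this.version = version;
    this.orderId = orderId;
    this.status = status;
  }

  // Serializes to the agreed wire format: version|orderId|status.
  String toWire() {
    return version + "|" + orderId + "|" + status;
  }

  // Parses the agreed wire format; extra trailing fields are ignored for forward compatibility.
  static OrderEvent fromWire(String wire) {
    String[] parts = wire.split("\\|");
    return new OrderEvent(Integer.parseInt(parts[0]), parts[1], parts[2]);
  }
}

public class OrderEventContractTest {
  // A pinned example of the wire format. A release of either component that silently
  // changes it breaks this test instead of breaking the other team's deployed binary.
  private static final String GOLDEN = "2|o-42|SHIPPED";

  @Test public void producerEmitsAgreedFormat() {
    assertEquals(GOLDEN, new OrderEvent(2, "o-42", "SHIPPED").toWire());
  }

  @Test public void consumerParsesAgreedFormat() {
    assertEquals("o-42", OrderEvent.fromWire(GOLDEN).orderId);
  }

  @Test public void consumerToleratesFieldsAddedByNewerProducer() {
    assertEquals("SHIPPED", OrderEvent.fromWire(GOLDEN + "|carrier=XYZ").status);
  }
}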
Better/More automated testing
Manual testing becomes more painful as release velocity is ramped up. This will incentivize automated regression, UI and performance tests. This makes the team more agile and able to catch defects sooner and more cheaply.
Faster feedback
When incremental feature changes can be released to dogfood or other beta channels more frequently, user interaction designers and product managers get much faster feedback about what paths lead to better user engagement and experience than in big, slow releases where an entire feature is deployed simultaneously. This results in a better product.
Conclusion
SETIs and TEs optimize interactions between teams and create fixes for product-wide, cross-team problems in order to improve engineering productivity and velocity. There are many worthwhile projects that EngProd can do using broad knowledge of the system and expertise in refactoring, automation and testing, such as creating test fixtures that enable continuous integration testing or identifying and combining duplicative tests or tools.
That said, the biggest problem that EngProd is positioned to solve is to break the chain of local optimizations resulting in cross-team de-optimizations. To that end, discomfort is a tool that can incentivize engineers to find solutions that are optimal for the entire product. We should look for and advocate for these transformative changes.
Thanks for a fun post - I enjoyed reading it.
A short question w.r.t. SWEs writing APIs and the fakes. If writing fakes is painful enough to incentivize SWEs to write good and simple APIs, wouldn't the same force discourage SWEs from updating the APIs? API design is a hard thing to get right - even more so on the first attempt. If the SWEs are now responsible for updating Fakes (which I assume to be less than pleasant), wouldn't they shy away from improving the API?
Thanks for the question, taeold. I agree that all else being equal, if you make something more expensive it'll happen less.
Usually API changes are actually additions. It's hard to change the black-box behavior of a service in a loosely-coupled system because you have to convince all clients to migrate to expect the new response.
Off the top of my head, API definitions or black-box behavior can change for several reasons:
1) The API leaks implementation details and the implementation needs to change for some reason.
2) A new feature is added.
3) The domain changes. (typically rare)
4) There is an error in the implementation that leaks bad API responses.
API owners are motivated by management performance reviews. For 2-4, they will be given poor marks if they do not respond to defects and changing business demands, so they will implement the changes and update the Fake despite the additional cost.
Almost all bad APIs I've seen are examples of 1. This is a symptom of a bad design and probably made the Fake hard to write in the first place. In a system that hides no implementation details, the Fake is as complicated as the original system. I would hope that teams in this situation would recognize their fundamental problem and fix it, since that would lower long-term costs for everyone.
A few years ago I worked with a team whose primary query API sent clients the entire storage record, which was a kind of state machine snapshot, consisting of dozens of poorly-defined, nested, high-cardinality fields. Clients generally only needed one or two pieces of data, but they had little guidance as to what was public and unchanging and often picked an internal, implementation-specific field to use. The service owners didn't know what clients needed because they were internally focused. Imagine how hard that API would be to support with a Fake, and how motivated that team would be to learn what the domain actually was and implement an API using that knowledge if a realistic Fake were mandatory.
Dave
p.s. In case you're curious, the team eventually did move to a better API. They came to realize how much their API constrained their implementation since any internal change was a potential hard-to-debug defect only caught in integration tests.
Thanks for your thorough reply Dave.
I've neglected to think about the many other factors at play in a significant API change; a substantial rewrite of a fake may only play a minor role. But the motivation to write a good API to make it easy to write fakes is at full force in both the initial design and a painful redesign of the API.
Daniel
Very nice post Dave
Hi Dave - Thank you for writing this post on minimizing product-wide problems. In particular I was interested in Fakes as a solution to identify if an API is easy to use. I don't know if this would be the right place to inquire about implementation details of Fakes, but thought I would ask:
1) Are Fakes a good way of controlling the underlying persisted data (test data/fixtures) that the actual service uses to generate its responses?
2) And if so, should Fakes also implement ways to insert and delete test data/fixtures?
> 1) Are Fakes a good way of controlling the underlying persisted data (test data/fixtures) that the actual service uses to generate its responses?
I think the answer to your question is entirely dependent on your architecture. The only hard-and-fast definition of Fakes is that they are simplified implementations of the service useful for functional testing.
I work on a product now where the external API is basically a filesystem as a web service, so a good Fake at that level would definitely not control an underlying persistent datastore but would most likely use an in-memory representation. I've worked on a product where there are multiple levels of bidirectionally-interacting services, some of which touch the same underlying datastores or otherwise have side communication channels, such that you can't write a Fake of service A without some way of propagating state changes (side-effects) to the Fake of service B.
It's probable that there are domains where such a complex, coupled architecture is inevitable. In those cases, the Fakes would have to have some communication channel, and it might be by talking to a lightweight common storage. However, I think it's likely that when confronted with the complication of writing and maintaining a web of interacting Fakes, teams will realize that there are better architectural alternatives (e.g. isolated microservices) in many cases. In either case, it's not the Fakes that make life difficult; they simply move the pain that is felt by the clients onto the service owners.
> 2) And if so, should Fakes also implement ways to insert and delete test data/fixtures?
I think based on my answer to 1) the general answer is "no". For test fixtures using Fakes I expect the Fakes to provide seeded data for read-only services and for the test to generate the test data during the setup phase by making writes to read/write services' Fakes. I don't expect any of that data to reach a real database like Bigtable or MySQL because such heavy-weight dependencies are inappropriate for functional tests.
My approach to test data in end-to-end (Fake-less) tests is to generate it using the public APIs of the system, since they are least likely to change in ways that cause the data to violate the data stores' schema and most likely to generate realistic data (aside from thorny issues such as legacy/migrated data.)
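To sketch that split concretely (the catalog/cart names below are invented, not from this discussion): a read-only service's Fake ships pre-seeded, while a read/write service's Fake is populated by the test's own setup-phase writes, so no real Bigtable or MySQL instance is ever touched:

import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.Before;
import org.junit.Test;

public class CheckoutFunctionalTest {
  // Read-only catalog service: its Fake comes pre-seeded with fixture data.
  static final class FakeCatalog {
    private final Map<String, Long> priceCentsBySku = new HashMap<>();
    FakeCatalog() { priceCentsBySku.put("sku-1", 499L); }  // seeded data
    long priceCents(String sku) { return priceCentsBySku.get(sku); }
  }

  // Read/write cart service: the test populates its Fake through ordinary writes.
  static final class FakeCart {
    private final Map<String, Integer> quantityBySku = new HashMap<>();
    void add(String sku, int quantity) { quantityBySku.merge(sku, quantity, Integer::sum); }
    long totalCents(FakeCatalog catalog) {
      return quantityBySku.entrySet().stream()
          .mapToLong(e -> catalog.priceCents(e.getKey()) * e.getValue())
          .sum();
    }
  }

  private final FakeCatalog catalog = new FakeCatalog();
  private final FakeCart cart = new FakeCart();

  @Before
  public void generateTestDataDuringSetup() {
    // Writes go through the Fake's normal API; no real datastore is involved.
    cart.add("sku-1", 3);
  }

  @Test
  public void totalReflectsSeededPricesAndSetupWrites() {
    assertEquals(1497L, cart.totalCents(catalog));
  }
}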
Hi David, I enjoyed your post. With fakes, in your experience would you design them to support production-like response delays, or are production response delays a feature more suitable for a mock/stub framework? I am asking this from a performance testing point of view. Thanks again for the post.
Hi John, I've helped SWEs add statistical delay strategies into fakes and that's worked fine. At Google we have a very standardized RPC system, so the easier choice is often to insert transparent, data-driven proxies that can fuzz timing as needed. In the case above, the bespoke delay code was for performance testing a third-party, non-Google interaction, so there was no existing infrastructure for inserting performance delays. If you have a common RPC mechanism, using dynamic proxies is the more general, lower-maintenance solution.
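In case a concrete shape helps, here is a rough sketch of a statistical delay strategy (this is not the Google RPC proxy tooling mentioned above; the class and distribution are invented): a thin wrapper that sleeps for a sampled latency before delegating to the fake.

import java.util.Random;
import java.util.function.Supplier;

// Wraps any fake call with a latency sampled from a rough log-normal distribution.
final class DelayingFake {
  private final Random random = new Random();
  private final double medianMillis;

  DelayingFake(double medianMillis) { this.medianMillis = medianMillis; }

  <T> T call(Supplier<T> fakeCall) {
    // median * exp(sigma * Z) keeps the configured median while adding realistic spread.
    long delayMillis = (long) (medianMillis * Math.exp(random.nextGaussian() * 0.5));
    try {
      Thread.sleep(delayMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return fakeCall.get();
  }
}

// Usage: wrap the fake's calls so performance tests see production-like timing, e.g.
//   DelayingFake delays = new DelayingFake(/* medianMillis= */ 40);
//   String name = delays.call(() -> fakeUserService.lookupName("u-1"));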