Learning From Broken Unit Tests

Last updated: May 17 2005

sven.gorts@refactoring.be

Abstract: When multiple tests break because of a small change, those tests can teach us a lot about the code. In this article we take a closer look at how we can use the implicit knowledge carried by the tests to our advantage.

Introduction

Every day we see more teams adopt unit testing or shift to test-driven development. For both styles, the general rule of thumb is that the suite of tests should pass all of the time, and for good reason. Not only do the tests tell us when we're done developing a new feature, the suite also acts as a safeguard against regressions.

In software development, project requirements can change. As agile developers we're generally comfortable with changing requirements, even late in development. However, not all changes are of equal difficulty. Sometimes a change request does not fit the existing design at all. When that happens, we find out quickly: trying to implement the change breaks tests. Not just a few tests, but a lot of tests.

The difficulty in such situations is that our proven techniques (undo the last change and take smaller steps, start with a simpler version, look for an alternative route) may not work as well as we're used to. At least the tests can tell us what we broke and provide guidance while we work out a solution.

Embedded Knowledge

Whether we are refactoring or implementing a new feature, if changing a handful of lines of code results in many broken tests, there's a lot these tests can teach us. After all, each test fails for a reason. Taken together, the tests can help us pinpoint the actual problem. The most common things we can learn from broken unit tests are:

Duplication In Test Code

When dealing with numerous failing tests, one of the first things to check is whether they all fail at similar asserts. When this is the case, it's usually a test smell, hinting at duplication in the test code. Before proceeding, we had better invest some time in cleaning up the test suite so that the number of failing tests is reduced.
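
One way to check for similar asserts is to group the failing tests by their assertion message. The sketch below is only an illustration: the (test name, message) pairs, the test names and the threshold of three are all assumptions made for the example, not the output format of any particular test runner.

```python
from collections import defaultdict


def group_by_assert_message(failures):
    """Group failing test names by the assertion message they failed with.

    `failures` is a list of (test_name, message) pairs. This shape is an
    assumption of the sketch, not something a real test runner hands us.
    """
    groups = defaultdict(list)
    for test_name, message in failures:
        groups[message].append(test_name)
    return dict(groups)


def duplicated_asserts(failures, threshold=3):
    """Return the messages shared by at least `threshold` failing tests.

    The default threshold is arbitrary; it only serves the illustration.
    """
    groups = group_by_assert_message(failures)
    return {msg: names for msg, names in groups.items() if len(names) >= threshold}


# Hypothetical failures, invented for this example.
failures = [
    ("test_order_total", "expected 121.0 but was 106.0"),
    ("test_invoice_total", "expected 121.0 but was 106.0"),
    ("test_receipt_total", "expected 121.0 but was 106.0"),
    ("test_customer_name", "expected 'Ann' but was None"),
]
suspects = duplicated_asserts(failures)
```

Here three tests fail with the same message, which suggests they repeat the same assert and could probably share a single helper or a single test.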

Package Dependencies

When running the test suite signals problems in various packages, the tests are telling us something about the dependencies between the different layers in our code. Seemingly unrelated tests that suddenly break after a small code change indicate some unrecognized coupling in the code or its tests. In such cases we may want to find out exactly why those tests fail and whether that makes sense.
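
To see how far the damage spreads, we can count the failures per package. The following is a minimal sketch that assumes failing tests are identified by dotted names whose first segment is the package; the package and test names are hypothetical.

```python
from collections import Counter


def package_of(test_id):
    """Return the top-level package of a dotted test id (assumed naming scheme)."""
    return test_id.split(".", 1)[0]


def failures_per_package(failed_test_ids):
    """Count the failing tests in each top-level package."""
    return Counter(package_of(test_id) for test_id in failed_test_ids)


def unexpected_packages(failed_test_ids, changed_package):
    """List packages with failures other than the one we actually changed."""
    counts = failures_per_package(failed_test_ids)
    return sorted(pkg for pkg in counts if pkg != changed_package)


# Hypothetical failing tests after a small change in the `billing` package.
failed = [
    "billing.test_invoice.test_total",
    "billing.test_invoice.test_discount",
    "reporting.test_summary.test_monthly",
    "ui.test_forms.test_render_total",
]
counts = failures_per_package(failed)
surprises = unexpected_packages(failed, changed_package="billing")
```

Every package that shows up in `surprises` is a candidate for the unrecognized coupling described above and deserves a closer look.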

Hidden Assumptions

Even while writing unit tests, we make a lot of hidden assumptions. We may not even be aware of these assumptions until our first attempt to change them. The many tests that break may then point us to hidden domain assumptions that we have unknowingly hard-coded into the tests.
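
As an illustration, consider a test that hard-codes a domain assumption in a literal. The example below is invented for this purpose: the function `price_with_vat`, the 21% rate and both test classes are hypothetical and only sketch the difference between a hidden and an explicit assumption.

```python
import unittest

VAT_RATE = 0.21  # hypothetical domain assumption, stated in one place


def price_with_vat(net, rate=VAT_RATE):
    """Return the gross price for a net amount, rounded to cents."""
    return round(net * (1 + rate), 2)


class HiddenAssumptionTest(unittest.TestCase):
    def test_price_includes_vat(self):
        # The literal 121.0 silently encodes "VAT is always 21%".
        # Many tests written like this all break when the rate changes.
        self.assertEqual(price_with_vat(100.0), 121.0)


class ExplicitAssumptionTest(unittest.TestCase):
    def test_price_includes_vat(self):
        # The rate is part of the test, so the assumption is visible
        # and the test does not depend on the default.
        rate = 0.06
        self.assertEqual(price_with_vat(100.0, rate), 106.0)


if __name__ == "__main__":
    unittest.main()
```

Both tests pass today. Only the first one breaks when the default rate changes, and when dozens of tests share that literal, the cluster of failures points us straight at the assumption.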

Difficulty Estimate

Even when broken tests do not directly point us at a problem, they can still be helpful. Often their number gives a good first indication of how difficult a certain change is to implement. Their number also informs us about the risk we take should we decide to push onward with broken tests. When the number of broken tests is really large, we're touching core functionality, and we had better be extra careful.

With all this extra knowledge derived from the broken tests we can make informed decisions on how to proceed. Should we get rid of the dependencies? Or can that wait? Perhaps we should deal with the duplication first? Are the broken tests really that bad? Maybe we can push onward and get them all running.

Gradual Improvement

What the broken tests have taught us so far is all very well, but meanwhile they are still broken. How can they help us make progress? In fact, it's pretty simple.

We know that the change we want to make causes our tests to fail. That's good: we can apply the change deliberately to trigger the failures. The broken tests will point us to the areas of code that need improvement. Once we've identified what to improve, we undo the change. That makes all our tests green again, so we can refactor safely.

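We may need to iterate through these steps a couple of times before we can apply the change and have all tests pass, but it's a safe route. Undoing the change allows us to get back to green quickly, and as we go the number of broken unit tests will tell us how we're doing. The steps below show the idea more schematically.

1. Apply the change and run the tests.
2. Use the broken tests to identify what needs improvement.
3. Undo the change, so that all tests pass again.
4. Refactor the identified code while the tests are green.
5. Repeat until the change can be applied with all tests passing.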

Following the schema above, our design can gradually evolve until the change we want to apply becomes a natural fit. While evolving the design, we keep an eye on the count of broken tests at each attempt to apply our change. Ideally we want the broken test count to be strictly decreasing, because that protects us from chasing our own tail. It's no disaster if the count occasionally goes up, but it's always suspicious.
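
The cycle of applying the change, counting the broken tests, undoing the change and refactoring can be sketched in code. This is a rough sketch under stated assumptions: the four callbacks and the `max_rounds` limit are hypothetical placeholders for whatever we do by hand or through our tooling, not part of any real framework.

```python
def evolve_design(apply_change, undo_change, count_broken_tests, refactor,
                  max_rounds=10):
    """Repeat the apply / inspect / undo / refactor cycle.

    Returns the broken-test count recorded at each attempt. All four
    callbacks are hypothetical hooks supplied by the caller.
    """
    history = []
    for _ in range(max_rounds):
        apply_change()
        broken = count_broken_tests()
        history.append(broken)
        if broken == 0:
            return history  # the change now fits, so we keep it
        undo_change()       # back to green
        refactor()          # improve the design while all tests pass
    return history


def is_strictly_decreasing(counts):
    """True when every attempt broke fewer tests than the one before."""
    return all(later < earlier for earlier, later in zip(counts, counts[1:]))


def suspicious_attempts(counts):
    """Indices of attempts where the broken-test count did not go down."""
    return [i for i in range(1, len(counts)) if counts[i] >= counts[i - 1]]
```

The recorded history is what we watch while working: a strictly decreasing sequence means each refactoring brought the design closer to the change, while any index reported by `suspicious_attempts` marks a round worth questioning.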

Conclusion

There's a lot we can learn from broken unit tests. Clusters of broken tests may indicate duplication in the test code, cross-layer dependencies or even wrong assumptions that have been hard-coded in the tests. The number of broken tests and their location provide extra information we can use to estimate the impact and difficulty of making a certain change. With the technique presented in this article we can take full advantage of the implicit knowledge embedded in the broken tests.