Untangling Legacy Code

Legacy code (code without good automated test coverage) is an insidious burden. It slowly strangles software development velocity, kills development team morale, and ultimately destroys Business Agility by reducing speed and quality, increasing risk, and eroding culture.

Legacy code: code without automated tests, or equivalently, code that developers are afraid to change

MICHAEL FEATHERS & J B RAINSBERGER

Where does Legacy Code come from?

When a new piece of software is still small, defects are relatively easy to find and fix. This, combined with time-to-market pressure, encourages shortcuts such as relying on manual testing and tolerating messy design. If the product doesn’t work out and gets junked, there’s no harm done.

But if the product lives on, the right thing to do is to rewrite the early version, adding automated tests and cleaning up the internal design before it is too late. Few organisations, however, have the maturity and discipline to forgo short-term development speed — the customers or stakeholders are crying out for new features! — for long-term technical health.

And the frog boils slowly. It won’t be immediately evident that our short-term focus is seriously damaging our future prospects.

Why the standard “fixes” fail

The standard approaches to fixing Legacy Code once the codebase has become large perform predictably poorly. The most straightforward attempt to put the genie back in the bottle — refactoring to introduce the automated unit tests that would have been easy to write when the codebase was small — no longer works.

Introducing automated unit tests is no longer feasible because legacy codebases lack modularity. To unit test a piece of code, we need to be able to isolate it sufficiently from the rest of the system. But modularity degrades rapidly in codebases where automated testing is omitted: introducing tests early helps maintain modularity and aids design; without them, a legacy codebase almost certainly degrades into a mess of spaghetti and duplication. Instead of building on a virtuous cycle, we’re stuck trying to reverse a downward spiral.
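To make the isolation problem concrete, here is a minimal Python sketch (all names are hypothetical, not from any particular codebase). The entangled version creates its own database connection and so cannot run in a unit test; the testable version receives its dependency as a parameter, a seam that a test can fill with a fake:

    from dataclasses import dataclass

    # Entangled: the function creates its own database connection, so a
    # unit test cannot run it without a real database. (Shown as a comment
    # because the hypothetical db module does not exist here.)
    #
    #   def monthly_total(customer_id):
    #       orders = db.connect().fetch_orders(customer_id)
    #       return sum(o.total for o in orders)

    @dataclass
    class Order:
        total: float

    # Testable: the dependency arrives as a parameter (a "seam"), so a
    # test can substitute an in-memory fake for the database.
    def monthly_total(fetch_orders, customer_id):
        return sum(o.total for o in fetch_orders(customer_id))

    def test_monthly_total():
        fake_fetch = lambda customer_id: [Order(10.0), Order(5.5)]
        assert monthly_total(fake_fetch, customer_id=42) == 15.5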

Restoring that modularity means refactoring the code before tests can be introduced. But refactoring a large codebase without automated tests introduces new defects that lead to surprising and expensive breakages. To refactor safely, we need automated tests, but that’s exactly what we don’t have! You see the problem: we want to introduce tests, but to refactor safely first we would need the very tests we don’t yet have. It’s a Catch-22.

Instead of going straight for small, low-level unit tests, we need to proceed indirectly. We must find a way to introduce a few quick-and-dirty, higher-level automated tests that provide some degree of safety during the initial refactoring. This coarse safety net allows the big investment in remedial work — i.e. the much-needed refactoring and introduction of fine-grained unit tests — to proceed. (A minimal sketch of such a test follows.)
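To illustrate what quick-and-dirty can look like, here is a characterisation test: it pins whatever the system produces today, right or wrong, so that refactoring cannot silently change observable behaviour. The entry point and the pinned outputs below are placeholder stand-ins, not from any real codebase:

    # Stand-in for the real legacy entry point; in practice you would
    # import it from the legacy codebase instead of defining it here.
    def generate_report(period: str) -> str:
        return f"Report for {period}\n"

    # Pinned outputs, captured from a first run of the current system.
    EXPECTED = {
        "2023-Q1": "Report for 2023-Q1\n",
        "2023-Q2": "Report for 2023-Q2\n",
    }

    def test_reports_unchanged():
        for period, expected_output in EXPECTED.items():
            assert generate_report(period) == expected_output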

There are a couple of popular tactics that are frequently attempted at this point. Unfortunately, both are fatally flawed in their usual forms and give only partial relief. They survive in our industry because they take a long time and give the illusion of action:

  1. Automated UI tests could help, except that they are slow to write, slow to run, and their brittleness makes them expensive to maintain.
  2. A big rewrite from scratch takes too long and meanwhile two systems need to be maintained in parallel.

Both become feasible in modified forms — a small number of UI tests for testing the UI itself and boosting stakeholder confidence, and incremental rewriting of key components — after undertaking the superior approaches to taming legacy code that I outline next.

How to really fix Legacy Code

The first option is to rewrite early. Throw away the proof-of-concept or prototype and use good practices from eXtreme Programming (XP), like pair-programming, test-driven design/development (TDD), and automated ten-minute builds, so that you never get into a legacy mess in the first place.

However, since this only works for small, well-understood codebases, we need a second option: a mashup of advanced techniques that can be adapted to the situation. I teach the following three, in addition to the foundational techniques of TDD and pair-programming:

  1. Golden Master testing uses randomness to provide a usable quick-and-dirty test that supplies the safety net needed to refactor towards unit tests (see the first sketch below).
  2. Design by Contract introduces pre- and post-conditions that can pinpoint defects with even greater precision than unit tests, and that strengthen the system design (second sketch below).
  3. Property-Based Testing again uses randomness, this time checking invariant properties of the program to find corner cases and intermittent errors (third sketch below).
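First, a minimal Golden Master sketch in Python. The quote function is a hypothetical stand-in for the real legacy code under test; in practice it would be imported from the legacy codebase. A fixed random seed generates a broad set of inputs, the first run records the current outputs as the golden master, and every later run compares against that record:

    import json
    import random
    from pathlib import Path

    # Hypothetical stand-in for the legacy function under test.
    def quote(quantity, tier):
        return quantity * (5.0 if tier == "express" else 3.0)

    GOLDEN = Path("golden_master.json")

    def run_cases():
        rng = random.Random(42)  # fixed seed: the same "random" inputs every run
        cases = [(rng.randint(1, 100), rng.choice(["std", "express"]))
                 for _ in range(1000)]
        return {f"{qty}-{tier}": quote(qty, tier) for qty, tier in cases}

    def test_matches_golden_master():
        actual = run_cases()
        if not GOLDEN.exists():  # first run: record current behaviour, warts and all
            GOLDEN.write_text(json.dumps(actual, indent=2))
        expected = json.loads(GOLDEN.read_text())
        assert actual == expected  # any behavioural change during refactoring fails here

Second, a Design by Contract sketch using plain Python assertions as lightweight contracts (the Account class and transfer function are illustrative, not from any particular system):

    from dataclasses import dataclass

    @dataclass
    class Account:
        balance: int

    def transfer(source: Account, target: Account, amount: int) -> None:
        # Pre-conditions: a failure here pinpoints a defective caller.
        assert amount > 0, "amount must be positive"
        assert source.balance >= amount, "insufficient funds"
        total_before = source.balance + target.balance

        source.balance -= amount
        target.balance += amount

        # Post-condition: money is conserved; a failure here pinpoints
        # a defect inside this function itself.
        assert source.balance + target.balance == total_before

Note that Python strips assert statements when run with the -O flag, so contracts checked this way can be disabled in production at no cost.

Third, a Property-Based Testing sketch using the Hypothesis library; normalise is again a hypothetical stand-in for a legacy function:

    from hypothesis import given, strategies as st

    # Hypothetical stand-in: collapse runs of whitespace and trim the ends.
    def normalise(s: str) -> str:
        return " ".join(s.split())

    # Invariant property: normalising twice equals normalising once
    # (idempotence). Hypothesis generates hundreds of random strings and
    # shrinks any failing input to a minimal counterexample.
    @given(st.text())
    def test_normalise_is_idempotent(s):
        assert normalise(normalise(s)) == normalise(s)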

None of these techniques is a panacea on its own, but contextually appropriate combinations lead to dramatic improvements in code quality and robustness, and greatly reduce stress levels for developers and technical leaders by making it feasible to pay down technical debt through incremental refactoring of Legacy Code.

Learn more: About Taming Legacy Code


Talk to Dan about Taming Legacy Code
