Characterization Tests for Legacy Code
When working on legacy code, you will face a dilemma: ideally, you should add tests before refactoring, but sometimes you need to refactor before you can add tests at all.
Hello, developers! 🚀
Welcome back to the Learn Agile Practices newsletter, your weekly dose of insights to power up your software development journey through Agile Technical Practices and Methodologies!
Now, let's dive into today's micro-topic!
The Legacy Code Dilemma
Legacy Code is one of those overused terms in software development that has no globally accepted definition. Generally speaking, “legacy code” can be any code inherited from the past, but we need a more specific definition to agree on. My favorite is the one from Michael C. Feathers in his book Working Effectively with Legacy Code, which describes it as follows:
Legacy code is simply code without tests. Code without tests is bad code; it doesn't matter how well written it is. Without tests, developers tread lightly and fear introducing regressions.
Generally, we can associate the negative connotation of legacy code with the fear developers feel when changing it, usually caused by untested or poorly tested code. When a test suite that the team trusts is in place, this fear disappears.
The main strategy for tackling legacy code is to add tests first and then refactor the now-tested piece of code. While this approach makes absolute sense in theory, when we try to apply it in practice we face the so-called Legacy Code Dilemma:
You can’t refactor code without test coverage, but you need to refactor some code to add tests.
What can we do then? We can use the following algorithm to make changes safely in a legacy codebase:
1. Identify change points: Find places where you need to make changes.
2. Find test points: Find places where tests can be written.
3. Break dependencies: Dependencies are the most obvious impediment to testing.
4. Write tests: Writing tests for legacy code is different from writing tests for new code.
5. Make changes and refactor: We can use TDD (Test-Driven Development) to work in legacy code.
The first two points are the hardest ones, especially the first: to make the code testable, you need to patiently make only small, safe refactorings (automated IDE refactorings can be extremely helpful here), moving in baby steps until it’s possible to add a test.
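To make the idea of a small, safe refactoring concrete, here is a minimal sketch. The function names and the late-fee rule are hypothetical, invented only for illustration: the legacy version reads the system clock directly, so no test can pin its result, and a single "add a parameter with a default" step is enough to make it testable without changing what existing callers see.

```python
import datetime


def late_fee_legacy(due_date, daily_rate):
    # Hypothetical legacy code: the direct call to date.today() makes the
    # result depend on the day the test happens to run.
    today = datetime.date.today()
    days_late = (today - due_date).days
    return max(days_late, 0) * daily_rate


def late_fee(due_date, daily_rate, today=None):
    # Same logic after one baby step: `today` becomes an optional parameter.
    # Callers that do not pass it still get the system date, as before.
    if today is None:
        today = datetime.date.today()
    days_late = (today - due_date).days
    return max(days_late, 0) * daily_rate


# A test can now fix the date and observe a stable result.
fee = late_fee(
    datetime.date(2024, 1, 1), 2, today=datetime.date(2024, 1, 11)
)
```

The change is small enough to review by eye, which is exactly what you want while the code is still not covered by tests.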
Characterization Tests
When following the above algorithm, the goal is to add tests so that we can safely refactor our code. But we also need a different approach to testing, because trying to build the standard test pyramid as a first step on a legacy codebase would require too many unsafe refactorings to make the code testable.
We need something different here: we need characterization tests.
Characterization tests, aka Golden Master tests, are a particular kind of test that captures the current behavior of the code: you basically take a snapshot of what it does to ensure that it keeps doing it after you change it.
This approach is incredibly powerful, because:
When it comes to existing, untested systems, what the system actually does is more important than what it should do.
It’s the fastest way to cover Legacy Code with meaningful, useful tests, giving you a safety net to refactor.
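As a rough sketch of what "taking a snapshot" can look like in practice, assume a hypothetical legacy function, legacy_shipping_cost, whose rules nobody remembers. The helper names and the ".approved.json" file suffix are also assumptions made for this example, not the API of any particular approval-testing library. The first run records the output, and every later run compares against that recording.

```python
import json
import tempfile
from pathlib import Path


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical legacy code: nobody remembers why these rules exist.
    cost = 5 + weight_kg * 2
    if express:
        cost = cost * 1.5 + 3
    if weight_kg > 20:
        cost -= 4
    return cost


def snapshot_of_current_behavior():
    # Exercise the code with inputs around the branches we can see.
    inputs = [(w, e) for w in (0, 1, 20, 21, 50) for e in (False, True)]
    return [[w, e, legacy_shipping_cost(w, e)] for w, e in inputs]


def matches_golden_master(name, actual, directory):
    """Record `actual` on the first run; compare with the recording afterwards."""
    path = Path(directory) / f"{name}.approved.json"
    if not path.exists():
        path.write_text(json.dumps(actual, indent=2))
        return True  # first run: nothing to compare with yet
    return json.loads(path.read_text()) == actual


def check_shipping_cost_keeps_current_behavior(directory):
    snapshot = snapshot_of_current_behavior()
    assert matches_golden_master("shipping_cost", snapshot, directory)


# A temporary directory keeps the sketch self-contained; in a real project
# the approved file would be committed next to the tests.
with tempfile.TemporaryDirectory() as golden_dir:
    check_shipping_cost_keeps_current_behavior(golden_dir)  # records
    check_shipping_cost_keeps_current_behavior(golden_dir)  # compares
```

Note that the test never states what the shipping cost should be. It only states that it must not change unnoticed.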
Characterization tests go by many names, such as Approval Testing, Snapshot Testing, or Golden Master. All of those names describe the same approach to testing: capture the current behavior and save it to verify it’s still the same after the code changes.
Think again about the process described above: your first target is to reach point 4 (Write tests). Instead of aiming to add unit or integration tests (which would probably require a lot of unsafe refactorings), you can aim to add a characterization test that formalizes the current behavior and ensures that subsequent changes don’t break it.
This approach is also useful for exploring code: you can use tests to obtain information from the software, characterize it with use cases, and learn the current behavior of the system while you also add a test that protects it against future changes.
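One possible way to explore code with tests is to let the code itself tell you which assertion to write. The sketch below assumes a hypothetical legacy function, legacy_format_name, and a small helper named observe, both invented for this example: the helper calls the function and returns a ready-to-paste assertion describing what it does today.

```python
def legacy_format_name(first, last):
    # Hypothetical legacy code under exploration.
    full = last.strip().upper() + ", " + first.strip().title()
    return full.strip(", ")


def observe(func, *args):
    """Return an assertion line that documents what `func` returns today."""
    result = func(*args)
    arg_list = ", ".join(repr(a) for a in args)
    return f"assert {func.__name__}({arg_list}) == {result!r}"


# Probe one use case at a time and read the answer the code gives back.
happy_path = observe(legacy_format_name, " ada ", "lovelace")
missing_first_name = observe(legacy_format_name, "", "lovelace")
```

Printing the two strings shows the learned behavior, including the edge case where a missing first name silently drops the comma. Each line can then be pasted into a test file as a characterization test.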
Practicing Characterization Tests, with an example
As is often the case in software development, we can take advantage of katas to practice something new and learn enough about it before trying to use it in real code.
For refactoring and legacy code, we can use some repositories provided by Emily Bache, for example the Tennis Refactoring Kata. The repo is designed to practice refactoring and already provides tests, but we can easily ignore them, pretend the code has no tests, add a characterization test, and then refactor (you can also have a look at this video from CantonCoders where they do exactly this). An alternative kata designed for refactoring is the Gilded Rose Kata. Use both of them to practice characterization tests, and after a couple of sessions you will feel confident enough to try to add your first characterization test to real legacy code!
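To give a feeling for how such a practice session might go, here is a sketch of the record, refactor, compare loop. The tennis_score function is a simplified stand-in written for this example, not the kata's actual source, and characterize is a hypothetical helper that describes the behavior for every combination of points.

```python
from itertools import product

NAMES = ["Love", "Fifteen", "Thirty", "Forty"]


def tennis_score(p1, p2):
    # Hypothetical stand-in for the kata's code, not the original source.
    if p1 == p2:
        return "Deuce" if p1 >= 3 else NAMES[p1] + "-All"
    if p1 >= 4 or p2 >= 4:
        diff = p1 - p2
        leader = "player1" if diff > 0 else "player2"
        return ("Advantage " if abs(diff) == 1 else "Win for ") + leader
    return NAMES[p1] + "-" + NAMES[p2]


def characterize(score_func, max_points=6):
    """Describe the behavior for every combination of points."""
    points = range(max_points + 1)
    return "\n".join(
        f"{a}-{b}: {score_func(a, b)}" for a, b in product(points, repeat=2)
    )


# Step 1: record the current behavior before touching anything.
approved = characterize(tennis_score)


# Step 2: refactor in small steps (here: extract the endgame branch).
def endgame_score(p1, p2):
    leader = "player1" if p1 > p2 else "player2"
    prefix = "Advantage " if abs(p1 - p2) == 1 else "Win for "
    return prefix + leader


def tennis_score_refactored(p1, p2):
    if p1 == p2:
        return "Deuce" if p1 >= 3 else f"{NAMES[p1]}-All"
    if max(p1, p2) >= 4:
        return endgame_score(p1, p2)
    return f"{NAMES[p1]}-{NAMES[p2]}"


# Step 3: the snapshot must be unchanged after every step.
assert characterize(tennis_score_refactored) == approved
```

Generating all combinations also covers scores that cannot happen in a real match, such as 6-0. That is fine for a characterization test, because the goal is to pin down what the code does, not what tennis rules say.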
Insights Recap
Legacy code definition:
Legacy code is simply code without tests. Code without tests is bad code; it doesn't matter how well written it is. Without tests, developers tread lightly and fear introducing regressions.
The main strategy for tackling legacy code is to add tests first and then refactor the now-tested piece of code.
Legacy Code Dilemma:
You can’t refactor code without test coverage, but you need to refactor some code to add tests.
The algorithm to safely make changes in a legacy codebase:
1. Identify change points: Find places where you need to make changes.
2. Find test points: Find places where tests can be written.
3. Break dependencies: Dependencies are the most obvious impediment to testing.
4. Write tests: Writing tests for legacy code is different from writing tests for new code.
5. Make changes and refactor: We can use TDD (Test-Driven Development) to work in legacy code.
Make small, safe refactorings, taking advantage of automated IDE refactorings and moving in baby steps until it’s possible to add a test.
As a first step in testing a legacy codebase, we need characterization tests, a particular kind of test that captures the current behavior of the code, taking a snapshot of what it does to ensure that it keeps doing it after you change it.
Advantages of this approach:
What the system actually does is more important than what it should do.
It’s the fastest way to cover Legacy Code with meaningful, useful tests.
Characterization tests go by many names:
Approval Testing
Snapshot Testing
Golden Master
Characterization tests are also useful to explore code: you can use them to learn the system's current behavior.
Until next time, happy coding! 🤓👩‍💻👨‍💻