The Definitive Guide to Technical Debt 🦠

How to prevent it, measure it, and eventually pay it back. Backed by deep research and dozens of stories from the community.

Apr 24, 2024



Refactoring has about 75K subscribers today, and they have very different jobs: engineers, managers, CTOs, founders, and more.

Whenever I write an article, it naturally appeals more to some of these people than to others. When I wrote about the tech talent landscape, it especially resonated with those involved in hiring. When I wrote The Startup Healthcheck, it was especially useful to founders.

But there are also… exceptions: topics that everybody understands and has lived through in one way or another.

The mother of all these topics is technical debt 🦠

I have written about it twice: the first time to define it, the second time to explain how to keep it under control.

Both articles are more than two years old now. In the meantime, I have learned a lot by speaking with hundreds of managers and engineers — through the newsletter, and especially through the community, which grew from 50 to more than 750 members.

So, today I am publishing a full, thorough guide on how to handle technical debt: how to prevent it, measure it, and ultimately pay it back.

I write one such Guide every month, and this is April's. Guides are longer, more in-depth, and more thoroughly researched than my usual articles, and act as primers on a given topic.

Refactoring Guides 📖 (Luca Rossi, July 31, 2023)


This guide will also be extremely practical. I want you to leave this article thinking: “ok, now I can do this and that”.

So, here is our agenda for today:

📖 What is Technical Debt: definitions first, to get everybody on the same page.
⚖️ Good vs Bad Debt: is there even such a thing as good debt?
🔭 How to Prevent Debt: daily practices to keep code clean and simple.
💸 How to Repay Debt: what to do when the crap is already there.
📏 How to Measure Debt: how to know how bad things are, and how much better they could be.
📣 How to Advocate for Debt: how to make a convincing case to others, especially non-engineers.
💬 Community thread: a giant thread with 25+ replies sharing real stories of how tech leaders in the Refactoring community handle tech debt.

Let’s dive in!

📖 What is Technical Debt

Ward Cunningham is one of the signatories of the original Agile Manifesto, and the inventor of the technical debt metaphor.

Here is how he described the reasoning behind the metaphor:

If we fail to make the program aligned with what we understand to be the proper way to think about our [...] objects, then we are going to continuously stumble into disagreement, and that would slow us down like paying interest on a loan.

Ward describes debt as the natural result of writing code about something we don't have a proper understanding of.

He is not talking about poor code, which he says accounts for only a minor share of debt; he is talking about the disagreement between business needs and the way the software has been written.

In other words, software is a formal description of a business reality and its needs. Whenever that description doesn’t match reality, we get tech debt:

A design that doesn’t scale properly fails to take into account the reality of business volume.
A leaky abstraction is like a bad rule that doesn’t fit reality anymore, and needs continuous exceptions to keep working.
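To make the idea of disagreement concrete, here is a small, hypothetical Python sketch (the countries, fees, and function names are invented for illustration). The first function encodes an old assumption, a flat shipping fee, and survives by piling up exceptions; the second reshapes the model so that it agrees with the current reality.

```python
# Hypothetical example: the original model assumed one flat shipping fee.
# Each time reality diverged, the rule was patched with a special case.
def shipping_fee_patched(country: str, weight_kg: float) -> float:
    fee = 5.0  # original rule: one flat fee for everyone
    if country == "CH":  # special case added later
        fee += 10.0
    if country in ("US", "CA"):  # another special case
        fee = 20.0
    if weight_kg > 30:  # and another one, for heavy parcels
        fee *= 2
    return fee


# A model that agrees with the current reality: per-country fees are data,
# and the heavy-parcel rule is an explicit, named concept.
BASE_FEES = {"IT": 5.0, "CH": 15.0, "US": 20.0, "CA": 20.0}
HEAVY_THRESHOLD_KG = 30
HEAVY_MULTIPLIER = 2


def shipping_fee(country: str, weight_kg: float) -> float:
    fee = BASE_FEES[country]  # unknown countries fail loudly instead of defaulting
    if weight_kg > HEAVY_THRESHOLD_KG:
        fee *= HEAVY_MULTIPLIER
    return fee


# Both versions agree on every country the (hypothetical) business serves today.
for country in BASE_FEES:
    for weight in (1.0, 40.0):
        assert shipping_fee(country, weight) == shipping_fee_patched(country, weight)
```

The patched version still returns correct fees today; what makes it debt is that every new market means another special case, because its rule no longer matches how the business works.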

So how do we get there? How do we end up in disagreement?

⚖️ Good vs Bad debt

Whenever you find that your software doesn’t fit reality, it is useful to ask yourself: when did it happen?

There are two broad answers:

From the start — engineering, product, and business simply didn’t fully understand one another, and built a (partially) wrong thing.
Over time — reality gradually changed into something else and invalidated some assumptions.

These categories are not clear-cut, since some future changes can and should be predicted from the start, but you can still get a sense of where the problem lies from how early and how often rework happens.

Is it only weeks after release, or months, or years? What does the churn on that code look like? Is this the first time you are touching it, or does it change regularly?
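One rough way to answer these questions is to look at version-control history. The Python sketch below assumes you export it with `git log --name-only --format=%ad --date=short` (one date line per commit, followed by the files it touched) and counts, for each file, how many commits touched it and over what time span. The `churn_by_file` function and the sample log are hypothetical; churn is only a signal, and a file that changes often is not necessarily in debt.

```python
from collections import defaultdict
from datetime import date


def churn_by_file(log_text: str) -> dict:
    """Count commits and first/last change date per file.

    Expects the output of:
        git log --name-only --format=%ad --date=short
    where each commit prints a YYYY-MM-DD line followed by the files it touched.
    """
    stats = defaultdict(lambda: {"commits": 0, "first": None, "last": None})
    current_date = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current_date = date.fromisoformat(line)  # a new commit starts here
            continue
        except ValueError:
            pass  # not a date, so it is a file path touched by the current commit
        entry = stats[line]
        entry["commits"] += 1
        if entry["first"] is None or current_date < entry["first"]:
            entry["first"] = current_date
        if entry["last"] is None or current_date > entry["last"]:
            entry["last"] = current_date
    return dict(stats)


# Hypothetical log excerpt (git lists the newest commit first).
sample = """2024-04-20

billing/invoice.py

2024-04-02

billing/invoice.py
README.md

2024-03-28

billing/invoice.py
"""

# Print the most frequently changed files first, with the span of days they cover.
for path, s in sorted(churn_by_file(sample).items(), key=lambda kv: -kv[1]["commits"]):
    span = (s["last"] - s["first"]).days
    print(f"{path}: {s['commits']} commit(s) over {span} days")
```

Plain `git log` does not follow renames, so a file that was moved will show up as two separate entries.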

At this point you may ask: since reality changes anyway, isn’t debt inevitable? Isn’t some debt even good when it allows us to go faster, provided we keep it under control?

This question is tricky, and the answer goes back to the model of knowns and unknowns. When you design software, you deal with three categories of knowledge:

🟢 Known knowns: requirements you know and have under control.
🟡 Known unknowns: uncertain items you are at least aware of, like scale or some future product directions.
🔴 Unknown unknowns: all the unforeseeable ways things can fail.

You can classify bad, good, and best designs based on how they account for these:

🥉 Bad design • doesn’t even account for the 🟢 known knowns. Engineering didn’t really understand product, made a set of wrong assumptions, and the design was wrong from the start.
🥈 Good design • solves for the 🟢 known knowns, but doesn’t account for the 🟡 known unknowns. For example, if scale is unknown, failing to account for it means committing decisively to a solution that is opinionated about it: you either invest heavily in premature (and speculative) scaling, or in something that will never scale and is hard to undo.
🥇 Best design • solves for the 🟢 known knowns and accounts for the 🟡 known unknowns. For example, it quickly pushes out a design that doesn’t scale but is easy to scrap later, once you have figured things out.
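A minimal Python sketch of what the 🥇 option can look like, assuming a background-job queue is the component whose scale is unknown (the `JobQueue`, `InMemoryQueue`, and `enqueue_welcome_email` names are hypothetical). The first implementation is deliberately naive, but callers depend only on a small interface, so it stays cheap to replace.

```python
from collections import deque
from typing import Optional, Protocol


class JobQueue(Protocol):
    """The only thing business code is allowed to depend on."""

    def push(self, job: str) -> None: ...

    def pop(self) -> Optional[str]: ...


class InMemoryQueue:
    """Deliberately naive: single process, jobs are lost on restart.

    Good enough while scale is a known unknown, and cheap to throw away.
    """

    def __init__(self) -> None:
        self._jobs: deque = deque()

    def push(self, job: str) -> None:
        self._jobs.append(job)

    def pop(self) -> Optional[str]:
        # Return None instead of raising when there is nothing to do.
        return self._jobs.popleft() if self._jobs else None


def enqueue_welcome_email(queue: JobQueue, user_id: int) -> None:
    # This function sees only the JobQueue interface, so swapping InMemoryQueue
    # for a durable queue later does not require touching it.
    queue.push(f"welcome-email:{user_id}")


queue = InMemoryQueue()
enqueue_welcome_email(queue, 42)
print(queue.pop())
```

The interface is the part you commit to; the implementation behind it is the part you are allowed to get wrong for now.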

So, back to good vs bad debt, and debt as disagreement:

Bad debt is unaware disagreement — you didn’t account for things that were under your control.
Good debt is delayed disagreement — you shrink the scope of the agreement today, and accept some disagreement later.
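One way to keep good debt delayed rather than forgotten is to write the accepted assumption down in code and make it fail loudly when reality moves past it. The Python sketch below is hypothetical (the threshold, names, and search function are invented for illustration) and uses the standard `warnings` module as a tripwire; in a real system you might emit a metric or alert instead.

```python
import warnings

# Hypothetical assumption accepted on purpose today: a linear scan is fine
# while the catalog stays small. The number is illustrative.
CATALOG_SIZE_ASSUMPTION = 10_000


def assumption_exceeded(catalog_size: int) -> bool:
    """Warn, and return True, once reality outgrows the accepted assumption."""
    if catalog_size > CATALOG_SIZE_ASSUMPTION:
        warnings.warn(
            f"catalog has {catalog_size} items, above the assumed "
            f"{CATALOG_SIZE_ASSUMPTION}: time to revisit the linear scan",
            RuntimeWarning,
            stacklevel=2,
        )
        return True
    return False


def search_catalog(products: list, query: str) -> list:
    assumption_exceeded(len(products))
    needle = query.lower()
    # Deliberately naive O(n) scan: this is the debt we chose to take on.
    return [p for p in products if needle in p.lower()]


print(search_catalog(["Red Shirt", "Blue Jeans", "Shirt Dress"], "shirt"))
```

The debt is still there, but the moment the disagreement arrives is no longer a surprise.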

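This angle is similar to Martin Fowler’s Technical Debt Quadrant, which classifies debt as reckless or prudent, and deliberate or inadvertent (arguably, this angle is a subset of it).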

📈 Tech debt vs Product stage

I believe that whether it makes sense to take on intentional debt depends on your product stage.

Fast growth naturally leads to debt, because when the product changes quickly, your engineering abstractions get invalidated just as quickly. How volatile things are depends on how mature your product is.

For the sake of simplicity, let’s consider three stages: