How to Manage Technical Debt 🦠
And how to prioritize it against product features and other initiatives.
About a year ago I stumbled upon an old video by Ward Cunningham, one of the original authors of the Agile Manifesto, about technical debt.
There he described debt as the natural result of writing code about something we don't have a proper understanding of.
If we fail to make the program aligned with what we understand to be the proper way to think about our [...] objects, then we are going to continuously stumble into disagreement, and that would slow us down like paying interest on a loan.
The video and the quote above inspired me to write a full article about the elusive meaning of technical debt, which as of today is still my most popular article ever, with more than 50,000 views.
The article focused on what technical debt really is and how you can prevent it in various scenarios. It did little, though, to help you address and reduce debt once it’s there.
This new article is a practical, thorough guide to managing technical debt with your team, from the quarterly strategy down to the day-by-day.
🔥 The impact of technical debt on your team
📈 How debt changes with product maturity
🏃♂️ How to address it: small vs medium vs large debt
⚖️ How to assess technical debt
As always, the article draws on my own experience, plenty of reading, and conversations with the best engineers I know.
Most of all, it is a joint effort with folks from the Refactoring community, who have been invaluable in sharing real-world processes and examples (shout-outs inside the article). You can find a sneak peek of the main thread here.
Let’s dive in 👇
🔥 The impact of technical debt
We all have an intuitive understanding that technical debt can drag down engineering teams, but how exactly?
I talked with Alex Omeyer, who surveyed 200+ engineers and managers around technical debt. Here are the insights that struck me the most:
58% of companies have no process for managing technical debt.
66% of engineers believe the team would ship up to 2x faster if they had a process for technical debt.
52% of engineers believe that technical debt negatively impacts their team’s morale.
I was surprised to see such high numbers, but they match my own experience. Think of it this way: how many times have you estimated that a feature would take one sprint, and it ended up taking two or three? Now imagine this at the scale of an entire company and the ramifications it would have. It’s not hard to believe that companies that truly have a good handle on their technical debt ship twice as fast as those that don’t.
Technical debt mostly affects velocity and morale, and companies that have processes to address it consistently outperform those that don’t.
Is all debt created equal, though? Does this stand for all products and companies?
📈 Technical debt vs product maturity
When it comes to startups and fast growing products, you may hear about taking on debt intentionally, as the result of prioritizing speed and growth over engineering quality.
Regardless of whether this would make for a good strategy, debt is most often inevitable rather than intentional. Fast growth, in fact, naturally leads to technical debt, because when the product changes at a fast rate, your engineering abstractions get invalidated equally fast.
Such volatility, though, changes with the maturity of your product. For the sake of simplicity, let’s consider three stages:
0 to 1 — you start building a product from scratch, with a set of initial assumptions.
Product Market Fit — you figure out what works, double down on it, and scrap the rest.
Scale — you grow your business predictably and harden your tech.
The earlier you are on this scale, the more your product needs to move fast, and the more leverage you get by accruing debt.
The later you are on the scale, the more debt becomes a drag that prevents your product from growing. Your scale is such that you get the most leverage by repaying debt.
For early-stage startups, it might be inevitable and even healthy to accumulate debt. At the same time, though, you should create processes to help repay debt from the very beginning.
Let’s talk about processes 👇
🏃♂️ A process for technical debt
Watching how great teams address technical debt, a few themes stay remarkably constant. Processes to reduce debt are always:
Intentional — managers can articulate well what they do to keep debt in check. Everyone on the team knows how this works.
Continuous — processes are never based on isolated, sparse initiatives. People address debt every day.
Multiple — debt is managed in multiple ways, based on its type and size. There is no one-size-fits-all process.
Let’s see what processes you can set up based on the size of the debt itself.
🟩 Small Debt
Small debt is about minor improvements you can make in a few hours. A few examples:
Add missing tests
Deduplicate some code by creating a simple abstraction
Adjust a piece of code to make it adhere to your codebase conventions
The best way to address small debt is to make these changes whenever you are working on that code for any other reason. You basically follow Robert C. Martin’s boy scout rule:
Always leave the code better than you found it.
This is more effective than scheduling an independent task, because it avoids context switching and makes the change cheaper. However, it also means the timeline for such improvements is best-effort (it depends on other tasks), so it is only suitable for issues that are not too harmful.
Zach Wolfe, Tech Lead at Amazon, describes this perfectly, also detailing how they manage it in the sprint:
For small debts, fostering a culture of constantly making minor improvements to the team's software as part of the “definition of done” is effective both at driving down debt and improving morale.
Especially in teams supporting actively developed software that’s been around for 5+ years, there’s probably dozens of these bite-sized improvements available that add up over time. As a side effect, it teaches devs to think “it doesn’t have to be this way” when a debt is discovered rather than default to defeatism.
I think the tackling of small debts with a forgiving definition of done, as mentioned above, is a manifestation of Product respecting the needs of Engineering.
🟨 Medium Debt
Medium debt is about issues you need to prioritize and schedule, because they are somewhat impactful, but you can still fix them in anywhere from a few hours to a couple of days. A few examples:
Automate a tedious manual process
Adjust performance of a small service that’s been degrading over time
Remove a bottleneck in the CI/CD pipeline to shave a few minutes off the deploy time
How should you address these?
In my experience, the best approach is to allocate a fixed share of every sprint to such maintenance. The right share largely depends on your team and product, but it might be anything between 10% and 40%. If you need to spend more than 40%, you should probably address debt at a more strategic level (see large debt).
You can also decide to assign fixed days to this (e.g. Fridays) to build the habit.
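To make the fixed allocation concrete, here is a minimal sketch of the arithmetic. The function names, the 8-hour day, and the example team are my own assumptions for illustration; only the 40% ceiling comes from the guideline above.

```python
def maintenance_budget(engineers: int, sprint_days: int, share: float,
                       hours_per_day: float = 8.0) -> float:
    """Return the engineer-hours reserved for debt work in one sprint."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return engineers * sprint_days * hours_per_day * share


def needs_strategic_plan(share: float, ceiling: float = 0.40) -> bool:
    """Flag a share above the ceiling (40% by default, as suggested above)."""
    return share > ceiling


if __name__ == "__main__":
    # Hypothetical team: 5 engineers, 10-day sprint, 20% reserved for debt.
    hours = maintenance_budget(engineers=5, sprint_days=10, share=0.20)
    print(f"Reserved for maintenance: {hours:.0f} engineer-hours")
    print("Escalate to large-debt planning:", needs_strategic_plan(0.20))
```

Whatever numbers you pick, the point is that the budget is decided once and then simply applied, rather than renegotiated every sprint.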
This is not the only possible approach. Some teams put these tasks in the same backlog as new product features, prioritize them against each other, and plan what to do sprint by sprint. While this is not wrong in itself, it has a few drawbacks:
Expensive — negotiating tech debt against product work every sprint can bring serious overhead, especially for minor tasks.
Apples vs oranges — sometimes it’s hard to identify the ROI of small maintenance tasks (e.g. refactoring a small module).
Unreliable — if you have to negotiate it every time, it is more likely that maintenance will be left behind when you are under pressure, and more generally it will be harder to keep a consistent pace.
Corvin Deboeser, senior software engineer, who also had a PM stint, wrote:
When I was PM, my dev team and I blocked out some hours per sprint to tackle tech debt.
Tech debt would usually only come in towards the end (last 3-4 of 10 working days sprint) and devs that wanted to pull them in would announce this in daily stand-ups first. If anyone felt this would endanger the sprint success, it could be raised and we timebox-discuss that.
🟥 Large Debt
Large debt is about strategic initiatives that span between a few weeks and a few months of work. Examples are:
Refactor a large piece of legacy code to enable product improvements
Migrate a service to a different library / framework because of some new requirements
Rework a piece of your infrastructure to improve reliability and performance
These initiatives need to be planned on a longer timeframe — e.g. quarterly — and evaluated strategically, just like regular product work, based on their business opportunity.
Calculating this opportunity is not easy, though, because such initiatives sometimes don’t impact the business directly, but via second-order effects like team velocity, happiness, or reduced turnover.
To evaluate this, you can get help from stakeholders from other areas, like product, marketing, or support. Large engineering works may enable new features and processes, or improve existing ones. Leaders from these areas can help you measure additional benefits and ultimately support you in advocating for this work.
Zach weighs in on this 👇
I think in the case of large debts, the relationship [between Product and Engineering] is joint discovery of business opportunity. Engineering produces a tech debt reduction proposal that uses ROI units comparable to features, whether to do them is a business decision.
Tech debt proposals should at minimum include a feasible solution and a return on investment in units that can be used to compare them to features. An example ROI: “by spending 50 engineer hours the team can eliminate 1 engineer hour spent doing a manual, repeated task we expect to do 100 times this year.”
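The arithmetic behind Zach's example can be sketched in a few lines. This is only an illustration: the `debt_roi` function and the fields it returns are hypothetical names of mine, and a real proposal would likely account for more than hours saved.

```python
import math


def debt_roi(investment_hours: float, hours_saved_per_run: float,
             runs_per_year: int) -> dict:
    """Compare a one-off investment with the hours it saves in a year."""
    if investment_hours <= 0 or hours_saved_per_run <= 0:
        raise ValueError("hours must be positive")
    yearly_savings = hours_saved_per_run * runs_per_year
    net_gain = yearly_savings - investment_hours
    return {
        "yearly_savings_hours": yearly_savings,
        "net_gain_hours": net_gain,
        # ROI as a ratio: 1.0 means the investment pays back twice over.
        "roi": net_gain / investment_hours,
        # Number of runs after which the investment has paid for itself.
        "break_even_runs": math.ceil(investment_hours / hours_saved_per_run),
    }


if __name__ == "__main__":
    # The numbers from the quote: 50 hours invested, 1 hour saved, 100 runs.
    print(debt_roi(investment_hours=50, hours_saved_per_run=1, runs_per_year=100))
```

With the numbers from the quote, the work saves 100 engineer hours in the first year against 50 invested, so it breaks even after the 50th run. Expressing it this way gives Product a figure they can put next to a feature estimate.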
Finally, these proposals need to be sourced, collected, and evaluated multiple times a year. Collaboration is crucial here, too, as the best ideas to reduce debt might span multiple teams.
Hold brainstorming sessions on a quarterly basis with other engineering leaders and keep a shared backlog of such initiatives. You can evaluate them with the Riot Taxonomy 👇
⚖️ How to evaluate debt
When it comes to assessing tech debt initiatives, I am a fan of the system used by Riot.
They use three main metrics:
💣 Impact — business metrics impacted by the debt, or the value you unlock by repaying it. Crucial for the ROI equation.
💸 Fix Cost — a rough estimate based on some feasible solution. For the sake of prioritization, there is no need for it to be accurate. Simple t-shirt sizing (e.g. S, M, L, XL) is fine.
🦠 Contagion — this answers the question: “if this debt is allowed to continue to exist, how much will it spread?”. It’s a great angle because, in this regard, not all debt is created equal.
The contagion metric is particularly powerful because it tells you how impact and cost will change over time. Bill Clark, former EM for League of Legends, explains this well:
If a piece of tech debt is well-contained, the cost to fix it later compared to now is basically identical. You can weigh how much impact it has today when determining when a fix makes sense.
If, on the other hand, a piece of tech debt is highly contagious, it will steadily become harder and harder to fix. What’s particularly gross about contagious tech debt is that its impact tends to increase as more and more systems become infected by the technical compromise at its core.
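If you keep a shared backlog of initiatives, the three metrics are easy to record and sort on. Below is a minimal sketch of what that could look like. To be clear, the 1 to 5 scales, the cost weights, and the scoring formula are assumptions of mine for illustration, not Riot's method, and the backlog items are made up.

```python
from dataclasses import dataclass

# Assumed mapping from t-shirt sizes to relative cost weights (not from Riot).
COST_WEIGHT = {"S": 1, "M": 2, "L": 3, "XL": 5}


@dataclass
class DebtItem:
    name: str
    impact: int     # 1 (negligible) to 5 (severe), assumed scale
    fix_cost: str   # t-shirt size: "S", "M", "L" or "XL"
    contagion: int  # 1 (well-contained) to 5 (spreads fast), assumed scale

    def score(self) -> float:
        """Higher means 'fix sooner'. Hypothetical formula, tune it to taste."""
        return (self.impact + self.contagion) / COST_WEIGHT[self.fix_cost]


def prioritize(items: list[DebtItem]) -> list[DebtItem]:
    """Return the items sorted from highest to lowest score."""
    return sorted(items, key=lambda item: item.score(), reverse=True)


if __name__ == "__main__":
    # Made-up backlog entries, for illustration only.
    backlog = [
        DebtItem("Legacy billing module", impact=5, fix_cost="XL", contagion=2),
        DebtItem("Flaky deploy script", impact=3, fix_cost="S", contagion=1),
        DebtItem("Shared ORM workaround", impact=3, fix_cost="L", contagion=5),
    ]
    for item in prioritize(backlog):
        print(f"{item.score():.2f}  {item.name}")
```

A single score is a conversation starter rather than a verdict. For highly contagious items, remember that both impact and fix cost will grow if you wait, so it is worth re-scoring the backlog every time you review it.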
That’s it for this week! Here is a recap of the resources I linked throughout the article to dive deeper:
🌀 The True Meaning of Technical Debt — the original Refactoring article that addressed the meaning of technical debt and how to prevent it.
📑 What Companies Can Do to Reduce Technical Debt — Alex Omeyer, Nicolas Carlo and Maarten Dalmijn discuss some of the key findings from the “State of Technical Debt 2021”, which surveyed 200+ engineers around the topic.
📑 A Taxonomy of Tech Debt — the approach adopted by Riot Games to evaluate and prioritize tech debt initiatives.
💬 Community Thread — the discussion we had in the community around technical debt, that brought many contributions to this post. If you want to join, fill out this survey and I will get back to you!