How we use tennis rankings for fixing bugs 🎾
And why it turned out to be great. With a caveat.
Hey 👋 this is Luca! Welcome to a ✨ monthly free edition✨ of Refactoring.
Every week I write advice on how to become a better engineering leader, backed by my own experience, research and case studies.
Last week I discussed our approach to maintenance, based on allocating a fixed weekly time with a dedicated cycle of planning and review.
I promised I would elaborate more on that, focusing on how we measured the process and improved it over time. So here we are!
In the first post I said we settled on 20-30% of the weekly dev time dedicated to maintenance. After I wrote it, a few people asked: "Isn't that too high?", while others said: "Isn't that too low?". This seems like one of those situations where the answer is inevitably "it depends", but no, I will take the leap and argue that this number should work just fine for almost everybody.
Let me explain.
Isn't that too low? 📉
We are talking here about how much time you spend on small tasks on a weekly basis, whether they are bugs or feature improvements. If you find yourself having to allocate more than 30% of your time (say, 40 to 50%) to fixing bugs, then you can ask yourself the following questions:
Are there so many bugs because the surface of your products is very large? Then maybe your team is too small to maintain them, and if you want to go faster you need (ideally) to hire more people. You may have reached the natural ceiling of what your dev team can handle.
Are there so many bugs because there is a lot of legacy code, or other problems in the codebase? Then a bigger refactoring (not managed as weekly maintenance) may be more effective in the medium term.
The point here is that such a maintenance process can only pick the low-hanging fruit, and that's by design. Spending too much time on it can make us blind to more coordinated efforts that provide bigger rewards over a longer time span.
Isn't that too high? 📈
Let's take the opposite situation. Everything seems to work fine and we can spend 90% of the time, or more, on planned new features.
It seems like an ideal scenario, and you might think "If I do everything right, I should end up there". In practice, though, I think it never happens. For a few reasons:
There is some maintenance you simply cannot avoid: dependency updates, replacing small libraries that have fallen out of use, and many other things that should be done routinely and represent a stable (even if small) effort over time.
Bugs and performance gains are everywhere: if you don't see any easy opportunity for improvement, it is more likely that your process for surfacing them is not working properly than that you have reached perfection. Ask yourself: am I logging and tracking errors correctly? Are logs organized in a way that issues are easily discovered? Am I reviewing performance, setting thresholds and alerts for response times? And most of all, do customers have an easy way to reach out and report stuff?
Small features matter: if you find yourself jumping from big feature to big feature, ask yourself: am I really iterating on what I release? Is there a feedback loop in place that provides opportunities for easy improvement? Teams that constantly focus on big features tend to be more gut-driven than data-driven — small things are harder to detect unless you surface them through data.
The point here is that low-hanging fruit can be everywhere, but you have to set yourself up to find it. The more you invest in finding it easily, making it come to you naturally, the more your maintenance process will provide value and become a staple of your team.
Measuring progress 🔍
Team processes are living things that need constant work, not only to be improved, but also to make sure they don't derail and lose steam over time.
To do this, we need to be able to measure the process performance. I believe that if you don't measure something, you can't really improve it.
But how do you measure it?
A first consideration is that maintenance is a two-sided game: there is one side reporting tasks, and another side addressing them. There is also a significant overlap between the two sides, with people taking both roles in turn. This setup leads to dynamics that are very similar to those of a marketplace: there is demand and supply, and you can measure both.
In our case we can measure two major metrics:
Tasks reported per week (demand)
Amount of work completed per week (supply)
Surprisingly (or not), even measuring only one of them gives a good picture of the overall health of the process, because the number of reported tasks tends to adjust to how much work the team can get through. In fact, in periods when demand vastly exceeds supply, the following things happen in sequence:
backlog grows indefinitely
people lose confidence in the chance that their task will be completed
people report fewer and fewer tasks
On the contrary, if the dev team is responsive and able to complete everything that is thrown at it, people get motivated to report more and more.
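To make the demand and supply picture concrete, here is a minimal sketch of how a backlog evolves from the two weekly metrics. The function name and the sample numbers in the usage note are made up for illustration; they are not our real data.

```python
def backlog_by_week(reported, completed, starting_backlog=0):
    """Return the backlog size at the end of each week.

    reported:  tasks reported each week (demand)
    completed: tasks completed each week (supply)
    """
    backlog = starting_backlog
    history = []
    for new_tasks, done_tasks in zip(reported, completed):
        # The backlog grows by what comes in and shrinks by what gets done.
        # It is clamped at zero: you cannot complete tasks that do not exist.
        backlog = max(0, backlog + new_tasks - done_tasks)
        history.append(backlog)
    return history
```

With a hypothetical 20 tasks reported and 10 completed every week, `backlog_by_week([20, 20, 20], [10, 10, 10])` grows by 10 each week, which is the "backlog grows indefinitely" step of the sequence above.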
Making the process stick 📌
Before creating a dedicated maintenance process, we had a backlog of tasks (mostly bugs) entered by the team, but we were not very efficient at processing them. Over a few months, people started losing interest in adding more tasks, and the process derailed.
With the introduction of a formal process, it seemed like everything was going to change: reported tasks per week jumped 4x and people were excited that this was finally the right way of doing things.
☝️ reported tasks per week — jumping from 5/week to 20
The excitement, however, didn't last. After the initial spike, the number of reported tasks dropped spectacularly, to levels just a little higher than at the beginning. You could feel people losing confidence again.
What were we doing wrong? We weren't doing proper reviews and retrospectives, and we weren't discussing bottlenecks or putting in the effort to improve.
With these pieces missing, developers started to feel that this part of the work wasn't very important, and gradually moved to spending more time on delivering regular features from the Sprint, neglecting the bugs.
Because here's the thing with processes: if you don't strive to improve, you don't stay put either. You actually get worse!
The Ranking 🏆
After realizing this, we set out to implement a better way to measure how "well" we had performed week over week, in order to discuss it in retrospectives and take measures for improvement.
We decided the process had two major goals:
Complete as much work as possible within the allocated time
Be reliable and complete things respecting our own deadlines
The second point is paramount because, if you remember from the first post, each week we create a commitment in which we declare which tasks will be addressed in the week. This sets expectations within the team — other people might plan their work around what we promise, so it's very important to deliver on that.
To track these goals, we improved our Airtable database (which we use to list tasks) to include a numeric value for each completed task. This value is set based on the following rules:
🥇 100 points for each "High" effort task completed on time
🥈 50 points for each "Medium" effort task completed on time
🥉 25 points for each "Low" effort task completed on time
If a task slips to a later week than the one it was committed to, its points are halved for each week of delay, relative to the full value.
If a task is completed without having been declared in the commitment, its points are halved with respect to the full value.
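The rules above can be sketched as a small function. This is an illustration rather than our actual Airtable formula: the names are hypothetical, and it assumes the halving compounds for each week of delay (one week late is 1/2 of the full value, two weeks late is 1/4). It also assumes an undeclared task is simply worth half, with no extra penalty.

```python
from dataclasses import dataclass

# Full values for on-time tasks, taken from the rules above.
FULL_POINTS = {"High": 100, "Medium": 50, "Low": 25}


@dataclass
class CompletedTask:
    effort: str          # "High", "Medium" or "Low"
    committed: bool      # was it declared in the weekly commitment?
    weeks_late: int = 0  # weeks past the committed week (0 = on time)


def task_points(task: CompletedTask) -> float:
    full = FULL_POINTS[task.effort]
    if not task.committed:
        # Work done outside the commitment is worth half the full value.
        return full / 2
    # Assumption: each week of delay halves the value again.
    return full / (2 ** task.weeks_late)
```

For example, a "Medium" task delivered one week late scores `task_points(CompletedTask("Medium", True, 1))`, which is 25 under these assumptions.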
From these points we built a Ranking of developers, based on the sum of points they collected each week. These points "expired" after a number of weeks, just like in tennis rankings, so that over time each person, even a newcomer, could compete for the top spots.
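A rolling ranking with expiring points could look roughly like this. The function name, the input shape and the 8-week window are all assumptions made for the example; the length of the window is whatever "a number of weeks" means for your team.

```python
from collections import defaultdict


def ranking(scores, current_week, window_weeks=8):
    """Rank developers by points earned in the last `window_weeks` weeks.

    scores: iterable of (developer, week_number, points) tuples.
    """
    totals = defaultdict(float)
    for developer, week, points in scores:
        # Points only count while they fall inside the rolling window;
        # anything older has "expired".
        if current_week - window_weeks < week <= current_week:
            totals[developer] += points
    # Highest total first; ties are broken alphabetically by name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Because old points drop out of the window, someone who joined recently competes on the same footing as someone who scored heavily months ago.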
This system allowed us to track other metrics as well, like how much effort everyone spent on non-scheduled tasks, on tasks that were already late, etc. These secondary metrics provided direction for improvement and a basis for discussion.
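As an illustration of one such secondary metric, the sketch below splits a week's effort into on-time, late and unscheduled work. It is a hypothetical example: it uses each task's full point value as a rough proxy for effort, and the bucket names are invented here.

```python
# Full point values reused as a rough proxy for effort (an assumption).
EFFORT_WEIGHT = {"High": 100, "Medium": 50, "Low": 25}


def effort_breakdown(tasks):
    """Share of effort by bucket.

    tasks: iterable of (effort, committed, weeks_late) tuples.
    """
    buckets = {"on_time": 0, "late": 0, "unscheduled": 0}
    for effort, committed, weeks_late in tasks:
        weight = EFFORT_WEIGHT[effort]
        if not committed:
            buckets["unscheduled"] += weight
        elif weeks_late > 0:
            buckets["late"] += weight
        else:
            buckets["on_time"] += weight
    total = sum(buckets.values())
    if total == 0:
        # No completed tasks: report zero shares instead of dividing by zero.
        return {name: 0.0 for name in buckets}
    return {name: value / total for name, value in buckets.items()}
```

A week where a large share lands in "unscheduled" or "late" is a natural starting point for the retrospective.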
We also started reviewing individual weeks, to better understand how we performed as a team.
Collecting the wins 🎉
The introduction of scores and rankings created excitement and allowed for a virtuous cycle. In fact, this way:
We could identify issues and bottlenecks based on real data
We could discuss improvements and check the actual outcome week over week.
We created some healthy competition by making scores visible throughout the company. The ranking was shared every week on the public Slack channel, with the top performers being celebrated by the rest of the company.
All of this made the team output skyrocket, producing a stunning 3x increase in the weekly number of reported and completed tasks, in just a couple of months. And that's without increasing the time spent on it!
A Caveat ✋
A caveat, however: tracking (and displaying) individual performance is a tricky decision that can backfire in multiple ways. In particular, people can get stressed and feel incentivized to work long hours to get “more points”.
It never happened to us, because we never attached concrete incentives to the process, we kept everything at “fun” level, and—most of all—because we were already a tight-knit team with a light-hearted, non-judgemental company culture.
It’s important, at management level, that we always make people feel safe, not personally judged, and make everyone understand that the only goal of the process is to find issues, bottlenecks and improve together as a team.
Any system that leverages individual performance, even a very light one such as this, can be used for good or evil. I believe good management makes all the difference.
Like with the Pull Requests process, we found success when we made everything visible and measurable, and committed to continuously improving it.
But on top of that, the ranking dynamics created company-wide engagement that surpassed our expectations, and we had started it simply because it was fun!
I believe "because it's fun" is an underestimated reason to start things.
Have you ever tried something similar? Some “game” within your company that has had unexpected results? Let’s have a conversation in the comments 👇 or via email.