How to plan maintenance and small changes 🔨

A simple process to do it reliably over time, without losing your mind

In The Four Types of Work I described the different cycles we use to plan product and engineering work. A few people have asked for more details about the Operational Cycle, which is dedicated to maintenance.

A brief summary for those who haven't read the post (but read it if you can and let me know your thoughts!): our main work gets done in Sprint cycles, lasting 2 weeks, in which we address regular activities. Then we have a separate weekly cycle dedicated to bugs and small changes reported by anyone on the team.

About this one, I will try to answer your questions and provide more context about what we implemented — it has worked well for us and we are all a bit proud of it.

Actually, there is so much to talk about that I will split this post in two:

  1. The first part (this one) will focus on the process itself: why we do things this way, and how we implemented them.

  2. The second part, next week, will review the performance of the process and what we did to improve it over time (spoiler: we made it measurable and added some…fun 🏆).

So let's dive in!

Q: What do you mean by maintenance?

First – what exactly does maintenance mean? That's an important question, because maintenance is a broad term, but we use it with a very specific meaning.

We call maintenance any small task that:

  • can probably be completed in less than 1.5 days of work

  • is independent from other activities planned in the Sprint

Examples of tasks falling into this bucket are:

  • Non-critical bugs

  • Small feature improvements

  • Dependency updates

  • Small refactoring

Q: Why the limit of 1.5 days of work?

We allocate between 20% and 30% of the week to maintenance – that means 1 to 1.5 days per week. One day is always allocated to these tasks, while the remaining half day is at the total discretion of the developer. They might use it to study, do something they are generally responsible for (e.g. update a few dependencies), or continue working on Sprint tasks.
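The arithmetic behind that allocation can be sketched in a couple of lines. This is only an illustration and assumes a standard five-day work week, which the numbers above imply; `maintenance_days` is a made-up helper name.

```python
WORK_DAYS_PER_WEEK = 5  # assumption: a standard five-day week


def maintenance_days(percent: int) -> float:
    """Days per week that a given percentage of the week represents."""
    return WORK_DAYS_PER_WEEK * percent / 100


low, high = maintenance_days(20), maintenance_days(30)
print(low, high)  # 1.0 1.5
```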

Most of this work is done on Fridays, which are completely allocated to it, while the remaining 0.5 days can be distributed within the week as the developer prefers. Fridays were an obvious choice because developers are less comfortable deploying big features and other significant work right before the weekend.

The limit of 1.5 days for each task exists because:

  1. We want tasks to always be completed within the week, and not to overflow into the following one.

  2. If it requires 2 days or more, we feel it's better to plan it in the Sprint, for visibility of the entire team.

Q: What about bigger maintenance?

I know, I know. There are maintenance activities — in the usual meaning of the word — that occupy more than 1.5 days. Think of refactoring some large legacy code, updating a framework, etc.

How do you plan for these? Two routes here:

1) Some activities never become bigger than 1.5 straight man-days if they are done regularly, week over week. Take for example updating a big framework (e.g. Rails): if you regularly set aside some time to update things, even for minor releases, you almost never find yourself needing to do the big, risky update all at once. Approaching things this way makes sure they fall nicely into the maintenance cycle.

2) Tasks that stay big anyway should be planned in the Sprint. Their owner (it might be the CTO or an engineering leader) should sit at the planning table, demonstrate the task's value, and schedule it.

This is a healthy discussion that we — engineering leaders — always need to have with the rest of the company, to avoid cornering engineering in a place where in the long run other people don't understand what we do, why we do it, and think we do things for our own satisfaction.

Being able to convey the business value of pure engineering work is hard, but it is the only way to build real trust in the area.

Q: Why a dedicated cycle instead of managing everything in the Sprint?

We used to! We found it inconvenient for a couple of reasons:

1) Cycle time

Sprints last 2 weeks, so it frequently happened that things (e.g. bugs) reported in the first week of the Sprint wouldn't be addressed for 2-3 weeks, unless very urgent.

This created a whole underground movement of people talking directly with developers and asking "please can you do this in the next couple of days? It should be easy" — derailing the process in ways you can imagine.

Having weekly cycles makes everyone more confident that their task will be done in a reasonable time.

2) Prioritization

Sprint planning, meaning the act of deciding what to do in the Sprint based on priorities and estimates, was ineffective for these tasks. When you have the big, quarterly project on one side, and a small back-office improvement on the other, it's very hard to make room for the latter.

Think about it: the big project might be a top-down, company-wide effort likely backed by an executive sitting at the table, while the improvement is often a small bottom-up initiative – e.g. the outcome of a chat with a specific customer. By trying to prioritize and schedule everything at the same time, the small tasks always got "bullied out" and ended up in an infinite backlog of stuff nobody ever did.

Q: Can you give more details about how you manage this?

We have a database on Airtable where everyone on the team can contribute by adding entries from a form. Each entry is labelled as either a bug or a change.

The creator of the entry can give it a priority (P1, P2 or P3), or leave it empty if unsure.

Each Wednesday, the engineering team has a meeting where they go through these tasks, assign an estimate to the new ones, and set the priority if missing.

☝️ open bugs and changes

Effort is rated on a High / Medium / Low scale, which roughly stands for:

  • High = 1 man-day

  • Medium = 0.5 man-days

  • Low = 0.25 man-days

These are very rough estimates that developers should decide in a few minutes. The point here is not to be super precise, but just to provide an order of magnitude of the task. If the task is too big to be addressed during the maintenance cycle, it gets escalated and discussed in the following Sprint.

Then, during the meeting we run through the open tasks in order of Priority (high to low) and Effort (low to high), and developers assign them to themselves until their time is full.
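To make the ordering concrete, here is a minimal sketch of that walk-through. It is not our actual tooling: in the meeting developers pick tasks themselves, while this sketch simply gives each task to the first developer who still has room. The `Task` and `Developer` classes, the one-day capacity, and the sample names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Effort scale from the post, in man-days.
EFFORT_DAYS = {"Low": 0.25, "Medium": 0.5, "High": 1.0}
# Assumption: P1 is the highest priority.
PRIORITY_RANK = {"P1": 1, "P2": 2, "P3": 3}


@dataclass
class Task:
    title: str
    priority: str  # "P1", "P2" or "P3"
    effort: str    # "Low", "Medium" or "High"


@dataclass
class Developer:
    name: str
    capacity_days: float = 1.0  # hypothetical weekly maintenance budget
    scheduled: list = field(default_factory=list)

    def remaining_days(self) -> float:
        used = sum(EFFORT_DAYS[t.effort] for t in self.scheduled)
        return self.capacity_days - used


def plan_week(tasks, developers):
    """Walk tasks by priority (high to low), then effort (low to high).

    Each task goes to the first developer with enough time left.
    Returns the tasks nobody had room for.
    """
    ordered = sorted(
        tasks, key=lambda t: (PRIORITY_RANK[t.priority], EFFORT_DAYS[t.effort])
    )
    leftover = []
    for task in ordered:
        for dev in developers:
            if EFFORT_DAYS[task.effort] <= dev.remaining_days():
                dev.scheduled.append(task)
                break
        else:
            # No developer had room: the task waits for another week.
            leftover.append(task)
    return leftover


# Made-up example data.
tasks = [
    Task("Fix CSV export encoding", "P2", "Medium"),
    Task("Update dependencies", "P3", "Low"),
    Task("Login error on Safari", "P1", "High"),
    Task("Rename back-office label", "P1", "Low"),
    Task("Refactor invoice helper", "P3", "Medium"),
]
devs = [Developer("Ana"), Developer("Ben")]
leftover = plan_week(tasks, devs)
```

With two developers at one day each, the two P1 tasks are placed first, and the P3 Medium task is the one left over for another week.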

Tasks assigned this way are moved to the Scheduled status, and a Commitment for the week is created with all Scheduled tasks. The commitment is visible to all the company: we take a screenshot of the Airtable view and share it in a Slack channel.

☝️ an example of weekly commitment

A nice touch: we created an internal Slack integration where people get notified when a task they reported gets Scheduled, and again when it's Done. The same notification is also sent to the public maintenance channel. This is super useful, for example, for bugs coming from Customer Success chats — CS people need to be notified when these bugs have been fixed in order to update the customers that reported them.

Without this small automation, developers had to remember to notify people themselves, and also to remember who reported the bug in the first place. As you may imagine, this was almost never done.
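The logic of such an automation is small. Below is a rough sketch of the idea, not our real integration: `send` is a placeholder for whatever actually posts the message (it is not a Slack API call), and the `Entry` fields, the "Open" status, and the channel name are assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Statuses that trigger a notification, as described in the post.
NOTIFY_ON = {"Scheduled", "Done"}
MAINTENANCE_CHANNEL = "#maintenance"  # hypothetical channel name


@dataclass
class Entry:
    title: str
    kind: str             # "bug" or "change"
    reporter: str         # who filed the entry
    status: str = "Open"  # assumed initial status


def change_status(
    entry: Entry, new_status: str, send: Callable[[str, str], None]
) -> None:
    """Update the status, then notify the reporter and the public channel.

    `send(recipient, text)` is a stand-in for the real delivery mechanism.
    """
    entry.status = new_status
    if new_status not in NOTIFY_ON:
        return
    text = f"{entry.kind.capitalize()} '{entry.title}' is now {new_status}"
    for recipient in (entry.reporter, MAINTENANCE_CHANNEL):
        send(recipient, text)


# Example: collect messages in a list instead of posting them anywhere.
sent = []
entry = Entry("Invoice PDF shows wrong date", "bug", "@giulia")
change_status(entry, "Scheduled", lambda to, text: sent.append((to, text)))
change_status(entry, "Done", lambda to, text: sent.append((to, text)))
```

Because the reporter is stored on the entry itself, nobody has to remember who filed the bug when it finally gets fixed.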

The Wednesday meeting serves both as planning for the next week and as a review of the previous one. At the end of the meeting, we also share the output of the week on Slack:

☝️ the weekly outcome, shared with everyone on Slack

This sharing moment serves either as a celebration, if things went well, or as "public shaming" if we missed a big chunk of the commitment. Everything is done with a light spirit, but the public element of the commitment really helps build accountability within the team.

That's it! These are the basics, but it ain't over. If you have read a few other posts on Refactoring, you know I love to measure everything, and we couldn't help but measure this as well.

Next week I will detail how this process, which looks good (to me) on the surface, almost derailed after a couple of months, why, and what we did to put it back on track, stronger than ever, and make everybody happy.

What’s your experience dealing with maintenance? Let’s have a conversation in the comments 👇 or via email.

Hey, I am Luca 👋 thank you for reading through this post!
