Creating a measurable Pull Request process 🔍

Why you should track pull requests and set goals around them — and a pragmatic approach to doing it.

Pull requests and code reviews are a cornerstone of any engineering team's workflow. In my experience, this is a fairly uncontroversial statement, as most people have a grasp of why PRs are useful.

However, as with testing and any other engineering work that doesn't visibly advance production code, this intuitive sense of usefulness is sometimes not enough to protect the practice from the pressure of tight deadlines and management.

When you have to cut corners, some corners are simply more vulnerable than others, and PRs are one of them.

I would go as far as to say that, without a conscious effort to weave good PRs into your culture, you will find yourself in a situation where a non-negligible share of people on your team secretly despises them. And you will be surprised by how effective these people can become at neutralizing the practice and rendering it useless, without anyone noticing.

Objectives

To understand why this is possible, we first have to define the objectives of a good PR process. In my book, they are:

  1. Improving the quality of code before it is released

  2. Sharing knowledge across the team

  3. Keeping the PR cycle itself short

You may notice that there is a tension between the last objective and the first two. This is healthy, because we want to strike the right balance between shipping things fast and shipping things that are correct and under the team's control.

Once we agree on the objectives, we need to find a way of measuring progress — that is, defining quantitative metrics to track them. This relationship between qualitative objectives and quantitative results is the core teaching of the OKR methodology. You see many examples of it being applied to company-wide contexts, with complex trees of nested goals and key results, but the truth is you can apply its basic principles everywhere, even in very small processes like this one, or in your personal life.

Before getting to the actual metrics, I want to stress that none of these come "out of the box": they require some effort to set up. Failing to do so, however, keeps the process invisible and unable to show engineering managers its value. Moreover, engineers themselves will be incentivized to cut parts of it in order to prioritize outcomes that are more visible and prized (e.g. shipping things within a deadline) over invisible ones (doing a good PR).

This last part is crucial: if you have a process X that is acknowledged and appreciated within the company, and a process Y that conflicts with X but doesn't get much recognition, process Y is doomed in the long term.

Metrics

The PR cycle is a multiplayer game, where players can take one of two roles:

  1. Submitter

  2. Reviewer

Submitters and Reviewers are subject to different obligations. In particular:

  • Reviewers assess the PR, point out where the code can be improved, and make suggestions. We want them to do this thoroughly, write comments that make sense, and do it fast.

  • Submitters, after opening the PR, address the comments left by Reviewers and improve the code where needed. We want them to reply fast and implement the right corrections.

In our team, we use the following metrics to track these obligations 👇

About the Reviewer's metrics (a rough sketch of how they could be computed follows the list):

  • Reaction time: it's the time that passes between the submission of the PR and the reviewer leaving her comments. We want this to be low.

  • Influence: it's the percentage of comments that later get actually addressed by the submitter. We want this number to be high, because we want the reviewer to write things that make sense.

  • Review coverage: it's the percentage of the code that gets addressed by the reviewer's comments. We want this number to be high enough to make sure the review is substantial, but not so high that it makes us wonder whether the whole PR is wrong. Review coverage and influence are two conflicting metrics that help keep the process healthy: if a reviewer wanted to keep her influence high, she might be tempted to comment only on things she is really sure of, but that would hurt her review coverage.

  • Involvement: it's the percentage of the team's overall PRs that a particular reviewer takes part in. We want this number to be balanced with respect to our team size, to make sure everyone is involved in the process.
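
To make these definitions concrete, here is a minimal sketch of how the numbers could be computed. The Review data model and its field names are hypothetical, invented purely for illustration; in practice an analytics tool derives the same figures automatically from your Git provider's data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

# Hypothetical data model: one record per (PR, reviewer) pair.
@dataclass
class Review:
    reviewer: str
    pr_opened_at: datetime
    first_comment_at: datetime
    comments_total: int        # comments left by the reviewer on this PR
    comments_addressed: int    # of those, how many the submitter acted on
    lines_commented: int       # changed lines touched by at least one comment
    lines_changed: int         # total changed lines in the PR

def reaction_time_hours(r: Review) -> float:
    """Time between the PR being opened and the first review comment (lower is better)."""
    return (r.first_comment_at - r.pr_opened_at).total_seconds() / 3600

def influence(r: Review) -> float:
    """Share of the reviewer's comments that the submitter actually addressed (higher is better)."""
    return r.comments_addressed / r.comments_total if r.comments_total else 0.0

def review_coverage(r: Review) -> float:
    """Share of the changed code touched by the reviewer's comments."""
    return r.lines_commented / r.lines_changed if r.lines_changed else 0.0

def involvement(reviews: List[Review], reviewer: str, total_team_prs: int) -> float:
    """Share of the team's PRs this particular reviewer took part in."""
    reviewed_prs = sum(1 for r in reviews if r.reviewer == reviewer)
    return reviewed_prs / total_team_prs if total_team_prs else 0.0
```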

About the Submitter's metrics (again, a sketch follows the list):

  • Responsiveness: it's how fast the submitter responds to the reviewer's comments. We want this to be high.

  • Comments addressed: it's the share of the reviewer's comments that actually get addressed by the submitter, at least with a reply. We want this number to be quite high, because the reviewer's comments should not be ignored.

  • Unreviewed PRs: it's the share of PRs that get merged without review. We want this number to approach zero.

  • Receptiveness: it's the share of the reviewer's suggestions that cause actual changes to the code before release. Again, we want this to be high enough to make sure the submitter listens to feedback.
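
And a matching sketch for the Submitter's side, again with a made-up SubmittedPR record standing in for whatever your tooling exposes:

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List

# Hypothetical data model: one record per submitted PR.
@dataclass
class SubmittedPR:
    review_comments: int             # comments received from reviewers
    comments_replied_to: int         # comments answered at least with a reply
    comments_causing_changes: int    # comments that led to code changes before release
    first_response_delay: timedelta  # time until the submitter's first reply
    reviewed: bool                   # False if the PR was merged without any review

def avg_response_delay_hours(prs: List[SubmittedPR]) -> float:
    """Average time to respond to review comments; a low delay means high responsiveness."""
    if not prs:
        return 0.0
    return sum(p.first_response_delay.total_seconds() for p in prs) / len(prs) / 3600

def comments_addressed(prs: List[SubmittedPR]) -> float:
    """Share of reviewer comments that got at least a reply (we want this high)."""
    total = sum(p.review_comments for p in prs)
    return sum(p.comments_replied_to for p in prs) / total if total else 0.0

def unreviewed_share(prs: List[SubmittedPR]) -> float:
    """Share of PRs merged without any review (we want this close to zero)."""
    return sum(1 for p in prs if not p.reviewed) / len(prs) if prs else 0.0

def receptiveness(prs: List[SubmittedPR]) -> float:
    """Share of reviewer suggestions that caused actual code changes before release."""
    total = sum(p.review_comments for p in prs)
    return sum(p.comments_causing_changes for p in prs) / total if total else 0.0
```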

We track these KPIs with an engineering analytics tool, such as Waydev, Pluralsight Flow or LinearB. They are all excellent products (I am not affiliated with any of them): they track everything automatically and provide real help when they are embedded into the team's daily routine.

Dashboard example from Waydev ☝️

Using Metrics in your team: a real-world example

Once you have these metrics in place, you can use them at different levels.

  • At company level, you can use them to set OKRs for the whole engineering team, with clear performance goals that everyone can understand and check herself against.

  • At team level, you can use them to drill down into specific issues from the various development areas: are there any patterns you can recognize? Are there issues in one team that seem to be absent in another? Suddenly, having the data unlocks conversations that wouldn't have been possible before.

  • At the individual level, you can leverage the data for 1:1s, for setting individual goals, and for getting a high-level picture of how a person performs in that particular activity (PRs).

As an example, after we started tracking PRs, we soon discovered that ~20% of our frontend PRs were getting merged without review. This led to an investigation that highlighted the following root causes:

  • We had few, often overwhelmed, people working on that area, which made them feel they should go light on some PRs for the sake of fast delivery.

  • Codebase knowledge was split between team members with very little overlap, so it was often hard to find someone who could review the code properly.

We started a team-wide effort to reverse this trend, sharing more knowledge between developers and enforcing reviews for all PRs.
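
If your code lives on GitHub, one practical way to enforce reviews on every PR is branch protection, which you can configure from the repository settings or script against the REST API. Here is a rough sketch; the owner, repository and branch names are placeholders:

```python
import os
import requests

# Placeholder values: substitute your own organization, repository and branch.
OWNER, REPO, BRANCH = "my-org", "frontend-app", "main"

resp = requests.put(
    f"https://api.github.com/repos/{OWNER}/{REPO}/branches/{BRANCH}/protection",
    headers={
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    },
    json={
        # Require at least one approving review before merging.
        "required_pull_request_reviews": {"required_approving_review_count": 1},
        # The endpoint expects these keys to be present; null/False leaves them unset.
        "required_status_checks": None,
        "enforce_admins": False,
        "restrictions": None,
    },
)
resp.raise_for_status()
```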

Within a quarter, we were able to bring unreviewed PRs close to zero, celebrating a big win with everybody.

-97% of unreviewed PRs in 3 months 🎉

Improving Metrics

Defining the KPIs is an important step towards making sure the team works in the right direction, but it's not enough. For things to really pick up, Engineers must feel that Managers are supportive and care about this.

This means having regular team reviews where these numbers are discussed, surfacing the topic in 1:1s, and doing whatever is needed to create a culture where these are treated like fully-fledged company objectives.

In other words, if you are a manager, you should continuously signal that you care, to counter The Drift.

Finally, you can also set up practical tools and workflows to help. We have had some success integrating Pull Panda (since acquired by GitHub) into our workflow, with Slack notifications. Pull Panda notifies people when they are assigned to a review and sends periodic reminders to make sure no one forgets about them.

It also encourages keeping a low number of open PRs at any given time, keeping the average PR size under a defined threshold, and more. For our team, these served as secondary objectives that told us we were moving in the right direction.

Below are two pictures of how we were able to improve our PR cycle after the introduction of Pull Panda.

As soon as your process becomes measurable, opportunities for improvement become evident, as both Engineers and Managers get empowered with the tools to make progress.


Hey, I am Luca 👋, thank you for reading through this post!

Every two weeks I write something about making software, working with people and personal growth. If you haven’t already, you can subscribe below to receive new posts in your inbox!