How to Create an Engineering OS for your Organization
A practical canvas to reflect on how your team operates + a ton of examples from the community.
Refactoring was born to help engineers write good software and work well together.
This is a broad goal that is shared by other products and communities as well. I got in touch with many of them over the years, often working together, partnering on articles, and more.
One of them is, of course, Plato.
Plato is an awesome mentoring platform and community. It was recently acquired by Coda, which gave me the chance to reconnect and catch up with Quang and Jean-Baptiste, Plato's founders.
This turned into an extremely interesting chat. They told me that they are building a comprehensive framework about how engineering teams operate, which they call EngOS.
EngOS has a lot of overlap with my own way of thinking about how teams work, so we wrote this piece together to explore this topic!
As we partnered on this piece, Plato also graciously provided an exclusive 60% discount to Refactoring readers for their upcoming Elevate conference.
You can find the tickets and the discount below:
What is an OS?
One of the first memories I have of my time as a startup founder is printing a giant version of the Business Model Canvas and hanging it on the wall.
I remember being confused by it: I didn't know what to put in the various boxes, and even when I did, I was not sure it was the right stuff.
In hindsight, such confusion was the whole point of the exercise. The Canvas is not a framework in the sense of being prescriptive about what you should or shouldn't do. It is a template: a set of building blocks that you can use to create your own framework.
Back to engineering: whenever we ask how teams should do this or that, we often get "it depends" as a response. But what if we take a step back? Can we define a standard set of areas (the equivalent of the Canvas's boxes) that you can fill with your own processes to define how your engineering org operates?
That's what EngOS does.
EngOS
Plato is a mentorship platform for engineering leaders. Quang started it in 2017, and, in 7+ years, they have worked with hundreds of companies, thousands of mentors, and hosted tens of thousands of sessions.
At some point, you naturally start to see patterns.
Quang tells me that every engineering team has an operating system. They may not call it that, but they have one.
So, like a computer OS, an engineering OS is made of various pieces, each having a specific function. The pieces of EngOS answer questions like:
How do we make decisions?
How do we plan work?
How do we grow the team?
Depending on the scale and stage of the company, some categories may be informal or undefined, while others are crisper or inherited from the overarching organization. Also, as an engineering leader, chances are you can only influence some of them at your level.
So, EngOS defines four major areas:
Execution: how things get done.
Team: how teams are run.
Engineering: how engineering work is performed, from design to operations.
Personal productivity: how individuals operate.
Each area is further split into items, so you have ten of them in total.
Once they identified these items, Plato started collecting notes about how the most successful companies run them. For example:
How Coda does Planning (Execution)
How Webflow does Hiring (Team)
The driving principle of the EngOS is: do not reinvent the wheel. Whatever challenges you are facing at work, chances are other teams have faced them before, and they may have found a solution you can be inspired by.
So, in this article I reviewed all the individual items from EngOS, added my own ideas, and attached the best examples I could find from the Plato community and from Refactoring, which you can use to learn more.
Think of this as a reference piece that you can go back to from time to time for inspiration, or as a checklist to see how you are doing.
Let's dive in!
Execution
Execution is how things get done at your company.
This is a fractal topic: at a high level there should be principles and processes shared across the whole org, but the deeper you go into the details, the more autonomy teams should have to run their own show.
Letβs look at the three main dimensions of execution:
1) Planning
Planning is an ever-controversial topic in engineering. The best planning processes strike a delicate balance between long-term goals and short-term agility. They create alignment across the organization while empowering teams to find their own solutions.
Here are some powerful ideas about this:
Coda Handbook for Planning & OKRs: Oliver Heckmann wrote a detailed handbook that draws on his experience as Head of Eng at Coda, plus 14 years as VP of Eng at Google. I love his breakdown into three blocks (Big Rocks, OKRs, and Resource Planning) and how precisely he addresses process and execution.
The Four Types of Work: in one of the most popular Refactoring articles ever, I address how to plan for the various types of work you encounter in an engineering org.
Netflix's combination of Goals + Freedom: in our interview, Kathryn Koehler, Director of Productivity Eng at Netflix, explains how they set annual and quarterly goals and leave teams the freedom + responsibility to go after them however they like.
2) Collaboration
Collaboration is the set of processes that make people work together effectively: the way you run meetings, the way people get status updates, the set of project mgmt tools, and so on.
Here are a few questions you can ask yourself to start assessing the state of your team collaboration:
What recurring meetings do you hold?
What is a recurring meeting you have added over the last 6 months? Why?
What is a recurring meeting you have removed over the last 6 months? Why?
What is an example of a collaborative process that is run regularly and doesn't involve meetings?
How can people learn about the status of a project / task without asking anyone?
Do any of these questions make you uneasy? Why? Start from there.
Here are also notable examples that we covered in the past:
How to Build a Company around Crafters: in this interview with Kaz Nejatian, COO of Shopify, we walked through how they abruptly removed all recurring meetings from the company.
How to Reduce Meetings: a framework + a detailed case study on cutting meetings down until they eventually… disappear.
How to Create a Great Remote Team: a full, long guide about what makes a remote team extraordinary, and how to create one by taking care of communication, docs, and relationships.
3) Decision Making
How teams make decisions is an often overlooked part of how they work. It doesn't feel like something that should be designed, yet you can easily measure how well you are doing it by asking two questions:
Efficiency: how long does it take to make a decision? Think of feature prioritization, design dilemmas, or build vs buy situations: how fast do you converge?
Quality: how sticky are your decisions? How often do you have to backtrack on them?
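If you keep even a lightweight decision log, both questions can be turned into numbers. The sketch below is a minimal illustration, assuming a hypothetical log where each entry records when a question was raised, when it was settled, and whether the team later backtracked; the `Decision` record and the `decision_health` helper are made up for this example and are not part of EngOS.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Decision:
    """Hypothetical decision-log entry (not an EngOS artifact)."""
    raised: date               # when the question came up
    decided: date              # when the team converged on an answer
    backtracked: bool = False  # did we later have to revisit it?


def decision_health(log: list[Decision]) -> tuple[float, float]:
    """Return (median days to decide, share of decisions later backtracked).

    Efficiency is approximated by time to converge; quality by how
    often decisions did not stick.
    """
    if not log:
        raise ValueError("decision log is empty")
    days = [(d.decided - d.raised).days for d in log]
    backtrack_rate = sum(d.backtracked for d in log) / len(log)
    return median(days), backtrack_rate


log = [
    Decision(date(2024, 3, 1), date(2024, 3, 4)),
    Decision(date(2024, 3, 5), date(2024, 3, 19), backtracked=True),
    Decision(date(2024, 3, 10), date(2024, 3, 12)),
    Decision(date(2024, 3, 11), date(2024, 3, 18)),
]
print(decision_health(log))  # (5.0, 0.25)
```

The trend matters more than any single value: a falling median paired with a rising backtrack rate may simply mean decisions are being rushed.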
In my experience, here is what enables good decision making:
Shared principles: you want people to make choices that are in line with your team's culture. Different people tasked with the same decisions should mostly come to the same conclusions, and that is only possible if they share the same set of values.
Good context: people can only make good decisions when they have full visibility into the elements that go into them. As a manager, you should lead with context: provide all the data needed to reach a decision, and empower your teammates to go for it.
Templates and procedures: for specific, recurring situations you may write down procedures for how to handle them. You may create a template for design docs, a set of criteria for buy vs build, a checklist for code reviews, and so on. Procedures easily turn into mini frameworks that speed up work immensely and improve its quality.
Here are some resources to learn more about this:
The 7C Framework for Buy vs Build: a thoughtful and detailed framework from Sue Nallapeta, CTO, about choosing technology. It considers 7 factors: core capability, cost, complexity, competence, cohesion, competitive advantage, and culture.
Engineering principles: my own take on how to create good engineering principles for your team, with plenty of examples from successful companies.
Mental models for engineers and managers: a collection of my favorite mental models, which have helped me at work.
Team
This is how teams are run across their lifecycle. It includes three areas: formation, management, and calibration:
1) Formation
Structuring teams is a complex process that includes hiring, allocating headcount, creating reporting structures, and more.
Teams go through different stages, which also influence their performance, as in Tuckman's four stages (forming, storming, norming, performing).
Here are some resources that can help you across these stages:
Webflow's Playbook for Hiring Engineers: the story of how Webflow ditched LeetCode, created a simpler hiring pipeline, and measured how many engineering hours it saved in the process (spoiler: many!)
How to Structure Engineering Teams: our own full guide on how to create efficient team structures, based on scope, growth stages, roles, and more.
Should you use Scrum in 2024? Aside from the cheeky title, this is a reflection on how team maturity impacts your processes and operations.
2) Management
Management is an extremely broad term. In the scope of EngOS, it is about coaching and mentoring your teammates. Think onboarding, 1:1 meetings, training, health and growth.
It is always hard to give universal management feedback, as it depends on so many factors. Recently, in the newsletter, we discussed what the right amount of management looks like, and the evolving role of Engineering Managers.
From Plato, I also loved Anant Gupta's talk about lessons learned from his experiences at Uber, LinkedIn, and Included Health.
3) Calibration
Calibration is all-things performance management.
This topic can get more or less nuanced depending on the growth stage of your company, and includes things like performance reviews, career frameworks, compensation, and succession plans for critical roles.
About engineering performance, I liked Gusto's simple approach, which covers four axes:
Project: the direct impact you or your team (if you are a manager) have on deliverables. It's best described in terms of customer behavior changes (e.g., increased product usage, higher NPS) or learnings that influence product direction.
Better Engineering: improvements in systems that boost engineer effectiveness, such as reducing test time, error rates, or new hire ramp time. Its importance grows with seniority, because more senior engineers are expected to have a broader impact on the overall system.
People: enhancing team efficiency and health through things like hiring, mentoring, providing feedback, and code reviews. Metrics for People impact include team engagement, effective hiring, team health, and qualitative feedback from team members and peers.
Better Organization: improving the overall health of the organization. Examples include enhancing the hiring process through new interview questions or rubrics, driving diversity programs, and representing the company at external events.
I wrote more about calibration in two previous pieces:
Performance Reviews: a first-principles approach to performance management + a practical workflow + wild ideas from successful companies.
Career Frameworks: what they are useful for, how to use them, and the various styles and examples.
Engineering
The last area is engineering-specific activities. EngOS buckets them into three categories:
1) Operations
This is how you manage maintenance, on-calls, incidents, and other operational tasks.
My favorite litmus test on this is how much proactive vs reactive work you do.
Great teams are able to operate mostly in proactive mode, through prevention and scheduled work. On-calls are peaceful and incidents are rare.
Struggling teams are rather stuck in reactive, fire-fighting mode. They routinely spend 50%+ of their time on operations, and product work gets constantly derailed by failures and P0 bugs.
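One way to apply this litmus test is to tag a period's worth of tracked work as product, proactive operations, or reactive operations, and look at the split. Here is a minimal sketch, assuming hypothetical work-type labels (adapt them to your tracker); the 50% operations threshold mirrors the figure above, and the "mostly reactive" cutoff is an illustrative heuristic, not a hard rule.

```python
from collections import Counter

# Hypothetical labels; map them to whatever your issue tracker uses.
PROACTIVE_OPS = {"scheduled_maintenance", "prevention"}
REACTIVE_OPS = {"incident", "p0_bug", "on_call_interrupt"}


def ops_profile(entries: list[tuple[str, float]]) -> dict[str, float]:
    """Summarize (work_type, hours) entries into operations ratios."""
    hours = Counter()
    for kind, h in entries:
        if kind in REACTIVE_OPS:
            hours["reactive"] += h
        elif kind in PROACTIVE_OPS:
            hours["proactive"] += h
        else:
            hours["product"] += h
    total = sum(hours.values())
    ops = hours["reactive"] + hours["proactive"]
    return {
        "ops_share": ops / total if total else 0.0,
        "reactive_share_of_ops": hours["reactive"] / ops if ops else 0.0,
    }


week = [
    ("feature", 40.0),
    ("incident", 30.0),
    ("p0_bug", 14.0),
    ("scheduled_maintenance", 16.0),
]
profile = ops_profile(week)
# Half or more of the time on operations, and most of it reactive:
# a sign of fire-fighting mode.
firefighting = profile["ops_share"] >= 0.5 and profile["reactive_share_of_ops"] > 0.5
print(profile, firefighting)
```

Separating the two ratios keeps a team that spends a lot of time on scheduled maintenance from being flagged the same way as one that is constantly paged.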
We collected many community stories and ideas about this in these previous pieces:
How to Plan for Maintenance: our thorough guide about creating a sustainable process for maintenance.
Achieving 99.99% Uptime at DigiCert: if you are up for something more technical, here is a long and detailed case study by Wade Choules, SVP of Eng at DigiCert, about how they completely turned around their dev process to achieve high availability.
Observability & Testing in Production: our recent interview with Charity Majors, in which we discuss, among other things, how teams should think about operations, testing, and continuous delivery.
2) Productivity
Engineering productivity is a divisive topic: most teams agree you should do something about it, but most disagree about what. Should you use metrics? Should you set goals? And what about developer experience? How should you work on it?
Fortunately, this is also one of the topics we cover the most on Refactoring. Here are some recommendations:
How to Use Engineering Metrics: this is my most popular article ever. It's a full guide on how to get started, what metrics to pick, and what you should use them for.
How we Drive Developer Productivity at Yelp: a piece I really enjoyed from the Plato community by Kent Willis, Director of Engineering at Yelp. It would be reductive to say it's just about productivity: it's a summary of Yelp's 10-year engineering journey, from 2013 to today, and how they went through the monolith, microservices, metrics, platform teams, and more.
Engineering Productivity & Developer Experience: we recently interviewed Laura Tacho, CTO at DX, about these very themes. Laura is a veteran: she is an accomplished engineering coach, and her work at DX focuses exactly on developer experience (you can pretty much tell by the name) and productivity.
3) Design
Finally, you should reflect on the state of your design process and documentation. In addition to what we already covered, here are two articles that should help you with it:
Design Docs: design docs are the MVP of all docs. I wrote a popular article about them (which includes my own template) a while back.
How to Write Documentation: if you are up for something more thorough, here is my full guide about creating good docs and making the team stick with them. It's a 3,000-word piece full of additional references, links, and tools.
Personal Productivity
Last but not least, there is your personal productivity setup. I love how in the EngOS picture this is at the base: in fact, while much of your productivity depends on how your team works, there are important foundations you should take care of by yourself.
Everyone has their own preferences, but here are a few recipes that I have seen working broadly and I can recommend:
How to Manage Your Time: my personal setup, which includes pomodoros, timeboxing, a good calendar, and setting focus time aside.
How to Relieve Stress: understanding the sources of stress, monitoring your state effectively, and strategies to defuse burnout. This has become even more important for me now that I work by myself.
Atomic Habits: our community review of the best-selling book by James Clear. It's one of my favorite books: it has changed the way I think about my identity, progress, and many other things.
How to get started?
Once you understand the framework, you can use it to reflect on what you can do better with your team. Quang suggests doing so by holding an audit meeting.
It works in three steps:
Scope: you look at the 10 categories and choose 3-4 to include in the audit. These should be the ones with the strongest potential for improvement.
Survey: before the meeting, you survey participants about pain points in the various categories.
Discuss: you get together as a group and discuss the results of the survey. You may sort pain points by severity, and design actions to address them.
Here is also a free template you can use to run it.
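If you collect the survey with a simple form, the Discuss step can start from a ranked list. The sketch below assumes each answer is an (item, pain point, severity) tuple on a hypothetical 1-5 scale; it keeps only the 3-4 items in scope, merges identical pain points, and sorts them by average severity, breaking ties by how many people raised them. The data shapes and the `rank_pain_points` name are illustrative and not taken from the template linked above.

```python
from collections import defaultdict
from statistics import mean

# The ten EngOS items, grouped by area as in this article.
ENGOS_ITEMS = {
    "Execution": ["Planning", "Collaboration", "Decision Making"],
    "Team": ["Formation", "Management", "Calibration"],
    "Engineering": ["Operations", "Productivity", "Design"],
    "Personal productivity": ["Personal Productivity"],
}
ALL_ITEMS = {item for items in ENGOS_ITEMS.values() for item in items}


def rank_pain_points(scope, responses):
    """Rank survey pain points for the audit's Discuss step.

    scope: the 3-4 EngOS items chosen in the Scope step.
    responses: iterable of (item, pain_point, severity) tuples,
        with severity on an assumed 1-5 scale.
    Returns (item, pain_point, avg_severity, mentions) tuples,
    most severe first.
    """
    scope = set(scope)
    if not 3 <= len(scope) <= 4 or not scope <= ALL_ITEMS:
        raise ValueError("scope must be 3-4 of the ten EngOS items")
    grouped = defaultdict(list)
    for item, pain, severity in responses:
        if item in scope:  # answers outside the audit scope are ignored
            grouped[(item, pain)].append(severity)
    ranked = [
        (item, pain, mean(scores), len(scores))
        for (item, pain), scores in grouped.items()
    ]
    # Highest average severity first; ties go to the most-mentioned point.
    ranked.sort(key=lambda row: (row[2], row[3]), reverse=True)
    return ranked


scope = ["Planning", "Decision Making", "Operations"]
responses = [
    ("Operations", "On-call pages at night", 5),
    ("Planning", "Quarterly goals change mid-quarter", 4),
    ("Operations", "On-call pages at night", 4),
    ("Design", "No design doc template", 3),  # out of scope
    ("Decision Making", "Build vs buy debates drag on", 3),
]
for item, pain, severity, mentions in rank_pain_points(scope, responses):
    print(f"{severity:.1f} ({mentions}x) {item}: {pain}")
```

The ranking only frames the conversation; the actions themselves still come out of the group discussion.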
Bottom line
Engineering leaders are tasked with making their teams operate efficiently.
Efficiency, however, is an all-encompassing goal that doesn't come strictly from technology: it comes from good process, good culture, and from taking care of the team as a whole.
Not being able to make good decisions promptly is just as problematic as not having a good CI/CD process, yet it may be more easily overlooked. In fact, while this framework is called EngOS (as in engineering), I have found most categories to be relevant to any team, even non-technical ones.
Plato is building a library of community-sourced articles and templates to help solve the various friction points of your OS, and they will have many released by early June for their annual conference Elevate.
Elevate will be held in San Francisco on June 5th and 6th. As a Refactoring reader, if you want to join, you can get 60% off the full ticket price by using the link below.
And that's it for today!
Sincerely
Luca