Observability & Testing in Production — with Charity Majors 🎙️

Refactoring Podcast — Season 2 • Episode 3

Today's guest is Charity Majors, CTO at Honeycomb and one of my favorite writers. I believe I have recommended more articles from her blog than from any other author.

🎙️ Episode

You can watch the full episode on YouTube:

Or listen to it on Spotify, Apple, Overcast, or your podcast app of choice.

🥇 Interview Summary

If you are a 🔒 paid subscriber 🔒 you will find my own summary of the interview below.

These are my handcrafted, 10-minute takeaways from our conversation, with timestamps to the relevant video moments, for those who don’t have time to sit through the 1-hour chat.

Here is the agenda for today:

  • ⚖️ Observability vs monitoring — what’s the difference, and what good observability enables you to do.

  • 💬 Intercom migration story — how observability saved Intercom from a painful, year-long migration.

  • 🚚 Continuous delivery — what does a good pipeline look like. Optimizing the flow and minimizing environments.

  • 🪂 Testing in production — why this is part of every engineer’s job.

  • 🤖 AI Engineering — how AI is changing how we write code, and a couple of concerns.

Let’s dive in 👇

⚖️ Observability vs Monitoring (video)

In the first part of our chat, Charity and I talked about the difference between Observability and other related concepts like monitoring, APM, and more.

Charity, who also wrote a book about it, explained that observability is a property of complex systems, just like scalability or performance. More than that, it is a property of socio-technical systems, so both tooling and people contribute to it:

  • 📈 If you add more metrics, you are improving observability.

  • 🎓 If you educate your team on how to ask questions about the system, you are improving observability as well.

Observability is like business intelligence for your tooling: it’s the single source of truth where you can break down, zoom in on, and zoom out of information about your systems, together with information about your customers.

More specifically, good observability is based on three pillars:

  • 📊 Metrics — offer insight into the health and performance of a system

  • 🔀 Traces — map the journey of a request or action as it moves through all the nodes of a system.

  • ✏️ Logs — provide detailed records of events and activities within the system.

Together, these form the backbone of any observable system, enabling teams to monitor and address system issues in real time.
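
To make the three pillars a bit more concrete, here is a minimal sketch in Python that emits all three for a single request, using only the standard library. It is not from the interview: the names (`handle_request`, `Span`, `load_cart`, `checkout.ok`) are hypothetical, and a real system would most likely send this data to a telemetry library or vendor rather than keep it in memory.

```python
import logging
import time
import uuid
from collections import Counter
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("checkout")  # hypothetical service name

# Metrics: aggregate numbers that describe overall health and performance.
request_count = Counter()
latencies_ms = []


@dataclass
class Span:
    """One step in a request's journey; spans sharing a trace_id form a trace."""
    trace_id: str
    name: str
    start: float = field(default_factory=time.perf_counter)
    duration_ms: float = 0.0

    def finish(self):
        self.duration_ms = (time.perf_counter() - self.start) * 1000
        return self


def handle_request(customer_id: str) -> list[Span]:
    """Handle one (simulated) request and emit a trace, metrics, and a log."""
    trace_id = uuid.uuid4().hex
    spans = []

    # Traces: the root span wraps the whole request, the child wraps one step.
    root = Span(trace_id, "handle_request")
    child = Span(trace_id, "load_cart")
    time.sleep(0.001)  # stand-in for real work, e.g. a database call
    spans.append(child.finish())
    spans.append(root.finish())

    # Metrics: bump a counter and record the request latency.
    request_count["checkout.ok"] += 1
    latencies_ms.append(root.duration_ms)

    # Logs: a detailed record of the event, tagged with customer and trace ID
    # so it can be tied back to the trace and broken down per customer.
    log.info(
        "checkout ok customer=%s trace_id=%s duration_ms=%.2f",
        customer_id, trace_id, root.duration_ms,
    )
    return spans
```

Calling `handle_request("acme")` returns two spans that share one trace ID, increments the counter, and writes one log line. Putting the customer ID and trace ID on the log line is what lets you connect system behavior to a specific customer, which is the kind of breakdown described above.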

💬 Intercom migration story (video)

Charity gave a real-world example of how good observability enables teams to make the right technical decisions while taking the business context into account.

A few years ago, Intercom had to do a giant database migration because they had outgrown the largest EC2 instance size.
