Clay — A Higher Order Database 🧱
The story of the company that is mapping the internet.
Hey 👋 this is Luca! Welcome to a ✨ bonus free edition ✨ of Refactoring.
Every week I write advice about some engineering or leadership topic, backed by my own experience, research and case studies.
Send me your questions and I will humbly write my take in the next newsletter issues!
Elephants are among the most ancient mammals in the world.
The first relatives of modern elephants appeared more than 40 million years ago, and they already shared most of the characteristics of the species we know today.
Over all these years, they have changed very little.
When I think of databases, I think of elephants — and not only because they both have good memory.
Databases are virtually the same today as they were 30 years ago. This was already peculiar 10 years ago, when I started my PhD in CS, focused on emerging database models. I compared relational and NoSQL data stores and developed a framework to make them interoperable.
It didn’t work out. The promised NoSQL revolution didn’t really happen, and the four most used databases today are all relational, and almost indistinguishable.
Over the past month, I have sat down with the team behind one of my favorite products: Clay.
They are building a database that looks like nothing we have ever seen.
This article tells the story of their product and their team. Here is what we are going to cover:
🔮 Vision — a compelling vision about the future of databases and the untapped potential of data.
🎨 Product — a cutting-edge product at the intersection of two of my obsessions: databases and no-code tools.
🔨 Technology — a deep dive on the tech stack and engineering challenges of a modern early stage startup.
🔄 Processes — a case study on the processes and culture of a tight-knit product / engineering team.
So let’s dive in!
This piece was written as part of the new 🌀 Refactoring Partner Program.
The program is an opportunity to work with exceptional engineering teams and write deep case studies about the way they work.
You can read about the guidelines I adhere to in the link above. I always note partnerships transparently, and only share my genuine opinion.
Feel free to reach out to share your feedback about the piece!
⚔️ Databases and spreadsheets
For something that has been around for so long, databases have plenty of evident shortcomings.
They are inflexible, they work at a low level of abstraction, and require specialized skills — both to read and write data. These problems didn’t go unnoticed for long, and eventually gave birth to databases’ biggest competitor: spreadsheets.
Spreadsheets are flexible, easy to use, and allow you to do more with data than simply store it. They put data to use with formulas, charts, and more. In exchange, though, they give up the formal structure of databases. This makes them harder to use at scale, or as pieces of a larger workflow.
Today there are plenty of products that try to combine the structure of databases with the ease of use of spreadsheets. The risk for such products is that they end up worse at both and struggle to find their own place.
To unsettle a stack that has been successful for 30 years, you need a vision.
🔮 Led by vision
When I first met Kareem, Clay’s founder and CEO, we talked for about an hour.
Kareem explained Clay to me for about 5 minutes, and spent the rest of the time telling me what’s wrong with databases today and where they will inevitably go.
Over the following weeks, after I agreed to write their story, I spoke with almost all of Clay’s other employees. What struck me the most was how deeply they are all affected by this vision.
Their work is completely permeated by it, to the point that when I interviewed engineers about their processes and technical challenges, the first answer would often be: “sure, but let me tell you why we are doing this, first”.
As a former founder myself, I find this inspiring.
As a startup, your vision is your identity. It is what makes you punch above your weight. Vision makes your 5-person team execute better than Google; it hires people who are seemingly out of your league, and it wins investors when metrics alone wouldn’t cut it.
Let’s dive into this vision.
🔺 Higher-order databases
Kareem’s premise is simple.
Software engineering has largely evolved by stacking gradually easier-to-use tools on top of proven, lower-level ones. Programming languages are an obvious example: Ruby’s interpreter is written in C. C compilers, in turn, output assembly code.
We call newer languages higher-level, both because they literally run on top of others, and because they allow us to work at a higher level of abstraction.
Following suit, Kareem argues that regular databases are not going anywhere — instead, we are going to build higher-level databases on top of them.
What makes today’s databases low-level? To Kareem, it is the fact that they are blissfully unaware of the kind of data that they store. Columns think in terms of strings and numbers. What if a column knew it was a person, a company, or a Shopify store?
Clay’s bet is that such a semantic database would be able to connect and integrate data from multiple sources with ease, removing the pain of doing it ourselves. It would know how to spot missing data and auto-fill it.
That is exactly what Clay does.
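To make the idea of a column that knows what it holds a bit more concrete, here is a minimal Python sketch. It is not Clay's implementation: `SemanticColumn`, `ENRICHERS`, `lookup_company`, and the tiny in-memory directory are all hypothetical names made up for illustration. The only point is that once a column is tagged as a company rather than a string, the system can pick the right source to fill the gaps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an external data source (not a real API).
FAKE_COMPANY_DIRECTORY: Dict[str, dict] = {
    "acme.com": {"name": "Acme", "employees": 120},
}


def lookup_company(domain: str) -> Optional[dict]:
    """Return known attributes for a company domain, or None if unknown."""
    return FAKE_COMPANY_DIRECTORY.get(domain)


@dataclass
class SemanticColumn:
    name: str  # column name in the table, e.g. "domain"
    kind: str  # what the values mean, e.g. "company" (not just "string")


# Maps a semantic kind to a function that can look up missing attributes.
ENRICHERS: Dict[str, Callable[[str], Optional[dict]]] = {
    "company": lookup_company,
}


def autofill(rows: List[dict], column: SemanticColumn, target: str) -> List[dict]:
    """Fill empty `target` cells using the enricher registered for the column's kind."""
    enricher = ENRICHERS.get(column.kind)
    if enricher is None:
        return rows  # nothing is known about this kind of data
    for row in rows:
        if row.get(target) is None:
            found = enricher(row[column.name])
            if found is not None and target in found:
                row[target] = found[target]
    return rows
```

Calling `autofill(rows, SemanticColumn("domain", "company"), "employees")` fills the employee count for rows whose domain is known and leaves the others untouched.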
🧱 Introducing Clay
Clay is a no-code tool with features in-between those of a database and a spreadsheet.
Today it is primarily used by non-technical users to automate tasks in sales, marketing, and recruiting. Engineers also use it to build complex workflows that would need code + schedulers + a regular database otherwise.
Clay’s uniqueness lies in being able to gather and seed data automatically from the most disparate sources. These sources can add either more rows or more columns. For example, you may create an extra column to take a company’s number of employees from LinkedIn, or add hundreds of rows at once by pulling a list of restaurants from a Yelp page.
This data comes from two types of sources:
🔌 Structured: built-in API integrations for dozens of popular services, like Clearbit, HubSpot, LinkedIn, and more.
🕷️ Unstructured: users use Clay’s Chrome extension to scrape data from web pages and put it into their tables.
This combination of structured and unstructured data lies at the core of Clay’s vision, and it allows it to access virtually any information available on the internet 👇
🌐 Mapping the internet
80% of data on the internet is unstructured.
Unstructured means it isn’t organized according to a predefined data model. This in turn makes it hard to take such data and use it in a reliable way.
There are plenty of products that allow you to annotate elements on a web page and extract information in a structured fashion, like a JSON or a CSV file.
They rely on users manually doing the work of creating this missing data model.
What is different about Clay is that such mappings are public, so other people can reuse them and no one has to write the same scraper twice.
Clay is basically crowd-sourcing web scraping, mapping the internet for anyone to use.
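As a rough illustration of what a shared scraping mapping could look like, here is a small Python sketch. Everything in it is an assumption: the `SHARED_MAPPINGS` registry, the `publish_mapping` and `extract` helpers, and the use of regular expressions over HTML (a real scraper would more likely rely on CSS selectors and a proper parser). It only shows the principle: one person describes a page once, and everyone else gets structured records from it.

```python
import re
from typing import Dict, Optional

# Hypothetical public registry: site name -> {field name -> regex with one capture group}.
SHARED_MAPPINGS: Dict[str, Dict[str, str]] = {}


def publish_mapping(site: str, fields: Dict[str, str]) -> None:
    """Share a mapping so that nobody has to write the same scraper twice."""
    SHARED_MAPPINGS[site] = fields


def extract(site: str, html: str) -> Dict[str, Optional[str]]:
    """Turn an unstructured page into a structured record using a shared mapping."""
    record: Dict[str, Optional[str]] = {}
    for field_name, pattern in SHARED_MAPPINGS[site].items():
        match = re.search(pattern, html)
        record[field_name] = match.group(1) if match else None
    return record


# One user publishes the mapping for a (made-up) restaurant page layout...
publish_mapping("example-restaurants", {
    "name": r'<h1 class="name">(.*?)</h1>',
    "rating": r'<span class="rating">(.*?)</span>',
})

# ...and any other user can reuse it on pages with the same layout.
page = '<h1 class="name">Da Luigi</h1><span class="rating">4.5</span>'
restaurant = extract("example-restaurants", page)
```

Here `restaurant` is a plain dictionary with `name` and `rating` keys, ready to become a row in a table.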
Managing data is, unsurprisingly, the main theme of my chat with Clay’s engineering team.
Combining all these sources at scale is incredibly challenging. Think of all the services and websites you may want to take data from. Now multiply these by each row, each table, and each user on Clay. The result is an incredible load of external calls that you need to perform concurrently, in real time.
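To give a feel for this fan-out, here is a hedged Python sketch using `asyncio`. The `fake_fetch` coroutine stands in for a real external call, and `enrich_all` and `max_in_flight` are names invented for this example; nothing here describes Clay's actual code. It shows how rows times sources quickly becomes many calls, and how a semaphore can cap how many run at once.

```python
import asyncio
from typing import Dict, List, Tuple


async def fake_fetch(source: str, row_id: int) -> Tuple[int, str, str]:
    """Stand-in for an external API call or scrape (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return row_id, source, f"{source}-value-{row_id}"


async def enrich_all(
    row_ids: List[int], sources: List[str], max_in_flight: int
) -> Dict[int, Dict[str, str]]:
    """Run one call per (row, source) pair, with at most `max_in_flight` at a time."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def bounded(source: str, row_id: int) -> Tuple[int, str, str]:
        async with semaphore:
            return await fake_fetch(source, row_id)

    # len(row_ids) * len(sources) calls in total
    calls = [bounded(source, row_id) for row_id in row_ids for source in sources]
    results = await asyncio.gather(*calls)

    table: Dict[int, Dict[str, str]] = {row_id: {} for row_id in row_ids}
    for row_id, source, value in results:
        table[row_id][source] = value
    return table


table = asyncio.run(enrich_all([1, 2, 3], ["linkedin", "yelp"], max_in_flight=2))
```

Three rows and two sources already mean six calls; the same arithmetic applied to every table and every user is what makes the load so large.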
Nicolae and the team tell me that a good chunk of the overall tech work goes into making sure that all of this runs smoothly:
We built our own tech for rate-limiting, timeouts, and caching. We use Redis as a dedicated database for real-time interactions, and later persist this information to Postgres.
Integrations are implemented as independent packages and run as isolated lambda functions. This allows a separate customer engineering team to add and change integrations at a faster pace, even with less experience, without risking a bug that affects the whole system.
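The team did not share how their rate-limiting, timeouts, and caching work internally, so the following is only a generic sketch of how the three can wrap an external call. It runs in a single process and in memory, while the team mentions Redis for real-time state. `TokenBucket`, `TTLCache`, and `guarded_call` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable, Dict, Tuple


class TokenBucket:
    """Refills `rate` tokens per second, up to `capacity`; each call costs one token."""

    def __init__(self, rate: float, capacity: int, clock: Callable[[], float] = time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TTLCache:
    """Remembers values for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl, self.clock = ttl, clock
        self.items: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self.items.get(key)
        if entry is None or self.clock() - entry[0] > self.ttl:
            return None  # missing or expired
        return entry[1]

    def set(self, key: str, value: Any) -> None:
        self.items[key] = (self.clock(), value)


_executor = ThreadPoolExecutor(max_workers=4)


def guarded_call(key: str, fetch: Callable[[str], Any], bucket: TokenBucket,
                 cache: TTLCache, timeout_s: float) -> Any:
    """Serve from cache if possible; otherwise call `fetch` under a rate limit and a timeout."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    if not bucket.try_acquire():
        raise RuntimeError("rate limit exceeded, retry later")
    future = _executor.submit(fetch, key)
    try:
        value = future.result(timeout=timeout_s)
    except FutureTimeout:
        raise TimeoutError(f"call for {key!r} took longer than {timeout_s}s") from None
    cache.set(key, value)
    return value
```

The order matters: checking the cache first means repeated lookups never spend a rate-limit token, and the timeout keeps one slow source from holding up the caller. Note that the timed-out call keeps running in its worker thread; a production setup would also need a way to cancel or discard it.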
So, this architecture serves two goals:
Making it easy to add new integrations from scratch (today it takes about two hours).
Keeping the infrastructure resilient to traffic spikes and sources’ outages.
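The isolation idea can be sketched in a few lines of Python. This is an in-process approximation with invented names (`integration`, `REGISTRY`, `run_integration`, `Result`), whereas the team describes separate packages running as lambda functions. It shows the property that matters: each integration is registered on its own, and a bug in one becomes an error result instead of taking the whole system down.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Result:
    ok: bool
    data: Optional[dict] = None
    error: Optional[str] = None


# Hypothetical registry: integration name -> handler taking a row and returning new cells.
REGISTRY: Dict[str, Callable[[dict], dict]] = {}


def integration(name: str):
    """Decorator that registers a handler under a name; adding one touches nothing else."""
    def register(handler: Callable[[dict], dict]) -> Callable[[dict], dict]:
        REGISTRY[name] = handler
        return handler
    return register


FAKE_HEADCOUNTS = {"Acme": 120}  # stand-in for a real data provider


@integration("employee_count")
def employee_count(row: dict) -> dict:
    return {"employees": FAKE_HEADCOUNTS[row["company"]]}


@integration("buggy")
def buggy(row: dict) -> dict:
    raise ValueError("upstream changed its response format")


def run_integration(name: str, row: dict) -> Result:
    """Run one integration in isolation: failures are captured, never propagated."""
    handler = REGISTRY.get(name)
    if handler is None:
        return Result(ok=False, error=f"unknown integration: {name}")
    try:
        return Result(ok=True, data=handler(row))
    except Exception as exc:  # deliberate catch-all at the isolation boundary
        return Result(ok=False, error=f"{type(exc).__name__}: {exc}")
```

With this shape, `run_integration("buggy", row)` returns a failed `Result` while `run_integration("employee_count", row)` keeps working, which is the same guarantee process-level isolation provides more robustly.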
Luca’s take: Clay designed their technical strategy around the element that should eventually create their competitive advantage: integrations. Making them resilient and easy to build empowers the business, lets the product move fast, and makes it hard to copy. This is exactly the role of a good tech strategy.
After we discussed high-level architecture, I interviewed the rest of the engineering team about how they approach development.
What does your team look like and what are your skills?
On the tech side, we are a team of four engineers. Each of us has preferences and is either more backend or frontend-oriented, but we are mostly generalists who can complete full-stack tasks by themselves.
Luca’s take: startups in their earliest stages benefit from hiring generalists, instead of hiring for specific layers (e.g. backend).
How does the deployment process work?
We deploy up to a couple of times a day. Bug fixes, small code changes, or big features alike should be available to customers as soon as they are ready.
Every branch comes with a preview link, so non-technical people on the team can preview the feature. We use QA Wolf, which is a testing-as-a-service team, to add automated tests on the UI so we can continue shipping quickly while avoiding regressions.
All code changes go through the staging environment before going to production, except for emergency bug fixes, which can skip the staging environment temporarily.
Each feature is checked by our designer on staging before it goes out. We squash commits so it’s easy to see when the feature has been merged, and to revert it if there’s an issue.
We have a CI/CD pipeline that automatically builds and deploys new commits to the corresponding environment. Builds & deployments take about 15 mins per environment, so it usually takes 30 mins to go from having the code ready to being available to customers.
Each team member takes their feature all the way to production, then posts about it in our #deployment-notes channel with a short message, so the team knows what shipped. We also celebrate it to build momentum.
Luca’s take: shipping fast and often is the north star metric of engineering teams. The benefits trickle down to almost every aspect of your company. Top-performing teams optimize their workflow to be able to release multiple times a day, in 15 mins or less.
What databases do you use?
We use Postgres as the main persistence layer. We also use Redis for various use cases:
Real-time system for the UI (WebSocket + Redis)
Job queues (Bull)
Fast, frequent, but temporary computations (Lua scripts running in Redis)
We use ActiveMQ for queues that require scaling (in the range of hundreds of thousands to millions of queued jobs at peak times).
Do you do code reviews?
Large features and changes affecting critical code paths need thorough reviews, as their risk is higher. We usually ask the devs that are most familiar with the piece of code to do the code review.
Small features and bug fixes may or may not require a code review. It depends on the developer’s familiarity with our codebase and practices. When in doubt, we encourage asking for a review.
Luca’s take: adopting a flexible approach around code reviews is sensible and benefits your velocity. Frameworks like Ship / Show / Ask help you determine when to have blocking reviews, when to have post-release reviews, and when to avoid them altogether.
How is testing done?
We have had unit tests for some time, but in their current form they haven’t provided us with a lot of value. We have started using a QA service (QA Wolf) that performs integration testing. So far, this has been helpful.
For smaller code changes, developers are responsible for testing their own changes. For larger code changes or features, we usually ask for another team member to help with testing.
Luca’s take: the role of unit tests in modern engineering has been under debate for quite some time. Unit tests let you find bugs at the finest level of granularity, at the expense of being more tightly coupled to your implementation. This means that, in a fast-changing environment, you will have to update them often to keep them relevant. Integration tests, by contrast, are more resilient to product and implementation changes.
I stand by Guillermo’s take 👇
Kent’s Testing Trophy is also a great framing of where you should focus, based on the business value you get.
What is the average day-in-the-life of a Clay developer?
We have an office in Brooklyn. It’s an open space with desks, a meeting room, a booth for calls & meetings, and a snack area.
Some people work from home; some people start the day from home and come to the office later during the day, and others choose to come to the office. On average, people come to the office 3 days a week — we keep Mondays and Fridays as flexible days for people to do deep work or take care of other things in their lives.
Writing as a way of thinking is encouraged, and we try to avoid interruptions so you can have deep flow. For example:
Stand-ups are replaced by a daily update on Slack.
One day a week we avoid meetings altogether (a.k.a. “deep-focus” Monday).
Every other week, we define goals and work items during sprint planning. These work items get assigned to the different devs on the team, and each of us is responsible for getting them to completion.
What makes Clay’s culture special?
We move really fast, both on the product and in improving our own processes. We do this by making ideas always welcome, so that individual input can have a significant impact.
For example, a team member recently started a format in Slack to make it easier to report new bugs. That inspired another team member to create a similar format for sharing customer feedback after demos. That in turn made it easier to standardize how we report feedback, and saved us a ton of time.
All of this happened completely bottom-up: the team adopted it because it was easy and helped, rather than because, say, the CEO was enforcing it.
This is particularly relevant on the product side, as we are heavy users of our own product. We are constantly finding new use cases, trying out our own ideas and implementing new workflows.
We are in the unique position of building a product that enables creativity, so we think of ourselves as artists first, and then engineers. This is the kind of culture we strive to build.
🎨 Product Development
Everyone talks with customers.
When I sit down with Nash, Head of Growth, and Kareem, to discuss how they work on the product, this is the first thing they tell me.
We try to maximize our exposure to customer feedback by having multiple touch points, like demo calls, feedback calls, Slack channels, and Intercom chats. This is crucial for us to stay close to the real problems.
Customer success engineers spend 80% of the time talking to customers, and 20% of the time coding. Software Engineers, in turn, spend 80% of the time coding, and 20% talking to customers. This helps clarify that people should not be siloed. It’s not that everyone should do everything — it’s about the proportion of how you spend your time.
Engineers particularly benefit from this approach. Many of them own specific features from top to bottom (and rotate on those over time), so these interactions allow them to get direct feedback on their work, and see first-hand how users perceive it. They never lose track of the final value they create.
This feedback loop goes straight into the product process, shaping three principles:
❤️ Big ideas + follow the love
🗣️ Qualitative over quantitative feedback
⚡ Create momentum
Let’s look at each of them.
1) ❤️ Big ideas + follow the love
The team has a roadmap that is largely based on the long-term vision for the product. Based on that, product development goes through cycles that have two stages:
Take one of the big ideas and release a small, viable version of it.
Get feedback from customers and spend some time iterating on it.
Following the approach of Superhuman, the team doubles down on users who love the product the most. They hold bi-weekly calls with a select group of customers to constantly get their feedback. This feedback is weighed heavily and has a decisive impact on shaping the roadmap.
2) 🗣️ Qualitative over quantitative feedback
There is a substantial difference between getting insights from users and getting insights from metrics and analytics.
The former makes for qualitative, high-level feedback, while the latter is quantitative and specific.
Qualitative feedback helps you stay focused on what the real problems are, regardless of what your product does today. In a fast-growing environment where you may need to steer the ship fast, qualitative feedback makes sure you keep seeing the forest, and not just the trees.
Quantitative feedback has a place too, but most insights it brings are about iterative improvements. These are more useful at a later stage of the product lifecycle.
A-ha moments and big product breakthroughs are rarely found by looking at charts, but rather by talking to customers.
3) ⚡ Create momentum
Startups address incredibly hard problems, with low chances of success.
When things are going well, you can feel the momentum and the positive energy that carries you through and makes you go the extra mile.
Clay’s team builds such momentum by creating moments of delight. When a new feature request makes sense and can be done in little time (a few hours), the team just goes for it and tries to get it done the very same day. Later, they celebrate the win by sharing the customer’s response in their own Slack channels, and in future planning meetings, too.
This creates a magical experience for the customer, and fulfilling, positive vibes for the team.
Clay’s challenge is tough. They are trying to unsettle a multi-decade-old stack which stands at the core of most companies’ operations.
Building a product that delivers this vision gradually, via sensible iterations, while also making it a successful business, is an ambitious and exciting journey.
So far, customers have been rewarding them. The Product Hunt launch, which happened while I was writing this, was a success, and Clay took the top spot of the week.
As a team, I love how they are obsessed with their vision, and at the same time how they manage to stay flexible and focus on what people want. Their approach is solid and malleable.
As a company dealing with clay, this should serve them well.