🎯 Sleuth – alignment at all levels
When priorities change, here's a simple process every engineering leader should follow:
✔️ Re-align – on goals, and re-allocate resources as needed.
📣 Communicate – the change to the team, or the entire org.
📊 Monitor – check that everyone is rowing in the new direction.
This week I am happy to promote Sleuth, which makes this whole process easy.
Sleuth is a long-time partner of Refactoring, and it helps leaders align at all levels of management. Find out how 👇
Back to this week's ideas!
1) ⏰ Should on-call be mandatory?
This is a question I get surprisingly often via email.
In my opinion, being on-call can be considered part of any engineer's duties, so there is nothing wrong with making it mandatory.
Especially if you are introducing on-call from scratch, making it mandatory for all engineers is healthy: it builds ownership and makes sure no critical areas of the code are left uncovered by docs and playbooks.
Eventually, however, consider making it voluntary. People may be more or less okay with being on-call depending on things like their family responsibilities. If the team is big enough and the process has been battle-tested for a while, it's probably best to let people choose whether they want to do on-call.
I wrote a full piece on creating good on-call processes 👇
2) 🕷️ How do you extract web data?
There are various ways to extract web data, sitting on a spectrum that goes from buy to build. Most people are familiar with direct scraping, but for many use cases there are simpler and faster options.
Here are the four main ones I know of:
Ready-made datasets 🗃️
Several companies (e.g. Bright Data, Oxylabs, or Webz) offer pre-collected datasets on various topics. These datasets are pricey but also incredibly rich, and they get refreshed on a regular basis. For example, for ~$50K, you can buy:
💼 500M LinkedIn profile records – to power a massive outreach campaign.
🛒 270M Amazon product records – to perform market analysis for your e-commerce, inform pricing, and predict trends.
🏠 130M Zillow listings – to spot real estate trends and investment opportunities.
Third-party APIs that wrap websites 🕷️
Just like you can buy datasets, you can buy access to APIs that wrap popular websites like Amazon or Google. But why use a wrapper instead of the service's own API?
The answer depends on the service, so let's take search engine (SERP) APIs as an example:
📈 Higher limits – higher thresholds for data access, and more data retrieved in a single query.
🧰 Tailored fit – they accommodate specific use cases, so they are 1) simpler to use, and 2) more feature-rich. E.g. Tavily is a stripped-down API that is optimized for LLMs.
💪 Resilience – they have mitigations against IP bans and CAPTCHA challenges.
🌐 Multiple suppliers – they provide results from multiple search engines at the same time.
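To make the buy option concrete, here is a minimal sketch of what calling such a wrapper can look like. Everything below is hypothetical: the endpoint, parameter names, and response shape are placeholders, not any specific vendor's API.

```python
import requests

# Hypothetical SERP-wrapper call: endpoint, parameters, and response
# shape are illustrative placeholders, not a real vendor's API.
response = requests.post(
    "https://api.example-serp.com/search",  # placeholder endpoint
    json={
        "api_key": "your-api-key",
        "query": "best standing desks 2024",
        "engines": ["google", "bing"],  # one call, multiple suppliers
        "max_results": 50,              # higher limits than raw scraping
    },
    timeout=30,
)
response.raise_for_status()

for result in response.json().get("results", []):
    print(result.get("title"), result.get("url"))
```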
Hosted web scraping APIs / tools 🖥️
If you need more flexibility but don't want to reinvent the wheel, there are plenty of hosted web scraping tools and APIs.
These services provide the endpoints and the infrastructure to scrape data from any website. Popular examples include ScrapingBee, Scrapfly, and Bright Data.
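For a rough idea of the developer experience, here is a sketch modeled on the common pattern of a single endpoint that fetches (and optionally renders) a target URL for you. Names are placeholders; each vendor has its own parameters, so check the docs of whichever service you pick.

```python
import requests

# Hypothetical hosted-scraping call: one GET endpoint that proxies the
# target URL. Endpoint and parameter names are placeholders.
response = requests.get(
    "https://api.scraping-service.example/v1/",
    params={
        "api_key": "your-api-key",
        "url": "https://example.com/products",
        "render_js": "true",  # ask the service to execute JavaScript
    },
    timeout=60,
)
response.raise_for_status()
html = response.text  # rendered HTML, ready to parse
```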
Implement from scratch 🔧
When developers look at hosted scraping services, they are always surprised by how much they cost. The problem is that scraping is one of those tasks that seem simple until they aren't. In my experience, there are three main challenges:
🔄 Dynamic content – modern, JavaScript-heavy websites are tricky to scrape.
🧩 CAPTCHAs – many websites don't like being scraped.
🚫 IP blocking – if a website catches on to your scraping, it might ban your IP.
There are many ways to address these problems one by one, but the fixes are not always trivial.
So, unless 1) web scraping is 100% core to your business, 2) there is relevant IP you want to build around it, and/or 3) volume is so high that cost becomes an issue, I actually discourage people from implementing everything from scratch.
Still, if you want to go that route, here are my recommendations:
For complex, JS-heavy sites where user interaction is required, I'd recommend using Playwright.
If you're dealing with simpler, static pages where you don't need to execute JS, libraries like Scrapy or BeautifulSoup work just as well.
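To show the difference between the two routes, here is a minimal sketch of each against example.com. It assumes requests, beautifulsoup4, and playwright are installed (Playwright also needs its browsers set up via `playwright install`).

```python
import requests
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

# Route 1: static page, no JS needed. Fetch the HTML and parse it directly.
resp = requests.get("https://example.com", timeout=30)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")
print(soup.h1.get_text(strip=True))

# Route 2: JS-heavy page. Drive a real browser so scripts actually run.
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com")
    page.wait_for_selector("h1")  # wait for content rendered client-side
    print(page.inner_text("h1"))
    browser.close()
```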
I wrote a full primer on scraping use cases and technologies a few months ago 👇
3) 🚀 Feature Management Workflows
One of the biggest trends of 2024, which we often discussed, is the rise of product engineers (PEs).
This led, among other things, to more engineered workflows for managing features. Modern feature management usually has three components: 1) controlled rollouts, 2) experiments, and 3) monitoring. Let's look at each of them:
Controlled rollouts 🚦
PEs use feature flags to release new features only to a subset of users, to get early feedback and measure impact.
This helps in monitoring leading indicators (e.g., initial user engagement) before affecting lagging ones (e.g., overall user retention). E.g. releasing a new UI to 10% of users and comparing their engagement rates to the control group.
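Under the hood, percentage rollouts usually come down to deterministic bucketing. Here is a minimal sketch of the idea, with hypothetical feature and user names; in practice you would use a feature flag service rather than hand-rolled code.

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically bucket a user into a percentage rollout.

    Hashing (feature, user_id) gives each user a stable bucket in 0-99,
    so the same user keeps seeing the same variant as the rollout grows.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

# Hypothetical: release the new UI to 10% of users, as in the example above.
variant = "new UI" if in_rollout("user-42", "new-ui-rollout", 10) else "old UI"
print(f"user-42 gets the {variant}")
```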
Feature flags are useful for many other things, which we discussed in a previous article 👇
https://refactoring.fm/p/feature-flags
Conducting experiments 🔬
Having reliable leading indicators + a good feature flag infra allows you to run A/B tests.
You can compare different versions of a feature, roll them out to different groups of users, and make data-driven decisions. E.g. you can test two different onboarding flows and measure their impact on new user activation rates (leading) before seeing effects on user lifetime value (lagging).
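As a sketch of how you might read such an experiment, here is a plain two-proportion z-test on made-up activation numbers. Real experimentation platforms handle this (and much more) for you.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, p_value

# Made-up numbers: 400/5000 activations on flow A vs 460/5000 on flow B.
p_a, p_b, p_value = two_proportion_z_test(400, 5000, 460, 5000)
print(f"A: {p_a:.1%}  B: {p_b:.1%}  p-value: {p_value:.3f}")
```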
A/B tests are not the only way to run experiments. In fact, to be statistically significant, they need a good volume of data, which is not always available. Think of a B2B SaaS setting where you have few customers, or the checkout page of an e-commerce site, which only a small share of users (say 5%) ever reach.
In these cases, you want to measure feature adoption and retention, as well as gather qualitative feedback from relevant users, like those who tried the feature but churned.
Measuring outcomes 📊
Post-launch, product engineers closely track both leading and lagging indicators to assess feature performance.
They set up dashboards and alerts for key metrics, so they can iterate based on real-time data.
Example: Tracking daily active users of a new feature (leading) while monitoring its impact on overall product usage and revenue (lagging).
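A toy sketch of such an alert, with hypothetical numbers and a print standing in for a real alerting tool:

```python
def check_feature_health(daily_active: int, baseline: int, max_drop: float = 0.2) -> float:
    """Flag a leading indicator that drops more than max_drop vs its baseline.

    Hypothetical helper: a real setup would query a metrics store and page
    through an alerting tool instead of printing.
    """
    drop = (baseline - daily_active) / baseline
    if drop > max_drop:
        print(f"ALERT: feature DAU down {drop:.0%} vs baseline")
    return drop

# e.g. a baseline of 1,200 daily actives and only 900 today: a 25% drop fires.
check_feature_health(daily_active=900, baseline=1200)
```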
You can find our full piece on modern product engineering below!
And that's it for today! If you are finding this newsletter valuable, consider doing any of these:
1) 📚 Subscribe to the full version – if you aren't already, consider becoming a paid subscriber. 1700+ engineers and managers have joined already! Learn more about the benefits of the paid plan here.
2) 📣 Advertise with us – we are always looking for great products that we can recommend to our readers. If you are interested in reaching an audience of tech executives, decision-makers, and engineers, you may want to advertise with us 👇
If you have any comments or feedback, just respond to this email!
I wish you a great week! ✌️
Luca