How to Test Software 🔍
A practical and opinionated take on one of engineering's most divisive topics.
Testing is an ever-controversial topic in software.
You would be hard-pressed to find an engineer who doesn't believe that testing is crucial to their workflow. Despite this, many teams struggle with testing and are unhappy with how they do it.
On top of that, the consensus on what good testing looks like seems to shift all the time. Testing is so ingrained in every part of the dev workflow that any change to that workflow ripples into how you test.
For example, introducing feature flags, or switching to trunk-based dev, or adding or removing a Staging environment, or using AI to write code may all bring changes to your testing strategy.
In this article, I will share my take on what good testing looks like, and suggest a practical workflow that works today, in 2023.
Here is our agenda for today:
⚖️ Contracts vs Implementations — an important mental model to reflect on testing.
🔨 Regressions — figuring out the true ROI of tests.
🗂️ Types of tests — a basic classification with the main types you should know.
⛰️ Testing Pyramid — a classic testing model made popular by Martin Fowler.
🏆 Testing Trophy — a modern testing model which I personally prefer.
🔄 My Testing Process — we put all of this together to design a practical testing workflow.
✨ Other types of tests — we go beyond the basics with chaos engineering, load, security, and data testing.
💬 Community examples — how Product Hunt and Swarmia do testing.
📚 Resources — as always, further articles and resources to learn more
Let’s dive in!
⚖️ Contracts vs Implementations
Our relationship with tests is similar to the one we have with docs, security, and all the other investments where we sacrifice something today for a benefit in the future.
Therefore, we should write tests only when their future value is higher than the effort it takes to write them. However, figuring this out is easier said than done.
To help with this, it is useful to think of any piece of software as the combination of a contract and its implementation. This holds true at any level of granularity, be it a small function or a large component.
📃 Contract — specifies the behaviour in terms of what outputs are produced from what inputs.
🔨 Implementation — the internals of how such transformations are made.
All types of tests verify that some contract is respected, while treating the implementation as a black box.
The various types of tests, such as unit, integration, or end-to-end, differ mostly in the scope and size of such contracts and black boxes.
For example, the way a UI works when you click things around is a contract between the software and the user. Similarly, the signature and semantics of a function represent a contract between the function and the other code that invokes it.
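To make the function case concrete, here is a minimal sketch in Python. The `slugify` function and its behaviour are hypothetical, invented only for illustration. The docstring states the contract, the body is the implementation, and the test checks only inputs and outputs.

```python
import re


def slugify(title: str) -> str:
    """Contract: lowercase the title and join its alphanumeric words with hyphens."""
    # Implementation detail (hypothetical): a regex scan. Callers should not rely on it.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def test_slugify_contract() -> None:
    # The test treats slugify as a black box: it never looks at the regex.
    assert slugify("How to Test Software") == "how-to-test-software"
    assert slugify("  Contracts vs. Implementations  ") == "contracts-vs-implementations"
    assert slugify("") == ""


test_slugify_contract()
```

If the regex were later swapped for a manual loop, this test would not need to change, because the contract would be the same.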
🔨 Regressions
Tests prevent regressions, that is, changes that break behaviour that used to work, by checking that these contracts still hold whenever you change the underlying implementation.
Tests are useful for other things too, such as documenting what these contracts are or helping create better designs (e.g. with TDD). These benefits are sometimes controversial, but even if we accept all of them, they are still secondary.
Writing and maintaining tests is expensive. If they didn’t catch regressions, we wouldn’t write them, period.
So, based on how and when the contract or the implementation changes, we have three scenarios:
Contract doesn’t change + implementation doesn’t change → Test is not useful
Contract doesn’t change + implementation changes → Test can catch a regression
Contract changes + implementation changes → Need to change the test
The only scenario in which a test pays for itself is the second one. In the first one, the test is irrelevant, while in the last one it is even a liability, because you have to update it to reflect the new contract.
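As a rough illustration of the second scenario, consider this sketch. Everything in it is hypothetical: `average_v1` and `average_v2` stand for the same function before and after a refactor, and `respects_contract` plays the role of the test. The contract stays the same, the implementation changes, and the test catches the regression.

```python
from collections.abc import Callable


def average_v1(values: list[float]) -> float:
    """Contract: return the mean of `values`, or 0.0 for an empty list."""
    if not values:
        return 0.0
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


def average_v2(values: list[float]) -> float:
    """Same contract, refactored implementation. The empty-list guard was lost."""
    return sum(values) / len(values)


def respects_contract(average: Callable[[list[float]], float]) -> bool:
    # The check depends only on the contract, so it works for any implementation.
    try:
        return average([2.0, 4.0]) == 3.0 and average([]) == 0.0
    except ZeroDivisionError:
        return False


print(respects_contract(average_v1))  # True
print(respects_contract(average_v2))  # False: the refactor introduced a regression
```

Had the contract itself changed, for example to raise an error on empty input, the same check would have had to be rewritten, which is the third scenario.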
So, you want to invest in tests whenever there is:
High chance of implementation change, and
Low chance of contract change
Based on that, just how effective are the various types of tests? Let’s list them first.