
How to Combat Bias in Software Testing

Stephen Margheim


Automated tests are awesome. They are fast, scalable, and repeatable. But they aren't without their complexities and pitfalls. Likewise, manual QA testing is awesome. It is user-centric and fault-tolerant. But manual QA, like everything else, also comes with its own complexities and pitfalls. One of the particular areas I have been thinking about lately is how bias can be a hidden and pernicious pitfall for both automated and manual tests.

Automated tests are biased by their author, a developer. In most teams, the developer who builds a feature is the developer who writes the automated tests for that feature. It is all too easy for this developer, regardless of their level of experience, to write tests that are too tightly coupled to the specific code they wrote. Ideally, automated tests should test the behaviour of some particular part of a software system from the point of view of someone without knowledge of the inner workings of that system.

In development, we often call this an "interface." Our applications have "user interfaces," and our software has "logical interfaces." An API, for example, provides an interface over HTTP for external services: visit this URL and receive back this kind of data. Classes in a codebase, likewise, expose an interface for asking objects about their state or telling them to perform certain actions.
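To make that concrete, here is a minimal sketch in Python. The Image and ImageLibrary names are invented for illustration, not taken from any real codebase; the point is that callers interact only with search(), while everything prefixed with an underscore is an implementation detail they should neither know nor care about.

```python
from dataclasses import dataclass


@dataclass
class Image:
    category: str
    tags: list[str]


class ImageLibrary:
    """Callers see only search(); everything else is internal."""

    def __init__(self, images: list[Image]):
        self._images = list(images)  # how images are stored is an internal detail

    def search(self, category: str, query: str) -> list[Image]:
        """The interface: a category and a query in, matching images out."""
        return [
            img for img in self._images
            if img.category == category and self._matches(img, query)
        ]

    def _matches(self, img: Image, query: str) -> bool:
        # Internal helper; how matching works is not part of the interface.
        return query.lower() in (tag.lower() for tag in img.tags)
```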

It is absolutely essential for a healthy codebase that the relevant parts have clearly defined interfaces.

One of the most productive ways of ensuring that your software has well-built interfaces is writing automated tests. But when a developer builds some new feature, they are focused not on the external interface but on the internal implementation, and so they are liable to write tests for that internal implementation.

For example, imagine an API for retrieving stock images for blog posts. You specify a category and some search terms, and the API returns the top five images in that category that match your search. A good test would prepare some initial data, maybe a couple of photos across a couple of categories, and then make the API call for a particular category with a particular search query where only one image is expected to be returned. If that specific image is returned, the test passes; if a different image is returned, if extra images are returned, or if no images are returned, the test fails. This is a solid automated test: it exercises the important aspects of the interface for this API.
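As a rough sketch, here is what that behaviour-focused test could look like in Python with pytest, reusing the hypothetical ImageLibrary from the earlier example (a real version would call the HTTP endpoint rather than the class directly):

```python
# Assumes the Image / ImageLibrary sketch from above; run with pytest.
def test_search_returns_only_the_one_matching_image():
    # Prepare initial data: a couple of photos across a couple of categories.
    sunset = Image(category="nature", tags=["sunset", "beach"])
    office = Image(category="business", tags=["office", "meeting"])
    skyline = Image(category="nature", tags=["city", "skyline"])
    library = ImageLibrary([sunset, office, skyline])

    # Exercise the interface exactly the way a caller would.
    results = library.search(category="nature", query="sunset")

    # Exactly one specific image should come back; anything else is a failure.
    assert results == [sunset]
```

Notice that the test knows nothing about tagging, relevance scoring, or storage; it only knows the promise the interface makes.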

When building this API, though, our developer is deep in the weeds of how images are tagged, how we search against those tags with the incoming query, and how we then order the matching images by relevance to find the top five. These are the things at the top of his mind. When he finally cracks one of these problems, in his excitement he rushes to put together a test to make sure he doesn't later break the solution he just found. After doing this for each of the central technical challenges of the feature, we end up with multiple automated tests that ensure each individual part of the implementation behaves in a certain way, but no test that the end result behaves the way it should.
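For contrast, here is roughly what those implementation-coupled tests might look like against the same hypothetical ImageLibrary, with each test reaching past the interface into private details:

```python
# Each test pins an internal detail; none exercises the search interface end to end.
def test_matching_lowercases_tags_before_comparing():
    img = Image(category="nature", tags=["Sunset"])
    library = ImageLibrary([img])
    assert library._matches(img, "SUNSET")  # reaches into a private helper


def test_images_are_stored_in_an_internal_list():
    library = ImageLibrary([])
    assert library._images == []  # pins a storage choice that should be free to change
```

Both tests could keep passing while search() returns the wrong images, and both could start failing during a perfectly harmless refactor.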

This is the most common bias in automated tests: testing the wrong things because of who writes the tests and when. Unfortunately, there is no silver bullet for this problem. Most teams will never be able to hire developers focused solely on writing tests. Even if you could or did, you would simply introduce friction and latency into your development process, as "implementing devs" would have to wait for "testing devs" or vice versa, and the communication and hand-off costs would grow. There is no magically perfect way to build software that avoids every pitfall. The best we can do is stay aware and vigilant so that people fall into these pitfalls as rarely as possible.

Manual testing has its own primary bias that we should be wary of. In a previous post, I discussed why users struggle to provide good bug reports. In short, a good bug report requires context, and users aren't paying enough attention to the evolving context of the system as they work to be able to articulate it when a bug appears.

However, there is a second reason that users are particularly bad at preparing bug reports, and that is because of a natural human bias. You see, when someone encounters a bug in your software, they are, by definition, surprised. They are also frustrated. And when we are surprised and frustrated, one of the first things our minds will do is attempt to re-establish some sense of control, most commonly by analyzing the surprising and frustrating situation. "How did I end up here? Was this my fault? Ah yes, I clicked the blue 'Submit' button, but I hadn't yet filled out that last field." Our mind will naturally try to find a story that explains how we ended up in this situation. The problem is that most users, most of the time, don't have the right kind of knowledge of the system to analyze it properly, so they are very likely to come to a faulty conclusion. That is to say: this bias makes it likely that what context users do provide in a bug report hurts more than it helps. In our imagined example, the fact that the field was empty and the button was blue is not important to the bug. Again, though, there is little we can do to fight against this bias.

For companies, the first thing you can do is set your expectations accordingly. You will essentially never get useful bug reports from customers. So, be sure to invest in good error tracking and logging services, and hire professional QA testers to help you get quality bug reports from end users. For anyone reading this who is an employee of a software company and has filed or might file a bug report, you can fight back a bit by remembering this bias and striving to provide raw information, as much as possible, with as little commentary as possible. This gives the developer the opportunity to analyze the situation and form a hypothesis of what happened, instead of being forced to accept yours.

In the end, biases are pernicious things to root out of our systems and processes. Thinking more about them, seeing them more clearly, and talking about them more often can lessen their negative impact.