How much testing is enough testing? That is the million-dollar question.
Prior to Release
A mature software development organization may focus on testing prior to release: thousands of unit tests, a significant number of integration tests, and perhaps a few automated functional tests. From there, the organization might move on to penetration, load, and performance testing. An extensive assortment of tests is one intuitive, if expensive, way of doing your due diligence.
On the other hand, a smaller growth-stage startup might run guided exploratory tests or simple test cases continuously throughout development, either taking time from its small team of developers or paying an external party to do so. This burden weighs heavily on small teams, however: it can introduce developer bias and slow the flow of work into production.
Following release, the mature company might put new features behind feature flags and roll them out to a small slice of users -- up to millions for bigger products -- often comparing variants via A/B testing, a practice most of the larger software companies you're familiar with today rely on. After each round of feature toggling, engineers review the metrics from those tests along with exceptions (i.e., errors raised during program execution). The development team then uses this feedback to decide how and when to ship new features, and how to test them.
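The core mechanic behind this kind of gradual rollout is deterministic bucketing: each user is hashed into a stable bucket so they always see the same variant. Here is a minimal sketch of the idea in Python -- the function name, feature names, and percentages are illustrative, not any particular feature-flagging library's API:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically decide whether a user is in a feature's rollout.

    Hashing the user id together with the feature name gives each feature
    an independent, stable bucket per user, so the same user always sees
    the same variant across sessions.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash into [0, 1]
    return bucket < percent / 100.0

# Roll a hypothetical new checkout flow out to ~5% of 10,000 users:
enabled = sum(in_rollout(f"user{i}", "new-checkout", 5)
              for i in range(10_000))
print(f"{enabled} of 10,000 users see the new flow")  # roughly 500
```

Because the bucketing is deterministic, ramping the percentage from 5 to 20 only *adds* users to the treatment group; nobody flips back and forth between variants, which keeps the resulting metrics interpretable.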
Conversely, the startup may decide to lean on manual test cases, which means testing the input domain and expected outputs. As test cases accumulate, the team may begin to feel fairly confident. However, it's unlikely they're covering the entire input domain (i.e., they may be missing test cases they didn't think to write). It can be difficult to track how much of the domain you've tested unless you assign a testing score against which you can -- albeit somewhat arbitrarily -- measure progress. This is precisely the point in the process where guided exploratory testing makes a strong complement to test cases, helping to uncover issues that were hidden or simply never considered. It's important that this testing is done by real people on real devices; otherwise it's difficult to know how the software will respond to real-world interactions.
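One way to make that admittedly arbitrary "testing score" concrete is to partition the input domain into equivalence classes and track which classes your test cases have touched. The sketch below assumes a hypothetical signup form; the fields and class names are invented for illustration:

```python
# Equivalence classes for each input field (hypothetical signup form).
partitions = {
    "email": {"valid", "missing @", "empty", "very long"},
    "age":   {"typical", "zero", "negative", "over 120"},
}

# Classes that existing test cases actually exercise.
covered = {
    "email": {"valid", "empty"},
    "age":   {"typical"},
}

def coverage_score(partitions, covered):
    """Fraction of equivalence classes touched by at least one test case."""
    total = sum(len(classes) for classes in partitions.values())
    hit = sum(len(covered.get(field, set()) & classes)
              for field, classes in partitions.items())
    return hit / total

print(f"{coverage_score(partitions, covered):.0%}")  # 3 of 8 classes covered
```

The score only measures the classes you thought to list -- which is exactly the blind spot exploratory testing exists to probe -- but it gives a team a repeatable number to move instead of a vague feeling of confidence.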
What do you think? In both of the above cases -- mature software company or growing startup -- how would you go about measuring testing efficacy or completion?
We have some thoughts to get you started.
Here are three distinct arguments attempting to answer the above question.
The ROI Argument
Ultimately, whether you hire an in-house team or work with an external one, testing costs time and money. Test for long enough and you'll eventually hit diminishing returns. Everyone wants to optimize quality, but at what cost? You must weigh the cost of further testing against the assurance that additional testing provides. Accordingly, as long as the expected return exceeds the cost of further testing, it seems logical to continue. That's always true of everything, right? If you know that something has a positive return, you should always do it!
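The diminishing-returns dynamic can be made concrete with a toy model: suppose each additional round of testing finds a fixed fraction of the bugs that remain, so the marginal value of a round shrinks while its cost stays flat. All the numbers below are assumptions chosen purely for illustration:

```python
# Toy model of the ROI argument: stop testing when the next round's
# expected value no longer exceeds its cost. Every constant is assumed.
COST_PER_ROUND = 2_000   # dollars per testing round (assumed)
VALUE_PER_BUG = 500      # cost avoided per bug caught (assumed)
FIND_RATE = 0.30         # fraction of remaining bugs found per round (assumed)

remaining = 100.0        # estimated bugs at the start (assumed)
round_no = 0
while True:
    found = remaining * FIND_RATE
    marginal_value = found * VALUE_PER_BUG
    if marginal_value <= COST_PER_ROUND:
        break            # the next round would cost more than it returns
    remaining -= found
    round_no += 1

print(f"Stop after round {round_no}: ~{remaining:.0f} bugs remain")
```

Under these numbers the loop stops after round 6 with roughly a dozen bugs still lurking -- which is the ROI argument's uncomfortable punchline: a purely economic stopping rule knowingly ships with bugs, and the model says nothing about whether one of those residual bugs is catastrophic.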
The Relativity Argument
The comprehensiveness of testing is relative to the situation at hand. Every team, product, and type of testing is different. There is no universal "enough" metric. You must view the software you are testing through a lens unique to that software's needs and expected performance (as well as the way it is being tested). In other words, there is no universal way to know you've done "enough" testing -- only an acceptable amount for the situation at hand (i.e., subjectively, do you feel comfortable with the quality of your product?). The question of how to measure this is too situational to ascribe a universal indicator.
The "Resistance Is Futile" Argument
It’s impossible to answer this question, as there is no such thing as exhaustive or fully comprehensive testing. Something will always be missed, no matter how large or well trained the team, and no matter which type of testing is being performed. Your goal is to find the 20% of bugs responsible for 80% of the problems in your software; chasing anything more is a futile wild-goose chase, and settling for anything less is incomplete.
Whichever of the above arguments you favor -- or maybe you land somewhere in between -- you're likely to agree that it's a complex question, and one you can't answer without specifying what kind of testing you're doing (at test IO, we resonate most with the Relativity Argument). Accordingly, it's far too simple to ask, "Have we tested enough?" You must also ask, "Have we tested in enough ways?"
But, how do you answer that? There are numerous types of testing, all catering to organizations at different stages and with varying needs (as highlighted in the above examples). As mentioned in a recent article we posted, the type of testing you decide upon depends on what you're looking to find. All of these test types have their own success metrics, some of which may be relative to what is being tested or to tester expectations.
If you're looking to run a functional test, you may be interested in smoke/sanity, compatibility, regression, or even user acceptance testing. Or, perhaps you're more interested in exploratory testing, where testers are given more range to test a variety of user flows to identify bugs that would otherwise slip past scripted tests. At test IO, we're evangelists of guided exploratory testing, whereby our testers are given broad-to-specific testing guidelines within the exploratory model. This allows them to find issues that may otherwise be missed by in-house QA while still focusing on a targeted area of software. We can do this through a variety of test types: rapid, focused, coverage, usability, or custom tests (where you can specify a combination of types and guidelines).
Ultimately, there are many ways to test software, all of which can prove useful in the right situation. The first step is to match the proper form of testing to your needs; then, and only then, will you be able to understand what "enough" testing is for you.