Test Strategy

In a typical project, there are lots of automated tests. We already showed some examples, like ShoppingCartTest, in earlier posts. That test is rather isolated: it gives us some confidence, but only about a small part of the system. We also wrote about different kinds of tests at different levels of granularity and isolation/integration. In most projects, isolated tests alone aren’t sufficient. We need at least some higher-level tests to be confident about the quality and correctness of the overall system. This leads to some questions:

Which kinds of tests do we need?
What is the responsibility of each kind of test, and what is out of its scope?
How do we combine those different kinds of tests to gain the most confidence?

Those questions - and probably some more - should be answered consciously for each project. The decisions - like almost every project decision - should be driven by the individual requirements of the project. They are a crucial part of the project’s test strategy. But what exactly is a test strategy? It is difficult to find a single definition of the term “test strategy” (see for example the definitions in Wikipedia or by the ISTQB). Yet, at their core, the definitions seem to share the same goal: “How do we test our system to ensure that the requirements are met?”. A test strategy can also include a description of roles and responsibilities, a test plan, or the test environment. However, we will not go into these aspects here.

Test Types

We do not write automated tests just for the sake of having them. We write them mainly to gain confidence in the correct behaviour of our code. “Correct”, though, can mean slightly different things, depending on the aspect we look at. Also, each aspect might require a different type of test.

In our Tests Granularity post, we already talked about some different kinds of tests that can be identified on the isolation-integration axis of the test pyramid. In most of the projects we have seen so far, this was the main aspect that distinguished different kinds of tests. But, depending on the context of our project, there might be other aspects.

When it comes to the correctness of the system, we’re not limited to example-based tests. Those are the tests we see most commonly, which verify that for a given input our system returns an expected output or transitions to a given state. We could also decide to test parts or the whole system using property-based tests.
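To make the difference concrete, here is a minimal sketch in plain Java. It is hand-rolled on purpose: a real project would more likely use a dedicated property-based testing library (jqwik is one option for the JVM). The class `CartMath`, its method `lineTotalCents`, and the chosen value ranges are assumptions made up for this illustration, not code from the series.

```java
import java.util.Random;

// Hypothetical function under test: total of one cart line, in cents.
final class CartMath {
    static long lineTotalCents(long unitPriceCents, int quantity) {
        return unitPriceCents * quantity;
    }
}

final class SubtotalChecks {
    // Example-based: one fixed input, one expected output.
    static boolean twoUnitsAtFiveHundredCentsCostOneThousand() {
        return CartMath.lineTotalCents(500, 2) == 1_000;
    }

    // Property-based: for many generated inputs, adding one more unit
    // must raise the line total by exactly the unit price.
    static boolean addingOneUnitAddsUnitPrice(long seed, int runs) {
        Random random = new Random(seed); // fixed seed keeps a failure reproducible
        for (int i = 0; i < runs; i++) {
            long price = random.nextInt(100_000); // 0 .. 99_999 cents (assumed range)
            int quantity = random.nextInt(1_000); // 0 .. 999 units (assumed range)
            long before = CartMath.lineTotalCents(price, quantity);
            long after = CartMath.lineTotalCents(price, quantity + 1);
            if (after - before != price) {
                return false;
            }
        }
        return true;
    }
}
```

The example-based check pins down one known case, while the property check states a rule that has to hold for every generated input. What a library adds on top of this sketch is mainly input generation and shrinking of failing cases.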

If performance is a critical aspect of our project, we might want to have some performance tests in addition. We might also need some automated stress tests to make sure our system can handle the expected number of requests. The same goes for security: maybe we need some automated (or at least regularly executed) penetration tests of our system. In highly regulated project contexts there may also be some other kinds of tests that we have to include in our test strategy.

Generally speaking, the necessary types of tests and their number depend on the quality requirements of our system.

One other thing that has a huge impact on our test requirements is the release and deployment strategy of our project. If our system is deployed automatically after each commit (continuous deployment), we completely rely on our automated tests to detect any (critical) bugs. If, on the other hand, our system is released and deployed manually (at least to production), it might be OK to keep our automated tests a little more relaxed, at least if we can ensure that the resulting confidence gap is filled by a comprehensive set of manual tests executed before the deployment.

That brings us to another point in selecting the necessary kinds of tests for our overall test strategy: not all of those tests have to be automated. It’s perfectly fine to decide that testing some aspects of our system in an automated way is too difficult or too expensive, and that they will be tested manually, at least to start with. We could even decide that other aspects are just nice-to-have, not important for the overall functionality of our system, and so will not be tested at all. What matters is that we make all those decisions consciously, based on the project context (and maybe some assumptions). Ideally, they’re all documented in the test strategy.

Sticking to our online shop example, we might decide that we need the following kinds of tests to be confident about the correctness and quality of our system:

end-2-end tests, testing the shop as a whole (via the UI)
more detailed tests of the main components of the shop (e.g. shopping cart, catalogue, search, …)
isolated tests of single classes/methods on demand (if they help in addition to the previous ones)
performance tests of the main click paths (especially catalogue, shopping cart, checkout)
design, look-and-feel, accessibility will be tested manually
penetration tests will be done once before the go-live of the shop, then regularly every year

Responsibilities and Composition

As we wrote before (Why You Should Write Automated Tests), writing and maintaining tests does not come for free. Writing them takes time. Executing them takes time. Maintaining them takes even more time, given that they usually stay in the project and have to be kept in sync with it (directly or indirectly) for a long time.

It bears repeating: a test strategy is about getting as much confidence as possible/necessary about the behaviour of the system with reasonable effort. After implementing the strategy, we should feel safe releasing the system once all our tests have succeeded.

So, the goal is not to write as many tests as possible. Instead, we want to write just as many tests as necessary. This means that, in an optimal set of tests, usually called a test suite, every single test should have its own, distinct focus. The tests should be MECE: mutually exclusive, collectively exhaustive. On one hand, they should cover all important aspects of the system, on the other, they should not overlap.

As you might have experienced yourself, this kind of optimal test suite is very hard to achieve. Keeping the tests mutually exclusive becomes even harder, if not impossible, if we combine tests of different granularity. Still, we should try to get as close as possible to this goal to gain the most benefit from our tests with the least effort.

To be able to do this, we need to be clear about the responsibility of each kind of test. Of course, this goes hand in hand with the decision on which kinds of tests we need. If we are missing another kind of test to be confident, this automatically means that there’s at least one aspect that is not yet covered by the tests we already have. So the responsibility of the additional kind of test is to cover exactly this aspect. It’s also possible to explicitly exclude some aspects from the responsibility of one kind of test and move them to another one.

Once we are clear about the responsibilities it’s important to keep these in mind when writing the different tests. A test should focus on its own responsibility and, if possible, ignore everything else. If we do not stick to this rule, we will end up testing one aspect multiple times in different tests, and usually in different ways.

Let’s go back to our example and have a look at the responsibilities of the different kinds of tests to make this a little clearer.

We decided to have some tests of the main components of the shop. They should ensure that each of these components works as expected. A component will probably contain several smaller units as well as some infrastructure (database, search index, HTTP, UI, …). So those tests will need some level of integration, which will increase their complexity, development effort, execution time, and so on.

There will probably be some details in the implementation of each component that can be tested much more easily with more isolated tests. So we decided to have those tests as well for those details. This of course means that those details should then be ignored by the higher-level component tests. For example, the component test of the shopping cart might test that the subtotal amount is calculated when a product is added to the cart. Some detailed isolated tests of the calculation function might ensure that the result is correct, even if the product was added with quantity 2, 0, -1 or NaN.
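A sketch of what such isolated checks might look like follows. Everything here is an assumption for illustration: the class `LineTotal`, the choice of `double` for the quantity (only so that NaN can occur at all), and the rules that a quantity of 0 yields 0 while negative, non-finite, or fractional quantities are rejected. The real shop may well define "correct" differently for these inputs.

```java
// Hypothetical calculation unit; names and validation rules are assumptions.
final class LineTotal {
    static long cents(long unitPriceCents, double quantity) {
        // Reject NaN and infinities, negative values, and fractional quantities.
        if (!Double.isFinite(quantity) || quantity < 0 || quantity != Math.rint(quantity)) {
            throw new IllegalArgumentException("invalid quantity: " + quantity);
        }
        return unitPriceCents * (long) quantity;
    }
}

final class LineTotalChecks {
    // Returns true if the calculation refuses the given quantity.
    static boolean rejects(double quantity) {
        try {
            LineTotal.cents(250, quantity);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    // One isolated check per edge case named in the text: 2, 0, -1 and NaN.
    static boolean allEdgeCasesBehaveAsAssumed() {
        return LineTotal.cents(250, 2) == 500
                && LineTotal.cents(250, 0) == 0
                && rejects(-1)
                && rejects(Double.NaN);
    }
}
```

Because these cases are pinned down at this level, the component test of the shopping cart only needs to confirm that a subtotal is calculated at all.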

On the other hand, knowing that each component works well in isolation does not give us enough confidence in the overall shop. So we want to have some end-2-end tests to ensure that the components play together properly and the main use cases work. Those end-2-end tests should try to ignore the details which are already covered by the previous tests (e.g. ensuring that the shopping cart amounts are calculated correctly) and focus on the interaction between the components instead (e.g. adding a product from the catalogue to the shopping cart). To verify this functionality, those tests do not need to care about the details of the product or the concrete amounts of the shopping cart. It should be sufficient to verify that, after adding the product, the shopping cart contains one item. Maybe it’s necessary to ensure that it’s the expected item, but this should be done as superficially as possible, ignoring the details.
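The shape of such a deliberately shallow assertion can be sketched as follows. Note that this is not a real end-2-end test: a real one would drive the shop through its UI, whereas this sketch wires two in-memory stand-ins together just to show what is asserted and what is left out. The classes `Product`, `Catalogue`, `ShoppingCart`, and `ShopJourney` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins for the shop's components.
final class Product {
    final String id;
    final long priceCents;

    Product(String id, long priceCents) {
        this.id = id;
        this.priceCents = priceCents;
    }
}

final class Catalogue {
    private final Map<String, Product> byId = new HashMap<>();

    void register(Product product) {
        byId.put(product.id, product);
    }

    Product find(String id) {
        Product product = byId.get(id);
        if (product == null) {
            throw new IllegalArgumentException("unknown product: " + id);
        }
        return product;
    }
}

final class ShoppingCart {
    private final List<Product> items = new ArrayList<>();

    void add(Product product) {
        items.add(product);
    }

    int itemCount() {
        return items.size();
    }

    boolean contains(String productId) {
        return items.stream().anyMatch(item -> item.id.equals(productId));
    }
}

final class ShopJourney {
    // Checks only the interaction: one item ends up in the cart, and it is
    // the one we picked. Prices and subtotals are deliberately not asserted.
    static boolean addingCatalogueProductPutsOneItemInCart() {
        Catalogue catalogue = new Catalogue();
        catalogue.register(new Product("book-1", 1_999));
        ShoppingCart cart = new ShoppingCart();
        cart.add(catalogue.find("book-1"));
        return cart.itemCount() == 1 && cart.contains("book-1");
    }
}
```

The product has a price, but the check never looks at it. If the subtotal logic changes, this test should not need to change with it.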

Combining the different kinds of tests in this way, they form a kind of layered test architecture, where each layer focuses on its own responsibility and relies on and trusts the layers below to care about the more detailed stuff.

Tooling and Implementation

Apart from what we should test and how to combine those tests, our test strategy should also say something about the “how”. Even if we correctly identify and create the suites and tests themselves, we might choose better or worse ways or tools to implement them. For example, we could write performance tests using JUnit only, but choosing a tool specifically designed for this job (like JMeter) would probably be a better idea. It’s also a good idea to use those tools consistently. If we need to write performance tests for several different components, writing them in the same way, using the same tool, will help us keep the overall complexity of our tests at a reasonable level. In this case, the test strategy is the right place to describe how performance testing is handled in our project.

Another aspect is how to write the test cases. One choice could be to specify and write them completely in code. On one hand, this would make the life of the developers easier: there would be no need to learn new tools or use different languages for different purposes. On the other hand, this could prevent non-developers from writing tests, or even from reading and understanding them. If we chose to write the test cases in Cucumber, JBehave, or a similar tool, we would be able to specify the test cases in plain English (or some other language). That could also enable us to automatically generate more readable test reports. However, it would probably mean more complexity and effort, because of the additional tool (layer).

One more thing to consider is the testing environment. Some systems might require special hardware to be tested on (think firmware). Testing those systems on a regular PC or a server is possible only with a limited scope, or not at all. In that case, the environment might heavily affect how we can test our software. We might decide to create separate test suites: one for testing locally, and one for testing in the special environment. On one hand, we could run the local suite fast and get quick, although limited, feedback, while the suite requiring special hardware could be executed every night, giving far better, although delayed, feedback. On the other hand, this would probably require some tests to overlap and would definitely need more effort. Depending on the context, it could still be a good idea.

Scheduling

Apart from deciding what we need to test, which tools we will use, and how we write the tests, we also need to agree on when to run each kind of test. The easiest answer, all of them as often as possible, is usually not realistic. As long as the system we’re working on is relatively small, we could do that. But sooner or later, running all of our automated tests will simply take too long to do every few seconds or minutes. And that will happen even if we were careful about having our tests run fast (see our Tests Granularity post).

Our testing strategy might for example say that we need to run all of our automated tests on every push to the repository. Our CI/CD tool will then try to build and test the whole application. If the tests take a few minutes, it shouldn’t be a big problem. It is also common for certain types of tests (e.g. unit tests) to be run locally on the developer’s computer regularly. For this to be possible without problems, the test strategy might say that certain test types must be executable locally without side effects on other systems.

Still, that might not be enough. Let’s assume we have one test suite dedicated to load or stress testing our system. Such a suite might need to run for quite some time to allow the system to warm up its caches. If such a suite takes an hour to complete, do we really want to execute it on every push to the repository? It would probably be enough to run it before merging into the main branch. We could also decide to run it every night on each branch, to notify the teams early on if they’ve negatively impacted the performance of the system.
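One way to make such scheduling decisions explicit is to write them down as data. The following sketch is only an illustration: the suite names, the triggers, and the mapping between them are assumptions chosen to mirror the examples in this section, not a recommendation for any particular project or CI/CD tool.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical points in time at which tests may be executed.
enum Trigger { LOCAL, PUSH, PRE_MERGE, NIGHTLY }

// Hypothetical suites, each annotated with the triggers it runs on.
enum Suite {
    UNIT(EnumSet.of(Trigger.LOCAL, Trigger.PUSH, Trigger.PRE_MERGE)),
    COMPONENT(EnumSet.of(Trigger.PUSH, Trigger.PRE_MERGE)),
    END_TO_END(EnumSet.of(Trigger.PUSH, Trigger.PRE_MERGE)),
    // Takes about an hour in our example, so it is kept off the push trigger.
    LOAD(EnumSet.of(Trigger.PRE_MERGE, Trigger.NIGHTLY));

    private final Set<Trigger> triggers;

    Suite(Set<Trigger> triggers) {
        this.triggers = triggers;
    }

    boolean runsOn(Trigger trigger) {
        return triggers.contains(trigger);
    }

    // All suites the strategy schedules for the given trigger.
    static Set<Suite> scheduledFor(Trigger trigger) {
        EnumSet<Suite> result = EnumSet.noneOf(Suite.class);
        for (Suite suite : values()) {
            if (suite.runsOn(trigger)) {
                result.add(suite);
            }
        }
        return result;
    }
}
```

With this mapping, `Suite.scheduledFor(Trigger.PUSH)` contains the unit, component, and end-2-end suites but not the load suite, which only runs before a merge and at night. Whether such a table lives in code, in the CI configuration, or only in the written strategy is a separate decision.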

Another aspect to consider would be cost. Even automated tests can be expensive to execute. Load testing might require a lot of infrastructure. Even if we use cloud services, we might find out that spinning up the whole setup several times per day leads to considerable expenses. A similar problem arises when we need something special for our tests, like dedicated hardware (again, think about developing firmware). We might only have so much of it, and there might be several teams competing for access to this hardware. In such cases, the test strategy might prescribe how to efficiently use available resources and still achieve enough confidence.

Also, don’t forget that not all tests need to be automated. Those tests that are not automated would need to be executed manually, probably shortly before the release and deployment of our system. Similar to hardware limitations, here we also need to take into consideration how many people would be able to execute those manual tests, how much time those tests take, what other responsibilities those people have, etc. We might find out that we will need to hire more testers to satisfy the testing goal defined in the testing strategy.

Running the tests too often might cost too much or require too much effort to justify the results. On the other hand, running the tests too rarely might mean that a problem we’ve introduced goes unnoticed for too long. Once we notice that something is wrong, it might not be easy or cheap to find and fix the problem. The test strategy needs to find a balance.

Conclusion

A test strategy is a very important document for each reasonably big project. It defines which aspects of our system need to be tested, how we address those aspects using different kinds of tests, and how often we run those tests. Its main purpose is to help everyone involved understand how we ensure that in the end, our system will be of good quality.

Many thanks to Joachim Praetorius for his feedback and suggestions to improve this post.

Header photo by Pedro Miranda on Unsplash