When you learn about automated testing, one of the first things you stumble upon is the test pyramid, originally introduced by Mike Cohn. Many different versions of this pyramid are available on the web. Here are three examples.

Figure 1: Test pyramid by Martin Fowler
Figure 2: Test pyramid by Jessie Leung
Figure 3: Test pyramid by Julien Fiaffé

Although all of them show the pyramid form, the terms that are used within the pyramid differ from version to version. From my point of view, that’s a signal. There might be an issue with those names. And indeed, I’ve always struggled with them throughout my career.

In his blog post “The Practical Test Pyramid”, Ham Vocke gives what is, in my eyes, a good summary of the essence of the test pyramid:

  1. Write tests with different granularity
  2. The more high-level you get the fewer tests you should have

To me, this summary seems precise and clear, and in most cases I completely agree with it. Maybe that’s because I understand granularity in this area mainly as the level of isolation vs. integration. In most of the projects I have worked on, an extensive base of low-level, isolated tests together with a few high-level, integrated tests has worked very well.

However, I don’t like the names used in the different versions of the test pyramid, especially the very common names “unit tests” and “integration tests”.

For a long time, my understanding of the term unit tests was that they should test exactly one class or method in isolation, mocking or faking everything else as necessary. (I’m a Java developer; feel free to exchange the terms ‘class’ and ‘method’ with whatever fits better in your area, it shouldn’t matter much.) I still think this is a very good approach and I try to apply it wherever it makes sense, but I had to learn that there are scenarios where it doesn’t.

Take a Spring Data JPA repository, for example. It’s an interface that defines some method signatures. There’s nothing you can test without the part of Spring Data that brings it to life (by generating the code that implements those interface methods). And still, you might want to test your repository to verify that you chose the right method name, so that the generated method behaves as you expect.
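To make this more tangible without pulling in Spring, here is a minimal sketch that uses only the JDK. It is a toy, not how Spring Data works internally: the `PersonRepository` interface, the `create` method and the rule "the text after `findBy` is the column name" are all assumptions made up for illustration. The point is only that the interface has no behavior of its own, so any meaningful test has to include whatever generates the implementation.

```java
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

public class DerivedQuerySketch {

    /** Only method signatures, like the repository interface in the text. */
    interface PersonRepository {
        List<Map<String, String>> findByCity(String city);
    }

    /**
     * Toy generator (an assumption for illustration, not Spring Data's mechanism):
     * it derives the column to filter on from the name of the called method.
     */
    static PersonRepository create(List<Map<String, String>> rows) {
        return (PersonRepository) Proxy.newProxyInstance(
                PersonRepository.class.getClassLoader(),
                new Class<?>[] {PersonRepository.class},
                (proxy, method, args) -> {
                    String name = method.getName();
                    if (!name.startsWith("findBy") || args == null || args.length != 1) {
                        throw new UnsupportedOperationException(name);
                    }
                    // "findByCity" -> "city"
                    String column = name.substring("findBy".length()).toLowerCase(Locale.ROOT);
                    return rows.stream()
                            .filter(row -> args[0].equals(row.get(column)))
                            .collect(Collectors.toList());
                });
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(
                Map.of("name", "Ada", "city", "London"),
                Map.of("name", "Linus", "city", "Helsinki"));
        // The interface alone cannot be tested; only the generated proxy can.
        System.out.println(create(rows).findByCity("London"));
    }
}
```

In this toy, a misspelled method name such as `findByCtiy` would compile just fine and simply return no rows. Only a test that runs the interface together with its generator would notice.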

Well, I agree, that’s a very extreme example, but there are more.

Let’s assume you’ve decided against using Spring Data and instead write the repository yourself using plain JDBC. Now you have some code (maybe more than you want) that you could test in isolation. But what would that look like? You would probably end up writing a test that calls a findSomethingById method and then verifies that your repository implementation calls the right JDBC classes and methods (which you have mocked beforehand) with the expected SQL statement and parameters.

Assuming this test is green, what does that mean? The only thing you verified with this kind of test is your implementation. There’s no proof that the SQL query you used really does fetch the ‘something’ by its ID. There’s also no proof that you mapped the right columns of your result set to the right properties of your result object. You still don’t know if your repository is working as expected. You would need to add some (so-called) integration tests, using a real (probably in-memory) database, to verify this.

And, even worse, when you notice that your implementation needs some improvement (maybe due to performance issues), you would have to adapt your unit test as well, because it verifies the exact implementation and you’re just about to change that. What’s the value of a test if you have to adapt it along with the implementation? If you have to change your implementation, you want a test that stays untouched and verifies that your repository is still working as expected. That, again, would be the integration test.
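The problem with such an isolated test can be sketched in a few lines. This is a simplified illustration, not real JDBC code: `SqlRunner` is a hypothetical stand-in for the JDBC classes, `RecordingSqlRunner` is a hand-rolled mock, and the table and column names are made up. The query deliberately filters on the wrong column, and the isolated test is green anyway, because it only compares the statement with itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class InteractionTestSketch {

    /** Hypothetical stand-in for the JDBC classes; not a real JDBC type. */
    interface SqlRunner {
        Optional<Map<String, Object>> queryOne(String sql, Object... params);
    }

    /** Hand-written repository. The query contains a deliberate bug. */
    static class SomethingRepository {
        // Bug: filters on "name" although the method promises a lookup by ID.
        static final String FIND_BY_ID = "SELECT id, name FROM something WHERE name = ?";

        private final SqlRunner sql;

        SomethingRepository(SqlRunner sql) {
            this.sql = sql;
        }

        /** Returns the name of the 'something' with the given ID, if any. */
        Optional<String> findSomethingById(long id) {
            return sql.queryOne(FIND_BY_ID, id).map(row -> (String) row.get("name"));
        }
    }

    /** Hand-rolled mock: records every call and returns a canned row. */
    static class RecordingSqlRunner implements SqlRunner {
        final List<String> statements = new ArrayList<>();
        final List<Object[]> parameters = new ArrayList<>();

        @Override
        public Optional<Map<String, Object>> queryOne(String sql, Object... params) {
            statements.add(sql);
            parameters.add(params);
            return Optional.of(Map.<String, Object>of("id", 42L, "name", "answer"));
        }
    }

    /** The isolated test: it only checks which statement and parameter were used. */
    static boolean isolatedTestIsGreen() {
        RecordingSqlRunner mock = new RecordingSqlRunner();
        Optional<String> result = new SomethingRepository(mock).findSomethingById(42L);

        return mock.statements.equals(List.of(SomethingRepository.FIND_BY_ID))
                && mock.parameters.get(0).length == 1
                && mock.parameters.get(0)[0].equals(42L)
                && result.equals(Optional.of("answer"));
    }

    public static void main(String[] args) {
        // Green, although the query filters on the wrong column.
        System.out.println("isolated test green: " + isolatedTestIsGreen());
    }
}
```

The mock happily returns the canned row for any statement, so the bug can only surface once the query runs against something that actually interprets SQL.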

But what does the term integration test mean? Obviously, it describes the fact that something is integrated. But what? In our previous example, it would be the repository class along with the required JDBC functionality and some database that contains the underlying table. So, all the parts that belong to the repository as a whole. Or, in other words, all parts that belong to the unit of the repository.

And that’s the point: those two terms are typically used as something different, or even opposite. But in my eyes, they are orthogonal. A unit test might need some level of integration to be able to test the functionality of the unit. A so-called integration test can also be seen as a unit test of a slightly bigger unit. The question is which unit I want to test: how big does it have to be, and how many things have to be integrated to test it properly?

It’s always a good idea to keep the unit under test as small as possible, as long as it’s still possible to test its functionality (not its implementation). On the other hand, there might be good reasons to make some of the units a little bigger. Besides the technical reasons I described in the examples above, there can also be strategic reasons. There might be units in your system that are not as important as others or that will change less often. In this case, writing very detailed low-level tests might not be necessary.
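As a rough sketch of what testing functionality rather than implementation can look like: the test below only uses the public contract of a repository, so it stays untouched when the implementation behind it is replaced. All names here (`NoteRepository`, the two implementations, `behavesAsExpected`) are made up for illustration, and both implementations are plain in-memory stand-ins, not a real database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class BehaviorTestSketch {

    /** The contract of the unit under test (hypothetical example). */
    interface NoteRepository {
        void save(long id, String text);
        Optional<String> findById(long id);
    }

    /** First implementation: linear scan over a list. */
    static class ListNoteRepository implements NoteRepository {
        private final List<Map.Entry<Long, String>> rows = new ArrayList<>();

        @Override
        public void save(long id, String text) {
            rows.add(Map.entry(id, text));
        }

        @Override
        public Optional<String> findById(long id) {
            return rows.stream()
                    .filter(row -> row.getKey() == id)
                    .map(Map.Entry::getValue)
                    .findFirst();
        }
    }

    /** Reworked implementation (say, for performance): hash lookup. */
    static class MapNoteRepository implements NoteRepository {
        private final Map<Long, String> rows = new HashMap<>();

        @Override
        public void save(long id, String text) {
            rows.put(id, text);
        }

        @Override
        public Optional<String> findById(long id) {
            return Optional.ofNullable(rows.get(id));
        }
    }

    /** One test for both: it checks what the unit does, not how it does it. */
    static boolean behavesAsExpected(NoteRepository repository) {
        repository.save(1L, "first");
        repository.save(2L, "second");
        return repository.findById(2L).equals(Optional.of("second"))
                && !repository.findById(3L).isPresent();
    }

    public static void main(String[] args) {
        System.out.println(behavesAsExpected(new ListNoteRepository()));
        System.out.println(behavesAsExpected(new MapNoteRepository()));
    }
}
```

The same `behavesAsExpected` check runs against both implementations without any change, which is exactly the property the mock-based test from the JDBC example lacks.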

However, testing all the small units separately and ensuring that each of them works as expected is not sufficient. You also have to verify that all those separate units work together. So combining multiple small units into bigger ones and testing them in integration seems to be a good idea. In the projects I’m working on, I usually want to have a good base of low-level unit tests. But I also want to have at least some end-to-end tests that verify the whole application as a fully integrated (very big) unit, focusing on its main use cases. Besides this, there might also be some very important or highly complex (integrated) units of the system that you want to test separately, at an isolation/integration level somewhere in between.

So, when you look at the test pyramid and think about the kinds of tests you want to have in your project, don’t pay too much attention to the categories and names that are used in whatever version of the pyramid you’re looking at. Instead, think about which units your system has, what a good level of isolation or integration is to test them, and how you want to combine tests with different levels of granularity to be sure the whole system behaves as expected.

Header Photo by Mitchell Griest on Unsplash