This article is also available in German

While most of our hopes were invested in microservices just a few years ago, this architecture now has a reputation for leading to complex, distributed systems. Those who believe that monoliths are the solution may have had different experiences with that approach than most of us had not too long ago. It was not uncommon for monoliths to take hours to compile, a quarter of an hour or more to start, and a whole weekend to deploy. The modularization of monoliths was also often so poor that even seemingly simple changes took a lot of effort. Criticism of microservices must not bring back those times.

Microservices? Monoliths? Modules!

Large systems are always divided into coarse-grained modules. Monoliths, for example, can be divided into Java packages or Maven projects. Microservices are “just” another type of modularization. If the division into modules is suboptimal, no implementation of the modules will help, whether as Java packages, Maven projects, or microservices. So the question of monoliths vs. microservices is not the most important one. What matters is modules and the division into modules.

So, it’s worth looking at the origin of the term module[1]. Modules are a concept for scaling the development of software to larger organizations. Software is almost exclusively developed in teams, so a lot of code is produced. It’s practically impossible for a single developer to understand the entire system in detail. There must be a concept that allows developers to evolve the system while limiting the knowledge they need to do so. This is where modules come in: Ideally, developers should only have to understand an individual module in order to change it.

Modules are not for computers; they are a tool that helps developers understand the code and improves the changeability of large systems. We wouldn’t need modules if we had unlimited mental capacity and could therefore easily understand arbitrarily large and complex systems. The division into modules therefore raises the question of how people access and structure complex systems and knowledge in general. This should be considered when dividing a system into modules.

Information Hiding

Still, there are principles for dividing systems into modules that are so fundamental that they should always be followed. One such concept is information hiding (Fig. 1). It is based on the observation that developers will utilize any information a module exposes. This can include data structures in addition to the interface. This effect is apparent in the program code: The code uses certain parts of the module, such as methods or instance variables. But information that is not as easily apparent in the program code can also be utilized. To meet performance requirements, for example, developers can exploit knowledge about a module’s runtime behavior obtained through tests. Other parts of the system thus make assumptions about modules, which makes it more difficult to change those modules. If other parts of a system rely on an instance variable, a method, or the performance of a module, a change to these elements means that the dependent parts of the system will also have to be changed.

Fig. 1: Information hiding: Only certain information reaches the outside

The solution is to hide as much information as possible, hence the name of the principle. For example, instance variables should never be exported, so they are only accessible through public methods. (There is, however, almost nothing that can be done to prevent code from relying on the performance of a module.) By reducing the exposed information, modules become easier to change: other parts of the system can use less information, so they depend on fewer assumptions that a change might invalidate. With information hiding, as with modules, it’s not so much about the computer and more about people and limiting knowledge. While modularization limits the knowledge required to change code, information hiding limits the knowledge that spreads through the system and makes changes more difficult because changes may invalidate it.

Information hiding can be implemented through interfaces: A class exposes public methods, and only a change to these methods can affect other parts of the system. Private methods and data structures can be changed freely, as they cannot be used outside the class. For example, a class that models a bank account should make it possible to retrieve the account balance through its interface. With information hiding, developers can change the bank account from storing the balance in a field to calculating it on the fly from the transactions. This would not be possible if the balance were directly exposed as an instance variable. What applies to instance variables and classes also applies to microservices and databases: How a microservice stores data in its database should not be exposed to the outside, as it otherwise cannot be changed.
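The bank account example can be sketched in a few lines of Java. The class and method names are illustrative, not taken from any particular codebase:

```java
import java.util.ArrayList;
import java.util.List;

// The balance is not stored in an instance variable. It is derived from
// the hidden list of transactions, so the storage strategy can change
// without affecting any caller of balance().
public class Account {
    private final List<Long> transactions = new ArrayList<>(); // hidden detail

    public void deposit(long amountInCents) {
        transactions.add(amountInCents);
    }

    public void withdraw(long amountInCents) {
        transactions.add(-amountInCents);
    }

    // The public interface only promises a balance, not how it is kept.
    public long balance() {
        return transactions.stream().mapToLong(Long::longValue).sum();
    }
}
```

Switching back to a cached balance field would only change the private parts of `Account`; code that calls `balance()` would be unaffected.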

Many developers follow these concepts, but the reasons and the theory behind them are often unclear. Yet understanding the motivation is necessary for implementing modularization successfully. It makes sense to understand a goal like information hiding and then achieve it through suitable measures, instead of following certain practices without really understanding them or their motivation.

Independent Modules: Impossible!

We often hear calls for “independent” or “decoupled” modules. Yet, this is a contradiction: Modules are parts of a system. Fully independent or decoupled modules therefore cannot exist: they would no longer form a system but would be entirely isolated. Any connection to other modules is a dependency and a coupling. A complete system can never exist without such connections.

However, it’s important to look more closely at dependencies. Loose coupling[2] (Fig. 2) is worth striving for. Ideally, a change affects only one module. Changes can also impact other modules, but this should be the exception rather than the rule, and the necessary changes to other modules should be minor. Loose coupling thus means that a change does not spread unchecked to other modules. Information hiding is, as already mentioned, a concept that contributes to loose coupling.

Fig. 2: Loose coupling: A change influences a module, but has only little influence beyond that.

Whether the coupling between modules is loose or not depends on the types of changes. Computer science pioneer David Parnas recommended in 1971[3] that modules be designed so that technical decisions are hidden within a module. In his example, a single module is responsible for storing data, instead of this aspect being implemented in all modules. He then assessed this division using various change scenarios, each covering technical changes such as a different type of data storage. For these types of changes, the modules were loosely coupled: Each change concerned only one module or a few modules.

This concept of “encapsulating” technical decisions is still common to this day and has become second nature to many. It still makes sense, of course: if code that implements a certain technical concern is scattered throughout the system, it not only makes changes more complicated, but also makes the system harder to understand. Easy comprehensibility is an important prerequisite for a system’s changeability.

In the 50 years since the paper appeared, however, the world has evolved. The example in Parnas’ paper discusses an algorithm for string manipulation. Nowadays, there are libraries for such things, and it is uncommon to develop modules for such functionality. Even where this does happen, effective modularization at this level hardly contributes to a project’s success.

Modularization nowadays is typically a challenge on another level: The modules are larger. And the typical changes are no longer technical, but concern the business logic. This is why modularization must deal with the business logic, and implement a coarse-grained modularization. Domain-driven design can be helpful here.

Because changes to the business logic must be supported, the division into modules should be based on functionality. Rules like information hiding still apply to coarse-grained modules, of course. So the data required for the respective functionality should be hidden within the module. Many functionalities need business objects such as customers or products, but different functionalities use different information about these business objects: When a customer pays for a product, the price of the product and the customer’s available payment methods are needed. But if the product is to be delivered to the customer, different data, i.e. the product’s weight and the customer’s delivery address, are important. For loose coupling, splitting a system into coarse-grained modules by functionality, each with the data it needs, is more suitable than splitting the data into shared business objects. A change to the business logic for delivery or payment will then likely affect only the one module that contains that logic and data.

If the division is based on data, such as product or customer, things look quite different: The data are used by many functionalities, so changes to a functionality such as delivery or payment would presumably also impact the data modules, and might even impact other modules with different functionalities that rely on the same data modules. The change will then likely not be contained in just one module.
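The functionality-based split can be sketched in Java. All class and field names here are made up for illustration: the payment and delivery modules each keep only the product data they need, instead of sharing one canonical product class:

```java
// Hypothetical payment module: knows prices and nothing about shipping.
class PaymentProduct {
    final String productId;
    final long priceInCents;

    PaymentProduct(String productId, long priceInCents) {
        this.productId = productId;
        this.priceInCents = priceInCents;
    }
}

// Hypothetical delivery module: knows weight and nothing about prices.
class DeliveryProduct {
    final String productId;
    final int weightInGrams;

    DeliveryProduct(String productId, int weightInGrams) {
        this.productId = productId;
        this.weightInGrams = weightInGrams;
    }
}
```

A change to the payment logic, say, supporting a new pricing model, touches only the payment module’s data; the delivery module is unaffected.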

Division by functionality is the foundation of a good architecture at many levels: CRC cards (Class – Responsibility – Collaboration) serve to divide an object-oriented system into fine-grained classes; what matters here are the responsibilities and the collaboration with other classes. A bounded context canvas describes a coarse-grained business logic module from domain-driven design and, similar to CRC cards, focuses on the relationships with other bounded contexts, a description, and the terms the business logic uses. So the focus of the division should not be on data, but on functionalities and relationships – at every level.

Interim Result

If we were to design a system with the aforementioned approaches, we would have various coarse-grained business logic modules. They would have dependencies but would be loosely coupled, primarily with regard to changes to the business logic. They would expose functionalities while concealing the implementation and the required data, thus implementing information hiding. This division is essential, but not sufficient for actually implementing the system. A bounded context as a coarse-grained module can be implemented in a variety of technical ways: A microservice, a Maven project, or a Java package are just a few of the many possibilities. The question of implementation arises for each bounded context or other coarse-grained module. The modules must have an interface, but this can be implemented in many ways: In Java, for example, only public classes can be used outside their package. With a facade class, the complex internal behavior of a coarse-grained module can be hidden, and only certain operations are exposed to other packages at all. Java packages or Maven modules can thus expose interfaces. Microservices, on the other hand, have e.g. a REST interface. They just implement the same basic concept differently.
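The facade idea can be sketched using Java’s package visibility. The class names are invented for illustration; in a real project both classes would live in one package, e.g. `order`, with only the facade being public:

```java
// The only public class of the module: other packages can use nothing else.
public class OrderFacade {
    private final PriceCalculator calculator = new PriceCalculator();

    // The only operation exposed to the rest of the system.
    public long totalInCents(int items, long unitPriceInCents) {
        return calculator.total(items, unitPriceInCents);
    }
}

// Package-private: invisible outside the module's package, so it can be
// changed or replaced without affecting other packages.
class PriceCalculator {
    long total(int items, long unitPriceInCents) {
        return items * unitPriceInCents;
    }
}
```

Because `PriceCalculator` has default (package-private) visibility, the compiler itself enforces that other packages only depend on the facade.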

Microservices?

If you only consider concepts like information hiding and loose coupling, it is hard to make a clear decision in favor of one of the technical modularization approaches, because they all implement the same concepts, just in different ways. However, microservices do have some characteristics that can be considered information hiding at a coarse-grained level: Clients of a microservice do not know which programming language the microservice is implemented in. With a Maven module, the implementation must at least use a language that compiles to Java byte code. A microservice can also be redeployed without that being visible to the outside, if you’re diligent. The number of instances is also not externally visible. Microservices thus offer more opportunities for change. These are mainly technical advantages: Deployments, changes to the programming language, or scaling of a microservice are possible without affecting other modules. This increases the wiggle room for the people responsible for a microservice.

On the other hand, microservices are a distributed system, so the consistency of the data, the reliability of communication with other modules, and transactions pose additional challenges. So as with so many architecture decisions, the decision for microservices is a trade-off.

From a certain perspective, microservices are riskier but also hold the potential for greater returns: If the modularization is not particularly good, microservices will result not just in greater implementation effort, but also in challenges with consistency or performance due to network communication. But if the modularization is good, microservices can not only be altered largely independently of one another, but also deployed and scaled independently. Even the programming language of the implementation can be changed. If you choose a modularization approach other than microservices, you may avoid the challenges that come with failure, but you’ll also miss out on the benefits of success.

Of course, it is absurd to use microservices to implement the modules if you don’t reap their benefits because of organizational measures, such as centrally coordinating deployments or centralizing all technology decisions. Perhaps there are still enough advantages for a decision in favor of microservices, and maybe there are good reasons for the restrictions. However, you should think critically about this.

Whether the trade-off of microservices is worthwhile can be decided separately for each module. Bounded contexts are likely decoupled with regard to the business logic, so there should be very few challenges with consistency or transactions. Domain-driven design (DDD) considers aggregates to be the boundary for consistency and transactions. Aggregates are parts of a bounded context, so DDD expects these boundaries to be much more fine-grained than a bounded context. That means there should be no need for transactions or consistency across bounded contexts. If such problems do occur, the coarse-grained modules can be implemented not as microservices, but as Maven modules or Java packages with a shared database, whereupon transaction and consistency problems are relatively easy to solve.

Monoliths

A decision against microservices is also an architectural decision, and it should not be made as a matter of ideology but based on the expected advantages and disadvantages. If you decide against microservices, however, there is another challenge: It is quite hard for a microservice to depend on the internals of another microservice. Microservices live in different projects and Docker containers, and may even use different programming languages. Access is therefore really only possible via the interface. In a monolith, however, it is much easier to use arbitrary classes from other modules. Developers will therefore introduce more and more dependencies, resulting in a system with dependencies that go beyond the specified interfaces and thus violate information hiding. The shared knowledge keeps expanding.

You can prevent this by actively managing the dependencies and only allowing certain ones. There are numerous tools for this[4]. Without active management of the dependencies, the monolith will sooner or later lose its structure. It appears as though this possibility is only now being truly utilized, through microservices and domain-driven design, even though the tools have been around for some time. Better management of modularization is a fundamental step forward that can have a very positive impact on the long-term maintainability of monoliths.
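The idea behind such architecture-management tools can be illustrated with a minimal sketch. Real tools derive the actual dependencies from the code (e.g. from the byte code) and fail the build on violations; the module names and rules below are invented for illustration:

```java
import java.util.Map;
import java.util.Set;

// A toy dependency rule set: each module may only depend on the modules
// explicitly listed for it. Anything not listed is a violation.
public class DependencyRules {
    private final Map<String, Set<String>> allowed = Map.of(
        "payment", Set.of("customer"),
        "delivery", Set.of("customer"),
        "customer", Set.of()
    );

    public boolean isAllowed(String from, String to) {
        return allowed.getOrDefault(from, Set.of()).contains(to);
    }
}
```

A build step could then reject, for example, a newly introduced dependency from the payment module to the delivery module before it erodes the structure.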

Rightsize Monoliths!

It should also now be clear that compile times of hours and start times of many minutes, as used to be the case with monoliths, are not acceptable. Modern continuous delivery pipelines that automatically compile and test a system would hardly be feasible for such a monolith, as the pipeline would be much too slow and often also quite complex. A monolith of this size should thus be divided into smaller systems, but these do not necessarily have to be microservices. Structuring software into several systems was a possibility for coarse-grained modularization even before microservices, but it could not prevent the chaos of monoliths. Microservices have shed new light on the possibilities in this area.

  1. Software Architektur im Stream: Folge 26 (German)  ↩

  2. Software Architektur im Stream: Folge 76 (German)  ↩

  3. Parnas, David L. (1971). On the criteria to be used in decomposing systems into modules  ↩

  4. Software Architektur im Stream: Architecture Management (partly German)  ↩

Conclusion

The discussion of monoliths vs. microservices shifts the focus to the question of modularization, and in particular to the division of a system’s business logic. Monoliths and microservices are “just” different implementations of modules. If the division is not sound, the decision between monoliths and microservices hardly matters. But because it is a technical decision, technical people are happy to discuss it. It would be more important to adhere to fundamental concepts like loose coupling and information hiding. The renaissance of domain-driven design and the renewed focus on monoliths are driving this discussion, which is positive, because there is much to be gained from proper modularization.