I write, speak, and advise on the subject of service mesh. In conversations, people often tell me that they want to introduce a service mesh in their project. After asking about the architecture and the project situation, I advise against it in most cases. Many are puzzled, almost disappointed.
The story I’ve heard several times goes something like this:
“We currently have this awful monolith. No one can foresee the consequences of code changes, so development is not progressing. It’s very frustrating. We decided to rebuild the application and do it right this time: with microservices, Kubernetes, and a service mesh!”
How decisions are made in IT has been on my mind for some time, and that is what this text is about. But first, some background.
Why (synchronous) microservices are so popular
We know that microservices can make it easier to develop complex software. They not only address technical challenges but also help with organizational, i.e. human, ones. Instead of 50 people having to deal with the full complexity of a single application, smaller teams can take responsibility for part of it independently. A frequently stated argument is that those teams can choose, and later change, the languages and frameworks that suit them.
Hardly anyone could find that bad.
However, the important question is not whether microservices are good or bad, but whether they actually solve an existing problem and what undesirable consequences are associated with it.
Microservices seem to be a promising solution to the problem of my conversation partners' hard-to-maintain monolith: the consequences of code changes are predictable within a microservice. The impact on other microservices is limited to explicit network interfaces. Unlike in a monolith, the code of another service can only be used via a defined interface.
In fact, the problems can have completely different causes that cannot be solved by technical changes. For example, the software development procedure or the composition of the team may be unsuitable. Often, too little attention is paid to the technical aspects. Certainly, the structure of the monolith has eroded over time. But if the cause of that erosion remains unclear, microservices may erode in the same way.
Depending on the implementation, microservices can have undesirable consequences that are often overlooked.
The autonomous microservices must be integrated at runtime. Dependencies between services are therefore less visible than in a monolith. Many teams decide to implement dependencies with synchronous calls, because these correspond most closely to the familiar method calls that are no longer possible. The effort for re-implementation and automation is often so high that the path of least resistance is taken during integration. Another reason may be that alternatives to synchronous calls are simply not known.
The popular synchronous integration of microservices has many negative effects. For example, each network call takes longer than a method call, which can hurt performance in long call chains. Callers must also expect their request to fail or take too long. In addition, communication between microservices must be secured by encryption, authentication, and authorization.
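To make the effect of long call chains concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a strictly sequential chain in which each service calls the next one synchronously and failures are independent. The figures (20 ms per hop, 99.9 % availability per service) and the function names are made up for illustration.

```python
def chain_latency_ms(latencies_ms):
    """Total latency of a sequential synchronous chain: the hops add up."""
    return sum(latencies_ms)


def chain_availability(availabilities):
    """Probability that every hop succeeds, assuming independent failures."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


# Hypothetical chain of five services (numbers are illustrative only).
hops = 5
total_latency = chain_latency_ms([20] * hops)
total_availability = chain_availability([0.999] * hops)

print(f"latency: {total_latency} ms")
print(f"availability: {total_availability:.4f}")
```

Under these assumptions, a request through five services takes about 100 ms and succeeds in roughly 99.5 % of cases, which is worse than any single service in the chain.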
Back to the service mesh…
A service mesh [1] is designed to solve problems of microservice architectures. However, current implementations such as Istio and Linkerd [2] mainly solve problems that occur with synchronous network calls:
- Encryption and mutual authentication of requests
- Collection of metrics on network requests (volume, status code, latency, …)
- Resilience to network faults (retry, timeout, and circuit breaker)
- Configuration of routing rules based on paths and headers (enables A/B testing and canary releasing)
This tempting offer is paid for with technical complexity, even more latency, and additional resource consumption. But even if this is acceptable, another problem remains: a service mesh must be configured, operated, updated, and, in the event of a fault, examined by humans.
Compared to the implementation of equivalent functions in each individual microservice, this seems like little effort. However, it can still be a big challenge for a team that has too little capacity to learn a new technology. So whether with or without a service mesh, microservices add complexity. A service mesh hides much of this complexity. This is often an advantage, but it quickly turns into a disadvantage, for example as soon as errors occur or when requirements are not supported by the service mesh implementation.
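To give a sense of what implementing one of these functions inside a microservice involves, the following is a minimal sketch of a circuit breaker in Python. It is not the mechanism used by Istio or Linkerd; the class name, the thresholds, and the injectable clock are assumptions chosen for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    """Hypothetical, simplified circuit breaker for synchronous calls."""

    def __init__(self, failure_threshold=3, reset_after_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_seconds = reset_after_seconds
        self.clock = clock          # injectable so the behavior can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            elapsed = self.clock() - self.opened_at
            if elapsed < self.reset_after_seconds:
                # Fail fast instead of waiting for a service known to be failing.
                raise CircuitOpenError("circuit is open; call skipped")
            # Reset period has elapsed: let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # A successful call closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each outgoing request, for example `breaker.call(fetch_customer, customer_id)`, where `fetch_customer` is a placeholder for the actual remote call. Retries, timeouts, metrics, and encryption would each need similar code or a library in every service, which is the effort a service mesh takes over.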
How decisions are made
Compared with other industries, IT is very much characterized by the free distribution of information and knowledge, for example through articles, podcasts, tutorials, websites, and meetups. In addition, or perhaps precisely because of this, innovative ideas and technologies are constantly emerging. Many developers have a high willingness to learn and a pronounced professional curiosity.
This remarkable situation has a few unintended side effects.
Large companies such as Google, Amazon, and Netflix are strong drivers of innovation and, due to their size, publish a lot of information and software products. They receive much attention and trust because of their particularly impressive challenges. But the published solutions solve exactly these special challenges. If a presented technology is not put into the right context, it is easily perceived as a panacea and used for completely inappropriate purposes. Microservices and single-page applications are just two examples.
The information on which teams base their decisions is therefore often incomplete and one-sided. In addition, curiosity about new technology makes it difficult to examine hype critically and to consider less intuitive or less fashionable alternatives. It is often more tempting to act quickly than to deal with the problem and its causes in detail.
Making better decisions
It sounds trivial, but to make a good decision, the problems and goals must be clear, and the measures must be selected to match them.
To begin with, several possible solutions should be discussed. That too sounds banal, but it is not so easy. After all, constant progress means that it is hardly possible to keep an overview, let alone to critically question or practically try out everything.
Many developers are grateful for opportunities to familiarize themselves with new topics. The challenge is to categorize the large amount of information available from articles, conferences, etc. And, of course, an article about a framework or a technology cannot replace practical experience with it. So, it is just as important to encourage the exchange of experience as it is to permit experiments. If experiments are allowed outside the projects, the urge to introduce new technologies into projects decreases.
Organizations need to respond to these challenges. At INNOQ the following has proven to be successful:
- The workshops we offer externally are also held internally, either regularly or on request
- The entire company meets at regular events and exchanges information through lectures, open spaces, or programming in groups. The organizational effort for this is less than often assumed.
- Questions can be asked or information shared at any time in topic-specific Slack channels
Much depends on a corporate culture that trusts its employees and welcomes critical discussion.
Finally, let’s get back to the problem of my conversation partners.
Alternatives that work without a service mesh
Synchronously communicating microservices with a service mesh are only one solution to the inertia of a monolith. Depending on the context, there are the following alternatives:
- Possibly, the existing monolith can be significantly improved by refactoring or modularization. To enforce compliance with interfaces, modules can be used instead of communication via the network. In any case, it is worthwhile to take a look at the current application and its problems to find a better solution.
- Perhaps it is not the monolithic architecture that is “to blame”, but the choice of the process model, infrastructure, programming language, or framework. Under certain circumstances, a new monolith with a more suitable working mode or another technology could solve many problems. After all, if you’ve spoiled one soup, it doesn’t mean that it will happen again in the second, nor does it mean that it will work better in a casserole.
- Self-contained systems are related to microservices. They divide the application into independent components. Integration often takes place asynchronously or via the frontend, for example, with transclusion.
- And even if microservices are the right choice, the type of integration should be chosen carefully. Depending on the goals and requirements, feeds or message queues can be considered as alternatives to synchronous calls.
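The last alternative can be sketched as follows. In this simplified Python example, an in-process `queue.Queue` stands in for a real message broker, and the event format and function names are hypothetical. The point is only that the sender hands over an event and continues, while the receiver processes it later at its own pace.

```python
import queue


def publish_order_placed(events, order_id):
    """Sender side: hand over an event and return without waiting for a reply."""
    events.put({"type": "order_placed", "order_id": order_id})


def process_pending_events(events, handled_order_ids):
    """Receiver side: work through whatever has accumulated in the queue."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break  # nothing left to process right now
        if event["type"] == "order_placed":
            handled_order_ids.append(event["order_id"])


# Illustrative run: two events are published before the receiver handles any.
events = queue.Queue()
publish_order_placed(events, 1)
publish_order_placed(events, 2)

handled = []
process_pending_events(events, handled)
```

With this kind of integration, the sender is not blocked by a slow receiver. In exchange, the result is not available immediately, so the approach fits cases where the caller does not need an answer right away.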
Many thanks to Robert Glaser, Christoph Iserlohn, Martin Otten, Joachim Praetorius, Philipp Schirmacher, Hermann Schmidt, Tina Schönborn, Tammo van Lessen, Oliver Wolf, Eberhard Wolff, and the anonymous squirrel for feedback on this article.