TL;DR
- Shared tables turn your persistence model into an integration contract – breaking information hiding and creating tight coupling.
- The shared DB becomes a cross-system bottleneck: contention, locks, and failure domains spread across teams and services.
- Tables expose data, not intent: CRUD without semantics makes audits, debugging, and evolution painful.
- Shared structures drive duplicated business logic and inconsistent derived values – even if the underlying data is “consistent.”
- Recommendation: prefer explicit APIs with clear boundaries and ownership; expose only what consumers need (don’t auto-export your data model).
This blog post is also available in German.
You may be thinking: is that still a thing? It has been known for a long, long time that shared database tables are a bad way of integrating systems. Didn't we stop using shared databases for systems integration years ago?
Unfortunately, I still see it all the time. Sometimes it's the "classical" case, where both systems share a DB table or even a complete schema. Increasingly, it takes a more subtle form: "APIs" that expose objects derived from or bound to DB schemas. So whether you are convinced that DB integration is a reasonable way of integrating systems, or you are just starting out and want to learn about the woes of your elders, this article is for you.
What is shared database integration?
When distributed systems integrate via a shared database, that usually means they exchange data by reading and writing the same physical tables and/or views from the same schema.
Initially, shared database integration is easy, even convenient. You don't need to discuss API design at all; you just tell people: "If you need customer data, here's the connection string. Look at the Customer table." Even when requirements change and we only want the customers that ordered something within the last 12 months, we may be happy to see that our "API" did not need to change.
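To make this concrete, here's a minimal sketch of what this kind of "API" looks like in practice - Python with psycopg2 against an assumed PostgreSQL database; the connection string, table, and column names are all illustrative:

```python
# Minimal sketch of shared-table "integration": another system's tables,
# queried directly. Assumes PostgreSQL and psycopg2; all names are made up.
import psycopg2

SHARED_DSN = "postgresql://app:secret@db-host/crm"  # handed around as "the API"

def recent_customers(months: int = 12):
    """The 'API call': a raw query against someone else's tables."""
    with psycopg2.connect(SHARED_DSN) as conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT DISTINCT c.id, c.name
            FROM customer c
            JOIN "order" o ON o.customer_id = c.id
            WHERE o.created_at > now() - make_interval(months => %s)
            """,
            (months,),
        )
        return cur.fetchall()
```

Note how the consumer is coupled to the exact table layout: rename a column in the owning system, and this code breaks without warning.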
Also, there's no need to think about consistency guarantees, since data is always read from the "source of truth". You don't need to deal with eventual consistency or fear that data may be outdated. Where both systems write the same data, they can use DB features to change it in one atomic transaction. Relational databases also excel at detecting concurrent modification. It seems like there are a lot of hard problems (API design, concurrency) that we can avoid thinking about by just using a DB table as an API. So what is wrong with using a shared table?
Performance and availability degradation
A shared relational database is bound to become the bottleneck for all connected systems. By sharing a database, you also share the same computational resources. As your system grows, each new connection and each new client will consume CPU and RAM. So the limit of your distributed application's scalability is the limit to which you can scale the shared infrastructure. And while throwing more hardware at the problem may be cheap[1], eventually you will reach the limit of what hardware can do.
And while you did not have to think about consistency guarantees at first, you now have systems competing for virtual resources like row and table locks. So even with plenty of headroom in your hardware resources, a long-running transaction that locks many tables can lock out all other users of the shared DB by starving them of these virtual resources. Your consistency guarantee will eventually degrade the system's response time or even its availability.
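As an illustration, here is a sketch of how a single long-running transaction can starve everyone else of locks (assuming PostgreSQL and psycopg2; all names are made up, and the sleep stands in for a slow batch job):

```python
# Sketch: one client's long-running transaction starving everyone else.
import time
import psycopg2

conn = psycopg2.connect("postgresql://reporting:secret@db-host/crm")
conn.autocommit = False
cur = conn.cursor()

# The bulk update takes row locks on every row it touches ...
cur.execute("UPDATE customer SET segment = 'B2B' WHERE country = 'DE'")

# ... and holds them until commit. While this job "crunches numbers"
# (simulated here), every other system's write to those rows blocks.
time.sleep(300)

conn.commit()  # only now can the other systems proceed
```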
You will distribute, eventually
This is usually the point when we turn the shared database into a distributed system itself. You can either shard the database (which means distributing the data across different servers) or use read replicas and/or clustering features to spread the load from a single machine over multiple machines. Depending on your database platform, this is where things may get expensive and/or complicated. It may work for a while, but you should be aware that there are fundamental limitations to this approach.
The CAP theorem states that you can only have two of the following three things: all your data in one place, guaranteed data consistency, or data that is always available for queries and/or writes. Or as Brewer put it:
Any distributed data store can provide at most two of the following three guarantees: Consistency […], Availability […], or Partition Tolerance. (https://en.wikipedia.org/wiki/CAP_theorem)
You will now face a hard trade-off for all your data. While there may be places where eventual consistency is tolerable, a shared database forces a single all-or-nothing choice for everything. But having to face this trade-off for all your data is not even the worst part of sharing a DB. Database tables turn out to be awful at being APIs.
The API insufficiencies of tables
There are several drawbacks to using your database tables as a quasi-API between systems. Security controls are limited at best. Tables reveal only factual data, never the intent behind it. And worst of all, sharing data structures may lead to rampant duplication of business logic and tight coupling between systems.
Security woes
A single set of tables usually means a blunt security model: either an app can read/write a table or it can't. That's rarely enough. Example: the support tool should see a customer's address and order status, but not their internal risk score or notes from fraud investigations. If everything sits in shared tables, the easiest path becomes "grant the support app access to the whole customers table", and now one bug, one overly broad query, or one insider mistake exposes data that never needed to be available.
Proper APIs allow fine-grained control over who can access data, and even let you do things like redact sensitive details unless a certain type of client requests them. And the best part is: you can choose any permission model that suits your requirements, not just the one your database vendor chose to support.
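A sketch of what such per-client redaction could look like in application code - plain Python, with illustrative field and role names:

```python
# Sketch: field-level redaction in an API layer - a rule no table-level
# GRANT can express. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Customer:
    id: int
    name: str
    address: str
    order_status: str
    risk_score: float   # internal: must never reach the support tool
    fraud_notes: str    # internal: must never reach the support tool

def customer_view(customer: Customer, client_role: str) -> dict:
    """Expose only what the requesting client is allowed to see."""
    view = {
        "id": customer.id,
        "name": customer.name,
        "address": customer.address,
        "order_status": customer.order_status,
    }
    if client_role == "risk-management":  # fine-grained, per-field rule
        view["risk_score"] = customer.risk_score
        view["fraud_notes"] = customer.fraud_notes
    return view
```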
No easy way to protect invariants
Invariants are things we assume will always be true in our code. That may be the assumption that a certain combination of data cannot occur, or a convention that makes our software simpler, e.g. "order numbers for express orders start with a 1". When sharing database tables, your only chance of enforcing invariants is when your database schema can express them. While this may work for simple rules like "this is a required field, so I will declare it NOT NULL in my DB schema", it falls apart quickly as business rules require checks like "deliveries of 50 kg or more can only be picked up on Tuesdays or Thursdays".
A proper API protects invariants and thus data integrity by not allowing data to be modified unless all invariants are met.
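A minimal sketch of what enforcing the delivery rule above could look like at the API boundary (plain Python; function and constant names are illustrative):

```python
# Sketch: enforcing a business invariant the database schema can't express.
from datetime import date

TUESDAY, THURSDAY = 1, 3  # date.weekday(): Monday == 0

def validate_pickup(weight_kg: float, pickup_date: date) -> None:
    """Reject any state change that would violate the invariant."""
    if weight_kg >= 50 and pickup_date.weekday() not in (TUESDAY, THURSDAY):
        raise ValueError(
            "Deliveries of 50 kg or more can only be picked up "
            "on Tuesdays or Thursdays."
        )

# Every write path in the owning system runs this check before anything
# is persisted - API consumers simply cannot bypass it.
validate_pickup(49.0, date(2026, 1, 5))  # Monday, light parcel: fine
validate_pickup(50.0, date(2026, 1, 6))  # Tuesday, heavy parcel: fine
```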
Wait, why is that value different now?
Tables only tell you what the current value is, not why it changed. When systems integrate by writing the same columns, you get pure CRUD (Create, Read, Update, Delete) without any context. Example: a credit_limit changes from 5,000 to 500. Was it a customer request, an automated policy rule, a fraud response, or a support agent mistake? The table update doesn’t say. Audits become guesswork, debugging turns into log archaeology, and teams start adding “notes” columns and special flags.
A proper integration approach carries these semantics. I would much prefer a Domain Event like "CreditLimitReducedDueToRiskReview". That way our system can tell who triggered the change and what rule applied, and we get a traceable event history - not just an overwritten number.
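A sketch of what such an event could look like - the event name is from the example above, the fields are illustrative:

```python
# Sketch: a domain event that carries the "why", not just the new value.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CreditLimitReducedDueToRiskReview:
    customer_id: int
    old_limit: int         # e.g. 5_000
    new_limit: int         # e.g. 500
    triggered_by: str      # "risk-engine", "support-agent:jane", ...
    applied_rule: str      # which policy rule fired
    occurred_at: datetime

# Consumers append these events to a log instead of overwriting a column,
# so audits can replay who changed what, when, and why.
```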
Please come to your own conclusions
Sharing the same tables also pushes teams to store only "raw facts" and avoid derived values, because they want to avoid perceived duplication. Sometimes it may not even be clear who owns the calculations. That's when every system re-derives things on its own.
For example, systems may share a transactions table for deposits and withdrawals, but when they need the account balance, they have to derive it themselves, duplicating the logic. And they don't necessarily agree: one app calculates the "account balance" by subtracting pending payments; another ignores pending payments; a third rounds cents differently. All of them read the same transactions table, but each shows a different balance to the customer. The table may be consistent, yet the business numbers aren't - and you get disputes, manual fixes, and lots of "why doesn't it match?" meetings.
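The following sketch shows this divergence in miniature - three balance calculations over the same rows, with illustrative data:

```python
# Sketch: three systems deriving "the" balance from the same shared
# transactions table - and disagreeing. All names and rows are made up.
from decimal import Decimal, ROUND_HALF_UP

transactions = [  # rows as each system reads them from the shared table
    {"amount": Decimal("100.005"), "status": "settled"},
    {"amount": Decimal("-20.00"), "status": "pending"},
]

def balance_app_a(rows):  # subtracts pending payments
    return sum(r["amount"] for r in rows)

def balance_app_b(rows):  # ignores pending payments
    return sum(r["amount"] for r in rows if r["status"] == "settled")

def balance_app_c(rows):  # rounds each row to cents first
    return sum(r["amount"].quantize(Decimal("0.01"), ROUND_HALF_UP)
               for r in rows)

# Same table, three different "account balances" shown to the customer:
print(balance_app_a(transactions))  # 80.005
print(balance_app_b(transactions))  # 100.005
print(balance_app_c(transactions))  # 80.01
```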
Proper APIs allow you to offer derived values without necessarily having to store them in a database. They can also offer different calculations for different kinds of clients, depending on their requirements. Best of all, however, they allow a single system to "own" a piece of data and/or calculation and to be the source of truth for what a data model means, by also supplying documentation and interpretation of the stored data.
Tightly coupled
That leads us to the ultimate drawback of sharing database tables amongst distributed systems - and this issue applies to any shared data structure. While some of the aforementioned drawbacks may be tolerable, or you may find ways to work around them, integrating through shared data structures has one huge cost: you introduce uncontrolled tight coupling.
Systems exhibit tight coupling when a change in one system necessitates a change in another, unrelated system. In the earlier example of the derived account balance, we encountered a subtle way in which shared data storage pushes us to re-implement the same business logic over and over again. Not only does that mean a lot of duplicated effort; it also makes things hard to change, because when they change, they need to change everywhere.
And even when our system does not care how something is calculated, decided, or derived, we still have to know about all the implementation details of the originating system, because that's all we get to work with. Since changes like these can break any system, companies may at some point introduce strict governance and only allow changes when every system that's even remotely connected to the shared model agrees. We're one step short of re-discovering the Canonical Data Model Anti-Pattern.
No Information Hiding
In other words: our API lacks one of the most important things a proper API can offer us: Information Hiding. As a consuming system, we don't want to be bound to implementation details that we don't care about. These should be hidden from us, so that when they ultimately change, we don't even notice that the calculation logic has changed. That's the kind of loose coupling we would like for modular, independent systems. And the data model used to persist data is just that - an implementation detail.
When we introduce APIs, it is unfortunately not sufficient to expose our data structures using a different format and protocol (such as JSON over HTTP). Standard protocols and formats only hide the implementation language of a system. We are less coupled to a certain programming language or technology, but still tightly coupled to the implementation details of the data model.
We actually need to consciously choose what data we expose to other systems and map only those details into a specifically created API data structure. And that's where even well-designed, distributed systems that don't share a database often fail, by cutting a very convenient corner. There is a very subtle way of sharing database structures for systems integration that has been gaining momentum over the past few years: using isomorphic projections[2] to generate APIs.
Generative approaches promise speed and convenience by cutting down on repetitive and boring tasks, such as mapping data objects. They are especially popular with rapid application development (RAD) frameworks such as Django for Python or Symfony for PHP. In RAD you routinely generate your DB schema from an object model (e.g. with the Django ORM).
It is tempting to use a similar mechanism to generate your API from the same model as well. It's actually a popular feature of the Django REST Framework. But beware! While these generative approaches don't share most of the drawbacks of the naïve database integration approach outlined above, they still share the biggest one: they expose the underlying data model of your application to other systems. By doing this you break one of the most important design principles for loosely coupled, modular systems: Information Hiding.
There is nothing wrong with using isomorphisms to cut down on repetitive, boring code. But you should choose what gets mapped and what does not. One convenient way of doing this is to model your API structures (some people would call these objects DTOs) separately and then use some "automagic" to populate them, as sketched below. But how do you know what you should expose and what should be hidden? Well, talk to your API consumers!
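Here's a minimal sketch of that approach in plain Python - a deliberately separate DTO plus a small bit of "automagic" to populate it. In Django REST Framework terms, this corresponds to writing an explicit Serializer instead of generating one from the model with fields = "__all__". All names are illustrative:

```python
# Sketch: a separately modelled API structure (DTO) populated from the
# persistence model, exposing only what consumers asked for.
from dataclasses import dataclass, fields

@dataclass
class CustomerRecord:    # persistence model - stays an internal detail
    id: int
    name: str
    address: str
    order_status: str
    risk_score: float    # internal: must not leak into the API
    fraud_notes: str     # internal: must not leak into the API

@dataclass
class CustomerDto:       # the API contract, modelled on purpose
    id: int
    name: str
    address: str
    order_status: str

def to_dto(record: CustomerRecord) -> CustomerDto:
    """A pinch of 'automagic': copy exactly the fields the DTO declares."""
    return CustomerDto(**{f.name: getattr(record, f.name)
                          for f in fields(CustomerDto)})
```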
Summary
Database integration is a high-risk integration strategy, especially when the independent evolution of systems matters. In well-designed distributed systems, data flows through systems and is rarely centralised.
So while there may be methods and ways of sharing data structures that are absolutely legitimate, this should not be your first or only integration strategy. Use caution. Mind the coupling you create by sharing data structures, and don't expose everything just because you don't want to think about API design trade-offs or because it's a convenient feature of your framework.
If you don't consciously choose what data to expose in your API, chances are you are giving away more than you should. In the words of Alberto Brandolini: "No, you don't need to access my data!" - and that's probably true for any distributed system where more than one team is working on it or modularity is paramount.
While actual APIs come with costs of their own (namely versioning, mapping logic, dealing with failures, and choosing the right consistency guarantees, to name just a few), these costs are well worth the independence that comes with well-defined boundaries, clear ownership, and explicit contracts.
-
Something that’s not guaranteed to last forever. As I am writing this in early 2026 we are experiencing a price hike for DRAM triggered by massive AI data centre buildout. ↩︎
-
That's a fancy mathematical way of saying you can map things both ways: you can derive the API data structure from the application data model, and the application data model from the API structure. They are automatically, and often implicitly, mapped by a computer. ↩︎