Why You Should Avoid a Canonical Data Model

In recent times, I’ve been involved in a few architecture projects on the enterprise level again. If you’ve never been in that world, i.e. if you’ve been focusing on individual systems so far, let me give you the basic gist of what this kind of environment is like: There are lots of meetings, more meetings, and even more meetings; there’s an abundance of slide decks, packed with text and diagrams – none of that Presentation Zen nonsense, please. There are conceptual architecture frameworks, showing different perspectives, there are guidelines and reference architectures, enterprise-wide layering approaches, a little bit of SOA and EAI and ESB and Portals and (lately) API talk thrown in for good measure. Vendors and system integrators and (of course) consultants all see their chance to exert influence on strategic decisions, making their products or themselves an integral part of the company’s future strategy. It can be a very frustrating, but (at least sometimes) also very rewarding experience: Those wheels are very big and really hard to turn, but if you manage to turn them, the effect is significant.

It’s also amazing to see how many of the things that cause problems when building large systems are repeated on the enterprise level. (We don’t often make mistakes … but if we do, we make them big!) My favorite one is the idea of establishing a canonical data model (CDM) for all of your interfaces.

If you haven’t heard of this idea before, a quick summary is: Whatever kind of technology you’re using (an ESB, a BPM platform, or just some assembly of services of some kind), you standardize the data models of the business objects you exchange. In its extreme (and very common) form, you end up having just one kind of Person, Customer, Order, Product, etc., with a set of IDs, attributes, and associations everyone can agree on. It isn’t hard to understand how that might seem a very compelling thing to attempt: After all, even a non-technical manager will understand that the conversion from one data model to another whenever systems need to talk to each other is a complete waste of time. It’s obviously a good idea to standardize. Then, anyone who happens to have a model that differs from the canonical one will have to implement a conversion to and from it just once, new systems can just use the CDM directly, and everyone will be able to communicate without further ado!

In fact, it’s a horrible, horrible idea. Don’t do it.

In his book on Domain-Driven Design, Eric Evans gave a name to a concept that is obvious to anyone who has actually successfully built a larger system: The Bounded Context. This is a structuring mechanism that avoids having a single huge model for all of your application, simply because that (a) becomes unmanageable and (b) makes no sense to begin with. It recognizes that a Person or a Contract is a different thing in different contexts on a conceptual level. This is not an implementation problem – it’s reality.
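
To make that a bit more concrete, here is a minimal sketch of two bounded contexts that both talk about a "customer". The contexts (billing and shipping), the class names, and all attributes are made up for illustration; they are assumptions, not something taken from Evans' book or from any real system. The point is only that each context keeps the model it needs, and that the translation between them is explicit and small.

```python
from dataclasses import dataclass


# Hypothetical billing context: a customer is someone who pays invoices.
@dataclass(frozen=True)
class BillingCustomer:
    customer_id: str
    invoice_address: str
    payment_terms_days: int


# Hypothetical shipping context: a customer is a destination for parcels.
@dataclass(frozen=True)
class ShippingCustomer:
    customer_id: str
    delivery_address: str
    delivery_instructions: str


def to_shipping(
    billing: BillingCustomer,
    delivery_address: str,
    delivery_instructions: str = "",
) -> ShippingCustomer:
    """Translate at the boundary between the two contexts.

    Only the shared identifier crosses over; everything else is supplied
    by the shipping context itself.
    """
    return ShippingCustomer(
        customer_id=billing.customer_id,
        delivery_address=delivery_address,
        delivery_instructions=delivery_instructions,
    )
```

Neither class has to carry the other's attributes, and neither team has to ask the other for permission to change its own model, as long as the shared identifier stays stable.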

If this is true for a large system – and trust me, it is – it’s infinitely more true for an enterprise-wide architecture. Of course you can argue that with a CDM, you’re only standardizing the interface layer, but that doesn’t change a thing: You’re still trying to make everyone agree what a concept means, and my point is that you should recognize that not every single system has the same needs.

But isn’t this all just pure theory? Who cares about this, anyway? The amazing thing is that organizations are excellent at generating a huge amount of work based on bad assumptions. The CDM (in the form I’ve described it here) requires coordination between all the parties that use a particular object in their interfaces (unless you trust that someone will be able to just design the right thing from scratch on their own, which you should never do). You’ll have meetings with some enterprise architect and a few representatives for specific systems, trying to agree on what a customer is. You’ll end up with something that has tons of optional attributes because everyone insisted theirs needs to be there, and with lots of things that are kind of weird because they reflect some system’s internal restrictions. Despite the fact that it’ll take you ages to agree on it, you’ll end up with a zombie interface model that will be universally hated by everyone who has to work with it.
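
As a rough illustration of where such a negotiation tends to end up, consider the following sketch. The class and every attribute in it are invented for this example (including the `legacy_branch_code` field standing in for "some system's internal restriction"); it is not a model from any real project.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CanonicalCustomer:
    """Hypothetical outcome of a CDM negotiation: almost everything is optional."""
    customer_id: str
    name: Optional[str] = None
    invoice_address: Optional[str] = None
    delivery_address: Optional[str] = None
    payment_terms_days: Optional[int] = None
    loyalty_tier: Optional[str] = None
    legacy_branch_code: Optional[str] = None  # mirrors one system's internal restriction


def populated_fields(customer: CanonicalCustomer) -> List[str]:
    """Return the names of the attributes that actually carry a value."""
    return [f.name for f in fields(customer) if getattr(customer, f.name) is not None]
```

A message produced by a billing system would typically fill in only `customer_id`, `invoice_address`, and `payment_terms_days`. Every consumer then has to guess which of the remaining attributes it can rely on, which is exactly the kind of ambiguity the CDM was supposed to remove.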

So is a CDM a universally bad idea? Yes, unless you approach it differently. In many cases, I doubt a CDM’s value in the first place, and think you are better off with a different and less intrusive kind of specification. But if you want a CDM, here are a number of things you can do to address the problems you’ll run into:

Allow for independent parts to be specified independently. If only one system is responsible for a particular part of your data model, leave it to the people who own that system to specify what it looks like canonically. Don’t make them participate in meetings. If you’re unsure whether the data model they create has a significant overlap with another group’s, it probably doesn’t.
Standardize on formats and possibly fragments of data models. Don’t try to come up with a consistent model of the world. Instead, create small building blocks. What I’m thinking of are e.g. small XML or JSON fragments, akin to microformats, that standardize small groups of attributes (I wouldn’t call them business objects).
Most importantly, don’t push your model from a central team downwards or outwards to the individual teams. Instead, it should be the teams who decide to “pull” it into their own context when they believe it provides value. It’s not you who’s doing the really important stuff (even though that’s a common delusion that’s attached to the mighty Enterprise Architect title). Collect the data models the individual teams provide in a central location, if you must, and make them easily browsable and searchable. (Think of providing a big Elasticsearch index as opposed to a central UML model.)
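
The "small fragments" idea from the list above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `PostalAddress` fragment, its attributes, the JSON helper functions, and the `Shipment` model owned by an imagined shipping team.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PostalAddress:
    """Hypothetical shared fragment: a small, agreed-upon group of attributes."""
    street: str
    postal_code: str
    city: str
    country_code: str  # assumed here to be an ISO 3166-1 alpha-2 code


def address_to_json(address: PostalAddress) -> str:
    """Serialize the fragment to a JSON object with stable key order."""
    return json.dumps(asdict(address), sort_keys=True)


def address_from_json(payload: str) -> PostalAddress:
    """Parse a JSON object produced by address_to_json."""
    return PostalAddress(**json.loads(payload))


@dataclass(frozen=True)
class Shipment:
    """Owned by a hypothetical shipping team that chose to pull in the fragment."""
    shipment_id: str
    destination: PostalAddress
    weight_kg: float
```

The shipping team reuses the address fragment because it is useful to them, while the rest of `Shipment` stays entirely under their control. A team that has no use for the fragment simply does not adopt it.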

What you actually need to do as an enterprise architect is to get out of people’s way. In many cases, a crucial ingredient to achieve this is to create as little centralization as possible. It shouldn’t be your goal to make everyone do the same thing. It should be your goal to establish a minimal set of rules that allows people to work as independently as possible. A CDM of the kind I’ve described above is the exact opposite.

Photo by Arun Clarke
