In our projects, we very often get the question, how to design messages for exchanging data with other teams over asynchronous technologies, such as Kafka.

When looking at data, it basically comes down to these archetypes:

The technical representation of domain events in an asynchronous messaging system is pretty straightforward (put the ID, the event type, and the timestamp in the payload, make it bitemporal, make it idempotent). And for business modeling, we have collaborative techniques, such as event storming.

So, in this article, I want to focus on best practices in how to build a feed with mutable entities.

Designing Compacted State Feeds

When managed data is published at a defined place for consumers to subscribe, we call it a feed.

My recommended design for exchanging mutable data with state looks like this:

It is also a good idea, to add a reference to the domain event that triggered the state change.

It is important, that the full current state is included into the message, also known as Event-Carried State Transfer. So, no context or historic messages are needed to build the current state for a consumer. If the full snapshot of the current state does not fit into the payload of the message, it is OK to add a link to retrieve the payload via a GET request.

This architecture requires a component that supports persistent streams (unlimited retention), such as Apache Kafka, HTTP-Feeds, NATS JetStream, or RabbitMQ Streams

Compaction, also known as deduplication, is a method to delete older entries for one business key. It ensures, that the state of every business entity remains on the topic at least once. This keeps the topic small and reduces the time and inconsistency for consumers that need to read the topic from the beginning.

When an entity gets deleted, we need to notify our consumers to delete it as well. We need the business key and the fact, that it has been deleted. We call this a tombstone message.

Example

In this example, the inventory is published. Whenever the quantity of an article changes, a new message is added to the feed. The business key of an article is the SKU, which is defined as subject in the header.

[{
  "specversion" : "1.0",
  "type" : "org.http-feeds.example.inventory",
  "source" : "https://example.http-feeds.org/inventory",
  "id" : "1c6b8c6e-d8d0-4a91-b51c-1f56bd04c758",
  "time" : "2021-01-01T00:00:01Z",
  "subject" : "9521234567899",
  "event" : "org.http-feeds.example.events.webshop.order",
  "data" : {
    "sku": "9521234567899",
    "updated": "2022-01-01T00:00:01Z",
    "quantity": 5
  }
},{
  "specversion" : "1.0",
  "type" : "org.http-feeds.example.inventory",
  "source" : "https://example.http-feeds.org/inventory",
  "id" : "292042fb-ab04-4653-af90-19a24032bffe",
  "time" : "2021-12-01T00:00:15Z",
  "subject" : "9521234512349",
  "event" : "org.http-feeds.example.events.logistics.nofind",
  "data" : {
    "sku": "9521234512349",
    "updated": "2022-01-01T00:00:12Z",
    "quantity": 0
  }
},{
  "specversion" : "1.0",
  "type" : "org.http-feeds.example.inventory",
  "source" : "https://example.http-feeds.org/inventory",
  "id" : "fa3e2a22-398c-4d02-ad08-9415e43178e6",
  "time" : "2021-01-01T00:00:22Z",
  "subject" : "9521234567899",
  "event" : "org.http-feeds.example.events.branch.sale",
  "data" : {
    "sku": "9521234567899",
    "updated": "2022-01-01T00:00:21Z",
    "quantity": 4
  }
},
{
  "specversion" : "1.0",
  "type" : "org.http-feeds.example.inventory",
  "source" : "https://example.http-feeds.org/inventory",
  "id" : "06b13630-e4c3-4d85-a669-ce66fc4daa75",
  "time" : "2021-12-31T00:00:01Z",
  "subject" : "9521234567899",
  "event" : "org.http-feeds.example.events.articles.unlisted",
  "method": "DELETE"
}
]

This example uses CloudEvents event format.

The first message will be deleted after the next compaction run, as a newer message with the same subject 9521234567899 was added to the feed.

The last message is a tombstone message that signals a delete event, so the previous message will also be deleted after compaction.

Alternatives and their Drawbacks

We have seen some patterns quite often, which all come with drawbacks:

Publishing only the properties of an entity which have changed, the diffs, is usually a suboptimal idea, as they require distributing business logic to the consumers for building the current state. And, they require that you need to keep your history forever.

Sometimes, we see people to call this pattern Event Sourcing. Event sourcing is an internal persistence strategy that should not be used for cross-system data sharing. Also, using Domain Events for Event Sourcing is generally not a good idea (see Domain Events vs. Event Sourcing)

Daily snapshots and real-time delta feeds seem natural and work. However, they are unnecessarily complex, as you need to pause delta loading when you process a snapshot. You might be out of sync at that period. With compacted state feeds, the separation of snapshots and delta loads is simply not needed.

CDC uses on actual databases. Not great, as we often want to abstract the logical model from the technical model to have the flexibility for changes. However, for legacy applications, this might be a feasible option.

Benefits

Feeds are the connecting asynchronous interfaces in complex software systems and the foundation of a scaled event architectures.

An overview of the benefits of compacted state feeds :

Compacted state feeds combine the benefits of full snapshots and real-time delta processing for mutable business entities while keeping the complexity moderate. It can be implemented based on available technologies, such as Kafka or HTTP endpoints.