This blog post is also available in German

What is a data product?

The term data product arises in the context of data mesh architecture.

In the data mesh approach, a team takes ownership of their analytical data. Instead of pouring all data into a single data lake to be analyzed by a central team, each team makes their own data available as data products.

A data product is a set of relevant information for the purpose of analysis. The format is initially of lesser importance.

A data product should meet the following criteria:

Data products might take the following forms:

Just one more requirement?

Data products are made available to other stakeholders. A team owns its own data, but stakeholders need this data for analyses.

Waiting until the first requirements are submitted would be one option. Considering data products right from the beginning would be another one.

This article shows the benefits of data products from the perspective of what they offer your own team.

Starting out without data products

Let’s begin here with a short story taken from real life. A project involved the provision of product data. This was to take place in multiple steps. The first step was to create a proxy that would read and transform the existing product data so that it could be made available in a new format. The next step was then to replace the existing product data sources. Ideally, the users of the product data would not notice a thing.

Rough overview of the system landscape

Interviews with subject matter and system experts were held in advance. Even a developer with experience in the previous system landscape was actively involved in the development. Hypotheses were suggested, and an attempt was made to test them with random sampling.

The go-live was divided into several steps. In every step, new questions and problems arose. The consequences: The number of questions grew. The workload required to answer these questions and carry out the necessary data research also continued to grow.

The research typically involved the following tools:

Over time, the workload just kept growing.

This tied up a growing share of the team’s time.

One particular challenge: External data was required to evaluate the priority of the discovered problems. Product data for articles with high stock levels must naturally be checked more quickly than for articles with low stock levels.
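The prioritization described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `prioritize_by_stock`, the product IDs, and the stock figures are hypothetical and not taken from the project. It also assumes that a product without a stock record counts as having no stock.

```python
def prioritize_by_stock(problem_product_ids, stock_levels):
    """Order affected products so that those with the highest stock come first.

    Assumption: products without a stock record are treated as stock 0.
    Duplicate problem reports for the same product are collapsed.
    """
    return sorted(
        set(problem_product_ids),
        # Highest stock first; the product ID breaks ties deterministically.
        key=lambda pid: (-stock_levels.get(pid, 0), pid),
    )


# Hypothetical example data: products with discovered problems and
# stock levels that would come from an external system.
problems = ["A1", "C3", "B2", "A1"]
stock_levels = {"A1": 12, "B2": 340}  # "C3" has no stock record

queue = prioritize_by_stock(problems, stock_levels)
print(queue)
```

The point of the sketch is that the ordering itself is simple. The hard part in the story was getting access to the external stock data in the first place.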

If we fail to sufficiently consider from the start how we want to analyze our data, we will pay the price later.

How data products have helped

The first data products were created in order to deal with the problems.

The first tables in Google BigQuery were built. These finally made it possible to carry out quantitative analyses. By comparing the update messages with the product data, it was possible to check whether the data was stored correctly in the product information system. It was also possible to check whether unusual updates were being sent from the “old” systems.
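The comparison of update messages with stored product data might look roughly like the following Python sketch. It is not the project's actual implementation, which ran as queries on tables in the data warehouse. All names here (`UpdateMessage`, `sequence`, `name`, `price_cents`, `find_mismatches`) and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateMessage:
    product_id: str
    sequence: int  # hypothetical ordering field; higher means newer
    name: str
    price_cents: int


def latest_updates(messages):
    """Keep only the newest update message per product."""
    latest = {}
    for msg in messages:
        current = latest.get(msg.product_id)
        if current is None or msg.sequence > current.sequence:
            latest[msg.product_id] = msg
    return latest


def find_mismatches(messages, stored_products):
    """Return (product_id, reason) pairs where stored data differs from the newest update."""
    mismatches = []
    for product_id, msg in sorted(latest_updates(messages).items()):
        stored = stored_products.get(product_id)
        if stored is None:
            mismatches.append((product_id, "missing in product information system"))
        elif (stored["name"], stored["price_cents"]) != (msg.name, msg.price_cents):
            mismatches.append((product_id, "stored values differ from latest update"))
    return mismatches


# Hypothetical sample data standing in for the two data products.
messages = [
    UpdateMessage("A1", 1, "Lamp", 1999),
    UpdateMessage("A1", 2, "Lamp", 1799),
    UpdateMessage("B2", 1, "Chair", 4999),
    UpdateMessage("C3", 1, "Table", 9999),
]
stored_products = {
    "A1": {"name": "Lamp", "price_cents": 1999},   # stale: second update not applied
    "B2": {"name": "Chair", "price_cents": 4999},  # matches the latest update
}

result = find_mismatches(messages, stored_products)
print(result)
```

Because the check covers every product rather than a random sample, it answers the quantitative question of how many records are affected, not only whether a problem exists.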

To permit more detailed analyses, product data from the “old” system was integrated in the next step. Fortunately, this data was also made available as a data product. At this point, a majority of the product data journeys could be represented.

Typical questions were:

The critical point was the quantitative comparability.

The product owner of the team now also had the ability to provide support in the form of research and analysis. Piece by piece, the first reports were built in order to carry out analyses. It was now also possible to use BI tools for this, such as Microsoft Power BI and Google Looker.

A separate article will be published about BI tools. This will also address the question of why development teams should come to grips with something like this.

Data products belong in the backlog

Maybe you were wondering as you read the last two sections: “Why wasn’t all of that done right from the beginning?”

That’s a good question. The simple answer? Everything costs time and money.

At the very start of a development project, a team is often under particularly high pressure. Talking about quality criteria and setting up scenarios? Discussing necessary reports and technical data definitions? There is often no time for all that, or at least that is the impression.

That’s why data products belong in the backlog and must be part of the refinement process. They also belong within the scope of roadmaps and in any meeting in which deliverables, target deadlines, and costs are discussed.

Tip: If no concrete requirements exist, it can be assumed that these will all arrive shortly before the target deadline.

Analysis capability promotes autonomy

Almost every organization has reports with key figures that cover varying levels of detail. Especially in large organizations, data products are the basis for data teams to provide such reports for the C-level and others. [2]

But when the topic is approached from this side, it might first give rise to a different thought. Data products are one more burden on teams. They represent yet another deliverable. This makes it important to examine the topic from the perspective of a specific team. My hypothesis is that, in the vast majority of cases, this workload will arise regardless. The question is only when and how it impacts the flow of the team.

As teams, we like to enjoy a high level of autonomy. But this also requires that we operate in line with the business goals of our organization. And how is that supposed to work if we don’t have an overview of the most important data concerning our own software product?

Summary

Every team should give thought to their data products. This should begin at the very start of the development work, even if the requirements of the stakeholders are not yet known. When a data mesh approach is followed, data products are created almost automatically. But even if this approach is not taken, a team can make use of the data product concept for its own purposes.

Data products make the following possible:

Sources

  1. Evans, E. J. & Fowler, M. (2004). Domain-driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley.

  2. Data Mesh Architecture. (2023, January). Accessed online on 4 January 2023 here