This blog post is also available in German.

Time is a valuable asset, and it is easily consumed by the analysis and preparation of data. That alone is good reason to come to grips with BI software.

This article takes an in-depth look at the topic of “data products” for teams. The article Why Your Team Needs Data Products dealt with data products more generally.

Just what are BI tools anyway?

BI stands for Business Intelligence.

It is quite a big topic, actually. Providers such as Microsoft, Amazon, IBM and others have much to offer in this space. However, I would like to focus on a specific kind of BI tool.

I will be considering software that has the following properties:

Candidates that meet these requirements:

Tools for data teams and data analysts

“But those are all tools for data teams and data analysts. They don’t have anything to do with us as a development team. We only supply the data.”

That is a typical reaction. And if it applies in your case, please don’t stop reading just yet.

There are good reasons to engage with BI tools as a development team.

Examples:

Once again, we find that “one size fits all” does not apply.

It goes without saying that solutions can be purpose-built for the analysis of data. Or existing tools and frameworks can be combined. It depends on the context whether that is a reasonable decision. And yet my hypothesis is that the use of BI software often represents the faster route.

Preliminary thoughts

A few preliminary considerations are important here when evaluating whether BI software could be of use in a given situation.

This list is just a starting point, of course.

Example with Google Looker

Let’s examine a quick example with Google Looker. This example comes from a real project, one in which all reports were implemented by a single person who was not trained as a developer. Based on these experiences, my hypothesis is that developers might need 2–3 days of familiarization time. Then the first reports will start rolling in.

Setup

In our specific setup, Google BigQuery was used as the “warehouse”. This means that all data products are stored there. Google Looker is already available as infrastructure, and procedures already exist for setting up new projects.

Of course, the starting situation is not always this good. But it is good more often than you might think.

A little info about Looker

Looker is fully web-based and works with various warehouses (e.g. Google BigQuery). It works with relational data. In my estimation, it makes a good “one-stop shop” solution for mid-sized companies. By this I mean that it may also be possible to carry out certain data transformations in Looker itself. In larger companies, it can be used for the analysis and visualization of data products.

A few positive things from the perspective of a development team:

There are also negative aspects, which may be more or less important, depending on the context.

Structure of a Looker model

This article is not intended as a how-to or tutorial, but a few basics are required in order to see how fast (or slow) the work will go.

We start by defining a model. Every definition takes place in a file, and the files can be organized into folders. In the simplest case, the model only specifies which definitions are to be taken into account. This is done with the include statement.
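
A minimal model file might look something like this. The connection name is a hypothetical placeholder, not taken from the project:

# Minimal model file: choose the warehouse connection
# and pull in all view definitions
connection: "my_bigquery_connection"

include: "/views/**/*.view"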

In the next step, the views are defined. Roughly speaking, these are the data tables we can work with. A view can simply correspond 1:1 to a table in the warehouse. But it is also possible to write larger SQL queries. For instance, multiple existing tables could be combined in this way. Careful consideration is required here. Is this a job for Looker? Or is this a job for a level prior to Looker?
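
As a sketch of the second case, a view can be based on its own SQL query via a derived table. The query below is a hypothetical example; the table and column names are assumptions based on the tables used in this article:

# Hypothetical derived table: aggregate an existing table
view: product_color_counts {
  derived_table: {
    sql: SELECT root_product_code,
                COUNT(*) AS color_count
         FROM datamarts.color_products
         GROUP BY root_product_code ;;
  }

  dimension: root_product_code {
    type: string
    sql: ${TABLE}.root_product_code ;;
  }

  dimension: color_count {
    type: number
    sql: ${TABLE}.color_count ;;
  }
}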

Views can be written completely manually. But Looker can also read existing tables. The result is then a proposed set of dimensions and measures. Dimensions are the properties. They can be displayed and filtered. Measures are generally calculations. The average of X. The total of Y.
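
For illustration, two measures might look like this. The weight column is a hypothetical example, not part of the project’s actual schema:

# Hypothetical measures: a simple count and an average
measure: product_count {
  type: count
  label: "Number of Products"
}

measure: average_weight {
  type: average
  sql: ${TABLE}.weight_in_grams ;;  # hypothetical column
  label: "Average Weight"
}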

It is also possible to store a label and a description for every field for greater clarity.

The effort involved in getting this far is relatively minimal.

include: "/views/**/*.view"

view: root_products {
  sql_table_name: `datamarts.root_products`;;
  label: "Root Products"

# Define dimensions

  dimension: root_product_code {
    primary_key: yes
    type: string
    sql: ${TABLE}.root_product_code ;;
    label: "Code"
    description: "Unique key to identify a root product"

    # Link uses the code as value and offers direct access to
    # the product search prefilled with the code.
    link: {
      label: "Open in Product Search"
      url: "https://my-web-app-url/products?filter.label={{value}}"
      icon_url: "https://my-web-app-url/assets/favicon.png" #Optional
    } 
  }

  dimension: short_description {
    type: string
    sql: ${TABLE}.short_description ;;
    label: "Short description"
    description: "Short description for the product"
  }
}

This small excerpt defines a view. It is based on a table located in the dataset datamarts.

Two dimensions are defined:

- root_product_code: the unique key, labeled “Code”, with a link that opens the product search prefilled with the code
- short_description: a short description of the product

Now we have to make this data available for analysis. This is done with explores.

A simple example:

# Define explores

explore: product_data_explore {
  view_name: root_products
  label: "Product Data"
  group_label: "Evaluate"
}

Now it is already possible to access the data via the web interface.

This has laid all the groundwork for carrying out the following activities:

From this point, other groups of people are able to work with the data. Even people who aren’t trained as developers.

Of course, it is rarely sufficient to just examine a single view. That’s why it is possible to extend the explore via joins. In our example, we have a hierarchy of products, so let’s add the next level to our explore.

To do this, we add a join definition to the explore. We simply assume that the other views have already been defined.

join: color_products {
  type: left_outer
  sql_on: ${root_products.root_product_code} = ${color_products.root_product_code} ;;
  relationship: one_to_many
}

This integrates the view color_products, which carries root_product_code as a foreign key. Now both tables are available in our explore product_data_explore.
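
For clarity, here is how this join sits inside the explore defined earlier:

explore: product_data_explore {
  view_name: root_products
  label: "Product Data"
  group_label: "Evaluate"

  # Joined view from the example above
  join: color_products {
    type: left_outer
    sql_on: ${root_products.root_product_code} = ${color_products.root_product_code} ;;
    relationship: one_to_many
  }
}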

More is always possible

A proper model requires more lines, of course. But the examples show that the syntax is not complicated. Plus, SQL can be employed in many places. If you have to make an adjustment to a dimension, for instance, this can be easily done with SQL. For example, CAST can be used to change a data type.
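
A small sketch of such an adjustment (the column name is hypothetical):

# Hypothetical example: CAST changes the data type inside a dimension
dimension: weight_as_text {
  type: string
  sql: CAST(${TABLE}.weight_in_grams AS STRING) ;;
  label: "Weight (text)"
}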

Nevertheless, this relatively small amount of work is enough to conveniently visualize the data.

Naturally, many other possibilities exist as well. It is worth taking a look at the documentation. Or find yourself an online course. There are a lot of options.

In our example, using Looker made sense because

Alternatives

This is not the place for a detailed comparison, but I would like to briefly address two alternatives.

We carried out our first steps with Power BI, before we found out who could set up a Looker project of our own for us. In contrast to Looker, the initial work is done in a graphical environment. Unfortunately, there is no proper version management. On the other hand, it’s possible to get started quickly. Unlike Looker, there is also a Windows client, which allows quite a few things to be done locally. Accessing databases is also possible, of course. But if you are interested in scaling up, a local installation is not sufficient. Still, small analyses and prototypes can be produced quickly this way.

When it comes to publishing reports or updating data in the cloud, you have to purchase the corresponding services from Microsoft.

Another tool is good old Excel. If the data is already available in the right format (e.g. in BigQuery), many analyses can actually be done with Excel. Thanks in particular to the introduction of Power Query to Microsoft Excel, there are a ton of possibilities for analysis there. And Excel also supports visualizations.

Summary

In the end, it all depends on the context. But in many cases, a development team can save time by making use of BI software. And stakeholders can be involved at an earlier stage.

What positive or negative experiences have you had? Has this article stirred up any interesting thoughts? I look forward to your comments, questions and discussions.