This blog post is also available in German.

Time is a valuable asset, and it is easily consumed by the analysis and preparation of data. That alone is good reason to come to grips with BI software.

This article takes an in-depth look at the topic of “data products” for teams. The article Why Your Team Needs Data Products dealt with data products more generally.

Just what are BI tools anyway?

BI stands for Business Intelligence.

It is quite a big topic, actually. Providers such as Microsoft, Amazon, IBM and others have much to offer in this space. However, I would like to focus on a specific kind of BI tool.

I will be considering software that has the following properties:

Candidates that meet these requirements:

Tools for data teams and data analysts

“But those are all tools for data teams and data analysts. They don’t have anything to do with us as a development team. We only supply the data.”

That is a typical reaction. And if it applies in your case, please don’t stop reading just yet.

There are good reasons to engage with BI tools as a development team.

Examples:

Once again, we find that “one size fits all” does not apply.

It goes without saying that solutions can be purpose-built for the analysis of data. Or existing tools and frameworks can be combined. It depends on the context whether that is a reasonable decision. And yet my hypothesis is that the use of BI software often represents the faster route.

Preliminary thoughts

A few preliminary considerations are important here when evaluating whether BI software could be of use in a given situation.

This list is just a starting point, of course.

Example with Google Looker

Let’s examine a quick example with Google Looker. This example comes from a real project, one in which all reports were implemented by a single person who was not trained as a developer. Based on these experiences, my hypothesis is that developers might need 2–3 days of familiarization time. Then the first reports will start rolling in.

Setup

In our specific setup, Google BigQuery was used as the “warehouse”. This means that all data products are stored there. Google Looker is already available as infrastructure, and procedures already exist for setting up new projects.

Of course, the starting situation is not always this good. But it is good more often than you might think.

A little info about Looker

Looker is fully web-based and works with various warehouses (e.g. Google BigQuery). It works with relational data. In my estimation, it makes a good “one-stop shop” solution for mid-sized companies. By this I mean that it may also be possible to carry out certain data transformations in Looker itself. In larger companies, it can be used for the analysis and visualization of data products.

A few positive things from the perspective of a development team:

There are also negative aspects, which may be more or less important, depending on the context.

Structure of a Looker model

This article is not intended as a how-to or tutorial, but a few basics are required in order to see how fast (or slow) the work will go.

We start by defining a model. Every definition takes place in a file, and the files can be organized into folders. In the simplest case, the model only specifies which definitions are to be taken into account. This is done with the include statement.
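
A minimal model file might look something like this. The connection name is a hypothetical placeholder, not taken from the project:

# Minimal model file: choose the warehouse connection
# and pull in all view definitions
connection: "my_bigquery_connection"

include: "/views/**/*.view"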

In the next step, the views are defined. Roughly speaking, these are the data tables we can work with. A view can simply correspond 1:1 to a table in the warehouse. But it is also possible to write larger SQL queries. For instance, multiple existing tables could be combined in this way. Careful consideration is required here. Is this a job for Looker? Or is this a job for a level prior to Looker?
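
As a sketch of the second case, a view can be based on its own SQL query via a derived table. The query below is a hypothetical example; the table and column names are assumptions based on the tables used in this article:

# Hypothetical derived table: aggregate an existing table
view: product_color_counts {
  derived_table: {
    sql: SELECT root_product_code,
                COUNT(*) AS color_count
         FROM datamarts.color_products
         GROUP BY root_product_code ;;
  }

  dimension: root_product_code {
    type: string
    sql: ${TABLE}.root_product_code ;;
  }

  dimension: color_count {
    type: number
    sql: ${TABLE}.color_count ;;
  }
}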

Views can be written completely manually. But Looker can also read existing tables. The result is then a proposed set of dimensions and measures. Dimensions are the properties. They can be displayed and filtered. Measures are generally calculations. The average of X. The total of Y.
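
For illustration, two measures might look like this. The weight column is a hypothetical example, not part of the project’s actual schema:

# Hypothetical measures: a simple count and an average
measure: product_count {
  type: count
  label: "Number of Products"
}

measure: average_weight {
  type: average
  sql: ${TABLE}.weight_in_grams ;;  # hypothetical column
  label: "Average Weight"
}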

It is also possible to store a label and a description for every field for greater clarity.

The effort involved in getting this far is relatively minimal.

include: "/views/**/*.view"

view: root_products {
  sql_table_name: `datamarts.root_products`;;
  label: "Root Products"

# Define dimensions

  dimension: root_product_code {
    primary_key: yes
    type: string
    sql: ${TABLE}.root_product_code ;;
    label: "Code"
    description: "Unique key to identify a root product"

    # Link uses the code as value and offers direct access to
    # the product search prefilled with the code.
    link: {
      label: "Open in Product Search"
      url: "https://my-web-app-url/products?filter.label={{value}}"
      icon_url: "https://my-web-app-url/assets/favicon.png" #Optional
    } 
  }

  dimension: short_description {
    type: string
    sql: ${TABLE}.short_description ;;
    label: "Short description"
    description: "Short description for the product"
  }
}

This small excerpt defines a view. It is based on a table located in the dataset datamarts.

Two dimensions are defined:

- root_product_code: the unique key, labeled “Code”, with a link that opens the product search prefilled with the code
- short_description: a short description of the product

Now we have to make this data available for analysis. This is done with explores.

A simple example:

# Define explores

explore: product_data_explore {
  view_name: root_products
  label: "Product Data"
  group_label: "Evaluate"
}

Now it is already possible to access the data via the web interface.

This has laid all the groundwork for carrying out the following activities:

From this point, other groups of people are able to work with the data. Even people who aren’t trained as developers.

Of course, it is rarely sufficient to just examine a single view. That’s why it is possible to extend the explore via joins. In our example, we have a hierarchy of products, so let’s add the next level to our explore.

To do this, we add a join definition to the explore. We simply assume that the other views have already been defined.

join: color_products {
  type: left_outer
  sql_on: ${root_products.root_product_code} = ${color_products.root_product_code} ;;
  relationship: one_to_many
}

This integrates the view color_products, which carries root_product_code as a foreign key. Now both tables are available in our explore product_data_explore.
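
For clarity, here is how this join sits inside the explore defined earlier:

explore: product_data_explore {
  view_name: root_products
  label: "Product Data"
  group_label: "Evaluate"

  # Joined view from the example above
  join: color_products {
    type: left_outer
    sql_on: ${root_products.root_product_code} = ${color_products.root_product_code} ;;
    relationship: one_to_many
  }
}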

More is always possible

A proper model requires more lines, of course. But the examples show that the syntax is not complicated. Plus, SQL can be employed in many places. If you have to make an adjustment to a dimension, for instance, this can be easily done with SQL. For example, CAST can be used to change a data type.
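
A small sketch of such an adjustment (the column name is hypothetical):

# Hypothetical example: CAST changes the data type inside a dimension
dimension: weight_as_text {
  type: string
  sql: CAST(${TABLE}.weight_in_grams AS STRING) ;;
  label: "Weight (text)"
}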

Nevertheless, this relatively small amount of work is enough to conveniently visualize the data.

Naturally, many other possibilities exist as well. It is worth taking a look at the documentation. Or find yourself an online course. There are a lot of options.

In our example, using Looker made sense because

Alternatives

This is not the place for a detailed comparison, but I would like to briefly address two alternatives.

We carried out our first steps with Power BI, before we found out who could set up a Looker project of our own for us. In contrast to Looker, the initial work is done in a graphical environment. Unfortunately, there is no proper version management. On the other hand, it’s possible to get started quickly. Unlike Looker, there is also a Windows client, which allows quite a few things to be done locally. Accessing databases is also possible, of course. But if you are interested in scaling up, a local installation is not sufficient. Still, small analyses and prototypes can be produced quickly this way.

When it comes to publishing reports or updating data in the cloud, you have to purchase the corresponding services from Microsoft.

Another tool is good old Excel. If the data is already available in the right format (e.g. in BigQuery), many analyses can actually be done with Excel. Thanks in particular to the introduction of Power Query to Microsoft Excel, there are a ton of possibilities for analysis there. And Excel also supports visualizations.

Summary

In the end, it all depends on the context. But in many cases, a development team can save time by making use of BI software. And stakeholders can be involved at an earlier stage.

What positive or negative experiences have you had? Has this article stirred up any interesting thoughts? I look forward to your comments, questions and discussions.