Choose Your Docker Base Image Wisely

December 6, 2019

A while ago, I ran into a problem with my application that only occurred in production but not locally. The reason we use Docker both locally and in production is to prevent exactly that from happening. What went wrong?

5 minute read

Lucas Dohmen

More than once I’ve been bitten by differing versions of libxml between my local machine and production. I have lost many hours hunting for the differences between a coworker’s machine and mine. That’s why I now use Docker and Docker Compose when working on Ruby on Rails apps. I still expect Docker to reduce those problems. But sometimes it doesn’t.

An internal application I’m currently working on communicates with an LDAP server to authenticate users. It uses the OmniAuth gem for that. I recently updated the app to Rails 6 and, after some minor tweaks, everything worked fine in my local Docker container. Jubilating, I merged the changes into master. Tiny robots swarmed round, building the Docker image and running tests on GitLab CI. After checking that all tests were green, they pushed the Docker image they had built to production. And suddenly, logging in didn’t work anymore. In the logs, OmniAuth told me that it could not talk to the LDAP server.

We use the same LDAP server both locally and in production. Assuming the bug was caused by my update to Rails 6, I wasted hours debugging OmniAuth and Rails 6. But why did this only occur in production? Locally, I could still log in and out as usual. Finally I thought: Maybe, just maybe, Docker was doing something different on my machine. I ran docker-compose build but still could not reproduce the bug locally. Then I ran docker-compose build --pull to update my base image, and finally the bug occurred locally. Without --pull, Docker reuses the base image that is already cached on the machine, so my local builds had been based on an older image than the one CI had pulled.

So I tried to contact the LDAP server from inside the Docker container with openssl s_client -connect $SERVER. OpenSSL could not negotiate a cipher. Why was that happening? The LDAP server is pretty old and only supports TLS 1.1. The Docker base image I used is based on Debian, and starting with Debian 10, the default OpenSSL configuration only accepts TLS 1.2 and newer. So I had finally found the bug. But how could I have prevented this from happening?

I asked my colleagues for help and learned that a Docker tag (like 2.6.3 in my case) is more like a Git branch than a Git tag: it can be moved to point at a newer image. Using 2.6.3 as my tag only means that I get some build of Ruby 2.6.3 on some version of Debian. Including the version of Debian in my tag (ruby:2.6.3-stretch) would have prevented the update from Debian 9 (codename "stretch") to Debian 10 (codename "buster"). So after changing the base image to ruby:2.6.3-stretch, logging in worked again in production. As soon as the LDAP server is updated, I can change it to ruby:2.6.3-buster.
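To make the difference between the two tags concrete, here is a minimal sketch of a check that flags base images whose tag does not name a Debian release. The function names and the list of codenames are my own assumptions for illustration, not part of Docker or of the official Ruby image; the check also ignores images pinned by digest.

```python
# Debian codenames mentioned in this post; extend the set as needed (assumption).
DEBIAN_CODENAMES = {"stretch", "buster"}


def base_images(dockerfile_text):
    """Return the image references named on FROM lines of a Dockerfile."""
    images = []
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0].upper() == "FROM":
            # Skip options such as --platform=... that may precede the image.
            refs = [t for t in tokens[1:] if not t.startswith("--")]
            if refs:
                images.append(refs[0])
    return images


def pins_debian_release(image):
    """True if one dash-separated part of the tag is a known Debian codename."""
    _, _, tag = image.rpartition(":")
    return any(part in DEBIAN_CODENAMES for part in tag.split("-"))


# Usage: report every base image that leaves the Debian release open.
for image in base_images("FROM ruby:2.6.3\nRUN bundle install\n"):
    if not pins_debian_release(image):
        print(f"{image}: Debian release is not pinned")
```

A check like this could run in CI before the image is built, but it only knows the naming convention of tags such as ruby:2.6.3-stretch and would need adjusting for images that follow a different scheme.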

My tag now encodes the exact version of Ruby and the major version of Debian. This will not prevent all breaking changes, though (for example, the authors could decide to no longer install an APT package that they used to include). To guard against that, I could take this further:

I could use Debian as my base image and install all required software (including Ruby) myself.
I could publish my own base image based on Debian and then use that as my base image.

I decided against both solutions, as installing a specific version of Ruby is traditionally not exactly a pleasure[1]. For now, I will stick with ruby:2.6.3-stretch and hope that there will be no breaking changes on that tag.

This problem reminded me that we always need to learn the semantics of versioning in an ecosystem we’re not familiar with. In the case of a Docker image, we need to make sure we understand what the tag of our image describes. Putting the version of each installed dependency into the tag could get cumbersome pretty quickly. An alternative way of publishing the Ruby image would be to publish a ruby2.6.3 image and use the tags for semantic versioning (with an OS update being a breaking change).
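As a rough sketch of how that alternative scheme would behave: if the tags of a ruby2.6.3 image followed semantic versioning, the switch from Debian 9 to Debian 10 would show up as a new major version, and a consumer could detect it before upgrading. The version numbers and function names below are invented for illustration; no such image exists as far as I know.

```python
def parse_version(tag):
    """Split a 'MAJOR.MINOR.PATCH' tag into a tuple of integers."""
    major, minor, patch = (int(part) for part in tag.split("."))
    return major, minor, patch


def is_breaking_update(current_tag, new_tag):
    """Under semantic versioning, a higher major version signals a breaking change."""
    return parse_version(new_tag)[0] > parse_version(current_tag)[0]


# Hypothetical tags: 1.x built on Debian 9, 2.x built on Debian 10.
if is_breaking_update("1.4.2", "2.0.0"):
    print("Major version changed: review the release notes before upgrading")
```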

Thanks to Joachim Praetorius, Lars Hupel, Martin Kühl, Niko Will, Michael Schürig, Martin Eigenbrodt and Bascht for their help and advice.

[1] At the time of writing, Ruby's stable release is 2.6.5 while Debian Buster still ships 2.5.5 (which is not even the current version of the 2.5 branch). This means that installing a current version of Ruby requires compiling it manually or using one of the Ruby installers. This is of course not a Ruby-specific problem; Node.js is in a similar situation.
