It all started with an innocuous snippet of Idris posted on Twitter. But before we get into details, we need to talk about format strings and first-class types.
Format strings are an archetypal example demonstrating dependent typing. It’s a simple but telling use case. In most languages we program in we don’t have access to first-class types. As a result, we have to rely on various tools to help us program with format strings. Those tools may be embedded in our compilers, IDEs, or other analyzers. In any case, they are not intrinsic to the language.
Here, for instance, a C compiler is warning us of an invalid call to
That error got caught at compile time. Unfortunately, sometimes the feedback loop is not as tight. Let’s consider this piece of Java interacting with a SQL database.
The code is invalid: there’s no fourth argument, and the two which remain are never set. In spite of errors, the compiler accepts the file without complaints. Unless our IDE warns us earlier, we have to count on feedback from our test suite or static code analysers like SonarJava.
In a language with first-class types the situation changes.
Types and the number of arguments to a function like
printf can be determined at compile time based on its first parameter: the format string.
Idris is a language with first-class types. In fact, the format string problem appears as an example in Edwin Brady’s Type-Driven Development with Idris. I can’t recommend this book enough; get a copy if you have a chance.
The first five chapters of the book offer a gentle and unassuming introduction to the language. In chapter six the author leads us through a process of defining a
function which will fail to compile unless arguments match the specification in the format string. We can experiment with the
printf in the REPL:
What’s important to note is that the rules you see above are not embedded into the language, the standard library, or the compiler. We specify them ourselves by using Idris' type system. How? Let’s take a look, loosely following the solution from the book.
The implementation begins with a definition of a type representing our format string. We will limit ourselves to formats with
We represent a format string
"%d is a %s" as a value
Number (Lit " is a " (Str End)).
- it begins with a number passed as an argument to
- then it continues with a literal constant string
" is a ",
- then it expects a string passed as another argument to
- it ends.
Format like the one above we want to compute the type of our
printf function. We need a function for this.
We’ll call it
PrintfType, and its implementation might look as follows.
If the format is a
Number, expect an
Int argument and continue computing the type. If it’s a
Str, expect a
String argument. If it’s a literal, no argument is necessary but we still need to continue parsing; the rest of the format might need some arguments. Finally, at the end of the format we end up with a regular
Using the REPL, we can verify that the type for our format is correct. It’s a type describing a function, taking an
Int and a
String and returning a
Now we need a way of parsing our format string into its
unpack we transform the given string into a list of characters. Such a list is much easier to process and pattern-match against than a normal string. After unpacking we parse the list using the following
Let’s go through all the top-level cases.
- If no characters are left, we’re done and return
- If the list begins with
"%d", we parse it as a
- If it begins with
"%s"— it’s a
- Otherwise, we read a literal string using the
lit keeps reading from its first argument and accumulates read characters in the second one. Upon encountering a
'%' we know we have a complete literal. Using
pack — the inverse of
unpack — we transform what we have back into a string.
We return the string in a
Lit, with the rest of parsing being handled by a call to
Let’s try it out by parsing a string into a
Format and then computing its
Now it’s time to wrap it up by declaring the type of
Can you see?
printf takes a
String as its first argument and the rest of its signature is computed from the first argument.
Let’s inspect it in the REPL!
The implementation of the
printf function remains an open topic, but we’ve solved the most interesting part of the problem. Code using our
printf will fail to compile if our arguments don’t match the format string. Using exactly the same approach we could bullet-proof our functions for building
At this point, I can’t recommend Type-Driven Development with Idris enough. In chapter six you will find a far more precise discussion of
printf’s implementation, including an alternative definition of
Having an understanding of how
printf works, we are well-equipped to attack the next problem: format strings with precision specifications, such as
We’ll also feature the aforementioned snippet from Twitter. But for now, let’s take a break, have a walk outside, and let what we saw sink in. We’ll be back soon. Until then!