Property-based Testing with “fast-check”

This article is also available in German.

The basic concept of “property-based testing” is easy enough to explain: Define a function that takes any parameters you like. Assertions may (and should) appear within the function. Now, instead of manually calling the function with various sets of arguments, you let the test framework do that for you. As developers, we can then focus on the test logic instead of the test inputs.

But as simple as that sounds in theory, practice is another story. Here we are confronted with three key questions:

How do I write meaningful assertions when I don’t have a clear idea of what my inputs are?
How does the framework know which parameters are permissible?
What is the added value compared with classic unit testing?

We will look at these questions one by one. As an example, we will use the “fast-check” library, which is written in TypeScript and can be integrated into any JavaScript test framework.

How do I write assertions?

We’ll begin with an example that needs no specific library, only standard JavaScript. Let’s assume we want to test the implementation of the new BigInt number type. Following the recipe above, we first define a function that accepts a number of parameters and returns true or false. The simplest conceivable test case is the commutativity of +:

function plusCommute(x, y) {
   return x + y === y + x;
}

Because JavaScript is dynamically typed, this function also accepts values other than BigInts. To narrow this down, we could add a few preconditions, for example using Node’s built-in assert module:

import assert from 'node:assert';

function plusCommute(x, y) {
   assert(typeof x === "bigint");
   assert(typeof y === "bigint");
   return x + y === y + x;
}

For the actual testing, we only have to call the function with different arguments:

> plusCommute(1n, 2n)
true
> plusCommute(1n, 8n) 
true
> plusCommute(-3n, 8n) 
true
> plusCommute(-3n, 0n) 
true

At this point, we can already see where this is going: We want plusCommute to be automatically called with appropriate parameters.

Some people might object that they rarely write code that merely adds two numbers. Superficially that is true, but be honest: How often have you found bugs in code that works with units of measure or currencies, for example? JavaScript in particular has a number of floating point pitfalls, because lots of code calculates with plain number values, which lets imprecision sneak in quickly.
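To make the floating point concern concrete, here is a minimal sketch of a second arithmetic property, associativity, written in the same style as plusCommute. The name plusAssociates is our own, and the inputs are hand-picked rather than generated.

```javascript
// Associativity of +, written as a property in the same style as plusCommute.
function plusAssociates(x, y, z) {
   return (x + y) + z === x + (y + z);
}

// BigInt arithmetic is exact, so the property holds.
console.log(plusAssociates(1n, 2n, 3n)); // true

// Floating point addition rounds after every step, so it can fail:
// (0.1 + 0.2) + 0.3 is 0.6000000000000001, but 0.1 + (0.2 + 0.3) is 0.6.
console.log(plusAssociates(0.1, 0.2, 0.3)); // false
```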

Fortunately, John Hughes, one of the minds behind the concept of property-based testing, has written an article that offers help with testing even complicated code [1]. Hughes divides the “properties” into five classes: Postconditions, invariants, metamorphic properties, inductive properties and model-based properties. While some of these classes are easy to explain, others are more difficult. Let’s get started with another example: sorting arrays.

Postconditions

The simplest property is the postcondition of a sorting function, which must return a sorted array. In JavaScript, we can express this as follows:

function isSorted(array) {
   for (let i = 0; i < array.length - 1; ++i)
      if (array[i] > array[i+1]) 
         return false;
   return true; 
}

function sortIsSorted(array) { 
   array.sort();
   return isSorted(array); 
}

This function should always return true. Not impressed? Don’t dismiss it so quickly: the built-in sorting function of the JDK did in fact have problems, which were eventually found with a verification tool [2].

By the way, the function isSorted can itself be tested, for example by checking that it returns false for an array that has been sorted and then reversed:

function isSortedNotReverse(array) { 
   array.sort().reverse();
   return !isSorted(array);
}

Bonus points to anyone who can guess when this property does not apply, in other words when a sorted and then reversed array still counts as sorted.

Invariants

Let’s move on to the second class, the invariants. What if the sort() function always produced an empty array? An empty array is certainly sorted, but such a routine would not be of much help. The invariant here states that the array contains the same elements after sorting as before. Expressing this takes slightly more effort, because we first have to make a copy of the array:

function sortKeepsElements(inputArray) {
   const outputArray = [...inputArray];
   outputArray.sort();
   return inputArray.every(elem => outputArray.includes(elem)); 
}

To be precise, we should also check whether additional elements have found their way into the output array and whether repeated elements still occur the same number of times. For “standard” sorting algorithms, we would then be done: there would be no need to test other properties, since the postcondition and the invariant fully specify sorting. But if we also wanted to check the stability of a sorting method (which ECMAScript has required of Array.prototype.sort since ES2019), we would additionally have to pass a special comparator to the sorting function. This property also falls under the category of “invariant”, because stability means that elements that compare as equal keep the relative order they had in the input [3].
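Both refinements can be sketched as plain JavaScript properties. The sketch assumes arrays of numbers, and the names countOccurrences, sortKeepsElementsExactly and sortIsStable are our own. The first property compares the elements as a multiset, i.e. including how often each value occurs. The second tags every key with its original position and passes a comparator that looks only at the key, so a stable sort must keep equal keys in ascending position order.

```javascript
// Counts how often each value occurs, i.e. treats the array as a multiset.
function countOccurrences(array) {
   const counts = new Map();
   for (const elem of array) {
      counts.set(elem, (counts.get(elem) ?? 0) + 1);
   }
   return counts;
}

// Stricter invariant: no elements lost or added, each value keeps its multiplicity.
function sortKeepsElementsExactly(inputArray) {
   const outputArray = [...inputArray];
   outputArray.sort();
   const before = countOccurrences(inputArray);
   const after = countOccurrences(outputArray);
   if (before.size !== after.size) return false;
   for (const [elem, count] of before) {
      if (after.get(elem) !== count) return false;
   }
   return true;
}

// Stability: tag each key with its original position, sort by key only,
// then check that equal keys still appear in ascending position order.
function sortIsStable(keys) {
   const tagged = keys.map((key, position) => ({ key, position }));
   tagged.sort((a, b) => a.key - b.key);
   for (let i = 0; i < tagged.length - 1; ++i) {
      const current = tagged[i];
      const next = tagged[i + 1];
      if (current.key === next.key && current.position > next.position) return false;
   }
   return true;
}

console.log(sortKeepsElementsExactly([3, 1, 3, 2])); // true
console.log(sortIsStable([2, 1, 2, 1, 0]));          // true
```

sortKeepsElementsExactly deliberately keeps the plain sort() call from the article: whatever order the default comparison produces, it never loses or duplicates elements.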

Metamorphic properties

The third class consists of “metamorphic properties”. Hughes describes these as cases where “related calls return related results”. The commutativity and associativity of addition and multiplication are such properties, for example. These tests are useful when you do not know directly what result a function will produce, but you can relate it to the result of another call. The function isSortedNotReverse defined above also describes such a metamorphic property.
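A metamorphic property for sorting could look like the following sketch (our own example, assuming arrays of integers and an explicit numeric comparator): sorting a reversed copy must give the same result as sorting the original, even though neither result is checked directly.

```javascript
// Metamorphic property: reordering the input must not change the sorted output.
function sortIgnoresInputOrder(array) {
   const byNumber = (a, b) => a - b;
   const sortedOriginal = [...array].sort(byNumber);
   const sortedReversed = [...array].reverse().sort(byNumber);
   return sortedOriginal.length === sortedReversed.length &&
      sortedOriginal.every((elem, i) => elem === sortedReversed[i]);
}

console.log(sortIgnoresInputOrder([3, -1, 2, 2])); // true
```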

Another variant of this are “equivalence” tests: We often face the situation that some functionality has one clearly correct but slow implementation plus an alternative implementation that is efficient but complicated. In this case, we can simply write a test that compares the two implementations with each other. For example, bubble sort satisfies all the properties of a stable sorting function, but no one would seriously consider using it in production software.
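An equivalence test along these lines might look as follows. This is a sketch with our own names (bubbleSort, sortMatchesBubbleSort), again assuming arrays of numbers and an explicit numeric comparator: the slow bubble sort serves as the obviously correct reference, and the property checks that the built-in sort agrees with it.

```javascript
// Slow but obviously correct reference implementation: a stable bubble sort.
function bubbleSort(input, compare) {
   const array = [...input];
   for (let end = array.length - 1; end > 0; --end) {
      for (let i = 0; i < end; ++i) {
         // Swap only if strictly greater, so equal elements keep their order.
         if (compare(array[i], array[i + 1]) > 0) {
            [array[i], array[i + 1]] = [array[i + 1], array[i]];
         }
      }
   }
   return array;
}

// Equivalence property: the efficient built-in sort agrees with the reference.
function sortMatchesBubbleSort(input) {
   const compare = (a, b) => a - b;
   const expected = bubbleSort(input, compare);
   const actual = [...input].sort(compare);
   return expected.length === actual.length &&
      expected.every((elem, i) => elem === actual[i]);
}

console.log(bubbleSort([3, 1, 2], (a, b) => a - b)); // [ 1, 2, 3 ]
console.log(sortMatchesBubbleSort([5, -2, 5, 0]));   // true
```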

The other two classes are unfortunately somewhat harder to explain, so we will skip them for now and turn to the second of our initial questions.

How does the framework know the parameters?

In principle, the test library needs a way to generate arguments that are appropriate for a property. The terminology varies by library, but generally this functionality is provided by a “generator”. A generator is an object (or a function) that produces a series of values of a specific type or shape, either randomly or deterministically. For example, a generator for BigInt could produce the (infinite) series of positive numbers starting with 1n. Alternatively, it could produce arbitrary numbers in random order.
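JavaScript’s own generator functions are a natural way to sketch both variants. This is an illustration only, not how fast-check implements its arbitraries; the helper names and the 64-bit range of the random variant are our own choices.

```javascript
// Deterministic generator: the infinite series 1n, 2n, 3n, ...
function* positiveBigInts() {
   for (let n = 1n; ; ++n) {
      yield n;
   }
}

// Random generator: BigInts in the signed 64-bit range, built from two
// 32-bit chunks of Math.random() (an arbitrary choice for this sketch).
function* randomBigInts() {
   while (true) {
      const high = BigInt(Math.floor(Math.random() * 2 ** 32));
      const low = BigInt(Math.floor(Math.random() * 2 ** 32));
      yield BigInt.asIntN(64, (high << 32n) | low);
   }
}

// Takes the first `count` values from a possibly infinite generator.
function take(generator, count) {
   const values = [];
   if (count <= 0) return values;
   for (const value of generator) {
      values.push(value);
      if (values.length === count) break;
   }
   return values;
}

console.log(take(positiveBigInts(), 3)); // [ 1n, 2n, 3n ]
console.log(take(randomBigInts(), 3));   // three random BigInts
```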

Most test libraries generate values randomly in order to sample the entire value range; fast-check calls its generators “arbitraries”. Especially tricky values such as 0, NaN or the empty string are often mixed in. Of course, there is a limitation: only a finite number of values can ever be generated, which is why the number of runs is configurable.
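How mixing in edge cases and limiting the number of runs fit together can be illustrated with a deliberately tiny runner. Everything here (checkProperty, the shape of doubleArbitrary, the chosen edge cases) is hypothetical and much simpler than fast-check’s real machinery, which among other things also shrinks counterexamples.

```javascript
// Hypothetical mini runner, NOT fast-check's API: tries the edge cases first,
// then random values, for at most numRuns runs in total.
function checkProperty(property, arbitrary, numRuns = 100) {
   const candidates = [...arbitrary.edgeCases];
   while (candidates.length < numRuns) {
      candidates.push(arbitrary.random());
   }
   for (const [index, value] of candidates.slice(0, numRuns).entries()) {
      if (!property(value)) {
         return { failed: true, run: index + 1, counterexample: value };
      }
   }
   return { failed: false, runs: numRuns };
}

// Hypothetical "arbitrary" for floating point numbers.
const doubleArbitrary = {
   edgeCases: [0, -0, NaN, Infinity, -Infinity, Number.MAX_VALUE],
   random: () => (Math.random() - 0.5) * 1e6,
};

// "Every number equals itself" fails for NaN, which is tried in run 3.
console.log(checkProperty((x) => x === x, doubleArbitrary));
// { failed: true, run: 3, counterexample: NaN }
```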

In many statically typed languages, such as Haskell, Scala or Java, there are test libraries that automatically select suitable generators based on the parameter types of a function. This is not available in JavaScript, where parameters carry no type information at runtime. With fast-check, therefore, the generators always have to be listed explicitly, which isn’t much more complicated.

For now though, let’s stick to our example with the numbers. In fast-check, calling the property looks as follows:

import fc from 'fast-check';

fc.assert( 
   fc.property(
      fc.bigInt(), 
      fc.bigInt(), 
      plusCommute
   ) 
);

This nesting may be unfamiliar. The outer level ensures that a failed evaluation results in an exception (fc.check, by contrast, merely returns a result object). The inner level creates a property by first specifying the generators and then the callback, which is called with the generated values. In practice, you would additionally wrap this in a test case so that it is run by a test framework such as Jest or Mocha:

import fc from 'fast-check';

test('addition should commute', () => { 
   fc.assert(
      fc.property(
         fc.bigInt(), 
         fc.bigInt(), 
         (x, y) => x + y === y + x
      )
   );
});

Such constructions are common in JavaScript test code, especially when asynchronous functions are involved. For the sorting examples, fast-check can be instructed to fill arrays with elements from another generator:

fc.assert(
   fc.property(
      fc.array(fc.integer()),
      sortIsSorted
   )
);

This test disappointingly fails with the following cryptic error message:

Property failed after 4 tests
{ seed: 1397560093, path: "3:2:4:6:5", endOnFailure: true } 
Counterexample: [[10,2]]
Shrunk 4 time(s)
Got error: Property failed by returning false

Let’s examine this in detail:

After four test runs, that is, after four generated values, the property fails.
The random seed is 1397560093; it can be used to reproduce exactly the same run. That is tremendously useful for locally debugging tests that fail in CI.
The counterexample is the array [10, 2].
The “path” and the line “Shrunk 4 time(s)” indicate that fast-check first found a “big” counterexample and subsequently shrunk it over the course of four rounds. This involves, for example, removing elements from the array one by one and checking whether the property still fails.
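The idea behind shrinking can be sketched with a naive shrinker that only drops single elements; fast-check’s real shrinkers are more elaborate and can, for example, also shrink the individual values. The function shrinkArray and the starting counterexample [5, 10, 2, 7] are our own; isSorted and sortIsSorted are repeated from above so the snippet runs on its own.

```javascript
function isSorted(array) {
   for (let i = 0; i < array.length - 1; ++i)
      if (array[i] > array[i + 1])
         return false;
   return true;
}

// The buggy property from above: sorts with the default comparison.
function sortIsSorted(array) {
   array.sort();
   return isSorted(array);
}

// Naive shrinker: keeps dropping single elements as long as the smaller
// array still makes the property fail. Copies are passed because the
// property sorts its argument in place.
function shrinkArray(counterexample, property) {
   let current = counterexample;
   let shrinks = 0;
   let progress = true;
   while (progress) {
      progress = false;
      for (let i = 0; i < current.length; ++i) {
         const candidate = [...current.slice(0, i), ...current.slice(i + 1)];
         if (!property([...candidate])) {
            current = candidate;
            shrinks += 1;
            progress = true;
            break;
         }
      }
   }
   return { counterexample: current, shrinks };
}

console.log(shrinkArray([5, 10, 2, 7], sortIsSorted));
// { counterexample: [ 10, 7 ], shrinks: 2 }
```

The result [10, 7] differs from fast-check’s [10, 2] simply because the starting point and the shrinking strategy differ.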

Pretty clever, right? In our case, however, we are initially only interested in why the sorting function does not sort correctly with the input [10, 2]. The answer can be found in [3]: We forgot to pass a comparator. The correct call would be:

array.sort((a, b) => a - b)

If the callback is missing, JavaScript converts the elements into strings and compares those, so “10” sorts before “2”. And that brings us to the final question of the article.
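The difference is easy to see directly. The snippet repeats isSorted from above and applies the corrected comparator inside sortIsSorted; it assumes arrays of plain numbers, since a - b does not yield a valid comparator result for BigInts.

```javascript
function isSorted(array) {
   for (let i = 0; i < array.length - 1; ++i)
      if (array[i] > array[i + 1])
         return false;
   return true;
}

// Corrected property: compare numerically instead of by string representation.
function sortIsSorted(array) {
   array.sort((a, b) => a - b);
   return isSorted(array);
}

console.log([10, 2].sort());                // [ 10, 2 ] because "10" < "2" as strings
console.log([10, 2].sort((a, b) => a - b)); // [ 2, 10 ]
console.log(sortIsSorted([10, 2]));         // true
```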

What is the added value?

A classic unit test might well have missed this surprising behavior, because with only a small number of hand-picked cases, sort may behave exactly as expected:

> array = [3, 2, 1]
[ 3, 2, 1 ]
> array.sort() 
[ 1, 2, 3 ]

fast-check, on the other hand, is not subject to human “confirmation bias” and happily throws arbitrary values at the function. Although this can make tests take longer in some situations, you get much better coverage of the input space.
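Why hand-picked examples are so reassuring here can be made concrete with an exhaustive check (our own illustration): as long as all elements are single digits, comparing them as strings happens to give the numeric order, so the bug stays invisible.

```javascript
function isSorted(array) {
   for (let i = 0; i < array.length - 1; ++i)
      if (array[i] > array[i + 1])
         return false;
   return true;
}

// Checks every array of length 3 whose elements are the digits 0..9.
function defaultSortWorksForDigits() {
   for (let a = 0; a < 10; ++a)
      for (let b = 0; b < 10; ++b)
         for (let c = 0; c < 10; ++c)
            if (!isSorted([a, b, c].sort()))
               return false;
   return true;
}

console.log(defaultSortWorksForDigits()); // true: 1000 arrays, no failure
console.log(isSorted([10, 2].sort()));    // false: a two-digit number exposes the bug
```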

The lack of determinism is a disadvantage, however. If the prospect of different random values in every test run is too unnerving, you can configure a fixed seed; practically all property-based testing libraries support this. For greater confidence, and to compensate for the lost randomness, you can increase the number of test cases: fast-check runs every property 100 times by default, for example, and significantly higher values are also possible.
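The principle behind a fixed seed is simply a deterministic pseudo-random number generator. The sketch below uses a variant of the well-known Mulberry32 generator (our choice for this illustration) and the hypothetical helper randomIntArrays: the same seed always yields the same inputs. In fast-check itself, such settings are passed as parameters, for example fc.assert(property, { seed: 42, numRuns: 1000 }).

```javascript
// Variant of the Mulberry32 PRNG: returns a function producing floats in [0, 1).
function mulberry32(seed) {
   let state = seed >>> 0;
   return function next() {
      state = (state + 0x6d2b79f5) >>> 0;
      let t = state;
      t = Math.imul(t ^ (t >>> 15), t | 1);
      t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
      return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
   };
}

// Hypothetical helper: `count` integer arrays (length 0..4, values -100..99).
function randomIntArrays(seed, count) {
   const random = mulberry32(seed);
   const arrays = [];
   for (let i = 0; i < count; ++i) {
      const length = Math.floor(random() * 5);
      const array = [];
      for (let j = 0; j < length; ++j) {
         array.push(Math.floor(random() * 200) - 100);
      }
      arrays.push(array);
   }
   return arrays;
}

// Same seed, same inputs: a failing run can be replayed exactly.
const first = randomIntArrays(42, 3);
const second = randomIntArrays(42, 3);
console.log(JSON.stringify(first) === JSON.stringify(second)); // true
```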

Property-based tests have another, much more important advantage, however, and it takes effect even before the actual testing phase: the mere act of formulating suitable properties forces you to structure the program logic more clearly and reveals “hidden” assumptions. What does this mean, exactly?

Properties work particularly well when the function under test has no side effects whatsoever. This pushes you to isolate program logic from interaction with surrounding systems. Even more than with unit tests, an elaborate setup becomes painfully noticeable when it has to run 100 times as often.

The alternative is to structure the software so that any side effects can be easily encapsulated. When testing a database routine, for example, it must be possible to reset the database to a fresh state quickly; transactions are useful for this. In one real project, I used an in-memory file system for code that interacted with the file system, to avoid having to constantly clean up temporary files. That way, fresh instances can be created and discarded automatically within microseconds.
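A minimal sketch of that idea, with a hypothetical three-method file system interface (exists, read, write) instead of Node’s real fs module: the code under test receives the file system as a parameter, so every run of a property can start from a fresh, empty in-memory instance. The names appendLogLine, createInMemoryFileSystem and appendKeepsAllLines are our own.

```javascript
// Hypothetical code under test: appends a line to a log file via an injected file system.
function appendLogLine(fileSystem, path, line) {
   const existing = fileSystem.exists(path) ? fileSystem.read(path) : "";
   fileSystem.write(path, existing + line + "\n");
}

// Minimal in-memory file system with the same (hypothetical) interface.
function createInMemoryFileSystem() {
   const files = new Map();
   return {
      exists: (path) => files.has(path),
      read: (path) => files.get(path),
      write: (path, content) => { files.set(path, content); },
   };
}

// Property: after appending n lines, the file contains exactly n lines.
function appendKeepsAllLines(lines) {
   const fileSystem = createInMemoryFileSystem(); // fresh state for every run
   for (const line of lines) appendLogLine(fileSystem, "app.log", line);
   const content = fileSystem.read("app.log") ?? "";
   return content.split("\n").length - 1 === lines.length;
}

console.log(appendKeepsAllLines(["start", "stop"])); // true
console.log(appendKeepsAllLines(["a\nb"]));          // false
```

The second call shows how a generator that also produces strings containing line breaks would expose a hidden assumption of this code: log lines must not contain a newline.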

In addition, property-based tests force you to examine which assumptions a feature actually requires to function correctly. Negative numbers as parameters are always a hot topic in this regard. Does your application behave properly when someone attempts to create an event with a negative duration? Or a date before 1 January 1970? In some cases, such inputs are not wanted and are excluded by preconditions. But fast-check and its peers mercilessly test such edge cases, making the entire system more robust in the process.

Summary

In 2019, David MacIver, author of the Python testing library Hypothesis, wrote an article entitled “In praise of property-based testing” [4], in which he demonstrated the advantages of this style, not unlike the article you are reading now. Much more importantly, though, he called on his readers to get started with it! Even if a system already has numerous classic tests, it is never too late to test new features with properties or to add properties to existing features [5]. You might discover opportunities for refactoring that result in software with a cleaner microarchitecture. A number of tips on how not to write test properties can be found in [6].
