Blockchain Mining with Rust

If you want to read more about our blockchain hands-on event, there is a German article about it.

The task at hand: mining a block

Mining a new block was supposed to be done by means of a proof-of-work algorithm. Let’s first look at the JSON representation of what a block looks like:

{
  "index": 2,
  "timestamp": 1524816285740,
  "proof": 17213004,
  "transactions": [],
  "previousBlockHash": "000000b642b67d8bea7cffed1ec990719a3f7837de5ef0f8ede36537e91cdc0e"
}

As you can see, a block candidate contains an index, which denotes the position in the chain, a timestamp, a list of transactions, and a reference to the hash of the previous block. The list of transactions is basically the payload of our block, but what it exactly looks like is not relevant to our problem.

In addition to all those other fields, a block candidate contains a numerical field called proof. Typically, the task of mining a new block is to find a hash of the block that starts with a specific number of leading zeros. Specifically, in our case, a block is valid if the SHA256 hash of the JSON representation (using the order of fields shown above) starts with six zeros, i.e. with 000000.

How do we find such a block? All the fields of the block are fixed, apart from proof. So the goal is to find a value for this field that leads to a hash starting with 000000 when the JSON representation of the whole block is hashed.

Representing a block in Rust

In order to solve this problem in Rust, we first need to come up with appropriate representations of a block and a transaction. Since we do not want to do the JSON serialization ourselves, we are going to use serde, serde_json, and serde_derive.

Serde is a popular serialization library for Rust, supporting multiple formats, including JSON, and providing macros to automatically derive the necessary implementations of its Serialize and Deserialize traits using Rust’s built-in #[derive] attribute.

That being said, here are the two Rust structs representing a Transaction and a Block, respectively:

extern crate serde_json;
#[macro_use]
extern crate serde_derive;

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Transaction {
    id: String,
    timestamp: u64,
    payload: String,
}

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct Block {
    index: u64,
    timestamp: u64,
    pub proof: u64,
    transactions: Vec<Transaction>,
    previous_block_hash: String,
}

The genesis block

A blockchain always needs a specific genesis block, which is the very first block in the chain. This genesis block is predefined and needs to be hardcoded into our blockchain implementation. We define a function returning the genesis block like so:

impl Block {
    pub fn genesis() -> Block {
        let transaction = Transaction {
            id: String::from("b3c973e2-db05-4eb5-9668-3e81c7389a6d"),
            payload: String::from("I am Heribert Innoq"),
            timestamp: 0,
        };

        Block {
            index: 1,
            timestamp: 0,
            proof: 1917336,
            transactions: vec![transaction],
            previous_block_hash: String::from("0"),
        }
    }
}

Validating a block candidate

In order to find a valid block, we need to be able to create a JSON representation of a block candidate. We’ll add a method to Block that does just that and delegates to the functionality provided by serde_json:

impl Block {
    pub fn to_json(&self) -> String {
        serde_json::to_string(self).unwrap()
    }
}

The JSON serialization may fail, but we want to fail miserably if that happens in our simple application, so we call unwrap on the Result returned by serde_json::to_string.

We also need to be able to compute a SHA256 hash of a block candidate’s JSON representation. To do that, we make use of the crypto-hash crate. We define a simple function that takes an immutable reference to a Block and returns a new String:

extern crate crypto_hash;
use crypto_hash::{hex_digest, Algorithm};

impl Block {
    pub fn hash(block: &Block) -> String {
        hex_digest(Algorithm::SHA256, block.to_json().as_bytes())
    }
}

Note that we are able to call to_json on our Block reference, since we defined to_json as a method above, taking &self as its first and only parameter. The hex_digest function and the Algorithm enum are imported from the crypto_hash crate.

Let’s also create a function that checks whether a given hash is valid:

impl Block {
    pub fn valid(hash: &str, prefix: &str) -> bool {
        hash.starts_with(prefix)
    }
}

The valid function expects a hash and a prefix to check for, both not of type String, which owns its content, but &str, whose content is borrowed. This is fine, as we don’t want do take ownership ourselves, or mutate the hash or prefix in any way.

Before we can use any of those functions, though, we first need to be able to create a candidate from a timestamp, a sequence of transactions, and the previous block. We define a constructor function that returns a new Block:

impl Block {

    pub fn new(timestamp: u64, transactions: Vec<Transaction>, previous_block: &Block) -> Block {
        Block {
            index: previous_block.index + 1,
            timestamp: timestamp,
            proof: 0,
            transactions: transactions,
            previous_block_hash: Self::hash(previous_block),
        }
    }
}

Unlike the transactions, which need to be owned by the new Block, the previous block is expected as an immutable reference, because we only want to read that block’s index and compute its hash.

Two ways of mining

Now that we have the basic machinery for mining a new block in place, let’s look at a few different approaches for implementing the actual mining in Rust.

The mutable mining mystery

I am a big fan of functional programming. That being said, one of the things I find so intriguing about Rust is how it helps you to deal with mutable state without going insane – the compiler is going out of its way to help you by preventing you from making certain mistakes that would lead to disastrous errors at runtime in most other languages.

That’s why the very first attempt at mining a new block that we chose to implement at our hands-on event was based on mutable state. Here is what it looks like:

impl Block {
    pub fn mine_single_threaded_mutably(block: &mut Block, prefix: &str) {
        while !Self::valid(&Self::hash(block), prefix) {
            block.proof += 1
        }
    }    
}

By writing block: &mut Block in our parameter list, we say that we want to mutably borrow a reference to a Block, which means that we are allowed to mutate it in our function. The implementation of our mining function is then based on a cornerstone of imperative programming that has fallen into oblivion among functional programmers a long time ago: the while loop. As long as the hash of the block is not valid, we keep incrementing the block’s proof. This is the simplest possible way of finding a proof leading to a valid hash. Other approaches are conceivable as well. For instance, it would be perfectly possible to choose a random value for the proof instead of incrementing it. However, randomly chosen proofs is an approach we didn’t pursue at all at our hands-on event.

When we call this function, we need to pass in a mutable reference to a Block candidate. The function will not return anything when it is done. Instead, we can inspect the mutated Block that we passed in, for example to see the proof that led to a valid hash:

let previous_block = Block::genesis();
let mut block = Block::new(1524480511, Vec::new(), &previous_block);
Block::mine_single_threaded_mutably(&mut block, "000000");
println!("{:?}", block.proof)

This will print the value 5518788 to stdout after a while.

The iterative identification inscrutability

It was pretty straightforward to implement the imperative solution to the mining problem using mutable state. At heart, though, I am still a functional programmer, so I always prefer a more functional and declarative solution that is more about the what, and less about the how. The great thing about Rust that very often, you have a choice. Using Rust’s powerful iterator library, we can write a purely functional solution like this:

impl Block {
    pub fn mine_with_iterator(block_candidate: &Block, prefix: &str) -> Block {
        (0..).map(|proof| Block {
            index: block_candidate.index,
            timestamp: block_candidate.timestamp,
            proof: proof,
            transactions: block_candidate.transactions.clone(),
            previous_block_hash: block_candidate.previous_block_hash.clone(),
        }).find(|b| Self::valid(&Self::hash(b), prefix)).unwrap()
    }
}

The signature of this function is a bit different from the previous one. This time, we want to borrow a Block reference immutably, and we return a completely new Block.

The implementation is quite interesting: Using the 0.. notation, we create a half-open range starting at 0. Range implements the Iterator trait, so that we can use all the available iterator adapters, including the map adapter. Specifically, we map from a number, which is our proof, to a new Block. Note that we clone the transactions and the previous_block_hash from the passed in block, because our new Block needs to own them.

So far, we have an infinite iterator, because it’s based on a half-open range. We consume this iterator by means of the find method, which expects a predicate, a function from &Block to bool. Our predicate checks whether the block is valid, using the functions we defined above. The find method returns an Option<Block> because in general, the predicate may never be fulfilled for any item. However, our assumption in this example application is that at some point, there will always be a proof value leading to a valid hash, so we simply unwrap the option. If the option is actually a None, our application will crash.

Beautiful benchmarking business

The functional and immutable solution looks pretty good. However, we actually had to write a lot more lines of code than in the imperative approach, mainly because in order to stay immutable, we had to create a new Block instead of mutating the existing one. This means we need to create a new Block on the heap for every proof we try, and because a Block owns the transactions and the hash of the previous block, we need to clone those from the passed in &Block.

It’s certainly interesting to see how the two approaches compare when it comes to performance – will the mutable version be a lot faster than the functional one?

I wrote a small function that helps to do some crude benchmarking. Rust nightly has built-in support for benchmarking, but we want to run on stable (1.26.2), and there are some limitations to the built-in benchmarking that make it unsuitable for our use case.

For our own benchmarking functionality, we build on the histogram crate. Here is our benchmarking function:

extern crate histogram;

use histogram::Histogram;

fn measure<F>(label: &str, closure: F) where F: Fn() -> String {
    let iters = 10;
    let mut histogram = Histogram::new();
    println!("{}:", label);
    for _ in 0..iters {
        let start = SystemTime::now();
        let mut s = closure();
        let end = SystemTime::now();
        let duration = end.duration_since(start).unwrap();
        let millis: u64 = duration.as_secs() * 1000 + duration.subsec_nanos() as u64 / 1000000;
        s.clear();
        println!("{} ms", millis);
        histogram.increment(millis).unwrap();
    }
    let mean = histogram.mean().unwrap();
    let median = histogram.percentile(50 as f64).unwrap();
    let max = histogram.maximum().unwrap();
    let min = histogram.minimum().unwrap();
    let std_dev = histogram.stddev().unwrap();
    println!("mean:\t{} ms/iter", mean);
    println!("median:\t{} ms/iter", median);
    println!("min:\t{} ms/iter", min);
    println!("max:\t{} ms/iter", max);
    println!("stddev:\t{}\n", std_dev);
}

It expects a label and a closure that can safely be executed repeatedly. It will create a new Histogram and execute the passed in closure multiple times. Right now, it’s hard-coded to 10 times, which is not ideal to get reliable statistics, but since our mining algorithm takes quite a long time, I decided not to execute it more often. Each execution of the closure is timed, and the resulting value added to the histogram. At the end, a bunch of interesting values are printed to stdout, like the mean, median, max, and min execution time, as well as the standard deviation.

Now we can write some benchmarking code to compare the two implementations:

let previous_block = Block::genesis();
measure("test_mine_single_threaded_mutably", || {
        let mut block = Block::new(1524480511, Vec::new(), &previous_block);
        Block::mine_single_threaded_mutably(&mut block, "000000");
        format!("{:?}", block.proof)
    });

measure("test_mine_with_iterator", || {
    let block = Block::new(1524480511, Vec::new(), &previous_block);
    format!("{:?}", Block::mine_with_iterator(&block, "000000").proof)
});

We use cargo run --release in order to run a release build, which makes the compiler use a lot of optimizations that are not used in a development build. Here is the result on my MacBook Pro (Retina, 13-inch, Late 2012, 2,9 GHz Intel Core i7, four cores):

test_mine_single_threaded_mutably:
mean:	10055 ms/iter
median:	10175 ms/iter
min:	9495 ms/iter
max:	10642 ms/iter
stddev:	399

test_mine_with_iterator:
mean:	9881 ms/iter
median:	10052 ms/iter
min:	9552 ms/iter
max:	10232 ms/iter
stddev:	273

I already knew that the Rust compiler is supposed to do all kinds of optimizations when using iterators. Nevertheless, I was still surprised to see that the two implementations are more or less on the same level – in fact, the functional, immutable implementation is even a tiny bit faster than the imperative one. Based on the stats provided by the OS X Activity Monitor, the memory usage is about 25% higher, which is likely due to the additional heap allocations we did in order to stay immutable.

Summary

Implementing the block mining functionality in Rust is pretty straightforward. In this blog post, we saw that, for some CPU-bound tasks, functional implementations can be just as fast as imperative ones that are based on mutable state, because the Rust compiler is remarkably good at optimizing that stuff for you.

Our mining algorithm is pretty naive. Some obvious optimizations were deliberately left out. For example, it’s possible to only serialize a block to JSON once and only replace the value of the proof field in the resulting String.

Another limitation of our two implementations is that they are both leveraging only a single core. Mining a new block, however, is a task that can probably be classified as embarrassingly parallel. In the follow-up to this blog post, we will look at three alternative implementations that utilize all the machine’s cores.

Blog-Post