Transaction Hash Tree In A Blockchain

I’m starting a mini blog series centered on blockchain. By the end of the series, we will have a simple blockchain application in Scala running on an actor-based Akka cluster. The application will follow a simplified version of the Bitcoin cryptocurrency’s operational model, including its proof-of-work consensus algorithm.

Cryptocurrency and Blockchain

Some quick background on blockchain: in 2009, Bitcoin emerged as the first decentralized cryptocurrency and took the world by storm. Besides proving that a digital currency can run without a centralized authority, it has also fascinated people (particularly in the finance and technology industries) with its simple yet effective operational model.

Cryptocurrency has also popularized the term “blockchain”, which refers to its underlying data structure and has since broadened into a class of computing that covers a wide range of applications (e.g. “smart contracts”) in different domains. Even though the way a cryptocurrency like Bitcoin works isn’t conceptually complicated, understanding it does require some basic knowledge of cryptography, particularly of PKCS (Public-Key Cryptography Standards).

Utility functions

First, a few utility functions:

import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

object Util {
  def bytesToBase64(bytes: Array[Byte]): String =
    java.util.Base64.getEncoder.encodeToString(bytes)

  def base64ToBytes(base64: String): Array[Byte] =
    java.util.Base64.getDecoder.decode(base64)

  def longToBytes(num: Long): Array[Byte] =
    java.nio.ByteBuffer.allocate(8).putLong(num).array

  def timestampToDateTime(timestamp: Long, zone: String = "UTC"): String =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").format(
        LocalDateTime.ofInstant(Instant.ofEpochMilli(timestamp), ZoneId.of(zone))
      )
}

import Util._
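
As a quick sanity check, the sketch below exercises the same JDK calls that the helpers in Util wrap. The object name UtilDemo and the sample values are illustrative assumptions and not part of the original code.

```scala
import java.nio.ByteBuffer
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter
import java.util.Base64

// Hypothetical demo object; it mirrors the helpers in Util for illustration only.
object UtilDemo {
  // ByteBuffer is big-endian by default, so 258L (0x0102) ends in bytes 1, 2.
  val longBytes: Array[Byte] = ByteBuffer.allocate(8).putLong(258L).array

  // Encoding and then decoding should give back the original bytes.
  val encoded: String = Base64.getEncoder.encodeToString(longBytes)
  val decoded: Array[Byte] = Base64.getDecoder.decode(encoded)

  // Epoch millisecond 0 rendered in UTC with the same pattern as timestampToDateTime.
  val epoch: String =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").format(
      LocalDateTime.ofInstant(Instant.ofEpochMilli(0L), ZoneId.of("UTC"))
    )
}
```

Note that longToBytes always yields 8 bytes regardless of the magnitude of the number, which keeps the byte layout of hashed fields fixed-width.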

Hashing is a critical process for integrity checks in a blockchain’s underlying data structure. We’ll use SHA-256, which has a function signature of Array[Byte] => Array[Byte]. We’ll also need some minimal cryptographic functions for creating public keys that serve as IDs for user accounts.
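
To make the properties of the hash function concrete, here is a minimal sketch using the JDK’s MessageDigest. The object name Sha256Demo, the toHex helper and the sample inputs are assumptions made for illustration.

```scala
import java.security.MessageDigest

// Hypothetical demo object, not part of the original code.
object Sha256Demo {
  def sha256(bytes: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes)

  // Render bytes as lowercase hex for easy comparison.
  def toHex(bytes: Array[Byte]): String =
    bytes.map(b => f"${b & 0xff}%02x").mkString

  // SHA-256 always produces 32 bytes, even for empty input.
  val emptyDigest: Array[Byte] = sha256(Array.empty[Byte])

  // Inputs that differ only in the case of one letter give unrelated digests.
  val digestA: Array[Byte] = sha256("blockchain".getBytes("UTF-8"))
  val digestB: Array[Byte] = sha256("Blockchain".getBytes("UTF-8"))
}
```

The fixed output size and the sensitivity to small input changes are what make the hash usable as a compact fingerprint of arbitrary content.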

Basic cryptography

import scala.util.{Try, Success, Failure}
import java.io.{OutputStreamWriter, FileOutputStream, IOException}
import java.security.spec.X509EncodedKeySpec
import java.security._
import org.apache.commons.codec.binary.Base64
import org.bouncycastle.jce.provider.BouncyCastleProvider
import org.bouncycastle.util.io.pem.{PemObject, PemWriter}

object Crypto {
  def sha256(byteArr: Array[Byte]): Array[Byte] =
    java.security.MessageDigest.getInstance("SHA-256").digest(byteArr)

  val hashFcn = sha256 _

  def generateKeyPair(keySize: Int = 4096): KeyPair = {
    val generator = java.security.KeyPairGenerator.getInstance("RSA")
    generator.initialize(keySize)
    generator.genKeyPair
  }

  def writePemFile(key: Key, description: String, filename: String): Unit = {
    val pemObject = new PemObject(description, key.getEncoded())
    val pemWriter = new PemWriter(new OutputStreamWriter(new FileOutputStream(filename)))

    try {
      pemWriter.writeObject(pemObject)
    } catch {
      case e: IOException => println(s"ERROR: IO Exception $e")
    } finally {
      pemWriter.close()
    }
  }

  def generateKeyPairPemFiles(filePrefix: String, keySize: Int = 4096): Unit = {
    Security.addProvider(new BouncyCastleProvider())

    val generator = KeyPairGenerator.getInstance("RSA")
    generator.initialize(keySize)
    val keyPair = generator.generateKeyPair()

    writePemFile(keyPair.getPublic(), "RSA PUBLIC KEY", s"${filePrefix}_public.pem")
    writePemFile(keyPair.getPrivate(), "RSA PRIVATE KEY", s"${filePrefix}_private.pem")
  }

  def publicKeyFromPemFile(keyFile: String): Option[PublicKey] = {
    Try(scala.io.Source.fromFile(keyFile)) match {
      case Success(k) =>
        val keyString = k.mkString.
          replace("-----BEGIN RSA PUBLIC KEY-----\n", "").
          replace("-----END RSA PUBLIC KEY-----\n", "")
        val keyBytes = Base64.decodeBase64(keyString)
        val publicKey = KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(keyBytes))
        Some(publicKey)
      case Failure(e) =>
        None
    }
  }

  def publicKeyToBase64(publicKey: PublicKey): String =
    bytesToBase64(publicKey.getEncoded)
}

import Crypto._

To load Base64-encoded public keys from PEM key files on a file system, we use Bouncy Castle and Apache Commons Codec. (The public keys are in X.509 encoding, as the use of X509EncodedKeySpec indicates; PKCS#8 is the corresponding encoding for private keys.) As a side note, neither of the additional packages would be needed if the key files were in DER format, which is binary and less commonly used.
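
To illustrate the side note about the binary format, the following sketch rebuilds a public key from its raw DER bytes using only the JDK. It works in memory rather than on a file so that it stays self-contained; the object name DerKeyDemo and the 2048-bit key size are assumptions chosen for the demo.

```scala
import java.security.{KeyFactory, KeyPairGenerator, PublicKey}
import java.security.spec.X509EncodedKeySpec

// Hypothetical demo object, not part of the original code.
object DerKeyDemo {
  // A 2048-bit key keeps the demo fast; the code above defaults to 4096.
  private val generator = KeyPairGenerator.getInstance("RSA")
  generator.initialize(2048)
  val original: PublicKey = generator.generateKeyPair().getPublic

  // For a public key, getEncoded returns the binary (DER) X.509 form.
  val derBytes: Array[Byte] = original.getEncoded

  // Rebuild the key straight from the DER bytes: no PEM parsing, no Base64 decoding.
  val restored: PublicKey =
    KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(derBytes))
}
```

With a key stored in a DER file, the same bytes could be obtained with java.nio.file.Files.readAllBytes instead of getEncoded.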

Transactions in a Blockchain

With some basic utility and crypto functions in place, we now create class Account, which represents a user (e.g. a transaction originator or a miner) with the user’s cryptographic public key as the account ID. The corresponding private key, which the user is expected to keep secret, is for decrypting a transaction encrypted with the public key. In our simplified model, the transactions won’t be encrypted, hence private keys won’t be used.

case class Account(key: String, name: String) {
  override def toString: String =
    s"A(${key.substring(0,2)}*${key.substring(128,132)}*, $name)"
}

object Account {
  def fromKeyFile(keyFile: String, name: String): Account =
    publicKeyFromPemFile(keyFile) match {
      case Some(key) =>
        new Account(publicKeyToBase64(key), name)
      case None =>
        throw new Exception(s"ERROR: Problem loading $keyFile!")
    }
}

Next, we create class TransactionItem that represents a single transaction.

case class TransactionItem(id: String, accountFrom: Account, accountTo: Account, amount: Long, timestamp: Long) {

  override def toString: String = {
    val datetime = timestampToDateTime(timestamp)
    s"TI(${id.substring(0, 4)}, ${accountFrom.name} -> ${accountTo.name}, ${amount}, ${datetime})"
  }
}

object TransactionItem {
  def apply(accountFrom: Account, accountTo: Account, amount: Long, timestamp: Long): TransactionItem = {
    val bytes = accountFrom.key.getBytes ++ accountTo.key.getBytes ++ longToBytes(amount) ++ longToBytes(timestamp)
    new TransactionItem(bytesToBase64(hashFcn(bytes)), accountFrom, accountTo, amount, timestamp)
  }
}

The id of TransactionItem is the hash value of the concatenated class fields in bytes. Note that the apply factory method performs the necessary hashing of the provided arguments to assemble a TransactionItem with the hash-value ID.
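
The effect of deriving the id from the field values can be seen in the sketch below. It is a simplified stand-in for TransactionItem’s apply method: the object name TxIdDemo, the plain-string keys and the explicit UTF-8 charset are assumptions made for illustration.

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest
import java.util.Base64

// Hypothetical stand-in for the id computation; account keys are plain strings here.
object TxIdDemo {
  def txId(fromKey: String, toKey: String, amount: Long, timestamp: Long): String = {
    // Concatenate the fields as bytes, in the same order as in TransactionItem.apply.
    val bytes =
      fromKey.getBytes("UTF-8") ++ toKey.getBytes("UTF-8") ++
        ByteBuffer.allocate(8).putLong(amount).array ++
        ByteBuffer.allocate(8).putLong(timestamp).array
    Base64.getEncoder.encodeToString(MessageDigest.getInstance("SHA-256").digest(bytes))
  }

  val id1: String = txId("keyA", "keyB", 1000L, 1583184983000L)
  val id2: String = txId("keyA", "keyB", 1000L, 1583184983000L) // same fields
  val id3: String = txId("keyA", "keyB", 1001L, 1583184983000L) // amount differs by 1
}
```

Identical fields always give the same id, while any change to a field gives a different one, so the id doubles as a fingerprint of the transaction content. One caveat of plain concatenation is that variable-length fields are not delimited, which a production design would likely want to address.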

Next, we define class Transactions, representing a collection of TransactionItems. The id of Transactions is just a random UUID. It could’ve been defined as a collective hash value, like TransactionItem’s id, to ensure content integrity, but we’re going to leave that to a hash-tree data structure, the Merkle tree.

import java.util.UUID.randomUUID

case class Transactions(id: String, items: Array[TransactionItem], timestamp: Long) {
  override def toString: String = {
    val datetime = timestampToDateTime(timestamp)
    s"T(${id.substring(0, 4)}, ${items.map(_.amount).sum}/${items.size}, ${datetime})"
  }
}

object Transactions {
  def apply(transactions: Array[TransactionItem], timestamp: Long): Transactions =
    new Transactions(randomUUID.toString, transactions, timestamp)
}

For illustration, we create a method that instantiates a Transactions object consisting of a random number of TransactionItems, each with a pair of random Accounts and a random amount (remember each Account needs a public key as its id).

val numOfAccounts = 5
val maxTransItems = 3
val keyPath = "/tmp/"

def randomFcn = java.util.concurrent.ThreadLocalRandom.current

def distinctRandomIntPair(lower: Int, upper: Int): List[Int] = {
  val rand1 = randomFcn.nextInt(lower, upper)
  val rand2 = randomFcn.nextInt(lower, upper)
  if (rand1 != rand2)
    List(rand1, rand2)
  else
    List(rand1, if (rand2 < upper - 1) rand2 + 1 else lower)
}

def generateTrans(
    numOfAccounts: Int,
    maxTransItems: Int,
    keyPath: String,
    keyFiles: List[String]
  ): Transactions = {

  def genTransItem: TransactionItem = {
    val idx = distinctRandomIntPair(0, numOfAccounts)
    val accountFrom = Account.fromKeyFile(s"${keyPath}${keyFiles(idx(0))}", s"User${idx(0)}")
    val accountTo = Account.fromKeyFile(s"${keyPath}${keyFiles(idx(1))}", s"User${idx(1)}")
    val amount = 1000L + randomFcn.nextInt(0, 5) * 500L

    val transItem = TransactionItem(accountFrom, accountTo, amount, System.currentTimeMillis)

    println(s"accountFrom: $accountFrom  accountTo: $accountTo")
    println(s"TransactionItem: $transItem")

    transItem
  }

  val numOfTransItems = randomFcn.nextInt(1, maxTransItems + 1)
  val transItems = Array.tabulate(numOfTransItems)(_ => genTransItem)

  Transactions(transItems, System.currentTimeMillis)
}
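
Since each transaction item needs two different accounts, it is worth checking that distinctRandomIntPair really never returns the same index twice. The sketch below copies the function from the code above and samples it repeatedly; the object name PairCheck and the allDistinct helper are assumptions added for illustration.

```scala
import java.util.concurrent.ThreadLocalRandom

// Hypothetical check object; distinctRandomIntPair is copied from the code above.
object PairCheck {
  def distinctRandomIntPair(lower: Int, upper: Int): List[Int] = {
    val rand1 = ThreadLocalRandom.current.nextInt(lower, upper)
    val rand2 = ThreadLocalRandom.current.nextInt(lower, upper)
    if (rand1 != rand2) List(rand1, rand2)
    else List(rand1, if (rand2 < upper - 1) rand2 + 1 else lower)
  }

  // True when every sampled pair holds two different values inside [lower, upper).
  def allDistinct(lower: Int, upper: Int, samples: Int): Boolean =
    (1 to samples).forall { _ =>
      val pair = distinctRandomIntPair(lower, upper)
      val a = pair(0)
      val b = pair(1)
      a != b && a >= lower && a < upper && b >= lower && b < upper
    }
}
```

The function relies on the range holding at least two values: with a single-value range such as [0, 1) the fallback wraps around to the same index, so numOfAccounts should be 2 or more.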

Generating a couple of Transactions objects with random content:

(0 until numOfAccounts).foreach(i => generateKeyPairPemFiles(s"${keyPath}account${i}"))

val keyFiles = List.tabulate(numOfAccounts)(i => s"account${i}_public.pem")

val trans1 = generateTrans(numOfAccounts, maxTransItems, keyPath, keyFiles)
// accountFrom: A(MI*kOJv*, User1)  accountTo: A(MI*rlVg*, User2)
// TransactionItem: TI(Ih+j, User1 -> User2, 2000, 2020-03-02 21:36:23)
// accountFrom: A(MI*rlVg*, User2)  accountTo: A(MI*Qpgs*, User4)
// TransactionItem: TI(3vjD, User2 -> User4, 2500, 2020-03-02 21:36:23)
// trans1: Transactions = T(dca2, 4500/2, 2020-03-02 21:36:23)

val trans2 = generateTrans(numOfAccounts, maxTransItems, keyPath, keyFiles)
// accountFrom: A(MI*3SG5*, User3)  accountTo: A(MI*kOJv*, User1)
// TransactionItem: TI(EQgn, User3 -> User1, 1000, 2020-03-02 21:36:26)
// accountFrom: A(MI*rlVg*, User2)  accountTo: A(MI*kOJv*, User1)
// TransactionItem: TI(+cR/, User2 -> User1, 2500, 2020-03-02 21:36:26)
// accountFrom: A(MI*rlVg*, User2)  accountTo: A(MI*3SG5*, User3)
// TransactionItem: TI(G1Ue, User2 -> User3, 1500, 2020-03-02 21:36:26)
// trans2: Transactions = T(7915, 5000/3, 2020-03-02 21:36:26)

Hashing transactions into a Merkle tree

Merkle trees (or hash trees) are commonly used in blockchain computing. The main purpose of using a hash tree is to guarantee the authenticity of the contents in the dataset by successively composing their hashes in a hard-to-tamper fashion while keeping the resulting data structure relatively lightweight.

In a previous blog post on Merkle tree implementation in Scala, we saw how a hash tree can be created from a collection of data. Even though the transaction collection data structure is now slightly more complex than the trivial example in the previous post, there is no added complexity in creating the hash tree. Borrowing a slight variant of the Merkle tree class from that post:

class MerkleTree(
    val hash: Array[Byte],
    val left: Option[MerkleTree] = None,
    val right: Option[MerkleTree] = None
  ) extends Serializable {

  def height: Int = {
    def loop(node: MerkleTree): Int = {
      if (!node.left.isEmpty && !node.right.isEmpty)
        math.max(loop(node.left.get), loop(node.right.get)) + 1
      else if (!node.left.isEmpty)
        loop(node.left.get) + 1
      else if(!node.right.isEmpty)
        loop(node.right.get) + 1
      else 1
    }
    loop(this)
  }

  def printNodes: Unit = {
    def printlnByLevel(t: MerkleTree): Unit = {
      for (l <- 1 to t.height) {
        loopByLevel(t, l)
        println
      }
    }
    def loopByLevel(node: MerkleTree, level: Int): Unit = {
      if (level <= 1)
        print(s"$node ")
      else {
        if (!node.left.isEmpty)
          loopByLevel(node.left.get, level - 1)
        else ()
        if (!node.right.isEmpty)
          loopByLevel(node.right.get, level - 1)
        else ()
      }
    }
    printlnByLevel(this)
  }

  override def toString: String = s"MT(${bytesToBase64(hash).substring(0, 4)})"
}

object MerkleTree {
  def apply(data: Array[Array[Byte]]): MerkleTree = {
    @scala.annotation.tailrec
    def buildTree(nodes: Array[MerkleTree]): Array[MerkleTree] = nodes match {
      case ns if ns.size <= 1 =>
        ns
      case ns =>
        val pairedNodes = ns.grouped(2).map{
          case Array(a, b) => new MerkleTree(hashFcn(a.hash ++ b.hash), Some(a), Some(b))
          case Array(a)    => new MerkleTree(hashFcn(a.hash), Some(a), None)
        }.toArray
        buildTree(pairedNodes)
    }

    if (data.isEmpty)
      new MerkleTree(hashFcn(Array.empty[Byte]))
    else {
      val nodes = data.map(byteArr => new MerkleTree(hashFcn(byteArr)))
      buildTree(nodes)(0)  // Return root of the tree
    }
  }
}

Using MerkleTree’s apply factory method, we simply supply the ids (in bytes) of the transaction items in each Transactions object we’ve created as the method argument:

val mRoot1 = MerkleTree(trans1.items.map(_.id.getBytes))
// mRoot1: MerkleTree = MT(OLsC)

mRoot1.printNodes
// MT(OLsC)
// MT(v7GL) MT(FbT3)

val mRoot2 = MerkleTree(trans2.items.map(_.id.getBytes))
// mRoot2: MerkleTree = MT(B0yx)

mRoot2.printNodes
// MT(B0yx)
// MT(o85z) MT(MFDs)
// MT(/t9G) MT(qxR3) MT(iGYW)

For transaction collection trans1, Merkle root mRoot1 is all that is needed to ensure its integrity. So is mRoot2 for trans2. For a given collection of transactions, the recursive hashing of the transaction items in the tree nodes all the way to the root node makes it mathematically difficult to tamper with the transaction content. The Merkle root along with the associated transaction collection will be kept in an immutable “block”.
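
The tamper-evidence claim can be checked with a small sketch that computes only the root hash, using the same pairing rule as MerkleTree’s apply method (an unpaired node is hashed on its own). The object name MerkleRootDemo and the sample items are assumptions made for illustration.

```scala
import java.security.MessageDigest
import scala.annotation.tailrec

// Hypothetical helper that keeps only the hashes, not the tree nodes.
object MerkleRootDemo {
  def sha256(bytes: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes)

  // Pair up hashes level by level until a single root hash remains.
  @tailrec
  private def reduceLevels(level: Vector[Array[Byte]]): Array[Byte] =
    if (level.size == 1) level.head
    else reduceLevels(
      level.grouped(2).map { g =>
        if (g.size == 2) sha256(g(0) ++ g(1)) else sha256(g(0))
      }.toVector
    )

  def root(leaves: Seq[Array[Byte]]): Array[Byte] =
    if (leaves.isEmpty) sha256(Array.empty[Byte])
    else reduceLevels(leaves.map(sha256).toVector)

  val items: Seq[Array[Byte]] = Seq("tx1", "tx2", "tx3").map(_.getBytes("UTF-8"))

  // Same collection, but with the second item altered.
  val tampered: Seq[Array[Byte]] = items.updated(1, "tx2x".getBytes("UTF-8"))
}
```

Comparing root(items) with root(tampered) shows that altering a single item changes the root, so storing the root alongside the collection is enough to detect the change.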

While the term “blockchain” has been used ad libitum throughout the post, we have not seen anything remotely resembling a “block” yet, have we? So far, we’ve only put in place some simple data structures along with a few utility/crypto functions. Nonetheless, they’re the essential elements for the building “block” of a blockchain, which we’ll dig into in the next post of this blog series.

Merkle Tree Implementation In Scala

A Merkle tree, a.k.a. hash tree, is a tree in which every leaf node contains a cryptographic hash of a dataset, and every branch node contains a hash of the concatenation of the corresponding hashes of its child nodes. Typical usage is for efficient verification of the content stored in the tree nodes.
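
One way to picture the “efficient verification” is an inclusion proof: to show that a data item belongs to the tree, only the sibling hashes along the path to the root are needed, not the whole dataset. The sketch below is an illustration under assumptions, not part of the implementation in this post; the names MerkleProofDemo, Step, proof and verify are hypothetical, and an unpaired node is hashed on its own, as in the code further down.

```scala
import java.security.MessageDigest
import scala.annotation.tailrec

// Hypothetical illustration of Merkle inclusion proofs.
object MerkleProofDemo {
  def sha256(bytes: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes)

  // One proof step: the sibling hash (None for an unpaired node) and its side.
  final case class Step(sibling: Option[Array[Byte]], siblingOnLeft: Boolean)

  private def nextLevel(level: Vector[Array[Byte]]): Vector[Array[Byte]] =
    level.grouped(2).map { g =>
      if (g.size == 2) sha256(g(0) ++ g(1)) else sha256(g(0))
    }.toVector

  def root(leaves: Seq[Array[Byte]]): Array[Byte] = {
    require(leaves.nonEmpty, "at least one leaf is required")
    @tailrec
    def loop(level: Vector[Array[Byte]]): Array[Byte] =
      if (level.size == 1) level.head else loop(nextLevel(level))
    loop(leaves.map(sha256).toVector)
  }

  // Collect the sibling hashes on the path from leaf `index` up to the root.
  def proof(leaves: Seq[Array[Byte]], index: Int): List[Step] = {
    require(index >= 0 && index < leaves.size, "index out of range")
    @tailrec
    def loop(level: Vector[Array[Byte]], i: Int, acc: List[Step]): List[Step] =
      if (level.size == 1) acc.reverse
      else {
        val siblingIdx = if (i % 2 == 0) i + 1 else i - 1
        val step = Step(level.lift(siblingIdx), siblingOnLeft = i % 2 == 1)
        loop(nextLevel(level), i / 2, step :: acc)
      }
    loop(leaves.map(sha256).toVector, index, Nil)
  }

  // Recompute the root from one leaf and its proof, then compare.
  def verify(leaf: Array[Byte], steps: List[Step], expectedRoot: Array[Byte]): Boolean = {
    val computed = steps.foldLeft(sha256(leaf)) {
      case (h, Step(Some(s), true))  => sha256(s ++ h)
      case (h, Step(Some(s), false)) => sha256(h ++ s)
      case (h, Step(None, _))        => sha256(h)
    }
    java.util.Arrays.equals(computed, expectedRoot)
  }
}
```

For a tree with n leaves the proof holds roughly log2(n) steps, which is why a verifier holding only the root can check membership cheaply.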

Blockchain and Merkle tree

As cryptocurrency (or more generally, the blockchain system) has become popular, so has its underlying authentication-oriented data structure, the Merkle tree. In the cryptocurrency world, a blockchain can be viewed as a distributed ledger consisting of immutable but chain-able blocks, each of which hosts a set of transactions in the form of a Merkle tree. In order to chain a new block to an existing blockchain, part of the tamper-proof requirement is to guarantee the integrity of the enclosed transactions by composing their hashes in a specific way and storing them in a Merkle tree.

In case the above sounds like gibberish, here’s a great introductory article about blockchain. To delve slightly deeper into it with a focus on cryptocurrency, this blockchain guide from the Bitcoin Project website might be of interest. Just to be clear, even though blockchain has helped popularize the Merkle tree, implementing a flavor of the data structure does not require knowledge of blockchain or cryptocurrency.

In this blog post, we will assemble a barebone Merkle tree using Scala. While a Merkle tree is most often a binary tree, it isn’t required to be one, although a binary tree is what we’re going to implement.

A barebone Merkle tree class

// MerkleTree class
class MerkleTree(
    val hash: Array[Byte],
    val left: Option[MerkleTree] = None,
    val right: Option[MerkleTree] = None
  )

Note that when both the class fields left and right are None, it represents a leaf node.

To build a Merkle tree from a collection of byte-arrays (which might represent a transaction dataset), we will use a companion object to perform the task via its apply method. To create a hash within each of the tree nodes, we will also need a hash function, hashFcn of type Array[Byte] => Array[Byte].

// MerkleTree companion object
object MerkleTree {
  def apply(data: Array[Array[Byte]], hashFcn: Array[Byte] => Array[Byte]): MerkleTree = {
    val nodes = data.map(byteArr => new MerkleTree(hashFcn(byteArr)))
    buildTree(nodes, hashFcn)(0)  // Return root of the tree
  }

  private def buildTree(nodes: Array[MerkleTree], hashFcn: Array[Byte] => Array[Byte]): Array[MerkleTree] = ???  // implemented in the next section
}

Building a Merkle tree

As shown in the code, function buildTree needs to recursively pair up the nodes to form a tree in which each parent node contains the combined hash of its child nodes. The recursive pairing will eventually end with the single top-level node called the Merkle root. Below is an implementation of such a function:

// Building a Merkle tree
  @scala.annotation.tailrec
  private def buildTree(
      nodes: Array[MerkleTree],
      hashFcn: Array[Byte] => Array[Byte]): Array[MerkleTree] = nodes match {

    case ns if ns.size <= 1 =>
      ns
    case ns =>
      val pairedNodes = ns.grouped(2).map{
          case Array(a, b) => new MerkleTree(hashFcn(a.hash ++ b.hash), Some(a), Some(b))
          case Array(a)    => new MerkleTree(hashFcn(a.hash), Some(a), None)
        }.toArray
      buildTree(pairedNodes, hashFcn)
  }

Now, back to class MerkleTree: let’s add a simple function for computing the height of the tree:

// Computing the height of a Merkle tree
  def height: Int = {
    def loop(node: MerkleTree): Int = {
      if (!node.left.isEmpty && !node.right.isEmpty)
        math.max(loop(node.left.get), loop(node.right.get)) + 1
      else if (!node.left.isEmpty)
        loop(node.left.get) + 1
      else if(!node.right.isEmpty)
        loop(node.right.get) + 1
      else 1
    }
    loop(this)
  }

Putting all the pieces together

For illustration purposes, we’ll add a side-effecting function printNodes along with a couple of for-display utility functions to see what our Merkle tree can do. Putting everything together, we have:

// Merkle tree class and companion object
class MerkleTree(
  val hash: Array[Byte],
  val left: Option[MerkleTree] = None,
  val right: Option[MerkleTree] = None) {

  def height: Int = {
    def loop(node: MerkleTree): Int = {
      if (!node.left.isEmpty && !node.right.isEmpty)
        math.max(loop(node.left.get), loop(node.right.get)) + 1
      else if (!node.left.isEmpty)
        loop(node.left.get) + 1
      else if(!node.right.isEmpty)
        loop(node.right.get) + 1
      else 1
    }
    loop(this)
  }

  override def toString: String = s"MerkleTree(hash = ${bytesToBase64(hash)})"
  private def toShortString: String = s"MT(${bytesToBase64(hash).substring(0, 4)})"
  private def bytesToBase64(bytes: Array[Byte]): String =
    java.util.Base64.getEncoder.encodeToString(bytes)

  def printNodes: Unit = {
    def printlnByLevel(t: MerkleTree): Unit = {
      for (l <- 1 to t.height) {
        loopByLevel(t, l)
        println
      }
    }
    def loopByLevel(node: MerkleTree, level: Int): Unit = {
      if (level <= 1)
        print(s"${node.toShortString} ")
      else {
        if (!node.left.isEmpty)
          loopByLevel(node.left.get, level - 1)
        else ()
        if (!node.right.isEmpty)
          loopByLevel(node.right.get, level - 1)
        else ()
      }
    }
    printlnByLevel(this)
  }
}

object MerkleTree {
  def apply(data: Array[Array[Byte]], hashFcn: Array[Byte] => Array[Byte]): MerkleTree = {
    @scala.annotation.tailrec
    def buildTree(nodes: Array[MerkleTree]): Array[MerkleTree] = nodes match {
      case ns if ns.size <= 1 =>
        ns
      case ns =>
        val pairedNodes = ns.grouped(2).map{
            case Array(a, b) => new MerkleTree(hashFcn(a.hash ++ b.hash), Some(a), Some(b))
            case Array(a)    => new MerkleTree(hashFcn(a.hash), Some(a), None)
          }.toArray
        buildTree(pairedNodes)
    }

    if (data.isEmpty)
      new MerkleTree(hashFcn(Array.empty[Byte]))
    else {
      val nodes = data.map(byteArr => new MerkleTree(hashFcn(byteArr)))
      buildTree(nodes)(0)  // Return root of the tree
    }
  }
}

Test building the Merkle tree with a hash function

By providing the required arguments for MerkleTree’s apply factory method, let’s create a Merkle tree with, say, 5 dummy byte-arrays using a popular hash function SHA-256. The created Merkle tree will be represented by its tree root, a.k.a. Merkle Root:

// Test building a Merkle tree
val sha256: Array[Byte] => Array[Byte] =
  byteArr => java.security.MessageDigest.getInstance("SHA-256").digest(byteArr)

val data = Array(
    Array[Byte](0, 1, 2),
    Array[Byte](3, 4, 5),
    Array[Byte](6, 7, 8),
    Array[Byte](9, 10, 11),
    Array[Byte](12, 13, 14)
  )

val mRoot = MerkleTree(data, sha256)
// mRoot: MerkleTree = MerkleTree(hash = C6OoSW1rymkivkJPBrhf9necQAbzPsq7RyZKd4ZU8hM=)

mRoot.hash
// res1: Array[Byte] = Array(11, -93, -88, 73, 109, 107, -54, 105, ...)

mRoot.printNodes
// MT(C6Oo) 
// MT(QKRB) MT(Hfri) 
// MT(J3JM) MT(d7VY) MT(9ypu) 
// MT(rksy) MT(KEhp) MT(Q4f2) MT(45qR) MT(BSpd)

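As the output shows, the 5 dummy data blocks are hashed and placed in the 5 leaf nodes. Each node’s hash is then concatenated with its sibling’s and hashed again to form the parent node; a node with no sibling (here, the fifth leaf) is simply rehashed on its own. The same pairing repeats level by level up to the root.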

For clarity, here is the same output, edited into a tree layout:

//                                        MT(C6Oo)
//                                       /        \
//                            ----------           -----------
//                          /                                  \
//                  MT(QKRB)                                    MT(Hfri)
//                /          \                                /
//               /            \                              /
//       MT(J3JM)              MT(d7VY)              MT(9ypu)
//       /     \               /     \               /
//      /       \             /       \             /
// MT(rksy)   MT(KEhp)   MT(Q4f2)   MT(45qR)   MT(BSpd)
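
To make the pairing rule concrete, here is a minimal, self-contained sketch that recomputes two parent hashes by hand, without going through the MerkleTree class. The object name SiblingHashSketch and the helper short are made up for this illustration; the hashing rule itself is the one used by buildTree.

```scala
import java.security.MessageDigest
import java.util.Base64

object SiblingHashSketch {
  // Same SHA-256 helper shape as in the test code
  val sha256: Array[Byte] => Array[Byte] =
    bytes => MessageDigest.getInstance("SHA-256").digest(bytes)

  // Hypothetical helper: first 4 Base64 characters, like the MT(...) labels
  def short(bytes: Array[Byte]): String =
    Base64.getEncoder.encodeToString(bytes).substring(0, 4)

  // Leaf hashes of the first, second and fifth dummy data blocks
  val leaf0: Array[Byte] = sha256(Array[Byte](0, 1, 2))
  val leaf1: Array[Byte] = sha256(Array[Byte](3, 4, 5))
  val leaf4: Array[Byte] = sha256(Array[Byte](12, 13, 14))

  // A parent with two children hashes the concatenation left ++ right
  val parentOfPair: Array[Byte] = sha256(leaf0 ++ leaf1)

  // A parent with a single child rehashes that child's hash on its own
  val parentOfLone: Array[Byte] = sha256(leaf4)

  def main(args: Array[String]): Unit = {
    println(s"pair parent: MT(${short(parentOfPair)})")
    println(s"lone parent: MT(${short(parentOfLone)})")
  }
}
```

Assuming the printed tree above is accurate, the two labels should correspond to the leftmost and rightmost nodes of its third level. Note that the order of concatenation matters: swapping the two children yields a different parent hash.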

Building a Merkle tree from blockchain transactions

To apply the Merkle tree to a blockchain, we’ll replace the dummy data blocks with a sequence of transactions.

First, we define a trivialized Transaction class whose transaction ID is the hash of its combined fields, computed with the same sha256 hash function:

// A trivialized Transaction class
import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

case class Transaction(id: String, from: String, to: String, amount: Long, timestamp: Long) {
  override def toString: String = {
    val ts = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").format(
        LocalDateTime.ofInstant(Instant.ofEpochMilli(timestamp), ZoneId.of("UTC"))
      )
    s"T(${id.substring(0, 4)}, ${from} -> ${to}, ${amount}, ${ts})"
  }
}

object Transaction {
  def apply(
      from: String, to: String, amount: Long, timestamp: Long, hashFcn: Array[Byte] => Array[Byte]
    ): Transaction = {
    val bytes = from.getBytes ++ to.getBytes ++ longToBytes(amount) ++ longToBytes(timestamp)
    new Transaction(bytesToBase64(hashFcn(bytes)), from, to, amount, timestamp)
  }

  private def bytesToBase64(bytes: Array[Byte]): String =
    java.util.Base64.getEncoder.encodeToString(bytes)

  private def longToBytes(num: Long) =
    java.nio.ByteBuffer.allocate(8).putLong(num).array
}

Next, we create an array of transactions:

// An array of transactions
val transactions = Array(
  Transaction("Alice", "Bob", 2500L, 1582587819175L, sha256),
  Transaction("Bob", "Carl", 4000L, 1582588700350L, sha256),
  Transaction("Carl", "Dana", 4000L, 1582591774502L, sha256)
)
// transactions: Array[Transaction] = Array(
//   T(ikSk, Alice -> Bob, 2500, 2020-02-24 23:43:39),
//   T(+EvZ, Bob -> Carl, 4000, 2020-02-24 23:58:20),
//   T(Ke8m, Carl -> Dana, 4000, 2020-02-25 00:49:34)
// )

Again, using MerkleTree’s apply factory method, we build a Merkle tree consisting of hash values of the individual transaction IDs, which in turn are hashes of their corresponding transaction content:

// Creating the Merkle root
val mRoot = MerkleTree(transactions.map(_.id.getBytes), sha256)
// mRoot: MerkleTree = MerkleTree(hash = CobcOH899Hq91uk3cXRR5As5J+ThqLacivYEZifhhfM=)

mRoot.printNodes
// MT(Cobc) 
// MT(VG1C) MT(g7oF) 
// MT(nhJS) MT(nt1U) MT(pslD)
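
Because each transaction ID is a hash of the transaction content and the root is a hash over all the IDs, changing any single field should change the Merkle root. The following self-contained sketch illustrates that. The names TamperSketch, txId, merkleRoot and rootOfIds are hypothetical helpers written for this example (they mirror the logic of Transaction.apply and buildTree but compute only the root), and the explicit UTF-8 charset is an assumption that replaces the platform-default getBytes.

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest
import java.util.Base64

object TamperSketch {
  val sha256: Array[Byte] => Array[Byte] =
    bytes => MessageDigest.getInstance("SHA-256").digest(bytes)

  // Hypothetical helper: ID derived from the fields, as in Transaction.apply
  def txId(from: String, to: String, amount: Long, timestamp: Long): String = {
    val bytes = from.getBytes("UTF-8") ++ to.getBytes("UTF-8") ++
      ByteBuffer.allocate(8).putLong(amount).array ++
      ByteBuffer.allocate(8).putLong(timestamp).array
    Base64.getEncoder.encodeToString(sha256(bytes))
  }

  // Hypothetical helper: root hash only, using the same pairing rule as buildTree
  @scala.annotation.tailrec
  def merkleRoot(hashes: Vector[Array[Byte]]): Array[Byte] =
    if (hashes.isEmpty) sha256(Array.empty[Byte])
    else if (hashes.size == 1) hashes.head
    else merkleRoot(
      hashes.grouped(2).map {
        case Seq(a, b) => sha256(a ++ b) // two siblings
        case Seq(a)    => sha256(a)      // lone node
      }.toVector
    )

  // Leaves are hashes of the ID strings' bytes, as in transactions.map(_.id.getBytes)
  def rootOfIds(ids: Seq[String]): String =
    Base64.getEncoder.encodeToString(
      merkleRoot(ids.map(id => sha256(id.getBytes("UTF-8"))).toVector)
    )

  val original: Seq[String] = Seq(
    txId("Alice", "Bob", 2500L, 1582587819175L),
    txId("Bob", "Carl", 4000L, 1582588700350L),
    txId("Carl", "Dana", 4000L, 1582591774502L)
  )

  // Same transactions, except Bob -> Carl now carries 4001 instead of 4000
  val tampered: Seq[String] =
    original.updated(1, txId("Bob", "Carl", 4001L, 1582588700350L))

  def main(args: Array[String]): Unit = {
    println(s"original root: ${rootOfIds(original)}")
    println(s"tampered root: ${rootOfIds(tampered)}")
    println(s"roots differ:  ${rootOfIds(original) != rootOfIds(tampered)}")
  }
}
```

Only the second transaction ID changes, yet the difference propagates through its parent up to the root, which is what lets a single root hash vouch for the whole transaction list.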

The Merkle root, along with the associated transactions, is kept in an immutable block. The root is also one of the elements that are collectively hashed into the block-identifying hash value, which in turn serves as the linking block ID for the next block that successfully appends to it. All this cross-hashing, coupled with the immutable block structure, makes any attempt to tamper with the blockchain content highly expensive.
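
As a rough illustration of that linking, here is a minimal sketch under stated assumptions. The BlockHeader class, its three fields and the linked check are hypothetical simplifications invented for this example; they are not the block layout of Bitcoin or of any particular blockchain, and the Merkle roots are placeholder values rather than roots of real transaction trees.

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest

object BlockLinkSketch {
  val sha256: Array[Byte] => Array[Byte] =
    bytes => MessageDigest.getInstance("SHA-256").digest(bytes)

  // Hypothetical, heavily simplified block header
  final case class BlockHeader(prevHash: Array[Byte], merkleRoot: Array[Byte], timestamp: Long) {
    // Block-identifying hash over the previous block's hash, the Merkle root and the timestamp
    def hash: Array[Byte] =
      sha256(prevHash ++ merkleRoot ++ ByteBuffer.allocate(8).putLong(timestamp).array)
  }

  // A link holds if the child's prevHash equals the parent's current hash
  def linked(parent: BlockHeader, child: BlockHeader): Boolean =
    java.util.Arrays.equals(child.prevHash, parent.hash)

  // Placeholder Merkle roots standing in for real transaction trees
  val rootA: Array[Byte] = sha256("transactions of block 1".getBytes("UTF-8"))
  val rootB: Array[Byte] = sha256("transactions of block 2".getBytes("UTF-8"))

  val first: BlockHeader  = BlockHeader(Array.fill[Byte](32)(0), rootA, 1582587819175L)
  val second: BlockHeader = BlockHeader(first.hash, rootB, 1582591774502L)

  // Altering a transaction in the first block would change its Merkle root
  val tamperedFirst: BlockHeader =
    first.copy(merkleRoot = sha256("altered transactions".getBytes("UTF-8")))

  def main(args: Array[String]): Unit = {
    println(linked(first, second))         // true
    println(linked(tamperedFirst, second)) // false
  }
}
```

Once the first block's Merkle root changes, its block hash changes too, so the second block's stored prevHash no longer matches. To hide the tampering, every later block would have to be rebuilt as well.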

Genuine Blog

A Tech Blog by Leo Cheung
