Transaction Hash Tree In A Blockchain

I’m starting a mini blog series that centers around the Blockchain topic. At the end of the series will be a simple blockchain application in Scala on an Actor-based Akka cluster. The application will in some way follow a simplified version of the Bitcoin cryptocurrency’s operational model, including its proof-of-work consensus algorithm.

Cryptocurrency and Blockchain

Some quick background info about blockchain – In 2009, Bitcoin emerged as the first decentralized cryptocurrency and took the world by storm. Besides proving to the world the possibility of running a digital currency without the need of a centralized authority, it has also fascinated people (particularly in the finance and technology industries) with its simple yet effective operational model.

Cryptocurrency has also popularized the term “blockchain” which represents its underlying data structure and has since been broadened to a computing class that covers a wide range of applications (e.g. “smart contracts”) in different domains. Even though conceptually how a cryptocurrency like Bitcoin works isn’t complicated, it does require some basic knowledge in cryptography, particularly in PKCS (public key cryptography standards).

Utility functions

First, a few utility functions:

import java.time.{Instant, LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

object Util {
  def bytesToBase64(bytes: Array[Byte]): String =
    java.util.Base64.getEncoder.encodeToString(bytes)

  def base64ToBytes(base64: String): Array[Byte] =
    java.util.Base64.getDecoder.decode(base64)

  def longToBytes(num: Long): Array[Byte] =
    java.nio.ByteBuffer.allocate(8).putLong(num).array

  def timestampToDateTime(timestamp: Long, zone: String = "UTC"): String =
    DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").format(
        LocalDateTime.ofInstant(Instant.ofEpochMilli(timestamp), ZoneId.of(zone))
      )
}

import Util._

import java.time.{Instant, LocalDateTime, ZoneId}

import java.time.format.DateTimeFormatter

object Util {

def bytesToBase64(bytes: Array[Byte]): String =

java.util.Base64.getEncoder.encodeToString(bytes)

def base64ToBytes(base64: String): Array[Byte] =

java.util.Base64.getDecoder.decode(base64)

def longToBytes(num: Long): Array[Byte] =

java.nio.ByteBuffer.allocate(8).putLong(num).array

def timestampToDateTime(timestamp: Long, zone: String = "UTC"): String =

DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").format(

LocalDateTime.ofInstant(Instant.ofEpochMilli(timestamp), ZoneId.of(zone))

)

}

import Util._

Hashing is a critical process for integrity check in blockchain’s underlying data structure. We’ll use SHA-256, which has a function signature of Array[Byte] => Array[Byte]. We’ll also need some minimal cryptographic functions for creating public keys that serve as IDs for user accounts.

Basic cryptography

import scala.util.{Try, Success, Failure}
import java.io.{OutputStreamWriter, FileOutputStream, IOException}
import java.security.spec.X509EncodedKeySpec
import java.security._
import org.apache.commons.codec.binary.Base64
import org.bouncycastle.jce.provider.BouncyCastleProvider
import org.bouncycastle.util.io.pem.{PemObject, PemWriter}

object Crypto {
  def sha256(byteArr: Array[Byte]): Array[Byte] =
    java.security.MessageDigest.getInstance("SHA-256").digest(byteArr)

  val hashFcn = sha256 _

  def generateKeyPair(keySize: Int = 4096): KeyPair = {
    val generator = java.security.KeyPairGenerator.getInstance("RSA")
    generator.initialize(keySize)
    generator.genKeyPair
  }

  def writePemFile(key: Key, description: String, filename: String): Unit = {
    val pemObject = new PemObject(description, key.getEncoded())
    val pemWriter = new PemWriter(new OutputStreamWriter(new FileOutputStream(filename)))

    try {
      pemWriter.writeObject(pemObject)
    } catch {
      case e: IOException => println("ERROR: IO Exception $e")
    } finally {
      pemWriter.close()
    }
  }

  def generateKeyPairPemFiles(filePrefix: String, keySize: Int = 4096): Unit = {
    Security.addProvider(new BouncyCastleProvider())

    val generator = KeyPairGenerator.getInstance("RSA")
    generator.initialize(keySize)
    val keyPair = generator.generateKeyPair()

    writePemFile(keyPair.getPublic(), "RSA PUBLIC KEY", s"${filePrefix}_public.pem")
    writePemFile(keyPair.getPrivate(), "RSA PRIVATE KEY", s"${filePrefix}_private.pem")
  }

  def publicKeyFromPemFile(keyFile: String): Option[PublicKey] = {
    Try(scala.io.Source.fromFile(keyFile)) match {
      case Success(k) =>
        val keyString = k.mkString.
          replace("-----BEGIN RSA PUBLIC KEY-----\n", "").
          replace("-----END RSA PUBLIC KEY-----\n", "")
        val keyBytes = Base64.decodeBase64(keyString)
        val publicKey = KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(keyBytes))
        Some(publicKey)
      case Failure(e) =>
        None
    }
  }

  def publicKeyToBase64(publicKey: PublicKey): String =
    bytesToBase64(publicKey.getEncoded)
}

import Crypto._

import scala.util.{Try, Success, Failure}

import java.io.{OutputStreamWriter, FileOutputStream, IOException}

import java.security.spec.X509EncodedKeySpec

import java.security._

import org.apache.commons.codec.binary.Base64

import org.bouncycastle.jce.provider.BouncyCastleProvider

import org.bouncycastle.util.io.pem.{PemObject, PemWriter}

object Crypto {

def sha256(byteArr: Array[Byte]): Array[Byte] =

java.security.MessageDigest.getInstance("SHA-256").digest(byteArr)

val hashFcn = sha256 _

def generateKeyPair(keySize: Int = 4096): KeyPair = {

val generator = java.security.KeyPairGenerator.getInstance("RSA")

generator.initialize(keySize)

generator.genKeyPair

}

def writePemFile(key: Key, description: String, filename: String): Unit = {

val pemObject = new PemObject(description, key.getEncoded())

val pemWriter = new PemWriter(new OutputStreamWriter(new FileOutputStream(filename)))

try {

pemWriter.writeObject(pemObject)

} catch {

case e: IOException => println("ERROR: IO Exception $e")

} finally {

pemWriter.close()

}

def generateKeyPairPemFiles(filePrefix: String, keySize: Int = 4096): Unit = {

Security.addProvider(new BouncyCastleProvider())

val generator = KeyPairGenerator.getInstance("RSA")

generator.initialize(keySize)

val keyPair = generator.generateKeyPair()

writePemFile(keyPair.getPublic(), "RSA PUBLIC KEY", s"${filePrefix}_public.pem")

writePemFile(keyPair.getPrivate(), "RSA PRIVATE KEY", s"${filePrefix}_private.pem")

}

def publicKeyFromPemFile(keyFile: String): Option[PublicKey] = {

Try(scala.io.Source.fromFile(keyFile)) match {

case Success(k) =>

val keyString = k.mkString.

replace("-----BEGIN RSA PUBLIC KEY-----\n", "").

replace("-----END RSA PUBLIC KEY-----\n", "")

val keyBytes = Base64.decodeBase64(keyString)

val publicKey = KeyFactory.getInstance("RSA").generatePublic(new X509EncodedKeySpec(keyBytes))

Some(publicKey)

case Failure(e) =>

None

}

def publicKeyToBase64(publicKey: PublicKey): String =

bytesToBase64(publicKey.getEncoded)

}

import Crypto._

To load Base64 public keys from the key files commonly in PKCS#8 PEM format on a file system, we use Bouncy Castle and Apache Commons Codec. As a side note, neither of the additional packages would be needed if the key files were in PKCS#8 DER format, which is binary and less commonly used.

Transactions in a Blockchain

With some basic utility and crypto functions in place, we now create class Account, which represents a user (e.g. a transaction originator or a miner) with the user’s cryptographic public key as the account ID. The corresponding private key, supposedly kept in private by the user, is for decrypting a transaction encrypted with the key. In our simplified model, the transactions won’t be encrypted hence private keys won’t be used.

case class Account(key: String, name: String) {
  override def toString: String =
    s"A(${key.substring(0,2)}*${key.substring(128,132)}*, $name)"
}

object Account {
  def fromKeyFile(keyFile: String, name: String): Account =
    publicKeyFromPemFile(keyFile) match {
      case Some(key) =>
        new Account(publicKeyToBase64(key), name)
      case None =>
        throw new Exception(s"ERROR: Problem loading $keyFile!")
    }
}

case class Account(key: String, name: String) {

override def toString: String =

s"A(${key.substring(0,2)}*${key.substring(128,132)}*, $name)"

}

object Account {

def fromKeyFile(keyFile: String, name: String): Account =

publicKeyFromPemFile(keyFile) match {

case Some(key) =>

new Account(publicKeyToBase64(key), name)

case None =>

throw new Exception(s"ERROR: Problem loading $keyFile!")

}

Next, we create class TransactionItem that represents a single transaction.

case class TransactionItem(id: String, accountFrom: Account, accountTo: Account, amount: Long, timestamp: Long) {

  override def toString: String = {
    val datetime = timestampToDateTime(timestamp)
    s"TI(${id.substring(0, 4)}, ${accountFrom.name} -> ${accountTo.name}, ${amount}, ${datetime})"
  }
}

object TransactionItem {
  def apply(accountFrom: Account, accountTo: Account, amount: Long, timestamp: Long): TransactionItem = {
    val bytes = accountFrom.key.getBytes ++ accountTo.key.getBytes ++ longToBytes(amount) ++ longToBytes(timestamp)
    new TransactionItem(bytesToBase64(hashFcn(bytes)), accountFrom, accountTo, amount, timestamp)
  }
}

case class TransactionItem(id: String, accountFrom: Account, accountTo: Account, amount: Long, timestamp: Long) {

override def toString: String = {

val datetime = timestampToDateTime(timestamp)

s"TI(${id.substring(0, 4)}, ${accountFrom.name} -> ${accountTo.name}, ${amount}, ${datetime})"

}

object TransactionItem {

def apply(accountFrom: Account, accountTo: Account, amount: Long, timestamp: Long): TransactionItem = {

val bytes = accountFrom.key.getBytes ++ accountTo.key.getBytes ++ longToBytes(amount) ++ longToBytes(timestamp)

new TransactionItem(bytesToBase64(hashFcn(bytes)), accountFrom, accountTo, amount, timestamp)

}

The id of TransactionItem is the hash value of the concatenated class fields in bytes. Note that the apply factory method performs the necessary hashing of the provided arguments to assemble a TransactionItem with the hash-value ID.

Next, we define class Transactions, representing a collection of TransactionItems. The id of Transactions is just a random-UUID. It could’ve been defined as a collective hash value like TransactionItem’s id to ensure content integrity but we’re going to leave that to be taken care in a hash-tree data structure, the Merkle Tree.

import java.util.UUID.randomUUID

case class Transactions(id: String, items: Array[TransactionItem], timestamp: Long) {
  override def toString: String = {
    val datetime = timestampToDateTime(timestamp)
    s"T(${id.substring(0, 4)}, ${items.map(_.amount).sum}/${items.size}, ${datetime})"
  }
}

object Transactions {
  def apply(transactions: Array[TransactionItem], timestamp: Long): Transactions =
    new Transactions(randomUUID.toString, transactions, timestamp)
}

import java.util.UUID.randomUUID

case class Transactions(id: String, items: Array[TransactionItem], timestamp: Long) {

override def toString: String = {

val datetime = timestampToDateTime(timestamp)

s"T(${id.substring(0, 4)}, ${items.map(_.amount).sum}/${items.size}, ${datetime})"

}

object Transactions {

def apply(transactions: Array[TransactionItem], timestamp: Long): Transactions =

new Transactions(randomUUID.toString, transactions, timestamp)

}

For illustration, we create a method that instantiates a Transactions object consisting of a random number of TransactionItems, each with a pair of random Accounts and a random amount (remember each Account needs a public key as its id).

val numOfAccounts = 5
val maxTransItems = 3
val keyPath = "/tmp/"

def randomFcn = java.util.concurrent.ThreadLocalRandom.current

def distinctRandomIntPair(lower: Int, upper: Int): List[Int] = {
  val rand1 = randomFcn.nextInt(lower, upper)
  val rand2 = randomFcn.nextInt(lower, upper)
  if (rand1 != rand2)
    List(rand1, rand2)
  else
    List(rand1, if (rand2 < upper - 1) rand2 + 1 else lower)
}

def generateTrans(
    numOfAccounts: Int,
    maxTransItems: Int,
    keyPath: String,
    keyFiles: List[String]
  ): Transactions = {

  def genTransItem: TransactionItem = {
    val idx = distinctRandomIntPair(0, numOfAccounts)
    val accountFrom = Account.fromKeyFile(s"${keyPath}${keyFiles(idx(0))}", s"User${idx(0)}")
    val accountTo = Account.fromKeyFile(s"${keyPath}${keyFiles(idx(1))}", s"User${idx(1)}")
    val amount = 1000L + randomFcn.nextInt(0, 5) * 500L

    val transItem = TransactionItem(accountFrom, accountTo, amount, System.currentTimeMillis)

    println(s"accountFrom: $accountFrom  accountTo: $accountTo")
    println(s"TransactionItem: $transItem")

    transItem
  }

  val numOfTransItems = randomFcn.nextInt(1, maxTransItems + 1)
  val transItems = Array.tabulate(numOfTransItems)(_ => genTransItem)

  Transactions(transItems, System.currentTimeMillis)
}

val numOfAccounts = 5

val maxTransItems = 3

val keyPath = "/tmp/"

def randomFcn = java.util.concurrent.ThreadLocalRandom.current

def distinctRandomIntPair(lower: Int, upper: Int): List[Int] = {

val rand1 = randomFcn.nextInt(lower, upper)

val rand2 = randomFcn.nextInt(lower, upper)

if (rand1 != rand2)

List(rand1, rand2)

else

List(rand1, if (rand2 < upper - 1) rand2 + 1 else lower)

}

def generateTrans(

numOfAccounts: Int,

maxTransItems: Int,

keyPath: String,

keyFiles: List[String]

): Transactions = {

def genTransItem: TransactionItem = {

val idx = distinctRandomIntPair(0, numOfAccounts)

val accountFrom = Account.fromKeyFile(s"${keyPath}${keyFiles(idx(0))}", s"User${idx(0)}")

val accountTo = Account.fromKeyFile(s"${keyPath}${keyFiles(idx(1))}", s"User${idx(1)}")

val amount = 1000L + randomFcn.nextInt(0, 5) * 500L

val transItem = TransactionItem(accountFrom, accountTo, amount, System.currentTimeMillis)

println(s"accountFrom: $accountFrom accountTo: $accountTo")

println(s"TransactionItem: $transItem")

transItem

}

val numOfTransItems = randomFcn.nextInt(1, maxTransItems + 1)

val transItems = Array.tabulate(numOfTransItems)(_ => genTransItem)

Transactions(transItems, System.currentTimeMillis)

}

Generating a couple of Transactions objects with random content:

(0 until numOfAccounts).foreach(i => generateKeyPairPemFiles(s"${keyPath}account${i}"))

val keyFiles = List.tabulate(numOfAccounts)(i => s"account${i}_public.pem")

val trans1 = generateTrans(numOfAccounts, maxTransItems, keyPath, keyFiles)
// accountFrom: A(MI*kOJv*, User1)  accountTo: A(MI*rlVg*, User2)
// TransactionItem: TI(Ih+j, User1 -> User2, 2000, 2020-03-02 21:36:23)
// accountFrom: A(MI*rlVg*, User2)  accountTo: A(MI*Qpgs*, User4)
// TransactionItem: TI(3vjD, User2 -> User4, 2500, 2020-03-02 21:36:23)
// trans1: Transactions = T(dca2, 4500/2, 2020-03-02 21:36:23)

val trans2 = generateTrans(numOfAccounts, maxTransItems, keyPath, keyFiles)
// accountFrom: A(MI*3SG5*, User3)  accountTo: A(MI*kOJv*, User1)
// TransactionItem: TI(EQgn, User3 -> User1, 1000, 2020-03-02 21:36:26)
// accountFrom: A(MI*rlVg*, User2)  accountTo: A(MI*kOJv*, User1)
// TransactionItem: TI(+cR/, User2 -> User1, 2500, 2020-03-02 21:36:26)
// accountFrom: A(MI*rlVg*, User2)  accountTo: A(MI*3SG5*, User3)
// TransactionItem: TI(G1Ue, User2 -> User3, 1500, 2020-03-02 21:36:26)
// trans2: Transactions = T(7915, 5000/3, 2020-03-02 21:36:26)

(0 until numOfAccounts).foreach(i => generateKeyPairPemFiles(s"${keyPath}account${i}"))

val keyFiles = List.tabulate(numOfAccounts)(i => s"account${i}_public.pem")

val trans1 = generateTrans(numOfAccounts, maxTransItems, keyPath, keyFiles)

// accountFrom: A(MI*kOJv*, User1) accountTo: A(MI*rlVg*, User2)

// TransactionItem: TI(Ih+j, User1 -> User2, 2000, 2020-03-02 21:36:23)

// accountFrom: A(MI*rlVg*, User2) accountTo: A(MI*Qpgs*, User4)

// TransactionItem: TI(3vjD, User2 -> User4, 2500, 2020-03-02 21:36:23)

// trans1: Transactions = T(dca2, 4500/2, 2020-03-02 21:36:23)

val trans2 = generateTrans(numOfAccounts, maxTransItems, keyPath, keyFiles)

// accountFrom: A(MI*3SG5*, User3) accountTo: A(MI*kOJv*, User1)

// TransactionItem: TI(EQgn, User3 -> User1, 1000, 2020-03-02 21:36:26)

// accountFrom: A(MI*rlVg*, User2) accountTo: A(MI*kOJv*, User1)

// TransactionItem: TI(+cR/, User2 -> User1, 2500, 2020-03-02 21:36:26)

// accountFrom: A(MI*rlVg*, User2) accountTo: A(MI*3SG5*, User3)

// TransactionItem: TI(G1Ue, User2 -> User3, 1500, 2020-03-02 21:36:26)

// trans2: Transactions = T(7915, 5000/3, 2020-03-02 21:36:26)

Hashing transactions into a Merkle tree

Merkle trees (or hash trees) are commonly used in blockchain computing. The main purpose of using a hash tree is to guarantee the authenticity of the contents in the dataset by successively composing their hashes in a hard-to-tamper fashion while keeping the resulting data structure relatively lightweight.

In a previous blog post re: Merkle tree implementation in Scala, we saw how a hash tree can be created from a collection of dataset. Even though the transaction collection data structure is now slightly more complex than the trivial example in the previous post, there is no added complexity in creating the hash tree. Borrowing a slight variant of the Merkle tree class from that post:

class MerkleTree(
    val hash: Array[Byte],
    val left: Option[MerkleTree] = None,
    val right: Option[MerkleTree] = None
  ) extends Serializable {

  def height: Int = {
    def loop(node: MerkleTree): Int = {
      if (!node.left.isEmpty && !node.right.isEmpty)
        math.max(loop(node.left.get), loop(node.right.get)) + 1
      else if (!node.left.isEmpty)
        loop(node.left.get) + 1
      else if(!node.right.isEmpty)
        loop(node.right.get) + 1
      else 1
    }
    loop(this)
  }

  def printNodes: Unit = {
    def printlnByLevel(t: MerkleTree): Unit = {
      for (l <- 1 to t.height) {
        loopByLevel(t, l)
        println
      }
    }
    def loopByLevel(node: MerkleTree, level: Int): Unit = {
      if (level <= 1)
        print(s"$node ")
      else {
        if (!node.left.isEmpty)
          loopByLevel(node.left.get, level - 1)
        else ()
        if (!node.right.isEmpty)
          loopByLevel(node.right.get, level - 1)
        else ()
      }
    }
    printlnByLevel(this)
  }

  override def toString: String = s"MT(${bytesToBase64(hash).substring(0, 4)})"
}

object MerkleTree {
  def apply(data: Array[Array[Byte]]): MerkleTree = {
    @scala.annotation.tailrec
    def buildTree(nodes: Array[MerkleTree]): Array[MerkleTree] = nodes match {
      case ns if ns.size <= 1 =>
        ns
      case ns =>
        val pairedNodes = ns.grouped(2).map{
          case Array(a, b) => new MerkleTree(hashFcn(a.hash ++ b.hash), Some(a), Some(b))
          case Array(a)    => new MerkleTree(hashFcn(a.hash), Some(a), None)
        }.toArray
        buildTree(pairedNodes)
    }

    if (data.isEmpty)
      new MerkleTree(hashFcn(Array.empty[Byte]))
    else {
      val nodes = data.map(byteArr => new MerkleTree(hashFcn(byteArr)))
      buildTree(nodes)(0)  // Return root of the tree
    }
  }
}

class MerkleTree(

val hash: Array[Byte],

val left: Option[MerkleTree] = None,

val right: Option[MerkleTree] = None

) extends Serializable {

def height: Int = {

def loop(node: MerkleTree): Int = {

if (!node.left.isEmpty && !node.right.isEmpty)

math.max(loop(node.left.get), loop(node.right.get)) + 1

else if (!node.left.isEmpty)

loop(node.left.get) + 1

else if(!node.right.isEmpty)

loop(node.right.get) + 1

else 1

}

loop(this)

}

def printNodes: Unit = {

def printlnByLevel(t: MerkleTree): Unit = {

for (l <- 1 to t.height) {

loopByLevel(t, l)

println

}

def loopByLevel(node: MerkleTree, level: Int): Unit = {

if (level <= 1)

print(s"$node ")

else {

if (!node.left.isEmpty)

loopByLevel(node.left.get, level - 1)

else ()

if (!node.right.isEmpty)

loopByLevel(node.right.get, level - 1)

else ()

}

printlnByLevel(this)

}

override def toString: String = s"MT(${bytesToBase64(hash).substring(0, 4)})"

}

object MerkleTree {

def apply(data: Array[Array[Byte]]): MerkleTree = {

@scala.annotation.tailrec

def buildTree(nodes: Array[MerkleTree]): Array[MerkleTree] = nodes match {

case ns if ns.size <= 1 =>

case ns =>

val pairedNodes = ns.grouped(2).map{

case Array(a, b) => new MerkleTree(hashFcn(a.hash ++ b.hash), Some(a), Some(b))

case Array(a) => new MerkleTree(hashFcn(a.hash), Some(a), None)

}.toArray

buildTree(pairedNodes)

}

if (data.isEmpty)

new MerkleTree(hashFcn(Array.empty[Byte]))

else {

val nodes = data.map(byteArr => new MerkleTree(hashFcn(byteArr)))

buildTree(nodes)(0) // Return root of the tree

}

Using MerkleTree’s apply factory method, we simply supply transactions objects we’ve created as the method argument:

val mRoot1 = MerkleTree(trans1.items.map(_.id.getBytes))
// mRoot1: MerkleTree = MT(OLsC)

mRoot1.printNodes
// MT(OLsC)
// MT(v7GL) MT(FbT3)

val mRoot2 = MerkleTree(trans2.items.map(_.id.getBytes))
// mRoot2: MerkleTree = MT(B0yx)

mRoot2.printNodes
// MT(B0yx)
// MT(o85z) MT(MFDs)
// MT(/t9G) MT(qxR3) MT(iGYW)

val mRoot1 = MerkleTree(trans1.items.map(_.id.getBytes))

// mRoot1: MerkleTree = MT(OLsC)

mRoot1.printNodes

// MT(OLsC)

// MT(v7GL) MT(FbT3)

val mRoot2 = MerkleTree(trans2.items.map(_.id.getBytes))

// mRoot2: MerkleTree = MT(B0yx)

mRoot2.printNodes

// MT(B0yx)

// MT(o85z) MT(MFDs)

// MT(/t9G) MT(qxR3) MT(iGYW)

For transaction collection trans1, Merkle root mRoot1 is all that is needed to ensure its integrity. So is mRoot2 for trans2. For a given collection of transactions, the recursive hashing of the transaction items in the tree nodes all the way to the root node makes it mathematically difficult to tamper with the transaction content. The Merkle root along with the associated transaction collection will be kept in an immutable “block”.

While the term “blockchain” has been used ad libitum throughout the post, we have not seen anything remotely resembling a “block” yet, have we? So far, we’ve only put in place some simple data structures along with a few utility/crypto functions. Nonetheless, they’re the essential elements for the building “block” of a blockchain, which we’ll dig into in the next post of this blog series.

Genuine Blog

A Tech Blog by Leo Cheung