Tag Archives: scala

NIO-based Reactor In Scala

Java NIO and Reactor pattern

A quick recap of Java NIO, which consists of the following key components:

Buffer – a container of primitive typed data (e.g. Byte, Int) that can be optimized for native I/O operations with memory alignment and paging functionality
Channel – a connector associated with an I/O entity (e.g. files, sockets) that supports non-blocking I/O operations
Selector – a demultiplexer on an event loop that selects events which are ready for carrying out pre-registered I/O operations (e.g. read, write)

Note that NIO Channel implements SelectableChannel which can be registered with the Selector as a SelectionKey for any I/O operations of interest. To optimally handle high-volume client connections to the server, channels can be configured via method configureBlocking(false) to support non-blocking I/O.

With NIO Buffers enabling optimal memory access and native I/O operations, Channels programmatically connecting I/O entities, and Selector serving as the demultiplexer on an event loop selecting ready-to-go I/O events to execute in a non-blocking fashion, the Java NIO API is a great fit for implementing an effective Reactor server.

Reactor event loop

This Scala version of the NIO Reactor server consists of two main classes NioReactor and Handler, along with a trait SelKeyAttm which is the base class for objects that are to be coupled with individual selection-keys as their attachments (more on this later).

Central to the NioReactor class is the “perpetual” event loop performed by class method selectorLoop(). It’s an recursive function that doesn’t ever return (thus returning Nothing), equivalent to the conventional infinite while(true){} loop. All it does is to repetitively check for the selection-keys whose corresponding channels are ready for the registered I/O operations and iterate through the keys to carry out the necessary work defined in the passed-in function iterFn().

  import java.util.{Iterator => JavaIter}

  @scala.annotation.tailrec
  final def selectorLoop(iterFn: (JavaIter[SelectionKey]) => Unit): Nothing = {
    selector.select()
    val it = selector.selectedKeys().iterator()
    iterFn(it)
    selectorLoop(iterFn)
  }

import java.util.{Iterator => JavaIter}

@scala.annotation.tailrec

final def selectorLoop(iterFn: (JavaIter[SelectionKey]) => Unit): Nothing = {

selector.select()

val it = selector.selectedKeys().iterator()

iterFn(it)

selectorLoop(iterFn)

}

Function iterateSelKeys, which is passed in as the parameter for the event loop function, holds the selection-keys iteration logic. While it’s tempting to convert the Java Iterator used in the original Java application to a Scala Iterator, the idea was scrapped due to the need for the timely removal of the iterated selection-key elements via remove() which apparently is a required step for the time-critical inner working of the selector. Scala Iterator (or Iterable) does not have such method or its equivalence.

  @scala.annotation.tailrec
  final def iterateSelKeys(it: JavaIter[SelectionKey]): Unit = {
    if (it.hasNext()) {
      val sk = it.next()
      it.remove()
      val attm: SelKeyAttm = sk.attachment().asInstanceOf[SelKeyAttm]
      if (attm != null)
        attm.run()
      iterateSelKeys(it)
    }
    else ()
  }

@scala.annotation.tailrec

final def iterateSelKeys(it: JavaIter[SelectionKey]): Unit = {

if (it.hasNext()) {

val sk = it.next()

it.remove()

val attm: SelKeyAttm = sk.attachment().asInstanceOf[SelKeyAttm]

if (attm != null)

attm.run()

iterateSelKeys(it)

}

else ()

}

Contrary to the selection-key attachments being of type Runnable in the original version, they’re now a subtype of SelKeyAttm each of which implements method run() that gets called once selected by the Selector. Using Scala Futures, Runnables are no longer the object type of the selection-key attachments. By making SelKeyAttm the base type for objects attached to the selection-keys, a slightly more specific “contract” (in the form of method specifications) is set up for those objects to adhere to.

Acceptor

The Acceptor, associated with the NIO ServerSocketChannel for the listener socket, is a subtype of SelKeyAttm. It’s responsible for reception of server connection requests.

  class Acceptor extends SelKeyAttm {
    def run(): Try[Unit] = Try {
        val channel: SocketChannel = serverChannel.accept()
        if (channel != null)
          new Handler(selector, channel)
        ()
      }
      .recover {
        case e: IOException => println(s"Acceptor: $e")
      }
  }

class Acceptor extends SelKeyAttm {

def run(): Try[Unit] = Try {

val channel: SocketChannel = serverChannel.accept()

if (channel != null)

new Handler(selector, channel)

()

}

.recover {

case e: IOException => println(s"Acceptor: $e")

}

Part of class NioReactor’s constructor routine is to bind the ServerSocketChannel to a specified port number. It’s also where the ServerSocketChannel is configured to be non-blocking and registered with the selector it’s ready to accept connections (OP_ACCEPT), subsequently creating a selection-key with the Acceptor instance as its attachment.

class NioReactor(port: Int) {
  implicit val ec: ExecutionContext = NioReactor.ec

  val selector: Selector = Selector.open()
  val serverChannel: ServerSocketChannel = ServerSocketChannel.open()

  serverChannel.socket().bind(new InetSocketAddress(port))
  serverChannel.configureBlocking(false)

  val sk: SelectionKey = serverChannel.register(selector, SelectionKey.OP_ACCEPT)
  sk.attach(new Acceptor())

  // ...
}

class NioReactor(port: Int) {

implicit val ec: ExecutionContext = NioReactor.ec

val selector: Selector = Selector.open()

val serverChannel: ServerSocketChannel = ServerSocketChannel.open()

serverChannel.socket().bind(new InetSocketAddress(port))

serverChannel.configureBlocking(false)

val sk: SelectionKey = serverChannel.register(selector, SelectionKey.OP_ACCEPT)

sk.attach(new Acceptor())

// ...

}

The companion object of the NioReactor class is set up with a thread pool to run the Reactor server at a provided port number in a Scala Future.

object NioReactor {
  val poolSize: Int = 10
  val workerPool = Executors.newFixedThreadPool(poolSize)
  implicit val ec = ExecutionContext.fromExecutorService(workerPool)

  def apply(port: Int = 9090): Future[Unit] = Future {
      (new NioReactor(port)).loop()
    }
    .recover {
      case e: IOException => println(s"Reactor($port): $e")
    }

  // ...
}

object NioReactor {

val poolSize: Int = 10

val workerPool = Executors.newFixedThreadPool(poolSize)

implicit val ec = ExecutionContext.fromExecutorService(workerPool)

def apply(port: Int = 9090): Future[Unit] = Future {

(new NioReactor(port)).loop()

}

.recover {

case e: IOException => println(s"Reactor($port): $e")

}

// ...

}

Event handlers

As shown in the snippet of the Acceptor class, upon acceptance of a server connection, an instance of Handler is spawned. All events (in our case, the reading requests from and writing responses to client sockets) are processed by those handlers, which are another subtype of SelKeyAttm.

The Handler class instance takes a Selector and a SocketChannel as parameters, initializes a couple of ByteBuffers for read/write, configures the SocketChannel to be non-blocking, registers with the selector for I/O operation OP_READ, creates a selection-key with the existing handler instance as its attachment, followed by nudging the selector for immediate return of any selected channels.

Method run() is responsible for, upon being called, carrying out the main read/write handling logic in accordance with the selection-key the passed-in SocketChannel is associated with and the corresponding I/O operation of interest.

object Handler {
  val readBufSize: Int = 1024
  val writeBufSize: Int = 1024
}

class Handler(sel: Selector, channel: SocketChannel)(implicit ec: ExecutionContext) extends SelKeyAttm {
  import Handler._

  var selKey: SelectionKey = null
  val readBuf = ByteBuffer.allocate(readBufSize)
  var writeBuf = ByteBuffer.allocate(writeBufSize)

  channel.configureBlocking(false)

  selKey = channel.register(sel, SelectionKey.OP_READ)
  selKey.attach(this)
  sel.wakeup()

  def run(): Try[Unit] = Try {
      if (selKey.isReadable())
        read()
      else if (selKey.isWritable())
        write()
    }
    .recover {
      case e: IOException => println(s"Handler run(): $e")
    }

  def process(): Unit = ???

  def read(): Unit = ???

  def write(): Unit = ???

}

object Handler {

val readBufSize: Int = 1024

val writeBufSize: Int = 1024

}

class Handler(sel: Selector, channel: SocketChannel)(implicit ec: ExecutionContext) extends SelKeyAttm {

import Handler._

var selKey: SelectionKey = null

val readBuf = ByteBuffer.allocate(readBufSize)

var writeBuf = ByteBuffer.allocate(writeBufSize)

channel.configureBlocking(false)

selKey = channel.register(sel, SelectionKey.OP_READ)

selKey.attach(this)

sel.wakeup()

def run(): Try[Unit] = Try {

if (selKey.isReadable())

read()

else if (selKey.isWritable())

write()

}

.recover {

case e: IOException => println(s"Handler run(): $e")

}

def process(): Unit = ???

def read(): Unit = ???

def write(): Unit = ???

}

Processing read/write buffers

Method read() calls channel.read(readBuf) which reads a preset number of bytes from the channel into the readBuf ByteBuffer and returns the number of Bytes read. If the channel has reached “end-of-stream”, in which case channel.read() will return -1, the corresponding selection-key will be cancelled and the channel will be closed; otherwise, processing work will commence.

  def read(): Unit = synchronized {
      Try {
          val numBytes: Int = channel.read(readBuf)
          println("Handler read(): #bytes read into 'readBuf' buffer = " + numBytes)
  
          if (numBytes == -1) {
            selKey.cancel()
            channel.close()
            println("Handler read(): client connection might have been dropped!")
          }
          else {
            Future {
                process()
              }
              .recover {
                case e: IOException => println(s"Handler process(): $e")
              }
          }
        }
        .recover {
          case e: IOException => println(s"Handler read(): $e")
        }
    }

def read(): Unit = synchronized {

Try {

val numBytes: Int = channel.read(readBuf)

println("Handler read(): #bytes read into 'readBuf' buffer = " + numBytes)

if (numBytes == -1) {

selKey.cancel()

channel.close()

println("Handler read(): client connection might have been dropped!")

}

else {

Future {

process()

}

.recover {

case e: IOException => println(s"Handler process(): $e")

}

.recover {

case e: IOException => println(s"Handler read(): $e")

}

Method process() does the actual post-read processing work. It’s supposed to do the heavy-lifting (thus being wrapped in a Scala Future), although in this trivial server example, all it does is simply echoing whatever read from the readBuf ByteBuffer using the NIO Buffer API and write into the writeBuf ByteBuffer, followed by switching the selection-key’s I/O operation of interest to OP_WRITE.

  def process(): Unit = synchronized {
      readBuf.flip()
      val bytes: Array[Byte] = Array.ofDim[Byte](readBuf.remaining())
      readBuf.get(bytes, 0, bytes.length)
      print("Handler process(): " + new String(bytes, Charset.forName("ISO-8859-1")))

      writeBuf = ByteBuffer.wrap(bytes)

      selKey.interestOps(SelectionKey.OP_WRITE)
      selKey.selector().wakeup()
    }

def process(): Unit = synchronized {

readBuf.flip()

val bytes: Array[Byte] = Array.ofDim[Byte](readBuf.remaining())

readBuf.get(bytes, 0, bytes.length)

print("Handler process(): " + new String(bytes, Charset.forName("ISO-8859-1")))

writeBuf = ByteBuffer.wrap(bytes)

selKey.interestOps(SelectionKey.OP_WRITE)

selKey.selector().wakeup()

}

Method write() calls channel.write(writeBuf) to write from the writeBuf ByteBuffer into the calling channel, followed by clearing both the read/write ByteBuffers and switching the selection-key’s I/O operation of interest back to OP_READ.

  def write(): Unit = {
    Try {
        val numBytes: Int = channel.write(writeBuf)
        println("Handler write(): #bytes read from 'writeBuf' buffer = " + numBytes)

        if (numBytes > 0) {
          readBuf.clear()
          writeBuf.clear()

          selKey.interestOps(SelectionKey.OP_READ)
          selKey.selector().wakeup()
        }
      }
      .recover {
        case e: IOException => println(s"Handler write(): $e")
      }
  }

def write(): Unit = {

Try {

val numBytes: Int = channel.write(writeBuf)

println("Handler write(): #bytes read from 'writeBuf' buffer = " + numBytes)

if (numBytes > 0) {

readBuf.clear()

writeBuf.clear()

selKey.interestOps(SelectionKey.OP_READ)

selKey.selector().wakeup()

}

.recover {

case e: IOException => println(s"Handler write(): $e")

}

Final thoughts

In this code rewrite in Scala, the main changes include the replacement of:

Java Runnable with Scala Future along with the base type SelKeyAttm for the Acceptor and Handler objects that are to be attached to selection-keys
while-loop with recursive functions
try-catch with Try-recover

While Java NIO is a great API for building efficient I/O-heavy applications, its underlying design apparently favors the imperative programming style. Rewriting the NIO-based Reactor server application using a functional programming language like Scala doesn’t necessarily make the code easier to read or maintain, as many function calls in the API return void (i.e. Scala Unit) and mutate variables passed in as parameters, making it difficult to be thoroughly rewritten in an idiomatic fashion.

Full source code of the Scala NIO Reactor server application is available at this GitHub repo.

To compile and run the Reactor server, git-clone the repo and run sbt from the project-root at a terminal on the server host:

$ sbt compile
$ sbt "runMain reactor.NioReactor `port`"

1 2	$ sbt compile $ sbt "runMain reactor.NioReactor `port`"

Skipping the port number will bind the server to the default port 9090.

To connect to the Reactor server, use telnet from one or more client host terminals:

telnet `server-host` `port`
#  e.g. telnet reactor.example.com 8080, telnet localhost 9090

1 2	telnet `server-host` `port` # e.g. telnet reactor.example.com 8080, telnet localhost 9090

Any text input from the client host(s) will be echoed back by the Reactor server, which itself will also report what has been processed. Below are sample input/output from a couple of client host terminals and the server terminal:

## Client host terminal #1:

$ telnet 192.168.1.100 9090
Trying ::1...
Connected to 192.168.1.100.
Escape character is '^]'.
blah blah blah from term #1
blah blah blah from term #1
^]
telnet> quit
Connection closed.

## Client host terminal #2:

$ telnet 192.168.1.100 9090
Trying ::1...
Connected to 192.168.1.100.
Escape character is '^]'.
foo bar from term #2
foo bar from term #2
^]
telnet> quit
Connection closed.

## Server host terminal:

$ sbt "runMain reactor.NioReactor"
[info] ...
[info] running reactor.NioReactor 
Handler read(): #bytes read into 'readBuf' buffer = 29
Handler write(): #bytes read from 'writeBuf' buffer = 29
Handler read(): #bytes read into 'readBuf' buffer = 22
Handler write(): #bytes read from 'writeBuf' buffer = 22
Handler read(): #bytes read into 'readBuf' buffer = -1
Handler read(): client connection might have been dropped!
Handler read(): #bytes read into 'readBuf' buffer = -1
Handler read(): client connection might have been dropped!
^C
[warn] Canceling execution...
Cancelled: runMain reactor.NioReactor

## Client host terminal #1:

$ telnet 192.168.1.100 9090

Trying ::1...

Connected to 192.168.1.100.

Escape character is '^]'.

blah blah blah from term #1

telnet> quit

Connection closed.

## Client host terminal #2:

$ telnet 192.168.1.100 9090

Trying ::1...

Connected to 192.168.1.100.

Escape character is '^]'.

foo bar from term #2

telnet> quit

Connection closed.

## Server host terminal:

$ sbt "runMain reactor.NioReactor"

[info] ...

[info] running reactor.NioReactor

Handler read(): #bytes read into 'readBuf' buffer = 29

Handler write(): #bytes read from 'writeBuf' buffer = 29

Handler read(): #bytes read into 'readBuf' buffer = 22

Handler write(): #bytes read from 'writeBuf' buffer = 22

Handler read(): #bytes read into 'readBuf' buffer = -1

Handler read(): client connection might have been dropped!

Handler read(): #bytes read into 'readBuf' buffer = -1

Handler read(): client connection might have been dropped!

[warn] Canceling execution...

Cancelled: runMain reactor.NioReactor

As a side note, the output from method Handler.process() which is wrapped in a Scala Future will be reported if the server is being run from within an IDE like IntelliJ.

Scala Binary Search Tree

1 Reply

When I wrote about Scala linked list implementation a couple of years ago, I also did some quick ground work for implementing binary search trees (BST). Occupied by other R&D projects at the time, it was put aside and has since been patiently awaiting its turn to see the light of day. As much of the code is already there, I’m going to put it up in this blog post along with some narrative remarks.

First, we come up with an ADT (algebraic data type). Let’s call it BSTree, starting out with a base trait with generic type A for the data element to be stored inside the tree structure, to be extended by a case class BSBranch as tree branches and a case object BSLeaf as “null” tree nodes. The ADT’s overall structure resembles that of the one used in the linked list implementation described in the old post.

ADT BSTree

sealed trait BSTree[+A] { self =>
  // ...

  override def toString: String = self match {
    case BSLeaf => "-"
    case BSBranch(e, c, l, r) => (l, r) match {
      case (BSLeaf, BSLeaf) => s"TN($e[$c], -, -)"
      case (BSBranch(le, lc, _, _), BSLeaf) => s"TN($e[$c], $le[$lc], -)"
      case (BSLeaf, BSBranch(re, rc, _, _)) => s"TN($e[$c], -, $re[$rc])"
      case (BSBranch(le, lc, _, _), BSBranch(re, rc, _, _)) => s"TN($e[$c], $le[$lc], $re[$rc])"
    }
  }
}

case class BSBranch[+A](
    elem: A, count: Int = 1, left: BSTree[A] = BSLeaf, right: BSTree[A] = BSLeaf
  ) extends BSTree[A]

case object BSLeaf extends BSTree[Nothing]

sealed trait BSTree[+A] { self =>

// ...

override def toString: String = self match {

case BSLeaf => "-"

case BSBranch(e, c, l, r) => (l, r) match {

case (BSLeaf, BSLeaf) => s"TN($e[$c], -, -)"

case (BSBranch(le, lc, _, _), BSLeaf) => s"TN($e[$c], $le[$lc], -)"

case (BSLeaf, BSBranch(re, rc, _, _)) => s"TN($e[$c], -, $re[$rc])"

case (BSBranch(le, lc, _, _), BSBranch(re, rc, _, _)) => s"TN($e[$c], $le[$lc], $re[$rc])"

}

case class BSBranch[+A](

elem: A, count: Int = 1, left: BSTree[A] = BSLeaf, right: BSTree[A] = BSLeaf

) extends BSTree[A]

case object BSLeaf extends BSTree[Nothing]

A few notes:

Data elements (of generic type A) are stored only in branches and data of the same value will go to the same branch and be represented with a proper count value.
BSTree is covariant, or else BSLeaf can’t even be defined as a sub-type of BSTree[Nothing].
A toString method is created for simplified string output of a tree instance.

Populating a BSTree

One of the first things we need is a method to insert tree nodes into an existing BSTree. We start expanding the base trait with method insert(). That’s all great for adding a node one at a time, but we also need a way to create a BSTree and populate it from a readily available collection of data elements. It makes sense to delegate such a factory method to the companion object BSTree as its method apply().

sealed trait BSTree[+A] { self =>

  def insert[B >: A : Ordering](elem: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSBranch(elem, 1, BSLeaf, BSLeaf)
      case t @ BSBranch(e, c, l, r) =>
        if (e > elem)
          t.copy(left = l.insert(elem))
        else if (e < elem)
          t.copy(right = r.insert(elem))
        else
          t.copy(count = c + 1)
    }
  }

  // ...
}

object BSTree {
  def apply[A : Ordering](elems: Vector[A]): BSTree[A] = {
    val ord = implicitly[Ordering[A]]
    import ord._
    elems.foldLeft[BSTree[A]](BSLeaf)(_.insert(_))
  }
}

sealed trait BSTree[+A] { self =>

def insert[B >: A : Ordering](elem: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSBranch(elem, 1, BSLeaf, BSLeaf)

case t @ BSBranch(e, c, l, r) =>

if (e > elem)

t.copy(left = l.insert(elem))

else if (e < elem)

t.copy(right = r.insert(elem))

else

t.copy(count = c + 1)

}

// ...

}

object BSTree {

def apply[A : Ordering](elems: Vector[A]): BSTree[A] = {

val ord = implicitly[Ordering[A]]

import ord._

elems.foldLeft[BSTree[A]](BSLeaf)(_.insert(_))

}

Note that type parameter B for insert() needs to be a supertype of A because Function1 is contravariant over its parameter type. In addition, the context bound “B : Ordering” constrains type B to be capable of being ordered (i.e. compared) which is necessary for traversing a binary search tree.

Testing BSTree.apply():

val tree = BSTree(Vector(30, 10, 50, 20, 40, 70, 60, 80, 20, 50, 60))
// tree: BSTree[Int] = TN(30[1], 10[1], 50[2])

/*
tree:
           30
        /       \
    10            50
       \        /    \
        20     40     70
                    /    \
                   60    80
*/

val tree = BSTree(Vector(30, 10, 50, 20, 40, 70, 60, 80, 20, 50, 60))

// tree: BSTree[Int] = TN(30[1], 10[1], 50[2])

tree:

/ \

10 50

\ / \

20 40 70

/ \

60 80

Tree traversal and finding tree nodes

Next, we need methods for tree traversal and search. For brevity, we only include in-order traversal.

  def traverseInOrder: Unit = self match {
    case BSLeaf => ()
    case BSBranch(_, _, l, r) =>
      l.traverseInOrder
      println(self)
      r.traverseInOrder
  }

  def traverseByLevel: Unit = {
    def loop(tree: BSTree[A], level: Int): Unit = {
      if (level > 0)
        tree match {
          case BSLeaf =>
          case BSBranch(_, _, l, r) =>
            loop(l, level - 1)
            loop(r, level - 1)
        }
      else
        print(s"$tree  ")
    }
    for (l <- 0 until self.height) {
      loop(self, l)
      println()
    }
  }

  def find[B >: A : Ordering](elem: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, c, l, r) =>
        if (e < elem)
          r.find(elem)
        else if (e > elem)
          l.find(elem)
        else
          t
    }
  }

  def height: Int = self match {
    case BSLeaf => 0
    case BSBranch(_, _, l, r) => (l.height max r.height) + 1
  }

def traverseInOrder: Unit = self match {

case BSLeaf => ()

case BSBranch(_, _, l, r) =>

l.traverseInOrder

println(self)

r.traverseInOrder

}

def traverseByLevel: Unit = {

def loop(tree: BSTree[A], level: Int): Unit = {

if (level > 0)

tree match {

case BSLeaf =>

case BSBranch(_, _, l, r) =>

loop(l, level - 1)

loop(r, level - 1)

}

else

print(s"$tree ")

}

for (l <- 0 until self.height) {

loop(self, l)

println()

}

def find[B >: A : Ordering](elem: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, c, l, r) =>

if (e < elem)

r.find(elem)

else if (e > elem)

l.find(elem)

else

}

def height: Int = self match {

case BSLeaf => 0

case BSBranch(_, _, l, r) => (l.height max r.height) + 1

}

Using the tree created above:

/*
tree:
           30
        /       \
    10            50
       \        /    \
        20     40     70
                    /    \
                   60    80
*/

tree.height  // 4

tree.traverseInOrder
/*
TN(10[1], -, 20[2])
TN(20[2], -, -)
TN(30[1], 10[1], 50[2])
TN(40[1], -, -)
TN(50[2], 40[1], 70[1])
TN(60[2], -, -)
TN(70[1], 60[2], 80[1])
TN(80[1], -, -)
*/

tree.traverseByLevel
/*
TN(30[1], 10[1], 50[2])
TN(10[1], -, 20[2])  TN(50[2], 40[1], 70[1])
TN(20[2], -, -)  TN(40[1], -, -)  TN(70[1], 60[2], 80[1])
TN(60[2], -, -)  TN(80[1], -, -)
*/

val tree50 = tree.find(50)  // TN(50[2], 40[1], 70[1])
/*
tree50:
           50
        /       \
    40            70
                /    \
               60     80
*/

tree:

/ \

10 50

\ / \

20 40 70

/ \

60 80

tree.height // 4

tree.traverseInOrder

TN(10[1], -, 20[2])

TN(20[2], -, -)

TN(30[1], 10[1], 50[2])

TN(40[1], -, -)

TN(50[2], 40[1], 70[1])

TN(60[2], -, -)

TN(70[1], 60[2], 80[1])

TN(80[1], -, -)

tree.traverseByLevel

TN(30[1], 10[1], 50[2])

TN(10[1], -, 20[2]) TN(50[2], 40[1], 70[1])

TN(20[2], -, -) TN(40[1], -, -) TN(70[1], 60[2], 80[1])

TN(60[2], -, -) TN(80[1], -, -)

val tree50 = tree.find(50) // TN(50[2], 40[1], 70[1])

tree50:

/ \

40 70

/ \

60 80

Removing tree nodes

To be able to remove tree nodes that consist of a specific or range of element values, we include also the following few methods in the base trait.

Note that delete() may involve a little shuffling of the tree nodes. Once the tree node to be removed is located, that node may need to be filled with the node having the next-bigger element & count values from its right node (or equivalently, the node having the next-smaller element from its left node).

  def delete[B >: A : Ordering](elem: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    def leftMost(tree: BSTree[B], prev: BSTree[B]): BSTree[B] = tree match {
      case BSLeaf => prev
      case t @ BSBranch(_, _, l, _) => leftMost(l, t)
    }
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, c, l, r) =>
        if (e < elem)
          t.copy(right = r.delete(elem))
        else if (e > elem)
          t.copy(left = l.delete(elem))
        else {
          if (l == BSLeaf)
            r
          else if (r == BSLeaf)
            l
          else {
            val nextBigger = leftMost(r, r)
            nextBigger match { case BSBranch(enb, cnb, _, _) =>
              t.copy(elem = enb, count = cnb, right = r.delete(enb))
            }
          }
        }
    }
  }

  def trim[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, _, l, r) =>
        val tt = t.copy(left = l.trim(lower, upper), right = r.trim(lower, upper)) 
        if (e < lower)
          tt match { case BSBranch(_, _, _, r) => r }
        else if (e > upper)
          tt match { case BSBranch(_, _, l, _) => l }
        else
          tt
    }
  }

  def cutOut[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, _, l, r) =>
        val tt = t.copy(left = l.cutOut(lower, upper), right = r.cutOut(lower, upper)) 
        if (e >= lower && e <= upper)
          tt match { case BSBranch(e, _, _, _) => tt.delete(e) }
        else
          tt
    }
  }

def delete[B >: A : Ordering](elem: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

def leftMost(tree: BSTree[B], prev: BSTree[B]): BSTree[B] = tree match {

case BSLeaf => prev

case t @ BSBranch(_, _, l, _) => leftMost(l, t)

}

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, c, l, r) =>

if (e < elem)

t.copy(right = r.delete(elem))

else if (e > elem)

t.copy(left = l.delete(elem))

else {

if (l == BSLeaf)

else if (r == BSLeaf)

else {

val nextBigger = leftMost(r, r)

nextBigger match { case BSBranch(enb, cnb, _, _) =>

t.copy(elem = enb, count = cnb, right = r.delete(enb))

}

def trim[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, _, l, r) =>

val tt = t.copy(left = l.trim(lower, upper), right = r.trim(lower, upper))

if (e < lower)

tt match { case BSBranch(_, _, _, r) => r }

else if (e > upper)

tt match { case BSBranch(_, _, l, _) => l }

else

}

def cutOut[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, _, l, r) =>

val tt = t.copy(left = l.cutOut(lower, upper), right = r.cutOut(lower, upper))

if (e >= lower && e <= upper)

tt match { case BSBranch(e, _, _, _) => tt.delete(e) }

else

}

Method trim() removes tree nodes with element values below or above the provided range. Meanwhile, method cutOut() does the opposite by cutting out tree nodes with values within the given range. It involves slightly more work than trim(), requiring the use of delete() for individual tree nodes.

Example:

val treeD50 = tree.delete(50)  // TN(30[1], 10[1], 60[2])
/*
treeD50:
           30
        /       \
    10            60
       \        /    \
        20     40     70
                         \
                          80
*/

val treeT15to65 = tree.trim(15, 65)  // TN(30[1], 20[2], 50[2])
/*
treeT15to65:
           30
        /       \
    20            50
                /    \
               40     60
*/

val treeC25to55 = tree.trimInRange(25, 55)  // TN(60[1], 10[1], 70[1])
/*
treeC25to55:
           60
        /       \
    10            70
       \             \
        20            80
*/

val treeD50 = tree.delete(50) // TN(30[1], 10[1], 60[2])

treeD50:

/ \

10 60

\ / \

20 40 70

val treeT15to65 = tree.trim(15, 65) // TN(30[1], 20[2], 50[2])

treeT15to65:

/ \

20 50

/ \

40 60

val treeC25to55 = tree.trimInRange(25, 55) // TN(60[1], 10[1], 70[1])

treeC25to55:

/ \

10 70

\ \

20 80

Rebuilding a binary search tree

A highly unbalanced binary search tree beats the purpose of using such a data structure. One of the most straight forward ways to rebuild a binary search tree is to “unpack” the individual tree nodes of the existing tree by traversing in-order into a list (e.g. a Vector or List) of elements, followed by reconstructing a new tree with nodes being assigned elements from recursively half-ing the in-order node list.

  def rebuild: BSTree[A] = {
    def loop(vs: Vector[(A, Int)], lower: Int, upper: Int): BSTree[A] = {
      if (upper >= lower) {
        val m = (lower + upper) / 2
        val (e, c) = vs(m)
        BSBranch(e, c, loop(vs, lower, m-1), loop(vs, m+1, upper))
      }
      else
        BSLeaf
    }
    self match {
      case BSLeaf =>
        BSLeaf
      case _ =>
        val vs = self.toElemVector
        loop(vs, 0, vs.size - 1)
    }
  }

  def isBalanced: Boolean = self match {
    case BSLeaf => true
    case BSBranch(_, _, l, r) =>
      if (math.abs(l.height - r.height) > 1)
        false
      else
        l.isBalanced &amp;&amp; r.isBalanced
  }

  def toElemVector: Vector[(A, Int)] = self match {  // in-order
    case BSLeaf =>
      Vector.empty[(A, Int)]
    case BSBranch(e, c, l, r) =>
      (l.toElemVector :+ (e, c)) ++ r.toElemVector
  }

  def toVector: Vector[BSTree[A]] = self match {  // in-order
    case BSLeaf => Vector.empty[BSTree[A]]
    case BSBranch(_, _, l, r) =>
      (l.toVector :+ self) ++ r.toVector
  }

def rebuild: BSTree[A] = {

def loop(vs: Vector[(A, Int)], lower: Int, upper: Int): BSTree[A] = {

if (upper >= lower) {

val m = (lower + upper) / 2

val (e, c) = vs(m)

BSBranch(e, c, loop(vs, lower, m-1), loop(vs, m+1, upper))

}

else

BSLeaf

}

self match {

case BSLeaf =>

BSLeaf

case _ =>

val vs = self.toElemVector

loop(vs, 0, vs.size - 1)

}

def isBalanced: Boolean = self match {

case BSLeaf => true

case BSBranch(_, _, l, r) =>

if (math.abs(l.height - r.height) > 1)

false

else

l.isBalanced && r.isBalanced

}

def toElemVector: Vector[(A, Int)] = self match { // in-order

case BSLeaf =>

Vector.empty[(A, Int)]

case BSBranch(e, c, l, r) =>

(l.toElemVector :+ (e, c)) ++ r.toElemVector

}

def toVector: Vector[BSTree[A]] = self match { // in-order

case BSLeaf => Vector.empty[BSTree[A]]

case BSBranch(_, _, l, r) =>

(l.toVector :+ self) ++ r.toVector

}

Example:

val unbalancedTree = BSTree(Vector(20, 10, 30, 30, 40, 50, 60, 60, 50))
// unbalancedTree: BSTree[Int] = TN(20[1], 10[1], 30[2])

/*
unbalancedTree:
           20
        /       \
    10            30
                     \
                     40
                        \
                         60
                        /
                       50
*/

unbalancedTree.isBalanced  // false

val rebuiltTree = unbalancedTree.rebuild  // TN(30[2], 10[1], 50[2])
/*
rebuiltTree:

           30
        /       \
    10            50
       \        /    \
        20     40     60
*/

rebuiltTree.isBalanced  // true

val unbalancedTree = BSTree(Vector(20, 10, 30, 30, 40, 50, 60, 60, 50))

// unbalancedTree: BSTree[Int] = TN(20[1], 10[1], 30[2])

unbalancedTree:

/ \

10 30

unbalancedTree.isBalanced // false

val rebuiltTree = unbalancedTree.rebuild // TN(30[2], 10[1], 50[2])

rebuiltTree:

/ \

10 50

\ / \

20 40 60

rebuiltTree.isBalanced // true

Thoughts on the ADT

An alternative to how the ADT is designed is to have the class fields and methods declared in the BSTree base trait with specific implementations reside within subclasses BSBranch and BSLeaf, thus eliminating the need of the boiler-plate pattern matching for the subclasses. There is also the benefit of making class fields like left & right referenceable from the base trait, though they would need to be wrapped in Options with value None for BSLeaf.

As can be seen with the existing ADT, an advantage is having all the binary tree functions defined within the base trait once and for all. If there is the need for having left and right referenceable from the BSTree base trait, one can define something like below within the trait.

  def leftIfAny: Option[BSTree[A]] = self match {
    case BSLeaf => None
    case BSBranch(_, _, l, _) => Some(l)
  }

  def rightIfAny: Option[BSTree[A]] = self match {
    case BSLeaf => None
    case BSBranch(_, _, _, r) => Some(r)
  }

def leftIfAny: Option[BSTree[A]] = self match {

case BSLeaf => None

case BSBranch(_, _, l, _) => Some(l)

}

def rightIfAny: Option[BSTree[A]] = self match {

case BSLeaf => None

case BSBranch(_, _, _, r) => Some(r)

}

Example:

tree.leftIfAny
// Some(TN(10[1], -, 20[2]))

tree.rightIfAny.flatMap(_.rightIfAny)
// Some(TN(70[1], 60[2], 80[1]))

tree.leftIfAny

// Some(TN(10[1], -, 20[2]))

tree.rightIfAny.flatMap(_.rightIfAny)

// Some(TN(70[1], 60[2], 80[1]))

Then there is also non-idiomatic approach of using mutable class fields in a single tree class commonly seen in Java implementation, like below:

object SimpleBST {
  class TreeNode[A](_elem: A, _left: TreeNode[A] = null, _right: TreeNode[A] = null) {
    var elem: A = _elem
    var count: Int = 1
    var left: TreeNode[A] = _left
    var right: TreeNode[A] = _right
    override def toString: String =
      if (this == null)
        "null"
      else (left, right) match {
        case (null, null) => s"TN($elem[$count], -, -)"
        case (l,    null) => s"TN($elem[$count], ${l.elem}[${l.count}], -)"
        case (null, r)    => s"TN($elem[$count], -, ${r.elem}[${r.count}])"
        case (l,    r)    => s"TN($elem[$count], ${l.elem}[${l.count}], ${r.elem}[${r.count}])"
      }
  }

  // Methods for node addition, deletion, etc ...
}

object SimpleBST {

class TreeNode[A](_elem: A, _left: TreeNode[A] = null, _right: TreeNode[A] = null) {

var elem: A = _elem

var count: Int = 1

var left: TreeNode[A] = _left

var right: TreeNode[A] = _right

override def toString: String =

if (this == null)

"null"

else (left, right) match {

case (null, null) => s"TN($elem[$count], -, -)"

case (l, null) => s"TN($elem[$count], ${l.elem}[${l.count}], -)"

case (null, r) => s"TN($elem[$count], -, ${r.elem}[${r.count}])"

case (l, r) => s"TN($elem[$count], ${l.elem}[${l.count}], ${r.elem}[${r.count}])"

}

// Methods for node addition, deletion, etc ...

}

Addendum: Complete source code of the BSTree ADT

sealed trait BSTree[+A] { self =>

  def traverseInOrder: Unit = self match {
    case BSLeaf => ()
    case BSBranch(_, _, l, r) =>
      l.traverseInOrder
      println(self)
      r.traverseInOrder
  }

  def traverseByLevel: Unit = {
    def loop(tree: BSTree[A], level: Int): Unit = {
      if (level > 0)
        tree match {
          case BSLeaf =>
          case BSBranch(_, _, l, r) =>
            loop(l, level - 1)
            loop(r, level - 1)
        }
      else
        print(s"$tree  ")
    }
    for (l <- 0 until self.height) {
      loop(self, l)
      println()
    }
  }

  def find[B >: A : Ordering](elem: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, c, l, r) =>
        if (e < elem)
          r.find(elem)
        else if (e > elem)
          l.find(elem)
        else
          t
    }
  }

  def height: Int = self match {
    case BSLeaf => 0
    case BSBranch(_, _, l, r) => (l.height max r.height) + 1
  }

  def insert[B >: A : Ordering](elem: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSBranch(elem, 1, BSLeaf, BSLeaf)
      case t @ BSBranch(e, c, l, r) =>
        if (e > elem)
          t.copy(left = l.insert(elem))
        else if (e < elem)
          t.copy(right = r.insert(elem))
        else
          t.copy(count = c + 1)
    }
  }

  def delete[B >: A : Ordering](elem: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    def leftMost(tree: BSTree[B], prev: BSTree[B]): BSTree[B] = tree match {
      case BSLeaf => prev
      case t @ BSBranch(_, _, l, _) => leftMost(l, t)
    }
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, c, l, r) =>
        if (e < elem)
          t.copy(right = r.delete(elem))
        else if (e > elem)
          t.copy(left = l.delete(elem))
        else {
          if (l == BSLeaf)
            r
          else if (r == BSLeaf)
            l
          else {
            val nextBigger = leftMost(r, r)
            nextBigger match { case BSBranch(enb, cnb, _, _) =>
              t.copy(elem = enb, count = cnb, right = r.delete(enb))
            }
          }
        }
    }
  }

  def trim[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, _, l, r) =>
        val tt = t.copy(left = l.trim(lower, upper), right = r.trim(lower, upper)) 
        if (e < lower)
          tt match { case BSBranch(_, _, _, r) => r }
        else if (e > upper)
          tt match { case BSBranch(_, _, l, _) => l }
        else
          tt
    }
  }

  def cutOut[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {
    val ord = implicitly[Ordering[B]]
    import ord._
    self match {
      case BSLeaf => BSLeaf
      case t @ BSBranch(e, _, l, r) =>
        val tt = t.copy(left = l.cutOut(lower, upper), right = r.cutOut(lower, upper)) 
        if (e >= lower && e <= upper)
          tt match { case BSBranch(e, _, _, _) => tt.delete(e) }
        else
          tt
    }
  }

  def rebuild(): BSTree[A] = {
    def loop(vs: Vector[A], lower: Int, upper: Int): BSTree[A] = {
      if (upper >= lower) {
        val m = (lower + upper) / 2
        val em = vs(m)
        self match {
          case BSBranch(e, c, _, _) if e == em =>
            BSBranch(em, c+1, loop(vs, lower, m-1), loop(vs, m+1, upper))
          case _ =>
            BSBranch(em, 1, loop(vs, lower, m-1), loop(vs, m+1, upper))
        }
      }
      else
        BSLeaf
    }
    self match {
      case BSLeaf =>
        BSLeaf
      case _ =>
        val vs = self.toElemVector
        loop(vs, 0, vs.size - 1)
    }
  }

  def isBalanced: Boolean = self match {
    case BSLeaf => true
    case BSBranch(_, _, l, r) =>
      if (math.abs(l.height - r.height) > 1)
        false
      else
        l.isBalanced &amp;&amp; r.isBalanced
  }

  def toElemVector: Vector[A] = self match {  // in-order
    case BSLeaf =>
      Vector.empty[A]
    case BSBranch(e, _, l, r) =>
      (l.toElemVector :+ e) ++ r.toElemVector
  }

  def toVector: Vector[BSTree[A]] = self match {  // in-order
    case BSLeaf => Vector.empty[BSTree[A]]
    case BSBranch(_, _, l, r) =>
      (l.toVector :+ self) ++ r.toVector
  }

  override def toString: String = self match {
    case BSLeaf => "-"
    case BSBranch(e, c, l, r) => (l, r) match {
      case (BSLeaf, BSLeaf) => s"TN($e[$c], -, -)"
      case (BSBranch(le, lc, _, _), BSLeaf) => s"TN($e[$c], $le[$lc], -)"
      case (BSLeaf, BSBranch(re, rc, _, _)) => s"TN($e[$c], -, $re[$rc])"
      case (BSBranch(le, lc, _, _), BSBranch(re, rc, _, _)) => s"TN($e[$c], $le[$lc], $re[$rc])"
    }
  }
}

object BSTree {
  def apply[A : Ordering](elems: Vector[A]): BSTree[A] = {
    val ord = implicitly[Ordering[A]]
    import ord._
    elems.foldLeft[BSTree[A]](BSLeaf)(_.insert(_))
  }
}

case class BSBranch[A](
    elem: A, count: Int = 1, left: BSTree[A] = BSLeaf, right: BSTree[A] = BSLeaf
  ) extends BSTree[A]

case object BSLeaf extends BSTree[Nothing]

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

sealed trait BSTree[+A] { self =>

def traverseInOrder: Unit = self match {

case BSLeaf => ()

case BSBranch(_, _, l, r) =>

l.traverseInOrder

println(self)

r.traverseInOrder

}

def traverseByLevel: Unit = {

def loop(tree: BSTree[A], level: Int): Unit = {

if (level > 0)

tree match {

case BSLeaf =>

case BSBranch(_, _, l, r) =>

loop(l, level - 1)

loop(r, level - 1)

}

else

print(s"$tree ")

}

for (l <- 0 until self.height) {

loop(self, l)

println()

}

def find[B >: A : Ordering](elem: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, c, l, r) =>

if (e < elem)

r.find(elem)

else if (e > elem)

l.find(elem)

else

}

def height: Int = self match {

case BSLeaf => 0

case BSBranch(_, _, l, r) => (l.height max r.height) + 1

}

def insert[B >: A : Ordering](elem: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSBranch(elem, 1, BSLeaf, BSLeaf)

case t @ BSBranch(e, c, l, r) =>

if (e > elem)

t.copy(left = l.insert(elem))

else if (e < elem)

t.copy(right = r.insert(elem))

else

t.copy(count = c + 1)

}

def delete[B >: A : Ordering](elem: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

def leftMost(tree: BSTree[B], prev: BSTree[B]): BSTree[B] = tree match {

case BSLeaf => prev

case t @ BSBranch(_, _, l, _) => leftMost(l, t)

}

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, c, l, r) =>

if (e < elem)

t.copy(right = r.delete(elem))

else if (e > elem)

t.copy(left = l.delete(elem))

else {

if (l == BSLeaf)

else if (r == BSLeaf)

else {

val nextBigger = leftMost(r, r)

nextBigger match { case BSBranch(enb, cnb, _, _) =>

t.copy(elem = enb, count = cnb, right = r.delete(enb))

}

def trim[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, _, l, r) =>

val tt = t.copy(left = l.trim(lower, upper), right = r.trim(lower, upper))

if (e < lower)

tt match { case BSBranch(_, _, _, r) => r }

else if (e > upper)

tt match { case BSBranch(_, _, l, _) => l }

else

}

def cutOut[B >: A : Ordering](lower: B, upper: B): BSTree[B] = {

val ord = implicitly[Ordering[B]]

import ord._

self match {

case BSLeaf => BSLeaf

case t @ BSBranch(e, _, l, r) =>

val tt = t.copy(left = l.cutOut(lower, upper), right = r.cutOut(lower, upper))

if (e >= lower && e <= upper)

tt match { case BSBranch(e, _, _, _) => tt.delete(e) }

else

}

def rebuild(): BSTree[A] = {

def loop(vs: Vector[A], lower: Int, upper: Int): BSTree[A] = {

if (upper >= lower) {

val m = (lower + upper) / 2

val em = vs(m)

self match {

case BSBranch(e, c, _, _) if e == em =>

BSBranch(em, c+1, loop(vs, lower, m-1), loop(vs, m+1, upper))

case _ =>

BSBranch(em, 1, loop(vs, lower, m-1), loop(vs, m+1, upper))

}

else

BSLeaf

}

self match {

case BSLeaf =>

BSLeaf

case _ =>

val vs = self.toElemVector

loop(vs, 0, vs.size - 1)

}

def isBalanced: Boolean = self match {

case BSLeaf => true

case BSBranch(_, _, l, r) =>

if (math.abs(l.height - r.height) > 1)

false

else

l.isBalanced && r.isBalanced

}

def toElemVector: Vector[A] = self match { // in-order

case BSLeaf =>

Vector.empty[A]

case BSBranch(e, _, l, r) =>

(l.toElemVector :+ e) ++ r.toElemVector

}

def toVector: Vector[BSTree[A]] = self match { // in-order

case BSLeaf => Vector.empty[BSTree[A]]

case BSBranch(_, _, l, r) =>

(l.toVector :+ self) ++ r.toVector

}

override def toString: String = self match {

case BSLeaf => "-"

case BSBranch(e, c, l, r) => (l, r) match {

case (BSLeaf, BSLeaf) => s"TN($e[$c], -, -)"

case (BSBranch(le, lc, _, _), BSLeaf) => s"TN($e[$c], $le[$lc], -)"

case (BSLeaf, BSBranch(re, rc, _, _)) => s"TN($e[$c], -, $re[$rc])"

case (BSBranch(le, lc, _, _), BSBranch(re, rc, _, _)) => s"TN($e[$c], $le[$lc], $re[$rc])"

}

object BSTree {

def apply[A : Ordering](elems: Vector[A]): BSTree[A] = {

val ord = implicitly[Ordering[A]]

import ord._

elems.foldLeft[BSTree[A]](BSLeaf)(_.insert(_))

}

case class BSBranch[A](

elem: A, count: Int = 1, left: BSTree[A] = BSLeaf, right: BSTree[A] = BSLeaf

) extends BSTree[A]

case object BSLeaf extends BSTree[Nothing]

A Crossword Puzzle In Scala

Data structure for the crossword board

First, it would help make the implementation more readable by coming up with a couple of inter-related classes to represent the crossword board’s layout:

Class Board represents the crossword board which can be constructed with the following parameters:

sz: Size of the square-shaped board with dimension sz x sz
bgCh: Background character representing a single cell of the board which can be identified with its XY-coordinates (default ‘.’)
spCh: Space character representing a single empty cell of a word slot (default ‘*’)
slots: A set of word slots, each of which is represented as an instance of class Slot
arr: The “hidden” character array of dimension sz x sz that stores the content (bgCh, spCh (in empty slots) and words)

Class Slot represents an individual word slot with these parameters:

start: The XY-coordinates of the starting character of the word slot
horiz: A boolean indicator whether a word slot is horizontal (true) or vertical (false)
len: Length of the word slot
jctIdxs: Indexes of all the junction cells that intersect any other word slots
word: A word that fits the slot in length and matches characters that intersect other word slots at its junctions (default “”)

Thus, we have the skeletal classes:

case class Slot(start: (Int, Int), horiz: Boolean, len: Int, jctIdxs: Set[Int], word: String = "")

case class Board private(sz: Int, bgCh: Char = '.', spCh: Char = '*', slots: Set[Slot] = Set()) {
  private val arr: Array[Array[Char]] = Array.ofDim(sz, sz)
}

Next, we expand class Board to include a few methods for recursively filling its word slots with matching words along with a couple of side-effecting methods for displaying board content, error reporting, etc.

case class Board private(sz: Int, bgCh: Char = '.', spCh: Char = '*', slots: Set[Slot] = Set()) {
  private val arr: Array[Array[Char]] = Array.ofDim(sz, sz)

  // Print content of a provided array
  def printArr(a: Array[Array[Char]] = arr): Unit

  // Report error messages
  def logError(msg: String, sList: List[Slot], wMap: Map[Int, Array[String]], jMap: Map[(Int, Int), Char]): Unit

  // Fill slots with words of matching lengths & junction characters
  def fillSlotsWithWords(words: Array[String]): Board

  // Update board array from slots
  def updateArrayFromSlots(): Board
}

To separate initialization tasks from the core crossword solving logic, we add an companion object to class Board.

object Board {
  // Initialize a board's array from provided background char, slot space char & slots
  def apply(sz: Int, bgCh: Char = '.', spCh: Char = '*', slots: Set[Slot] = Set()): Board

  // Convert an index of a slot into coordinates
  def idxToCoords(start: (Int, Int), horiz: Boolean, idx: Int): (Int, Int)

  // Initialize a board and create slots from board array with ONLY layout
  def initBoardLayout(layout: Array[Array[Char]], bg: Char = '.', sp: Char = '*'): Board
    def findSlot(i: Int, j: Int): Option[Slot]
    def createSlots(i: Int, j: Int, slots: Set[Slot], mkCh: Char): Set[Slot]
}

Creating a crossword puzzle

There are a couple of ways to construct a properly initialized Board object:

From a set of pre-defined word slots, or,
From a sz x sz array of the board layout with pre-populated cells of background and empty world slot

object Board {
  // Initialize a board's array from provided background char, slot space char & slots
  def apply(sz: Int, bgCh: Char = '.', spCh: Char = '*', slots: Set[Slot] = Set()): Board = {
    val board = new Board(sz, bgCh, spCh, slots)
    val slotCoords = slots.toList.flatMap{ slot =>
      val (i, j) = idxToCoords(slot.start, slot.horiz, slot.len - 1)
      List(i, j)
    }
    require(slots.isEmpty || slots.nonEmpty && slotCoords.max < sz, s"ERROR: $slots cannot be contained in ${board.arr}!")
    for (i <- 0 until sz; j <- 0 until sz) {
      board.arr(i)(j) = bgCh
    }
    slots.foreach{ slot =>
      val (i, j) = (slot.start._1, slot.start._2)
      (0 until slot.len).foreach{ k =>
        val (i, j) = idxToCoords(slot.start, slot.horiz, k)
        board.arr(i)(j) = if (slot.word.isEmpty) spCh else slot.word(k)
      }
    }
    board
  }

  ...
}

Method Board.apply() constructs a Board object by populating the private array field with the provided character bgCh to represent the background cells. Once initialized, empty slots represented by the other provided character spCh can be “carved out” in accordance with a set of pre-defined word Slot objects.

Example:

val emptySlots = Set(
  Slot((6, 8), false, 4, Set(2), ""),
  Slot((3, 1), true, 6, Set(2, 5), ""),
  Slot((3, 6), false, 7, Set(0, 5), ""),
  Slot((1, 9), false, 5, Set(0), ""),
  Slot((1, 3), false, 5, Set(0, 2, 4), ""),
  Slot((8, 1), true, 8, Set(5, 7), ""),
  Slot((1, 2), true, 8, Set(1, 7), ""),
  Slot((5, 0), true, 4, Set(3), "")
)

val board = Board(sz = 10, slots = emptySlots)

board.printArr()
/*
. . . . . . . . . . 
. . * * * * * * * * 
. . . * . . . . . * 
. * * * * * * . . * 
. . . * . . * . . * 
* * * * . . * . . * 
. . . . . . * . * . 
. . . . . . * . * . 
. * * * * * * * * . 
. . . . . . * . * . 
*/

Alternatively, one could construct a Board by providing a pre-populated array of the board layout.

object Board {
  ...

  // Initialize a board and create slots from board array with ONLY layout
  def initBoardLayout(layout: Array[Array[Char]], bg: Char = '.', sp: Char = '*'): Board = {
    val sz = layout.length
    def findSlot(i: Int, j: Int): Option[Slot] = {
      if (j < sz-1 && layout(i)(j+1) == sp) {  // Horizontal
        val ln = Iterator.from(j).takeWhile(k => k < sz && layout(i)(k) == sp).size
        val js = (0 until ln).collect{
          case k if (i>=0+1 && layout(i-1)(j+k)!=bg) || (i<sz-1 && layout(i+1)(j+k)!=bg) => k
        }.toSet
        Option.when(ln > 1)(Slot((i, j), true, ln, js))
      }
      else {  // Vertical
        val ln = Iterator.from(i).takeWhile(k => k < sz && layout(k)(j) == sp).size
        val js = (0 until ln).collect{
          case k if (j>=0+1 && layout(i+k)(j-1)!=bg) || (j<sz-1 && layout(i+k)(j+1)!=bg) => k
        }.toSet
        Option.when(ln > 1)(Slot((i, j), false, ln, js))
      }
    }
    def createSlots(i: Int, j: Int, slots: Set[Slot], mkCh: Char): Set[Slot] = {
      if (i == sz) slots
      else {
        if (j == sz)
          createSlots(i+1, 0, slots, mkCh)
        else {
          findSlot(i, j) match {
            case Some(slot) =>
              val jctCoords = slot.jctIdxs.map{ idx =>
                Board.idxToCoords(slot.start, slot.horiz, idx)
              }
              (0 until slot.len).foreach { k =>
                val (i, j) = Board.idxToCoords(slot.start, slot.horiz, k)
                if (!jctCoords.contains((i, j)))
                  layout(i)(j) = mkCh
              }
              createSlots(i, j+1, slots + slot, mkCh)
            case None =>
              createSlots(i, j+1, slots, mkCh)
          }
        }
      }
    }
    val mkCh = '\u0000'
    val newBoard = Board(sz, bg, sp).copy(slots = createSlots(0, 0, Set(), mkCh))
    (0 until sz).foreach{ i =>
      newBoard.arr(i) = layout(i).map{ ch => if (ch == mkCh) sp else ch }
    }
    newBoard
  }
}

Method Board.initBoardLayout() kicks off createSlots(), which parses every element of the array to identify the “head” of any word slot and figure out its “tail” and indexes of cells that intersect with other slots. The mkCh character is for temporarily masking off any identified slot cell that is not a intersecting junction during the parsing.

Example:

val arr = Array(
  Array('.', '.', '.', '.', '.', '.', '.', '.', '.', '.'),
  Array('.', '.', '*', '*', '*', '*', '*', '*', '*', '*'),
  Array('.', '.', '.', '*', '.', '.', '.', '.', '.', '*'),
  Array('.', '*', '*', '*', '*', '*', '*', '.', '.', '*'),
  Array('.', '.', '.', '*', '.', '.', '*', '.', '.', '*'),
  Array('*', '*', '*', '*', '.', '.', '*', '.', '.', '*'),
  Array('.', '.', '.', '.', '.', '.', '*', '.', '*', '.'),
  Array('.', '.', '.', '.', '.', '.', '*', '.', '*', '.'),
  Array('.', '*', '*', '*', '*', '*', '*', '*', '*', '.'),
  Array('.', '.', '.', '.', '.', '.', '*', '.', '*', '.')
)

val board = Board.initBoardLayout(arr)
/*
Board(10, '.', '*', HashSet(
  Slot((6, 8), false, 4, Set(2), ""),
  Slot((3, 1), true, 6, Set(2, 5), ""),
  Slot((3, 6), false, 7, Set(0, 5), ""),
  Slot((1, 9), false, 5, Set(0), ""),
  Slot((1, 3), false, 5, Set(0, 2, 4), ""),
  Slot((8, 1), true, 8, Set(5, 7), ""),
  Slot((1, 2), true, 8, Set(1, 7), ""),
  Slot((5, 0), true, 4, Set(3), "")
))
*/

board.printArr()
/*
. . . . . . . . . . 
. . * * * * * * * * 
. . . * . . . . . * 
. * * * * * * . . * 
. . . * . . * . . * 
* * * * . . * . . * 
. . . . . . * . * . 
. . . . . . * . * . 
. * * * * * * * * . 
. . . . . . * . * . 
*/

Solving the crossword puzzle

The core crossword solving logic is handled by method fillSlotsWithWords() placed within the body of case class Board.

case class Board private(sz: Int, bgCh: Char = '.', spCh: Char = '*', slots: Set[Slot] = Set()) {
  private val arr: Array[Array[Char]] = Array.ofDim(sz, sz)

  ...

  // Fill slots with words of matching lengths & junction characters
  def fillSlotsWithWords(words: Array[String]): Board = {
    val wordMap: Map[Int, Array[String]] = words.groupMap(_.length)(identity)
    val wLenProd: Int = wordMap.map{ case (_, ws) => ws.size }.product
    val trials: Int = wLenProd * wLenProd
    // Prioritize slots with `len` of the smallest by-len group and maximal number of `jctIdxs`
    val orderedSlots = slots.groupMap(_.len)(identity).toList.
      flatMap{ case (_, ss) => ss.map((ss.size, _)) }.
      sortBy{ case (k, slot) => (k, -slot.jctIdxs.size) }.
      map{ case (_, slot) => slot }
    val jctMap: Map[(Int, Int), Char] = slots.
      map(slot => slot.jctIdxs.map(Board.idxToCoords(slot.start, slot.horiz, _))).
      reduce(_ union _).map(_->' ').toMap
    def loop(slotList: List[Slot],
             slotSet: Set[Slot],
             wMap: Map[Int, Array[String]],
             jMap: Map[(Int, Int), Char],
             runs: Int): Set[Slot] = {
      if (runs == 0) {  // Done trying!
        logError(s"FAILURE: Tried $trials times ...", slotList, wMap, jMap)
        slotSet
      }
      else {
        slotList match {
          case Nil =>
            slotSet  // Success!
          case slot :: others =>
            val wordsWithLen = wMap.get(slot.len)
            wordsWithLen match {
              case None =>
                logError(s"FAILURE: Missing words of length ${slot.len}!", slotList, wMap, jMap)
                slotSet
              case Some(words) =>
                words.find{ word =>
                  slot.jctIdxs.forall{ idx =>
                    val jCoords = Board.idxToCoords(slot.start, slot.horiz, idx)
                    jMap(jCoords) == ' ' || word(idx) == jMap(jCoords)
                  }
                }
                match {
                  case Some(w) =>
                    val kvs = slot.jctIdxs.map { idx =>
                      Board.idxToCoords(slot.start, slot.horiz, idx) -> w(idx)
                    }
                    loop(others, slotSet - slot + slot.copy(word = w), wMap, jMap ++ kvs, runs-1)
                  case None =>
                    val newWMap = wMap + (slot.len -> (words.tail :+ words.head))
                    // Restart the loop with altered wordMap
                    loop(orderedSlots, slots, newWMap, jctMap, runs-1)
                }
            }
        }
      }
    }
    val newBoard = copy(slots = loop(orderedSlots, Set(), wordMap, jctMap, wLenProd * wLenProd))
    (0 until sz).foreach{ i => newBoard.arr(i) = arr(i) }
    newBoard.updateArrayFromSlots()
  }

  ...
}

Method fillSlotsWithWords() takes an array of words as the parameter. Those words are supposed to fit the slots of a given initialized Board. The method creates an optimally ordered List of the slots and a Map of words grouped by the length of the words with each key associated with a list of words of the same length. In addition, it also assembles a Map of slot junctions as a helper dataset for matching intersecting characters by XY-coordinates.

The method then uses a recursive loop() to fill slotList and shuffle the wordMap as needed. Let me elaborate a little bit:

Optimally ordered slots – Slots are pre-ordered so that those with the most unique slot-length and the most slot junctions will be processed first. The idea is to fill upfront the most slots with certainty (e.g. those with unique length) and lock down the most slot junction characters as subsequent matching criteria for the intersecting slots.
Order shuffling – During the recursive processing, when a slot of a given length can no longer be filled with the remaining list of slots, the list of words of that length will be shuffled and the loop() will be reset to run with the altered wordMap.

Finally, since it’s possible that the given crossword puzzle slot layout and words might not have a solution, a limit in the number of trials of running the recursive loop() is provided as a safety measure to avoid an infinite loop.

Example:

val words = Array("mandarin", "apple", "nance", "papaya", "avocado", "date", "kiwi", "honeydew")

val filledBoard = board.fillSlotsWithWords(words)
/*
Board(10, '.', '*', HashSet(
  Slot((1, 9), false, 5, Set(0), "nance"),
  Slot((3, 6), false, 7, Set(0, 5), "avocado"),
  Slot((5, 0), true, 4, Set(3), "date"),
  Slot((8, 1), true, 8, Set(5, 7), "honeydew"),
  Slot((6, 8), false, 4, Set(2), "kiwi"),
  Slot((3, 1), true, 6, Set(2, 5), "papaya"),
  Slot((1, 3), false, 5, Set(0, 2, 4), "apple"),
  Slot((1, 2), true, 8, Set(1, 7), "mandarin")
))
*/

filledBoard.printArr()
/*
. . . . . . . . . . 
. . m a n d a r i n 
. . . p . . . . . a 
. p a p a y a . . n 
. . . l . . v . . c 
d a t e . . o . . e 
. . . . . . c . k . 
. . . . . . a . i . 
. h o n e y d e w . 
. . . . . . o . i . 
*/

Final thoughts

Appended is the complete source code of the classes for the crossword puzzle.

Obviously there are many different ways to formulate and solve a puzzle game like this. What’s being illustrated here is a brute-force approach, as the order shuffling routine upon hitting a wall within the slot-filling recursive loop has to reset the recursion process. The good news is that, having tested it with a dozen of random examples (admittedly a small sample), the prioritization strategy (by optimally picking slots to be evaluated) does prove to help a great deal in efficiently solving the sample games.

case class Slot(start: (Int, Int), horiz: Boolean, len: Int, jctIdxs: Set[Int], word: String = "")

object Board {
  // Initialize a board's array from provided background char, slot space char & slots
  def apply(sz: Int, bgCh: Char = '.', spCh: Char = '*', slots: Set[Slot] = Set()): Board = {
    val board = new Board(sz, bgCh, spCh, slots)
    val slotCoords = slots.toList.flatMap{ slot =>
      val (i, j) = idxToCoords(slot.start, slot.horiz, slot.len - 1)
      List(i, j)
    }
    require(slots.isEmpty || slots.nonEmpty && slotCoords.max < sz, s"ERROR: $slots cannot be contained in ${board.arr}!")
    for (i <- 0 until sz; j <- 0 until sz) {
      board.arr(i)(j) = bgCh
    }
    slots.foreach{ slot =>
      val (i, j) = (slot.start._1, slot.start._2)
      (0 until slot.len).foreach{ k =>
        val (i, j) = idxToCoords(slot.start, slot.horiz, k)
        board.arr(i)(j) = if (slot.word.isEmpty) spCh else slot.word(k)
      }
    }
    board
  }

  // Convert an index of a slot into coordinates
  def idxToCoords(start: (Int, Int), horiz: Boolean, idx: Int): (Int, Int) = {
    if (horiz) (start._1, start._2 + idx) else (start._1 + idx, start._2)
  }

  // Initialize a board and create slots from board array with ONLY layout
  def initBoardLayout(layout: Array[Array[Char]], bg: Char = '.', sp: Char = '*'): Board = {
    val sz = layout.length
    def findSlot(i: Int, j: Int): Option[Slot] = {
      if (j < sz-1 && layout(i)(j+1) == sp) {  // Horizontal
        val ln = Iterator.from(j).takeWhile(k => k < sz && layout(i)(k) == sp).size
        val js = (0 until ln).collect{
          case k if (i>=0+1 && layout(i-1)(j+k)!=bg) || (i<sz-1 && layout(i+1)(j+k)!=bg) => k
        }.toSet
        Option.when(ln > 1)(Slot((i, j), true, ln, js))
      }
      else {  // Vertical
        val ln = Iterator.from(i).takeWhile(k => k < sz && layout(k)(j) == sp).size
        val js = (0 until ln).collect{
          case k if (j>=0+1 && layout(i+k)(j-1)!=bg) || (j<sz-1 && layout(i+k)(j+1)!=bg) => k
        }.toSet
        Option.when(ln > 1)(Slot((i, j), false, ln, js))
      }
    }
    def createSlots(i: Int, j: Int, slots: Set[Slot], mkCh: Char): Set[Slot] = {
      if (i == sz) slots
      else {
        if (j == sz)
          createSlots(i+1, 0, slots, mkCh)
        else {
          findSlot(i, j) match {
            case Some(slot) =>
              val jctCoords = slot.jctIdxs.map{ idx =>
                Board.idxToCoords(slot.start, slot.horiz, idx)
              }
              (0 until slot.len).foreach { k =>
                val (i, j) = Board.idxToCoords(slot.start, slot.horiz, k)
                if (!jctCoords.contains((i, j)))
                  layout(i)(j) = mkCh
              }
              createSlots(i, j+1, slots + slot, mkCh)
            case None =>
              createSlots(i, j+1, slots, mkCh)
          }
        }
      }
    }
    val mkCh = '\u0000'
    val newBoard = Board(sz, bg, sp).copy(slots = createSlots(0, 0, Set(), mkCh))
    (0 until sz).foreach{ i =>
      newBoard.arr(i) = layout(i).map{ ch => if (ch == mkCh) sp else ch }
    }
    newBoard
  }
}

case class Board private(sz: Int, bgCh: Char = '.', spCh: Char = '*', slots: Set[Slot] = Set()) {
  private val arr: Array[Array[Char]] = Array.ofDim(sz, sz)

  // Print content of a provided array
  def printArr(a: Array[Array[Char]] = arr): Unit =
    for (i <- 0 until sz) {
      for (j <- 0 until sz) {
        print(s"${a(i)(j)} ")
        if (a(i)(j) == '\u0000') print(" ")  // Print a filler space for null char
      }
      println()
    }

  // Report error messages
  def logError(msg: String, sList: List[Slot], wMap: Map[Int, Array[String]], jMap: Map[(Int, Int), Char]): Unit = {
    println(msg)
    println(s"Remaining slots: $sList")
    println(s"Latest wordMap: $wMap")
    println(s"Latest jctMap: $jMap")
  }

  // Fill slots with words of matching lengths & junction characters
  def fillSlotsWithWords(words: Array[String]): Board = {
    val wordMap: Map[Int, Array[String]] = words.groupMap(_.length)(identity)
    val wLenProd: Int = wordMap.map{ case (_, ws) => ws.size }.product
    val trials: Int = wLenProd * wLenProd
    // Prioritize slots with `len` of the smallest by-len group and maximal number of `jctIdxs`
    val orderedSlots = slots.groupMap(_.len)(identity).toList.
      flatMap{ case (_, ss) => ss.map((ss.size, _)) }.
      sortBy{ case (k, slot) => (k, -slot.jctIdxs.size) }.
      map{ case (_, slot) => slot }
    val jctMap: Map[(Int, Int), Char] = slots.
      map(slot => slot.jctIdxs.map(Board.idxToCoords(slot.start, slot.horiz, _))).
      reduce(_ union _).map(_->' ').toMap
    def loop(slotList: List[Slot],
             slotSet: Set[Slot],
             wMap: Map[Int, Array[String]],
             jMap: Map[(Int, Int), Char],
             runs: Int): Set[Slot] = {
      if (runs == 0) {  // Done trying!
        logError(s"FAILURE: Tried $trials times ...", slotList, wMap, jMap)
        slotSet
      }
      else {
        slotList match {
          case Nil =>
            slotSet  // Success!
          case slot :: others =>
            val wordsWithLen = wMap.get(slot.len)
            wordsWithLen match {
              case None =>
                logError(s"FAILURE: Missing words of length ${slot.len}!", slotList, wMap, jMap)
                slotSet
              case Some(words) =>
                words.find{ word =>
                  slot.jctIdxs.forall{ idx =>
                    val jCoords = Board.idxToCoords(slot.start, slot.horiz, idx)
                    jMap(jCoords) == ' ' || word(idx) == jMap(jCoords)
                  }
                }
                match {
                  case Some(w) =>
                    val kvs = slot.jctIdxs.map { idx =>
                      Board.idxToCoords(slot.start, slot.horiz, idx) -> w(idx)
                    }
                    loop(others, slotSet - slot + slot.copy(word = w), wMap, jMap ++ kvs, runs-1)
                  case None =>
                    val newWMap = wMap + (slot.len -> (words.tail :+ words.head))
                    // Restart the loop with altered wordMap
                    loop(orderedSlots, slots, newWMap, jctMap, runs-1)
                }
            }
        }
      }
    }
    val newBoard = copy(slots = loop(orderedSlots, Set(), wordMap, jctMap, wLenProd * wLenProd))
    (0 until sz).foreach{ i => newBoard.arr(i) = arr(i) }
    newBoard.updateArrayFromSlots()
  }

  // Update board array from slots
  def updateArrayFromSlots(): Board = {
    slots.foreach{ slot =>
      (0 until slot.len).foreach{ k =>
        val (i, j) = Board.idxToCoords(slot.start, slot.horiz, k)
        arr(i)(j) = if (slot.word.isEmpty) spCh else slot.word(k)
      }
    }
    this
  }
}

Genuine Blog

A Tech Blog by Leo Cheung

Tag Archives: scala

NIO-based Reactor In Scala

Java NIO and Reactor pattern

Reactor event loop

Acceptor

Event handlers

Processing read/write buffers

Final thoughts

Scala Binary Search Tree

ADT BSTree

Populating a BSTree

Tree traversal and finding tree nodes

Removing tree nodes

Rebuilding a binary search tree

Thoughts on the ADT

Addendum: Complete source code of the BSTree ADT

A Crossword Puzzle In Scala

Data structure for the crossword board

Creating a crossword puzzle

Solving the crossword puzzle

Final thoughts