Monthly Archives: November 2019

Scala’s groupMap And groupMapReduce

For grouping elements in a Scala collection by a provided key, the de facto method of choice has been groupBy, which has the following signature for an `Iterable`:

// Method groupBy
def groupBy[K](f: (A) => K): immutable.Map[K, Iterable[A]]

It returns an immutable Map in which each key is paired with the collection of original elements that produced that key. To transform these grouped values, Map provides the method mapValues, with the following signature:

// Method mapValues
def mapValues[W](f: (V) => W): Map[K, W]

This `groupBy/mapValues` combo is handy for processing the values of the Map generated from the grouping. However, as of Scala 2.13, `mapValues` on a Map is deprecated in favor of `.view.mapValues`, which returns a lazy `MapView` rather than a strict Map.
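One way to get the same effect on Scala 2.13 is to go through a view and then materialize the result with `toMap`. The snippet below is a minimal sketch under that assumption; `words` and `lengthsByInitial` are illustrative names, not part of the examples in this post.

```scala
// Assumes Scala 2.13+. `words` is made-up sample data for illustration only.
val words = List("apple", "avocado", "banana", "blueberry", "cherry")

// groupBy builds a Map[Char, List[String]] keyed by first letter.
// .view.mapValues is lazy (it yields a MapView), so .toMap is used
// to get back a strict immutable Map.
val lengthsByInitial: Map[Char, List[Int]] =
  words.groupBy(_.head).view.mapValues(_.map(_.length)).toMap
```

The extra `.view` and `.toMap` steps are part of what makes a single-call alternative attractive.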

groupMap

Scala 2.13 introduces a new method, groupMap, which groups a collection using two provided functions: one that defines the keys and one that defines the values of the resulting Map. Here’s the signature of method groupMap for an `Iterable`:

// Method groupMap
def groupMap[K, B](key: (A) => K)(f: (A) => B): immutable.Map[K, Iterable[B]]

Let’s start with a simple example grouping via the good old `groupBy` method:

// Example 1: groupBy
val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")

fruits.groupBy(identity)
// res1: Map[String, List[String]] = Map(
//   "orange" -> List("orange"),
//   "apple" -> List("apple", "apple"),
//   "pear" -> List("pear", "pear", "pear")
// )

We can replace `groupBy` with `groupMap` like below:

// Example 1: groupMap
fruits.groupMap(identity)(identity)

In this particular case, the new method doesn’t offer any benefit over the old one.

Let’s look at another example that involves a collection of class objects:

// Example 2
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3), 
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2), 
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

If we want to list all pet names per species, a `groupBy` coupled with `mapValues` will do:

// Example 2: groupBy
pets.groupBy(_.species).mapValues(_.map(_.name))
// res2: Map[String, List[String]] = Map(
//   "cat" -> List("sassy", "bella"),
//   "bird" -> List("coco", "kiwi"),
//   "dog" -> List("poppy", "bodie", "poppy")
// )

In this case, `groupMap` is more readable, because the functions that define the keys and the values of the resulting Map sit side by side as parameters:

// Example 2: groupMap
pets.groupMap(_.species)(_.name)
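To check that the two approaches agree, the sketch below compares `groupMap` against a `groupBy` followed by a value transformation. It assumes Scala 2.13 and uses `.view.mapValues(...).toMap` to obtain a strict Map; the names `viaGroupMap` and `viaGroupBy` are illustrative.

```scala
// Pet and the sample data are repeated from Example 2 so the snippet runs on its own.
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

// Key function and value function passed in one call.
val viaGroupMap: Map[String, List[String]] =
  pets.groupMap(_.species)(_.name)

// Same result in two steps: group first, then transform each group's values.
val viaGroupBy: Map[String, List[String]] =
  pets.groupBy(_.species).view.mapValues(_.map(_.name)).toMap

assert(viaGroupMap == viaGroupBy)
```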

groupMapReduce

At times, we need to perform reduction on the Map values after grouping of a collection. This is when the other new method groupMapReduce comes in handy:

// Method groupMapReduce
def groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): immutable.Map[K, B]

Besides the parameters for defining the keys and values of the resulting Map like `groupMap`, `groupMapReduce` also expects an additional parameter in the form of a binary operation for reduction.
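To make the role of the three parameters concrete, here is a simplified sketch of the same semantics written as a fold. It is an illustration only, not the standard library's source; `groupMapReduceSketch` is a hypothetical helper name.

```scala
// Hypothetical helper illustrating the semantics of groupMapReduce (Scala 2.13+).
// For each element: compute its key and mapped value, then either store the value
// (first time the key is seen) or combine it with the value accumulated so far.
def groupMapReduceSketch[A, K, B](xs: Iterable[A])(key: A => K)(f: A => B)(reduce: (B, B) => B): Map[K, B] =
  xs.foldLeft(Map.empty[K, B]) { (acc, a) =>
    val k = key(a)
    val v = f(a)
    val merged = acc.get(k) match {
      case Some(prev) => reduce(prev, v)
      case None       => v
    }
    acc.updated(k, merged)
  }

val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")

// Count occurrences: every element maps to 1, and the 1s are summed per key.
val counts: Map[String, Int] = groupMapReduceSketch(fruits)(identity)(_ => 1)(_ + _)
```

Note that no intermediate collection of values is built per key in this sketch; each value is folded into the running result as soon as it is produced.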

Using the same pets example, if we want to compute the count of pets per species, the `groupBy/mapValues` approach looks like this:

// Example 3: groupBy/mapValues
pets.groupBy(_.species).mapValues(_.size)
// res1: Map[String, Int] = Map("cat" -> 2, "bird" -> 2, "dog" -> 3)

With `groupMapReduce`, we can “compartmentalize” the functions for the keys, values and reduction operation separately as follows:

// Example 3: groupMapReduce
pets.groupMapReduce(_.species)(_ => 1)(_ + _)
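The same pattern extends beyond counting: swapping the value function or the reduction gives other per-group aggregates. The sketch below assumes Scala 2.13; `oldestAge` and `totalAge` are illustrative names.

```scala
// Pet and the sample data are repeated from Example 2 so the snippet runs on its own.
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

// Oldest age per species: map each pet to its age, keep the larger of each pair.
val oldestAge: Map[String, Int] = pets.groupMapReduce(_.species)(_.age)(_ max _)

// Total age per species: same key and value functions, different reduction.
val totalAge: Map[String, Int] = pets.groupMapReduce(_.species)(_.age)(_ + _)
```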

One more example:

// Example 4
import java.time.LocalDate
case class Product(id: String, saleDate: LocalDate, listPrice: Double, discPrice: Double)

val products = List(
  Product("p001", LocalDate.of(2019, 9, 11), 10, 8.5),
  Product("p002", LocalDate.of(2019, 9, 18), 12, 10),
  Product("p003", LocalDate.of(2019, 9, 27), 10, 9),
  Product("p004", LocalDate.of(2019, 10, 6), 15, 12.5),
  Product("p005", LocalDate.of(2019, 10, 20), 12, 8),
  Product("p006", LocalDate.of(2019, 11, 8), 15, 12),
  Product("p007", LocalDate.of(2019, 11, 16), 10, 8.5),
  Product("p008", LocalDate.of(2019, 11, 25), 10, 9)
)

Let’s say we want to compute the monthly totals of the list price and the discounted price across the product list. With the `groupBy/mapValues` approach:

// Example 4: groupBy/mapValues
products.groupBy(_.saleDate.getMonth).mapValues(
  _.map(p => (p.listPrice, p.discPrice)).reduce(
    (total, prc) => (total._1 + prc._1, total._2 + prc._2))
)
// res2: scala.collection.immutable.Map[java.time.Month,(Double, Double)] =
//   Map(OCTOBER -> (27.0,20.5), SEPTEMBER -> (32.0,27.5), NOVEMBER -> (35.0,29.5))

Using `groupMapReduce`:

// Example 4: groupMapReduce
products.groupMapReduce(_.saleDate.getMonth)(p => (p.listPrice, p.discPrice))(
  (total, prc) => (total._1 + prc._1, total._2 + prc._2)
)