Scala’s groupMap And groupMapReduce

For grouping the elements of a Scala collection by a provided key, the de facto method of choice has been `groupBy`, which has the following signature for an `Iterable`:

// Method groupBy
def groupBy[K](f: (A) => K): immutable.Map[K, Iterable[A]]

It returns an immutable `Map` in which each key is associated with the collection of original elements that produced that key. To further process these collections of values, Scala (up to 2.12) provides the `Map` method `mapValues`, with the signature below:

// Method mapValues
def mapValues[W](f: (V) => W): Map[K, W]

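This `groupBy`/`mapValues` combo is handy for processing the values of the Map produced by the grouping. However, as of Scala 2.13, `Map#mapValues` is deprecated: it now returns a lazy `MapView` rather than a `Map`, and the recommended replacement is `.view.mapValues(f)`, followed by `.toMap` when a strict `Map` is needed.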
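On Scala 2.13 and later, the `groupBy`/`mapValues` style can still be written by calling `mapValues` on a `MapView`, obtained with `.view`, and materializing the result with `.toMap`. A minimal sketch, counting the occurrences of each fruit:

```scala
// Assumes Scala 2.13+, where Map#mapValues is deprecated in favor of views.
val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")

// groupBy builds a Map[String, List[String]]; its view exposes mapValues,
// which transforms each group lazily, and toMap materializes a strict Map again.
val fruitCounts: Map[String, Int] =
  fruits.groupBy(identity).view.mapValues(_.size).toMap
```

Skipping `.toMap` leaves a `MapView`, which recomputes the mapped values on every access.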

groupMap

Scala 2.13 introduces a new method, `groupMap`, which groups a collection using two functions: one that defines the keys of the resulting Map and one that defines its values. Here’s the signature of `groupMap` for an `Iterable`:

// Method groupMap
def groupMap[K, B](key: (A) => K)(f: (A) => B): immutable.Map[K, Iterable[B]]
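To make the semantics of `groupMap` concrete, here is a rough single-pass sketch of what it computes. `myGroupMap` is a hypothetical helper written for illustration, not the standard library’s implementation. The usage groups words by their first letter and keeps only their lengths, which the built-in `words.groupMap(_.head)(_.length)` should reproduce on Scala 2.13+.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of the standard library): applies `key` and `f`
// to every element in one pass, appending f(x) to the buffer for key(x).
def myGroupMap[A, K, B](xs: Iterable[A])(key: A => K)(f: A => B): Map[K, List[B]] = {
  val acc = mutable.LinkedHashMap.empty[K, mutable.ListBuffer[B]]
  for (x <- xs) acc.getOrElseUpdate(key(x), mutable.ListBuffer.empty[B]) += f(x)
  // Freeze each buffer into an immutable List and return an immutable Map.
  acc.iterator.map { case (k, buf) => k -> buf.toList }.toMap
}

val words = List("apple", "avocado", "banana", "blueberry", "cherry")

// Group by first letter, keeping only the word lengths.
val lengthsByLetter = myGroupMap(words)(_.head)(_.length)
```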

Let’s start with a simple example grouping via the good old `groupBy` method:

// Example 1: groupBy
val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")

fruits.groupBy(identity)
// res1: Map[String, List[String]] = Map(
//   "orange" -> List("orange"),
//   "apple" -> List("apple", "apple"),
//   "pear" -> List("pear", "pear", "pear")
// )

We can replace `groupBy` with `groupMap` like below:

// Example 1: groupMap
fruits.groupMap(identity)(identity)

In this particular case, the new method doesn’t offer any benefit over the old one.
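A quick check of that claim, assuming Scala 2.13+: with `identity` for both the key and value functions the two calls produce the same Map, and `groupMap` only starts to pay off once the value function does real work (here, grouping by word length while upper-casing the words).

```scala
val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")

// Identity for both functions: groupMap degenerates into groupBy.
val viaGroupBy  = fruits.groupBy(identity)
val viaGroupMap = fruits.groupMap(identity)(identity)

// A non-trivial value function: key by length, keep upper-cased words.
val upperByLength = fruits.groupMap(_.length)(_.toUpperCase)
```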

Let’s look at another example that involves a collection of case class instances:

// Example 2
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3), 
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2), 
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

If we want to list all pet names per species, a `groupBy` coupled with `mapValues` will do:

// Example 2: groupBy
pets.groupBy(_.species).mapValues(_.map(_.name))
// res2: Map[String, List[String]] = Map(
//   "cat" -> List("sassy", "bella"),
//   "bird" -> List("coco", "kiwi"),
//   "dog" -> List("poppy", "bodie", "poppy")
// )

Here, however, `groupMap` does the job more readably: the functions defining the keys and the values of the resulting Map sit side by side as its two parameter lists:

// Example 2: groupMap
pets.groupMap(_.species)(_.name)
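On Scala 2.13+, the `groupBy`/`mapValues` version of Example 2 needs a detour through a view to get a strict `Map` back. A minimal sketch comparing it with the `groupMap` one-liner, reusing the `Pet` data from above:

```scala
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

// groupBy, then map each group of pets to their names via a MapView.
val namesViaGroupBy: Map[String, List[String]] =
  pets.groupBy(_.species).view.mapValues(_.map(_.name)).toMap

// Same result in a single call: key function and value function side by side.
val namesViaGroupMap: Map[String, List[String]] =
  pets.groupMap(_.species)(_.name)
```

Both versions keep the names within each group in their original list order.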

groupMapReduce

At times, we need to reduce the Map values after grouping a collection. This is where the other new method, `groupMapReduce`, comes in handy:

// Method groupMapReduce
def groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): immutable.Map[K, B]

Besides the two parameter lists that define the keys and values of the resulting Map, as in `groupMap`, `groupMapReduce` takes a third one: a binary operation that reduces all the values sharing a key into a single value.
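One way to picture `groupMapReduce` is as a single pass that keeps one running value per key, instead of building a collection per group and reducing it afterwards. The sketch below follows that idea; `myGroupMapReduce` is a hypothetical helper for illustration, not the standard library’s source. Applied to a few words, it totals the characters per first letter, which should agree with the built-in `words.groupMapReduce(_.head)(_.length)(_ + _)` on Scala 2.13+.

```scala
import scala.collection.mutable

// Hypothetical helper: one running reduced value per key, no per-group collections.
def myGroupMapReduce[A, K, B](xs: Iterable[A])(key: A => K)(f: A => B)(reduce: (B, B) => B): Map[K, B] = {
  val acc = mutable.Map.empty[K, B]
  for (x <- xs) {
    val k = key(x)
    val b = f(x)
    // The first value for a key is stored as-is; later ones are folded in with `reduce`.
    acc.update(k, acc.get(k) match {
      case Some(prev) => reduce(prev, b)
      case None       => b
    })
  }
  acc.toMap
}

val words = List("apple", "avocado", "banana", "blueberry", "cherry")

// Total number of characters per first letter.
val charsByLetter = myGroupMapReduce(words)(_.head)(_.length)(_ + _)
```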

Using the same pets example, if we want to count the pets per species, the `groupBy`/`mapValues` approach looks like this:

// Example 3: groupBy/mapValues
pets.groupBy(_.species).mapValues(_.size)
// res1: Map[String, Int] = Map("cat" -> 2, "bird" -> 2, "dog" -> 3)

With `groupMapReduce`, we can “compartmentalize” the functions for the keys, values and reduction operation separately as follows:

// Example 3: groupMapReduce
pets.groupMapReduce(_.species)(_ => 1)(_ + _)
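Swapping the value and reduction functions yields other per-species aggregates with the same shape of call. A small sketch on the same `pets` data, assuming Scala 2.13+; the maximum-age and distinct-name reductions are illustrative additions, not taken from the examples above:

```scala
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

// Count per species: map every pet to 1, then add them up.
val countBySpecies = pets.groupMapReduce(_.species)(_ => 1)(_ + _)

// Oldest age per species: map to age, reduce with max.
val maxAgeBySpecies = pets.groupMapReduce(_.species)(_.age)(_ max _)

// Distinct names per species: map to a singleton Set, reduce by union.
val distinctNames = pets.groupMapReduce(_.species)(p => Set(p.name))(_ ++ _)
```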

One more example:

// Example 4
import java.time.LocalDate
case class Product(id: String, saleDate: LocalDate, listPrice: Double, discPrice: Double)

val products = List(
  Product("p001", LocalDate.of(2019, 9, 11), 10, 8.5),
  Product("p002", LocalDate.of(2019, 9, 18), 12, 10),
  Product("p003", LocalDate.of(2019, 9, 27), 10, 9),
  Product("p004", LocalDate.of(2019, 10, 6), 15, 12.5),
  Product("p005", LocalDate.of(2019, 10, 20), 12, 8),
  Product("p006", LocalDate.of(2019, 11, 8), 15, 12),
  Product("p007", LocalDate.of(2019, 11, 16), 10, 8.5),
  Product("p008", LocalDate.of(2019, 11, 25), 10, 9)
)

Let’s say we want to compute the monthly totals of the list prices and the discounted prices of the products. The `groupBy`/`mapValues` way looks like this:

// Example 4: groupBy/mapValues
products.groupBy(_.saleDate.getMonth).mapValues(
  _.map(p => (p.listPrice, p.discPrice)).reduce(
    (total, prc) => (total._1 + prc._1, total._2 + prc._2))
)
// res2: scala.collection.immutable.Map[java.time.Month,(Double, Double)] =
//   Map(OCTOBER -> (27.0,20.5), SEPTEMBER -> (32.0,27.5), NOVEMBER -> (35.0,29.5))

Using `groupMapReduce`:

// Example 4: groupMapReduce
products.groupMapReduce(_.saleDate.getMonth)(p => (p.listPrice, p.discPrice))(
  (total, prc) => (total._1 + prc._1, total._2 + prc._2)
)
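The tuple-summing lambda can also be pulled out into a named function, and since a `Map` has no defined iteration order, sorting the result by month makes it easier to read. A sketch assuming Scala 2.13+; `addPrices` and `sortedTotals` are hypothetical names introduced here:

```scala
import java.time.LocalDate

case class Product(id: String, saleDate: LocalDate, listPrice: Double, discPrice: Double)

val products = List(
  Product("p001", LocalDate.of(2019, 9, 11), 10, 8.5),
  Product("p002", LocalDate.of(2019, 9, 18), 12, 10),
  Product("p003", LocalDate.of(2019, 9, 27), 10, 9),
  Product("p004", LocalDate.of(2019, 10, 6), 15, 12.5),
  Product("p005", LocalDate.of(2019, 10, 20), 12, 8),
  Product("p006", LocalDate.of(2019, 11, 8), 15, 12),
  Product("p007", LocalDate.of(2019, 11, 16), 10, 8.5),
  Product("p008", LocalDate.of(2019, 11, 25), 10, 9)
)

// Hypothetical helper: element-wise sum of (listPrice, discPrice) pairs.
def addPrices(a: (Double, Double), b: (Double, Double)): (Double, Double) =
  (a._1 + b._1, a._2 + b._2)

val monthlyTotals =
  products.groupMapReduce(_.saleDate.getMonth)(p => (p.listPrice, p.discPrice))(addPrices)

// Sort by month number (1-12) for display, since Map order is unspecified.
val sortedTotals = monthlyTotals.toList.sortBy { case (month, _) => month.getValue }
sortedTotals.foreach { case (month, (list, disc)) =>
  println(s"$month: list=$list, discounted=$disc")
}
```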

2 thoughts on “Scala’s groupMap And groupMapReduce”

  1. Jim Newton

    It is interesting when some bizarre piece of code turns out to be a well established pattern.
    I was able to refactor a piece of code which was creating a huge amount of GC pressure into
    a single call to groupMapReduce. The code is much shorter, and the GC pressure was eliminated.

    1. Leo Cheung Post author

      Thanks for the comment Jim. This discussion thread about issues re: Scala groupBy (i.e. the “early” return of a `Map` and common use case of having to re-transform the returned `Map` with `groupBy/mapValues`) may or may not be exactly the performance problem that prompted your refactoring work. Nonetheless, it appears to have motivated the creation of methods `groupMap` and `groupMapReduce`.

