For grouping elements in a Scala collection by a provided key, the de facto method of choice has been groupBy, which has the following signature for an `Iterable`:
// Method groupBy
def groupBy[K](f: (A) => K): immutable.Map[K, Iterable[A]]
It returns an immutable Map in which each key is paired with a collection of the original elements that share that key. To process these collections in the resulting Map, Scala provides a method mapValues with the following signature:
// Method mapValues
def mapValues[W](f: (V) => W): Map[K, W]
This `groupBy/mapValues` combo proves to be handy for processing the values of the Map generated from the grouping. However, as of Scala 2.13, `mapValues` on a `Map` is deprecated in favor of `.view.mapValues(...)`, which returns a lazy `MapView` rather than a `Map`.
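On Scala 2.13, a strict `Map` can still be obtained in the `groupBy/mapValues` style by going through a view and calling `toMap` at the end. The following is a minimal sketch of that pattern; the `words` list and the `lengthsByInitial` name are made up for illustration.

```scala
// Scala 2.13+: `.view` on a Map yields a MapView, whose `mapValues` is not deprecated.
val words = List("apple", "avocado", "banana")

// Group by first letter, then map each group to its word lengths.
// `.toMap` forces the lazy view into a strict immutable Map.
val lengthsByInitial: Map[Char, List[Int]] =
  words.groupBy(_.head).view.mapValues(_.map(_.length)).toMap
// contains 'a' -> List(5, 7) and 'b' -> List(6)
```

Without the final `toMap`, the result stays a lazy view and the mapping function is re-applied on each access.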
groupMap
Scala 2.13 introduces a new method, groupMap, which groups a collection using two provided functions: one that defines the keys and one that defines the values of the resulting Map. Here’s the signature of method groupMap for an `Iterable`:
// Method groupMap
def groupMap[K, B](key: (A) => K)(f: (A) => B): immutable.Map[K, Iterable[B]]
Let’s start with a simple example grouping via the good old `groupBy` method:
// Example 1: groupBy
val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")
fruits.groupBy(identity)
// res1: Map[String, List[String]] = Map(
// "orange" -> List("orange"),
// "apple" -> List("apple", "apple"),
// "pear" -> List("pear", "pear", "pear")
// )
We can replace `groupBy` with `groupMap` like below:
// Example 1: groupMap
fruits.groupMap(identity)(identity)
In this particular case, the new method doesn’t offer any benefit over the old one.
Let’s look at another example that involves a collection of case class instances:
// Example 2
case class Pet(species: String, name: String, age: Int)
val pets = List(
Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)
If we want to list all pet names per species, a `groupBy` coupled with `mapValues` will do:
// Example 2: groupBy
pets.groupBy(_.species).mapValues(_.map(_.name))
// res2: Map[String, List[String]] = Map(
// "cat" -> List("sassy", "bella"),
// "bird" -> List("coco", "kiwi"),
// "dog" -> List("poppy", "bodie", "poppy")
// )
In this case, however, `groupMap` reads better, because the functions that define the keys and values of the resulting Map sit side by side as parameters:
// Example 2: groupMap
pets.groupMap(_.species)(_.name)
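To make the relationship between the two styles concrete, here is a small self-contained sketch that checks they produce the same Map. It uses hypothetical `(species, name)` pairs in place of the `Pet` instances so that it runs on its own, and it assumes Scala 2.13 for `.view.mapValues`.

```scala
// Hypothetical (species, name) pairs standing in for the Pet instances.
val entries = List(("cat", "sassy"), ("dog", "poppy"), ("cat", "bella"))

// Key function and value function passed side by side.
val viaGroupMap: Map[String, List[String]] =
  entries.groupMap(_._1)(_._2)

// The same result via groupBy, mapping each group afterwards.
val viaGroupBy: Map[String, List[String]] =
  entries.groupBy(_._1).view.mapValues(_.map(_._2)).toMap

// Both contain "cat" -> List("sassy", "bella") and "dog" -> List("poppy")
assert(viaGroupMap == viaGroupBy)
```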
groupMapReduce
At times, we need to perform reduction on the Map values after grouping of a collection. This is when the other new method groupMapReduce comes in handy:
// Method groupMapReduce
def groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): immutable.Map[K, B]
Besides the parameters for defining the keys and values of the resulting Map like `groupMap`, `groupMapReduce` also expects an additional parameter in the form of a binary operation for reduction.
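One way to read the three parameter lists is as a `groupMap` followed by a `reduce` over each group's values. The sketch below illustrates that reading on a made-up list of words (`names`, `totalLen` and `twoStep` are illustrative names, and Scala 2.13 is assumed); it describes the observable result, not how the library implements the method.

```scala
val names = List("apple", "avocado", "banana", "blueberry", "cherry")

// key: first letter; f: word length; reduce: sum of lengths per key
val totalLen: Map[Char, Int] =
  names.groupMapReduce(_.head)(_.length)(_ + _)

// Two-step equivalent: group and map first, then reduce each group.
// Groups are never empty, so `reduce` is safe here.
val twoStep: Map[Char, Int] =
  names.groupMap(_.head)(_.length).view.mapValues(_.reduce(_ + _)).toMap

// Both contain 'a' -> 12, 'b' -> 15, 'c' -> 6
assert(totalLen == twoStep)
```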
Using the same pets example, if we want to compute the count of pets per species, a `groupBy/mapValues` approach looks like this:
// Example 3: groupBy/mapValues
pets.groupBy(_.species).mapValues(_.size)
// res1: Map[String, Int] = Map("cat" -> 2, "bird" -> 2, "dog" -> 3)
With `groupMapReduce`, we can “compartmentalize” the functions for the keys, values and reduction operation separately as follows:
// Example 3: groupMapReduce
pets.groupMapReduce(_.species)(_ => 1)(_ + _)
One more example:
// Example 4
import java.time.LocalDate
case class Product(id: String, saleDate: LocalDate, listPrice: Double, discPrice: Double)
val products = List(
Product("p001", LocalDate.of(2019, 9, 11), 10, 8.5),
Product("p002", LocalDate.of(2019, 9, 18), 12, 10),
Product("p003", LocalDate.of(2019, 9, 27), 10, 9),
Product("p004", LocalDate.of(2019, 10, 6), 15, 12.5),
Product("p005", LocalDate.of(2019, 10, 20), 12, 8),
Product("p006", LocalDate.of(2019, 11, 8), 15, 12),
Product("p007", LocalDate.of(2019, 11, 16), 10, 8.5),
Product("p008", LocalDate.of(2019, 11, 25), 10, 9)
)
Let’s say we want to compute the monthly total of list price and discounted price of the product list. In the `groupBy/mapValues` way:
// Example 4: groupBy/mapValues
products.groupBy(_.saleDate.getMonth).mapValues(
_.map(p => (p.listPrice, p.discPrice)).reduce(
(total, prc) => (total._1 + prc._1, total._2 + prc._2))
)
// res2: scala.collection.immutable.Map[java.time.Month,(Double, Double)] =
// Map(OCTOBER -> (27.0,20.5), SEPTEMBER -> (32.0,27.5), NOVEMBER -> (35.0,29.5))
Using `groupMapReduce`:
// Example 4: groupMapReduce
products.groupMapReduce(_.saleDate.getMonth)(p => (p.listPrice, p.discPrice))(
  (total, prc) => (total._1 + prc._1, total._2 + prc._2)
)
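When the reduction works on tuples, giving the binary operation a name can make the call easier to read. The following is a minimal sketch of that variation; the `sales` data and the `addPairs` helper are hypothetical, with month labels as plain strings so the snippet runs on its own.

```scala
// Hypothetical (month, listPrice, discPrice) rows.
val sales = List(("SEP", 10.0, 8.5), ("SEP", 12.0, 10.0), ("OCT", 15.0, 12.5))

// Hypothetical helper: element-wise sum of two (listPrice, discPrice) pairs.
def addPairs(a: (Double, Double), b: (Double, Double)): (Double, Double) =
  (a._1 + b._1, a._2 + b._2)

// key: month; value: price pair; reduce: element-wise sum
val monthlyTotals: Map[String, (Double, Double)] =
  sales.groupMapReduce(_._1)(s => (s._2, s._3))(addPairs)
// contains "SEP" -> (22.0, 18.5) and "OCT" -> (15.0, 12.5)
```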
