For grouping elements in a Scala collection by a provided key, the de facto method of choice has been groupBy, which has the following signature for an `Iterable`:
// Method groupBy
def groupBy[K](f: (A) => K): immutable.Map[K, Iterable[A]]
It returns an immutable Map in which each key is paired with the collection of original elements that produced that key. To transform those collections of values, Scala's Map provides a method `mapValues` with the signature below:
// Method mapValues
def mapValues[W](f: (V) => W): Map[K, W]
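This `groupBy/mapValues` combo proves to be handy for processing the values of the Map generated from the grouping. However, as of Scala 2.13, `mapValues` on a Map is deprecated: it now returns a lazy `MapView` rather than a strict Map, and the suggested replacement is `.view.mapValues(f)`, followed by `.toMap` when a strict Map is needed.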
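On Scala 2.13, the same group-then-transform pattern can still be written by going through a view and materializing the result. A minimal sketch, using a hypothetical `words` list that is not part of the examples in this post:

```scala
// Hypothetical sample data, for illustration only.
val words = List("apple", "avocado", "banana", "blueberry", "cherry")

// Group by first letter, then map each group to its word lengths.
// .view.mapValues(...) is lazy; .toMap materializes a strict immutable Map.
val lengthsByInitial: Map[Char, List[Int]] =
  words.groupBy(_.head).view.mapValues(_.map(_.length)).toMap

// lengthsByInitial('a') is List(5, 7)
```

Without the trailing `.toMap`, the result stays a `MapView`, and the mapping function is re-applied each time a value is looked up.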
groupMap
Scala 2.13 adds a new method, `groupMap`, which groups a collection using two functions: one that derives the key and one that derives the value for each element of the resulting Map. Here’s the signature of method `groupMap` for an `Iterable`:
// Method groupMap
def groupMap[K, B](key: (A) => K)(f: (A) => B): immutable.Map[K, Iterable[B]]
Let’s start with a simple example grouping via the good old `groupBy` method:
// Example 1: groupBy
val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")
fruits.groupBy(identity)
// res1: Map[String, List[String]] = Map(
//   "orange" -> List("orange"),
//   "apple" -> List("apple", "apple"),
//   "pear" -> List("pear", "pear", "pear")
// )
We can replace `groupBy` with `groupMap` as follows:
// Example 1: groupMap
fruits.groupMap(identity)(identity)
In this particular case, the new method doesn’t offer any benefit over the old one.
Let’s look at another example that involves a collection of case class instances:
// Example 2
case class Pet(species: String, name: String, age: Int)
val pets = List(
  Pet("cat", "sassy", 2),
  Pet("cat", "bella", 3),
  Pet("dog", "poppy", 3),
  Pet("dog", "bodie", 4),
  Pet("dog", "poppy", 2),
  Pet("bird", "coco", 2),
  Pet("bird", "kiwi", 1)
)
If we want to list all pet names per species, a `groupBy` coupled with `mapValues` will do:
// Example 2: groupBy
pets.groupBy(_.species).mapValues(_.map(_.name))
// res2: Map[String, List[String]] = Map(
//   "cat" -> List("sassy", "bella"),
//   "bird" -> List("coco", "kiwi"),
//   "dog" -> List("poppy", "bodie", "poppy")
// )
In this case, though, `groupMap` reads better, because the functions defining the keys and values of the resulting Map sit side by side as parameters:
// Example 2: groupMap
pets.groupMap(_.species)(_.name)
groupMapReduce
At times, we need to reduce the values of each group to a single result after grouping a collection. This is where the other new method, `groupMapReduce`, comes in handy:
// Method groupMapReduce
def groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): immutable.Map[K, B]
Like `groupMap`, `groupMapReduce` takes functions for defining the keys and values of the resulting Map. It also expects a third parameter: a binary operation that reduces each group's values to a single value.
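To make the three parameter lists concrete, here is a minimal sketch that compares `groupMapReduce` with the same computation spelled out step by step. The `orders` data and the value names are hypothetical, chosen only for illustration:

```scala
// Hypothetical sample data: (category, amount) pairs.
val orders = List(("books", 12), ("games", 30), ("books", 8), ("games", 5), ("music", 7))

// Key function, value function, then a binary reduction, all in one call.
val totalsViaGroupMapReduce: Map[String, Int] =
  orders.groupMapReduce(_._1)(_._2)(_ + _)

// The same result built from groupBy, a view, and reduce.
// reduce is safe here because groupBy never yields an empty group.
val totalsViaGroupBy: Map[String, Int] =
  orders.groupBy(_._1).view.mapValues(_.map(_._2).reduce(_ + _)).toMap
```

Both values hold the per-category totals. The `groupMapReduce` version avoids building the intermediate per-key collections that `groupBy` creates.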
Using the same pets example, if we want to compute the count of pets per species, a `groupBy/mapValues` approach looks like this:
// Example 3: groupBy/mapValues
pets.groupBy(_.species).mapValues(_.size)
// res1: Map[String, Int] = Map("cat" -> 2, "bird" -> 2, "dog" -> 3)
With `groupMapReduce`, we can “compartmentalize” the functions for the keys, values and reduction operation separately as follows:
// Example 3: groupMapReduce
pets.groupMapReduce(_.species)(_ => 1)(_ + _)
One more example:
// Example 4
import java.time.LocalDate
case class Product(id: String, saleDate: LocalDate, listPrice: Double, discPrice: Double)
val products = List(
  Product("p001", LocalDate.of(2019, 9, 11), 10, 8.5),
  Product("p002", LocalDate.of(2019, 9, 18), 12, 10),
  Product("p003", LocalDate.of(2019, 9, 27), 10, 9),
  Product("p004", LocalDate.of(2019, 10, 6), 15, 12.5),
  Product("p005", LocalDate.of(2019, 10, 20), 12, 8),
  Product("p006", LocalDate.of(2019, 11, 8), 15, 12),
  Product("p007", LocalDate.of(2019, 11, 16), 10, 8.5),
  Product("p008", LocalDate.of(2019, 11, 25), 10, 9)
)
Let’s say we want to compute the monthly total of list price and discounted price of the product list. In the `groupBy/mapValues` way:
// Example 4: groupBy/mapValues
products.groupBy(_.saleDate.getMonth).mapValues(
  _.map(p => (p.listPrice, p.discPrice)).reduce(
    (total, prc) => (total._1 + prc._1, total._2 + prc._2))
)
// res2: scala.collection.immutable.Map[java.time.Month,(Double, Double)] =
//   Map(OCTOBER -> (27.0,20.5), SEPTEMBER -> (32.0,27.5), NOVEMBER -> (35.0,29.5))
Using `groupMapReduce`:
// Example 4: groupMapReduce
products.groupMapReduce(_.saleDate.getMonth)(p => (p.listPrice, p.discPrice))(
  (total, prc) => (total._1 + prc._1, total._2 + prc._2)
)
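The tuple arithmetic in the reducer can be hard to scan. One option is to give the reduction a name and pass it in. A minimal sketch, where the `addPairs` helper and the `priceRows` sample data are both hypothetical and not part of the examples above:

```scala
// Hypothetical sample data: (month, listPrice, discPrice) triples.
val priceRows = List(
  ("SEP", 10.0, 8.5),
  ("SEP", 12.0, 10.0),
  ("OCT", 15.0, 12.5)
)

// Hypothetical helper: adds two (Double, Double) pairs component-wise.
def addPairs(a: (Double, Double), b: (Double, Double)): (Double, Double) =
  (a._1 + b._1, a._2 + b._2)

// Key by month, map each row to a (listPrice, discPrice) pair,
// then reduce each group's pairs with the named helper.
val monthlyTotals: Map[String, (Double, Double)] =
  priceRows.groupMapReduce(_._1)(row => (row._2, row._3))(addPairs)
```

Naming the reduction keeps the call site short and lets the helper be reused or tested separately.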