For grouping elements in a Scala collection by a provided key, the de facto method of choice has been groupBy, which has the following signature for an Iterable:
    // Method groupBy
    def groupBy[K](f: (A) => K): immutable.Map[K, Iterable[A]]
It returns an immutable Map in which each entry pairs a key with the collection of original elements that share that key. To process those collections of values in the resulting Map, Scala provides a method mapValues with the following signature:
    // Method mapValues
    def mapValues[W](f: (V) => W): Map[K, W]
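This groupBy/mapValues combo proves to be handy for processing the values of the Map generated from the grouping. However, as of Scala 2.13, method mapValues on Map is deprecated: it now returns a lazy MapView rather than a strict Map, and the deprecation message points to .view.mapValues(f).toMap when a strict Map is needed.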
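On Scala 2.13 and later, the same grouping-then-transforming step is usually written by going through a view and materializing the result with toMap. A minimal sketch, assuming Scala 2.13+; the sample data `words` and the name `lengthsByInitial` are made up for illustration:

```scala
// Assumes Scala 2.13+. `words` is illustrative sample data, not from the article.
val words = List("apple", "avocado", "banana", "blueberry", "cherry")

// groupBy as before, then transform the values through a lazy view
// and materialize a strict immutable Map with toMap.
val lengthsByInitial: Map[Char, List[Int]] =
  words.groupBy(_.head).view.mapValues(_.map(_.length)).toMap
// Map('a' -> List(5, 7), 'b' -> List(6, 9), 'c' -> List(6))  (key order may differ)
```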
groupMap
Scala 2.13 adds a new method, groupMap, which groups a collection using two functions: one that computes the key and one that turns each element into a value of the resulting Map. Here’s the signature of method groupMap for an Iterable:
    // Method groupMap
    def groupMap[K, B](key: (A) => K)(f: (A) => B): immutable.Map[K, Iterable[B]]
Let’s start with a simple example grouping via the good old groupBy method:
    // Example 1: groupBy
    val fruits = List("apple", "apple", "orange", "pear", "pear", "pear")
    fruits.groupBy(identity)
    // res1: Map[String, List[String]] = Map(
    //   "orange" -> List("orange"),
    //   "apple" -> List("apple", "apple"),
    //   "pear" -> List("pear", "pear", "pear")
    // )
We can replace groupBy with groupMap like below:
    // Example 1: groupMap
    fruits.groupMap(identity)(identity)
In this particular case, the new method doesn’t offer any benefit over the old one.
Let’s look at another example that involves a collection of case class instances:
    // Example 2
    case class Pet(species: String, name: String, age: Int)
    val pets = List(
      Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
      Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
      Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
    )
If we want to list all pet names per species, a groupBy coupled with mapValues will do:
    // Example 2: groupBy
    // (output as on Scala 2.12; on 2.13, mapValues is deprecated and returns a MapView)
    pets.groupBy(_.species).mapValues(_.map(_.name))
    // res2: Map[String, List[String]] = Map(
    //   "cat" -> List("sassy", "bella"),
    //   "bird" -> List("coco", "kiwi"),
    //   "dog" -> List("poppy", "bodie", "poppy")
    // )
But in this case, groupMap reads better, because the functions defining the keys and values of the resulting Map sit side by side as parameters:
    // Example 2: groupMap
    pets.groupMap(_.species)(_.name)
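The value function passed to groupMap need not be a plain field accessor. As a rough sketch, assuming Scala 2.13+ and re-declaring the Pet data from Example 2 so the snippet stands alone (`namesAndAges` is a name invented here), it can build a tuple per element:

```scala
// Assumes Scala 2.13+. Pet and pets are repeated from Example 2.
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

// Key function picks the species; value function builds a (name, age) pair.
val namesAndAges: Map[String, List[(String, Int)]] =
  pets.groupMap(_.species)(p => (p.name, p.age))

// Elements within each group keep their original order, e.g.
// namesAndAges("dog") == List(("poppy", 3), ("bodie", 4), ("poppy", 2))
```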
groupMapReduce
At times, we need to perform reduction on the Map values after grouping of a collection. This is when the other new method groupMapReduce comes in handy:
    // Method groupMapReduce
    def groupMapReduce[K, B](key: (A) => K)(f: (A) => B)(reduce: (B, B) => B): immutable.Map[K, B]
Like groupMap, groupMapReduce takes functions defining the keys and values of the resulting Map. It also expects a third parameter: a binary operation used to reduce the values within each group.
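One way to read groupMapReduce is as groupMap followed by a reduce over each group's values, done without building the intermediate per-key collections. The sketch below, assuming Scala 2.13+ and using made-up data (`nums`), checks that both routes produce the same Map:

```scala
// Assumes Scala 2.13+. `nums` is illustrative sample data.
val nums = List(1, 2, 3, 4, 5, 6)

// One step: key = "is even?", value = the number itself, reduction = sum.
val sumByParity: Map[Boolean, Int] =
  nums.groupMapReduce(_ % 2 == 0)(n => n)(_ + _)
// Map(false -> 9, true -> 12)

// Two steps: group and map first, then reduce each group's values.
val twoStep: Map[Boolean, Int] =
  nums.groupMap(_ % 2 == 0)(n => n).view.mapValues(_.reduce(_ + _)).toMap
```

Both values compare equal; the one-step form simply avoids materializing a List per key.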
Using the same pets example, if we want to compute the count of pets per species, a groupBy/mapValues approach will look like below:
    // Example 3: groupBy/mapValues
    pets.groupBy(_.species).mapValues(_.size)
    // res1: Map[String, Int] = Map("cat" -> 2, "bird" -> 2, "dog" -> 3)
With groupMapReduce, we can “compartmentalize” the functions for the keys, values and reduction operation separately as follows:
    // Example 3: groupMapReduce
    pets.groupMapReduce(_.species)(_ => 1)(_ + _)
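Only the value and reduction functions need to change to compute a different per-group aggregate. A small sketch, assuming Scala 2.13+ and repeating the Pet data from Example 2 so it runs on its own (`countBySpecies` and `maxAgeBySpecies` are names chosen here), shows the count next to a per-species maximum age:

```scala
// Assumes Scala 2.13+. Pet and pets are repeated from Example 2.
case class Pet(species: String, name: String, age: Int)

val pets = List(
  Pet("cat", "sassy", 2), Pet("cat", "bella", 3),
  Pet("dog", "poppy", 3), Pet("dog", "bodie", 4), Pet("dog", "poppy", 2),
  Pet("bird", "coco", 2), Pet("bird", "kiwi", 1)
)

// Count: map every pet to 1, then add the ones up per species.
val countBySpecies: Map[String, Int] =
  pets.groupMapReduce(_.species)(_ => 1)(_ + _)
// Map("cat" -> 2, "dog" -> 3, "bird" -> 2)  (key order may differ)

// Maximum age: map every pet to its age, then keep the larger of each pair.
val maxAgeBySpecies: Map[String, Int] =
  pets.groupMapReduce(_.species)(_.age)((a, b) => math.max(a, b))
// Map("cat" -> 3, "dog" -> 4, "bird" -> 2)  (key order may differ)
```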
One more example:
    // Example 4
    import java.time.LocalDate

    case class Product(id: String, saleDate: LocalDate, listPrice: Double, discPrice: Double)

    val products = List(
      Product("p001", LocalDate.of(2019, 9, 11), 10, 8.5),
      Product("p002", LocalDate.of(2019, 9, 18), 12, 10),
      Product("p003", LocalDate.of(2019, 9, 27), 10, 9),
      Product("p004", LocalDate.of(2019, 10, 6), 15, 12.5),
      Product("p005", LocalDate.of(2019, 10, 20), 12, 8),
      Product("p006", LocalDate.of(2019, 11, 8), 15, 12),
      Product("p007", LocalDate.of(2019, 11, 16), 10, 8.5),
      Product("p008", LocalDate.of(2019, 11, 25), 10, 9)
    )
Let’s say we want to compute the monthly totals of list price and discounted price for the product list. In the groupBy/mapValues way:
    // Example 4: groupBy/mapValues
    products.groupBy(_.saleDate.getMonth).mapValues(
      _.map(p => (p.listPrice, p.discPrice)).reduce(
        (total, prc) => (total._1 + prc._1, total._2 + prc._2))
    )
    // res2: scala.collection.immutable.Map[java.time.Month,(Double, Double)] =
    //   Map(OCTOBER -> (27.0,20.5), SEPTEMBER -> (32.0,27.5), NOVEMBER -> (35.0,29.5))
Using groupMapReduce:
    // Example 4: groupMapReduce
    products.groupMapReduce(_.saleDate.getMonth)(p => (p.listPrice, p.discPrice))(
      (total, prc) => (total._1 + prc._1, total._2 + prc._2)
    )
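Because the reduction only ever combines two values at a time, an aggregate such as an average has to carry extra state through the reduce step. One possible sketch, assuming Scala 2.13+ and repeating the Example 4 data (`sumAndCount` and `avgDiscPrice` are names invented for this illustration), carries a (sum, count) pair and divides afterwards:

```scala
// Assumes Scala 2.13+. Product and products are repeated from Example 4.
import java.time.{LocalDate, Month}

case class Product(id: String, saleDate: LocalDate, listPrice: Double, discPrice: Double)

val products = List(
  Product("p001", LocalDate.of(2019, 9, 11), 10, 8.5),
  Product("p002", LocalDate.of(2019, 9, 18), 12, 10),
  Product("p003", LocalDate.of(2019, 9, 27), 10, 9),
  Product("p004", LocalDate.of(2019, 10, 6), 15, 12.5),
  Product("p005", LocalDate.of(2019, 10, 20), 12, 8),
  Product("p006", LocalDate.of(2019, 11, 8), 15, 12),
  Product("p007", LocalDate.of(2019, 11, 16), 10, 8.5),
  Product("p008", LocalDate.of(2019, 11, 25), 10, 9)
)

// Map each product to (discounted price, 1), then add the pairs component-wise,
// so every month ends up with (total discounted price, number of products).
val sumAndCount: Map[Month, (Double, Int)] =
  products.groupMapReduce(_.saleDate.getMonth)(p => (p.discPrice, 1)) {
    case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2)
  }

// Divide once per month to obtain the average discounted price.
val avgDiscPrice: Map[Month, Double] =
  sumAndCount.map { case (month, (sum, count)) => month -> (sum / count) }
// avgDiscPrice(Month.OCTOBER) == 10.25
```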