Like fetching top N elements from a sequence of comparable elements, patching numeric sequence is also a common need, especially when processing data that isn’t complete or clean. By “patching”, I mean interpolating missing spots in a list of numbers. A simplistic patch or interpolation is to fill a missing number with the average of the previous few numbers.
For example, given the following list of numbers:
60, 10, 50, (), 20, 90, 40, 80, (), (), 70, 30
we would like to replace each of the missing numbers with the average of, say, its previous 3 numbers. In this case, the leftmost missing number should be replace with 40 (i.e. (60 + 10 + 50) / 3).
Below is a simple snippet that patches missing numbers in a Double-type sequence with the average of the previous N numbers. The missing (or bad) numbers in the original sequence are represented as Double.NaN.
// Patching Double-type numbers with average of previous N numbers def patchAvgLastN(n: Int)(list: List[Double]) = { def patchCurrElem(i: Int)(l: List[Double], e: Double) = { val avgLastN = if (e.isNaN) l match { case Nil => 0.0 case _ => val lastN = l.take(i) lastN.sum / lastN.size } else e avgLastN :: l } list.foldLeft( List[Double]() )( patchCurrElem(n) ).reverse } val list: List[Double] = List(60, 10, 50, Double.NaN, 20, 90, 40, 80, Double.NaN, Double.NaN, 70, 30) patchAvgLastN(3)(list) // res1: List[Double] = List(60.0, 10.0, 50.0, 40.0, 20.0, 90.0, 40.0, 80.0, 70.0, 63.3, 70.0, 30.0)
As shown in the code, method ‘patchCurrElem’ is created to prepend the calculated average of the previous N numbers to the supplied list. Its signature fits well to be a function taken by ‘foldLeft’ to traverse the entire sequence for applying the patch. Since ‘patchCurrElem’ prepends the sub-sequence for optimal operations in Scala List, the final list requires a reversal.
Note that ‘lastN.size’ rather than the literal ‘N’ is used to handle cases when there is less than N prior numbers available for average calculation. And ‘case Nil’ will cover cases when there is no prior number.
Generalizing the patch method
In deriving a generic method for ‘patchAvgLastN’, We’re not going to generalize it for Scala Numeric, as ‘average’ isn’t quite meaningful for non-fractional numbers such as integers. Instead, we’ll generalize it for Scala Fractional, which provides method ‘mkNumericOps’ for access to FractionalOps that consists of division operator (i.e. ‘/’) necessary for average calculation.
Since we’re no longer handling Double-type number, ‘-999’ is used as the default value (of type Int) to replace Double.Nan as the missing (or bad) number.
// Patching sequence of generic numbers with average of previous N numbers def patchAvgLastN[T](n: Int, badNum: Int = -999)(list: List[T])(implicit num: Fractional[T]) = { def patchCurrElem(i: Int)(l: List[T], e: T) = { import num.mkNumericOps val unk = num.fromInt(badNum) val avgLastN = if (e == unk) l match { case Nil => num.zero case _ => val lastN = l.take(i) lastN.reduce(_ + _) / num.fromInt(lastN.size) } else e avgLastN :: l } list.foldLeft( List[T]() )( patchCurrElem(n) ).reverse } val list: List[Double] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30) patchAvgLastN[Double](3)(list) // res1: List[Double] = List(60.0, 10.0, 50.0, 40.0, 20.0, 90.0, 40.0, 80.0, 70.0, 63.3, 70.0, 30.0) val list: List[Float] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30) patchAvgLastN[Float](3)(list) // res2: List[Float] = List(60.0, 10.0, 50.0, 40.0, 20.0, 90.0, 40.0, 80.0, 70.0, 63.3, 70.0, 30.0) val list: List[BigDecimal] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30) patchAvgLastN[BigDecimal](3)(list) // res3: List[BigDecimal] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63.33, 70, 30)
Work-arounds for patching integer sequences
A quick work-around for interpolating a list of integers (type Int, Long or BigInt) would be to transform the integers to a Fractional type, apply the patch and transform back to the original type. Note that in some cases, rounding or truncation might occur in the transformations. For example:
// A work-around for patching integer sequences val list: List[Int] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30) patchAvgLastN[Double](3)(list.map(_.toDouble)).map(_.toInt) // res1: List[Int] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63, 70, 30) val list: List[Long] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30) patchAvgLastN[Double](3)(list.map(_.toDouble)).map(_.toLong) // res2: List[Long] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63, 70, 30) val list: List[BigInt] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30) patchAvgLastN[BigDecimal](3)(list.map(BigDecimal(_))).map(_.toBigInt) // res3: List[scala.math.BigInt] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63, 70, 30)