Monthly Archives: July 2018

Patching Numeric Sequence In Scala

Like fetching top N elements from a sequence of comparable elements, patching numeric sequence is also a common need, especially when processing data that isn’t complete or clean. By “patching”, I mean interpolating missing spots in a list of numbers. A simplistic patch or interpolation is to fill a missing number with the average of the previous few numbers.

For example, given the following list of numbers:

60, 10, 50, (), 20, 90, 40, 80, (), (), 70, 30

we would like to replace each of the missing numbers with the average of, say, its previous 3 numbers. In this case, the leftmost missing number should be replace with 40 (i.e. (60 + 10 + 50) / 3).

Below is a simple snippet that patches missing numbers in a Double-type sequence with the average of the previous N numbers. The missing (or bad) numbers in the original sequence are represented as Double.NaN.

// Patching Double-type numbers with average of previous N numbers

def patchAvgLastN(n: Int)(list: List[Double]) = {
  def patchCurrElem(i: Int)(l: List[Double], e: Double) = {
    val avgLastN =
      if (e.isNaN) l match {
        case Nil => 0.0
        case _ =>
          val lastN = l.take(i)
          lastN.sum / lastN.size
      } else e
    avgLastN :: l
  }

  list.foldLeft( List[Double]() )( patchCurrElem(n) ).reverse
}

val list: List[Double] = List(60, 10, 50, Double.NaN, 20, 90, 40, 80, Double.NaN, Double.NaN, 70, 30)

patchAvgLastN(3)(list)
// res1: List[Double] = List(60.0, 10.0, 50.0, 40.0, 20.0, 90.0, 40.0, 80.0, 70.0, 63.3, 70.0, 30.0)

As shown in the code, method ‘patchCurrElem’ is created to prepend the calculated average of the previous N numbers to the supplied list. Its signature fits well to be a function taken by ‘foldLeft’ to traverse the entire sequence for applying the patch. Since ‘patchCurrElem’ prepends the sub-sequence for optimal operations in Scala List, the final list requires a reversal.

Note that ‘lastN.size’ rather than the literal ‘N’ is used to handle cases when there is less than N prior numbers available for average calculation. And ‘case Nil’ will cover cases when there is no prior number.

Generalizing the patch method

In deriving a generic method for ‘patchAvgLastN’, We’re not going to generalize it for Scala Numeric, as ‘average’ isn’t quite meaningful for non-fractional numbers such as integers. Instead, we’ll generalize it for Scala Fractional, which provides method ‘mkNumericOps’ for access to FractionalOps that consists of division operator (i.e. ‘/’) necessary for average calculation.

Since we’re no longer handling Double-type number, ‘-999’ is used as the default value (of type Int) to replace Double.Nan as the missing (or bad) number.

// Patching sequence of generic numbers with average of previous N numbers

def patchAvgLastN[T](n: Int, badNum: Int = -999)(list: List[T])(implicit num: Fractional[T]) = {
  def patchCurrElem(i: Int)(l: List[T], e: T) = {
    import num.mkNumericOps
    val unk = num.fromInt(badNum)
    val avgLastN =
      if (e == unk) l match {
        case Nil => num.zero
        case _ =>
          val lastN = l.take(i)
          lastN.reduce(_ + _) / num.fromInt(lastN.size)
      } else e
    avgLastN :: l
  }

  list.foldLeft( List[T]() )( patchCurrElem(n) ).reverse
}

val list: List[Double] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30)
patchAvgLastN[Double](3)(list)
// res1: List[Double] = List(60.0, 10.0, 50.0, 40.0, 20.0, 90.0, 40.0, 80.0, 70.0, 63.3, 70.0, 30.0)

val list: List[Float] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30)
patchAvgLastN[Float](3)(list)
// res2: List[Float] = List(60.0, 10.0, 50.0, 40.0, 20.0, 90.0, 40.0, 80.0, 70.0, 63.3, 70.0, 30.0)

val list: List[BigDecimal] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30)
patchAvgLastN[BigDecimal](3)(list)
// res3: List[BigDecimal] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63.33, 70, 30)

Work-arounds for patching integer sequences

A quick work-around for interpolating a list of integers (type Int, Long or BigInt) would be to transform the integers to a Fractional type, apply the patch and transform back to the original type. Note that in some cases, rounding or truncation might occur in the transformations. For example:

// A work-around for patching integer sequences

val list: List[Int] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30)
patchAvgLastN[Double](3)(list.map(_.toDouble)).map(_.toInt)
// res1: List[Int] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63, 70, 30)

val list: List[Long] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30)
patchAvgLastN[Double](3)(list.map(_.toDouble)).map(_.toLong)
// res2: List[Long] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63, 70, 30)

val list: List[BigInt] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30)
patchAvgLastN[BigDecimal](3)(list.map(BigDecimal(_))).map(_.toBigInt)
// res3: List[scala.math.BigInt] = List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63, 70, 30)