Monthly Archives: July 2018

Patching Numeric Sequence In Scala

Like fetching the top N elements from a sequence of comparable elements, patching a numeric sequence is a common need, especially when processing data that isn’t complete or clean. By “patching”, I mean interpolating missing spots in a list of numbers. A simplistic patch or interpolation is to fill a missing number with the average of the previous few numbers.

For example, given the following list of numbers:

60, 10, 50, (), 20, 90, 40, 80, (), (), 70, 30

we would like to replace each of the missing numbers with the average of, say, its previous 3 numbers. In this case, the leftmost missing number should be replaced with 40 (i.e. (60 + 10 + 50) / 3).

Below is a simple snippet that patches missing numbers in a Double-type sequence with the average of the previous N numbers. The missing (or bad) numbers in the original sequence are represented as Double.NaN.
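A minimal sketch of such a snippet could look like the following. The method names ‘patchAvgLastN’ and ‘patchCurrElem’ come from the text; the wrapper object, the parameter names and the 0.0 fallback used when no prior number exists are assumptions made for illustration.

```scala
object PatchDouble {
  // Replaces each NaN with the average of up to `n` preceding (already patched) numbers.
  def patchAvgLastN(n: Int)(list: List[Double]): List[Double] = {
    // `acc` holds the elements processed so far in reverse order (most recent first).
    def patchCurrElem(acc: List[Double], curr: Double): List[Double] =
      if (curr.isNaN) {
        acc.take(n) match {
          case Nil   => 0.0 :: acc                       // assumption: nothing to average, fall back to 0.0
          case lastN => (lastN.sum / lastN.size) :: acc  // average of the numbers that are available
        }
      } else {
        curr :: acc
      }

    // Prepending builds the result backwards, hence the final reversal.
    list.foldLeft(List.empty[Double])(patchCurrElem).reverse
  }

  def main(args: Array[String]): Unit = {
    val nums = List(60.0, 10.0, 50.0, Double.NaN, 20.0, 90.0, 40.0, 80.0, Double.NaN, Double.NaN, 70.0, 30.0)
    println(patchAvgLastN(3)(nums))
  }
}
```

With N = 3, the three gaps in the example list become 40, 70 and roughly 63.33. The last value is (40 + 80 + 70) / 3, because in this sketch an already patched number takes part in later averages.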

As shown in the code, method ‘patchCurrElem’ is created to prepend either the current element or, when that element is missing, the calculated average of the previous N numbers to the supplied list. Its signature makes it a natural fit for the function passed to ‘foldLeft’, which traverses the entire sequence to apply the patch. Since ‘patchCurrElem’ prepends to the accumulated list (prepending being the efficient operation on a Scala List), the final list requires a reversal.

Note that ‘lastN.size’ rather than the literal ‘N’ is used to handle cases when there are fewer than N prior numbers available for average calculation. And ‘case Nil’ covers cases when there is no prior number at all.
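The two edge cases can be seen in isolation with a few lines. This is only an illustrative sketch; the object and value names are hypothetical.

```scala
object ShortHistory {
  val n = 3

  // Only two prior numbers are available (most recent first).
  val prior: List[Double] = List(10.0, 60.0)
  val lastN: List[Double] = prior.take(n)        // take(n) simply returns the shorter list
  val avg: Double = lastN.sum / lastN.size       // 35.0; dividing by n would give about 23.33

  // No prior number at all: take(n) yields Nil, and averaging it would divide 0.0 by 0.
  val none: List[Double] = List.empty[Double].take(n)
  val avgOfNone: Double = none.sum / none.size   // NaN, which is why Nil needs its own case
}
```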

Generalizing the patch method

In deriving a generic method for ‘patchAvgLastN’, we’re not going to generalize it for Scala Numeric, as ‘average’ isn’t quite meaningful for non-fractional numbers such as integers. Instead, we’ll generalize it for Scala Fractional, which provides method ‘mkNumericOps’ for access to FractionalOps, which in turn supplies the division operator (i.e. ‘/’) necessary for average calculation.

Since we’re no longer handling only Double-type numbers, ‘-999’ (of type Int) is used as the default value that marks a missing (or bad) number, in place of Double.NaN.
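A generic version along these lines might be sketched as follows. The parameter name ‘badNum’, the wrapper object and the fallback to the type’s zero when no prior number exists are assumptions, not necessarily what the original code uses.

```scala
object PatchFractional {
  // Replaces each element equal to `badNum` with the average of up to `n` preceding numbers.
  def patchAvgLastN[T](n: Int, badNum: Int = -999)(list: List[T])(implicit num: Fractional[T]): List[T] = {
    import num._  // brings mkNumericOps (hence `/`), fromInt, zero and equiv into scope

    val bad: T = fromInt(badNum)  // the Int marker converted to the element type

    def patchCurrElem(acc: List[T], curr: T): List[T] =
      if (equiv(curr, bad)) {
        acc.take(n) match {
          case Nil   => zero :: acc                                // assumption: fall back to zero
          case lastN => (lastN.sum / fromInt(lastN.size)) :: acc   // size must be lifted into T
        }
      } else {
        curr :: acc
      }

    list.foldLeft(List.empty[T])(patchCurrElem).reverse
  }

  def main(args: Array[String]): Unit = {
    val doubles = List(60.0, 10.0, 50.0, -999.0, 20.0, 90.0, 40.0, 80.0, -999.0, -999.0, 70.0, 30.0)
    println(patchAvgLastN[Double](3)(doubles))

    val decimals = doubles.map(BigDecimal(_))
    println(patchAvgLastN[BigDecimal](3)(decimals))
  }
}
```

The type argument is written out explicitly at the call sites to keep type inference out of the picture. One consequence of using a sentinel value rather than NaN is that a legitimate data point equal to -999 would also be treated as missing, so the marker should be chosen outside the range of valid data.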

Work-arounds for patching integer sequences

A quick work-around for interpolating a list of integers (type Int, Long or BigInt) would be to transform the integers to a Fractional type, apply the patch and transform back to the original type. Note that in some cases, rounding or truncation might occur in the transformations. For example:
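As a sketch of this work-around for an Int list, the block below converts to Double, patches, and converts back. It restates a compact generic ‘patchAvgLastN’ so that it compiles on its own; the parameter names and the zero fallback are again assumptions.

```scala
object PatchInts {
  // Compact generic patch method for Fractional types (hypothetical restatement).
  def patchAvgLastN[T](n: Int, badNum: Int = -999)(list: List[T])(implicit num: Fractional[T]): List[T] = {
    import num._
    list.foldLeft(List.empty[T]) { (acc, curr) =>
      if (equiv(curr, fromInt(badNum))) {
        acc.take(n) match {
          case Nil   => zero :: acc
          case lastN => (lastN.sum / fromInt(lastN.size)) :: acc
        }
      } else {
        curr :: acc
      }
    }.reverse
  }

  val ints: List[Int] = List(60, 10, 50, -999, 20, 90, 40, 80, -999, -999, 70, 30)

  // Int -> Double, patch, Double -> Int (toInt truncates the fractional part).
  val patched: List[Int] =
    patchAvgLastN[Double](3)(ints.map(_.toDouble)).map(_.toInt)

  def main(args: Array[String]): Unit =
    println(patched)  // List(60, 10, 50, 40, 20, 90, 40, 80, 70, 63, 70, 30)
}
```

Here the second of the two adjacent gaps is patched with about 63.33 and then truncated to 63 by ‘toInt’. Using ‘math.round’ before converting back would round to the nearest integer instead.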