Tag Archives: pattern matching

String Pattern Matching Implementation

Regardless of the programming platform one is on, string pattern matching is often needed in various use cases, particularly as a filtering condition for data retrieval. Regex, short for regular expression, has been one of the most common and versatile string pattern matching methods. Then, there is also the more simplistic wildcard string pattern style.

In this blog post, we’re going to implement in Scala the wildcard pattern matching as well as a minified version of regex pattern matching. On top of that, we’ll also come up with their dynamic programming version by means of memoization for better performance.

Recursion will be the first approach to implementing the string pattern matching methods. Why? That’s because algorithmically it’s probably the crispest way to reason about the implementation logic. There is also the benefit that in many cases the more performant dynamic programming variants can be formulated based on the recursion logic.

Wildcard pattern matching via recursion

Wildcard pattern matching consists of two special characters ? and *, with ? matching any single character and * matching zero or more characters.

For example, wildcard pattern “a*b?c” would match “axybzc” and “abbc”, but not “axyzbc” or “aczc”.

// Wildcard pattern matching implementation via recursion in Scala
def wildcardMatch(s: String, p: String): Boolean = {
  if (s.isEmpty)
    p.isEmpty || p.forall(_ == '*')
  else {
    if (p.isEmpty)
      false
    else {
      if (p(0) == '*')
        wildcardMatch(s, p.substring(1)) || wildcardMatch(s.substring(1), p)
      else
        (p(0) == '?' || p(0) == s(0)) &&
          wildcardMatch(s.substring(1), p.substring(1))
    }
  }
}
// wildcardMatch("aabbbcddddd", "a*bcdd?dd") // true
// wildcardMatch("aabbbcddddd", "a*cdd?d") // false

As with any recursive code, we first come up with the “exit” strategy which in this case is that when the iterative parsing of the input string s is exhausted, the iterative interpreting of the wildcare pattern p must be either also exhausted or left with only * characters.

On the core recursion logic that evaluates pattern matching character *:

wildcardMatch(s, p.substring(1)) || wildcardMatch(s.substring(1), p)

if the current character of the pattern string p being iteratively parsed is * the algorithm will make sure — in a recursive manner — either it consumes the wildcare character to match zero character from the current character of the input string s, or it matches the current character of s and continues to use the wildcare character to evaluate for the next character of s.

As for the evaluation of pattern matching character ?:

(p(0) == '?' || p(0) == s(0)) &&
  wildcardMatch(s.substring(1), p.substring(1))

if the current character of the pattern string p being parsed is ?, it is no different from evaluating whether the current character of p is the same as that of s and the remaining substrings of p and s still match.

Wildcard pattern matching via dynamic programming

Despite the vast difference in approach between recursion and dynamic programming, it’s interesting that how the programming logic being formulated in the recursive version can be “borrowed” for the dynamic programming version. If one looks closely at the following dynamic programming implementation, its logic resembles a great deal of the recursive version’s.

// Wildcard pattern matching implementation via dynamic programming in Scala
def wildcardMatchDP(s: String, p: String): Boolean = {
  val n = s.length
  val m = p.length
  val dp: Array[Array[Boolean]] = Array.fill(n+1, m+1)(false)
  dp(0)(0) = true
  for (j <- 1 to m) {
    if (p(j-1) == '*')
      dp(0)(j) = dp(0)(j-1)
  }
  for (i <- 1 to n) {
    for (j <- 1 to m) {
      if (p(j-1) == '*')
        dp(i)(j) = dp(i)(j-1) || dp(i-1)(j)
      else
        dp(i)(j) = (p(j-1) == '?' || p(j-1) == s(i-1)) && dp(i-1)(j-1)
    }
  }
  dp(n)(m)
}
// wildcardMatchDP("aabbbcddddd", "a*bcdd?dd") // true
// wildcardMatchDP("aabbbcddddd", "a*cdd?d") // false

To use the programming logic from the recursive version for dynamic programing, we maintain a 2-dimensional array dp (with dimension n+1 x m+1, where n is length of s and m is length of p) and have dp(i)(j) represent whether we have a wildcard match with only the i right-most characters in input string s and j right-most characters in pattern string p. For instance, dp(0)(1) would represent whether we have a match with an empty s and a p with only its right-most character.

The exit strategy in the recursive code that says “when the iterative parsing of s finishes, either the parsing of p is also finished or the remaining characters in p must all be *” now becomes the initialization of array dp:

dp(0)(0) = true
for (j <- 1 to m) {
  if (p(j - 1) == '*')
    dp(0)(j) = dp(0)(j - 1)
}

For the core programming logic implementation, we iteratively build, bottom up, from the initialized array by computing dp(i)(j) values with the same reasoning of how the * and ? wildcard matching works:

for (i <- 1 to n) {
  for (j <- 1 to m) {
    if (p(j-1) == '*')
      dp(i)(j) = dp(i)(j-1) || dp(i-1)(j)
    else
      dp(i)(j) = (p(j-1) == '?' || p(j-1) == s(i-1)) && dp(i-1)(j-1)
  }
}
dp(n)(m)

It should be noted that one could come up similar programming logic for the dynamic programming implementation without referencing the recursive version at all. The above exercise is just an interesting showcase that two seemingly disparate programming methods could do their job using the same logic.

Performance wise, obviously the dynamic programming version with a time complexity of O(nm) is significantly better than the recursive version which is at the scale of O((n+m)2^(n+m)).

Regex pattern matching via recursion

Regex is a comprehensive pattern matching toolset equipped with a suite of versatile methods. We’ll be implementing only two of the most commonly used patterns . and *, with . matching any single character and * matching zero or more contiguous occurrences of the preceding character. In case the character preceding * is ., it’ll match zero or any combinations of characters.

For example, regex pattern “ab.c” would match “aaabzc” and “abbc”, but not “axybzc”. And “a.b” would match “axyzb” and “ab”.

// Regex pattern matching implementation via recursion in Scala
def regexMatch(s: String, p: String): Boolean = {
  def matchOne: Boolean = p(0) == '.' || p(0) == s(0)
  if (p.isEmpty)
    s.isEmpty
  else {
    if (p.length > 1 && p(1) == '*')
      regexMatch(s, p.substring(2)) || (matchOne && regexMatch(s.substring(1), p))
    else
      matchOne && regexMatch(s.substring(1), p.substring(1))
  }
}
// regexMatch("aabbbcddddd", "a.*bcdd.dd") // true
// regexMatch("aabbbcddddd", "a*bcdd.dd") // false
// regexMatch("aabbbcddddd", "a.*bcd*") // true

Similar to the recursive wildcard pattern matching implementation, we come up with the exist strategy that the iterative parsing of the input string s and the iteratively interpreting of the regex pattern p must be exhausted simultaneously.

Like wildcard’s ?, regex’s . has exactly the same matching pattern. Since pattern matching a single character will be used more than once (as can be seen shortly), we create a simple method matchOne:

def matchOne: Boolean = p(0) == '.' || p(0) == s(0)

Contrary to iterating s in the main recursive loop in the wildcard matching implementation, we iterate p for the convenience of evaluating regex’s * that always works with its preceding character — hence the need for singling out the case of p.length > 1.

if (p.length > 1 && p(1) == '*')
  regexMatch(s, p.substring(2)) || (matchOne && regexMatch(s.substring(1), p))
else
  matchOne && regexMatch(s.substring(1), p.substring(1))

Here, the main recursive logic is similar to that of the wildcard implementation, except that the character preceding * is now an integral element for the * pattern matching.

Regex pattern matching via dynamic programming

Next, we come up with the dynamic programming version by “borrowing” the programming logic, again, from the recursive version:

// Regex pattern matching implementation using dynamic programming in Scala
def regexMatchDP(s: String, p: String): Boolean = {
  val n = s.length
  val m = p.length
  val dp: Array[Array[Boolean]] = Array.fill(n+1, m+1)(false)
  dp(0)(0) = true
  for (i <- 1 to n) {
    for (j <- 1 to m) {
      val matchOne = p(j-1) == '.' || p(j-1) == s(i-1)
      if (j > 1 && p(j-2) == '*')
        dp(i)(j) = dp(i)(j-2) || (matchOne && dp(i-1)(j))
      else {
        if (p(j-1) == '*')
          dp(i)(j) = dp(i)(j-1) || dp(i-1)(j)
        else
          dp(i)(j) = matchOne && dp(i-1)(j-1)
      }
    }
  }
  dp(n)(m)
}
// regexMatchDP("aabbbcddddd", "a.*bcdd.dd") // true
// regexMatchDP("aabbbcddddd", "a*bcdd.dd") // false
// regexMatchDP("aabbbcddddd", "a.*bcd*") // true

Again, we maintain a 2-dimensional array dp (with dimension n+1 x m+1, where n is length of s and m is length of p) and have dp(i)(j) represent whether we have a regex match with only the i right-most characters in input string s and j right-most characters in pattern string p.

The exit strategy in the recursive code becomes the simple initialization of array dp:

dp(0)(0) = true

And, similar to the wildward implementation, the core programming logic that resembles the recursive version’s is carried out by iteratively computing dp(i)(j) of incremental indices starting from dp(0)(0):

for (i <- 1 to n) {
  for (j <- 1 to m) {
    val matchOne = p(j-1) == '.' || p(j-1) == s(i-1)
    if (j > 1 && p(j-2) == '*')
      dp(i)(j) = dp(i)(j-2) || (matchOne && dp(i-1)(j))
    else {
      if (p(j-1) == '*')  // <--- additional logic needed for DP
        dp(i)(j) = dp(i)(j-1) || dp(i-1)(j)
      else
        dp(i)(j) = matchOne && dp(i-1)(j-1)
    }
  }
}
dp(n)(m)

As noted in the extracted snippet though, there is a conditional logic that needs to be explicitly included in the dynamic programming version to handle the case of having * as pattern string p‘s last character. Since j stops at m, the p(j-2) == '*' condition will never get evaluated if the last pattern string character p(m-1) is *.

Implementing Linked List In Scala

In Scala, if you wonder why its standard library doesn’t come with a data structure called LinkedList, you may have overlooked. The collection List is in fact a linked list — although it often appears in the form of a Seq or Vector collection rather than the generally “mysterious” linked list that exposes its “head” with a hidden “tail” to be revealed only iteratively.

Our ADT: LinkedNode

Perhaps because of its simplicity and dynamicity as a data structure, implementation of linked list remains a popular coding exercise. To implement our own linked list, let’s start with a barebone ADT (algebraic data structure) as follows:

If you’re familiar with Scala List, you’ll probably notice that our ADT resembles List and its subclasses Cons (i.e. ::) and Nil (see source code):

Expanding LinkedNode

Let’s expand trait LinkedNode to create class methods insertNode/deleteNode at a given index for inserting/deleting a node, toList for extracting contained elements into a List collection, and toString for display:

Note that LinkedNode is made covariant. In addition, method insertNode has type A as its lower type bound because Function1 is contravariant over its parameter type.

Recursion and pattern matching

A couple of notes on the approach we implement our class methods with:

  1. We use recursive functions to avoid using of mutable variables. They should be made tail-recursive for optimal performance, but that isn’t the focus of this implementation. If performance is a priority, using conventional while-loops with mutable class fields elem/next would be a more practical option.
  2. Pattern matching is routinely used for handling cases of Node versus EmptyNode. An alternative approach would be to define fields elem and next in the base trait and implement class methods accordingly within Node and EmptyNode.

Finding first/last matching nodes

Next, we add a couple of class methods for finding first/last matching nodes.

Reversing a linked list by groups of nodes

Reversing a given LinkedNode can be accomplished via recursion by cumulatively wrapping the element of each Node in a new Node with its next pointer set to the Node created in the previous iteration.

Method reverseK reverses a LinkedNode by groups of k elements using a different approach that extracts the elements into groups of k elements, reverses the elements in each of the groups and re-wraps each of the flattened elements in a Node.

Using LinkedNode as a Stack

For LinkedNode to serve as a Stack, we include simple methods push and pop as follows:

Addendum: implementing map, flatMap, fold

From a different perspective, if we view LinkedNode as a collection like a Scala List or Vector, we might be craving for methods like map, flatMap, fold. Using the same approach of recursion along with pattern matching, it’s rather straight forward to crank them out.

Putting everything together

Along with a few additional simple class methods and a factory method wrapped in LinkedNode‘s companion object, below is the final LinkedNode ADT that includes everything described above.

A cursory test-run …