Knuth-Morris-Pratt Pattern Matching Algorithm

Surya Teja Reddy
4 min readApr 9, 2022

Pattern matching is a way of determining whether a given text matches a data sequence in the given data or series.

A decorative image representing Patterns/Puzzles
By Surya Teja Reddy

Pattern Matching has a wide range of Applications

Few of them are listed below,

  • Plagiarism Detection
  • DNA/RNA Sequencing
  • Grammar Validator
  • Digital Forensics
  • Spam Detection
  • Content searching in search engines and databases.
  • Intrusion Detection.
Various Applications of Pattern Searching.

Different Types of Pattern Matching Algorithms

From a computational point of view, there are many Pattern matching Algorithms,

But very few of them are widely used and learned.

1. Naïve pattern matching Approach/Brute force approach

2. The Boyer-Moore Approach

3. The Knuth-Morris-Pratt Approach

And a few others.

Quick overview of Naïve Pattern Searching Algorithm

The Naïve /Brute force approach is a pattern searching approach where we typically enumerate all possible configurations and pick the matching parts.

For example, we have a text, char [] text, and a pattern char[] pattern,

When we try to find the given string pattern among the text, we run a loop form the initial point to the end of the text given.

We try to match the first character of text with the first of the pattern. If it matches, we advance to the next character. Else we again compare the next character with the first character of the pattern.

The above can be visualized much more easily with the following code.

Working of Naïve Algorithm (Code 1.1)

Knuth-Morris-Pratt Algorithm

After going through the Naïve Approach, we identified a fundamental way of pattern searching, but the algorithm’s worst-case performance is not good. We may deduce from the preceding code that the external loop runs up to n-m+1 times and the second loop is only executed m times, implying that the worst-case running time is O(nm).

Since the running time of Naïve is extensive, we need a better algorithm to reduce the running time.

Knuth-Morris-Pratt Algorithm is a pattern/string matching algorithm, which is faster and more efficient than the Naïve algorithm. In Knuth-Morris-Pratt Algorithm, we make fewer comparisons than Naïve by learning the pattern.

Motivation for the KMP Algorithm

If a character mismatch occurs at a specific place, the pattern can be slid to the second alignment without having to reconfirm the partial match with the prefix, which is one of the motivations for the KMP Algorithm. If the mismatched character is not matched, the common character can be used in the pattern’s next potential alignment.

Motivation for KMP
Motivation for KMP

Working of KMP Algorithm

To get started with KMP Algorithm, we need to create a PI table for the pattern string.

It is made with the help of the failure function. The function specifies the proper iteration shift in the event of a failed comparison. The length of the longest pattern prefix that is the substring pattern’s suffix [1..k] is used to determine the failure function.

Construction of Failure function:

Code for Failure Function (Code 2.1)

We saw how to establish a failure function in the previous code; nevertheless, the core KMP algorithm is based on a while loop, as seen below. Each iteration of the KMP algorithm Compares the text’s character at index J to the pattern’s character at index key. If the result of this comparison is a match, the algorithm continues on to the characters after that in both. If the comparison fails, the algorithm looks up a new candidate character in the pattern and starts over with the next index in the text.

Code of KMP algorithm (Code 2.2)

Complexity Analysis of KMP Algorithm

The KMP algorithm’s execution duration is proportional to the total number of times the while loop is executed. Because j or s increase by at least one or both at each iteration of the while loop, the total number of iterations is at most 2n. The failure function computation algorithm takes O(m) time.

As a result, the Knuth-Morris-Pratt method matches patterns on a text string of length n, and a pattern of length m is O(m+n).

Visualizing String Matching Algorithms

Comparing KMP and Naïve algorithms

From the above image we can see how good the KMP algo is better than Naïve algorithm. KMP is widely used where pattern matching is done in long strings. One such example is DNA matching. Imagine this situation, KMP algorithm will be more productive in this case because there are only 4 repeating alphabet.

I’d love to know what you think about this!

If you enjoyed this article, share it with your friends and colleagues!

--

--