The Knuth-Morris-Pratt (KMP)Algorithm - Implement strStr()

ZeeshanAli-0704 - Aug 14 '22 - - Dev Community

The provided code implements the Knuth-Morris-Pratt (KMP) algorithm, which is used for substring search. The KMP algorithm is efficient for finding occurrences of a "needle" (substring) in a "haystack" (main string) by preprocessing the needle to create a prefix table (also known as the Longest Prefix Suffix or LPS table) that helps in skipping unnecessary comparisons.

Here’s a breakdown of the code:

Main Function: strStr

const allPointers = [];
var strStr = function (haystack, needle) {
  if (needle === "") return 0; // If the needle is an empty string, return 0
  const prefixTable = lpsTable(needle); // Generate the LPS table for the needle
  let i = 0; // Pointer for the haystack
  let j = 0; // Pointer for the needle

  while (i < haystack.length) {
    if (haystack[i] === needle[j]) {
      i += 1;
      j += 1;
    }
    if (j === needle.length) {
      allPointers.push(i - j); // Record the starting index of the match
      j = prefixTable[j - 1]; // Move j to the last matched prefix
    } else if (i < haystack.length && haystack[i] !== needle[j]) {
      if (j > 0) {
        j = prefixTable[j - 1]; // Use the prefix table to skip characters in the needle
      } else {
        i += 1; // Move to the next character in the haystack
      }
    }
  }
};

console.log(allPointers); // Output all starting indices of matches found
Enter fullscreen mode Exit fullscreen mode

LPS Table Function: lpsTable

var lpsTable = function(word) {
  const patternTable = [0]; // Initialize the LPS table with the first value as 0
  let prefixIndex = 0; // Length of the previous longest prefix suffix
  let suffixIndex = 1; // Index in the string to be processed

  while (suffixIndex < word.length) {
    if (word[prefixIndex] === word[suffixIndex]) {
      patternTable[suffixIndex] = prefixIndex + 1; // Increment the prefix length
      suffixIndex += 1;
      prefixIndex += 1;
    } else if (prefixIndex === 0) {
      patternTable[suffixIndex] = 0; // No proper prefix which is suffix exists
      suffixIndex += 1;
    } else {
      prefixIndex = patternTable[prefixIndex - 1]; // Fallback to the previous prefix
    }
  }

  return patternTable; // Return the completed LPS table
}
Enter fullscreen mode Exit fullscreen mode

Explanation:

  1. LPS Table (lpsTable):

    • The LPS table is constructed to store the length of the longest proper prefix which is also a suffix for each substring of the needle.
    • patternTable[suffixIndex] = prefixIndex + 1 indicates that the longest prefix which is also a suffix up to suffixIndex is of length prefixIndex + 1.
    • This table is used to skip characters while matching the needle with the haystack, avoiding unnecessary comparisons.
  2. KMP Algorithm (strStr):

    • The algorithm starts by checking if the needle is an empty string. If it is, it returns 0 (as per the problem definition).
    • The main loop iterates through the haystack. When characters of the haystack and needle match, both pointers (i and j) are incremented.
    • If the entire needle is found (j === needle.length), the start index of the match is recorded (i - j), and j is reset using the LPS table to allow for overlapping matches.
    • If there is a mismatch after some matches, j is reset using the LPS table to the last matched prefix, and i is incremented only when j is 0.
  3. allPointers Array:

    • This array is used to store the starting indices of all occurrences of the needle found in the haystack.

This code efficiently finds all occurrences of the needle in the haystack using the KMP algorithm, which preprocesses the needle to allow skipping over unnecessary comparisons in the haystack, thus reducing the overall time complexity to (O(n + m)), where (n) is the length of the haystack and (m) is the length of the needle.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .