
Find the Longest Common Substring using Binary search and Rolling Hash

  • Difficulty Level : Medium
  • Last Updated : 14 Sep, 2022

Given two strings X and Y, the task is to find the length of the longest common substring. 

Examples:

Input: X = “GeeksforGeeks”, Y = “GeeksQuiz”
Output: 5
Explanation: The longest common substring is “Geeks” and is of length 5.

Input: X = “abcdxyz”, Y = “xyzabcd”
Output: 4
Explanation: The longest common substring is “abcd” and is of length 4.

Input: X = “zxabcdezy”, Y = “yzabcdezx”
Output: 6
Explanation: The longest common substring is “abcdez” and is of length 6.

Longest Common Substring using Dynamic Programming:

This problem can be solved using dynamic programming in O(len(X) * len(Y)) time. This article discusses a more efficient approach based on binary search and rolling hashing.
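For reference, the dynamic-programming baseline can be sketched as follows. This is a minimal sketch, not code from the original article, and the function name `lcs_substring_dp` is hypothetical. It keeps only two rows of the table, so it uses O(len(Y)) extra space while still doing O(len(X) * len(Y)) work.

```python
def lcs_substring_dp(X: str, Y: str) -> int:
    """Length of the longest common substring in O(len(X) * len(Y)) time.

    cur[j] is the length of the longest common suffix of X[:i] and Y[:j];
    a character mismatch resets it to 0.
    """
    m = len(Y)
    best = 0
    prev = [0] * (m + 1)  # row i - 1 of the DP table
    for i in range(1, len(X) + 1):
        cur = [0] * (m + 1)  # row i of the DP table
        for j in range(1, m + 1):
            if X[i - 1] == Y[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the common suffix
                best = max(best, cur[j])
        prev = cur
    return best


print(lcs_substring_dp("GeeksforGeeks", "GeeksQuiz"))  # 5
print(lcs_substring_dp("abcdxyz", "xyzabcd"))          # 4
print(lcs_substring_dp("zxabcdezy", "yzabcdezx"))      # 6
```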

Longest Common Substring using Binary Search and Rolling Hash

Pre-requisites: binary search, polynomial rolling hash.

Observation:

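If there is a common substring of length K in both strings, then there are also common substrings of every length 0, 1, …, K - 1, because every prefix of a common substring is itself common to both strings. The predicate "a common substring of length k exists" is therefore monotone in k, so binary search on the answer can be applied.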
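This monotonicity is what makes "binary search on the answer" valid. A minimal sketch of the pattern, assuming a hypothetical helper `largest_true` and a brute-force predicate `has_common` that compares substrings directly (the article replaces this predicate with a rolling-hash check):

```python
from typing import Callable


def largest_true(low: int, high: int, pred: Callable[[int], bool]) -> int:
    """Return the largest k in [low, high] for which pred(k) is True.

    Assumes pred is monotone (True up to some K, False above it)
    and that pred(low) is True.
    """
    answer = low
    while low <= high:
        mid = (low + high) // 2
        if pred(mid):
            answer = mid      # length mid works, so try longer lengths
            low = mid + 1
        else:
            high = mid - 1    # length mid fails, so every longer length fails
    return answer


X, Y = "abcdxyz", "xyzabcd"


def has_common(k: int) -> bool:
    # Brute-force check: collect all length-k windows of X, then scan Y
    subs = {X[i:i + k] for i in range(len(X) - k + 1)}
    return any(Y[j:j + k] in subs for j in range(len(Y) - k + 1))


print(largest_true(0, min(len(X), len(Y)), has_common))  # 4
```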

Follow the below steps to implement the idea:

  • The smallest possible answer (low) is 0 and the largest possible answer (high) is min(len(X), len(Y)), so the binary search runs over the range [0, min(len(X), len(Y))].
  • For every mid, check whether a common substring of length mid exists. If it does, record mid as the answer so far and set low = mid + 1; otherwise set high = mid - 1.
  • To check whether a common substring of length K exists, use a polynomial rolling hash function:
    • Compute the hash of every window of size K in string X and store it in a set.
    • Compute the hash of every window of size K in string Y; if any of them is already in the set, return True, else return False.
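The window check can also be sketched with a slightly different polynomial hash that avoids modular inverses: prefix hashes are built in Horner form, h[i + 1] = h[i] * p + code(s[i]), so the hash of the window s[l : l + k] is h[l + k] - h[l] * p^k (mod M). This is a hedged illustration, not the article's implementation: the helper names, the base p = 131, the single modulus M = 2^61 - 1 and the character-by-character confirmation of hash matches are all assumptions chosen for the sketch. The confirmation step means a hash collision can slow the check down but cannot make it return a wrong answer.

```python
def prefix_hashes(s, p, mod):
    # h[i] is the hash of s[:i]; pw[i] is p^i; both reduced modulo mod
    h = [0] * (len(s) + 1)
    pw = [1] * (len(s) + 1)
    for i, ch in enumerate(s):
        h[i + 1] = (h[i] * p + ord(ch)) % mod
        pw[i + 1] = (pw[i] * p) % mod
    return h, pw


def window_hash(h, pw, l, k, mod):
    # Hash of s[l:l + k] in O(1): h[l + k] - h[l] * p^k (mod mod)
    return (h[l + k] - h[l] * pw[k]) % mod


def exists_common(X, Y, k, p=131, mod=(1 << 61) - 1):
    # Hypothetical helper: True if X and Y share a substring of length k
    if k == 0:
        return True
    hx, pwx = prefix_hashes(X, p, mod)
    hy, pwy = prefix_hashes(Y, p, mod)

    # Map each window hash of X to every start index that produces it
    seen = {}
    for i in range(len(X) - k + 1):
        seen.setdefault(window_hash(hx, pwx, i, k, mod), []).append(i)

    for j in range(len(Y) - k + 1):
        for i in seen.get(window_hash(hy, pwy, j, k, mod), []):
            # Compare the actual characters so a collision is never a false positive
            if X[i:i + k] == Y[j:j + k]:
                return True
    return False


print(exists_common("GeeksforGeeks", "GeeksQuiz", 5))  # True
print(exists_common("GeeksforGeeks", "GeeksQuiz", 6))  # False
```

In a binary search the prefix arrays would be built once and reused across calls, as the article's ComputeHash objects are; they are rebuilt inside exists_common here only to keep the sketch short.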

Below is the implementation of this approach.

Python3




# Python code to implement the approach
  
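# Class that precomputes prefix hashes of a string
# so that the hash of any window can be read in O(1)
class ComputeHash:
  
    # Precomputes prefix hashes and inverse powers in O(n * log(mod)),
    # since pow() is called once per index to compute a modular inverse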
    def __init__(self, s, p, mod):
        n = len(s)
        self.hash = [0] * n
        self.inv_mod = [0] * n
        self.mod = mod
        self.p = p
  
        p_pow = 1
        hash_value = 0
  
        for i in range(n):
            # Map the character to a number ('A' -> 1, 'B' -> 2, ...)
            c = ord(s[i]) - 65 + 1
            hash_value = (hash_value + c * p_pow) % self.mod
            self.hash[i] = hash_value
            self.inv_mod[i] = pow(p_pow, self.mod - 2, self.mod)
            p_pow = (p_pow * self.p) % self.mod
  
    # Return hash of a window in O(1)
    def get_hash(self, l, r):
  
        if l == 0:
            return self.hash[r]
  
        window = (self.hash[r] - self.hash[l - 1]) % self.mod
        return (window * self.inv_mod[l]) % self.mod
  
# Function to get the longest common substring
def longestCommonSubstr(X, Y, n, m):
  
    p1, p2 = 31, 37
    m1, m2 = pow(10, 9) + 9, pow(10, 9) + 7
  
    # Initialize two hash objects
    # with different p1, p2, m1, m2
    # to reduce collision
    hash_X1 = ComputeHash(X, p1, m1)
    hash_X2 = ComputeHash(X, p2, m2)
  
    hash_Y1 = ComputeHash(Y, p1, m1)
    hash_Y2 = ComputeHash(Y, p2, m2)
  
    # Function that returns the existence
    # of a common substring of length k
    def exists(k):
  
        if k == 0:
            return True
  
        st = set()
          
        # Iterate on X and get hash tuple
        # of all windows of size k
        for i in range(n - k + 1):
            h1 = hash_X1.get_hash(i, i + k - 1)
            h2 = hash_X2.get_hash(i, i + k - 1)
  
            cur_window_hash = (h1, h2)
              
            # Put the hash tuple in the set
            st.add(cur_window_hash)
  
        # Iterate on Y and get hash tuple
        # of all windows of size k
        for i in range(m - k + 1):
            h1 = hash_Y1.get_hash(i, i + k - 1)
            h2 = hash_Y2.get_hash(i, i + k - 1)
  
            cur_window_hash = (h1, h2)
              
            # If hash exists in st return True
            if cur_window_hash in st:
                return True
        return False
  
    # Binary Search on length
    answer = 0
    low, high = 0, min(n, m)
  
    while low <= high:
        mid = (low + high) // 2
  
        if exists(mid):
            answer = mid
            low = mid + 1
        else:
            high = mid - 1
  
    return answer
  
  
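# Driver Code
if __name__ == '__main__':
    # The three examples from the problem statement
    examples = [('GeeksforGeeks', 'GeeksQuiz'),
                ('abcdxyz', 'xyzabcd'),
                ('zxabcdezy', 'yzabcdezx')]
    for X, Y in examples:
        print(longestCommonSubstr(X, Y, len(X), len(Y)))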


Output

5
4
6

Time Complexity: O(n * log(m1)) + O(m * log(m1)) + O((n + m) * log(min(n, m)))

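  1. Building each hash object takes O(n * log(m1)), where n is the length of the string and m1 = pow(10, 9) + 9 is the modulus: pow() is called once per index to compute a modular inverse, and each call costs O(log(m1)).
  2. Binary search performs O(log(min(n, m))) iterations, where n and m are the lengths of the two strings.
  3. The hash of a window takes O(1) time.
  4. Each call to exists takes expected O(n + m) time, since it performs one set insertion or lookup per window.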

Auxiliary Space: O(n + m)
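The O(n * log(m1)) term comes from ComputeHash calling pow() once per index to invert p^i. Since p^(-i) = (p^(-1))^i, a hedged sketch of an alternative (the function name `inverse_powers` is hypothetical) inverts p only once via Fermat's little theorem and then multiplies, which brings the precomputation down to O(n + log(m1)) for a prime modulus:

```python
def inverse_powers(n, p, mod):
    # Return [p^0, p^-1, ..., p^-(n-1)] modulo a prime mod.
    # One modular exponentiation gives p^-1 = p^(mod - 2) (Fermat's little theorem);
    # every further entry costs a single multiplication.
    inv_p = pow(p, mod - 2, mod)
    inv = [1] * n
    for i in range(1, n):
        inv[i] = (inv[i - 1] * inv_p) % mod
    return inv


mod = 10**9 + 9
fast = inverse_powers(5, 31, mod)
slow = [pow(pow(31, i, mod), mod - 2, mod) for i in range(5)]
print(fast == slow)  # True
```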

