Skip to content
Related Articles
Open in App
Not now

Related Articles

Find the Longest Common Substring using Binary search and Rolling Hash

Improve Article
Save Article
Like Article
  • Difficulty Level : Medium
  • Last Updated : 31 Jan, 2023
Improve Article
Save Article
Like Article

Given two strings X and Y, the task is to find the length of the longest common substring. 

Examples:

Input: X = “GeeksforGeeks”, y = “GeeksQuiz” 
Output:
Explanation: The longest common substring is “Geeks” and is of length 5.

Input: X = “abcdxyz”, y = “xyzabcd” 
Output:
Explanation: The longest common substring is “abcd” and is of length 4.

Input: X = “zxabcdezy”, y = “yzabcdezx” 
Output:
Explanation: The longest common substring is “abcdez” and is of length 6.

Longest Common Substring using Dynamic Programming:

This problem can be solved using dynamic programming in O(len(X) * len(Y)), see this. In this article we are going to discuss about an efficient approach.

Longest Common Substring using Binary Search and Rolling Hash

Pre-requisites:

Observation:

If there is a common substring of length K in both the strings, then there will be common substrings of length 0, 1, …, K – 1. Hence, binary search on answer can be applied.\

Follow the below steps to implement the idea:

  • Smallest possible answer(low) = 0 and largest possible answer(high) = min(len(X), len(Y)), Range of binary search will be [0, min(len(X), len(Y))].
    • For every mid, check if there exists a common substring of length mid, if exists then update low, else update high.
    • To check the existence of a common substring of length K, Polynomial rolling hash function can be used. 
      • Iterate over all the windows of size K  in string X and string Y and get the hash
      • If there is a common hash return True, else return False.

Below is the implementation of this approach.

Python3




# Python code to implement the approach
 
# Function to implement rolling hash
class ComputeHash:
 
    # Generates hash in O(n(log(n)))
    def __init__(self, s, p, mod):
        n = len(s)
        self.hash = [0] * n
        self.inv_mod = [0] * n
        self.mod = mod
        self.p = p
 
        p_pow = 1
        hash_value = 0
 
        for i in range(n):
            c = ord(s[i]) - 65 + 1
            hash_value = (hash_value + c * p_pow) % self.mod
            self.hash[i] = hash_value
            self.inv_mod[i] = pow(p_pow, self.mod - 2, self.mod)
            p_pow = (p_pow * self.p) % self.mod
 
    # Return hash of a window in O(1)
    def get_hash(self, l, r):
 
        if l == 0:
            return self.hash[r]
 
        window = (self.hash[r] - self.hash[l - 1]) % self.mod
        return (window * self.inv_mod[l]) % self.mod
 
# Function to get the longest common substring
def longestCommonSubstr(X, Y, n, m):
 
    p1, p2 = 31, 37
    m1, m2 = pow(10, 9) + 9, pow(10, 9) + 7
 
    # Initialize two hash objects
    # with different p1, p2, m1, m2
    # to reduce collision
    hash_X1 = ComputeHash(X, p1, m1)
    hash_X2 = ComputeHash(X, p2, m2)
 
    hash_Y1 = ComputeHash(Y, p1, m1)
    hash_Y2 = ComputeHash(Y, p2, m2)
 
    # Function that returns the existence
    # of a common substring of length k
    def exists(k):
 
        if k == 0:
            return True
 
        st = set()
         
        # Iterate on X and get hash tuple
        # of all windows of size k
        for i in range(n - k + 1):
            h1 = hash_X1.get_hash(i, i + k - 1)
            h2 = hash_X2.get_hash(i, i + k - 1)
 
            cur_window_hash = (h1, h2)
             
            # Put the hash tuple in the set
            st.add(cur_window_hash)
 
        # Iterate on Y and get hash tuple
        # of all windows of size k
        for i in range(m - k + 1):
            h1 = hash_Y1.get_hash(i, i + k - 1)
            h2 = hash_Y2.get_hash(i, i + k - 1)
 
            cur_window_hash = (h1, h2)
             
            # If hash exists in st return True
            if cur_window_hash in st:
                return True
        return False
 
    # Binary Search on length
    answer = 0
    low, high = 0, min(n, m)
 
    while low <= high:
        mid = (low + high) // 2
 
        if exists(mid):
            answer = mid
            low = mid + 1
        else:
            high = mid - 1
 
    return answer
 
 
# Driver Code
if __name__ == '__main__':
    X = 'GeeksforGeeks'
    Y = 'GeeksQuiz'
    print(longestCommonSubstr(X, Y, len(X), len(Y)))


Java




import java.util.HashSet;
import java.util.Set;
 
class ComputeHash {
    private long[] hash;
    private long[] invMod;
    private long mod;
    private long p;
 
    // Generates hash in O(n(log(n)))
    public ComputeHash(String s, long p, long mod) {
        int n = s.length();
        this.hash = new long[n];
        this.invMod = new long[n];
        this.mod = mod;
        this.p = p;
 
        long pPow = 1;
        long hashValue = 0;
 
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            c = (char) (c - 'A' + 1);
            hashValue = (hashValue + c * pPow) % this.mod;
            this.hash[i] = hashValue;
            this.invMod[i] = (long)(Math.pow(pPow, this.mod - 2) % this.mod);
            pPow = (pPow * this.p) % this.mod;
        }
    }
 
    // Return hash of a window in O(1)
    public long getHash(int l, int r) {
        if (l == 0) {
            return this.hash[r];
        }
 
        long window = (this.hash[r] - this.hash[l - 1]) % this.mod;
        return (window * this.invMod[l]) % this.mod;
    }
}
 
public class Main {
    // Function to get the longest common substring
    public static int longestCommonSubstr(String X, String Y) {
    int n = X.length();
    int m = Y.length();
 
    long p1 = 31;
    long p2 = 37;
    long m1 = (long) (Math.pow(10, 9) + 9);
    long m2 = (long) (Math.pow(10, 9) + 7);
 
    // Initialize two hash objects
    // with different p1, p2, m1, m2
    // to reduce collision
    ComputeHash hashX1 = new ComputeHash(X, p1, m1);
    ComputeHash hashX2 = new ComputeHash(X, p2, m2);
 
    ComputeHash hashY1 = new ComputeHash(Y, p1, m1);
    ComputeHash hashY2 = new ComputeHash(Y, p2, m2);
 
    // Function that returns the existence
    // of a common substring of length k
    int low = 0, high = Math.min(n, m);
    int answer = 0;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (exists(mid, X, Y)) {
        answer = mid;
        low = mid + 1;
    } else {
        high = mid - 1;
    }
 
    }
    return answer;
}
    private static boolean exists(int k, String X, String Y) {
    for (int i = 0; i <= X.length() - k; i++) {
        for (int j = 0; j <= Y.length() - k; j++) {
            if (X.substring(i, i + k).equals(Y.substring(j, j + k))) {
                return true;
            }
        }
    }
    return false;
}
 
    public static void main(String[] args) {
    String X = "GeeksforGeeks";
    String Y = "GeeksQuiz";
    System.out.println(longestCommonSubstr(X, Y));
}
}


Output

5

Time Complexity: O(n * log(m1)) + O(m * log((m1)) + O((n + m) * log(min(n, m)))

  1. Generating hash object takes O(n*log(m1)), where n is the length of string and m1 = pow(10, 9) + 7.
  2. Binary search takes O(log(min(n, m))), where n, m are the lengths of both strings.
  3. Hash of a window takes O(1) time.
  4. Exist function takes O(n + m) time.

Auxiliary Space: O(n + m)


My Personal Notes arrow_drop_up
Like Article
Save Article
Related Articles

Start Your Coding Journey Now!