 Open in App
Not now

# Find the Longest Common Substring using Binary search and Rolling Hash

• Difficulty Level : Medium
• Last Updated : 31 Jan, 2023

Given two strings X and Y, the task is to find the length of the longest common substring.

Examples:

Input: X = “GeeksforGeeks”, y = “GeeksQuiz”
Output:
Explanation: The longest common substring is “Geeks” and is of length 5.

Input: X = “abcdxyz”, y = “xyzabcd”
Output:
Explanation: The longest common substring is “abcd” and is of length 4.

Input: X = “zxabcdezy”, y = “yzabcdezx”
Output:
Explanation: The longest common substring is “abcdez” and is of length 6.

## Longest Common Substring using Dynamic Programming:

This problem can be solved using dynamic programming in O(len(X) * len(Y)), see this. In this article we are going to discuss about an efficient approach.

## Longest Common Substring using Binary Search and Rolling Hash

Pre-requisites:

Observation:

If there is a common substring of length K in both the strings, then there will be common substrings of length 0, 1, …, K – 1. Hence, binary search on answer can be applied.\

Follow the below steps to implement the idea:

• Smallest possible answer(low) = 0 and largest possible answer(high) = min(len(X), len(Y)), Range of binary search will be [0, min(len(X), len(Y))].
• For every mid, check if there exists a common substring of length mid, if exists then update low, else update high.
• To check the existence of a common substring of length K, Polynomial rolling hash function can be used.
• Iterate over all the windows of size K  in string X and string Y and get the hash
• If there is a common hash return True, else return False.

Below is the implementation of this approach.

## Python3

 `# Python code to implement the approach`   `# Function to implement rolling hash` `class` `ComputeHash:`   `    ``# Generates hash in O(n(log(n)))` `    ``def` `__init__(``self``, s, p, mod):` `        ``n ``=` `len``(s)` `        ``self``.``hash` `=` `[``0``] ``*` `n` `        ``self``.inv_mod ``=` `[``0``] ``*` `n` `        ``self``.mod ``=` `mod` `        ``self``.p ``=` `p`   `        ``p_pow ``=` `1` `        ``hash_value ``=` `0`   `        ``for` `i ``in` `range``(n):` `            ``c ``=` `ord``(s[i]) ``-` `65` `+` `1` `            ``hash_value ``=` `(hash_value ``+` `c ``*` `p_pow) ``%` `self``.mod` `            ``self``.``hash``[i] ``=` `hash_value` `            ``self``.inv_mod[i] ``=` `pow``(p_pow, ``self``.mod ``-` `2``, ``self``.mod)` `            ``p_pow ``=` `(p_pow ``*` `self``.p) ``%` `self``.mod`   `    ``# Return hash of a window in O(1)` `    ``def` `get_hash(``self``, l, r):`   `        ``if` `l ``=``=` `0``:` `            ``return` `self``.``hash``[r]`   `        ``window ``=` `(``self``.``hash``[r] ``-` `self``.``hash``[l ``-` `1``]) ``%` `self``.mod` `        ``return` `(window ``*` `self``.inv_mod[l]) ``%` `self``.mod`   `# Function to get the longest common substring` `def` `longestCommonSubstr(X, Y, n, m):`   `    ``p1, p2 ``=` `31``, ``37` `    ``m1, m2 ``=` `pow``(``10``, ``9``) ``+` `9``, ``pow``(``10``, ``9``) ``+` `7`   `    ``# Initialize two hash objects` `    ``# with different p1, p2, m1, m2` `    ``# to reduce collision` `    ``hash_X1 ``=` `ComputeHash(X, p1, m1)` `    ``hash_X2 ``=` `ComputeHash(X, p2, m2)`   `    ``hash_Y1 ``=` `ComputeHash(Y, p1, m1)` `    ``hash_Y2 ``=` `ComputeHash(Y, p2, m2)`   `    ``# Function that returns the existence` `    ``# of a common substring of length k` `    ``def` `exists(k):`   `        ``if` `k ``=``=` `0``:` `            ``return` `True`   `        ``st ``=` `set``()` `        `  `        ``# Iterate on X and get hash tuple` `        ``# of all windows of size k` `        ``for` `i ``in` `range``(n ``-` `k ``+` `1``):` `            ``h1 ``=` `hash_X1.get_hash(i, i ``+` `k ``-` `1``)` `            ``h2 ``=` `hash_X2.get_hash(i, i ``+` `k ``-` `1``)`   `            ``cur_window_hash ``=` `(h1, h2)` `            `  `            ``# Put the hash tuple in the set` `            ``st.add(cur_window_hash)`   `        ``# Iterate on Y and get hash tuple` `        ``# of all windows of size k` `        ``for` `i ``in` `range``(m ``-` `k ``+` `1``):` `            ``h1 ``=` `hash_Y1.get_hash(i, i ``+` `k ``-` `1``)` `            ``h2 ``=` `hash_Y2.get_hash(i, i ``+` `k ``-` `1``)`   `            ``cur_window_hash ``=` `(h1, h2)` `            `  `            ``# If hash exists in st return True` `            ``if` `cur_window_hash ``in` `st:` `                ``return` `True` `        ``return` `False`   `    ``# Binary Search on length` `    ``answer ``=` `0` `    ``low, high ``=` `0``, ``min``(n, m)`   `    ``while` `low <``=` `high:` `        ``mid ``=` `(low ``+` `high) ``/``/` `2`   `        ``if` `exists(mid):` `            ``answer ``=` `mid` `            ``low ``=` `mid ``+` `1` `        ``else``:` `            ``high ``=` `mid ``-` `1`   `    ``return` `answer`     `# Driver Code` `if` `__name__ ``=``=` `'__main__'``:` `    ``X ``=` `'GeeksforGeeks'` `    ``Y ``=` `'GeeksQuiz'` `    ``print``(longestCommonSubstr(X, Y, ``len``(X), ``len``(Y)))`

## Java

 `import` `java.util.HashSet;` `import` `java.util.Set;`   `class` `ComputeHash {` `    ``private` `long``[] hash;` `    ``private` `long``[] invMod;` `    ``private` `long` `mod;` `    ``private` `long` `p;`   `    ``// Generates hash in O(n(log(n)))` `    ``public` `ComputeHash(String s, ``long` `p, ``long` `mod) {` `        ``int` `n = s.length();` `        ``this``.hash = ``new` `long``[n];` `        ``this``.invMod = ``new` `long``[n];` `        ``this``.mod = mod;` `        ``this``.p = p;`   `        ``long` `pPow = ``1``;` `        ``long` `hashValue = ``0``;`   `        ``for` `(``int` `i = ``0``; i < n; i++) {` `            ``char` `c = s.charAt(i);` `            ``c = (``char``) (c - ``'A'` `+ ``1``);` `            ``hashValue = (hashValue + c * pPow) % ``this``.mod;` `            ``this``.hash[i] = hashValue;` `            ``this``.invMod[i] = (``long``)(Math.pow(pPow, ``this``.mod - ``2``) % ``this``.mod);` `            ``pPow = (pPow * ``this``.p) % ``this``.mod;` `        ``}` `    ``}`   `    ``// Return hash of a window in O(1)` `    ``public` `long` `getHash(``int` `l, ``int` `r) {` `        ``if` `(l == ``0``) {` `            ``return` `this``.hash[r];` `        ``}`   `        ``long` `window = (``this``.hash[r] - ``this``.hash[l - ``1``]) % ``this``.mod;` `        ``return` `(window * ``this``.invMod[l]) % ``this``.mod;` `    ``}` `}`   `public` `class` `Main {` `    ``// Function to get the longest common substring` `    ``public` `static` `int` `longestCommonSubstr(String X, String Y) {` `    ``int` `n = X.length();` `    ``int` `m = Y.length();`   `    ``long` `p1 = ``31``;` `    ``long` `p2 = ``37``;` `    ``long` `m1 = (``long``) (Math.pow(``10``, ``9``) + ``9``);` `    ``long` `m2 = (``long``) (Math.pow(``10``, ``9``) + ``7``);`   `    ``// Initialize two hash objects` `    ``// with different p1, p2, m1, m2` `    ``// to reduce collision` `    ``ComputeHash hashX1 = ``new` `ComputeHash(X, p1, m1);` `    ``ComputeHash hashX2 = ``new` `ComputeHash(X, p2, m2);`   `    ``ComputeHash hashY1 = ``new` `ComputeHash(Y, p1, m1);` `    ``ComputeHash hashY2 = ``new` `ComputeHash(Y, p2, m2);`   `    ``// Function that returns the existence` `    ``// of a common substring of length k` `    ``int` `low = ``0``, high = Math.min(n, m);` `    ``int` `answer = ``0``;` `    ``while` `(low <= high) {` `        ``int` `mid = (low + high) / ``2``;` `        ``if` `(exists(mid, X, Y)) {` `        ``answer = mid;` `        ``low = mid + ``1``;` `    ``} ``else` `{` `        ``high = mid - ``1``;` `    ``}`   `    ``}` `    ``return` `answer;` `}` `    ``private` `static` `boolean` `exists(``int` `k, String X, String Y) {` `    ``for` `(``int` `i = ``0``; i <= X.length() - k; i++) {` `        ``for` `(``int` `j = ``0``; j <= Y.length() - k; j++) {` `            ``if` `(X.substring(i, i + k).equals(Y.substring(j, j + k))) {` `                ``return` `true``;` `            ``}` `        ``}` `    ``}` `    ``return` `false``;` `}`   `    ``public` `static` `void` `main(String[] args) {` `    ``String X = ``"GeeksforGeeks"``;` `    ``String Y = ``"GeeksQuiz"``;` `    ``System.out.println(longestCommonSubstr(X, Y));` `}` `}`

Output

```5
```

Time Complexity: O(n * log(m1)) + O(m * log((m1)) + O((n + m) * log(min(n, m)))

1. Generating hash object takes O(n*log(m1)), where n is the length of string and m1 = pow(10, 9) + 7.
2. Binary search takes O(log(min(n, m))), where n, m are the lengths of both strings.
3. Hash of a window takes O(1) time.
4. Exist function takes O(n + m) time.

Auxiliary Space: O(n + m)

My Personal Notes arrow_drop_up
Related Articles