Longest Common Subsequence | DP-4

• Difficulty Level : Medium
• Last Updated : 10 Jan, 2022

We have discussed the Overlapping Subproblems and Optimal Substructure properties in Set 1 and Set 2, respectively. We also discussed one example problem in Set 3. Let us now discuss the Longest Common Subsequence (LCS) problem as another example of a problem that can be solved using Dynamic Programming.

LCS Problem Statement: Given two sequences, find the length of the longest subsequence present in both of them. A subsequence is a sequence that appears in the same relative order, but is not necessarily contiguous. For example, “abc”, “abg”, “bdf”, “aeg”, “acefg”, etc. are subsequences of “abcdefg”.
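To make the definition concrete, here is a minimal Python sketch of a subsequence check. The helper name `is_subsequence` is our own illustration, not part of the original article.

```python
def is_subsequence(sub: str, s: str) -> bool:
    """Return True if `sub` appears in `s` in the same relative order,
    not necessarily contiguously."""
    it = iter(s)
    # `ch in it` advances the iterator until it finds ch (or runs out),
    # so each character of `sub` must be found after the previous match.
    return all(ch in it for ch in sub)


for candidate in ["abc", "abg", "bdf", "aeg", "acefg", "gfa"]:
    print(candidate, is_subsequence(candidate, "abcdefg"))
```

Every candidate from the example list prints True; "gfa" prints False because its characters occur in “abcdefg” in the opposite order.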

To find the complexity of the brute-force approach, we first need the number of possible subsequences of a string of length n, i.e., the number of subsequences with lengths 1, 2, ..., n. Recall from the theory of permutations and combinations that the number of combinations with 1 element is $\binom{n}{1}$, the number with 2 elements is $\binom{n}{2}$, and so on. Since $\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \dots + \binom{n}{n} = 2^n$, and we do not count the subsequence of length 0, a string of length n has $2^n - 1$ possible subsequences. Checking whether one candidate is also a subsequence of the other string takes O(n) time (taking both strings to have length about n), so the time complexity of the brute-force approach is $O(n \cdot 2^n)$. This can be improved using dynamic programming.
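The brute-force idea above can be sketched in a few lines of Python. This is an illustrative baseline only; the function names are hypothetical, and it enumerates subsequences by index sets with `itertools.combinations`, so a string with repeated characters yields some equal strings more than once, exactly as in the $2^n - 1$ count.

```python
from itertools import combinations


def all_subsequences(s: str):
    """Yield every non-empty subsequence of s, one per index set
    (2^n - 1 in total for a string of length n)."""
    n = len(s)
    for k in range(1, n + 1):                 # nCk index sets of size k
        for idxs in combinations(range(n), k):
            yield "".join(s[i] for i in idxs)


def is_subsequence(sub: str, s: str) -> bool:
    it = iter(s)
    return all(ch in it for ch in sub)        # one linear scan of s


def lcs_brute_force(x: str, y: str) -> int:
    """Exponential baseline: try every subsequence of x and keep the
    longest one that is also a subsequence of y."""
    best = 0
    for sub in all_subsequences(x):
        if len(sub) > best and is_subsequence(sub, y):
            best = len(sub)
    return best


print(sum(1 for _ in all_subsequences("abcd")))  # 15, i.e. 2^4 - 1
print(lcs_brute_force("AGGTAB", "GXTXAYB"))      # 4
```

This is only practical for very short strings; it serves as a reference to check the faster versions against.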

LCS is a classic computer science problem, the basis of diff (a file comparison program that outputs the differences between two files), and has applications in bioinformatics.

Examples:
LCS for input sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3.
LCS for input sequences “AGGTAB” and “GXTXAYB” is “GTAB” of length 4.

The naive solution for this problem is to generate all subsequences of both given sequences and find the longest matching subsequence. This solution is exponential in terms of time complexity. Let us see how this problem possesses both important properties of a Dynamic Programming (DP) problem.

1) Optimal Substructure:
Let the input sequences be X[0..m-1] and Y[0..n-1], of lengths m and n respectively, and let L(X[0..m-1], Y[0..n-1]) be the length of the LCS of X and Y. The following is the recursive definition of L(X[0..m-1], Y[0..n-1]). As a base case, if either sequence is empty (m == 0 or n == 0), the length is 0.

If the last characters of both sequences match (X[m-1] == Y[n-1]), then
L(X[0..m-1], Y[0..n-1]) = 1 + L(X[0..m-2], Y[0..n-2])

If the last characters of both sequences do not match (X[m-1] != Y[n-1]), then
L(X[0..m-1], Y[0..n-1]) = MAX ( L(X[0..m-2], Y[0..n-1]), L(X[0..m-1], Y[0..n-2]) )
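Writing L(i, j) as shorthand (our notation, not the article's) for L(X[0..i-1], Y[0..j-1]), the two cases above plus the empty-sequence base case can be summarized in one recurrence, with the answer being L(m, n). This is the same rule the tabulated implementations later fill in row by row.

$$
L(i, j) =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
1 + L(i-1,\, j-1) & \text{if } i, j > 0 \text{ and } X[i-1] = Y[j-1],\\
\max\bigl(L(i-1,\, j),\ L(i,\, j-1)\bigr) & \text{if } i, j > 0 \text{ and } X[i-1] \neq Y[j-1].
\end{cases}
$$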

Examples:
1) Consider the input strings “AGGTAB” and “GXTXAYB”. The last characters match, so the length of the LCS can be written as:
L(“AGGTAB”, “GXTXAYB”) = 1 + L(“AGGTA”, “GXTXAY”)

2) Consider the input strings “ABCDGH” and “AEDFHR”. The last characters do not match, so the length of the LCS can be written as:
L(“ABCDGH”, “AEDFHR”) = MAX ( L(“ABCDG”, “AEDFHR”), L(“ABCDGH”, “AEDFH”) )

So the LCS problem has the optimal substructure property, as the main problem can be solved using solutions to subproblems.

2) Overlapping Subproblems:
Following is a simple recursive implementation of the LCS problem. The implementation directly follows the recursive structure described above.

C++

/* A Naive recursive implementation of LCS problem */
#include <algorithm>
#include <cstring>
#include <iostream>
using namespace std;

/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs(char *X, char *Y, int m, int n)
{
    if (m == 0 || n == 0)
        return 0;
    if (X[m-1] == Y[n-1])
        return 1 + lcs(X, Y, m-1, n-1);
    else
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
}

/* Driver code */
int main()
{
    char X[] = "AGGTAB";
    char Y[] = "GXTXAYB";

    int m = strlen(X);
    int n = strlen(Y);

    cout << "Length of LCS is " << lcs(X, Y, m, n);

    return 0;
}

// This code is contributed by rathbhupendra

C

/* A Naive recursive implementation of LCS problem */
#include <stdio.h>
#include <string.h>

int max(int a, int b);

/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs(char *X, char *Y, int m, int n)
{
    if (m == 0 || n == 0)
        return 0;
    if (X[m-1] == Y[n-1])
        return 1 + lcs(X, Y, m-1, n-1);
    else
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
}

/* Utility function to get max of 2 integers */
int max(int a, int b)
{
    return (a > b) ? a : b;
}

/* Driver program to test above function */
int main()
{
    char X[] = "AGGTAB";
    char Y[] = "GXTXAYB";

    int m = strlen(X);
    int n = strlen(Y);

    printf("Length of LCS is %d\n", lcs(X, Y, m, n));

    return 0;
}

Java

/* A Naive recursive implementation of LCS problem in java*/
public class LongestCommonSubsequence
{

    /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
    int lcs(char[] X, char[] Y, int m, int n)
    {
        if (m == 0 || n == 0)
            return 0;
        if (X[m-1] == Y[n-1])
            return 1 + lcs(X, Y, m-1, n-1);
        else
            return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n));
    }

    /* Utility function to get max of 2 integers */
    int max(int a, int b)
    {
        return (a > b) ? a : b;
    }

    public static void main(String[] args)
    {
        LongestCommonSubsequence lcs = new LongestCommonSubsequence();
        String s1 = "AGGTAB";
        String s2 = "GXTXAYB";

        char[] X = s1.toCharArray();
        char[] Y = s2.toCharArray();
        int m = X.length;
        int n = Y.length;

        System.out.println("Length of LCS is" + " " +
                           lcs.lcs(X, Y, m, n));
    }

}

// This Code is Contributed by Saket Kumar

Python3

# A Naive recursive Python implementation of LCS problem

def lcs(X, Y, m, n):
    if m == 0 or n == 0:
        return 0
    elif X[m-1] == Y[n-1]:
        return 1 + lcs(X, Y, m-1, n-1)
    else:
        return max(lcs(X, Y, m, n-1), lcs(X, Y, m-1, n))


# Driver program to test the above function
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of LCS is ", lcs(X, Y, len(X), len(Y)))

C#

/* C# Naive recursive implementation of LCS problem */
using System;

class GFG
{

    /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
    static int lcs(char[] X, char[] Y, int m, int n)
    {
        if (m == 0 || n == 0)
            return 0;
        if (X[m - 1] == Y[n - 1])
            return 1 + lcs(X, Y, m - 1, n - 1);
        else
            return max(lcs(X, Y, m, n - 1), lcs(X, Y, m - 1, n));
    }

    /* Utility function to get max of 2 integers */
    static int max(int a, int b)
    {
        return (a > b) ? a : b;
    }

    public static void Main()
    {
        String s1 = "AGGTAB";
        String s2 = "GXTXAYB";

        char[] X = s1.ToCharArray();
        char[] Y = s2.ToCharArray();
        int m = X.Length;
        int n = Y.Length;

        Console.Write("Length of LCS is" + " "
                      + lcs(X, Y, m, n));
    }
}
// This code is Contributed by Sam007

Output:

Length of LCS is 4

The time complexity of the above naive recursive approach is exponential: in the worst case it makes O(2^(m+n)) calls. The worst case happens when all characters of X and Y mismatch, i.e., when the length of the LCS is 0, because then every call branches into two.
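One way to see the exponential blow-up is to instrument the naive recursion and count calls. The sketch below is a hypothetical helper (`count_calls` is our own name) that mirrors the recursive implementation above. For inputs where every character mismatches, the call count satisfies T(m, n) = 1 + T(m-1, n) + T(m, n-1), with T = 1 when either length is 0, which works out to 2·C(m+n, m) − 1 calls, bounded above by 2^(m+n).

```python
def count_calls(x: str, y: str) -> int:
    """Count how many times the naive recursive lcs() is invoked
    for inputs x and y, including the top-level call."""
    calls = 0

    def lcs(m: int, n: int) -> int:
        nonlocal calls
        calls += 1
        if m == 0 or n == 0:
            return 0
        if x[m - 1] == y[n - 1]:
            return 1 + lcs(m - 1, n - 1)
        return max(lcs(m, n - 1), lcs(m - 1, n))

    lcs(len(x), len(y))
    return calls


# All characters mismatch, so every non-base call branches twice.
for k in range(1, 9):
    print(k, count_calls("A" * k, "B" * k))
```

Each extra character per string roughly quadruples the count, while a table-based solution touches only (m+1)(n+1) cells.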

Considering the above implementation, the following is a partial recursion tree for the input strings “AXYT” and “AYZX”:

                      lcs("AXYT", "AYZX")
                     /                   \
      lcs("AXY", "AYZX")                 lcs("AXYT", "AYZ")
       /             \                    /               \
lcs("AX", "AYZX")  lcs("AXY", "AYZ")  lcs("AXY", "AYZ")  lcs("AXYT", "AY")

In the above partial recursion tree, lcs(“AXY”, “AYZ”) is solved twice. If we draw the complete recursion tree, we can see that many subproblems are solved again and again. So this problem has the Overlapping Subproblems property, and recomputation of the same subproblems can be avoided by using either Memoization or Tabulation. Following is a tabulated implementation of the LCS problem.
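The listings that follow use Tabulation; for comparison, here is a minimal sketch of the Memoization alternative in Python. It keeps the naive recursion unchanged and caches results with the standard library's `functools.lru_cache`; the names `lcs_memo` and `solve` are our own.

```python
from functools import lru_cache


def lcs_memo(x: str, y: str) -> int:
    """Top-down LCS length: the naive recursion plus a cache keyed on (m, n)."""

    @lru_cache(maxsize=None)
    def solve(m: int, n: int) -> int:
        if m == 0 or n == 0:
            return 0
        if x[m - 1] == y[n - 1]:
            return 1 + solve(m - 1, n - 1)
        return max(solve(m, n - 1), solve(m - 1, n))

    return solve(len(x), len(y))


print(lcs_memo("AGGTAB", "GXTXAYB"))  # 4
```

Each (m, n) pair is computed at most once, so the work drops to O(mn). The recursion can still go up to m + n levels deep, which may exceed Python's default recursion limit for long inputs; the bottom-up versions avoid that.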

Python3

def lcs(s1, s2):
    m, n = len(s1), len(s2)
    # Two rows suffice: prev holds row i-1 of the DP table, cur holds row i
    prev, cur = [0]*(n+1), [0]*(n+1)
    for i in range(1, m+1):
        for j in range(1, n+1):
            if s1[i-1] == s2[j-1]:
                cur[j] = 1 + prev[j-1]
            else:
                if cur[j-1] > prev[j]:
                    cur[j] = cur[j-1]
                else:
                    cur[j] = prev[j]
        # After the swap, prev is the row just computed
        cur, prev = prev, cur
    return prev[n]
#end of function lcs


# Driver program to test the above function
s1 = "AGGTAB"
s2 = "GXTXAYB"
print("Length of LCS is ", lcs(s1, s2))
# BY PRASHANT SHEKHAR MISHRA

C

/* Dynamic Programming C implementation of LCS problem */
#include <stdio.h>
#include <string.h>

int max(int a, int b);

/* Returns length of LCS for X[0..m-1], Y[0..n-1] */
int lcs(char *X, char *Y, int m, int n)
{
    int L[m+1][n+1];
    int i, j;

    /* Following steps build L[m+1][n+1] in bottom up fashion. Note
       that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */
    for (i = 0; i <= m; i++)
    {
        for (j = 0; j <= n; j++)
        {
            if (i == 0 || j == 0)
                L[i][j] = 0;
            else if (X[i-1] == Y[j-1])
                L[i][j] = L[i-1][j-1] + 1;
            else
                L[i][j] = max(L[i-1][j], L[i][j-1]);
        }
    }

    /* L[m][n] contains length of LCS for X[0..m-1] and Y[0..n-1] */
    return L[m][n];
}

/* Utility function to get max of 2 integers */
int max(int a, int b)
{
    return (a > b) ? a : b;
}

/* Driver program to test above function */
int main()
{
    char X[] = "AGGTAB";
    char Y[] = "GXTXAYB";

    int m = strlen(X);
    int n = strlen(Y);

    printf("Length of LCS is %d\n", lcs(X, Y, m, n));

    return 0;
}

Java

/* Dynamic Programming Java implementation of LCS problem */
public class LongestCommonSubsequence
{

    /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
    int lcs(char[] X, char[] Y, int m, int n)
    {
        int L[][] = new int[m+1][n+1];

        /* Following steps build L[m+1][n+1] in bottom up fashion. Note
           that L[i][j] contains length of LCS of X[0..i-1] and Y[0..j-1] */
        for (int i = 0; i <= m; i++)
        {
            for (int j = 0; j <= n; j++)
            {
                if (i == 0 || j == 0)
                    L[i][j] = 0;
                else if (X[i-1] == Y[j-1])
                    L[i][j] = L[i-1][j-1] + 1;
                else
                    L[i][j] = max(L[i-1][j], L[i][j-1]);
            }
        }
        return L[m][n];
    }

    /* Utility function to get max of 2 integers */
    int max(int a, int b)
    {
        return (a > b) ? a : b;
    }

    public static void main(String[] args)
    {
        LongestCommonSubsequence lcs = new LongestCommonSubsequence();
        String s1 = "AGGTAB";
        String s2 = "GXTXAYB";

        char[] X = s1.toCharArray();
        char[] Y = s2.toCharArray();
        int m = X.length;
        int n = Y.length;

        System.out.println("Length of LCS is" + " " +
                           lcs.lcs(X, Y, m, n));
    }

}

// This Code is Contributed by Saket Kumar

Python3

# Dynamic Programming implementation of LCS problem

def lcs(X, Y):
    # find the length of the strings
    m = len(X)
    n = len(Y)

    # declaring the array for storing the dp values
    L = [[None]*(n+1) for i in range(m+1)]

    """Following steps build L[m+1][n+1] in bottom up fashion
    Note: L[i][j] contains length of LCS of X[0..i-1]
    and Y[0..j-1]"""
    for i in range(m+1):
        for j in range(n+1):
            if i == 0 or j == 0:
                L[i][j] = 0
            elif X[i-1] == Y[j-1]:
                L[i][j] = L[i-1][j-1]+1
            else:
                L[i][j] = max(L[i-1][j], L[i][j-1])

    # L[m][n] contains the length of LCS of X[0..m-1] & Y[0..n-1]
    return L[m][n]
#end of function lcs


# Driver program to test the above function
X = "AGGTAB"
Y = "GXTXAYB"
print("Length of LCS is ", lcs(X, Y))

# This code is contributed by Nikhil Kumar Singh(nickzuck_007)

C#

// Dynamic Programming C# implementation
// of LCS problem
using System;

class GFG
{

    /* Returns length of LCS for X[0..m-1], Y[0..n-1] */
    static int lcs(char[] X, char[] Y, int m, int n)
    {
        int[,] L = new int[m+1, n+1];

        /* Following steps build L[m+1][n+1]
           in bottom up fashion. Note
           that L[i][j] contains length of
           LCS of X[0..i-1] and Y[0..j-1] */
        for (int i = 0; i <= m; i++)
        {
            for (int j = 0; j <= n; j++)
            {
                if (i == 0 || j == 0)
                    L[i, j] = 0;
                else if (X[i - 1] == Y[j - 1])
                    L[i, j] = L[i - 1, j - 1] + 1;
                else
                    L[i, j] = max(L[i - 1, j], L[i, j - 1]);
            }
        }
        return L[m, n];
    }

    /* Utility function to get max of 2 integers */
    static int max(int a, int b)
    {
        return (a > b) ? a : b;
    }

    // Driver code
    public static void Main()
    {
        String s1 = "AGGTAB";
        String s2 = "GXTXAYB";

        char[] X = s1.ToCharArray();
        char[] Y = s2.ToCharArray();
        int m = X.Length;
        int n = Y.Length;

        Console.Write("Length of LCS is" + " " + lcs(X, Y, m, n));
    }
}
// This Code is Contributed by Sam007

Output:

Length of LCS is 4

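The time complexity of the above implementation is O(mn), which is much better than the worst-case time complexity of the naive recursive implementation. The versions that build the full L table use O(mn) auxiliary space; the first Python version keeps only two rows of the table, so it needs O(n).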
The above algorithm/code returns only the length of the LCS. Please see the following post for printing the LCS:
Printing Longest Common Subsequence
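As a rough idea of how the string itself can be recovered, here is a minimal Python sketch that backtracks through the same L table built by the tabulated solution. It is not necessarily the method used in the linked post; the name `lcs_string` and the tie-breaking rule (move up when both neighbours are equal) are our own choices, so when several LCSs exist it returns just one of them.

```python
def lcs_string(x: str, y: str) -> str:
    """Fill the L table bottom-up, then walk back from L[m][n]
    to recover one longest common subsequence."""
    m, n = len(x), len(y)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])

    # Walk back: a match contributes a character; otherwise move to
    # the neighbour whose value produced L[i][j].
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs_string("AGGTAB", "GXTXAYB"))  # GTAB
print(lcs_string("ABCDGH", "AEDFHR"))   # ADH
```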
You can also check the space optimized version of LCS at
Space Optimized Solution of LCS