Persistent Trie | Set 1 (Introduction)
Prerequisite:
Trie is one handy data structure that often comes into play when performing multiple string lookups. In this post, we will introduce the concept of Persistency in this data structure. Persistency simply means to retain the changes. But obviously, retaining the changes cause extra memory consumption and hence affect the Time Complexity.
Our aim is to apply persistency in Trie and also to ensure that it does not take more than the standard trie searching i.e. O(length_of_key). We will also analyze the extra space complexity that persistency causes over the standard Space Complexity of a Trie.
Let’s think in terms of versions i.e. for each change/insertion in our Trie we create a new version of it.
We will consider our initial version to be Version-0. Now, as we do any insertion in the trie we will create a new version for it and in similar fashion track the record for all versions.
But creating the whole trie every time for every version keeps doubling up memory and affects the Space Complexity very badly. So, this idea will easily run out of memory for a large number of versions.
Let’s exploit the fact that for each new insertion in the trie, exactly X (length_of_key) nodes will be visited/modified. So, our new version will only contain these X new nodes and rest trie nodes will be the same as the previous version. Therefore, it is quite clear that for each new version we only need to create these X new nodes whereas the rest of the trie nodes can be shared from the previous version.
Consider the below figure for better visualization:
Now, the Question arises: How to keep track of all the versions?
We only need to keep track the first root node for all the versions and this will serve the purpose to track all the newly created nodes in the different versions as the root node gives us the entry point for that particular version. For this purpose, we can maintain an array of pointers to the root node of the trie for all versions.
Let’s consider the below scenario and see how we can use Persistent Trie to solve it !
Given an array of strings and we need to determine if a string exists in some
range [l, r] in the array. To have an analogy, consider the array to be a
list of words in a dictionary at ith page(i is the index of the array) and
we need to determine whether a given word X exists in the page range [l, r]?
Below is the implementation for the above problem:-
C++
// C++ implementation of the approach #include <bits/stdc++.h> using namespace std; // Distinct numbers of chars in key const int sz = 26; // Persistent Trie node structure struct PersistentTrie { // Stores all children nodes, where ith children denotes // ith alphabetical character vector<PersistentTrie*> children; // Marks the ending of the key bool keyEnd = false ; // Constructor 1 PersistentTrie( bool keyEnd = false ) { this ->keyEnd = keyEnd; } // Constructor 2 PersistentTrie(vector<PersistentTrie*>& children, bool keyEnd = false ) { this ->children = children; this ->keyEnd = keyEnd; } // detects existence of key in trie bool findKey(string& key, int len); // Inserts key into trie // returns new node after insertion PersistentTrie* insert(string& key, int len); }; // Dummy PersistentTrie node PersistentTrie* dummy; // Initialize dummy for easy implementation void init() { dummy = new PersistentTrie( true ); // All children of dummy as dummy vector<PersistentTrie*> children(sz, dummy); dummy->children = children; } // Inserts key into current trie // returns newly created trie node after insertion PersistentTrie* PersistentTrie::insert(string& key, int len) { // If reached the end of key string if (len == key.length()) { // Create new trie node with current trie node // marked as keyEnd return new PersistentTrie((* this ).children, true ); } // Fetch current child nodes vector<PersistentTrie*> new_version_PersistentTrie = (* this ).children; // Insert at key[len] child and // update the new child node PersistentTrie* tmpNode = new_version_PersistentTrie[key[len] - 'a' ]; new_version_PersistentTrie[key[len] - 'a' ] = tmpNode->insert(key, len + 1); // Return a new node with modified key[len] child node return new PersistentTrie(new_version_PersistentTrie); } // Returns the presence of key in current trie bool PersistentTrie::findKey(string& key, int len) { // If reached end of key if (key.length() == len) // Return if this is a keyEnd in trie return this ->keyEnd; // If we cannot find key[len] child in trie // we say key doesn't exist in the trie if ( this ->children[key[len] - 'a' ] == dummy) return false ; // Recursively search the rest of // key length in children[key] trie return this ->children[key[len] - 'a' ]->findKey(key, len + 1); } // dfs traversal over the current trie // prints all the keys present in the current trie void printAllKeysInTrie(PersistentTrie* root, string& s) { int flag = 0; for ( int i = 0; i < sz; i++) { if (root->children[i] != dummy) { flag = 1; s.push_back( 'a' + i); printAllKeysInTrie(root->children[i], s); s.pop_back(); } } if (flag == 0 and s.length() > 0) cout << s << endl; } // Driver code int main( int argc, char const * argv[]) { // Initialize the PersistentTrie init(); // Input keys vector<string> keys({ "goku" , "gohan" , "goten" , "gogeta" }); // Cache to store trie entry roots after each insertion PersistentTrie* root[keys.size()]; // Marking first root as dummy root[0] = dummy; // Inserting all keys for ( int i = 1; i <= keys.size(); i++) { // Caching new root for ith version of trie root[i] = root[i - 1]->insert(keys[i - 1], 0); } int idx = 3; cout << "All keys in trie after version - " << idx << endl; string key = "" ; printAllKeysInTrie(root[idx], key); string queryString = "goku" ; int l = 2, r = 3; cout << "range : " << "[" << l << ", " << r << "]" << endl; if (root[r]->findKey(queryString, 0) and !root[l - 1]->findKey(queryString, 0)) cout << queryString << " - exists in above range" << endl; else cout << queryString << " - does not exist in above range" << endl; queryString = "goten" ; l = 2, r = 4; cout << "range : " << "[" << l << ", " << r << "]" << endl; if (root[r]->findKey(queryString, 0) and !root[l - 1]->findKey(queryString, 0)) cout << queryString << " - exists in above range" << endl; else cout << queryString << " - does not exist in above range" << endl; return 0; } |
Java
// Java program for the above approach import java.io.*; import java.util.*; // Persistent Trie node structure class PersistentTrie { // Stores all children nodes, where // ith children denotes ith // alphabetical character PersistentTrie[] children; // Marks the ending of the key boolean keyEnd = false ; // Constructor 1 PersistentTrie( boolean keyEnd) { this .keyEnd = keyEnd; } // Constructor 2 PersistentTrie(PersistentTrie[] children, boolean keyEnd) { this .children = children; this .keyEnd = keyEnd; } // Detects existence of key in trie boolean findKey(String key, int len, PersistentTrie dummy) { // If reached end of key if (key.length() == len) // Return if this is a keyEnd in trie return this .keyEnd; // If we cannot find key[len] child in trie // we say key doesn't exist in the trie if ( this .children[key.charAt(len) - 'a' ] == dummy) return false ; // Recursively search the rest of // key length in children[key] trie return this .children[key.charAt(len) - 'a' ].findKey( key, len + 1 , dummy); } // Inserts key into trie // returns new node after insertion PersistentTrie insert(String key, int len) { // If reached the end of key string if (len == key.length()) { // Create new trie node with current trie node // marked as keyEnd return new PersistentTrie( this .children.clone(), true ); } // Fetch current child nodes PersistentTrie[] new_version_PersistentTrie = this .children.clone(); // Insert at key[len] child and // update the new child node PersistentTrie tmpNode = new_version_PersistentTrie[key.charAt(len) - 'a' ]; new_version_PersistentTrie[key.charAt(len) - 'a' ] = tmpNode.insert(key, len + 1 ); // Return a new node with modified key[len] child // node return new PersistentTrie( new_version_PersistentTrie, false ); } } class GFG{ static final int sz = 26 ; // Dummy PersistentTrie node static PersistentTrie dummy; // Initialize dummy for easy implementation static void init() { dummy = new PersistentTrie( false ); // All children of dummy as dummy PersistentTrie[] children = new PersistentTrie[sz]; for ( int i = 0 ; i < sz; i++) children[i] = dummy; dummy.children = children; } // dfs traversal over the current trie // prints all the keys present in the current trie static void printAllKeysInTrie(PersistentTrie root, String s) { int flag = 0 ; for ( int i = 0 ; i < sz; i++) { if (root.children[i] != dummy) { flag = 1 ; printAllKeysInTrie(root.children[i], s + (( char )( 'a' + i))); } if (root.children[i].keyEnd) System.out.println(s + ( char )( 'a' + i)); } } // Driver code public static void main(String[] args) { // Initialize the PersistentTrie init(); // Input keys List<String> keys = Arrays.asList( new String[]{ "goku" , "gohan" , "goten" , "gogeta" }); // Cache to store trie entry roots after each // insertion PersistentTrie[] root = new PersistentTrie[keys.size() + 1 ]; // Marking first root as dummy root[ 0 ] = dummy; // Inserting all keys for ( int i = 1 ; i <= keys.size(); i++) { // Caching new root for ith version of trie root[i] = root[i - 1 ].insert(keys.get(i - 1 ), 0 ); } int idx = 3 ; System.out.println( "All keys in trie " + "after version - " + idx); String key = "" ; printAllKeysInTrie(root[ 3 ], key); String queryString = "goku" ; int l = 2 , r = 3 ; System.out.println( "range : " + "[" + l + ", " + r + "]" ); if (root[r].findKey(queryString, 0 , dummy) && !root[l - 1 ].findKey(queryString, 0 , dummy)) System.out.println(queryString + " - exists in above range" ); else System.out.println(queryString + " - does not exist in " + "above range" ); queryString = "goten" ; l = 2 ; r = 4 ; System.out.println( "range : " + "[" + l + ", " + r + "]" ); if (root[r].findKey(queryString, 0 , dummy) && !root[l - 1 ].findKey(queryString, 0 , dummy)) System.out.println(queryString + " - exists in above range" ); else System.out.println(queryString + " - does not exist in above range" ); } } // This code is contributed by jithin |
C#
// C# program for the above approach using System; using System.Collections.Generic; // Persistent Trie node structure public class PersistentTrie { // Stores all children nodes, where // ith children denotes ith // alphabetical character public PersistentTrie[] Children; // Marks the ending of the key public bool KeyEnd; // Constructor 1 public PersistentTrie( bool keyEnd) { KeyEnd = keyEnd; } // Constructor 2 public PersistentTrie(PersistentTrie[] children, bool keyEnd) { Children = children; KeyEnd = keyEnd; } // Detects existence of key in trie public bool FindKey( string key, int len, PersistentTrie dummy) { // If reached end of key if (key.Length == len) return KeyEnd; // If we cannot find key[len] child in trie // we say key doesn't exist in the trie if (Children[key[len] - 'a' ] == dummy) return false ; // Recursively search the rest of // key length in children[key] trie return Children[key[len] - 'a' ].FindKey(key, len + 1, dummy); } // Recursively search the rest of // key length in children[key] trie public PersistentTrie Insert( string key, int len) { // If reached the end of key string if (len == key.Length) { // Create new trie node with current trie node // marked as keyEnd return new PersistentTrie((PersistentTrie[])Children.Clone(), true ); } // Fetch current child nodes PersistentTrie[] new_version_PersistentTrie = (PersistentTrie[])Children.Clone(); // Insert at key[len] child and // update the new child node PersistentTrie tmpNode = new_version_PersistentTrie[key[len] - 'a' ]; new_version_PersistentTrie[key[len] - 'a' ] = tmpNode.Insert(key, len + 1); // Return a new node with modified key[len] child // node return new PersistentTrie(new_version_PersistentTrie, false ); } } public class GFG { static readonly int sz = 26; // Dummy PersistentTrie node static PersistentTrie dummy; // Initialize dummy for easy implementation static void Init() { dummy = new PersistentTrie( false ); // All children of dummy as dummy PersistentTrie[] children = new PersistentTrie[sz]; for ( int i = 0; i < sz; i++) children[i] = dummy; dummy.Children = children; } // dfs traversal over the current trie // prints all the keys present in the current trie static void PrintAllKeysInTrie(PersistentTrie root, string s) { int flag = 0; for ( int i = 0; i < sz; i++) { if (root.Children[i] != dummy) { flag = 1; PrintAllKeysInTrie(root.Children[i], s + ( char )( 'a' + i)); } if (root.Children[i].KeyEnd) Console.WriteLine(s + ( char )( 'a' + i)); } } // Driver code public static void Main( string [] args) { // Initialize the PersistentTrie Init(); // Input keys List< string > keys = new List< string > { "goku" , "gohan" , "goten" , "gogeta" }; // Cache to store trie entry roots after each // insertion PersistentTrie[] root = new PersistentTrie[keys.Count + 1]; // Marking first root as dummy root[0] = dummy; // Inserting all keys for ( int i = 1; i <= keys.Count; i++) { // Caching new root for ith version of trie root[i] = root[i - 1].Insert(keys[i - 1], 0); } int idx = 3; Console.WriteLine( "All keys in trie after version - " + idx); string key = "" ; PrintAllKeysInTrie(root[3], key); string queryString = "goku" ; int l = 2, r = 3; Console.WriteLine( "range : [" + l + ", " + r + "]" ); if (root[r].FindKey(queryString, 0, dummy) && !root[l - 1].FindKey(queryString, 0, dummy)) Console.WriteLine(queryString + " - exists in above range" ); else Console.WriteLine(queryString + " - does not exist in above range" ); } } //this article is contributed by bhardwajji |
Python3
# Persistent Trie node structure class PersistentTrie: def __init__( self , keyEnd = False , children = None ): self .children = children if children is not None else [ None ] * 26 self .keyEnd = keyEnd # Returns the presence of key in current trie def findKey( self , key, length, dummy): # If reached end of key if len (key) = = length: # Return if this is a keyEnd in trie return self .keyEnd # If we cannot find key[len] child in trie # we say key doesn't exist in the trie if self .children[ ord (key[length]) - ord ( 'a' )] = = dummy: return False # Recursively search the rest of # key length in children[key] trie return self .children[ ord (key[length]) - ord ( 'a' )].findKey(key, length + 1 , dummy) # Inserts key into current trie # returns newly created trie node after insertion def insert( self , key, length): # If reached the end of key string if length = = len (key): # Create new trie node with current trie node # marked as keyEnd return PersistentTrie(children = self .children.copy(), keyEnd = True ) # Fetch current child nodes new_children = self .children.copy() # Insert at key[len] child and # update the new child node new_children[ ord (key[length]) - ord ( 'a' ) ] = self .children[ ord (key[length]) - ord ( 'a' )].insert(key, length + 1 ) # Return a new node with modified key[len] child node return PersistentTrie(children = new_children, keyEnd = False ) class GFG: # Distinct numbers of chars in key sz = 26 dummy = None @staticmethod def init(): GFG.dummy = PersistentTrie() GFG.dummy.children = [GFG.dummy] * GFG.sz # dfs traversal over the current trie # prints all the keys present in the current trie @staticmethod def printAllKeysInTrie(root, s): flag = 0 for i in range (GFG.sz): if root.children[i] ! = GFG.dummy: flag = 1 GFG.printAllKeysInTrie(root.children[i], s + chr ( ord ( 'a' ) + i)) if root.children[i].keyEnd: print (s + chr ( ord ( 'a' ) + i)) # Driver code @staticmethod def main(): # Initialize the PersistentTrie GFG.init() keys = [ "goku" , "gohan" , "goten" , "gogeta" ] # Cache to store trie entry roots after each insertion root = [ None ] * ( len (keys) + 1 ) # Marking first root as dummy root[ 0 ] = GFG.dummy # Inserting all keys for i in range ( 1 , len (keys) + 1 ): root[i] = root[i - 1 ].insert(keys[i - 1 ], 0 ) idx = 3 print ( "All keys in trie after version -" , idx) key = "" GFG.printAllKeysInTrie(root[ 3 ], key) queryString = "goku" l, r = 2 , 3 print ( "range : [" , l, "," , r, "]" ) if root[r].findKey(queryString, 0 , GFG.dummy) and not root[l - 1 ].findKey(queryString, 0 , GFG.dummy): print (queryString, "- exists in above range" ) else : print (queryString, "- does not exist in above range" ) queryString = "goten" l, r = 2 , 4 print ( "range : [" , l, "," , r, "]" ) if root[r].findKey(queryString, 0 , GFG.dummy) and not root[l - 1 ].findKey(queryString, 0 , GFG.dummy): print (queryString, "- exists in above range" ) else : print (queryString, "- does not exist in above range" ) if __name__ = = '__main__' : GFG.main() |
All keys in trie after version - 3 gohan goku goten range : [2, 3] goku - does not exist in above range range : [2, 4] goten - exists in above range
Time Complexity: As discussed above we will be visiting all the X(length of key) number of nodes in the trie while inserting; So, we will be visiting the X number of states and at each state we will be doing O(sz) amount of work by liking the sz children of the previous version with the current version for the newly created trie nodes. Hence, Time Complexity of insertion becomes O(length_of_key * sz). But the searching the is still linear over the length of the key to be searched and hence, the time complexity of searching a key is still O(length_of_key) just like a standard trie.
Space Complexity: Obviously, persistency in data structures comes with a trade of space and we will be consuming more memory in maintaining the different versions of the trie. Now, let us visualize the worst case – for insertion, we are creating O(length_of_key) nodes and each newly created node will take a space of O(sz) to store its children. Hence, the space complexity for insertion of the above implementation is O(length_of_key * sz).
Please Login to comment...