Hash Sort Algorithm

  • Last Updated : 29 Aug, 2022

There has long been debate about how a sorting algorithm of linear time complexity can be achieved, as the traditional sorting algorithms need time at least of the order of O(N*log(N)) in the worst and average cases. 

The reason for the difficulty in achieving linearly proportional time complexity is that most of these traditional sorting algorithms are comparison based, and a comparison-based algorithm cannot produce a sorted output of a dataset in linear time in the worst case. The hash sort algorithm was proposed to solve this problem: under the assumptions listed below, it places each value by computing its location instead of comparing it with other values, so it can run in linear time, whereas even quick sort, usually the fastest traditional sorting algorithm in practice, takes O(N*log(N)) time on average.

Assumptions of Hash Sort:

The assumptions taken while implementing the Hash sort are: 

  1. The data values are within a known range.
  2. The values are numeric in nature.

Working of the Algorithm:

As the name says, the algorithm combines the concept of hashing with sorting. 

But how can hashing be used for sorting? The answer is by using a super-hash function.

The super-hash function is a composite function of 2 sub-functions: 

  1. Hash function 
  2. Mash function, which helps the super-hash function allocate the data values to unique addresses (i.e. no collisions happen) in a consistent way (i.e. without scattering the data).

Now, these mappings of data values obtained from the super-hash function are utilized by the main hash sort methods. The data structures used for storing and sorting the data are matrices. There are a few versions of the hash sort method; the two primary ones are:

  1. In-situ hash sort – In this method, both the storage and sorting of the values occur in the same data structure.
  2. Direct hash sort – In this method, a separate data list is used to store the data, and then the mapping is done into the multidimensional data structure from that list.
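
The direct variant can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name directHashSort is hypothetical, the values are assumed to be distinct integers within a known range [low, high], and the mapping uses the div/mod pair with the lower bound subtracted, as constructed later in this article.

```cpp
#include <cassert>
#include <vector>

// Sketch of direct hash sort (hypothetical helper, not from the article).
// Assumes: distinct integer values, all within the closed range [low, high].
std::vector<int> directHashSort(const std::vector<int>& values, int low, int high) {
    assert(low <= high);
    const int length = (high - low) + 1;      // L = (j - i) + 1
    int theta = 1;                            // smallest theta with theta * theta >= L
    while (theta * theta < length) ++theta;

    // Separate theta x theta matrix plus an occupancy flag for every slot.
    std::vector<std::vector<int>> grid(theta, std::vector<int>(theta, 0));
    std::vector<std::vector<bool>> used(theta, std::vector<bool>(theta, false));

    // Map every value from the source list into the matrix.
    for (int x : values) {
        assert(x >= low && x <= high);
        const int d = (x - low) / theta;      // magnitude part (mash)
        const int m = (x - low) % theta;      // remainder part (hash)
        assert(!used[d][m]);                  // distinct values are assumed
        grid[d][m] = x;
        used[d][m] = true;
    }

    // Reading the occupied slots row by row yields ascending order,
    // because d * theta + m equals x - low.
    std::vector<int> sorted;
    sorted.reserve(values.size());
    for (int d = 0; d < theta; ++d)
        for (int m = 0; m < theta; ++m)
            if (used[d][m]) sorted.push_back(grid[d][m]);
    return sorted;
}
```

Calling directHashSort on the list { 27, 144, 20, 99, 31 } with the range [20, 144] returns the values in ascending order. Note that the matrix is sized by the range, not by the number of values, so a sparse data set wastes space in this sketch.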

Super-Hash Function:

The super-hash function is a combination of two sub-functions, the hash function and the mash function. These 2 functions are used together to distinguish the data values from each other, as a simple hash function cannot do this on its own due to collisions. The two sub-functions work in the following ways to achieve the goal of the super-hash function: 

Consider a generic form for a number: cx + r, where c is a magnitude value, x is the base (usually 10) and r is the remainder obtained.

  • Hash Function: This works upon the remainder-based distinction of values for mapping, i.e. mod operator is applied on the data values to get the remainders and those remainders are used to allocate the data to a particular location in mapping. So, the basic hash function is: (X mod N). But using this method alone would cause collisions because for a given remainder value there can be multiple data values hence they cannot be distinguished.

Example: Taking N=10 and r=1, the values { 1, 11, 21, 31, 41, 51, 61, 71, 81, 91, . . . , cx + 1 } all leave the same remainder, so they would all be allocated to the same location.

  • Mash function: The mash function is named so because it is a magnitude hash function, which means it calculates the magnitudes to map the values of the data set. 
    For a number cx + r, where c is the magnitude and x is the base (usually taken 10), c is calculated by using the div operator (X div N) in the mash instead of the mod operator as used in the hash function. But again the values are not fully distinguished from each other using the mash function alone and multiple values are mapped to a particular location.

Example: Taking N=10, c=5, we get the following set: { 50, 51, 52, 53, 54, 55, 56, 57, 58, 59 }. This set contains all the values of the cx + r where c  is constant for all values (c=5) and r is variable. That’s why for a single value of c we get multiple values mapping to the same slot.

Now, to solve this, we use both hash and mash functions together to get a unique ordinal pair (c, r), where c is the magnitude obtained using the div operator in the mash function and r is the remainder value obtained by the mod operator in the hash function.
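
The collisions described above, and how the pair (c, r) removes them, can be checked with a few lines of code. This is a small sketch with hypothetical function names (hashPart, mashPart, superHash); it assumes non-negative integers and a positive base.

```cpp
#include <cassert>
#include <utility>

// Remainder part: x mod base (the hash function).
int hashPart(int x, int base) {
    assert(x >= 0 && base > 0);   // the sketch covers non-negative inputs only
    return x % base;
}

// Magnitude part: x div base (the mash function).
int mashPart(int x, int base) {
    assert(x >= 0 && base > 0);
    return x / base;
}

// The pair (c, r) identifies x uniquely, because x == c * base + r.
std::pair<int, int> superHash(int x, int base) {
    return std::make_pair(mashPart(x, base), hashPart(x, base));
}
```

With base 10, hashPart gives 1 for both 11 and 21, and mashPart gives 5 for both 50 and 59, but superHash returns a different pair for each of those values, for example (5, 1) for 51.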

Construction of Super-Hash Function:
Let us consider the range of values [i, j], where i is the lower bound and j is the upper bound of the range. Now,

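  1. Compute the length of the range,  
        L = (j – i) + 1.
  2. Determine θ, the smallest integer whose square is at least L (i.e. the side of the smallest square matrix that can hold L values), by :       
        θ = ceil( √L ) 
  3. Now every value x in the range corresponds to exactly one ordinal pair (dx, mx) :    
        x – i  =  dx·θ + mx,   where dx = (x – i) div θ and mx = (x – i) mod θ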

Example: 

Consider the range of values in the interval [ 20, 144 ], applying the steps of construction of the super-hash function:

  1. Calculating the value of L ⇒ (144 – 20) + 1 = 125 
  2. Calculating θ, the side of the smallest square that can hold L values ⇒  θ = ceil( √L ) = ceil( √125 ) = 12
  3. Now for mapping, we also subtract the lower bound (here 20), so that all the values are mapped in the range 0. . . .(j – i).
    The resultant super-hash function: F(x)  =  { d = (x –  20) div 12  ,  m = (x – 20) mod 12 }
  4. From the above super-hash function, we can find that for x = 20, we get the ordinal pair (0, 0) and for x = 144, we get (10, 4).
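
The construction steps can be sketched in code as below. The SuperHash struct and its member names are hypothetical, chosen only for illustration; θ is computed with an integer loop instead of a floating-point square root, which gives the same result as ceil( √L ).

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: a super-hash function for the closed range [low, high].
struct SuperHash {
    int low;    // lower bound i of the range
    int theta;  // smallest integer with theta * theta >= L

    SuperHash(int lowBound, int highBound) : low(lowBound), theta(1) {
        assert(lowBound <= highBound);
        const int length = (highBound - lowBound) + 1;   // L = (j - i) + 1
        while (theta * theta < length) ++theta;          // same as ceil(sqrt(L))
    }

    // Returns the ordinal pair (d, m) for a value x of the range.
    std::pair<int, int> map(int x) const {
        assert(x >= low);
        return std::make_pair((x - low) / theta, (x - low) % theta);
    }
};
```

For the range [20, 144], this sketch gives theta = 12 and maps 20 to (0, 0) and 144 to (10, 4), matching the example above.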

In-situ Hash Sort :

In-situ means “on site”; the variation is named for its in-place operation. In in-situ hash sort, the super-hash function is applied iteratively to the data values to sort the data. Before this, an initialization step takes a source value. The super-hash function maps the source value to its destination location, where it is swapped with the value present at that location; the displaced value becomes the new source value. This process repeats until the whole data set is sorted or the end of the data list is reached.

Pseudo-code of the Algorithm:

(v1, v2, v3, . . . , vn) <– initialize;                       // initialization for retrieving the first source value
While Not(End of List) Do
    temp <– get(v1, v2, v3, . . . , vn);                      // save the value currently stored at the destination
    value –> put(v1, v2, v3, . . . , vn);                     // place the source value there (the swap)
    value = temp;                                             // the displaced value becomes the new source value
    (v1, v2, v3, . . . , vn) <– super_Hash_Function(value);   // use the super-hash function to find its destination
End of Loop

Follow the below illustration for a better understanding of the algorithm:

Illustration:

Let us consider a 2-dimensional 3 x 3 matrix with values ranging from 1-9 to illustrate the in-situ hash sort: 

Finding the range length, L  = (9 – 1) + 1 = 9 

Computing the nearest square integer to L = 9 ⇒ θ = ceil ( √9 ) = 3, Lower bound, i = 1
Therefore, the resultant super-hash function comes out to be: d = (x – 1) / 3, m = (x – 1) % 3
Let the initial matrix configuration be like this: 

m/d 0 1 2
0 5 8 1
1 9 7 2
2 4 6 3
  • Starting initially at value(0, 0) = 5,  
    • Calculate the value of d and m for the value 5: d = (5- 1) div 3 = 1,  m = (5 -1 ) mod 3 = 1.
    • Hence, the new destination is at (1, 1) with value = 7.
    • Now, 5 and 7 will swap their respective positions. The resultant matrix will be: 
m/d 0 1 2
0 7 8 1
1 9 5 2
2 4 6 3
  • Again using the (0, 0) position with value = 7. 
    • Find the values of d and m: d = (7 – 1) div 3 = 2, m = (7 – 1) mod 3 = 0. 
    • Hence the obtained location is at (2, 0) where the value 4 is present. 
    • So, 4 and 7 will swap.
m/d 0 1 2
0 4 8 1
1 9 5 2
2 7 6 3
  • On position (0, 0) value = 4
    • Calculating d and m for x = 4: d = (4 – 1) div 3 = 1, m = (4 – 1) mod 3 = 0.
    • The obtained position is (1, 0) with destination value = 9
    • Therefore, 9 and 4 will swap their positions. 
m/d 0 1 2
0 9 8 1
1 4 5 2
2 7 6 3
  • Now, at position (0, 0), we have value = 9
    • So, d = (9 – 1) div 3 = 2,  m = (9 – 1) mod 3 = 2
    • The obtained position is (2, 2) with value  = 3. 
    • Therefore, 9 and 3 will swap their positions.
m/d 0 1 2
0 3 8 1
1 4 5 2
2 7 6 9
  • At position (0, 0) value =  3.
    • Calculate d and m, d = (3 – 1) div 3 = 0, m = (3 – 1) mod 3 = 2.
    • Resultant position is (0, 2) with value = 1
    • Therefore, 1 and 3 will swap their positions.
m/d 0 1 2
0 1 8 3
1 4 5 2
2 7 6 9
  • At position (0, 0), value = 1.
    • d = (1 – 1) div 3 = 0,  m = (1 – 1) mod 3 = 0.
    • The obtained position: (0, 0). This is the case of hysteresis, where the source value maps to its own location. This will cause an infinite loop as 1 on (0, 0) will keep on mapping position (0, 0) itself. 
    • So, to resolve this, we increment the position by 1 and get (0, 1) as our new source location. 
    • Now, computing the values of d and m for value = 8 at position (0, 1): d = (8 – 1) div 3 = 2, m  = (8 -1) mod 3 = 1. 
    • The new obtained location is (2, 1) with value = 6
    • Therefore, we swap the positions of 8 and 6.
m/d 0 1 2
0 1 6 3
1 4 5 2
2 7 8 9
  • At position (0, 1) value = 6.
    • d = (6 – 1) div 3 = 1, m = (6 – 1) mod 3 = 2.
    • The new obtained location is at (1, 2) with value = 2. 
    • Therefore, 6 and 2 will swap their positions. The resultant matrix is:
m/d 0 1 2
0 1 2 3
1 4 5 6
2 7 8 9

The matrix is fully sorted now, so the algorithm stops here. 

Below is the code to implement In-Situ hash sort.

C++




#include <iostream>
using namespace std;
 
// C++ code to implement Hash Sort Algorithm
// dimension values variable, i.e. 3x3 matrix
int Dim = 3;
 
// function implementing Hash Sort
void HashSort(int data[][3], int N, int low)
{
  int swapCount = 0, hysteresisCount = 0;
  int position = 0;
  int value;
 
  // condition to check if the end of data list is
  // reached
  while ((swapCount < N) && (hysteresisCount < N)) {
    value = data[(position) / Dim][(position) % Dim];
    int d = (value - low) / Dim;
    int m = (value - low) % Dim;
 
    // if hysteresis occurs
    if (data[d][m] == value) {
 
      // force push the position by 1 to avoid
      // hysteresis
      position++;
      hysteresisCount++;
    }
 
    // if no hysteresis, then swap the positions
    else {
      int temp = data[(position) / Dim][(position) % Dim];
      data[(position) / Dim][(position) % Dim] = data[d][m];
      data[d][m] = temp;
      swapCount++;
    }
  }
}
 
// Driver Code
 
int main() {
  // lower bound of the data range
  int low = 1;
  int arr[][3] = { { 5, 8, 1 }, { 9, 7, 2 }, { 4, 6, 3 } };
  HashSort(arr, 9, low);
  cout <<"The resultant sorted array is: "<<endl;
  // printing the resultant sorted array
  for (int i = 0; i < 3; i++) {
    for (int j = 0; j < 3; j++) {
      cout <<arr[i][j] << " ";
    }
    cout << endl;
  }
 
  return 0;
}
 
// This code is contributed by satwik4409.


Java




// Java code to implement Hash Sort Algorithm
public class GFG {
 
    // dimension values variable, i.e. 3x3 matrix
    static int Dim = 3;
 
    // function implementing Hash Sort
    static void HashSort(int[][] data, int N, int low)
    {
        int swapCount = 0, hysteresisCount = 0;
        int position = 0;
        int value;
 
        // condition to check if the end of data list is
        // reached
        while ((swapCount < N) && (hysteresisCount < N)) {
            value = data[(position) / Dim][(position) % Dim];
            int d = (value - low) / Dim;
            int m = (value - low) % Dim;
 
            // if hysteresis occurs
            if (data[d][m] == value) {
 
                // force push the position by 1 to avoid
                // hysteresis
                position++;
                hysteresisCount++;
            }
 
            // if no hysteresis, then swap the positions
            else {
                int temp = data[(position) / Dim][(position) % Dim];
                data[(position) / Dim][(position) % Dim] = data[d][m];
                data[d][m] = temp;
                swapCount++;
            }
        }
    }
 
    // Driver Code
    public static void main(String[] args)
    {
        // lower bound of the data range
        int low = 1;
        int[][] arr = { { 5, 8, 1 }, { 9, 7, 2 }, { 4, 6, 3 } };
        HashSort(arr, 9, low);
        System.out.println(
            "The resultant sorted array is: ");
        // printing the resultant sorted array
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                System.out.print(arr[i][j] + " ");
            }
            System.out.println();
        }
    }
}


C#




// C# code to implement the approach
using System;
using System.Collections.Generic;
 
public class GFG{
 
  // dimension values variable, i.e. 3x3 matrix
  static int Dim = 3;
 
  // function implementing Hash Sort
  static void HashSort(int[,] data, int N, int low)
  {
    int swapCount = 0, hysteresisCount = 0;
    int position = 0;
    int value;
 
    // condition to check if the end of data list is
    // reached
    while ((swapCount < N) && (hysteresisCount < N)) {
      value = data[(position) / Dim, (position) % Dim];
      int d = (value - low) / Dim;
      int m = (value - low) % Dim;
 
      // if hysteresis occurs
      if (data[d, m] == value) {
 
        // force push the position by 1 to avoid
        // hysteresis
        position++;
        hysteresisCount++;
      }
 
      // if no hysteresis, then swap the positions
      else {
        int temp = data[(position) / Dim, (position) % Dim];
        data[(position) / Dim, (position) % Dim] = data[d, m];
        data[d, m] = temp;
        swapCount++;
      }
    }
  }
 
  // Driver Code
  static public void Main (){
 
    // lower bound of the data range
    int low = 1;
    int[,] arr = { { 5, 8, 1 }, { 9, 7, 2 }, { 4, 6, 3 } };
    HashSort(arr, 9, low);
    Console.WriteLine(
      "The resultant sorted array is: ");
 
    // printing the resultant sorted array
    for (int i = 0; i < 3; i++) {
      for (int j = 0; j < 3; j++) {
        Console.Write(arr[i, j] + " ");
      }
      Console.WriteLine();
    }
  }
}
 
// This code is contributed by sanjoy_62.


Javascript




<script>
    // JavaScript code to implement the approach
 
    // dimension values variable, i.e. 3x3 matrix
    let Dim = 3;
  
    // function implementing Hash Sort
    function HashSort(data, N, low)
    {
        let swapCount = 0, hysteresisCount = 0;
        let position = 0;
        let value = 0;
  
        // condition to check if the end of data list is
        // reached
        while ((swapCount < N) && (hysteresisCount < N)) {
            value = data[Math.floor((position) / Dim)][(position) % Dim];
            let d = Math.floor((value - low) / Dim);
            let m = (value - low) % Dim;
  
            // if hysteresis occurs
            if (data[d][m] == value) {
  
                // force push the position by 1 to avoid
                // hysteresis
                position++;
                hysteresisCount++;
            }
  
            // if no hysteresis, then swap the positions
            else {
                let temp = data[Math.floor((position) / Dim)][(position) % Dim];
                data[Math.floor((position) / Dim)][(position) % Dim] = data[d][m];
                data[d][m] = temp;
                swapCount++;
            }
        }
    }
 
    // Driver Code
           
        // lower bound of the data range
        let low = 1;
        let arr = [[ 5, 8, 1 ], [ 9, 7, 2 ], [ 4, 6, 3 ]];
        HashSort(arr, 9, low);
        document.write(
            "The resultant sorted array is: " + "<br/>");
             
        // printing the resultant sorted array
        for (let i = 0; i < 3; i++) {
            for (let j = 0; j < 3; j++) {
                document.write(arr[i][j] + " ");
            }
            document.write("<br/>");
        }
 
// This code is contributed by code_hunt.
</script>


Output

The resultant sorted array is: 
1 2 3 
4 5 6 
7 8 9 

Time and Space Complexity Analysis:

Time Complexity: O(N)

  • The hash sort mapping can be implemented in several ways because the method is extendible, so let c (c >= 1) be the constant number of sub-mappings evaluated per value; at least one mapping is always required. 
  • The super-hash function is a composite of 2 sub-functions (hash and mash), so the time taken to map one data value is twice the number of sub-mappings, i.e. 2.c. 
  • The mapping function is applied iteratively to every element of the data, which is N in this case, so the number of mappings grows linearly with N and the total time comes out to be: T(N) = 2.c.N

Therefore, T(N) = O(N)
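
The linear bound can also be observed by counting swaps. Every swap moves one value to its final location, so for N distinct values there can be at most N - 1 swaps, plus N steps that advance the position. The sketch below is a flat-array variant written only for illustration (the function name is hypothetical); it assumes the data is a permutation of low, low + 1, . . . , low + N - 1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch: in-situ hash sort on a flat array that reports the number of swaps.
// Assumes data is a permutation of low, low + 1, ..., low + N - 1.
std::size_t inSituHashSortCountingSwaps(std::vector<int>& data, int low) {
    const std::size_t n = data.size();
    std::size_t theta = 1;                        // smallest theta with theta^2 >= N
    while (theta * theta < n) ++theta;

    std::size_t swaps = 0;
    std::size_t position = 0;
    while (position < n) {
        const int offset = data[position] - low;
        assert(offset >= 0 && static_cast<std::size_t>(offset) < n);
        const std::size_t off = static_cast<std::size_t>(offset);
        const std::size_t d = off / theta;        // magnitude part (mash)
        const std::size_t m = off % theta;        // remainder part (hash)
        const std::size_t home = d * theta + m;   // flattened (d, m) location

        if (data[home] == data[position]) {
            ++position;                           // hysteresis: move to the next source
        } else {
            std::swap(data[position], data[home]);
            ++swaps;                              // one more value is now in its final slot
        }
    }
    return swaps;
}
```

For the 3 x 3 illustration above, flattened row by row to { 5, 8, 1, 9, 7, 2, 4, 6, 3 } with low = 1, the sketch performs 7 swaps, the same seven swaps shown in the walkthrough, and the loop body runs 7 + 9 = 16 times in total.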

Space Complexity: O(N) in total, O(1) auxiliary for in-situ hash sort

Auxiliary space depends upon the variation of the hash sort used. 

  • For in-situ hash sort, the sorting happens in place in the data structure itself. 
  • Therefore, we require (N + 1) space in total for implementing the in-situ algorithm, where N is the number of elements in the data structure and one extra slot holds the temporary value during the swapping process. Only that one slot is auxiliary.

Therefore, the total space is S(N) = O(N), of which O(1) is auxiliary.

Applications of Hash Sort:

  • Data mining: Hash sort can be handy for organizing and searching of data in data mining where there are huge quantities of data.
  • Database Systems: For an efficient retrieval of the data.
  • Operating Systems: File organization.
