Median in a stream of integers (running integers)

  • Difficulty Level : Hard
  • Last Updated : 09 Sep, 2022

Given that integers are read from a data stream, find the median of the elements read so far in an efficient way. For simplicity, assume there are no duplicates. For example, consider the stream 5, 15, 1, 3, …
 

After reading 1st element of stream - 5 -> median - 5
After reading 2nd element of stream - 5, 15 -> median - 10
After reading 3rd element of stream - 5, 15, 1 -> median - 5
After reading 4th element of stream - 5, 15, 1, 3 -> median - 4, so on...

To make it clear: when the input size is odd, we take the middle element of the sorted data. When the input size is even, we take the average of the middle two elements of the sorted data.
Note that the output is the effective median of the integers read from the stream so far. Such an algorithm is called an online algorithm: any algorithm that can guarantee the output for the first i elements after processing the i-th element is said to be an online algorithm. Let us discuss three solutions to the above problem.
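The median rule above can be pinned down with a brute-force reference that re-sorts everything read so far after each element. This is only a minimal sketch for checking the faster methods against; the function name running_medians_naive is an assumption made for illustration, and the approach is deliberately inefficient.

```python
def running_medians_naive(stream):
    """Return the median after each element, re-sorting every prefix."""
    seen = []
    medians = []
    for x in stream:
        seen.append(x)
        ordered = sorted(seen)          # full sort on every step
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            # odd count: the middle element is the median
            medians.append(ordered[mid])
        else:
            # even count: average of the two middle elements
            medians.append((ordered[mid - 1] + ordered[mid]) / 2)
    return medians


if __name__ == "__main__":
    print(running_medians_naive([5, 15, 1, 3]))
```

For the example stream 5, 15, 1, 3 this yields the medians 5, 10, 5 and 4 listed above (the averaged values come back as floats).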


Method 1: Insertion Sort

If we can sort the data as it appears, we can easily locate the median element. Insertion sort is one such online algorithm that sorts the data that has appeared so far. At any point, say after inserting the i-th element, the first i elements of the array are sorted. Insertion sort doesn’t depend on future data to sort the input received up to that point. In other words, insertion sort considers the data sorted so far while inserting the next element. This is the key property of insertion sort that makes it an online algorithm.

However, insertion sort takes O(n^2) time to sort n elements. We can use binary search to find the position of the next element in O(log n) time, but we still can’t do the data movement in O(log n) time: shifting elements to make room costs O(n) per insertion in the worst case. No matter how efficient the implementation is, insertion sort takes quadratic time overall.
An implementation of Method 1 follows.

C++




// This code is contributed by Anjali Saxena
 
#include <bits/stdc++.h>
 
using namespace std;
 
// Function to find position to insert current element of
// stream using binary search
int binarySearch(int arr[], int item, int low, int high)
{
    if (low >= high) {
        return (item > arr[low]) ? (low + 1) : low;
    }
 
    int mid = (low + high) / 2;
 
    if (item == arr[mid])
        return mid + 1;
 
    if (item > arr[mid])
        return binarySearch(arr, item, mid + 1, high);
 
    return binarySearch(arr, item, low, mid - 1);
}
 
// Function to print median of stream of integers
void printMedian(int arr[], int n)
{
    int i, j, pos, num;
    int count = 1;
 
    cout << "Median after reading 1"
         << " element is " << arr[0] << "\n";
 
    for (i = 1; i < n; i++) {
        float median;
        j = i - 1;
        num = arr[i];
 
        // find position to insert current element in sorted
        // part of array
        pos = binarySearch(arr, num, 0, j);
 
        // move elements to right to create space to insert
        // the current element
        while (j >= pos) {
            arr[j + 1] = arr[j];
            j--;
        }
 
        arr[j + 1] = num;
 
        // increment count of sorted elements in array
        count++;
 
        // if odd number of integers are read from stream
        // then middle element in sorted order is median
        // else average of middle elements is median
        if (count % 2 != 0) {
            median = arr[count / 2];
        }
        else {
            median = (arr[(count / 2) - 1] + arr[count / 2])
                     / 2;
        }
 
        cout << "Median after reading " << i + 1
             << " elements is " << median << "\n";
    }
}
 
// Driver Code
int main()
{
    int arr[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int n = sizeof(arr) / sizeof(arr[0]);
 
    printMedian(arr, n);
 
    return 0;
}


Java




// Java code to implement the approach
import java.util.*;
 
class GFG {
 
  // Function to find position to insert current element of
  // stream using binary search
  static int binarySearch(int arr[], int item, int low, int high)
  {
    if (low >= high) {
      return (item > arr[low]) ? (low + 1) : low;
    }
 
    int mid = (low + high) / 2;
 
    if (item == arr[mid])
      return mid + 1;
 
    if (item > arr[mid])
      return binarySearch(arr, item, mid + 1, high);
 
    return binarySearch(arr, item, low, mid - 1);
  }
 
  // Function to print median of stream of integers
  static void printMedian(int arr[], int n)
  {
    int i, j, pos, num;
    int count = 1;
 
    System.out.println( "Median after reading 1"
                       + " element is " + arr[0]);
 
    for (i = 1; i < n; i++) {
      float median;
      j = i - 1;
      num = arr[i];
 
      // find position to insert current element in sorted
      // part of array
      pos = binarySearch(arr, num, 0, j);
 
      // move elements to right to create space to insert
      // the current element
      while (j >= pos) {
        arr[j + 1] = arr[j];
        j--;
      }
 
      arr[j + 1] = num;
 
      // increment count of sorted elements in array
      count++;
 
      // if odd number of integers are read from stream
      // then middle element in sorted order is median
      // else average of middle elements is median
      if (count % 2 != 0) {
        median = arr[count / 2];
      }
      else {
        median = (arr[(count / 2) - 1] + arr[count / 2])
          / 2;
      }
 
      System.out.println( "Median after reading " + (i + 1)
                         + " elements is " + (int)median );
    }
  }
 
 
  // Driver code
  public static void main (String[] args) {
 
    int arr[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int n = arr.length;
 
    printMedian(arr, n);
  }
}
 
// This code is contributed by sanjoy_62.


Python3




# Function to find position to insert current element of
# stream using binary search
def binarySearch(arr, item, low, high):
 
    if (low >= high):
        return (low + 1) if (item > arr[low]) else low
 
    mid = (low + high) // 2
 
    if (item == arr[mid]):
        return mid + 1
 
    if (item > arr[mid]):
        return binarySearch(arr, item, mid + 1, high)
 
    return binarySearch(arr, item, low, mid - 1)
 
# Function to print median of stream of integers
def printMedian(arr, n):
 
    i, j, pos, num = 0, 0, 0, 0
    count = 1
 
    print(f"Median after reading 1 element is {arr[0]}")
 
    for i in range(1, n):
        median = 0
        j = i - 1
        num = arr[i]
 
        # find position to insert current element in sorted
        # part of array
        pos = binarySearch(arr, num, 0, j)
 
        # move elements to right to create space to insert
        # the current element
        while (j >= pos):
            arr[j + 1] = arr[j]
            j -= 1
 
        arr[j + 1] = num
 
        # increment count of sorted elements in array
        count += 1
 
        # if odd number of integers are read from stream
        # then middle element in sorted order is median
        # else average of middle elements is median
        if (count % 2 != 0):
            median = arr[count // 2]
 
        else:
            median = (arr[(count // 2) - 1] + arr[count // 2]) // 2
 
        print(f"Median after reading {i + 1} elements is {median} ")
 
# Driver Code
if __name__ == "__main__":
 
    arr = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
    n = len(arr)
 
    printMedian(arr, n)
 
# This code is contributed by rakeshsahni


C#




// C# program for the above approach
using System;
class GFG{
 
  // Function to find position to insert current element of
  // stream using binary search
  static int binarySearch(int[] arr, int item, int low, int high)
  {
    if (low >= high) {
      return (item > arr[low]) ? (low + 1) : low;
    }
 
    int mid = (low + high) / 2;
 
    if (item == arr[mid])
      return mid + 1;
 
    if (item > arr[mid])
      return binarySearch(arr, item, mid + 1, high);
 
    return binarySearch(arr, item, low, mid - 1);
  }
 
  // Function to print median of stream of integers
  static void printMedian(int[] arr, int n)
  {
    int i, j, pos, num;
    int count = 1;
 
    Console.WriteLine( "Median after reading 1"
                       + " element is " + arr[0]);
 
    for (i = 1; i < n; i++) {
      float median;
      j = i - 1;
      num = arr[i];
 
      // find position to insert current element in sorted
      // part of array
      pos = binarySearch(arr, num, 0, j);
 
      // move elements to right to create space to insert
      // the current element
      while (j >= pos) {
        arr[j + 1] = arr[j];
        j--;
      }
 
      arr[j + 1] = num;
 
      // increment count of sorted elements in array
      count++;
 
      // if odd number of integers are read from stream
      // then middle element in sorted order is median
      // else average of middle elements is median
      if (count % 2 != 0) {
        median = arr[count / 2];
      }
      else {
        median = (arr[(count / 2) - 1] + arr[count / 2])
          / 2;
      }
 
      Console.WriteLine( "Median after reading " + (i + 1)
                         + " elements is " + (int)median );
    }
  }
 
 
// Driver Code
public static void Main(String[] args)
{
    int[] arr = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int n = arr.Length;
 
    printMedian(arr, n);
}
}
 
// This code is contributed by code_hunt.


Javascript




<script>
    // JavaScript implementation for the above approach
 
    // Function to find position to insert current element of
    // stream using binary search
    const binarySearch = (arr, item, low, high) => {
        if (low >= high) {
            return (item > arr[low]) ? (low + 1) : low;
        }
 
        let mid = parseInt((low + high) / 2);
 
        if (item == arr[mid])
            return mid + 1;
 
        if (item > arr[mid])
            return binarySearch(arr, item, mid + 1, high);
 
        return binarySearch(arr, item, low, mid - 1);
    }
 
    // Function to print median of stream of integers
    const printMedian = (arr, n) => {
        let i, j, pos, num;
        let count = 1;
 
        document.write(`Median after reading 1 element is ${arr[0]}<br/>`);
 
        for (i = 1; i < n; i++) {
            let median;
            j = i - 1;
            num = arr[i];
 
            // find position to insert current element in sorted
            // part of array
            pos = binarySearch(arr, num, 0, j);
 
            // move elements to right to create space to insert
            // the current element
            while (j >= pos) {
                arr[j + 1] = arr[j];
                j--;
            }
 
            arr[j + 1] = num;
 
            // increment count of sorted elements in array
            count++;
 
            // if odd number of integers are read from stream
            // then middle element in sorted order is median
            // else average of middle elements is median
            if (count % 2 != 0) {
                median = arr[parseInt(count / 2)];
            }
            else {
                median = parseInt((arr[parseInt(count / 2) - 1] + arr[parseInt(count / 2)]) / 2);
            }
 
            document.write(`Median after reading ${i + 1} elements is ${median}<br/>`);
        }
    }
 
    // Driver Code
    let arr = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4];
    let n = arr.length;
 
    printMedian(arr, n);
 
 // This code is contributed by rakeshsahni
 
</script>


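Output (every version above truncates the average of the two middle elements to an integer, so the true median 6.5 after 10 and after 12 elements is printed as 6)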

Median after reading 1 element is 5
Median after reading 2 elements is 10
Median after reading 3 elements is 5
Median after reading 4 elements is 4
Median after reading 5 elements is 3
Median after reading 6 elements is 4
Median after reading 7 elements is 5
Median after reading 8 elements is 6
Median after reading 9 elements is 7
Median after reading 10 elements is 6
Median after reading 11 elements is 7
Median after reading 12 elements is 6

Time Complexity: O(n^2)
Space Complexity: O(1)
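For comparison, the same idea can be sketched with Python's standard bisect module, where insort does the binary search and the insertion in one call. This is an illustrative variant rather than a replacement for the listings above: the function name running_medians_insort is an assumption, and unlike those listings it keeps a separate sorted list and does not truncate the average. The insertion still shifts elements, so each step remains O(n).

```python
from bisect import insort


def running_medians_insort(stream):
    """Median after each element, keeping a sorted list via binary insertion."""
    ordered = []
    medians = []
    for x in stream:
        # binary search for the position, then an O(n) shift to insert
        insort(ordered, x)
        n = len(ordered)
        mid = n // 2
        if n % 2 == 1:
            medians.append(ordered[mid])
        else:
            medians.append((ordered[mid - 1] + ordered[mid]) / 2)
    return medians


if __name__ == "__main__":
    data = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
    for count, median in enumerate(running_medians_insort(data), start=1):
        print(f"Median after reading {count} element(s) is {median}")
```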

Method 2: Augmented self-balancing binary search tree (AVL, RB, etc…)
At every node of the BST, maintain the number of elements in the subtree rooted at that node. We can use a node as the root of a simple binary tree whose left child is a self-balancing BST with the elements less than the root and whose right child is a self-balancing BST with the elements greater than the root. The root element always holds the effective median.

If the left and right subtrees contain the same number of elements, the root node holds the average of the left and right subtree root data. Otherwise, the root contains the same data as the root of the subtree that has more elements. After processing an incoming element, the sizes of the left and right subtrees (BSTs) differ by at most 1.

Self-balancing BSTs are costly to keep balanced. Moreover, they provide fully sorted data, which we don’t need; we need only the median. The next method uses heaps to track the median.
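The subtree-count augmentation can be illustrated with a small sketch. Two simplifications are assumed here and are not part of the method as described: the tree is a plain BST with no rebalancing (so the worst case degrades to O(n) per operation, where AVL or red-black rotations would keep it at O(log n)), and the median is found by selecting the k-th smallest key from the counts rather than by the two-subtree layout above. All names (Node, insert, select, running_medians_bst) are hypothetical.

```python
class Node:
    """BST node augmented with the size of the subtree rooted at it."""

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1


def size(node):
    return node.size if node else 0


def insert(node, key):
    """Insert key and refresh subtree sizes on the way back up (no rebalancing)."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node


def select(node, k):
    """Return the k-th smallest key (0-based) using the subtree sizes."""
    while node:
        left = size(node.left)
        if k < left:
            node = node.left
        elif k == left:
            return node.key
        else:
            k -= left + 1          # skip the left subtree and this node
            node = node.right
    raise IndexError("k out of range")


def running_medians_bst(stream):
    root = None
    medians = []
    for x in stream:
        root = insert(root, x)
        n = root.size
        if n % 2 == 1:
            medians.append(select(root, n // 2))
        else:
            medians.append((select(root, n // 2 - 1) + select(root, n // 2)) / 2)
    return medians


if __name__ == "__main__":
    print(running_medians_bst([5, 15, 1, 3]))
```

Because each node knows how many elements lie below it, the median is reached by walking one root-to-leaf path instead of traversing the sorted order.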

Method 3: Heaps
Similar to the balanced BSTs in Method 2 above, we can use a max-heap on the left side to represent the elements that are less than the effective median, and a min-heap on the right side to represent the elements that are greater than the effective median.

After processing an incoming element, the numbers of elements in the two heaps differ by at most 1. When both heaps contain the same number of elements, we take the average of the two heap roots as the effective median. When the heaps are not balanced, we take the effective median from the root of the heap containing more elements.

Given below is the implementation of the above method. The loop body in each version shows how the two heaps are built.

C++14




// C++ code to implement the approach
 
#include <bits/stdc++.h>
using namespace std;
 
// Function to find the median of stream of data
void streamMed(int A[], int n)
{
    // Two max heaps: s holds the smaller half; g stores negated
    // values so that it behaves as a min heap for the larger half
    priority_queue<int> g, s;
   
    for (int i = 0; i < n; i++) {
        s.push(A[i]);
        int temp = s.top();
        s.pop();
       
        // Negation for treating it as min heap
        g.push(-1 * temp);
        if (g.size() > s.size()) {
            temp = g.top();
            g.pop();
            s.push(-1 * temp);
        }
        if (g.size() != s.size())
            cout << (double)s.top() << "\n";
        else
            cout << (double)((s.top() * 1.0
                              - g.top() * 1.0)
                             / 2)
                 << "\n";
    }
}
 
// Driver code
int main()
{
    int A[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
    int N = sizeof(A) / sizeof(A[0]);
   
    // Function call
    streamMed(A, N);
    return 0;
}


Java




// Java code to implement the approach
 
import java.io.*;
import java.util.*;
 
class GFG {
   
    // Function to find the median of stream of data
    public static void streamMed(int A[], int N)
    {
        // Two min heaps: g holds the larger half; s stores negated
        // values so that it behaves as a max heap for the smaller half
        PriorityQueue<Double> g = new PriorityQueue<>();
        PriorityQueue<Double> s = new PriorityQueue<>();
        for (int i = 0; i < N; i++) {
             
            // Negation for treating it as max heap
            s.add(-1.0 * A[i]);
            g.add(-1.0 * s.poll());
            if (g.size() > s.size())
                s.add(-1.0 * g.poll());
           
            if (g.size() != s.size())
                System.out.println(-1.0 * s.peek());
            else
                System.out.println((g.peek() - s.peek())
                                   / 2);
        }
    }
 
    // Driver code
    public static void main(String[] args)
    {
        int A[] = { 5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4 };
        int N = A.length;
         
        // Function call
        streamMed(A, N);
    }
}


Python3




# Python code to implement the approach
 
from heapq import heappush, heappop
 
# Function to find the median of stream of data
def streamMed(arr, N):
     
    # Two min heaps: g holds the larger half; s stores negated
    # values so that it behaves as a max heap for the smaller half
    g = []
    s = []
    for i in range(len(arr)):
       
        # Negation for treating it as max heap
        heappush(s, -arr[i])
        heappush(g, -heappop(s))
        if len(g) > len(s):
            heappush(s, -heappop(g))
 
        if len(g) != len(s):
            print(-s[0])
        else:
            print((g[0] - s[0])/2)
 
 
# Driver code
if __name__ == '__main__':
    A = [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]
    N = len(A)
     
    # Function call
    streamMed(A, N)


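Output (as printed by the C++ version; the Java version prints every value with a decimal point, e.g. 5.0, and the Python version prints the averaged values as floats, e.g. 10.0)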

5
10
5
4
3
4
5
6
7
6.5
7
6.5

Time Complexity: Leaving aside how the stream is read, finding the running median of N elements takes O(N log N) time overall, since each element costs O(log N) for the heap insertions and deletions.

Auxiliary Space: O(N)
At first glance the above code may look complex, but if you read it carefully, it is a simple algorithm.
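To make the steps easier to follow, here is a sketch of the same two-heap scheme with the heaps named explicitly and the update separated from the query. The class name RunningMedian and its add/median methods are assumptions made for illustration; the logic mirrors the listings above, with the max-heap simulated by negating values in Python's heapq.

```python
import heapq


class RunningMedian:
    """Tracks the median of a stream with a max-heap and a min-heap."""

    def __init__(self):
        self._low = []    # max-heap of the smaller half (values stored negated)
        self._high = []   # min-heap of the larger half

    def add(self, x):
        # Push onto the lower half, then move its largest value to the upper half
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so the lower half is never smaller than the upper half
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no elements read yet")
        if len(self._low) > len(self._high):
            # odd count: the extra element sits on top of the lower half
            return -self._low[0]
        # even count: average the two heap roots
        return (-self._low[0] + self._high[0]) / 2


if __name__ == "__main__":
    rm = RunningMedian()
    for value in [5, 15, 1, 3, 2, 8, 7, 9, 10, 6, 11, 4]:
        rm.add(value)
        print(rm.median())
```

Routing every new value through the lower heap first guarantees that everything in the lower half is no larger than everything in the upper half, so only the two roots are ever needed to answer a query.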

Median of Stream of Running Integers using STL

