
# Kolmogorov-Smirnov Test (KS Test)

• Last Updated : 10 Mar, 2023

In this article, we will look at a non-parametric test that can be used to determine whether two distributions have the same shape.

## What is Kolmogorov-Smirnov Test?

The Kolmogorov–Smirnov test is a way to determine whether two samples differ significantly from each other. It is often used to check the uniformity of random numbers: uniformity is one of the most important properties of any random number generator, and the Kolmogorov–Smirnov test can be used to test it. The test may also be used to check whether two underlying one-dimensional probability distributions differ. The Kolmogorov–Smirnov statistic quantifies the distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples.
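Written out, the two distances described above take the following form. The symbols are the conventional ones and are an assumption here, since the article does not define its own notation: $X_1, \dots, X_n$ is the sample, $F_n$ its empirical distribution function, $F$ the reference cumulative distribution function, and $F_{1,n}$, $F_{2,m}$ the empirical distribution functions of two samples of sizes $n$ and $m$.

$$
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \qquad
D_n = \sup_x \left| F_n(x) - F(x) \right|, \qquad
D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|
$$

Here $D_n$ is the one-sample statistic and $D_{n,m}$ the two-sample statistic; both lie between 0 and 1.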

## How does the Kolmogorov-Smirnov test work?

The main idea behind the test is to check whether the two samples we are dealing with follow the same distribution, that is, whether the shapes of their distributions are the same.

If the two samples come from the same probability distribution, their empirical cumulative distribution functions should lie close together, so the maximum absolute difference between them will be small. The larger this maximum difference, the stronger the evidence that the two distributions differ in shape.
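As a minimal sketch of this idea, the snippet below computes the maximum absolute gap between two empirical CDFs by hand. The helper name `ks_two_sample_statistic`, the sample sizes, and the seed are illustrative choices, not part of the original article.

```python
import numpy as np

def ks_two_sample_statistic(a, b):
    """Largest absolute gap between the empirical CDFs of a and b."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # Both ECDFs are step functions that only jump at observed values,
    # so it is enough to compare them at every pooled observation.
    grid = np.concatenate([a, b])
    ecdf_a = np.searchsorted(a, grid, side="right") / a.size
    ecdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(ecdf_a - ecdf_b)))

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
same_1 = rng.normal(loc=0.0, scale=1.0, size=500)
same_2 = rng.normal(loc=0.0, scale=1.0, size=500)
shifted = rng.normal(loc=1.0, scale=1.0, size=500)

d_same = ks_two_sample_statistic(same_1, same_2)
d_diff = ks_two_sample_statistic(same_1, shifted)
print(f"same distribution:    D = {d_same:.3f}")
print(f"shifted distribution: D = {d_diff:.3f}")
```

The second value should come out clearly larger than the first, because shifting the mean moves one empirical CDF away from the other.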

To check the shape of a sample's distribution we generally use hypothesis tests, which are of two types:

• Parametric tests
• Non-parametric tests

### Null Hypothesis of Kolmogorov-Smirnov Test

H0 (Null Hypothesis): the two samples at hand are drawn from the same distribution.

Because the KS test is a non-parametric method, the samples are not required to come from a normal distribution or from any other particular parametric family.
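A short sketch of testing this null hypothesis with SciPy's `scipy.stats.ks_2samp` is shown below. The two distributions, the sample sizes, and the significance level `alpha = 0.05` are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
sample_a = rng.normal(loc=0.0, scale=1.0, size=300)
sample_b = rng.exponential(scale=1.0, size=300)  # deliberately not normal

alpha = 0.05  # illustrative significance level
result = ks_2samp(sample_a, sample_b)
print(f"D = {result.statistic:.3f}, p-value = {result.pvalue:.3g}")

if result.pvalue < alpha:
    print("Reject H0: the samples do not appear to share a distribution")
else:
    print("Fail to reject H0")
```

Neither sample has to be normal for the test to apply; only the empirical CDFs are compared.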

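For the uniformity check mentioned earlier, where Ri denotes the i-th smallest of N random numbers in [0, 1], the test proceeds as follows:

```
-> Sort the N random numbers in ascending order: R1 <= R2 <= ... <= RN
-> Calculate D+ as max(i/N - Ri) for all i in 1..N
-> Calculate D- as max(Ri - (i-1)/N) for all i in 1..N
-> Calculate D as max(sqrt(N) * D+, sqrt(N) * D-)
-> If D > D(alpha)
       Reject uniformity
   else
       Fail to reject the null hypothesis
```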

Below is a Python implementation of the above algorithm. Because the sample here is drawn from a normal distribution rather than a uniform one, each sorted value is first passed through the reference CDF F, so that Ri = F(x(i)):

```python
import numpy as np
from scipy.stats import norm

# Draw the N random numbers
N = 30
# F(x) can be any continuous distribution;
# here the reference is the standard normal distribution
f_x = np.random.normal(size=N)
f_x_sorted = np.sort(f_x)

# Ri: reference CDF evaluated at the sorted sample
R = norm.cdf(f_x_sorted)
i = np.arange(1, N + 1)

# Calculate sqrt(N) * max(i/N - Ri)
K_plus_max = np.sqrt(N) * np.max(i / N - R)

# Calculate sqrt(N) * max(Ri - (i-1)/N)
K_minus_max = np.sqrt(N) * np.max(R - (i - 1) / N)

# Calculate the KS statistic
K_max = max(K_plus_max, K_minus_max)
print(K_max)
```

Output:

The printed value differs from run to run because no random seed is set. When the sample really comes from the reference distribution, the value falls below about 1.36 in roughly 95% of runs.
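The final comparison against D(alpha) can be sketched as follows for the uniformity case. This assumes the critical value is taken from `scipy.stats.kstwobign`, SciPy's limiting distribution of sqrt(N) * D, which is only an approximation for finite N. The seed, `N`, and `alpha` are illustrative.

```python
import numpy as np
from scipy.stats import kstest, kstwobign

rng = np.random.default_rng(7)  # fixed seed so the run is repeatable
N = 200
u = np.sort(rng.random(N))  # numbers to be tested for uniformity on [0, 1)

i = np.arange(1, N + 1)
d_plus = np.max(i / N - u)
d_minus = np.max(u - (i - 1) / N)
k_stat = np.sqrt(N) * max(d_plus, d_minus)

alpha = 0.05  # illustrative significance level
k_crit = kstwobign.ppf(1 - alpha)  # asymptotic critical value, about 1.358
print(f"K = {k_stat:.3f}, critical value = {k_crit:.3f}")
if k_stat > k_crit:
    print("Rejects uniformity")
else:
    print("Fails to reject the null hypothesis")

# Cross-check the unscaled statistic against SciPy's one-sample test
check = kstest(u, "uniform")
print(f"kstest: D = {check.statistic:.4f}, p-value = {check.pvalue:.3f}")
```

The unscaled statistic reported by `kstest` should match `max(d_plus, d_minus)` computed by hand.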

### When is the Kolmogorov-Smirnov test used?

Many statistical methods require the sample to follow a normal distribution, so we often need to check whether a particular sample is normally distributed before applying them. The one-sample Kolmogorov-Smirnov test can be used for this.
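A minimal sketch of the one-sample test with `scipy.stats.kstest` follows. It assumes the mean and standard deviation of the reference normal distribution are known in advance (the values `mu = 10.0` and `sigma = 2.0` are made up for the example) rather than estimated from the data.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0  # assumed known beforehand, not fitted to the sample
samples = {
    "normal": rng.normal(loc=mu, scale=sigma, size=400),
    "exponential": rng.exponential(scale=mu, size=400),
}

results = {}
for name, sample in samples.items():
    # Compare the sample's empirical CDF with the N(mu, sigma) CDF
    results[name] = kstest(sample, "norm", args=(mu, sigma))
    res = results[name]
    print(f"{name:12s} D = {res.statistic:.3f}, p-value = {res.pvalue:.3g}")
```

A small p-value for a sample suggests it does not follow the specified normal distribution.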

The test can also be used to check whether two samples follow the same distribution.

### Limitations of the Kolmogorov-Smirnov Test

1. It only applies to continuous distributions.
2. It tends to be more sensitive near the center of the distribution than at the tails.
3. Perhaps the most serious limitation is that the distribution must be fully specified. That is, if location, scale, and shape parameters are estimated from the data, the critical region of the K-S test is no longer valid. It typically must be determined by simulation.
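The simulation mentioned in the last point can be sketched as below for the normal case. The sample size, number of simulations, and seed are illustrative assumptions. The "standard" critical value uses the asymptotic approximation from `scipy.stats.kstwobign` divided by sqrt(n).

```python
import numpy as np
from scipy.stats import kstest, kstwobign

def ks_stat_estimated_normal(sample):
    """KS distance to a normal whose mean and std are fitted to the sample."""
    mu_hat = sample.mean()
    sigma_hat = sample.std(ddof=1)
    return kstest(sample, "norm", args=(mu_hat, sigma_hat)).statistic

rng = np.random.default_rng(123)  # fixed seed so the run is repeatable
n, n_sim, alpha = 50, 2000, 0.05

# Null distribution of D when the parameters are re-estimated for every sample
simulated = np.array(
    [ks_stat_estimated_normal(rng.normal(size=n)) for _ in range(n_sim)]
)
crit_simulated = np.quantile(simulated, 1 - alpha)

# Approximate critical value that applies only to a fully specified distribution
crit_standard = kstwobign.ppf(1 - alpha) / np.sqrt(n)

print(f"simulated critical value (parameters estimated): {crit_simulated:.3f}")
print(f"standard critical value (parameters known):      {crit_standard:.3f}")
```

The simulated critical value comes out smaller than the standard one, so using the standard table with estimated parameters makes the test too conservative.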
