Probability and Statistics | Simpson’s Paradox (UC Berkeley’s Lawsuit)
Simpson’s Paradox, in layman’s terms, is a reversal of a relationship: a trend that holds within every subgroup of the data can disappear or even reverse once those subgroups are combined.
For example, suppose a university has two departments, and in each of them a woman is more likely to be accepted than a man. Intuitively, women should then also have the higher overall acceptance probability once the two departments’ data are combined, but this may not be true.
Given a1/b1 < c1/d1 and a2/b2 < c2/d2, does it follow that (a1+a2)/(b1+b2) < (c1+c2)/(d1+d2)?
Simpson’s Paradox says that it need not.
For example, 7/8 < 2/2 and 1/2 < 5/8, yet (7+1)/(8+2) = 8/10 > 7/10 = (2+5)/(2+8).
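The numeric example can be checked mechanically. The sketch below is a minimal illustration, not part of the original article: the function name `shows_simpson_reversal` is hypothetical, and it uses Python's standard `fractions.Fraction` so that every ratio is compared exactly rather than as a rounded float.

```python
from fractions import Fraction


def shows_simpson_reversal(a1, b1, c1, d1, a2, b2, c2, d2):
    """Return True if a1/b1 < c1/d1 and a2/b2 < c2/d2, yet the pooled
    ratios satisfy (a1+a2)/(b1+b2) > (c1+c2)/(d1+d2)."""
    # Fraction keeps every ratio exact, so rounding cannot flip a comparison.
    subgroup_1 = Fraction(a1, b1) < Fraction(c1, d1)
    subgroup_2 = Fraction(a2, b2) < Fraction(c2, d2)
    pooled_left = Fraction(a1 + a2, b1 + b2)
    pooled_right = Fraction(c1 + c2, d1 + d2)
    return subgroup_1 and subgroup_2 and pooled_left > pooled_right


# Subgroup ratios from the text: 7/8 < 2/2 and 1/2 < 5/8.
print(shows_simpson_reversal(7, 8, 2, 2, 1, 2, 5, 8))  # True
print(Fraction(7 + 1, 8 + 2), Fraction(2 + 5, 2 + 8))   # 4/5 7/10
```

Note that the pooled denominators are the sums of the subgroup denominators (8+2 and 2+8), not the sums of the numerators' partners in the other ratio.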
A similar case arose in the lawsuit against UC Berkeley: the university’s admissions data showed that men’s applications had a higher probability of being accepted than women’s. But when the individual departments were examined, the reverse picture emerged, as most of the departments admitted women at a higher rate than men.
Why did this happen?
This happened because more women applied to competitive departments with low admission rates, whereas more men applied to less competitive departments with high admission rates.
We can see from the table that 825 men applied to department A, which has a high acceptance rate, compared with only 108 women. Meanwhile, a larger share of women applied to departments with low acceptance rates, such as E and F. As a result, men ended up with a higher overall acceptance rate at the university than women.
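One way to see the mechanism is to write a group’s overall acceptance rate as a weighted average of its per-department rates. The notation here is introduced for illustration and is not from the original: n_{g,d} is the number of applications from group g (men or women) to department d, r_{g,d} is that group’s acceptance rate in department d, and w_{g,d} is the share of group g’s applications that went to department d.

```latex
R_g \;=\; \frac{\sum_d n_{g,d}\, r_{g,d}}{\sum_d n_{g,d}}
    \;=\; \sum_d w_{g,d}\, r_{g,d},
\qquad
w_{g,d} \;=\; \frac{n_{g,d}}{\sum_{d'} n_{g,d'}}
```

Even if r_{women,d} > r_{men,d} in most departments, R_women can still be smaller than R_men, because the two groups use different weights: women’s weights are concentrated on low-rate departments, while men’s weights are concentrated on high-rate departments such as A.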
Suppose we have four jars, as shown in the figure below, containing two kinds of beans: green and blue.
Probability of picking a green bean from each jar:
7/8 (Jar 1) < 2/2 (Jar 2)
1/2 (Jar 3) < 5/8 (Jar 4)
Probability of picking a green bean after mixing the jars:
8/10 (Jar 1 + Jar 3) > 7/10 (Jar 2 + Jar 4)
Here too, before mixing, Jars 1 and 3 each had a lower probability of yielding a green bean than Jars 2 and 4, respectively. After the contents of Jar 1 and Jar 3 are combined, and likewise those of Jar 2 and Jar 4, the relationship reverses: the Jar 1 + Jar 3 mixture yields a green bean with probability 8/10, which is higher than the 7/10 of the Jar 2 + Jar 4 mixture. This is a very simple example of Simpson’s Paradox.
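The jar experiment can also be simulated by actually drawing beans. This is a minimal Monte Carlo sketch under stated assumptions: the bean counts per jar (for example, 7 green out of 8 in Jar 1) are inferred from the fractions in the text, since the figure itself is not available, and the helper names `make_jar` and `estimate_green` are hypothetical. Draws are made with replacement using Python's standard `random.Random`.

```python
import random

# Assumed jar contents, inferred from the fractions in the text:
# (green beans, total beans). The remaining beans are blue.
JARS = {
    "Jar 1": (7, 8),
    "Jar 2": (2, 2),
    "Jar 3": (1, 2),
    "Jar 4": (5, 8),
}


def make_jar(green, total):
    """Build a list of beans: `green` green ones, the rest blue."""
    return ["green"] * green + ["blue"] * (total - green)


def estimate_green(beans, draws, rng):
    """Estimate P(green) by drawing one bean at a time, with replacement."""
    hits = sum(rng.choice(beans) == "green" for _ in range(draws))
    return hits / draws


rng = random.Random(0)  # fixed seed so the run is reproducible
beans = {name: make_jar(g, t) for name, (g, t) in JARS.items()}

# Mix Jar 1 with Jar 3, and Jar 2 with Jar 4.
mixed_13 = beans["Jar 1"] + beans["Jar 3"]
mixed_24 = beans["Jar 2"] + beans["Jar 4"]

estimates = {name: estimate_green(b, 100_000, rng) for name, b in beans.items()}
estimates["Jar 1 + Jar 3"] = estimate_green(mixed_13, 100_000, rng)
estimates["Jar 2 + Jar 4"] = estimate_green(mixed_24, 100_000, rng)

for name, p in estimates.items():
    print(f"{name}: {p:.3f}")
```

With this many draws, the printed estimates should land close to 7/8, 1, 1/2 and 5/8 for the individual jars, and close to 8/10 and 7/10 for the two mixtures, reproducing the reversal empirically.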