How to Perform a Two-Way ANOVA in Python

  • Last Updated : 28 Feb, 2022

Two-Way ANOVA: In statistics, ANOVA stands for Analysis of Variance. A two-way ANOVA is used to check whether there is a statistically significant difference between the mean values of three or more groups that have been split on two factors. The main objective of a two-way ANOVA is to find out how each of the two factors affects a response variable, and whether the two factors interact in their effect on the response variable.
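
For reference, the model behind a two-way ANOVA with an interaction term is commonly written as below. The symbols are standard textbook notation and are an assumption here, since the article itself does not define any.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
```

Here y_{ijk} is the response of the k-th observation at level i of the first factor and level j of the second factor, \mu is the overall mean, \alpha_i and \beta_j are the main effects of the two factors, (\alpha\beta)_{ij} is their interaction, and \varepsilon_{ijk} is the random error. The test asks three questions: whether all \alpha_i are zero, whether all \beta_j are zero, and whether all (\alpha\beta)_{ij} are zero.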

Command to install the NumPy, pandas, and statsmodels libraries:

pip3 install numpy pandas statsmodels

Performing a Two-Way ANOVA in Python:

Let us consider an example in which scientists want to know whether plant growth is affected by fertilizer frequency and watering frequency. They planted 30 plants and let them grow for six months under different fertilizer and watering conditions. After six months, they recorded the height of each plant in centimeters. Performing a two-way ANOVA in Python is a step-by-step process, and the steps are discussed below:

Step 1: Import libraries.

The very first step is to import the libraries installed above. 

Python3




# Importing libraries
import numpy as np
import pandas as pd


Step 2: Enter the data.

Let us create a pandas DataFrame that consists of the following three variables:

  • Fertilizer: how frequently each plant was fertilized (daily or weekly).
  • Watering: how frequently each plant was watered (daily or weekly).
  • height: the height of each plant (in centimeters) after six months.

Example:

Python3




# Importing libraries
import numpy as np
import pandas as pd

# Create a dataframe with the two factors and the response
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11, 14,
                                     15, 16, 16, 17, 18, 14, 13, 14, 14,
                                     14, 15, 16, 16, 17, 18, 14, 13, 14,
                                     14, 14, 15]})
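
A quick sanity check before fitting is to count how many plants fall into each combination of the two factors. The sketch below uses pd.crosstab on the factor columns exactly as they are built in the example; the names factors, cell_counts, and empty_cells are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# Same factor columns as in the example above
factors = pd.DataFrame({
    'Fertilizer': np.repeat(['daily', 'weekly'], 15),
    'Watering': np.repeat(['daily', 'weekly'], 15),
})

# Count the plants in each Fertilizer x Watering cell
cell_counts = pd.crosstab(factors['Fertilizer'], factors['Watering'])
print(cell_counts)

# List the factor combinations that have no observations
empty_cells = [(f, w)
               for f in cell_counts.index
               for w in cell_counts.columns
               if cell_counts.loc[f, w] == 0]
print(empty_cells)
```

Because both columns come from the same np.repeat call, only the daily/daily and weekly/weekly cells are populated. In this toy data the two factors always change together, so their separate effects and their interaction cannot be told apart. A designed experiment would normally place plants in all four cells.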


Step 3: Conduct the two-way ANOVA.

To perform the two-way ANOVA, the statsmodels library provides the anova_lm() function. The syntax of the function is given below.

Syntax:

sm.stats.anova_lm(model, typ=2)

Parameters:

  • model: a fitted linear model, for example the result of ols(...).fit()
  • typ: the type of ANOVA test to perform, given as 1, 2, or 3 (or "I", "II", or "III")

Python3




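# Importing libraries
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Fit a linear model with both main effects and their interaction,
# using the dataframe created in Step 2
model = ols('height ~ C(Fertilizer) + C(Watering) + '
            'C(Fertilizer):C(Watering)', data=dataframe).fit()

# Build the two-way ANOVA table (the keyword is typ, not type)
sm.stats.anova_lm(model, typ=2)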
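
To make the contents of an ANOVA table more concrete, here is a minimal sketch that computes the sums of squares and F statistics by hand for a small balanced 2 x 2 design, using only the standard library. The data are made up for illustration and are not the plant data from this article. All variable names are hypothetical. For balanced data like this, Type 1, 2, and 3 sums of squares coincide.

```python
from statistics import mean

# Made-up balanced data: 2 plants per (fertilizer, watering) cell
cells = {
    ('daily', 'daily'): [10, 12],
    ('daily', 'weekly'): [14, 16],
    ('weekly', 'daily'): [11, 13],
    ('weekly', 'weekly'): [19, 21],
}
levels_a = sorted({a for a, _ in cells})  # fertilizer levels
levels_b = sorted({b for _, b in cells})  # watering levels
n = 2                                     # replicates per cell

# Grand mean, marginal means for each factor, and cell means
grand = mean(v for vals in cells.values() for v in vals)
mean_a = {a: mean(v for (x, _), vals in cells.items() if x == a for v in vals)
          for a in levels_a}
mean_b = {b: mean(v for (_, y), vals in cells.items() if y == b for v in vals)
          for b in levels_b}
mean_ab = {key: mean(vals) for key, vals in cells.items()}

# Sums of squares for main effects, interaction, and residual error
ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
ss_ab = n * sum((mean_ab[a, b] - mean_a[a] - mean_b[b] + grand) ** 2
                for a in levels_a for b in levels_b)
ss_err = sum((v - mean_ab[key]) ** 2
             for key, vals in cells.items() for v in vals)

# Degrees of freedom
df_a = len(levels_a) - 1
df_b = len(levels_b) - 1
df_ab = df_a * df_b
df_err = len(cells) * (n - 1)

# F statistic = mean square of the effect / mean square of the error
ms_err = ss_err / df_err
f_a = (ss_a / df_a) / ms_err
f_b = (ss_b / df_b) / ms_err
f_ab = (ss_ab / df_ab) / ms_err

print('SS:', ss_a, ss_b, ss_ab, ss_err)
print('F :', f_a, f_b, f_ab)
```

Each F statistic is then compared with an F distribution on the matching degrees of freedom to obtain the p-value that anova_lm() reports in its last column.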


Step 4: Combining all the steps.

Example:

Python3




# Importing libraries
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Create a dataframe
dataframe = pd.DataFrame({'Fertilizer': np.repeat(['daily', 'weekly'], 15),
                          'Watering': np.repeat(['daily', 'weekly'], 15),
                          'height': [14, 16, 15, 15, 16, 13, 12, 11,
                                     14, 15, 16, 16, 17, 18, 14, 13,
                                     14, 14, 14, 15, 16, 16, 17, 18,
                                     14, 13, 14, 14, 14, 15]})

# Performing two-way ANOVA (the keyword is typ, not type)
model = ols('height ~ C(Fertilizer) + C(Watering) + '
            'C(Fertilizer):C(Watering)',
            data=dataframe).fit()
result = sm.stats.anova_lm(model, typ=2)

# Print the result
print(result)


Output:

[Image: ANOVA table printed by the code above]

Interpreting the result:

Following are the p-values for each of the factors in the output:

  • The Fertilizer p-value is 0.913305.
  • The Watering p-value is 0.990865.
  • The Fertilizer:Watering interaction p-value is 0.904053.

The p-values for fertilizer frequency (0.913305) and watering frequency (0.990865) are both greater than 0.05, which implies that neither factor has a statistically significant effect on plant height. The p-value for the interaction effect (0.904053) is also greater than 0.05, which indicates that there is no significant interaction effect between fertilizer frequency and watering frequency.
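
The decision rule can be written as a short check. This sketch simply compares the three reported p-values with a significance level of 0.05; the dictionary keys and variable names are illustrative, and the 0.05 threshold is the conventional choice used in this article.

```python
# p-values as reported in the ANOVA output above
p_values = {
    'C(Fertilizer)': 0.913305,
    'C(Watering)': 0.990865,
    'C(Fertilizer):C(Watering)': 0.904053,
}
alpha = 0.05  # conventional significance level

# Report whether each term is significant at the chosen level
for term, p in p_values.items():
    verdict = 'significant' if p < alpha else 'not significant'
    print(f'{term}: p = {p:.6f} -> {verdict}')

# Collect the terms whose p-value falls below alpha
significant_terms = [term for term, p in p_values.items() if p < alpha]
print(significant_terms)
```

All three p-values are well above 0.05, so the list of significant terms is empty.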

