Skip to content
Related Articles

Related Articles

Python | Pandas DataFrame.where()

View Discussion
Improve Article
Save Article
  • Difficulty Level : Medium
  • Last Updated : 17 Sep, 2018
View Discussion
Improve Article
Save Article

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas where() method is used to check a data frame for one or more condition and return the result accordingly. By default, The rows not satisfying the condition are filled with NaN value.

Syntax:
DataFrame.where(cond, other=nan, inplace=False, axis=None, level=None, errors=’raise’, try_cast=False, raise_on_error=None)
 

Parameters:

cond: One or more condition to check data frame for.
other: Replace rows which don’t satisfy the condition with user defined object, Default is NaN
inplace: Boolean value, Makes changes in data frame itself if True
axis: axis to check( row or columns)

For link to the CSV file used, Click here.

Example #1: Single Condition operation

In this example, rows having particular Team name will be shown and rest will be replaced by NaN using .where() method.




# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba.csv")
  
# sorting dataframe
data.sort_values("Team", inplace = True)
  
# making boolean series for a team name
filter = data["Team"]=="Atlanta Hawks"
  
# filtering data
data.where(filter, inplace = True)
  
# display
data


Output:

As shown in the output image, every row which doesn’t have Team = Atlanta Hawks is replaced with NaN.

 

Example #2: Multi-condition Operations

Data is filtered on the basis of both Team and Age. Only the rows having Team name “Atlanta Hawks” and players having age above 24 will be displayed.




# importing pandas package
import pandas as pd
  
# making data frame from csv file
data = pd.read_csv("nba.csv")
  
# sorting dataframe
data.sort_values("Team", inplace = True)
  
# making boolean series for a team name
filter1 = data["Team"]=="Atlanta Hawks"
  
# making boolean series for age
filter2 = data["Age"]>24
  
# filtering data on basis of both filters
data.where(filter1 & filter2, inplace = True)
  
# display
data


Output:
As shown in the output image, Only the rows having Team name “Atlanta Hawks” and players having age above 24 are displayed.


My Personal Notes arrow_drop_up
Recommended Articles
Page :

Start Your Coding Journey Now!