Python | Read csv using pandas.read_csv()

  • Difficulty Level : Easy
  • Last Updated : 22 Jun, 2022

Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Most data for analysis is available in a tabular format such as Excel or comma-separated values (CSV) files. To access data from a CSV file, we use the read_csv() function, which returns the data as a data frame. Before using this function, we must import the pandas library.
Importing Pandas library: 
 

import pandas as pd

  
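The read_csv() function retrieves data from a CSV file. Its full signature, as documented for older pandas releases, is shown here. Several of these parameters (for example squeeze, prefix, mangle_dupe_cols, error_bad_lines, warn_bad_lines and tupleize_cols) have since been deprecated or removed, so check the documentation for your installed version: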

pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, 
             usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, 
             dtype=None, engine=None, converters=None, true_values=None, false_values=None, 
             skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, 
             na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, 
             keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', 
             thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, 
             encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, 
             doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None) 
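
As a small illustration of two parameters from the signature, the sketch below reads semicolon-separated data that uses a comma as the decimal mark. The sample text and the column names item and price are made up for this example, and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical sample: semicolon-separated values with decimal commas
csv_text = "item;price\napple;1,5\npear;2,25\n"

# sep sets the field separator, decimal sets the decimal mark
prices = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

print(prices["price"].tolist())  # [1.5, 2.25]
```

Without decimal=",", the price column would be read as text rather than as numbers.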
             

Code #1: Retrieving data from a CSV file

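# Import pandas
import pandas as pd

# Read the CSV file into a data frame ("filename.csv" is a placeholder path)
df = pd.read_csv("filename.csv")

# Show the first five rows
print(df.head())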
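
Since filename.csv is only a placeholder, here is a self-contained sketch that can be run without any file on disk. It assumes a small made-up sample (the Name, Type and HP columns are hypothetical) and passes it to read_csv() through io.StringIO, which works because read_csv() accepts file-like objects as well as paths.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk
csv_text = (
    "Name,Type,HP\n"
    "Bulbasaur,Grass,45\n"
    "Charmander,Fire,39\n"
    "Squirtle,Water,44\n"
)

# read_csv returns a DataFrame; the first line becomes the column names
df = pd.read_csv(io.StringIO(csv_text))

print(type(df).__name__)  # DataFrame
print(df.shape)           # (3, 3)
print(df.head())
```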


The signature above lists the parameters with their default values. Not all of them are equally important, but knowing the common ones saves you from writing extra code to do the same work yourself. In a Jupyter notebook, you can see the parameters of any function by pressing Shift + Tab. The most useful ones are described below:
 
 

  • filepath_or_buffer: The location of the file to read. It accepts a string path, a URL, or a file-like object.
  • sep: Stands for separator; the default is ',' as in CSV (comma-separated values).
  • header: Accepts an int or a list of ints giving the row number(s) to use as the column names and the start of the data. With header=None, the file is treated as having no header row and the columns are labeled 0, 1, 2, and so on.
  • usecols: Retrieves only the selected columns from the CSV file.
  • nrows: The number of rows to read from the file.
  • index_col: The column(s) to use as the row labels. If None, the default index 0, 1, 2, … is used.
  • squeeze: If True and only one column is read, returns a pandas Series. This parameter is deprecated in recent pandas versions and removed in pandas 2.0.
  • skiprows: The line numbers (0-indexed) to skip, or the number of lines to skip at the start of the file.
  • names: A list of column names to use in place of, or in the absence of, a header row.
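
The interplay of header and names is easier to see on a small sample. The sketch below is only an illustration: the data and the column names are made up, and io.StringIO stands in for a real file.

```python
import io
import pandas as pd

# Hypothetical data without a header row
csv_no_header = "Bulbasaur,Grass,45\nCharmander,Fire,39\n"

# header=None: no row is used as the header, columns are labeled 0, 1, 2
numbered = pd.read_csv(io.StringIO(csv_no_header), header=None)
print(list(numbered.columns))  # [0, 1, 2]

# names supplies the column labels for a file that has no header row
named = pd.read_csv(
    io.StringIO(csv_no_header), header=None, names=["Name", "Type", "HP"]
)
print(list(named.columns))  # ['Name', 'Type', 'HP']

# Hypothetical data with a header row we want to replace
csv_with_header = "n,t,h\nBulbasaur,Grass,45\n"

# header=0 marks the first line as the header; names then overrides it
renamed = pd.read_csv(
    io.StringIO(csv_with_header), header=0, names=["Name", "Type", "HP"]
)
print(len(renamed))  # 1
```

Passing names without header=0 on a file that does have a header row would keep the old header line as a row of data, which is rarely what is wanted.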
     
Parameter            Use
filepath_or_buffer   URL or directory location of the file
sep                  Stands for separator; default is ',' as in CSV (comma-separated values)
index_col            Makes the passed column the index instead of 0, 1, 2, 3…
header               Makes the passed row(s) [int or list of int] the header
usecols              Uses only the passed columns [list of strings] to make the data frame
squeeze              If True and only one column is passed, returns a pandas Series
skiprows             Skips the passed rows in the new data frame
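
To see how usecols, nrows and skiprows change what is read, here is a minimal sketch on made-up data. The rows and column names are hypothetical and io.StringIO stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical sample data
csv_text = (
    "Name,Type,HP\n"
    "Bulbasaur,Grass,45\n"
    "Charmander,Fire,39\n"
    "Squirtle,Water,44\n"
    "Pikachu,Electric,35\n"
)

# usecols: keep only the listed columns
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "HP"])
print(list(subset.columns))  # ['Name', 'HP']

# nrows: read only the first two data rows
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(len(first_two))  # 2

# skiprows: skip file lines 1 and 2 (line 0 is the header)
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(skipped["Name"].tolist())  # ['Squirtle', 'Pikachu']
```

Note that skiprows counts lines of the file, not rows of data, so line 0 here is the header.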

The following examples use a data set saved as pokemon.csv.
Code #2: Using the parameters described above

# importing Pandas library
import pandas as pd

# reads the whole file with the default settings
pd.read_csv(filepath_or_buffer="pokemon.csv")

# makes the passed rows (0-indexed) a multi-level header
pd.read_csv("pokemon.csv", header=[1, 2])

# makes the passed column the index instead of 0, 1, 2, 3....
pd.read_csv("pokemon.csv", index_col="Type")

# uses only the passed columns for the data frame
pd.read_csv("pokemon.csv", usecols=["Type"])

# returns a pandas Series if there is only one column
# (squeeze=True was removed in pandas 2.0; DataFrame.squeeze replaces it)
pd.read_csv("pokemon.csv", usecols=["Type"]).squeeze("columns")

# skips the passed rows (0-indexed line numbers) in the new data frame
pd.read_csv("pokemon.csv", skiprows=[1, 2, 3, 4])
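
For readers without pokemon.csv at hand, the effect of index_col and of reducing a single column to a Series can be tried on a small stand-in. The sketch below assumes made-up rows with Name, Type and HP columns. It uses DataFrame.squeeze("columns") rather than the squeeze parameter, since that also works on pandas versions where the parameter is no longer available.

```python
import io
import pandas as pd

# Hypothetical stand-in for pokemon.csv
csv_text = (
    "Name,Type,HP\n"
    "Bulbasaur,Grass,45\n"
    "Charmander,Fire,39\n"
    "Squirtle,Water,44\n"
)

# Default: rows are labeled 0, 1, 2
default_index = pd.read_csv(io.StringIO(csv_text))
print(list(default_index.index))  # [0, 1, 2]

# index_col: use the Name column as the row labels instead
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Name")
print(by_name.loc["Charmander", "HP"])  # 39

# One selected column, squeezed from a DataFrame down to a Series
types = pd.read_csv(io.StringIO(csv_text), usecols=["Type"]).squeeze("columns")
print(type(types).__name__)  # Series
```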


