This tutorial explains how to read a CSV file in python using read_csv function of pandas package. Without use of read_csv function, it is not straightforward to import CSV file with python object-oriented programming. Pandas is an awesome powerful python package for data manipulation and supports various functions to load and import data from various formats. Here we are covering how to deal with common issues in importing CSV file. Show
Install and Load Pandas Package Make sure you have pandas package already installed on your system. If you set up python using Anaconda, it comes with pandas package so you don't need to install it again. Otherwise you can install it by using command import pandas as pd Create Sample Data for Import The program below creates a sample pandas dataframe which can be used further for demonstration. dt = {'ID': [11, 12, 13, 14, 15], 'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'], 'company': ['Aon', 'TCS', 'Google', 'RBS', '.'], 'salary': [74, 76, 96, 71, 78]} mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary']) The sample data looks like below - ID first_name company salary 0 11 David Aon 74 1 12 Jamie TCS 76 2 13 Steve Google 96 3 14 Stevart RBS 71 4 15 John . 78 Save data as CSV in the working directory Check working directory before you save your datafile. import os os.getcwd() Incase you want to change the working directory, you can specify it in under os.chdir("C:\\Users\\DELL\\Documents\\") The following command tells python to write data in CSV format in your working directory. mydt.to_csv('workingfile.csv', index=False) Example 1 : Read CSV file with header rowIt's the basic syntax of read_csv() function. You just need to mention the filename. It assumes you have column names in first row of your CSV file. mydata = pd.read_csv("workingfile.csv") It stores the data the way It should be as we have headers in the first row of our datafile. It is important to highlight that Inspect data after importing mydata.shape mydata.columns mydata.dtypes It returns 5 number of rows and 4 number of columns. Column Names are See the column types of data we imported. first_name and company are character variables. Remaining variables are numeric ones. ID int64 first_name object company object salary int64 Example 2 : Read CSV file with header in second rowSuppose you have column or variable names in second row. To read this kind of CSV file, you can submit the following command. mydata = pd.read_csv("workingfile.csv", header = 1)
11 David Aon 74 0 12 Jamie TCS 76 1 13 Steve Google 96 2 14 Stevart RBS 71 3 15 John . 78 Define your own column names instead of header row from CSV file mydata0 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income']) skiprows = 1 means we are ignoring first row and names= option is used to assign variable names manually. CustID Name Companies Income 0 11 David Aon 74 1 12 Jamie TCS 76 2 13 Steve Google 96 3 14 Stevart RBS 71 4 15 John . 78 Example 3 : Skip rows but keep headermydata = pd.read_csv("workingfile.csv", skiprows=[1,2]) In this case, we are skipping second and third rows while importing. Don't forget index starts from 0 in python so 0 refers to first row and 1 refers to second row and 2 implies third row. ID first_name company salary 0 13 Steve Google 96 1 14 Stevart RBS 71 2 15 John . 78 Instead of [1,2]
you can also write Hidden secret of skiprows option When skiprows = 4, it means skipping four rows from top. skiprows=[1,2,3,4] means skipping rows from second through fifth. It is because when list is specified in skiprows= option, it skips rows at index positions. When a single integer value is specified in the option, it considers skip those rows from top Example 4 : Read CSV file without header rowIf you specify "header = None", python would assign a series of numbers starting from 0 to (number of columns - 1) as column names. In this datafile, we have column names in first row. mydata0 = pd.read_csv("workingfile.csv", header = None) See the output shown below-
Add prefix to column names mydata0 = pd.read_csv("workingfile.csv", header = None, prefix="var") In this case, we are setting var0 var1 var2 var3 0 ID first_name company salary 1 11 David Aon 74 2 12 Jamie TCS 76 3 13 Steve Google 96 4 14 Stevart RBS 71 5 15 John . 78 Example 5 : Specify missing values The mydata00 = pd.read_csv("workingfile.csv", na_values=['.']) ID first_name company salary 0 11 David Aon 74 1 12 Jamie TCS 76 2 13 Steve Google 96 3 14 Stevart RBS 71 4 15 John NaN 78 Example 6 : Set Index Columnmydata01 = pd.read_csv("workingfile.csv", index_col ='ID') first_name company salary ID 11 David Aon 74 12 Jamie TCS 76 13 Steve Google 96 14 Stevart RBS 71 15 John . 78 As you can see in the above output, the column ID has been set as index column. Example 7 : Read CSV File from External URLYou can directly read data from the CSV file that is stored on a web link. It is very handy when you need to load publicly available datasets from github, kaggle and other websites. mydata02 = pd.read_csv("http://winterolympicsmedals.com/medals.csv") This DataFrame contains 2311 rows and 8 columns. Using Example 8 : Skip Last 5 Rows While Importing CSVmydata04 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", skip_footer=5) In the above code, we are excluding bottom 5 rows using skip_footer= parameter. Example 9 : Read only first 5 rowsmydata05 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", nrows=5) Using nrows= option, you can load top K number of rows. Example 10 : Interpreting "," as thousands separatormydata06 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", thousands=",") Example 11 : Read only specific columnsmydata07 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7]) The above code reads only columns based on index positions which are second, sixth and eighth position. Example 12 : Read some rows and columnsmydata08 = pd.read_csv("http://winterolympicsmedals.com/medals.csv", usecols=[1,5,7], nrows=5) In the above command, we have combined usecols= and nrows= options. It will select only first 5 rows and selected columns. Example 13 : Read file with semi colon delimitermydata09 = pd.read_csv("file_path", sep = ';') Using sep= parameter in read_csv( ) function, you can import file with any delimiter other than default comma. In this case, we are using semi-colon as a separator. Example 14 : Change column type while importing CSVSuppose you want to change column format from int64 to float64 while loading CSV file into Python. We can use dtype = option for the same. mydf = pd.read_csv("workingfile.csv", dtype = {"salary" : "float64"}) Example 15 : Measure time taken to import big CSV file With the use of mydf = pd.read_csv("workingfile.csv", verbose=True) Example 16 : How to read CSV file without using Pandas packageTo import CSV file with pure python way, you can submit the following command : import csv with open("C:/Users/DELL/Downloads/nycflights.csv") as f: d = DictReader(f) l=list(d) You can also download and load CSV file from URL or external webpage. import csv import requests response = requests.get('https://dyurovsky.github.io/psyc201/data/lab2/nycflights.csv').text lines = response.splitlines() d = csv.DictReader(lines) l = list(d) EndNote After completion of this tutorial, I hope you gained confidence in importing CSV file into Python with ways to clean and manage file. You can also check out this tutorial which explains how to import files of different format to Python. Once done, you should learn how to perform common data manipulation or wrangling tasks like filtering, selecting and renaming columns, identify and remove duplicates etc on pandas dataframe. What is a csv file in Python?A CSV file (Comma Separated Values file) is a type of plain text file that uses specific structuring to arrange tabular data. Because it's a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters. The structure of a CSV file is given away by its name.
Which of these classes can be used to load the contents of a CSV file into a Python dictionary?DictReader() Class. The objects of a csv. DictReader() class can be used to read a CSV file as a dictionary.
What is CSV Quote_all?csv. QUOTE_MINIMAL means add quote only when required, for example, when a field contains either the quotechar or the delimiter. This is the default. csv. QUOTE_ALL means quotes everything regardless of the field type.
|