Working with CSV files in Python

Python is a popular programming language used in many fields, such as web development, machine learning, data science, and computer vision. In Python, we can easily write programs that print data to the console. But printing to the console is of little use when we need that data for further processing.

We can also write the data to a plain text file in raw form, but unstructured text is harder to process later. This is where the CSV format comes in: it makes data easy to import, export, and analyze. Let us see more about this file format and how it is useful to us as Python programmers.

Introduction to CSV file

CSV (comma-separated values) files are widely used in programming, mostly for storing tabular data. Using CSV is similar to storing data in a plain text file, except that each row goes on its own line and the columns within a row are separated by commas. Storing a program’s data in a CSV file instead of only displaying it in the console is good practice, because the data remains available for later use. As CSV files are plain text files, they can be read and written from virtually any programming language.
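To make the format concrete, here is a minimal sketch of what CSV data looks like as plain text. The sample records are made up for illustration. For simple data like this, splitting each line on the comma is enough to recover the columns.

```python
# A small, made-up CSV document: one header line, then one record per line.
csv_text = """name,age,occupation
david,26,programmer
ayush,35,web developer
"""

# Each line is one row; commas separate the columns.
rows = [line.split(",") for line in csv_text.splitlines()]

header = rows[0]
records = rows[1:]

print(header)   # the column names
print(records)  # every value is still a string, including the ages
```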

CSV files have the extension .csv and are recognized by many applications. We can view the data inside a CSV file using Excel, LibreOffice, other spreadsheet tools, or a simple text editor like Notepad or Vim. They are also well suited to handling large amounts of data and to importing and exporting data between programs, and they are one of the most used file formats for data science, machine learning, and data mining tasks.

Although we can build our own parser for CSV data, several Python libraries make the task much easier. Two popular choices for working with CSV files in Python are the csv library and the pandas library.
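As a rough illustration of why a hand-written parser is rarely worth the effort, the sketch below compares a naive split on commas with the standard csv module on a line that contains a comma inside a quoted field. The sample line is invented for this example.

```python
import csv
import io

# A made-up record whose second field contains a comma inside quotes.
line = 'david,"programmer, part-time",26\n'

# Naive approach: split on every comma, which breaks the quoted field apart.
naive = line.strip().split(",")

# csv.reader understands the quoting rules and keeps the field together.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # four pieces, with stray quote characters
print(parsed)  # three fields
```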

Prerequisites

To follow this tutorial, you need Python installed on your system. If you don’t have it installed, you can follow our guide on installing Python. In this tutorial, we will use the csv and pandas libraries for working with CSV files. The csv library is part of the Python standard library, so it is available as soon as Python is installed, but pandas must be installed separately. You can install it with the pip package manager by running the following command in a terminal.

pip install -U pandas

In this tutorial, I am using a popular CSV dataset available on GitHub, which contains some data about houses. You can download and use this dataset, or you can use any other CSV file.

Parsing CSV Files Using Built-in csv library

Python comes with a built-in library named csv for working with CSV files. This module is part of the Python standard library, so we don’t need any manual installation to use it. We can use it to perform both reading and writing operations on a CSV file.

Reading a CSV file using csv library

To read a CSV file, we use the reader() function of the csv module. It accepts the file object of the CSV file as an argument and returns a reader object that yields each row of the file as a list of strings. The following code block shows a practical illustration of reading a csv file using csv.reader().

# importing the csv module
import csv
# opening the housing.csv file
with open("housing.csv", "r", newline="") as fileobj:
    t_row = 0
    t_column = 0
    # creating a csv reader object
    csv_reader = csv.reader(fileobj)
    # displaying the rows and counting total rows and columns
    for row in csv_reader:
        t_row = t_row + 1
        print(row)
        t_column = len(row)
print("\n\n")
print(" [+] Total Rows", t_row)
print(" [+] Total Columns", t_column)
print("\n")

In the above code, we import the csv module using Python’s import statement. Then we use the open() function to open the file housing.csv in reading mode. Then we create two variables, t_row and t_column, that will store the total number of rows and columns of the csv data.

Next, we use the csv.reader() function, which accepts the csv file object as an argument. It returns a reader object that we can iterate over, producing one list per row. We use a Python for loop to go through each row of the csv_reader object, print it, and update the counters; after the loop, t_row holds the number of rows (including the header) and t_column holds the number of fields in the last row.

Output:

reading a csv file using the python csv module
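When the first row of a file holds column names, it is often handy to separate it from the data rows. The sketch below uses a small in-memory sample (made-up values standing in for a real file such as housing.csv) and shows two options: calling next() on the reader to pull off the header, and using csv.DictReader, which maps each row to the column names.

```python
import csv
import io

# Made-up sample standing in for a real file such as housing.csv.
sample = "name,age,occupation\ndavid,26,programmer\nayush,35,web developer\n"

# Option 1: read the header separately with next(), then collect the data rows.
reader = csv.reader(io.StringIO(sample))
header = next(reader)
data_rows = list(reader)

# Option 2: csv.DictReader uses the first row as keys for every following row.
dict_rows = list(csv.DictReader(io.StringIO(sample)))

print(header)
print(len(data_rows), "data rows")
print(dict_rows[0]["occupation"])
```

With a real file, io.StringIO(sample) would be replaced by a file object opened with open().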

Writing a CSV file using csv library

The csv module has a writer() function, which creates a writer object for a given file object. The writer object provides two methods for inserting data into a CSV file, discussed below.

  • writerow(): The writerow() method is used to write a single row into a CSV file at a time. We can use it to write the header of a CSV file or any single row.
  • writerows(): This method is used to write multiple rows at a time into a csv file.

The following code block shows a practical illustration of the writer methods.

# importing the required modules
import csv
# creating a list for the header of CSV file
header = ['name','age','occupation']
# creating a list for single row
single_row = ['david', 26, 'programmer']
# creating a nested list for inserting multiple rows
rows = [['ayush', 35,'web developer'],['ankit', 32, 'data scientist'], ['Suresh', 10 , 'student']]
# the output file path
filename = "file.csv"
with open(filename, "w", newline="") as fileobj:
    # creating a writer object
    writer = csv.writer(fileobj)
    # writing a single row into the CSV file
    writer.writerow(header)
    writer.writerow(single_row)
    # writing multiple rows into the CSV file
    writer.writerows(rows)
    print(" [+] The data has been inserted Successfully ")

In the above code block, we create three Python lists: the first is the header of the csv file, the second is a single list to be inserted as one row, and the third is a nested list used to insert multiple rows at a time into the CSV file.

Next, we create a new csv file by using Python’s open() function and pass the file object as the argument to csv.writer(). Then we use the writerow() and writerows() methods of the writer object to write the lists into the CSV file. After running the code, you can see that a new csv file named file.csv has been created in the working directory, and it contains the data shown in the below image.

writing csv data using the csv library
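A side benefit of csv.writer is that it quotes fields for us when they contain the delimiter. The sketch below writes to an in-memory buffer instead of a file so the result can be inspected directly; the rows are made-up examples.

```python
import csv
import io

# Write to an in-memory text buffer so we can look at the exact output.
buffer = io.StringIO()
writer = csv.writer(buffer)

writer.writerow(["name", "age", "occupation"])
# The first occupation contains a comma, so the writer wraps it in quotes.
writer.writerows([
    ["ayush", 35, "web developer, freelance"],
    ["ankit", 32, "data scientist"],
])

output = buffer.getvalue()
print(output)
```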

We can also write dictionaries to a CSV file using the csv module. For this we use the DictWriter class, which maps Python dictionaries onto rows of a CSV file. A DictWriter object provides three methods for writing data, discussed below.

  • writerow(): This method is used to write a single row at a time. The argument must be a python dictionary.
  • writerows(): This method can write multiple rows at a time. It accepts a list of Python dictionaries as an argument.
  • writeheader(): This method is used to write the header of the CSV file.

The following code block shows a practical illustration of the dictionary writer methods:

# importing the required modules
import csv
# the header of the CSV file
header = ['name','age','profession']
# creating a dictionary for single row
single_row = {'name':'david', 'age': 26, 'profession' :'programmer'}
# creating a list containing multiple dictionaries
rows = [{'name':'ayush', 'age': 35, 'profession': 'web developer'},
        {'name':'ankit', 'age': 32, 'profession': 'data scientist'}, 
        {'name':'Suresh', 'age': 10 , 'profession': 'student'}]
# The output file path
filename = "file2.csv"
with open(filename, "w", newline="") as fileobj:
    # creating a DictWriter object
    writer = csv.DictWriter(fileobj, fieldnames = header)
    # writing The header of the CSV file
    writer.writeheader()
    # writing multiple rows
    writer.writerows(rows)
    # writing single row
    writer.writerow(single_row)
    print(" [+] The data has been inserted Successfully ")

In the above code, we create three variables: the first is a Python list that contains the field names for the csv file, the second is a Python dictionary that contains the data for a single row, and the third is a Python list that contains multiple dictionaries. Then we use the csv.DictWriter class. It takes two arguments.

The first argument is the file object of the CSV file, and the second argument is the field names. Then we use the writeheader() method, which will write the field names to the CSV file. Then we use the writerows() method and write a list of multiple dictionaries into multiple rows. Finally, we use the writerow() method to write a single row into the CSV file.

On running the above code, we will get a CSV file created in the working directory with the data shown in the below image.

writing csv data using the csv DictWriter class
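One detail worth knowing is what happens when a dictionary lacks one of the field names. The sketch below, using made-up rows and an in-memory buffer, passes the restval parameter of csv.DictWriter to supply a placeholder for missing keys, then reads the text back with csv.DictReader to check the round trip. The placeholder value "unknown" is an arbitrary choice for this example.

```python
import csv
import io

header = ["name", "age", "profession"]
rows = [
    {"name": "ayush", "age": 35, "profession": "web developer"},
    {"name": "suresh", "age": 10},  # 'profession' is missing on purpose
]

buffer = io.StringIO()
# restval supplies the value written for any key missing from a row.
writer = csv.DictWriter(buffer, fieldnames=header, restval="unknown")
writer.writeheader()
writer.writerows(rows)

# Rewind and read the same text back as dictionaries.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))
print(read_back)
```

Note that the values come back as strings: the csv module does not convert 35 back into an integer.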

Parsing CSV Files Using Pandas library

We have seen how to read and write CSV files using the csv library, but we can also perform those tasks using the pandas library. Pandas is one of the most powerful and popular libraries for data analysis in Python. It provides a data structure called the DataFrame, which makes it easy to work with tabular data.

Pandas can work with many different file formats, such as JSON and CSV. It reads a file and converts its contents into a DataFrame so that we can easily work with them. Pandas is well suited to working with large datasets, which is why it is widely used for data analysis when building machine learning models.

This library provides two tools for working with a CSV file. We can read data from a CSV file using the read_csv() function, and we can write data to a CSV file using the to_csv() method of a DataFrame. Let us see more about these two.

Reading a CSV file using Pandas library

To read a CSV file using pandas, we need to use the read_csv() function of pandas. The following code shows a simple illustration.

# importing the pandas library
import pandas as pd
# reading the csv file and converting it into a pandas dataframe
df = pd.read_csv("housing.csv")
print(df)

In the above code, we first import the pandas library using the import statement. Then we use the read_csv() function of the pandas library, which accepts the CSV file path as an argument and converts the CSV data into a pandas DataFrame.

Output:

reading a csv file and transforming it into a pandas dataframe

After converting the CSV data into a pandas DataFrame, we can perform many useful operations on the data using the pandas library.
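As a minimal sketch of such operations, the example below loads a small in-memory sample and inspects it. The column names and values are assumptions made up for illustration and are not the columns of housing.csv.

```python
import io

import pandas as pd

# Made-up sample; the column names are assumptions, not those of housing.csv.
sample = io.StringIO(
    "city,bedrooms,price\n"
    "Pune,2,50000\n"
    "Delhi,3,90000\n"
    "Mumbai,1,70000\n"
)
df = pd.read_csv(sample)

print(df.shape)            # (number of rows, number of columns)
print(df.head(2))          # first two rows
print(df["price"].mean())  # average of a numeric column

# Keep only the rows that satisfy a condition.
large = df[df["bedrooms"] >= 2]
print(large)
```

Unlike the csv module, read_csv() infers column types, so bedrooms and price are loaded as numbers rather than strings.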

Writing a CSV file using Pandas library

To write a DataFrame to a CSV file, we use the to_csv() method of the DataFrame object. The following code shows a practical illustration of writing a pandas DataFrame into a CSV file.

# importing the pandas library
import pandas as pd
# reading the csv file and converting it into a pandas dataframe
df = pd.read_csv("housing.csv")
# writing the dataframe into a CSV file
df.to_csv("new.csv")

In the above code block, we first use the read_csv() function to read the CSV file data. Then we use the to_csv() method of the DataFrame object to write the DataFrame into another CSV file. On running the above code, a new file named new.csv is created in the current directory, containing the data shown in the below image.

writing a pandas dataframe into csv file
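By default, to_csv() also writes the DataFrame's row index as an extra, unnamed first column. If that column is not wanted, passing index=False leaves it out. The sketch below compares both outputs using a small made-up DataFrame; it relies on to_csv() returning the CSV text as a string when no path is given.

```python
import pandas as pd

# A small, made-up DataFrame for illustration.
df = pd.DataFrame({"name": ["david", "ayush"], "age": [26, 35]})

# No path given, so to_csv() returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # header starts with an empty column name for the index
print(without_index)  # only the 'name' and 'age' columns
```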

Conclusion

That brings us to the end of this tutorial. We have learned how to read and write CSV files using both the csv and pandas libraries. You may also like to see our guide on working with JSON files using Python.
