Creating a Pandas DataFrame

Pandas is one of the most powerful data science libraries in Python, and most data scientists who work in Python use it. Much of its popularity comes from its built-in data structure, the DataFrame, a modern and efficient way to store data in tabular form. A DataFrame also comes with many built-in methods that make our work as data scientists easier, including methods for sorting data, aggregating data, and many other operations. We can even plot a DataFrame with ease using its plot() method.

Creating Pandas DataFrames

In this tutorial, we are going to learn how to create DataFrames in pandas. It is recommended to use an IDE or code editor for writing code. Here, I will use VS Code, the open-source, cross-platform editor from Microsoft, but you can use any other environment, such as Jupyter Notebook or PyCharm.

Installation

To follow this tutorial, we need the pandas library. If you already have pandas installed, you can skip this section. To install it, we use pip, Python's package manager, which makes installing and removing packages easy. Run the following command in your terminal:

pip install pandas

Once the command finishes, pandas is installed on our system. We can verify the installation by importing the pandas module, as shown in the image below.

verifying the pandas installation
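The same check can also be run as a short script. This minimal sketch only assumes pandas was installed with the pip command above: if the import succeeds, pandas is available, and pd.__version__ reports the installed version (the exact string depends on your system).

```python
# Importing pandas; an ImportError here means the installation did not succeed
import pandas as pd

# Printing the installed pandas version as a quick sanity check
print(pd.__version__)
```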

Introduction

A pandas DataFrame is a two-dimensional data structure that organizes data in rows and columns, like a table. It is the most widely used data structure in the pandas library, and it makes many of the operations data scientists need straightforward. One notable advantage of DataFrames is that they are easy to create: we can build one from Python data structures such as lists and dictionaries, or read files such as CSV, JSON, and Excel into a DataFrame.
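To make the rows-and-columns idea concrete, here is a minimal sketch that builds a tiny DataFrame from a dictionary and inspects its two axes. The names and ages are sample values chosen for illustration; shape, index, and columns are standard DataFrame attributes.

```python
import pandas as pd

# A small DataFrame built from a dictionary: the keys become column labels
df = pd.DataFrame({"Name": ["Sam", "Dan", "Ajay"], "Age": [24, 33, 30]})

# A DataFrame is two-dimensional: (number of rows, number of columns)
print(df.shape)          # (3, 2)
# Row labels (the index) and column labels
print(list(df.index))    # [0, 1, 2]
print(list(df.columns))  # ['Name', 'Age']
```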

The DataFrame Class

To create a DataFrame, we use the DataFrame() class of the pandas library. Its signature is:

pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)

Let us see what each parameter means:

data: The data parameter accepts the data used to build the DataFrame. It can be an ndarray, an iterable, a dict, or another DataFrame.
index: The index parameter provides custom row labels for the DataFrame. It accepts an array-like object. If no index is provided (and the data does not carry one), pandas creates a default index 0, 1, 2, …, n-1, where n is the number of rows.
columns: The columns parameter provides custom column labels. It also accepts an array-like object. If no column labels are provided and the data does not supply any (dictionary keys do, for example), pandas uses the default labels 0, 1, 2, …, n-1, where n is the number of columns.
dtype: The dtype parameter forces a single data type for the whole DataFrame. If it is None, pandas infers the types from the data.
copy: The copy parameter controls whether the input data is copied when the DataFrame is created.
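As a quick illustration of these parameters working together, the sketch below passes data, index, columns, and dtype in one call. The row labels row_a/row_b and column labels x/y are hypothetical placeholders; the second DataFrame shows the default 0, 1, … labels pandas uses when index and columns are omitted.

```python
import pandas as pd

# Passing data, custom row labels, custom column labels, and a dtype together
df = pd.DataFrame(
    data=[[1, 2], [3, 4]],
    index=["row_a", "row_b"],  # hypothetical row labels
    columns=["x", "y"],        # hypothetical column labels
    dtype="float64",           # force one dtype for every column
)
print(df)
print(df.dtypes)

# Without index/columns, pandas falls back to 0, 1, ... on both axes
default_df = pd.DataFrame([[1, 2], [3, 4]])
print(list(default_df.index), list(default_df.columns))  # [0, 1] [0, 1]
```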

Now that we have covered the syntax of the DataFrame() class, let us see how to use it to create DataFrames in Python.

Creating an Empty DataFrame

To create an empty DataFrame, we call the DataFrame() class without any arguments, as shown in the code below.

# Importing the pandas library
import pandas as pd
# creating a dataframe object
df = pd.DataFrame()
# displaying the DataFrame
print(df)

In the above code, we imported the pandas library and called its DataFrame() class without any arguments to create an empty DataFrame. Running the code produces the output shown in the image below.

creating an empty data frame using pandas
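One way to confirm that the DataFrame really is empty, rather than relying on the printed output, is to check its shape and its empty attribute. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame()

# An empty DataFrame has no rows and no columns
print(df.shape)  # (0, 0)
# DataFrame.empty is True when any axis has length 0
print(df.empty)  # True
```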

We can also create a DataFrame that has row and column labels but no data, so that every cell holds a NaN (missing) value. To do this, we pass the index and columns parameters to the DataFrame() class, as shown in the code below.

# Importing the pandas library
import pandas as pd
# creating a dataframe object by providing a 
# list of columns and the index in range of 3
df = pd.DataFrame(columns = ['Name', 'Age'], index = range(3))
# displaying the DataFrame
print(df)

In the above code, we first imported the pandas library. Next, we called the DataFrame() class, passing the columns parameter a list of column names and the index parameter range(3), which generates the row labels 0, 1, and 2. Finally, we used the print() function to display the DataFrame. Running the code produces the output shown in the image below.

creating an empty data frame containing NaN values
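The sketch below checks what this DataFrame actually contains: it has three rows and two columns, every cell is missing, and pandas does not consider it empty because both axes have non-zero length. The final lines show one way, using .loc, to fill in a row later; the values "Sam" and 24 are sample data for illustration.

```python
import pandas as pd

df = pd.DataFrame(columns=["Name", "Age"], index=range(3))

print(df.shape)               # (3, 2)
# Every cell is a missing value
print(df.isna().all().all())  # True
# Both axes have length > 0, so pandas does not treat it as empty
print(df.empty)               # False

# Filling in the first row afterwards with .loc
df.loc[0] = ["Sam", 24]
print(df)
```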

Creating DataFrames from Python Data Structures

We can use the DataFrame() class of the pandas library to create a DataFrame from Python's built-in data structures, such as lists and dictionaries. Let us discuss each approach.

Creating from a List

To create a pandas DataFrame from a Python list, we pass the list as the data argument of the DataFrame() class. We can also set the column name with the columns parameter. The code below demonstrates this.

# importing the required modules
import pandas as pd
# A Python list containing some names
names = ['harry', 'Dan', 'Ajay', 'Sam', 'karan']
# creating the DataFrame from the list
df = pd.DataFrame(names, columns=['Names'])
# displaying the DataFrame
print(df)

In the above code, we first imported the pandas library and then created a Python list containing some strings. Next, we passed the list to the DataFrame() class and used its columns parameter to set the column name. Finally, we used Python's print() function to display the DataFrame in the console. Running the code produces the output shown in the image below.

creating a data frame from a python list
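Two related behaviours are worth knowing when building a DataFrame from a plain list: if columns is omitted, the single column is labelled 0, and the index parameter can replace the default row labels. In this sketch the row labels 101 to 105 are hypothetical IDs chosen for illustration.

```python
import pandas as pd

names = ['harry', 'Dan', 'Ajay', 'Sam', 'karan']

# Without the columns parameter, the single column is labelled 0
df_default = pd.DataFrame(names)
print(list(df_default.columns))  # [0]

# index replaces the default 0..n-1 row labels (hypothetical IDs here)
df = pd.DataFrame(names, columns=['Names'], index=[101, 102, 103, 104, 105])
print(df.loc[103, 'Names'])  # Ajay
```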

Creating from a Nested List

Nested lists are lists whose elements are themselves lists. We can create a DataFrame from a nested list, where each inner list becomes a single row. The code below demonstrates this.

# importing the required modules
import pandas as pd
# A nested Python list containing some sample data
data = [
    ['harry', 30, 'India'],
    ['Dan', 21, 'USA'],
    ['Ajay', 31, 'India'], 
    ['Sam', 28, 'USA'],
    ['karan', 23, 'India']
    ]
# creating the dataframe from the nested lists
df = pd.DataFrame(data, columns=['Names', 'Age', 'location'])
# displaying the dataframe
print(df)

In the above code, we first imported the pandas module. Next, we created a nested list containing five inner lists, each of length 3. We then passed the nested list and the column names to the DataFrame() class. Finally, we used Python's print() function to display the DataFrame in the console. Running the code produces the output shown in the image below.

creating a data frame from a nested list of python
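With a nested list, pandas infers a data type for each column separately, and the number of column names must match the length of the inner lists. This minimal sketch reuses two rows of the sample data above to show both points; the exact wording of the error message may vary between pandas versions.

```python
import pandas as pd

data = [
    ['harry', 30, 'India'],
    ['Dan', 21, 'USA'],
]
df = pd.DataFrame(data, columns=['Names', 'Age', 'location'])

# Each column gets its own dtype: Age is int64, text columns are
# object (or a dedicated string dtype in newer pandas versions)
print(df.dtypes)

# Passing 2 column names for inner lists of length 3 raises a ValueError
raised = False
try:
    pd.DataFrame(data, columns=['Names', 'Age'])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```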

Creating from a List of Dictionaries

We can also create a DataFrame from a list of dictionaries, where each dictionary becomes one row. By default, the dictionary keys are used as the column names. The columns parameter of the DataFrame() class lets us choose which keys to include and in what order. The code below demonstrates creating a DataFrame from a list of dictionaries.

# Importing the pandas library into our code
import pandas as pd
# Creating a python list containing some Dictionaries
data = [
    {
        "Name": 'Sam',
        "age": 24
    },
    {
        "Name": 'Dan',
        "age": 33
    },
    {
        "Name": 'Karan',
        "age": 24
    },
    {
        "Name": 'Ajay',
        "age": 30
    }
]
# Creating the DataFrame from the list of dictionaries
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)

In the above code, we first imported the pandas library. Then we created a Python list containing some dictionaries and passed it to the DataFrame() class. Finally, we displayed the DataFrame in the console using the print() function. Running the code produces the output shown in the image below.

Creating a data frame from a list of dictionaries
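When the dictionaries do not all share the same keys, pandas takes the union of the keys and fills any missing entries with NaN. The columns parameter then acts as a selector: it picks and orders existing keys rather than renaming them. In this sketch the extra "city" key and its value are hypothetical, added only to show the NaN behaviour.

```python
import pandas as pd

data = [
    {"Name": 'Sam', "age": 24},
    {"Name": 'Dan', "age": 33, "city": 'Boston'},  # hypothetical extra key
]

# Keys are combined: Sam has no "city", so that cell becomes NaN
df = pd.DataFrame(data)
print(df)

# columns selects and orders existing keys; it does not rename them
subset = pd.DataFrame(data, columns=["age", "Name"])
print(list(subset.columns))  # ['age', 'Name']
```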

Creating from a Dictionary of Lists

We can also create a DataFrame from a dictionary of lists. Each key of the dictionary becomes a column name in the resulting DataFrame, and the corresponding list supplies that column's values. The columns parameter of the DataFrame() class lets us select which keys to use and in what order. The code below demonstrates creating a DataFrame from a dictionary of lists.

# Importing the pandas library into our code
import pandas as pd
# Creating a Python dictionary containing some lists
data = {
    "Name": ['Sam', 'Daniel', 'Karan', 'Raj', 'Ajay', 'Amit'],
    "Age": [31, 22, 21, 22, 30, 28],
    "country": ['USA', 'USA', 'India', 'India', 'India', 'India']
}
# Creating the DataFrame from the dictionary of lists
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)

In the above code, we first imported the pandas library and then created a Python dictionary containing some lists. Next, we created a DataFrame by passing the dictionary to the DataFrame() class. Finally, we displayed the DataFrame in the console using Python's print() function. Running the code produces the output shown in the image below.

creating a data frame from a dictionary of list
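The sketch below shows two rules that apply to a dictionary of lists: the columns parameter selects and reorders the keys, and every list must have the same length, otherwise pandas raises a ValueError. The data is a shortened version of the example above.

```python
import pandas as pd

data = {
    "Name": ['Sam', 'Daniel', 'Karan'],
    "Age": [31, 22, 21],
}

# columns picks which keys to use and in what order
df = pd.DataFrame(data, columns=["Age", "Name"])
print(df)

# Lists of different lengths cannot form a table, so pandas raises ValueError
raised = False
try:
    pd.DataFrame({"Name": ['Sam', 'Daniel'], "Age": [31]})
except ValueError as err:
    raised = True
    print("ValueError:", err)
```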

Creating from NumPy Arrays

NumPy is a popular Python library for creating and manipulating arrays. Although NumPy is powerful, we sometimes want to convert a NumPy array into a pandas DataFrame for further processing. To do this, we pass the array to the DataFrame() class. The code below creates a NumPy array and converts it into a DataFrame.

# Importing the required libraries
import numpy as np
import pandas as pd
# Creating a NumPy array
arr = np.array([
    ['Karan', 21, 'India'],
    ['Amit', 24, 'India'],
    ['David', 33, 'USA'],
    ['Dan', 29, 'USA'],
    ['Sam', 28, 'USA'],
])
# Displaying the NumPy array
print(arr)
print('\n')
# Creating the DataFrame from the NumPy array
df = pd.DataFrame(arr, columns=['Name', 'Age', 'Country'])
# Displaying the DataFrame
print(df)

In the above code, we first imported the NumPy and pandas libraries. We used NumPy's array() function to create an array and displayed it in the console using Python's print() function. Next, we passed the array to the DataFrame() class and used its columns parameter to set the column names, converting the NumPy array into a pandas DataFrame. Finally, we displayed the DataFrame using the print() function. Note that a NumPy array holds a single data type, so the mixed values above are all stored as strings, including the ages. Running the code produces the output shown in the image below.

creating a DataFrame from a NumPy array
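Because a NumPy array holds a single data type, a mixed array like the one above is stored entirely as strings, so a numeric column such as Age arrives in the DataFrame as text. This minimal sketch contrasts a purely numeric array, whose dtype carries over, with a small mixed array whose Age column is converted back to integers with astype(int); both arrays are sample data for illustration.

```python
import numpy as np
import pandas as pd

# A purely numeric array keeps its numeric dtype in the DataFrame
nums = np.array([[1.5, 2.0], [3.0, 4.5]])
df_num = pd.DataFrame(nums, columns=['a', 'b'])
print(df_num.dtypes)  # both columns are float64

# A mixed array is coerced to strings by NumPy before pandas sees it
mixed = np.array([['Karan', 21], ['Amit', 24]])
print(mixed.dtype.kind)  # U (Unicode string)

df_mixed = pd.DataFrame(mixed, columns=['Name', 'Age'])
# Converting the Age column back to integers explicitly
df_mixed['Age'] = df_mixed['Age'].astype(int)
print(df_mixed['Age'].sum())  # 45
```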

Creating from a CSV file

CSV files are among the most common file formats a data scientist works with. They store tabular data as plain text, with the values in each row separated by commas. Pandas provides the read_csv() function, which reads a CSV file and converts it into a DataFrame. We can read CSV data from a local file or from a URL. The code below demonstrates converting a CSV file into a pandas DataFrame. The CSV file used in this demonstration is available to download on Kaggle.

# importing the pandas library
import pandas as pd
# creating the DataFrame by reading the CSV file
df = pd.read_csv('covid.csv')
# Displaying the DataFrame
print(df)

In the above code, we first imported the pandas library. Next, we used pandas' read_csv() function to read the CSV file and convert it into a DataFrame. Finally, we displayed the DataFrame in the console using Python's print() function. Running the code produces the output shown in the image below.

creating a DataFrame by reading a CSV file
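Since covid.csv is not included here, the sketch below uses io.StringIO to stand in for a file so that it runs on its own. The CSV text is made-up sample data, not the Kaggle dataset. It also shows two optional read_csv() parameters: usecols, which keeps only the listed columns, and nrows, which limits how many data rows are read.

```python
import io

import pandas as pd

# Made-up CSV text; io.StringIO lets read_csv treat it like an open file
csv_text = """Name,Age,Country
Karan,21,India
Amit,24,India
Dan,29,USA
"""

# usecols keeps only the listed columns; nrows reads just the first 2 data rows
df = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Age"], nrows=2)
print(df)
print(df.dtypes)
```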

Creating from an Excel file

Excel files are another common format for working with data. Pandas supports both reading and writing Excel files, which eases the work of data scientists. To read an Excel file, we use the read_excel() function of the pandas library. Reading .xlsx files also requires an engine package, typically openpyxl, which can be installed with pip install openpyxl. The code below demonstrates reading an Excel file using pandas. The Excel file used here is available to download on Kaggle.

# importing the pandas library
import pandas as pd
# reading the Excel file and converting it into a DataFrame
df = pd.read_excel('excel.xlsx')
# Displaying the DataFrame to the console
print(df)

In the above code, we first imported the pandas library. Next, we used pandas' read_excel() function to read the Excel file and convert it into a DataFrame. Finally, we displayed the DataFrame in the console using Python's print() function. Running the code produces the output shown in the image below.

reading an Excel file using the pandas library and converting it into a DataFrame

Conclusion

In this tutorial, we learned different ways to create a pandas DataFrame in Python: from Python data structures, from NumPy arrays, and from files such as CSV and Excel. You may also want to see our step-by-step guide on creating visualizations in Python.
