People work with large amounts of data every day. Sometimes the data has column names, and sometimes it doesn’t. Even when column names are present, they may be uninformative or contain unwanted characters, such as spaces. So, before beginning the analysis, we often need to pre-process the data, and renaming columns is frequently one of the first steps.
The columns and index of a Pandas DataFrame can be renamed for various reasons, depending on user needs. With the rename() function, the entries in the dictionary we pass determine whether we rename a single column or several columns. Keyword arguments are highly recommended when using rename(), since they express the intent clearly.
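As a small illustration of the keyword-argument advice above, the sketch below renames one column and one index label in a single call. The DataFrame `scores` and its labels are made up for this example.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the call
scores = pd.DataFrame({"u20": [1, 2], "wsl": [3, 4]}, index=["r1", "r2"])

# Keyword arguments make it explicit which axis each mapping applies to
renamed = scores.rename(columns={"u20": "juniors"}, index={"r1": "first"})

print(list(renamed.columns))  # ['juniors', 'wsl']
print(list(renamed.index))    # ['first', 'r2']
```

Labels that do not appear in a mapping, such as 'wsl' and 'r2' here, are left unchanged.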
Let’s look at some examples of how to rename columns in a Pandas DataFrame. There are several alternative approaches, and we examine them in this article.
Pandas DataFrame information
- A Pandas DataFrame is a rectangular grid used to store data. Data stored in a DataFrame is simple to visualize and manipulate.
- It has rows and columns.
- The axes of a DataFrame are labeled, unlike those of a plain two-dimensional array.
- Each row represents a measurement of a single instance, whereas each column is a vector of data for a single attribute or variable.
- A row can hold homogeneous or heterogeneous data, but each column holds homogeneous data.
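The properties listed above can be seen in a short sketch. The `players` DataFrame and its values are invented for illustration only.

```python
import pandas as pd

# Hypothetical data: each column holds one type, each row mixes types
players = pd.DataFrame(
    {"name": ["Asha", "Ben"], "caps": [12, 7], "rating": [8.5, 7.9]},
    index=["p1", "p2"],
)

print(players.columns.tolist())    # column labels
print(players.index.tolist())      # row labels
print(players.dtypes)              # one dtype per column
print(players.loc["p1"].tolist())  # a single row can mix text, int, and float
```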
Changing the names of columns in a Pandas DataFrame
Here, we explore the various approaches to renaming columns in a Pandas DataFrame.
Rename columns using the DataFrame set_axis() method
In this example, we rename the columns using the set_axis() method. As arguments, we pass the list of new column names and the axis whose labels should be changed.
# Import the pandas package
import pandas as pd

# Dictionary of rugby performances by competing nation
rugby_performances = {'u20': ['India', 'South Africa', 'England', 'New Zealand', 'Australia'],
                      'wsl': ['England', 'India', 'New Zealand', 'South Africa', 'Pakistan'],
                      'msl': ['Pakistan', 'India', 'Australia', 'England', 'New Zealand']}

# Convert the dictionary to a DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# Column names before renaming
print(rugby_per_pd.columns)

# set_axis() returns a new DataFrame, so assign the result back
# (the inplace argument was removed in pandas 2.0)
rugby_per_pd = rugby_per_pd.set_axis(['A', 'B', 'C'], axis='columns')

# Column names after renaming
print(rugby_per_pd.columns)
rugby_per_pd.head()
Using the rename() function
The rename() function is one way to rename the columns in a Pandas DataFrame. This approach comes in handy when we need to rename only a few specific columns, because we supply information just for the columns that need to be changed.
Pandas provides rename() as a built-in method that changes column names directly. To use it, we pass a dictionary to the columns parameter in which each key is a column’s original name and each value is its new name. An optional inplace argument, when set to True, modifies the current DataFrame directly. By default, inplace is False, and rename() returns a new DataFrame.
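To make the effect of the inplace option concrete, here is a minimal sketch using a small made-up DataFrame `df`. It first calls rename() with the default setting and then with inplace=True.

```python
import pandas as pd

# Hypothetical one-row DataFrame for illustration
df = pd.DataFrame({"u20": ["India"], "wsl": ["England"]})

# Default (inplace=False): rename() returns a new DataFrame, df is untouched
renamed = df.rename(columns={"u20": "juniors"})
print(list(df.columns))       # ['u20', 'wsl']
print(list(renamed.columns))  # ['juniors', 'wsl']

# inplace=True: df itself is modified
df.rename(columns={"u20": "juniors"}, inplace=True)
print(list(df.columns))       # ['juniors', 'wsl']
```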
Here is an example demonstrating how to rename a single column in a DataFrame.
The format is as follows:
df.rename(columns = {'prev_col_1':'curr_col_1', 'prev_col_2':'curr_col_2'}, inplace = True)
The following demo renames a single column:
# Import the pandas package
import pandas as pd

# Dictionary of rugby performances by participating nation
rugby_performances = {'u20': ['India', 'South Africa', 'England', 'New Zealand', 'Australia'],
                      'wsl': ['England', 'India', 'New Zealand', 'South Africa', 'Pakistan'],
                      'msl': ['Pakistan', 'India', 'Australia', 'England', 'New Zealand']}

# Convert the dictionary to a DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# DataFrame before renaming
print(rugby_per_pd)

# Rename only the 'u20' column
rugby_per_pd.rename(columns = {'u20':'juniors'}, inplace = True)

# Column names after renaming
print("\nAfter modification of the initial column:\n", rugby_per_pd.columns)
The following examples rename multiple columns in a DataFrame. When every column name is replaced at once, a second approach assigns a list of new names to the columns attribute. Its generic format is as follows:
df.columns = ['curr_col_1', 'curr_col_2', 'curr_col_3', 'curr_col_4']
Example 1: Rename multiple columns
# Import the pandas package
import pandas as pd

# Dictionary of rugby performances by participating nation
rugby_performances = {'u20': ['India', 'South Africa', 'England', 'New Zealand', 'Australia'],
                      'wsl': ['England', 'India', 'New Zealand', 'South Africa', 'Pakistan'],
                      'msl': ['Pakistan', 'India', 'Australia', 'England', 'New Zealand']}

# Convert the dictionary to a DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# Column names before renaming
print(rugby_per_pd.columns)

rugby_per_pd.rename(columns = {'u20':'juniors', 'wsl':'WSL', 'msl':'MSL'}, inplace = True)

# Column names after renaming
print(rugby_per_pd.columns)
Example 2: Rename Particular Columns
The code below demonstrates how to rename particular columns in a pandas DataFrame:
import pandas as pd

# Define the DataFrame
team_performance = pd.DataFrame({'soccer_club': ['G', 'G', 'G', 'G', 'F', 'F', 'F', 'F'],
                                 'wins': [25, 12, 15, 14, 19, 23, 25, 29],
                                 'losses': [5, 7, 7, 9, 12, 9, 9, 4],
                                 'draws': [11, 8, 10, 6, 6, 5, 9, 12]})

# List the column names
print(list(team_performance))
# ['soccer_club', 'wins', 'losses', 'draws']

# Rename specific columns
team_performance.rename(columns = {'soccer_club':'soccer_club_fc', 'wins':'wins_count'}, inplace = True)

# View the updated list of column names
print(list(team_performance))
# ['soccer_club_fc', 'wins_count', 'losses', 'draws']
Example 3: Rename all the columns
To rename every column in a pandas DataFrame, use the code below:
import pandas as pd

# Define the DataFrame
team_performance = pd.DataFrame({'soccer_club': ['G', 'G', 'G', 'G', 'F', 'F', 'F', 'F'],
                                 'wins': [25, 12, 15, 14, 19, 23, 25, 29],
                                 'losses': [5, 7, 7, 9, 12, 9, 9, 4],
                                 'draws': [11, 8, 10, 6, 6, 5, 9, 12]})

# List the column names
print(list(team_performance))
# ['soccer_club', 'wins', 'losses', 'draws']

# Rename all columns by assigning a full list of new names
team_performance.columns = ['soccer_club_fc', 'win_cnt', 'loss_cnt', 'draw_cnt']

# View the updated list of column names
print(list(team_performance))
# ['soccer_club_fc', 'win_cnt', 'loss_cnt', 'draw_cnt']
Note that this method is quicker when renaming most or all column names in a DataFrame. We can also replace particular characters in the column names by following the pattern below.
df.columns = df.columns.str.replace('prev_char', 'curr_char')
Example 4: Change Particular Characters in Columns
The code below demonstrates how to change a particular character in each column name:
import pandas as pd

# Define the DataFrame
team_performance = pd.DataFrame({'$soccer_club': ['G', 'G', 'G', 'G', 'F', 'F', 'F', 'F'],
                                 '$wins': [25, 12, 15, 14, 19, 23, 25, 29],
                                 '$losses': [5, 7, 7, 9, 12, 9, 9, 4],
                                 '$draws': [11, 8, 10, 6, 6, 5, 9, 12]})

# List the column names
print(list(team_performance))
# ['$soccer_club', '$wins', '$losses', '$draws']

# Remove '$' from every column name.
# regex=False treats '$' as a literal character rather than a regex anchor.
team_performance.columns = team_performance.columns.str.replace('$', '', regex=False)

# View the updated list of column names
print(list(team_performance))
# ['soccer_club', 'wins', 'losses', 'draws']
As you can see, this technique removed the ‘$’ from every column name in a single step.
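The same idea can be applied to the stray spaces mentioned in the introduction. The sketch below is one possible cleanup, using invented column names: it strips outer whitespace and then replaces inner spaces with underscores.

```python
import pandas as pd

# Hypothetical column names with leading, trailing, and embedded spaces
df = pd.DataFrame({" soccer club ": ["G"], "wins count": [25]})

# Strip outer whitespace, then replace inner spaces with underscores.
# regex=False treats the pattern as a literal string.
df.columns = df.columns.str.strip().str.replace(" ", "_", regex=False)

print(list(df.columns))  # ['soccer_club', 'wins_count']
```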
By assigning a list of new column names
A Pandas DataFrame has an attribute named columns that lets us retrieve all of the DataFrame’s column names. We can also use this attribute to rename columns by assigning a list containing the new names directly to it.
The drawback of this technique is that we must supply names for all of the columns, even if we only wish to rename some of them. As demonstrated below, we pass a complete list of new names and assign it to the columns attribute.
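The drawback described above can be seen in a quick sketch: assigning fewer names than there are columns raises a ValueError. The workaround shown afterwards is only one possible option, and the `new_names` mapping is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"u20": ["India"], "wsl": ["England"], "msl": ["Pakistan"]})

error_message = None
try:
    # Only two names for three columns, so pandas rejects the assignment
    df.columns = ["juniors", "WSL"]
except ValueError as err:
    error_message = str(err)
print("Assignment failed:", error_message)

# Possible workaround: keep the current name wherever no new name is given
new_names = {"u20": "juniors"}  # hypothetical mapping for this sketch
df.columns = [new_names.get(col, col) for col in df.columns]
print(list(df.columns))  # ['juniors', 'wsl', 'msl']
```

In practice, rename() with a dictionary achieves the same partial renaming more directly.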
# Import the pandas package
import pandas as pd

# Dictionary of rugby performances
rugby_performances = {'u20': ['India', 'South Africa', 'England', 'New Zealand', 'Australia'],
                      'wsl': ['England', 'India', 'New Zealand', 'South Africa', 'Pakistan'],
                      'msl': ['Pakistan', 'India', 'Australia', 'England', 'New Zealand']}

# Convert the dictionary to a DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# Column names before renaming
print(rugby_per_pd.columns)

rugby_per_pd.columns = ['juniors', 'WSL', 'MSL']

# Column names after renaming
print(rugby_per_pd.columns)
Rename columns using the DataFrame add_prefix() and add_suffix() functions
In this example, we rename the columns using the add_prefix() and add_suffix() functions. We pass a prefix and a suffix, which are added to the start and end of every column name, respectively.
# Import the pandas package
import pandas as pd

# Dictionary of rugby performances by participating nation
rugby_performances = {'u20': ['India', 'South Africa', 'England', 'New Zealand', 'Australia'],
                      'wsl': ['England', 'India', 'New Zealand', 'South Africa', 'Pakistan'],
                      'msl': ['Pakistan', 'India', 'Australia', 'England', 'New Zealand']}

# Convert the dictionary to a DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# Column names before renaming
print(rugby_per_pd.columns)

# Both functions return a new DataFrame, so assign the result back
rugby_per_pd = rugby_per_pd.add_prefix('col_')
rugby_per_pd = rugby_per_pd.add_suffix('_1')

# Columns are now col_u20_1, col_wsl_1, col_msl_1
rugby_per_pd.head()
Replace specific column names using DataFrame.columns.str.replace()
In this example, we use the str.replace() function to rename columns. We pass the old name and the new name as arguments. Note that the replacement applies to every column name that contains the old text, not only to exact matches.
# Import the pandas package
import pandas as pd

# Dictionary of rugby performances by participating nation
rugby_performances = {'u20': ['India', 'South Africa', 'England', 'New Zealand', 'Australia'],
                      'wsl': ['England', 'India', 'New Zealand', 'South Africa', 'Pakistan'],
                      'msl': ['Pakistan', 'India', 'Australia', 'England', 'New Zealand']}

# Convert the dictionary to a DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# Column names before renaming
print(rugby_per_pd.columns)

# Replace each old name with its new name
rugby_per_pd.columns = rugby_per_pd.columns.str.replace('u20', 'Col_JUNIORS')
rugby_per_pd.columns = rugby_per_pd.columns.str.replace('wsl', 'Col_WSL')
rugby_per_pd.columns = rugby_per_pd.columns.str.replace('msl', 'Col_MSL')

rugby_per_pd.head()
Conclusion
A DataFrame is tabular data with both rows and columns. We can also describe a DataFrame as a collection of columns, each of which has its own data type, such as string or numeric.
The rename() method, which requires only the columns we wish to rename, supplied in dictionary (key, value) format, is the best approach in our view. Assigning to the columns attribute is the most straightforward technique, but its main disadvantage is that we must pass names for all the columns even if we only want to rename a few. Another helpful option is to rename columns as the CSV file is being read. Finally, columns.str.replace() is the best choice only when we wish to replace some characters with other characters.
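Renaming while reading a CSV file is not demonstrated earlier in this article, so here is a minimal sketch of how it might look. The CSV content is invented and read from an in-memory string rather than a real file.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "u20,wsl,msl\nIndia,England,Pakistan\nSouth Africa,India,India\n"

# header=0 marks the first line as the existing header so it is discarded;
# names supplies the replacement column names
df = pd.read_csv(io.StringIO(csv_text), header=0, names=["juniors", "WSL", "MSL"])

print(list(df.columns))  # ['juniors', 'WSL', 'MSL']
print(len(df))           # 2
```

Without header=0, the original header line would be read as a row of data.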