Renaming columns in a pandas DataFrame

Real-world datasets often arrive with column names that are missing, inconsistent, or cluttered with unwanted characters such as spaces. Cleaning up those names is a common pre-processing step, so renaming columns is often one of the first things we do before beginning an analysis.

The columns and index labels of a Pandas DataFrame can be renamed with the rename() function. Depending on the entries in the dictionary we pass, we can rename a single column or several columns at once. Using keyword arguments such as columns= and index= with rename() is highly recommended, because it states the intent clearly.
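
As a minimal sketch of the keyword-argument style, the snippet below renames one column and one index label in a single call. The DataFrame `scores` and its labels are invented purely for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration
scores = pd.DataFrame({"pts": [10, 7], "gms": [3, 2]}, index=["a", "b"])

# Keyword arguments make it explicit which axis each mapping applies to
renamed = scores.rename(columns={"pts": "points"}, index={"a": "team_a"})

print(list(renamed.columns))
print(list(renamed.index))
```

Labels that do not appear in either dictionary, such as `gms` and `b` here, are left as they are.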

There are several approaches to renaming columns in a pandas DataFrame. Let’s look at each of them in this article.

Pandas DataFrame information

  • A Pandas DataFrame is a rectangular grid used to store data. Data saved in a DataFrame is simple to visualize and manipulate.
  • It has rows and columns.
  • Unlike those of a two-dimensional array, the axes of a DataFrame are labeled.
  • Each row represents a measurement of a single instance, whereas each column is a vector of data for a single attribute or variable.
  • The values within a single row can be of different types, but each DataFrame column normally holds values of a single type.
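
The points above can be sketched with a small, made-up DataFrame (the names `players`, `name`, `age`, and `height_m` are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical data: each column holds one type, while a row mixes types
players = pd.DataFrame({
    "name": ["Ann", "Ben"],
    "age": [31, 27],
    "height_m": [1.70, 1.82],
})

# One dtype is reported per column
print(players.dtypes)

# A single row combines a string, an integer and a float
print(players.iloc[0].tolist())
```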

Changing the names of columns in a Pandas DataFrame

Here, we explore the various approaches to renaming columns in a Pandas DataFrame.

Renaming columns with the DataFrame set_axis() method

In this example, we rename the columns using the set_axis() method. We pass the list of new column names together with the axis whose labels should change, which here is the columns axis.

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre># Importing the package ~ pandas
import pandas as pd

# Definition of rugby performances of competing nations in a dictionary
rugby_performances = {'u20': ['India', 'South Africa', 'England',
					'New Zealand', 'Australia'],
			'wsl': ['England', 'India', 'New Zealand',
					'South Africa', 'Pakistan'],
			'msl': ['Pakistan', 'India', 'Australia',
					'England', 'New Zealand']}

# Conversion of the given dictionary to a corresponding DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# columns before renaming
print(rugby_per_pd.columns)

# set_axis() returns a new DataFrame, so assign the result back
# (the inplace argument was removed in pandas 2.0)
rugby_per_pd = rugby_per_pd.set_axis(['A', 'B', 'C'], axis='columns')

# columns after renaming
print(rugby_per_pd.columns)
print(rugby_per_pd.head())</pre>
</div>

Using the rename() function

The rename() function is one technique for renaming the columns in a Pandas DataFrame. This approach comes in handy when we need to rename a few specific columns, because we only supply information for the columns that need to change.

rename() is a built-in DataFrame method. To use it, we pass a dictionary to its columns parameter, in which each key is a column’s original name and each value is the column’s new name. By default, inplace is False and rename() returns a new DataFrame; setting inplace to True modifies the current DataFrame directly.
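
As a brief sketch of the default (non-inplace) behaviour, the snippet below keeps the original DataFrame untouched and also shows the `errors="raise"` option, which turns a misspelled column name into a `KeyError` instead of silently ignoring it. The tiny DataFrame is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"u20": ["India"], "wsl": ["England"]})

# Without inplace=True, rename() returns a new DataFrame; df is unchanged
renamed = df.rename(columns={"u20": "juniors"})

# A key that matches no column is silently ignored by default ...
unchanged = df.rename(columns={"U20": "juniors"})

# ... unless errors="raise" is passed, which raises KeyError
try:
    df.rename(columns={"U20": "juniors"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(renamed.columns), list(unchanged.columns), typo_detected)
```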

The general format is as follows:

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre>df.rename(columns = {'prev_col_1':'curr_col_1', 'prev_col_2':'curr_col_2'}, inplace = True)</pre>
</div>

The following demo renames a single column:

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre># Import pandas package
import pandas as pd
# Definition of a dictionary containing rugby performances per participating nations
rugby_performances = {'u20': ['India', 'South Africa', 'England',
					'New Zealand', 'Australia'],
			'wsl': ['England', 'India', 'New Zealand',
					'South Africa', 'Pakistan'],
			'msl': ['Pakistan', 'India', 'Australia',
					'England', 'New Zealand']}

# Conversion of the provided dictionary to the corresponding DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# prior to columns-renaming
print(rugby_per_pd)

rugby_per_pd.rename(columns = {'u20':'juniors'}, inplace = True)
# results after column-renaming
print("\nAfter modification of the initial column:\n", rugby_per_pd.columns)</pre>
</div>

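rename() can also change several columns in one call, as Example 1 and Example 2 below show. When every column needs a new name, an alternative is to assign a full list of names to the columns attribute, as in Example 3. The generic format of that assignment is as follows: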

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre>df.columns = ['curr_col_1', 'curr_col_2', 'curr_col_3', 'curr_col_4']</pre>
</div>

Example 1: Rename multiple columns

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre># Import pandas package
import pandas as pd

# dictionary definition having the rugby performances of participating nations
rugby_performances = {'u20': ['India', 'South Africa', 'England',
					'New Zealand', 'Australia'],
			'wsl': ['England', 'India', 'New Zealand',
					'South Africa', 'Pakistan'],
			'msl': ['Pakistan', 'India', 'Australia',
					'England', 'New Zealand']}

# Conversion of the given dictionary to the corresponding DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# columns before renaming
print(rugby_per_pd.columns)

rugby_per_pd.rename(columns = {'u20':'juniors', 'wsl':'WSL',
							'msl':'MSL'}, inplace = True)

# results post-column-renaming
print(rugby_per_pd.columns)</pre>
</div>

Example 2: Rename Particular Columns

The code below demonstrates how to rename particular columns in a pandas DataFrame:

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre>import pandas as pd

#define DataFrame
team_performance = pd.DataFrame({'soccer_club':['G', 'G', 'G', 'G', 'F', 'F', 'F', 'F'],
                   'wins': [25, 12, 15, 14, 19, 23, 25, 29],
                   'losses': [5, 7, 7, 9, 12, 9, 9, 4],
                   'draws': [11, 8, 10, 6, 6, 5, 9, 12]})

#list column names
print(list(team_performance))
# ['soccer_club', 'wins', 'losses', 'draws']

#rename specific column names
team_performance.rename(columns = {'soccer_club':'soccer_club_fc', 'wins':'wins_count'}, inplace = True)

#view the updated list of column names
print(list(team_performance))
# ['soccer_club_fc', 'wins_count', 'losses', 'draws']</pre>
</div>

Example 3: Rename all the columns

To rename every column in a pandas DataFrame, use the code below:

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre>import pandas as pd

#define DataFrame
team_performance = pd.DataFrame({'soccer_club':['G', 'G', 'G', 'G', 'F', 'F', 'F', 'F'],
                   'wins': [25, 12, 15, 14, 19, 23, 25, 29],
                   'losses': [5, 7, 7, 9, 12, 9, 9, 4],
                   'draws': [11, 8, 10, 6, 6, 5, 9, 12]})

#list column names
print(list(team_performance))
# ['soccer_club', 'wins', 'losses', 'draws']

#rename all column names
team_performance.columns = ['soccer_club_fc', 'win_cnt', 'loss_cnt', 'draw_cnt']

#view the updated list of column names
print(list(team_performance))
# ['soccer_club_fc', 'win_cnt', 'loss_cnt', 'draw_cnt']</pre>
</div>

Note that this method is more concise when most or all of the column names in a DataFrame need to change. We can also replace particular characters in the column names by following the pattern below.

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre>df.columns = df.columns.str.replace('prev_char', 'curr_char')</pre>
</div>

Example 4: Change Particular Characters in Columns

The code below demonstrates how to change a particular character in each column name:

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre>import pandas as pd

#define DataFrame
team_performance = pd.DataFrame({'$soccer_club':['G', 'G', 'G', 'G', 'F', 'F', 'F', 'F'],
                   '$wins': [25, 12, 15, 14, 19, 23, 25, 29],
                   '$losses': [5, 7, 7, 9, 12, 9, 9, 4],
                   '$draws': [11, 8, 10, 6, 6, 5, 9, 12]})

#list column names
print(list(team_performance))
# ['$soccer_club', '$wins', '$losses', '$draws']

#replace $ with blank in every column name
#regex=False treats '$' as a literal character, not a regular-expression anchor
team_performance.columns = team_performance.columns.str.replace('$', '', regex=False)

#view the updated list of column names
print(list(team_performance))
# ['soccer_club', 'wins', 'losses', 'draws']
</pre>
</div>

As you can see, this technique removes the ‘$’ from every column name in a single statement.
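
The same string accessor can tidy up other unwanted characters, such as the spaces mentioned in the introduction. As a minimal sketch (the DataFrame and its column names are invented for illustration), the snippet below strips surrounding spaces, lowercases the names, and replaces inner spaces with underscores:

```python
import pandas as pd

# Hypothetical column names with stray and embedded spaces
df = pd.DataFrame({" Team Name ": ["G"], "Total Wins": [25]})

# Chain string methods on the column index, then assign the result back
df.columns = (
    df.columns.str.strip()
    .str.lower()
    .str.replace(" ", "_", regex=False)
)

print(list(df.columns))
```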

By assigning a new list of column names

A Pandas DataFrame has a columns attribute that lets us retrieve all of its column names. We can also rename columns through this attribute by assigning a list of new names directly to it.
The drawback of this technique is that we must supply names for all of the columns, even if we only wish to rename some of them. The rugby example below assigns a complete list of new names to the columns attribute.
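
To illustrate the drawback just described, the short sketch below (with a made-up one-row DataFrame) shows that assigning too few names raises a `ValueError`, and one possible way to change a single name while spelling out the rest unchanged:

```python
import pandas as pd

df = pd.DataFrame({"u20": ["India"], "wsl": ["England"], "msl": ["Pakistan"]})

try:
    # Only two names for three columns: pandas rejects the assignment
    df.columns = ["juniors", "WSL"]
    mismatch_rejected = False
except ValueError as exc:
    mismatch_rejected = True
    print(exc)

# Rename the first column only, reusing the remaining names as they are
df.columns = ["juniors"] + list(df.columns[1:])
print(list(df.columns))
```

For a partial rename like this, rename() with a dictionary is usually the shorter option.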

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre># Import pandas package
import pandas as pd

# dictionary definition of rugby performances
rugby_performances = {'u20': ['India', 'South Africa', 'England',
					'New Zealand', 'Australia'],
			'wsl': ['England', 'India', 'New Zealand',
					'South Africa', 'Pakistan'],
			'msl': ['Pakistan', 'India', 'Australia',
					'England', 'New Zealand']}

# Conversion of the dictionary into a corresponding DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# appearance prior to column-renaming
print(rugby_per_pd.columns)

rugby_per_pd.columns = ['juniors', 'WSL', 'MSL']

# After renaming the columns
print(rugby_per_pd.columns)</pre>
</div>

Renaming columns with the add_prefix() and add_suffix() functions

In the rugby example below, we rename the columns using the add_prefix() and add_suffix() functions. We pass a prefix and a suffix, which are added to the beginning and the end of every column name.
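
As a minimal sketch of what these two methods do, the snippet below applies them to a made-up DataFrame and compares the result with an equivalent rename() call that takes a function instead of a dictionary. Passing a callable to rename() is offered here as an alternative, not as something the examples in this article rely on.

```python
import pandas as pd

df = pd.DataFrame({"u20": ["India"], "wsl": ["England"]})

# Both methods return a new DataFrame, so they can be chained
with_affixes = df.add_prefix("col_").add_suffix("_1")

# rename() also accepts a function that is applied to every column name
with_callable = df.rename(columns=lambda name: f"col_{name}_1")

# Any one-argument function works, for example str.upper
upper = df.rename(columns=str.upper)

print(list(with_affixes.columns))
print(list(upper.columns))
```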

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre># Import pandas package
import pandas as pd

# dictionary definition of rugby performances by participating nations
rugby_performances = {'u20': ['India', 'South Africa', 'England',
					'New Zealand', 'Australia'],
			'wsl': ['England', 'India', 'New Zealand',
					'South Africa', 'Pakistan'],
			'msl': ['Pakistan', 'India', 'Australia',
					'England', 'New Zealand']}

# Conversion of the provided dictionary into a corresponding DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# prior to column-renaming 
print(rugby_per_pd.columns)

rugby_per_pd = rugby_per_pd.add_prefix('col_')
rugby_per_pd = rugby_per_pd.add_suffix('_1')

# results post-column-renaming
print(rugby_per_pd.head())</pre>
</div>

Replacing specific column names with DataFrame.columns.str.replace()

In the rugby example below, we rename columns using the str.replace() function on the columns attribute. We pass the old text and the new text as arguments, and every occurrence of the old text in a column name is replaced.
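
One caveat worth keeping in mind is that str.replace() works on substrings, so it can also change columns that merely contain the old text. The sketch below uses hypothetical column names to contrast it with rename(), which matches whole labels only:

```python
import pandas as pd

df = pd.DataFrame({"wins": [1], "wins_away": [2]})

# str.replace() changes every column name that contains "wins"
replaced = list(df.columns.str.replace("wins", "w", regex=False))

# rename() with a dictionary only touches the exact label "wins"
renamed = list(df.rename(columns={"wins": "w"}).columns)

print(replaced)
print(renamed)
```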

<div class="wp-block-codemirror-blocks-code-block code-block">
<pre># Import pandas package
import pandas as pd

# Define a dictionary containing rugby performances of participating nations
rugby_performances = {'u20': ['India', 'South Africa', 'England',
					'New Zealand', 'Australia'],
			'wsl': ['England', 'India', 'New Zealand',
					'South Africa', 'Pakistan'],
			'msl': ['Pakistan', 'India', 'Australia',
					'England', 'New Zealand']}

# Conversion of the provided dictionary to a corresponding DataFrame
rugby_per_pd = pd.DataFrame(rugby_performances)

# Before renaming the columns
print(rugby_per_pd.columns)

# Replace each old name with its new name, one column at a time
rugby_per_pd.columns = rugby_per_pd.columns.str.replace('u20', 'Col_JUNIORS')
rugby_per_pd.columns = rugby_per_pd.columns.str.replace('wsl', 'Col_WSL')
rugby_per_pd.columns = rugby_per_pd.columns.str.replace('msl', 'Col_MSL')

# After renaming the columns
print(rugby_per_pd.head())</pre>
</div>

Conclusion

A DataFrame holds tabular data arranged in rows and columns. We can also describe a DataFrame as a collection of columns, each of which has its own data type, such as string or numeric.

In our view, the rename() method is the best approach, because we supply only the columns we wish to rename, in dictionary (key, value) format. Assigning to the columns attribute is the most straightforward technique, but its main disadvantage is that we must pass names for all the columns even if we only want to rename a few. Another helpful option is to rename columns as the CSV file is being read. Finally, columns.str.replace() is the best choice only when we wish to replace some characters with other characters.
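
Renaming while reading a CSV file is not shown in the examples above, so here is a minimal sketch. It assumes the file has a header row; the CSV text is made up and read from an in-memory string so that the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path
csv_text = "u20,wsl,msl\nIndia,England,Pakistan\nSouth Africa,India,India\n"

# header=0 marks the first row as the existing header,
# and names= replaces those header names with our own
df = pd.read_csv(io.StringIO(csv_text), header=0, names=["juniors", "WSL", "MSL"])

print(list(df.columns))
```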
