
Changing Index in Pandas explained with examples

In a Pandas DataFrame, each row is identified by its Index label, which is simply a name for that row. If we don’t specify index values when creating the DataFrame, Pandas uses default values: the integers from 0 to n-1, where n is the number of rows.
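To see the default index in action, here is a minimal sketch that builds a small DataFrame without passing an index and then inspects `df.index`. The three-row DataFrame is only illustrative.

```python
import pandas as pd

# Build a DataFrame without passing an index argument
df = pd.DataFrame({"name": ["Green", "Bright", "Mike"]})

# pandas fills in a default RangeIndex covering 0, 1, ..., n-1
print(df.index)          # RangeIndex(start=0, stop=3, step=1)
print(list(df.index))    # [0, 1, 2]
```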

The set_index() function sets the DataFrame Index (row labels) from one or more existing columns or from arrays of the correct length. The new Index can either replace the current Index or be appended to it. We will use set_index() to modify the row index of DataFrames, whether we construct them ourselves or they already carry the default index.

This article covers a DataFrame’s index and how to add index levels to an existing DataFrame. We will also see how set_index() lets us replace the default integer index that the DataFrame constructor creates for each row. The core of this is understanding the syntax of set_index() and how to use it to set the row index of a DataFrame from lists, Series, and columns.

How to set a Column as the Index in Pandas

Using the Pandas set_index() method, we can turn one of the DataFrame’s columns into its Index. Let’s examine the syntax of set_index() to better understand how it works.

The syntax of DataFrame.set_index() is as follows:

DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Explanation of the parameters mentioned in the syntax above:

keys

This can be a single column key, an array of the same length as the calling DataFrame, or a list combining any of these. Series, Index, np.ndarray, and Iterator instances all count as an “array” here.

Its type is a label, an array-like, or a list of labels/arrays. This parameter is required and has no default value.

drop

A boolean value whose default is True. If True, the columns used as the new Index are deleted from the DataFrame’s columns. This parameter is optional.

append

A boolean value whose default is False. If True, the new index levels are appended to the existing Index instead of replacing it. This parameter is optional.

inplace

A boolean value whose default is False. If True, the existing DataFrame is modified in place (no new object is created) and the method returns None. This parameter is optional.

verify_integrity

A boolean value whose default is False. If True, the new Index is checked for duplicate labels and a ValueError is raised if any are found. Otherwise, the check is postponed until it is needed. Leaving it False makes this method run faster. This parameter is optional.
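The optional parameters above are easiest to grasp side by side. Below is a minimal sketch on a small, hypothetical three-row employee table showing what drop=False, append=True, and verify_integrity=True each change.

```python
import pandas as pd

df = pd.DataFrame({"reg_id": ["EMP001", "EMP002", "EMP003"],
                   "emp_name": ["Green", "Bright", "Mike"],
                   "height": [5.9, 6.2, 5.6]})

# drop=False: reg_id becomes the index but also stays as a column
kept = df.set_index("reg_id", drop=False)
print(kept.columns.tolist())       # ['reg_id', 'emp_name', 'height']

# append=True: emp_name is added as a second index level
# instead of replacing the existing reg_id index
nested = df.set_index("reg_id").set_index("emp_name", append=True)
print(list(nested.index.names))    # ['reg_id', 'emp_name']

# verify_integrity=True: duplicate labels are rejected with a ValueError
dupes = pd.DataFrame({"dept": ["HR", "HR", "IT"], "staff": [4, 2, 7]})
try:
    dupes.set_index("dept", verify_integrity=True)
    duplicate_found = False
except ValueError as err:
    duplicate_found = True
    print("Rejected:", err)

# Without the check, duplicate index labels are allowed
print(dupes.set_index("dept").index.tolist())   # ['HR', 'HR', 'IT']
```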

First, make a DataFrame:

# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
                         'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
                         'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
                         'height': [5.9, 6.2, 5.6, 5.8, 5.10]})
# display dataframe
print(Employee)

The set_index() method

To update the index values, we use the set_index() method, which sets the DataFrame’s row labels.

The syntax is as follows:

DataFrameName.set_index("column_name_to_set_as_index", inplace=True/False)

where,

The inplace parameter accepts True or False and determines whether the original DataFrame is changed. With inplace=True, the DataFrame itself is modified, so the change is permanent, and the method returns None. With inplace=False, the original DataFrame is left unchanged and a new DataFrame with the updated Index is returned, so the change is only temporary unless you assign the result to a variable. By default, inplace is False.

# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
                         'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
                         'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
                         'height': [5.9, 6.2, 5.6, 5.8, 5.10]})


# temporarily putting the registration id as the Index:
# set_index() returns a new DataFrame and leaves Employee unchanged
print(Employee.set_index("reg_id"))

However, because the result was neither assigned to a variable nor applied in place, Employee itself was not changed. If you display the DataFrame with the following command, it still has its original default index.

print(Employee)

Because we didn’t specify the inplace parameter, set_index() used its default of False and returned a new DataFrame without modifying Employee. Now let’s change the Index permanently by passing inplace=True to set_index().

# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
                         'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
                         'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
                         'height': [5.9, 6.2, 5.6, 5.8, 5.10]})


# permanently putting the registration id as the Index
Employee.set_index("reg_id",inplace=True)

print(Employee)
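The difference between the two modes can be checked directly. This is a minimal sketch on a shortened version of the Employee table: with the default inplace=False you get a new object back, while inplace=True changes Employee and returns None. Reassigning the result, as in Employee = Employee.set_index("reg_id"), has the same effect as inplace=True.

```python
import pandas as pd

Employee = pd.DataFrame({"reg_id": ["EMP001", "EMP002", "EMP003"],
                         "emp_name": ["Green", "Bright", "Mike"]})

# inplace=False (default): a new DataFrame is returned, Employee is untouched
indexed = Employee.set_index("reg_id")
before_inplace = Employee.index.tolist()
print(before_inplace)             # [0, 1, 2]
print(indexed.index.tolist())     # ['EMP001', 'EMP002', 'EMP003']

# inplace=True: Employee itself changes and the call returns None
result = Employee.set_index("reg_id", inplace=True)
print(result)                     # None
print(Employee.index.tolist())    # ['EMP001', 'EMP002', 'EMP003']
```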

If you want to display only specific columns rather than all of them, select them by name after setting the Index:

# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
                         'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
                         'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
                         'height': [5.9, 6.2, 5.6, 5.8, 5.10]})


# permanently putting the registration id as the Index
Employee.set_index("reg_id",inplace=True)

# displaying only the selected columns (reg_id remains the Index)
print(Employee[["emp_name", "height"]])

How to reset the Index: reset_index()

Suppose we want to undo what we’ve done so far and stop using a particular column as the Index of our DataFrame. In this situation, we can use the reset_index() method, as the following examples show:

# move the Index back into a regular column
# (use when that column is no longer among the DataFrame's columns)
Employee.reset_index()
# discard the Index instead
# (use when its values already exist as a column)
Employee.reset_index(drop=True)

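In short, reset_index() moves the current Index back into the DataFrame as a regular column and replaces it with the default 0 to n-1 integer index. If a column with the same name already exists, pandas raises a ValueError, so you must pass drop=True to discard the Index instead of reinserting it.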
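Here is a minimal sketch of both cases on a two-row, illustrative table: a plain reset_index() restores reg_id as a column, drop=True throws the Index away, and a name clash (caused here by an earlier set_index(..., drop=False)) makes the plain call fail.

```python
import pandas as pd

df = pd.DataFrame({"reg_id": ["EMP001", "EMP002"],
                   "emp_name": ["Green", "Bright"]})
indexed = df.set_index("reg_id")

# Default: reg_id becomes a column again and a fresh 0..n-1 index is created
restored = indexed.reset_index()
print(restored.columns.tolist())    # ['reg_id', 'emp_name']

# drop=True: the Index is discarded instead of being inserted as a column
discarded = indexed.reset_index(drop=True)
print(discarded.columns.tolist())   # ['emp_name']

# With drop=False earlier, reg_id is both the Index and a column,
# so a plain reset_index() cannot insert a second reg_id column
both = df.set_index("reg_id", drop=False)
try:
    both.reset_index()
    clash_raised = False
except ValueError as err:
    clash_raised = True
    print("ValueError:", err)

# drop=True sidesteps the clash
print(both.reset_index(drop=True).columns.tolist())   # ['reg_id', 'emp_name']
```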

Example: How to set the month column as the Index

import numpy as np
import pandas as pd

month_df = pd.DataFrame({'month': [4, 7, 10, 12],
                   'year': [2019, 2021, 2020, 2021],
                   'no_of_sales': [73, 58, 103, 49]})
print(month_df)

The goal is to set the ‘month’ column of the DataFrame above as its Index.

month_df.set_index('month')

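Alternatively, you can use the columns “year” and “month” together to create a MultiIndex:

month_df.set_index(['year', 'month'])

You can also combine a pandas Index object with the “year” column to create a MultiIndex:

month_df.set_index([pd.Index([2, 3, 4, 5]), 'year'])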

The other option is to use two Series to create a MultiIndex:

n_series = pd.Series([2, 3, 4, 5])
month_df.set_index([n_series, n_series**2])
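To check what each MultiIndex form actually produces, you can inspect the resulting index as a list of tuples. This minimal sketch rebuilds month_df so it runs on its own. Note that when the levels come from outside Series, no columns are consumed, so all original columns remain.

```python
import pandas as pd

month_df = pd.DataFrame({'month': [4, 7, 10, 12],
                         'year': [2019, 2021, 2020, 2021],
                         'no_of_sales': [73, 58, 103, 49]})

# Two columns as levels: each row label becomes a (year, month) pair,
# and only no_of_sales is left as a column
by_year_month = month_df.set_index(['year', 'month'])
print(by_year_month.index.tolist())
print(by_year_month.columns.tolist())   # ['no_of_sales']

# Two Series as levels: the labels come from the Series values,
# so every original column stays in the DataFrame
n_series = pd.Series([2, 3, 4, 5])
from_series = month_df.set_index([n_series, n_series**2])
print(from_series.index.tolist())
print(from_series.columns.tolist())     # ['month', 'year', 'no_of_sales']
```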

Example: Setting the DataFrame’s Index with Python’s range()

Suppose we want the DataFrame’s row index to start at a number other than 0. For instance, we want the employee DataFrame’s ID numbers to begin at 1. Rather than typing out a list of every number and passing it to set_index(), we can use Python’s range() function to generate the sequence, wrap it in a Pandas Index, and pass that to DataFrame.set_index(). Let’s create a DataFrame and use range() to change its row index.

import pandas as pd

emp_df = pd.DataFrame({
    "f_name": ["Tom", "White", "Mike", "Green", "Tyson"],
    "position": [3, 5, 6, 2, 4],
    "commission": [1500, 1300, 1300, 1699, 1400],
    "net_pay": [3200, 3600, 3000, 3200, 3300]
})

print(emp_df)

When we created the DataFrame, we used the columns “f_name,” “position,” “commission,” and “net_pay.” Let’s replace the default integer index with one generated by the range() function. range() produces a sequence of integers that, by default, starts at 0, increases by 1, and stops just before the given end value.

index_val = pd.Index(range(1, 6, 1))

emp_df = emp_df.set_index(index_val)
print(emp_df)

We defined the index range to start at 1, increase by 1, and stop before 6, which yields the labels 1 to 5. We then passed the index_val variable to set_index() to make it the row index of our DataFrame.
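A slightly more robust variant is sketched below, under the assumption that you want the labels to track the number of rows: it derives the stop value from len(emp_df) instead of hard-coding 6, and gives the index a name. The name "emp_no" is hypothetical. The sketch also shows assigning to the .index attribute directly, which is a different technique from set_index() but yields the same labels.

```python
import pandas as pd

emp_df = pd.DataFrame({"f_name": ["Tom", "White", "Mike", "Green", "Tyson"],
                       "net_pay": [3200, 3600, 3000, 3200, 3300]})

# range(start, stop): stop is excluded, so len(emp_df) + 1 covers every row
# "emp_no" is a hypothetical index name chosen for this example
index_val = pd.Index(range(1, len(emp_df) + 1), name="emp_no")
numbered = emp_df.set_index(index_val)
print(numbered.index.tolist())    # [1, 2, 3, 4, 5]

# Alternative technique: assign to the .index attribute directly
emp_df.index = range(1, len(emp_df) + 1)
print(emp_df.index.tolist())      # [1, 2, 3, 4, 5]
```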

Example: Using Multiple Columns to Set the DataFrame’s Index

A MultiIndex DataFrame in Python Pandas is one whose row (or column) index has more than one level. We can use several columns as row labels by passing them to the DataFrame.set_index() function. Keep in mind that each additional index level makes the DataFrame more complex to work with.
There are various ways to structure the Index. We’ll demonstrate a straightforward way to set multiple columns as the index. Let’s start by making a DataFrame.

import pandas as pd

emp_df = pd.DataFrame({
    "id": ["emp_1", "emp_2", "emp_3", "emp_4", "emp_5"],
    "name": ["Mikeson", "Jonathan", "White", "Bright", "Nathan"],
    "department": ["operations", "human resource", "sales", "marketing", "information technology"],
    "dep_code": ["OP", "HR", "S", "M", "IT"]
})

Our DataFrame consists of four columns – “id”, “name”, “department”, and “dep_code”.

To display it, run the following command.

print(emp_df)

From these columns, we choose the ones that should serve as the DataFrame’s index. Here we pass a list containing two column labels to the set_index() function.

emp_df = emp_df.set_index(['id', 'dep_code'])
print(emp_df)

The columns “id” and “dep_code” become the DataFrame’s row index. We set them as the index by passing their names to set_index() in the list ['id', 'dep_code']. As the output shows, “id” and “dep_code” now form a two-level index, while “name” and “department” remain regular columns.
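To confirm which columns became index levels and which stayed as data, you can inspect index.names and columns. Rows of a MultiIndex DataFrame can then be looked up with a tuple holding one label per level. This minimal sketch uses a three-row subset of the employee data.

```python
import pandas as pd

emp_df = pd.DataFrame({
    "id": ["emp_1", "emp_2", "emp_3"],
    "name": ["Mikeson", "Jonathan", "White"],
    "department": ["operations", "human resource", "sales"],
    "dep_code": ["OP", "HR", "S"],
})
emp_df = emp_df.set_index(["id", "dep_code"])

# The index levels come from the columns passed to set_index()
print(list(emp_df.index.names))   # ['id', 'dep_code']
# The remaining columns stay as ordinary data columns
print(emp_df.columns.tolist())    # ['name', 'department']

# Look up a single value with a tuple of labels, one per index level
print(emp_df.loc[("emp_2", "HR"), "name"])   # Jonathan
```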

Conclusion

Pandas automatically assigns a default integer “index” when we construct a DataFrame or import a dataset. In this article, we’ve seen how to set a Pandas DataFrame’s Index using existing columns, lists, Series, or a range of numbers, and how to restore the default index with reset_index(). We’ve also covered several common scenarios in which new row labels must be assigned or current ones modified.

A DataFrame is the tabular data structure in the Pandas package. Each row and column is represented by a label: a column label is the column’s header, whereas an index label is a row label. When creating a DataFrame, Pandas by default designates a range of numbers starting at 0 as the row index, and each row is identified by its index label.
