The ability to sort datasets is one of pandas’ most appealing features. Sorting brings the most relevant rows to the top (or bottom) of your table. The basics take little effort to learn; the real power shows up when you sort by several columns or use sort keys.
Sorting a pandas DataFrame by column arranges its rows in ascending or descending order according to the values in one or more columns.
Sort DataFrame by column in Pandas
To sort a DataFrame by column values, use pandas.DataFrame.sort_values(). The syntax for sorting pandas by column is as follows:
YourDataFrame.sort_values('your_column_to_sort')
In general, to sort by column values, call pandas.DataFrame.sort_values(by, ascending=True), passing a column name or a list of column names as by and either True or False as ascending.
While sorting with sort_values() is simple in theory, in practice you may run into issues with missing values or custom labels (for example, H, L, M for High, Low, and Medium). Suppose print(df) initially shows:
    A   B
2   5   7
3   3   6
4  10  11
5   7   4
A_sorted = df.sort_values(["A", "B"], ascending=True)
print(A_sorted)
    A   B
3   3   6
2   5   7
5   7   4
4  10  11
The example above sorts by column “A” in ascending order, using column “B” to break ties.
B_sorted = df.sort_values(["B", "A"], ascending=False)
print(B_sorted)
    A   B
4  10  11
2   5   7
3   3   6
5   7   4
The example shown above sorts by column “B” in descending order, using column “A” to break ties.
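The snippets above assume a df that already exists. Here is a minimal self-contained sketch of the same two sorts. The frame is an assumed reconstruction: the index labels 2 to 5 are inferred from the sorted output shown above.

```python
import pandas as pd

# Assumed reconstruction of the example frame; index labels 2..5 are
# inferred from the sorted output, not taken from the original code.
df = pd.DataFrame({"A": [5, 3, 10, 7], "B": [7, 6, 11, 4]}, index=[2, 3, 4, 5])

# Sort by A ascending; B is only consulted to break ties in A.
A_sorted = df.sort_values(["A", "B"], ascending=True)
print(A_sorted)

# Sort by B descending; A is only consulted to break ties in B.
B_sorted = df.sort_values(["B", "A"], ascending=False)
print(B_sorted)
```

Because this data has no ties, the second column in each list never changes the result; it would only matter if two rows shared the same value in the first column.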
Sorting Pandas
Let’s look at the many parameters you can pass to pd.DataFrame.sort_values():
by – You can sort by a single name or a list of names. It could be the names of columns or indexes. When you want to sort by multiple columns, pass a list of names.
axis (Default: ‘index’ or 0)
This is the axis to be sorted. Specify whether you want pandas to sort the rows (axis=’index’ or 0) or the columns (axis=’columns’ or 1).
ascending (Default: True) – Pass a single boolean (True or False) or, if you’re sorting by multiple columns, a list of booleans ([True, False]) with one entry per column.
inplace (Default: False)
If True, the sort is performed in place: your current DataFrame is overwritten with the new sort order and the method returns None. If False, a new sorted DataFrame is returned and the original is left unchanged. When working with the DataFrame later in the code, we usually use inplace=True. However, if we’re just visually inspecting the sort order, we’ll use inplace=False.
kind (Default: ‘quicksort’)
Select the sorting algorithm you want to use: ‘quicksort’, ‘mergesort’, ‘heapsort’, or ‘stable’. Unless you’re dealing with very large datasets, this won’t matter much. Even then, you’d have to understand the differences and trade-offs between the algorithms.
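One case where the algorithm choice is visible even on small data is tie handling. The sketch below uses made-up data for illustration. In the pandas API this parameter is named kind, and a stable algorithm such as 'mergesort' keeps rows with equal sort values in their original relative order.

```python
import pandas as pd

# Hypothetical data: several rows share the same "priority".
tasks = pd.DataFrame({
    "task": ["write", "test", "review", "deploy", "plan"],
    "priority": [2, 1, 2, 1, 2],
})

# With a stable algorithm, tied rows keep their original relative order.
stable = tasks.sort_values("priority", kind="mergesort")
print(stable)
```

Within each priority group, the tasks appear in the same order as in the original frame, which is what you want when chaining several sorts one after another.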
na_position (Default: ‘last’)
You can tell pandas where you want your NaNs to go (if you have any): at the beginning (‘first’) or at the end (‘last’).
ignore_index (Default: False)
If False, each row keeps its original index label, so the index values will be out of order after sorting. This is useful when you want to check how the rows have shifted around. Set ignore_index=True if you want the index to be relabeled 0, 1, 2, 3,…, n-1.
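A small sketch of the difference, using an invented three-row frame:

```python
import pandas as pd

# Hypothetical scores used only for illustration.
scores = pd.DataFrame({"score": [30, 10, 20]})

# Default (ignore_index=False): each row keeps its original index label.
kept = scores.sort_values("score")
print(kept.index.tolist())       # [1, 2, 0]

# ignore_index=True: the result is relabeled 0, 1, ..., n-1.
relabeled = scores.sort_values("score", ignore_index=True)
print(relabeled.index.tolist())  # [0, 1, 2]
```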
Returns: DataFrame or None
DataFrame with sorted values or None if inplace=True.
key (callable, optional)
This one is an incredibly useful parameter! You can supply a function as the key; it produces a derived value from your column or row, and that derived value is what gets sorted on. The key function is applied to the values before sorting. It is comparable to the key parameter of the built-in sorted() function, except that the key function should be vectorized: it should expect and return a Series with the same shape as the input. It is applied independently to each column you sort by.
Take a look at the sample below.
Let’s say you wanted to sort by a column’s absolute value. You could create a derived column holding the absolute values and sort by that, but it feels inconvenient. Instead, use a key function to sort your column by absolute values.
Let’s start by making a dataframe.
# importing pandas library
import pandas as pd

# creating a DataFrame from a dict of lists
df = pd.DataFrame.from_dict({
    "San Francisco": [67, 72, 49, 56],
    "Chicago": [102, 75, 80, -3],
    "Fairbanks": [45, 5, -10, 80],
    "Miami": [67, 87, 90, 75]
})
df
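To make the comparison between the two approaches concrete, here is a sketch that sorts the city temperatures by the absolute value of the Fairbanks column both ways. The helper column name abs_fairbanks is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame.from_dict({
    "San Francisco": [67, 72, 49, 56],
    "Chicago": [102, 75, 80, -3],
    "Fairbanks": [45, 5, -10, 80],
    "Miami": [67, 87, 90, 75],
})

# Option 1: add a derived helper column, sort by it, then drop it again.
via_column = (
    df.assign(abs_fairbanks=df["Fairbanks"].abs())
      .sort_values("abs_fairbanks")
      .drop(columns="abs_fairbanks")
)

# Option 2: pass a vectorized key function; no helper column is needed.
via_key = df.sort_values("Fairbanks", key=lambda s: s.abs())

print(via_key)
```

Both options give the same row order here; the key version simply avoids creating and removing the temporary column.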
Ascending vs. Descending Pandas
You’ll need to decide whether you want your values sorted from highest to lowest (descending) or from lowest to highest (ascending).
- Ascending = The lowest values will appear first or at the top.
- Descending = The highest values will appear first or at the top.
A Jupyter notebook that demonstrates the various ways to sort a pandas DataFrame can be found here.
import pandas as pd
Pandas Sort Values
sort_values() allows you to sort a DataFrame (or Series) by a column or row. Consider the following examples:
- Sort DataFrame by a single column
- Sort DataFrame by multiple columns
- Sort DataFrame by a single row
- Apply a key to sort on – Example: Sort by absolute value
Let’s start by making our DataFrame of city temperatures.
df = pd.DataFrame.from_dict({
    "San Francisco": [67, 72, 49, 56],
    "Chicago": [102, 75, 80, -3],
    "Fairbanks": [45, 5, -10, 80],
    "Miami": [67, 87, 90, 75]
})
df
Sorting a DataFrame by a single column
To sort a DataFrame by a single column, call YourDataFrame.sort_values('your_column').
Let’s sort our DataFrame by temperatures in Chicago in this case.
df.sort_values('Chicago')
Notice how the DataFrame was ordered from lowest to highest by the Chicago column. That is because ascending=True is the default sort order. Set ascending=False if you wish to reverse the order.
df.sort_values('Chicago', ascending=False)
Sorting DataFrame by multiple columns
We’re going to make another DataFrame that will work better for sorting by several columns.
df = pd.DataFrame.from_dict({
    "500 Club": ["Bar", 34.64],
    "Liho Liho": ["Restaurant", 200.45],
    "Foreign Cinema": ["Restaurant", 180.45],
    "The Square": ["Bar", 45.54]
}, orient='index', columns=['Type', 'AvgBill'])
df
Suppose we want to arrange ‘Type’ alphabetically (so that Bar is above Restaurant) and then sort AvgBill in descending order (highest to lowest) within each type. We’ll need to do the following to accomplish this.
Use the “by=” argument to specify a list of column names. The “ascending” parameter takes a list of booleans that informs pandas which columns we want ascending or descending.
df.sort_values(by=['Type', 'AvgBill'], ascending=[True, False])
Notice how we sorted the first column, ‘Type,’ ascending=True, then ascending=False for the second column, ‘AvgBill.’
# importing pandas library
import pandas as pd

# Initializing the nested list with the data set
age_list = [['Myanmar', 1952, 8425333, 'Asia'],
            ['Austria', 1957, 9712569, 'Oceania'],
            ['Canada', 1962, 76039390, 'Americas'],
            ['South Korea', 1957, 637408000, 'Asia'],
            ['Germany', 1957, 44310863, 'Europe'],
            ['Vietnam', 1952, 3.72e+08, 'Asia'],
            ['Mexico', 1957, 0, 'Americas']]

# creating a pandas dataframe
df = pd.DataFrame(age_list, columns=['Country', 'Year', 'Population', 'Continent'])

# Sorting by column "Population", putting missing values (if any) first
df.sort_values(by=['Population'], na_position='first')

# Sorting by column "Country" and then "Continent"
df.sort_values(by=['Country', 'Continent'])
Sorting the DataFrame by a single row
Let’s move on to the row side now. Only on rare occasions do we want to sort the columns in a particular order (we prefer tall tables to wide tables, so this doesn’t happen often). To accomplish this, we need to tell pandas that we want to sort along the columns and which row to sort by. Let’s go back to the city temperature DataFrame we started with. We’re going to sort by the row with index label 3, so we’ll need to set axis=1.
df.sort_values(by=3, axis=1)
Our DataFrame’s columns have now been sorted in ascending order by the values in the row with index 3!
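Since df was reassigned in the previous section, here is a self-contained sketch of the row sort that rebuilds the city temperature frame first. The variable names temps and by_row are chosen for illustration.

```python
import pandas as pd

# Rebuild the city temperature frame from the start of the article.
temps = pd.DataFrame({
    "San Francisco": [67, 72, 49, 56],
    "Chicago": [102, 75, 80, -3],
    "Fairbanks": [45, 5, -10, 80],
    "Miami": [67, 87, 90, 75],
})

# axis=1 reorders the columns; by=3 picks the row (index label 3) whose
# values decide the new column order.
by_row = temps.sort_values(by=3, axis=1)
print(by_row.columns.tolist())
# ['Chicago', 'San Francisco', 'Miami', 'Fairbanks']
```

Row 3 holds -3, 56, 75, and 80 for Chicago, San Francisco, Miami, and Fairbanks respectively, which is why the columns end up in that order.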
Sorting a Column with a Key Function
In this case, we wish to sort a column by the absolute value of its contents. Check out Fairbanks: currently, -10 is the lowest value, but we’ll order by absolute value so that 5 is at the top.
What’s going on under the hood?
Pandas applies the key function to the column’s values before sorting. The result of that function is the key that the rows are sorted on.
df.sort_values(by='Fairbanks', key=pd.Series.abs)
Now, look at how we sorted on Fairbanks, with the smallest absolute values at the top (ascending) and the value 5 placed above the value -10. That is because we specified the column’s absolute value as the key!
If you’re sorting by multiple columns, you can’t pass a different key function for each one. The same key function is applied to every column you sort by.
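Because the one key function is called once per column, a possible workaround is to branch on the name of the Series it receives. This is only a sketch: it assumes pandas passes each sort column to the key as a Series whose name is the column label, and the frame, column names, and the mixed_key helper are all made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "group": ["b", "A", "a", "B"],
    "value": [-3, 2, -1, 1],
})

def mixed_key(col):
    # Called once for each column listed in `by`.
    # Assumption: col.name is the label of the column being sorted.
    if col.name == "group":
        return col.str.lower()  # compare group labels case-insensitively
    return col.abs()            # compare numbers by magnitude

result = df.sort_values(by=["group", "value"], key=mixed_key)
print(result)
```

The rows are grouped as a/A then b/B, and within each group they are ordered by the magnitude of value.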
How to handle Missing Values when Sorting
Because missing values (NaN) are not comparable to other values, sort_values() by default places the NaN rows at the end of the DataFrame. For example, let’s modify an existing DataFrame to add a NaN and then sort on the age column.
import numpy as np
import pandas as pd

name = ['Edith', 'Mike', 'Thomas', 'Hans', 'Joy', 'Ann', 'Cyrillah']
age = [18, 28, 70, 34, 20, 85, 21]
height = [133, 183, 141, 172, 199, 122, 201]
weight = [90, 48, 70, 59, 86, 95, 63]
shirt_size = ['S', 'M', 'M', 'L', 'S', 'L', 'L']

# DataFrame
df = pd.DataFrame.from_dict({"name": name, "age": age, "height": height,
                             "weight": weight, "shirt_size": shirt_size})
df.head(10)

# cast age to float so the column can hold NaN, then blank out one value
df['age'] = df['age'].astype(float)
df.loc[5, 'age'] = np.nan
df

# sort on the age column; the NaN row goes last by default
df.sort_values('age')
Sorting a pandas DataFrame by placing the missing values first
# importing pandas library
import pandas as pd

# Initializing the nested list with the data set
age_list = [['Myanmar', 1952, 8425333, 'Asia'],
            ['Austria', 1957, 9712569, 'Oceania'],
            ['Canada', 1962, 76039390, 'Americas'],
            ['South Korea', 1957, 637408000, 'Asia'],
            ['Germany', 1957, 44310863, 'Europe'],
            ['Vietnam', 1952, 3.72e+08, 'Asia'],
            ['Mexico', 1957, 0, 'Americas']]

# creating a pandas dataframe
df = pd.DataFrame(age_list, columns=['Country', 'Year', 'Population', 'Continent'])

# Sorting by column "Population", putting missing values (if any) first
df.sort_values(by=['Population'], na_position='first')

# The same option works on any column. For the city temperature
# DataFrame (not this one), the call would be:
# df.sort_values(by=['Fairbanks'], na_position='first')
Example: Sorting a pandas DataFrame by placing the missing values first
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first col': ['A', 'A', 'B', np.nan, 'D', 'C'],
    'second col': [2, 1, 9, 8, 7, 4],
    'third col': [0, 1, 9, 4, 2, 3],
    'fourth col': ['a', 'B', 'c', 'D', 'e', 'F']
})
df.sort_values(by='first col', ascending=False, na_position='first')
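One detail worth checking for yourself is that the position of the NaN rows is controlled by na_position alone and does not flip when you reverse the sort direction. A small sketch, using only the first column of the example above (the variable names are chosen for illustration):

```python
import numpy as np
import pandas as pd

letters = pd.DataFrame({"first col": ["A", "A", "B", np.nan, "D", "C"]})

# Descending sort with the default na_position='last': NaN stays at the end.
nan_last = letters.sort_values("first col", ascending=False)

# Descending sort with na_position='first': NaN moves to the top.
nan_first = letters.sort_values("first col", ascending=False, na_position="first")

print(nan_last)
print(nan_first)
```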
Natural sort with the key argument, using natsort
import numpy as np
import pandas as pd
from natsort import index_natsorted  # third-party package

df = pd.DataFrame({
    "time": ['0hr', '128hr', '72hr', '48hr', '96hr'],
    "value": [10, 20, 30, 40, 50]
})
df

df.sort_values(
    by="time",
    key=lambda x: np.argsort(index_natsorted(df["time"]))
)
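If you would rather not install natsort, a similar result for this particular data can be sketched with pandas string methods alone. This swaps in a different technique: it assumes every label starts with a run of digits, which holds for the time column here but not for natural sorting in general.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["0hr", "128hr", "72hr", "48hr", "96hr"],
    "value": [10, 20, 30, 40, 50],
})

# Key: pull out the digits in each label and compare them as integers.
natural = df.sort_values(
    by="time",
    key=lambda s: s.str.extract(r"(\d+)", expand=False).astype(int),
)
print(natural["time"].tolist())
# ['0hr', '48hr', '72hr', '96hr', '128hr']
```

A plain string sort would put '128hr' before '48hr'; converting the numeric part to an integer is what fixes the order.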
Modifying Your DataFrame Using Sort Methods
Both .sort_values() and .sort_index() have returned DataFrame objects in all of the examples you’ve seen so far. That is because sorting in pandas does not operate in place by default. Because it creates a new DataFrame instead of changing the original, this is the most common and preferred way to analyze data with pandas. It allows you to keep the data in the same state as when it was read from the file.
However, you can directly edit the original DataFrame by setting the optional argument inplace to True. The inplace parameter is present in the majority of pandas methods. In the examples below, you’ll learn how to use inplace=True to sort your DataFrame.
In-place use of .sort_values()
When inplace is set to True, the original DataFrame is modified and the sort methods return None. Sort your DataFrame by the values of the city08 column as in the first example, but with inplace set to True:
import pandas as pd

column_subset = [
    "id", "make", "model", "year", "cylinders",
    "fuelType", "trany", "mpgData", "city08", "highway08"
]

df = pd.read_csv(
    "https://www.fueleconomy.gov/feg/epadata/vehicles.csv",
    usecols=column_subset,
    nrows=10
)
df.head()

df.sort_values("city08", inplace=True)
As you can see, .sort_values() does not return a DataFrame. Inspecting the original df shows that its data are now sorted in ascending order by the city08 column. Your original DataFrame has been altered, and the changes will remain in place. Because you can’t undo the changes to your DataFrame, it’s generally best to avoid using inplace=True for analysis.
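The return value is easy to verify without downloading anything. The sketch below uses a tiny stand-in frame with invented values; only the column name city08 is borrowed from the example above.

```python
import pandas as pd

# Small stand-in frame; the numbers are invented for illustration.
cars = pd.DataFrame({"city08": [23, 9, 19]})

returned = cars.sort_values("city08", inplace=True)

print(returned)                 # None
print(cars["city08"].tolist())  # [9, 19, 23]
```

A common mistake follows directly from this: writing cars = cars.sort_values("city08", inplace=True) replaces your DataFrame with None.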
In-Place use of .sort_index()
The following example shows how to use inplace with .sort_index(). Because the index was created in ascending order when you read your file into a DataFrame, you can use it to restore the original order of your df object. To modify the DataFrame, use .sort_index() with inplace set to True:
df.sort_index(inplace=True)
df
.sort_index() has now been used to modify your DataFrame once more. Because your DataFrame still has its default index, sorting it in ascending order restores the original order of the data.
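The round trip can be sketched on a small stand-in frame (invented values, default integer index):

```python
import pandas as pd

cars = pd.DataFrame({"city08": [23, 9, 19]})  # default index 0, 1, 2
original_order = cars["city08"].tolist()

cars.sort_values("city08", inplace=True)  # rows reordered, index is now 1, 2, 0
cars.sort_index(inplace=True)             # index back to 0, 1, 2

print(cars["city08"].tolist() == original_order)  # True
```

This only works while the original index labels are still attached to the rows; after sorting with ignore_index=True or calling reset_index(drop=True), the old order can no longer be recovered this way.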
If you’re familiar with Python’s built-in sorted() function and the list method .sort(), the inplace parameter in the pandas sort methods will feel familiar. Check out How to Use sorted() and sort() in Python for additional information.
Conclusion
This article covered the main ways of using pandas’ sort_values() to sort a DataFrame. Sorting a DataFrame in a given order, such as ascending or descending, is very easy with the built-in method sort_values()!
You now know how to use the pandas library’s .sort_values() and .sort_index() methods. With this understanding, you can perform fundamental data analysis on a DataFrame. While there are many similarities between these two methods, understanding the differences allows you to choose which one to use for various analytical tasks.
These techniques are an essential aspect of mastering data analysis. They’ll assist you in laying a solid foundation for doing more sophisticated pandas operations. The pandas documentation is an excellent resource if you want to see some examples of more advanced uses of pandas sort methods.
You can give these a shot and tell us what other techniques you use to sort your Pandas DataFrame.