How to sort a Pandas DataFrame by multiple columns in Python?

Sorting arranges the rows of a DataFrame according to the values in one or more columns, either alphabetically or numerically. This article shows how to sort a Pandas DataFrame by multiple columns with sort_values(), and also covers the related sort_index() and nlargest() methods.

Sort DataFrame by One or More Columns Syntax

Syntax: df_name.sort_values(by=column_name, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

Parameters:

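  • by: Name of the column, or list of column names, to sort by.
  • axis: Axis to be sorted (0 or ‘index’, 1 or ‘columns’). The default is 0, which sorts the rows.
  • ascending: Sort ascending (True) or descending (False). Pass a list of bool values to give each column its own sort order; the list must have the same length as ‘by’. The default is True.
  • inplace: The default is False, which returns a new sorted DataFrame. If True, the DataFrame itself is modified and None is returned.
  • kind: Choice of sorting algorithm: ‘quicksort’, ‘mergesort’, ‘heapsort’ or ‘stable’. The default is ‘quicksort’. This option only applies when sorting by a single column.
  • na_position: ‘first’ puts NaN values at the beginning, ‘last’ puts them at the end. The default is ‘last’.
  • ignore_index: If True, the result is relabelled 0, 1, …, n-1 instead of keeping the original index labels. The default is False.
  • key: Optional function applied to each sort column before sorting. The default is None.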
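
The `key` parameter from the syntax line is easiest to understand with a small example. The sketch below uses a made-up `people` frame with mixed-case names (an assumption for illustration, not the article's data) and passes a function that lowercases the column before it is compared, which should give a case-insensitive sort.

```python
import pandas as pd

# Hypothetical data for illustration only: names with mixed capitalisation.
people = pd.DataFrame({'Name': ['raj', 'Akhil', 'Sonum', 'divya'],
                       'Age': [20, 22, 21, 17]})

# Default sort compares the raw strings, so uppercase letters sort
# before lowercase ones.
default_order = people.sort_values(by='Name')

# key receives the 'Name' column as a Series and must return a Series of
# the same length; here it lowercases the values before comparison.
case_insensitive = people.sort_values(by='Name',
                                      key=lambda col: col.str.lower())

print(default_order['Name'].tolist())
print(case_insensitive['Name'].tolist())
```

When several columns are listed in `by`, the function is applied to each of them in turn, so it has to be able to handle every listed column.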

Ways to Sort DataFrame by One or More Columns

There are several ways to sort a DataFrame by one or more columns. The commonly used methods are described below.

Creating a DataFrame

The code below creates a Pandas DataFrame with the columns ‘Name’, ‘Age’, and ‘Rank’. The ‘Name’ column contains names, the ‘Age’ column contains ages, and the ‘Rank’ column contains numerical values with some NaN (Not a Number) entries.

Python3




#import libraries
import numpy as np
import pandas as pd
 
# creating a dataframe
df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})
 
# printing the dataframe
df


Output:

    Name  Age  Rank
0    Raj   20   1.0
1  Akhil   22   NaN
2  Sonum   21   8.0
3  Tilak   19   9.0
4  Divya   17   4.0
5  Megha   23   NaN

Sort DataFrame by One or More Columns Using sort_values() method

Use pandas’ `sort_values()` method to easily organize a DataFrame by one or more columns, specifying column names and sorting direction with the `ascending` parameter.

Sort by Single Column

The code below sorts the DataFrame ‘df’ by the ‘Age’ column and prints the result, ‘sorted_df’. Because ascending is False, the rows are arranged in descending order of age.

Python3




# Using the sorting function
print('SORTED DATAFRAME')
sorted_df = df.sort_values(by=['Age'], ascending=False)
print(sorted_df)


Output:

SORTED DATAFRAME
    Name  Age  Rank
5  Megha   23   NaN
1  Akhil   22   NaN
2  Sonum   21   8.0
0    Raj   20   1.0
3  Tilak   19   9.0
4  Divya   17   4.0
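
sort_values() returns a new DataFrame by default and leaves the original untouched. The following sketch rebuilds the article's DataFrame and contrasts that with `inplace=True`; the names `by_age`, `df_copy` and `result` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# Default behaviour: a sorted copy is returned, df keeps its row order.
by_age = df.sort_values(by='Age')
print(df.index.tolist())      # original order
print(by_age.index.tolist())  # index labels in sorted order

# inplace=True reorders the frame itself and returns None.
df_copy = df.copy()
result = df_copy.sort_values(by='Age', inplace=True)
print(result)
print(df_copy.index.tolist())
```

Assigning the returned DataFrame to a new name is usually easier to follow than sorting in place, because the unsorted data stays available.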

Sort by Two Columns

The code below sorts the DataFrame ‘df’ by ‘Rank’ in ascending order and then by ‘Age’ in descending order, and prints the result as ‘sorted_df’. Because na_position is ‘first’, the rows whose ‘Rank’ is NaN appear at the top. Those rows tie on ‘Rank’, so they are ordered by ‘Age’ in descending order.

Python3




print('SORTED DATAFRAME')
sorted_df = df.sort_values(by = ['Rank', 'Age'], ascending = [True, False], na_position = 'first')
print(sorted_df)


Output:

SORTED DATAFRAME
    Name  Age  Rank
5  Megha   23   NaN
1  Akhil   22   NaN
0    Raj   20   1.0
4  Divya   17   4.0
2  Sonum   21   8.0
3  Tilak   19   9.0
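
The second column in `by` only matters where the first column has equal values. The sketch below makes that visible with a small made-up `scores` frame (hypothetical data, not from the article) in which the ‘Team’ column contains repeated values.

```python
import pandas as pd

# Hypothetical data for illustration: 'Team' repeats, 'Points' varies.
scores = pd.DataFrame({'Team': ['B', 'A', 'B', 'A', 'C'],
                       'Points': [10, 30, 20, 40, 10]})

# Sort teams A to Z; within each team, highest points first.
ordered = scores.sort_values(by=['Team', 'Points'],
                             ascending=[True, False])
print(ordered)
```

Within team ‘A’ the row with 40 points comes before the row with 30, and the same holds for team ‘B’, while the teams themselves stay in alphabetical order.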

Sort by Multiple Columns

In the example below, the DataFrame is sorted by ‘Name’ in descending order, with ‘Rank’ as the second sort column. The index of the result starts at 0 and runs in order because the parameter ‘ignore_index = True’ is passed. In the other examples the index appears out of order because ‘ignore_index’ is not given, so each row keeps its original label.

Python3




print('SORTED DATAFRAME')
sorted_df = df.sort_values(by=['Name', 'Rank'], axis=0,
                           ascending=[False, True],
                           inplace=False,
                           kind='quicksort', na_position='first',
                           ignore_index=True, key=None)
print(sorted_df)


Output:

SORTED DATAFRAME
    Name  Age  Rank
0  Tilak   19   9.0
1  Sonum   21   8.0
2    Raj   20   1.0
3  Megha   23   NaN
4  Divya   17   4.0
5  Akhil   22   NaN
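
To see what ‘ignore_index’ changes, it helps to compare the index labels with and without it. The sketch below rebuilds the article's DataFrame and also shows that chaining reset_index(drop=True) is expected to give the same result; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# Without ignore_index each row keeps its original label.
kept = df.sort_values(by='Age')

# With ignore_index=True the rows are relabelled 0..n-1.
renumbered = df.sort_values(by='Age', ignore_index=True)

# Equivalent two-step form: sort, then drop the old index.
two_step = df.sort_values(by='Age').reset_index(drop=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
print(renumbered.equals(two_step))
```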

Sort DataFrame by One or More Columns Using sort_index() method

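Unlike sort_values(), sort_index() does not look at the values in a column. It orders the DataFrame by its index labels: the row index by default, or the column labels when axis=1.

Syntax: df_name.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)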

Sort by Row Index

The code below sorts the DataFrame by its row index. The index is in descending order because the value of the ascending parameter is False.

Python3




print('SORTED DATAFRAME')
sorted_df = df.sort_index(ascending=False)
print(sorted_df)


Output:

SORTED DATAFRAME
    Name  Age  Rank
5  Megha   23   NaN
4  Divya   17   4.0
3  Tilak   19   9.0
2  Sonum   21   8.0
1  Akhil   22   NaN
0    Raj   20   1.0
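
A common use of sort_index() is to get back to the original row order after sorting by values, provided the original index labels were kept. A minimal sketch, using the article's DataFrame and illustrative variable names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# Sort by value; the original index labels travel with the rows.
by_age = df.sort_values(by='Age', ascending=False)
print(by_age.index.tolist())

# Sorting by index puts the rows back in their original order.
restored = by_age.sort_index()
print(restored.index.tolist())
print(restored['Name'].tolist())
```

This only works when the value sort did not use ignore_index=True, since that option discards the original labels.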

Sort by Column Labels

The code below passes axis=1, so the columns themselves are rearranged in descending alphabetical order of their labels (‘Rank’, ‘Name’, ‘Age’). The order of the rows is unchanged. The result is printed as “SORTED DATAFRAME” followed by the sorted DataFrame (`sorted_df`).

Python3




print('SORTED DATAFRAME')
sorted_df = df.sort_index(axis=1, ascending=False)
print(sorted_df)


Output:

SORTED DATAFRAME
   Rank   Name  Age
0   1.0    Raj   20
1   NaN  Akhil   22
2   8.0  Sonum   21
3   9.0  Tilak   19
4   4.0  Divya   17
5   NaN  Megha   23

Sort DataFrame by One or More Columns Using nlargest() Method

The nlargest() method returns the n rows with the largest values in the chosen column(s), ordered from largest to smallest. Passing n=len(df) asks for every row, so the result amounts to a descending sort of the whole DataFrame.

Sort by Single Column

The code below uses nlargest() to arrange the DataFrame (`df`) in descending order of the values in the ‘Rank’ column. The result is stored in the ‘sorted_df1’ variable and then printed. Rows whose ‘Rank’ is NaN appear at the end.

Python3




print('SORTED DATAFRAME')
# 'Rank' is a column of df; n=len(df) keeps every row
sorted_df1 = df.nlargest(n=len(df), columns='Rank')
print(sorted_df1)


Output :

SORTED DATAFRAME
    Name  Age  Rank
3  Tilak   19   9.0
2  Sonum   21   8.0
4  Divya   17   4.0
0    Raj   20   1.0
1  Akhil   22   NaN
5  Megha   23   NaN
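
nlargest() is more typically used with a small n to pick out the top rows. The sketch below, using the article's DataFrame and illustrative variable names, selects the three highest ‘Rank’ values and compares the result with sorting in descending order and taking the first three rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# Three rows with the largest 'Rank' values, largest first.
top3 = df.nlargest(n=3, columns='Rank')

# The same rows obtained by a full descending sort followed by head().
top3_sorted = df.sort_values(by='Rank', ascending=False).head(3)

print(top3)
print(top3['Name'].tolist() == top3_sorted['Name'].tolist())
```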

Sort by Two Columns

This example uses sort_values() to sort the DataFrame ‘df’ by two columns, ‘Age’ and then ‘Rank’, both in ascending order. Every ‘Age’ value is unique here, so ‘Rank’ never has to break a tie. The sorted DataFrame, ‘df_sorted’, is then printed.

Python3




print('SORTED DATAFRAME')
df_sorted = df.sort_values(by=['Age', 'Rank'], ascending=[True, True])
 
print(df_sorted)


Output :

SORTED DATAFRAME
    Name  Age  Rank
4  Divya   17   4.0
3  Tilak   19   9.0
0    Raj   20   1.0
2  Sonum   21   8.0
1  Akhil   22   NaN
5  Megha   23   NaN

Sort by Multiple Columns

This example uses sort_values() to sort the DataFrame ‘df’ by two columns, ‘Rank’ and then ‘Age’, both in descending order. Rows with a missing ‘Rank’ are placed last because na_position defaults to ‘last’, and those rows are ordered by ‘Age’ in descending order. The sorted DataFrame, ‘df_sorted’, is then printed.

Python3




print('SORTED DATAFRAME')
df_sorted = df.sort_values(by=['Rank', 'Age'], ascending=[False, False])
 
print(df_sorted)


Output :

SORTED DATAFRAME
    Name  Age  Rank
3  Tilak   19   9.0
2  Sonum   21   8.0
4  Divya   17   4.0
0    Raj   20   1.0
5  Megha   23   NaN
1  Akhil   22   NaN
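
The placement of NaN values is controlled by na_position and does not depend on the sort direction. A short sketch with the article's DataFrame (variable names are illustrative) shows a descending sort on ‘Rank’ with the missing values last and then first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# Descending sort; NaN rows stay at the end (na_position defaults to 'last').
desc_nan_last = df.sort_values(by='Rank', ascending=False)

# Same descending sort, but NaN rows are moved to the top.
desc_nan_first = df.sort_values(by='Rank', ascending=False,
                                na_position='first')

print(desc_nan_last['Rank'].tolist())
print(desc_nan_first['Rank'].tolist())
```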


Last Updated : 18 Dec, 2023