How to sort a Pandas DataFrame by multiple columns in Python?
Sorting is a fundamental operation that arranges the rows of a DataFrame according to specific conditions. DataFrames can be sorted alphabetically or numerically, in ascending or descending order. This article shows how to sort a Pandas DataFrame by one or more columns.
Sort DataFrame by One or More Columns Syntax
Syntax: df_name.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
Parameters:
- by: Column name or list of column names to sort by.
- axis: Axis to sort along. 0 or 'index' (the default) sorts the rows; 1 or 'columns' sorts the columns.
- ascending: Sort ascending (True) or descending (False). Pass a list of bool values to give each column its own order; the list must have the same length as 'by'. The default is True.
- inplace: If True, the DataFrame itself is sorted and the method returns None instead of a sorted copy. The default is False.
- kind: Choice of sorting algorithm: 'quicksort', 'mergesort', 'heapsort' or 'stable'. The default is 'quicksort'. This option is applied only when sorting on a single column or label.
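The syntax line above also lists na_position, ignore_index and key, which the parameter list does not describe. The following is a minimal sketch of what they do, assuming a reasonably recent pandas release (1.1 or later for key). The DataFrame 'demo' and its values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only to illustrate the three parameters.
demo = pd.DataFrame({'city': ['delhi', 'Agra', 'Pune', 'bhopal'],
                     'score': [3.0, np.nan, 1.0, 2.0]})

# na_position='first' moves missing values to the top;
# ignore_index=True relabels the result 0, 1, 2, ...
by_score = demo.sort_values(by='score', na_position='first', ignore_index=True)

# key is applied to the column before the values are compared;
# lower-casing the strings makes the sort case-insensitive.
by_city = demo.sort_values(by='city', key=lambda col: col.str.lower())

print(by_score)
print(by_city)
```

In this sketch the row with the missing score moves to the top and the index is renumbered from 0, while 'Pune' sorts after 'delhi' because the comparison ignores case.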
Ways to Sort a DataFrame by One or More Columns
There are several ways to sort a DataFrame by one or more columns. The commonly used methods are discussed below.
Creating a DataFrame
The code below creates a Pandas DataFrame with the columns ‘Name’, ‘Age’, and ‘Rank’. The ‘Name’ column contains names, the ‘Age’ column contains ages, and the ‘Rank’ column contains numerical values with some NaN (Not a Number) entries.
Python3
import numpy as np
import pandas as pd

# Sample data; 'Rank' has two missing values (NaN)
df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})
print(df)
Output:
Name Age Rank
0 Raj 20 1.0
1 Akhil 22 NaN
2 Sonum 21 8.0
3 Tilak 19 9.0
4 Divya 17 4.0
5 Megha 23 NaN
Sort DataFrame by One or More Columns Using sort_values() method
Use pandas’ `sort_values()` method to easily organize a DataFrame by one or more columns, specifying column names and sorting direction with the `ascending` parameter.
Sort by Single Column
The code below sorts the DataFrame ‘df’ by the ‘Age’ column and prints the result, ‘sorted_df’. Because ascending is False, the rows are sorted in descending order.
Python3
print('SORTED DATAFRAME')
# Sort by 'Age', largest value first
sorted_df = df.sort_values(by=['Age'], ascending=False)
print(sorted_df)
Output:
SORTED DATAFRAME
Name Age Rank
5 Megha 23 NaN
1 Akhil 22 NaN
2 Sonum 21 8.0
0 Raj 20 1.0
3 Tilak 19 9.0
4 Divya 17 4.0
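Two details of the single-column case are easy to miss: 'by' also accepts a plain string, and inplace changes what the method returns. The snippet below is a small sketch of both, rebuilding the article's sample data so it runs on its own; the names 'by_age', 'df_copy' and 'result' are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# A single column can be named with a plain string instead of a list.
by_age = df.sort_values(by='Age', ascending=False)
print(by_age.index.tolist())  # index labels of the sorted copy
print(df.index.tolist())      # the original DataFrame is untouched

# inplace=True reorders the object itself and returns None.
df_copy = df.copy()
result = df_copy.sort_values(by='Age', ascending=False, inplace=True)
print(result)
print(df_copy['Name'].tolist())
```

Assigning the return value of an inplace=True call is a common mistake, since the variable ends up holding None rather than a DataFrame.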
Sort by Two Columns
The code below sorts ‘df’ by ‘Rank’ in ascending order and, for rows with the same ‘Rank’, by ‘Age’ in descending order. Because na_position is 'first', the rows whose ‘Rank’ is NaN are placed at the top. The result is printed as ‘sorted_df’.
Python3
print('SORTED DATAFRAME')
# 'Rank' ascending, ties broken by 'Age' descending, NaN ranks first
sorted_df = df.sort_values(by=['Rank', 'Age'], ascending=[True, False], na_position='first')
print(sorted_df)
Output:
SORTED DATAFRAME
Name Age Rank
5 Megha 23 NaN
1 Akhil 22 NaN
0 Raj 20 1.0
4 Divya 17 4.0
2 Sonum 21 8.0
3 Tilak 19 9.0
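The second column only matters when the first one contains equal values. As a hedged illustration, the sketch below uses a made-up DataFrame 'teams' (not part of this article's data) in which ‘Points’ has ties, so the effect of the second key is visible.

```python
import pandas as pd

# Hypothetical data with tied values in 'Points'.
teams = pd.DataFrame({'Team': ['A', 'B', 'C', 'D'],
                      'Points': [10, 12, 10, 12],
                      'Goals': [7, 3, 9, 5]})

# 'Points' decides the order first; 'Goals' only breaks ties.
ranked = teams.sort_values(by=['Points', 'Goals'], ascending=[False, False])
print(ranked)
```

Teams D and B share 12 points, so D comes first on goals; C and A share 10 points, so C comes before A.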
Sort by Multiple Columns
The code below sorts ‘df’ by ‘Name’ in descending order and then by ‘Rank’ in ascending order, passing every parameter of sort_values() explicitly. The index of the result starts at 0 because ignore_index is True. In the other examples the original index labels are kept, so they appear out of order.
Python3
print('SORTED DATAFRAME')
sorted_df = df.sort_values(by=['Name', 'Rank'], axis=0,
                           ascending=[False, True],
                           inplace=False,
                           kind='quicksort', na_position='first',
                           ignore_index=True, key=None)
print(sorted_df)
Output:
SORTED DATAFRAME
Name Age Rank
0 Tilak 19 9.0
1 Sonum 21 8.0
2 Raj 20 1.0
3 Megha 23 NaN
4 Divya 17 4.0
5 Akhil 22 NaN
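To make the effect of ignore_index concrete, the sketch below sorts the same sample data with and without it and compares the resulting index labels. The variable names are illustrative, and the reset_index(drop=True) call is shown only as a way to relabel a result that has already been sorted.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# Default: every row keeps its original index label.
kept = df.sort_values(by='Name', ascending=False)
print(kept.index.tolist())

# ignore_index=True: the sorted rows are relabelled 0..n-1.
relabelled = df.sort_values(by='Name', ascending=False, ignore_index=True)
print(relabelled.index.tolist())

# reset_index(drop=True) relabels a result after the fact.
also_relabelled = kept.reset_index(drop=True)
print(also_relabelled.index.tolist())
```

Keeping the original labels is useful when the sorted rows need to be matched back to the source DataFrame; relabelling is convenient when only the new order matters.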
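Sort DataFrame by Index Labels Using sort_index() method
Unlike sort_values(), sort_index() orders the rows (axis=0) or the columns (axis=1) by their labels rather than by the values stored in a column.
Syntax: df_name.sort_index(axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index=False, key=None)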
Sort by Row Index
The code below sorts the rows of ‘df’ by their index labels. Because ascending is False, the index runs in descending order, from 5 down to 0.
Python3
print('SORTED DATAFRAME')
# Sort rows by index label, largest label first
sorted_df = df.sort_index(ascending=False)
print(sorted_df)
Output:
SORTED DATAFRAME
Name Age Rank
5 Megha 23 NaN
4 Divya 17 4.0
3 Tilak 19 9.0
2 Sonum 21 8.0
1 Akhil 22 NaN
0 Raj 20 1.0
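A common use of sort_index() is to undo a value sort, since sort_values() keeps the original index labels by default. The sketch below illustrates this on the article's sample data; 'shuffled' and 'restored' are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# Sorting by value reorders the rows but keeps their index labels.
shuffled = df.sort_values(by='Age')
print(shuffled.index.tolist())

# Sorting by index puts the rows back in their original order.
restored = shuffled.sort_index()
print(restored.index.tolist())
print(restored['Name'].tolist())
```

This only works when the value sort was done without ignore_index=True, because that option discards the original labels.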
Sort by Column Labels
The code below reorders the columns of ‘df’ instead of its rows. With axis=1 and ascending=False, the column labels are sorted in descending alphabetical order (‘Rank’, ‘Name’, ‘Age’) while the rows keep their original order. The result is printed as “SORTED DATAFRAME” followed by the sorted DataFrame (`sorted_df`).
Python3
print('SORTED DATAFRAME')
# axis=1 sorts the column labels, not the rows
sorted_df = df.sort_index(axis=1, ascending=False)
print(sorted_df)
Output:
SORTED DATAFRAME
Rank Name Age
0 1.0 Raj 20
1 NaN Akhil 22
2 8.0 Sonum 21
3 9.0 Tilak 19
4 4.0 Divya 17
5 NaN Megha 23
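For comparison, the sketch below shows the ascending counterpart of the example above, plus plain column selection as an alternative when the desired column order is not alphabetical. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# ascending=True is the default, so the labels come out alphabetically.
by_label = df.sort_index(axis=1)
print(by_label.columns.tolist())

# For an arbitrary order, select the columns explicitly instead of sorting.
custom = df[['Rank', 'Age', 'Name']]
print(custom.columns.tolist())

# In both cases the rows stay in their original order.
print(by_label.index.tolist())
```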
Sort DataFrame by One or More Columns Using nlargest() Method
The nlargest() method returns the first n rows with the largest values in the specified column(s), ordered from largest to smallest. Passing n = len(df) therefore returns every row, sorted in descending order of the chosen column.
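As a minimal sketch of the method described above, the snippet below picks the three oldest people from the article's sample data and checks the result against sort_values() followed by head(). The nsmallest() call is the companion pandas method for the smallest values; it is included only for contrast and is not otherwise covered in this article.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['Raj', 'Akhil', 'Sonum', 'Tilak', 'Divya', 'Megha'],
                   'Age': [20, 22, 21, 19, 17, 23],
                   'Rank': [1, np.nan, 8, 9, 4, np.nan]})

# The three rows with the largest 'Age', largest first.
top3 = df.nlargest(n=3, columns='Age')
print(top3)

# The same rows obtained by sorting and then truncating.
same_rows = df.sort_values(by='Age', ascending=False).head(3)
print(same_rows)

# The two rows with the smallest 'Age', smallest first.
youngest2 = df.nsmallest(n=2, columns='Age')
print(youngest2)
```

When only a few top rows are needed, nlargest() states the intent more directly than a full sort followed by head().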
Sort by Single Column
The code below uses nlargest() to arrange the DataFrame (`df`) in descending order of the values in the ‘Rank’ column. Setting n to len(df) keeps every row. The result is stored in the ‘sorted_df1’ variable and then printed.
Python3
print('SORTED DATAFRAME')
# n=len(df) keeps all rows, ordered by 'Rank' from largest to smallest
sorted_df1 = df.nlargest(n=len(df), columns='Rank')
print(sorted_df1)
Output :
SORTED DATAFRAME
Name Age Rank
3 Tilak 19 9.0
2 Sonum 21 8.0
4 Divya 17 4.0
0 Raj 20 1.0
1 Akhil 22 NaN
5 Megha 23 NaN
Sort by Two Columns
This example returns to sort_values(), which accepts a separate sort order for each column. The code below sorts ‘df’ by ‘Age’ and then by ‘Rank’, both in ascending order. The sorted DataFrame, ‘df_sorted’, is then printed.
Python3
print('SORTED DATAFRAME')
# Both keys ascending; 'Rank' only breaks ties in 'Age'
df_sorted = df.sort_values(by=['Age', 'Rank'], ascending=[True, True])
print(df_sorted)
Output :
SORTED DATAFRAME
Name Age Rank
4 Divya 17 4.0
3 Tilak 19 9.0
0 Raj 20 1.0
2 Sonum 21 8.0
1 Akhil 22 NaN
5 Megha 23 NaN
Sort by Multiple Columns
The code below sorts ‘df’ by ‘Rank’ and then by ‘Age’, both in descending order. Rows whose ‘Rank’ is NaN are placed last because na_position defaults to 'last'. The sorted DataFrame, ‘df_sorted’, is then printed.
Python3
print('SORTED DATAFRAME')
# Both keys descending; NaN ranks go last by default
df_sorted = df.sort_values(by=['Rank', 'Age'], ascending=[False, False])
print(df_sorted)
Output :
SORTED DATAFRAME
Name Age Rank
3 Tilak 19 9.0
2 Sonum 21 8.0
4 Divya 17 4.0
0 Raj 20 1.0
5 Megha 23 NaN
1 Akhil 22 NaN
Last Updated :
18 Dec, 2023