Get topmost N records within each group of a Pandas DataFrame
Last Updated: 08 Sep, 2022
A pandas DataFrame stores data as a table. Sometimes we need to retrieve only the rows that meet some condition, for example the top N records of each group. Below, we create a DataFrame and show two ways to do this.
Get topmost N records within each group
First, create a pandas DataFrame:
Python3
import pandas as pd

# 'Variables' is the grouping column, 'Value' holds the numbers
df = pd.DataFrame({'Variables': ['A', 'A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C', 'C'],
                   'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})
print(df)
Output:
Variables Value
0 A 2
1 A 5
2 A 0
3 A 3
4 B 1
5 B 0
6 B 9
7 C 0
8 C 7
9 C 5
10 C 4
Using the groupby() and head() functions of pandas
Now we will get the first N rows of each group in the 'Variables' column. groupby('Variables') splits the rows according to their 'Variables' value, and head(N) returns the first N rows of each group in their existing order; it does not sort by 'Value'. reset_index(drop=True) then discards the original row labels and numbers the result from 0.
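To see what reset_index(drop=True) changes, the short sketch below reuses the same example data and compares the row labels before and after the reset. The variable names `first_n` and `renumbered` are only illustrative.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({'Variables': ['A', 'A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C', 'C'],
                   'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

N = 2
# head(N) keeps the first N rows of each group with their original row labels
first_n = df.groupby('Variables').head(N)
print(first_n.index.tolist())      # [0, 1, 4, 5, 7, 8]

# drop=True throws the old labels away and numbers the rows from 0
renumbered = first_n.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3, 4, 5]

# Without drop=True, the old labels are kept as a new 'index' column
print(first_n.reset_index().columns.tolist())  # ['index', 'Variables', 'Value']
```

Use drop=False (the default) when you still need to know which original row each result came from.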
Example 1: Suppose N = 2
Python3
N = 2
print(df.groupby('Variables').head(N).reset_index(drop=True))
Output:
Variables Value
0 A 2
1 A 5
2 B 1
3 B 0
4 C 0
5 C 7
Example 2: Now suppose N = 4. Because no group has more than four rows (A and C have four, B has three), every row is returned.
Python3
N = 4
print(df.groupby('Variables').head(N).reset_index(drop=True))
Output:
Variables Value
0 A 2
1 A 5
2 A 0
3 A 3
4 B 1
5 B 0
6 B 9
7 C 0
8 C 7
9 C 5
10 C 4
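Because head() simply takes the first rows of each group, it returns the largest values only if the rows are already sorted. A minimal sketch, assuming you want the N largest 'Value' entries per group: sort in descending order first, then apply groupby().head(N). The helper name `top_n_per_group` is hypothetical and not part of pandas.

```python
import pandas as pd

def top_n_per_group(frame, group_col, value_col, n):
    """Hypothetical helper: the n rows with the largest value_col in each group."""
    # Sort descending; a stable sort keeps the original order among equal values
    ordered = frame.sort_values(value_col, ascending=False, kind='stable')
    # After sorting, head(n) picks the n largest rows of each group
    top = ordered.groupby(group_col).head(n)
    # Reorder by group, then by value within each group, for readability
    return top.sort_values([group_col, value_col], ascending=[True, False])

df = pd.DataFrame({'Variables': ['A', 'A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C', 'C'],
                   'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

result = top_n_per_group(df, 'Variables', 'Value', 2)
print(result)
```

With N = 2 this keeps 5 and 3 for A, 9 and 1 for B, and 7 and 5 for C, each row keeping its original label.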
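Using the nlargest() function of pandas
The nlargest() method returns the rows with the n largest values in the specified column. Note that when it is called on the whole DataFrame, as below, it ranks all rows together: the result is the overall top N rather than the top N within each group (here C contributes two rows, while A and B contribute one each).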
Python3
import pandas as pd
df = pd.DataFrame({'Variables': ['A', 'A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C', 'C'],
                   'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})
# Top 4 rows by 'Value' across the whole DataFrame (not per group)
d = df.nlargest(4, 'Value')
print(d)
Output:
Variables Value
6 B 9
8 C 7
1 A 5
9 C 5
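To apply nlargest() within each group instead, one option (a sketch, not the only way) is to call nlargest() on the grouped 'Value' column. Its result is indexed by the group key plus the original row label, so the last index level can be used to look the full rows up in df.

```python
import pandas as pd

df = pd.DataFrame({'Variables': ['A', 'A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C', 'C'],
                   'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

N = 2
# Largest N 'Value' entries within each 'Variables' group
largest = df.groupby('Variables')['Value'].nlargest(N)
# The last index level holds the original row labels; use them to fetch whole rows
top_rows = df.loc[largest.index.get_level_values(-1)]
print(top_rows)
```

As with df.nlargest(), ties are resolved by keeping the first occurrence by default.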