Get topmost N records within each group of a Pandas DataFrame

Last Updated : 08 Sep, 2022

A pandas DataFrame stores data in tabular form. We often need to retrieve rows from a DataFrame according to some condition, for example the top N records of each group. In this article we create a DataFrame and use the methods described below to do that.

Get topmost N records within each group

First, we create a pandas DataFrame in Python:

Python3

#importing pandas as pd
import pandas as pd
 
#creating dataframe
df=pd.DataFrame({ 'Variables': ['A','A','A','A','B','B',
                                'B','C','C','C','C'],
                 'Value': [2,5,0,3,1,0,9,0,7,5,4]})
df

Output:

   Variables  Value
0          A      2
1          A      5
2          A      0
3          A      3
4          B      1
5          B      0
6          B      9
7          C      0
8          C      7
9          C      5
10         C      4

Using the groupby() function of pandas to group the rows

Now we get the first N rows of each group of the ‘Variables’ column. head(N) returns the first N rows of each group in their original order, not the N largest values, so sort the DataFrame first if you need the highest values. reset_index(drop=True) then replaces the original row labels with a fresh index starting at 0.

Example 1: Suppose the value of N=2

Python3

# setting value of N as 2
N = 2
 
# group by column 'Variables' and
# keep the first N rows of each group
print(df.groupby('Variables').head(N).reset_index(drop=True))

Output:

  Variables  Value
0         A      2
1         A      5
2         B      1
3         B      0
4         C      0
5         C      7

Example 2: Now, suppose the value of N=4

Python3

# setting value of N as 4
N = 4
 
# group by column 'Variables' and
# keep the first N rows of each group
print(df.groupby('Variables').head(N).reset_index(drop=True))

Output:

   Variables  Value
0          A      2
1          A      5
2          A      0
3          A      3
4          B      1
5          B      0
6          B      9
7          C      0
8          C      7
9          C      5
10         C      4
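
Because head() only returns the first N rows of each group as they appear, the output above does not contain the N largest values per group. A minimal sketch of one way to get them, assuming "topmost" means the highest 'Value' in each group, is to sort before grouping (the name top_n is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Variables': ['A', 'A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C', 'C'],
                   'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

N = 2

# sort by group, then by 'Value' in descending order,
# so the first N rows of each group are its N largest values
top_n = (df.sort_values(['Variables', 'Value'], ascending=[True, False])
           .groupby('Variables')
           .head(N)
           .reset_index(drop=True))
print(top_n)
```

With N = 2 this keeps the values 5 and 3 for group A, 9 and 1 for group B, and 7 and 5 for group C.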

Using the nlargest() function of pandas to get the largest values

The nlargest() function returns the n rows with the largest values in the specified column. Called directly on the DataFrame, it ignores the groups in the ‘Variables’ column and returns the top rows overall. Here we get the 4 rows with the largest ‘Value’.

Python3

# importing pandas as pd
import pandas as pd
 
# creating dataframe
df=pd.DataFrame({ 'Variables': ['A','A','A','A','B','B',
                                'B','C','C','C','C'],
                'Value': [2,5,0,3,1,0,9,0,7,5,4]})
#print(df)
d = df.nlargest(4, 'Value')
print(d)

Output:

  Variables  Value
6         B      9
8         C      7
1         A      5
9         C      5
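
The result above is the overall top 4, not the top N of each group. One possible way to apply nlargest() per group is to call it on the grouped 'Value' column. This is a sketch rather than the only approach, and the name top_per_group is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Variables': ['A', 'A', 'A', 'A', 'B', 'B',
                                 'B', 'C', 'C', 'C', 'C'],
                   'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

N = 2

# nlargest on the grouped column returns a Series indexed by
# (group key, original row label); drop the row label level and
# turn the group key back into a regular column
top_per_group = (df.groupby('Variables')['Value']
                   .nlargest(N)
                   .reset_index(level=1, drop=True)
                   .reset_index())
print(top_per_group)
```

Dropping the inner index level discards the original row labels; keep it instead if you need to trace each value back to its row in df.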
