
How to Remove Duplicate Rows in R DataFrame?


In this article, we will discuss how to remove duplicate rows from a dataframe in the R programming language.

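Dataset in use: a dataframe with three columns (names, id and subjects) and six rows, in which the last two rows repeat the first two.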

Method 1: Using distinct()

The distinct() function from the dplyr package returns the unique rows of a dataframe. Called on the dataframe alone, it drops rows that are duplicated across all columns. Called with one or more column names, it returns the unique combinations of values in those columns.

Syntax:

distinct(dataframe)

distinct(dataframe, column1, column2, ..., column_n)

Example: R program to remove duplicate rows using distinct() function

R




# load the package
library(dplyr)
 
# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# remove rows that are duplicated across all columns
print(distinct(data))
 
# unique values of the subjects column (only that column is returned)
print(distinct(data, subjects))
 
# unique values of the names column (only that column is returned)
print(distinct(data, names))


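Output:

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
  subjects
1     java
2   python
3      php
4     html
   names
1  manoj
2  bobby
3 sravan
4  deepu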
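
When a column name is passed, distinct() returns only that column. To deduplicate on a column and still keep the other columns, dplyr provides the .keep_all argument. The following is a minimal sketch, assuming the same sample dataframe as above; the variable name unique_by_subject is only illustrative.

```r
# load the package
library(dplyr)

# same sample dataframe as in the example above
data <- data.frame(names = c("manoj", "bobby", "sravan",
                             "deepu", "manoj", "bobby"),
                   id = c(1, 2, 3, 4, 1, 2),
                   subjects = c("java", "python", "php",
                                "html", "java", "python"))

# keep every column, with one row (the first one found) per distinct subject
unique_by_subject <- distinct(data, subjects, .keep_all = TRUE)
print(unique_by_subject)

# deduplicate on the combination of names and id, keeping all columns
print(distinct(data, names, id, .keep_all = TRUE))
```

Without .keep_all = TRUE, both calls would return only the columns named in the call.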

Method 2: Using duplicated()

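The duplicated() function returns a logical vector that is TRUE for every element that repeats an earlier one. To get the unique rows, negate this vector with the ! operator and use it as a row index, which keeps only the first occurrence of each value.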

Syntax:

data[!duplicated(data$column_name), ]

where,

  • data is the input dataframe
  • column_name is the column that is checked for duplicate values

Example: R program to remove duplicate rows using duplicated() function 

R




# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# keep the first row for each distinct value in the subjects column
print(data[!duplicated(data$subjects), ])
 
# keep the first row for each distinct value in the names column
print(data[!duplicated(data$names), ])
 
# keep the first row for each distinct value in the id column
print(data[!duplicated(data$id), ])


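Output:

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html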
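
duplicated() is not limited to a single column. It also accepts a whole dataframe or a subset of its columns, and its fromLast argument controls which occurrence is treated as the original. The following is a small sketch, assuming the same sample dataframe; the names dup_flags, dedup_all, dedup_pair and dedup_last are only illustrative.

```r
# same sample dataframe as in the example above
data <- data.frame(names = c("manoj", "bobby", "sravan",
                             "deepu", "manoj", "bobby"),
                   id = c(1, 2, 3, 4, 1, 2),
                   subjects = c("java", "python", "php",
                                "html", "java", "python"))

# flag rows that repeat an earlier row across all columns
dup_flags <- duplicated(data)
print(dup_flags)

# drop the flagged rows
dedup_all <- data[!dup_flags, ]
print(dedup_all)

# check for duplicates on the combination of names and id only
dedup_pair <- data[!duplicated(data[, c("names", "id")]), ]
print(dedup_pair)

# keep the last occurrence of each subject instead of the first
dedup_last <- data[!duplicated(data$subjects, fromLast = TRUE), ]
print(dedup_last)
```

With fromLast = TRUE the scan starts from the end, so the earlier copies are the ones dropped and the original row names of the later copies are preserved.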

Method 3: Using unique()

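The unique() function returns the dataframe with duplicate rows removed. Applied to a single column, it returns a vector of that column's distinct values rather than a dataframe.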

Syntax:

unique(dataframe)

To get the unique values of a particular column:

Syntax:

unique(dataframe$column_name)

Example: R program to get the unique values of each column using unique() function

R




# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# unique values of the subjects column
print(unique(data$subjects))
 
# unique values of the names column
print(unique(data$names))
 
# unique values of the id column
print(unique(data$id))


 
 

Output:

 

[1] "java"   "python" "php"    "html"  
[1] "manoj"  "bobby"  "sravan" "deepu"  
[1] 1 2 3 4

 

Example: R program to apply unique() function to the entire dataframe

 

R




# create dataframe
data=data.frame(names=c("manoj","bobby","sravan",
                        "deepu","manoj","bobby") ,
                id=c(1,2,3,4,1,2),
                subjects=c("java","python","php",
                           "html","java","python"))
 
 
# remove rows that are duplicated across the entire dataframe
print(unique(data))


Output:

   names id subjects
1  manoj  1     java
2  bobby  2   python
3 sravan  3      php
4  deepu  4     html
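
For a whole dataframe, unique() and the !duplicated() row index from Method 2 should give the same result, which can be checked directly. The following is a brief sketch, assuming the same sample dataframe; the names by_unique and by_duplicated are only illustrative.

```r
# same sample dataframe as in the example above
data <- data.frame(names = c("manoj", "bobby", "sravan",
                             "deepu", "manoj", "bobby"),
                   id = c(1, 2, 3, 4, 1, 2),
                   subjects = c("java", "python", "php",
                                "html", "java", "python"))

# deduplicate the whole dataframe in two ways
by_unique <- unique(data)
by_duplicated <- data[!duplicated(data), ]

# compare the two results
print(identical(by_unique, by_duplicated))

# unique() also works on a subset of columns
print(unique(data[, c("names", "subjects")]))
```

Both base R approaches keep the first occurrence of each row, so the choice between them is mostly a matter of readability.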



Last Updated : 15 Feb, 2022