
Split DataFrame into Custom Bins in R


In this article, we will see how to split a dataframe into custom bins (groups of rows) in the R programming language.

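The cut() function in base R divides the range of a numeric vector x into intervals and then codes each value of x according to the interval in which it falls. The result is a factor in which each interval corresponds to one level. When breaks is a vector of cut points, the number of levels is therefore one less than the length of breaks (n cut points define n - 1 intervals). To split a dataframe into bins of rows, cut() is applied to the row indices, and each level is then used to select one subset of rows.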
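A minimal illustration of the points above on a plain integer vector (the values are chosen here only for illustration): four cut points define three intervals, so the factor returned by cut() has three levels, and each value of x is assigned to exactly one of them.

```r
# Four cut points (0, 2, 4, 10) define three intervals: (0,2], (2,4], (4,10]
x <- 1:10
binned <- cut(x, breaks = c(0, 2, 4, 10))

# The levels of the factor are the intervals, written as "(a,b]"
print(levels(binned))

# Number of levels is one less than the number of cut points
print(nlevels(binned))

# How many values of x fall into each interval
print(table(binned))
```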

Syntax: cut(x, breaks, labels = NULL)

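Arguments:

  • x – Numeric vector to be divided
  • breaks – Either a numeric vector of two or more unique cut points, or a single number (at least 2) giving the number of equal-width intervals into which x is to be cut
  • labels – Labels for the levels of the resulting factor; by default, the labels are built in interval notation such as "(a,b]"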
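None of the examples below pass the labels argument, so here is a short sketch of how it names the bins. The scores vector is hypothetical. labels must supply exactly one name per interval (that is, length(breaks) - 1 names); passing labels = FALSE instead makes cut() return plain integer bin codes rather than a factor.

```r
# Hypothetical example scores; bins (0,40], (40,70] and (70,100]
scores <- c(12, 45, 67, 89, 30, 55)

# labels gives each interval a custom name instead of "(a,b]"
grade <- cut(scores,
             breaks = c(0, 40, 70, 100),
             labels = c("low", "medium", "high"))
print(data.frame(scores, grade))

# labels = FALSE returns the integer bin codes instead of a factor
codes <- cut(scores, breaks = c(0, 40, 70, 100), labels = FALSE)
print(codes)
```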

Example 1: Split a dataframe into two custom bins

R




# creating a dataframe
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)

# getting the number of rows
rows <- nrow(data_frame)

# custom bins over the row indices: (0,6] and (6,rows]
bins <- cut(1:rows, breaks = c(0, 6, rows))
level_bins <- levels(bins)

# storing each subset as data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}

# retrieving dataframe subsets
print("DataFrame Subset 1")
print(data_frame_1)

print("DataFrame Subset 2")
print(data_frame_2)
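
Output: the original 10-row dataframe is printed first, followed by Subset 1 (rows 1 to 6, bin (0,6]) and Subset 2 (rows 7 to 10, bin (6,10]).
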
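The loop with assign() in Example 1 creates separate variables named data_frame_1, data_frame_2 and so on. A more idiomatic alternative, offered here as a suggestion rather than the article's method, is base R's split(), which returns a named list holding one dataframe per level of the bins factor. The sketch rebuilds the same data and bins as Example 1 so that it runs on its own.

```r
# Same data and bins as Example 1
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))
bins <- cut(seq_len(nrow(data_frame)), breaks = c(0, 6, nrow(data_frame)))

# split() returns a list with one dataframe per level of bins,
# named after the interval labels
subsets <- split(data_frame, bins)
print(names(subsets))

# Each subset is accessed by its interval label (or by position)
print(subsets[["(6,10]"]])
```

Keeping the subsets in one list means the number of bins does not have to be known in advance, and functions such as lapply() can be applied to every subset at once.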
Example 2: Four break points (0, 2, 4 and the number of rows) are specified, which define three intervals and therefore divide the rows into three subsets of the original dataframe.

 

R




# creating a dataframe
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)

# getting the number of rows
rows <- nrow(data_frame)

# custom bins over the row indices: (0,2], (2,4] and (4,rows]
bins <- cut(1:rows, breaks = c(0, 2, 4, rows))
level_bins <- levels(bins)

# storing each subset as data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}

# retrieving dataframe subsets
print("DataFrame Subset 1")
print(data_frame_1)

print("DataFrame Subset 2")
print(data_frame_2)

print("DataFrame Subset 3")
print(data_frame_3)
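
Output: the original dataframe is printed first, followed by Subset 1 (rows 1 and 2, bin (0,2]), Subset 2 (rows 3 and 4, bin (2,4]) and Subset 3 (rows 5 to 10, bin (4,10]).
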
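In Example 2 the first break is 0 rather than 1, and this matters: by default cut() builds right-closed intervals (right = TRUE), so a value equal to the lowest break lies outside the first interval and is coded NA. The sketch below, using the same 10 row indices, shows the problem and how include.lowest = TRUE closes the first interval instead. With the subsetting loop used in the examples, an NA bin would not simply drop the row: indexing a dataframe with a logical vector that contains NA returns an extra row of NAs.

```r
rows <- 10

# Breaks starting at 1: row index 1 is not inside (1,2], so it becomes NA
without_lowest <- cut(1:rows, breaks = c(1, 2, 4, rows))
print(sum(is.na(without_lowest)))

# include.lowest = TRUE turns the first interval into [1,2]
with_lowest <- cut(1:rows, breaks = c(1, 2, 4, rows), include.lowest = TRUE)
print(levels(with_lowest))
print(table(with_lowest))
```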

Example 3: Instead of a vector of cut points, the second argument of cut(), breaks, can be a single number giving how many intervals to create. cut() then divides the range of x into that many intervals of equal width, and because no labels argument is supplied, the levels are named automatically in interval notation. Since the row indices 1:rows are evenly spaced, the following code divides the dataframe into 5 custom bins of 2 rows each:

 

R




# creating a dataframe
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4),
                                  rep(FALSE, 6)))

print("Original DataFrame")
print(data_frame)

# getting the number of rows
rows <- nrow(data_frame)

# custom bins: 5 intervals of equal width over the row indices
bins <- cut(1:rows, breaks = 5)
level_bins <- levels(bins)

# storing each subset as data_frame_1, data_frame_2, ...
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}

# retrieving dataframe subsets
print("DataFrame Subset 1")
print(data_frame_1)

print("DataFrame Subset 2")
print(data_frame_2)

print("DataFrame Subset 3")
print(data_frame_3)

print("DataFrame Subset 4")
print(data_frame_4)

print("DataFrame Subset 5")
print(data_frame_5)
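
Output: the original dataframe is printed first, followed by five subsets of two rows each (rows 1 and 2, 3 and 4, 5 and 6, 7 and 8, 9 and 10), corresponding to the automatically generated levels (0.991,2.8], (2.8,4.6], (4.6,6.4], (6.4,8.2] and (8.2,10].
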
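The "equal parts" produced by a single-number breaks argument are intervals of equal width over the range of x, not groups with equal numbers of values; in Example 3 each bin holds two rows only because 1:rows is evenly spaced. The sketch below uses a hypothetical, unevenly spaced vector named values to show the difference, and contrasts it with quantile-based cut points built with base R's quantile(). That is a different technique from the one in the examples, and it gives groups of roughly equal size.

```r
# Hypothetical, unevenly spaced values
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 50, 100)

# Equal-width bins: the two intervals span equal ranges,
# so most values land in the first one
equal_width <- cut(values, breaks = 2)
print(table(equal_width))

# Quantile cut points (minimum, median, maximum): roughly equal counts;
# include.lowest = TRUE keeps the minimum inside the first bin
equal_count <- cut(values,
                   breaks = quantile(values, probs = c(0, 0.5, 1)),
                   include.lowest = TRUE)
print(table(equal_count))
```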



Last Updated : 14 Feb, 2022