Split DataFrame into Custom Bins in R
This article shows how to split a dataframe into custom bins in the R programming language.
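The cut() function in base R divides the range of a numeric vector into intervals and assigns each value to the interval in which it falls. The result is a factor with one level per interval. When breaks is a vector of cut points, the number of levels is therefore one less than the length of the breaks argument. By default, each interval excludes its lower bound and includes its upper bound. To split a dataframe, cut() is applied to the row indices, and the rows are then grouped by the resulting levels.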
Syntax: cut(x, breaks, labels = NULL)
Arguments:
- x – Numeric vector to be divided
- breaks – A numeric vector of cut points, or a single number giving the number of intervals
- labels – Labels for the resulting groups (the levels of the returned factor)
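To make the arguments concrete, here is a minimal sketch of cut() applied to a plain numeric vector. The sample values and the label names "low" and "high" are assumptions chosen only for illustration.

```r
# Hypothetical sample values, chosen only for illustration
x <- c(1, 4, 6, 7, 10)

# Three break points define two intervals: (0,6] and (6,10]
bins <- cut(x, breaks = c(0, 6, 10))
print(bins)
print(nlevels(bins))  # one fewer than length(breaks)

# The labels argument replaces the default interval names
# ("low" and "high" are hypothetical labels)
named_bins <- cut(x, breaks = c(0, 6, 10), labels = c("low", "high"))
print(table(named_bins))
```

Because the intervals are closed on the right by default, the value 6 falls into the first interval rather than the second.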
Example 1: Split dataframe into Custom Bins
R
# Create the sample dataframe
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)
# Bin the row indices into (0, 6] and (6, rows]
rows <- nrow(data_frame)
bins <- cut(1:rows, breaks = c(0, 6, rows))
# Create one variable per bin: data_frame_1, data_frame_2
level_bins <- levels(bins)
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}
print("DataFrame Subset 1")
print(data_frame_1)
print("DataFrame Subset 2")
print(data_frame_2)
Output: data_frame_1 contains rows 1 to 6 and data_frame_2 contains rows 7 to 10.
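As an alternative to creating numbered variables with assign(), the same bins can be passed to base R's split(), which returns a named list of dataframes. This is only a sketch of a different approach, not the method used in the example above. The name subsets is an assumption introduced for illustration.

```r
# Same sample dataframe as in the example
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))

# Bin the row indices into (0,6] and (6,10]
rows <- nrow(data_frame)
bins <- cut(seq_len(rows), breaks = c(0, 6, rows))

# split() returns a list with one dataframe per factor level
# ("subsets" is a hypothetical name)
subsets <- split(data_frame, bins)
print(names(subsets))
print(subsets[[2]])
```

Keeping the pieces in a list avoids creating variables by name and makes it easy to loop over the subsets with lapply().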
Example 2: Four break points (0, 2, 4 and the number of rows) define three intervals, which divide the rows into three subsets of the original dataframe.
R
# Create the sample dataframe
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)
# Bin the row indices into (0, 2], (2, 4] and (4, rows]
rows <- nrow(data_frame)
bins <- cut(1:rows, breaks = c(0, 2, 4, rows))
# Create one variable per bin: data_frame_1 to data_frame_3
level_bins <- levels(bins)
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}
print("DataFrame Subset 1")
print(data_frame_1)
print("DataFrame Subset 2")
print(data_frame_2)
print("DataFrame Subset 3")
print(data_frame_3)
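Output: data_frame_1 contains rows 1 and 2, data_frame_2 contains rows 3 and 4, and data_frame_3 contains rows 5 to 10.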
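The labels argument from the syntax section can give the three bins readable names. The sketch below stores the bin as an extra column and filters on it. The labels "head", "middle" and "tail" and the column name bin are assumptions made for illustration.

```r
# Same sample dataframe as in the example
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))
rows <- nrow(data_frame)

# Three intervals need exactly three labels (hypothetical names)
data_frame$bin <- cut(seq_len(rows),
                      breaks = c(0, 2, 4, rows),
                      labels = c("head", "middle", "tail"))

# Count the rows per bin and extract one bin by its label
print(table(data_frame$bin))
middle <- data_frame[data_frame$bin == "middle", ]
print(middle)
```

The number of labels must match the number of intervals, which is one less than the number of break points.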
Example 3: The breaks argument may also be a single number giving the number of intervals. In that case, cut() divides the range of x into that many intervals of equal width and generates the level names automatically. The following code divides the 10 rows of the dataframe into 5 bins of 2 rows each:
R
# Create the sample dataframe
data_frame <- data.frame(col1 = 1:10,
                         col2 = letters[1:10],
                         col3 = c(rep(TRUE, 4), rep(FALSE, 6)))
print("Original DataFrame")
print(data_frame)
# Bin the row indices into 5 intervals of equal width
rows <- nrow(data_frame)
bins <- cut(1:rows, 5)
# Create one variable per bin: data_frame_1 to data_frame_5
level_bins <- levels(bins)
for (i in seq_along(level_bins)) {
  assign(paste0("data_frame_", i),
         data_frame[bins == level_bins[i], ])
}
print("DataFrame Subset 1")
print(data_frame_1)
print("DataFrame Subset 2")
print(data_frame_2)
print("DataFrame Subset 3")
print(data_frame_3)
print("DataFrame Subset 4")
print(data_frame_4)
print("DataFrame Subset 5")
print(data_frame_5)
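Output: Each of the five subsets contains two consecutive rows, from rows 1 and 2 in data_frame_1 up to rows 9 and 10 in data_frame_5.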
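The subsets have equal sizes here because cut() is applied to the evenly spaced row indices. With a single number as breaks, cut() produces intervals of equal width, not of equal row count, so binning on a column's values can give very uneven groups. The sketch below illustrates the difference with hypothetical skewed values.

```r
# Hypothetical skewed values, used only to illustrate the difference
values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Equal-width bins over the range of the values:
# nine values land in the first bin and one in the last
by_value <- cut(values, 5)
print(table(by_value))

# Equal-width bins over the row positions: two rows per bin
by_position <- cut(seq_along(values), 5)
print(table(by_position))
```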
Last Updated: 14 Feb, 2022