
Label Encoding in R programming

Last Updated : 09 Oct, 2022

Data must be in a form that a model can process before it is analysed or used for training. Many machine learning algorithms operate on numbers and cannot use raw strings directly. Label encoding assigns an integer to each distinct value of a categorical (string) variable so that the variable can be fed into a model. A label encoder converts categorical values into integers, and a decoder performs the reverse operation.
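The encode and decode steps described above can be sketched by hand in base R. This is only a minimal illustration, not how any particular package implements it; the `colours` vector and the variable names are made up for the example.

```r
# Hand-rolled label encoding in base R (illustrative only).
colours <- c("red", "green", "blue", "green", "red")  # hypothetical data

# The sorted distinct values serve as the label set.
labels <- sort(unique(colours))

# Encode: position of each value within the label set (1-based integers).
codes <- match(colours, labels)

# Decode: index the label set with the integer codes.
decoded <- labels[codes]

print(codes)                         # 3 2 1 2 3
print(identical(decoded, colours))   # TRUE
```

Because the label set is kept alongside the codes, the original strings can always be recovered.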


A label encoder takes a vector of categorical values as input and returns a numeric vector of the same length, with one number per distinct category.

To implement label encoding in the R programming language, we have two methods:

  1. Using superml
  2. Using factor()

Let’s discuss each method below:

Using superml to Get Label Encoding in R programming

The superml package is designed to unify the model training process in R. It can be installed using the following command:

install.packages("superml")

First, a new label encoder object is created using LabelEncoder$new(). Calling fit on the input vector learns the mapping from labels to numbers, transform applies the learned mapping, and fit_transform performs both steps in a single call. The final result is a numerical vector.

The encoder provides the following methods:

  • encoder$fit(x)
  • encoder$fit_transform(x)
  • encoder$transform(x)

Arguments : 

  • x – The vector to be encoded

In the following code snippet, the input contains two distinct groups, so the encoded result is a binary vector of 0s and 1s.
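To make the split between fitting and transforming concrete, here is a rough base R sketch of the same pattern. It is not superml's implementation: `make_label_encoder` and everything inside it are hypothetical names, and the choices of 0-based codes and first-appearance ordering are assumptions made only to resemble the 0/1 output described above.

```r
# Hypothetical helper mimicking a fit/transform workflow in base R.
# NOT superml's code; names and ordering rules are assumptions.
make_label_encoder <- function() {
  labels <- character(0)  # learned label set, filled in by fit()
  list(
    # fit: remember the distinct labels in order of first appearance
    fit = function(x) {
      labels <<- unique(as.character(x))
      invisible(labels)
    },
    # transform: map labels to 0-based codes; unseen labels become NA
    transform = function(x) match(as.character(x), labels) - 1L,
    # fit_transform: learn the mapping and apply it in one call
    fit_transform = function(x) {
      labels <<- unique(as.character(x))
      match(as.character(x), labels) - 1L
    },
    # inverse_transform: map codes back to the original labels
    inverse_transform = function(codes) labels[codes + 1L]
  )
}

enc <- make_label_encoder()
train <- c("Geekster", "GeeksforGeeks", "Geekster")
enc$fit(train)

train_codes <- enc$transform(train)
new_codes <- enc$transform(c("GeeksforGeeks", "Unknown"))

print(train_codes)                         # 0 1 0
print(new_codes)                           # 1 NA
print(enc$inverse_transform(train_codes))
```

Keeping fit separate from transform means the mapping learned on one vector can be reused on new data, and a label that was never seen during fitting is flagged as NA here rather than silently given a new code.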

After installing superml with the command above, load it with library(superml) and run the code below.

R




# load the package that provides LabelEncoder
library(superml)

x <- c("Geekster","GeeksforGeeks","Geekster","Geekster",
       "GeeksforGeeks","GeeksforGeeks","Geekster","GeeksforGeeks",
       "Geekster","Geekster")
  
print("Original Data Vector")
print(x)
  
# create a label encoder object
encoder <- LabelEncoder$new()
  
# learn the label-to-number mapping from the x vector
encoder$fit(x)
  
# fit and transform in a single call
encoder$fit_transform(x)
  
# apply the fitted mapping and print the encoded vector
print(encoder$transform(x))


Output: 

[Output image: Label Encoding in R programming]

 

Using factor() to Get Label Encoding in R programming

The factor() function in base R converts a vector into a categorical variable (a factor). Each distinct value becomes a level, and each level is stored internally as an integer code. To obtain these codes, we can simply apply as.numeric() to the factor.

Syntax : factor(x)

Arguments : x – The vector to be encoded 

In the following code, the distinct values in the companies vector are first sorted lexicographically to form the levels, which are then mapped to integers beginning with 1. The word “GeeksforGeeks” sorts first, so it is assigned level 1, and all its occurrences are replaced with 1 in the final output.

R




# creating a data vector
companies <- c("Geekster","TCS","Geekster","Geekster",
               "GeeksforGeeks",
               "Wipro","Geekster",
               "GeeksforGeeks",
               "Geekster","Wipro","TCS")
  
# printing the original vector
print("Original Data")
print(companies)
  
# converting the data to a factor
factors <- factor(companies)
  
# printing the numeric codes of the factor levels
print("Label Encoded Data")
print(as.numeric(factors))


Output: 

[Output image: Label Encoding in R programming]
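Since factor() picks the level order itself, it can help to see how the codes change when the order is supplied explicitly, and how to decode them again. The sketch below uses only base R; the `sizes` data and variable names are invented for illustration.

```r
sizes <- c("medium", "small", "large", "small")  # hypothetical data

# Default behaviour: levels are the sorted distinct values
# (large, medium, small), so codes follow alphabetical order.
default_codes <- as.integer(factor(sizes))

# Explicit level order: codes follow the order given in `levels`.
size_order <- c("small", "medium", "large")
f <- factor(sizes, levels = size_order)
custom_codes <- as.integer(f)

# Decoding: index the levels with the integer codes.
decoded <- levels(f)[custom_codes]

print(default_codes)  # 2 3 1 3
print(custom_codes)   # 2 1 3 1
print(decoded)
```

Passing levels explicitly is useful when the categories have a natural order that alphabetical sorting would not respect.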

 


