
Guided Ordinal Encoding Techniques

Last Updated : 27 Sep, 2021

This article covers two guided encoding techniques for categorical features: target guided ordinal encoding and mean guided ordinal encoding.

Tools and Technologies needed:

  1. Understanding of the pandas library.
  2. Basic knowledge of how a pandas DataFrame works.
  3. Jupyter Notebook, Google Colab, or any similar platform.

What is encoding?

Encoding is the technique we use to convert categorical entries in a dataset into numerical data. Let's say we have a dataset of employees in which one column holds the city where each employee is located. Now we want to use this data to build a model that predicts an employee's salary from his/her other details. The model doesn't understand anything about city names, so how do we make it aware of them? For example, an employee in a metropolitan city typically earns more than an employee in a small city, and we need some way to convey this to the model. The idea you probably have in mind, ranking the cities according to some criterion, is exactly what we will do in code. Ways of converting categorical data into numerical data like this are the subject of this article.

What is target guided encoding technique?

In this technique, we use the target variable to encode the categorical data. Let's understand it with an example:

Employee Id  City     Highest Qualification  Salary
A100         delhi    phd                    50000
A101         delhi    bsc                    30000
A102         mumbai   msc                    45000
B101         pune     bsc                    25000
B102         kolkata  phd                    48000
C100         pune     msc                    30000
D103         kolkata  msc                    44000
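To follow along in pandas, the sample dataset can be built as a DataFrame. This is a minimal sketch: the column names simply mirror the table headers, the variable name df is our own choice, and the qualification labels are written in lowercase so that "phd" is always spelled the same way.

```python
import pandas as pd

# Sample employee data from the table above (all labels in lowercase)
df = pd.DataFrame({
    "Employee Id": ["A100", "A101", "A102", "B101", "B102", "C100", "D103"],
    "City": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
    "Highest Qualification": ["phd", "bsc", "msc", "bsc", "phd", "msc", "msc"],
    "Salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

print(df)
```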

Let's encode the City column using target guided encoding. Here, our target variable is Salary.

Step 1: For each city, take the mean of the salaries of all employees in that city. This gives delhi 40000, mumbai 45000, pune 27500 and kolkata 46000. These means are used to sort the cities.

Step 2: Sorting the cities by mean salary in descending order gives:

kolkata > mumbai > delhi > pune
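Steps 1 and 2 map directly onto a pandas group-by. The sketch below assumes the sample data above, rebuilding only the City and Salary columns, then averages the salaries per city and sorts the result in descending order.

```python
import pandas as pd

# Only the columns needed for this step, taken from the sample table
df = pd.DataFrame({
    "City": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
    "Salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# Step 1: mean salary for each city
city_means = df.groupby("City")["Salary"].mean()

# Step 2: order the cities from highest to lowest mean salary
city_order = city_means.sort_values(ascending=False)
print(city_order)  # expected order: kolkata, mumbai, delhi, pune
```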

Step 3: Rank the cities based on this order, giving the city with the lowest mean salary rank 1.

City     Rank
kolkata  4
mumbai   3
delhi    2
pune     1

(Note: you can also rank them in the opposite order, so that the city with the highest mean salary gets rank 1.)
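One possible way to turn the sorted means into ranks (step 3) is to sort them in ascending order and number the cities from 1. In this sketch, the dictionary names city_rank and city_rank_desc are illustrative; the second dictionary shows the opposite ranking mentioned in the note.

```python
import pandas as pd

df = pd.DataFrame({
    "City": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
    "Salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# Mean salary per city, sorted from lowest to highest
city_means = df.groupby("City")["Salary"].mean().sort_values()

# Lowest mean salary gets rank 1, highest gets the largest rank
city_rank = {city: rank for rank, city in enumerate(city_means.index, start=1)}
print(city_rank)

# Opposite convention: highest mean salary gets rank 1
city_rank_desc = {city: rank for rank, city in enumerate(city_means.index[::-1], start=1)}
print(city_rank_desc)
```

Numbering the sorted cities with enumerate gives every city a distinct rank, even if two cities happen to have the same mean salary. If tied cities should share a rank, pandas' Series.rank(method="dense") is one alternative.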

Step 4: Replace each city in the City column of the dataset with its rank.

Employee Id  City  Highest Qualification  Salary
A100         2     phd                    50000
A101         2     bsc                    30000
A102         3     msc                    45000
B101         1     bsc                    25000
B102         4     phd                    48000
C100         1     msc                    30000
D103         4     msc                    44000
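Putting steps 1 to 4 together, a small helper can encode any categorical column by the mean of a target column. The function name target_guided_ordinal_encode is hypothetical and not part of pandas; this sketch ranks categories so that the lowest mean gets 1, matching the table above, and returns a copy rather than modifying the input.

```python
import pandas as pd

df = pd.DataFrame({
    "Employee Id": ["A100", "A101", "A102", "B101", "B102", "C100", "D103"],
    "City": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
    "Highest Qualification": ["phd", "bsc", "msc", "bsc", "phd", "msc", "msc"],
    "Salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})


def target_guided_ordinal_encode(frame, column, target):
    """Hypothetical helper: replace each category with its rank by mean target (1 = lowest mean)."""
    # Steps 1-2: mean target per category, sorted ascending
    means = frame.groupby(column)[target].mean().sort_values()
    # Step 3: assign consecutive ranks starting at 1
    ranks = {category: rank for rank, category in enumerate(means.index, start=1)}
    # Step 4: replace the categories with their ranks in a copy of the data
    encoded = frame.copy()
    encoded[column] = encoded[column].map(ranks)
    return encoded


encoded = target_guided_ordinal_encode(df, "City", "Salary")
print(encoded)
```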

That is all there is to target guided encoding. Simple, right? Let's now explore mean guided encoding.

What is mean guided encoding technique?

We will encode the Highest Qualification column using the mean guided encoding technique, again with Salary as the target variable.

Step 1: For each qualification, find the mean of all the corresponding salaries.

Step 2: Instead of ranking the qualifications by their mean salary, use the mean salary itself as the encoded value for each qualification.

Highest Qualification  Mean Salary
phd                    49000
msc                    39666.67
bsc                    27500
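Steps 1 and 2 of mean guided encoding reduce to a single group-by mean. The sketch below assumes the sample data above, rebuilds only the two columns it needs (with lowercase qualification labels), and rounds the printed means to two decimals, which is where 39666.67 for msc comes from (119000 / 3).

```python
import pandas as pd

df = pd.DataFrame({
    "Highest Qualification": ["phd", "bsc", "msc", "bsc", "phd", "msc", "msc"],
    "Salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# Steps 1-2: the mean salary of each qualification becomes its encoded value
qualification_means = df.groupby("Highest Qualification")["Salary"].mean()
print(qualification_means.round(2))
```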

Step 3: Replace each qualification in the Highest Qualification column with its mean salary.

Employee Id  City  Highest Qualification  Salary
A100         2     49000                  50000
A101         2     27500                  30000
A102         3     39666.67               45000
B101         1     27500                  25000
B102         4     49000                  48000
C100         1     39666.67               30000
D103         4     39666.67               44000
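Step 3 can be sketched by mapping each qualification to its mean salary with Series.map. This assumes the lowercase sample data above; the name qualification_means is illustrative. A one-line alternative is df.groupby("Highest Qualification")["Salary"].transform("mean").

```python
import pandas as pd

df = pd.DataFrame({
    "Employee Id": ["A100", "A101", "A102", "B101", "B102", "C100", "D103"],
    "City": ["delhi", "delhi", "mumbai", "pune", "kolkata", "pune", "kolkata"],
    "Highest Qualification": ["phd", "bsc", "msc", "bsc", "phd", "msc", "msc"],
    "Salary": [50000, 30000, 45000, 25000, 48000, 30000, 44000],
})

# Steps 1-2: mean salary per qualification (a Series indexed by qualification)
qualification_means = df.groupby("Highest Qualification")["Salary"].mean()

# Step 3: replace each qualification label with its group's mean salary
df["Highest Qualification"] = df["Highest Qualification"].map(qualification_means)
print(df)
```

Keeping the mapping in a separate Series (rather than using transform) makes it easy to reuse. Because the encoded values are built from the target, the means would usually be computed on the training rows only and then applied to new rows with the same map call.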

The City and Highest Qualification columns are now numerical, so the dataset is ready for building our model.

