
How to Make Better Models in Python using SVM Classifier and RBF Kernel

Last Updated : 30 Jan, 2023

As machine learning models become more widespread, it is important for data scientists and developers to understand how to build the best models possible. One powerful tool for improving the accuracy of machine learning models is the support vector machine (SVM) classifier, a maximum-margin classifier that is linear in its basic form, can model non-linear decision boundaries through kernels, and works well for a variety of data types. In this article, we will focus on how to use the SVM classifier and the radial basis function (RBF) kernel in Python to build better models for your data.

A support vector machine is a supervised learning algorithm that can be used for classification or regression tasks. It works by finding the hyperplane that separates the classes in the data with the largest possible margin. The points closest to the hyperplane, called support vectors, have the greatest influence on the position of the hyperplane and on the classification of new data points. SVM can be used for both linear and non-linear classification problems by choosing different kernels.

RBF Kernel in SVM

The RBF kernel is a kernel function that lets the SVM classifier work implicitly in a higher-dimensional space, where it is easier to find a separating boundary, without ever computing the transformed features explicitly. It has a single parameter, gamma. The RBF kernel function is defined as:

K(x, y) = exp(-gamma * ||x-y||^2)

The value of gamma controls the width of the kernel and thus the complexity of the model. A small gamma value will result in a wide kernel, leading to a simpler model with low variance and high bias, while a large gamma value will result in a narrow kernel, leading to a more complex model with high variance and low bias.
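To make the formula concrete, here is a minimal sketch that evaluates K(x, y) by hand and compares it with scikit-learn's `rbf_kernel` helper. The function name `rbf` and the two sample points are illustrative assumptions, not part of the original walkthrough.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x, y, gamma):
    """Evaluate K(x, y) = exp(-gamma * ||x - y||^2) for two 1-D vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))


# Two hypothetical points whose squared distance is 2
x = np.array([1.0, 2.0])
y = np.array([2.0, 3.0])

for gamma in (0.01, 1.0, 100.0):
    manual = rbf(x, y, gamma)
    # rbf_kernel expects 2-D inputs of shape (n_samples, n_features)
    library = rbf_kernel(x.reshape(1, -1), y.reshape(1, -1), gamma=gamma)[0, 0]
    print(f"gamma={gamma}: manual={manual:.6f}, sklearn={library:.6f}")
```

For this pair of points the similarity drops from about 0.98 at gamma=0.01 to about 0.135 at gamma=1 and to effectively zero at gamma=100, which is the "narrow kernel" behaviour described above: with a large gamma, only very close points count as similar.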

The other important hyperparameter is C, which controls the trade-off between maximizing the margin and minimizing the misclassification error on the training data. A large value of C penalizes training errors heavily, resulting in a smaller margin and fewer training misclassifications, while a small value of C tolerates more training misclassifications in exchange for a larger margin.
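One way to get a feel for how C and gamma interact is to fit the same classifier with a few extreme settings and compare training accuracy, test accuracy, and the number of support vectors. The sketch below is only an illustration: it assumes a synthetic, noisy two-class dataset from `make_moons` (not the dataset used in this article), and the specific values of C and gamma are arbitrary.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: a noisy, non-linearly separable toy dataset for illustration
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

results = []
for C in (0.1, 100):
    for gamma in (0.1, 100):
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
        results.append({
            "C": C,
            "gamma": gamma,
            # n_support_ holds the number of support vectors per class
            "n_support": int(clf.n_support_.sum()),
            "train_acc": clf.score(X_tr, y_tr),
            "test_acc": clf.score(X_te, y_te),
        })

for row in results:
    print(row)
```

A large gap between training and test accuracy for a given setting is a sign of overfitting. The exact numbers depend on the data and the random split, so treat the output as a diagnostic rather than a rule.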

Now that we have a basic understanding of the SVM classifier and the RBF kernel, let’s go through the steps for using these tools in Python to build a model using a toy dataset.

Importing Libraries and Dataset

First, you will need to load your data into a pandas DataFrame and prepare it for modeling. This may include tasks such as handling missing values, splitting the data into training and testing sets, and standardizing the features. Here we load the Iris dataset and keep only two of its three classes to obtain a binary classification problem.

Python3




from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import pandas as pd
from sklearn import datasets
import numpy as np
  
# load toy dataset
iris = datasets.load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'],
                                  iris['target']],
                       columns=iris['feature_names'] + ['target'])
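# Keep only two classes (setosa = 0, versicolor = 1) for binary classification;
# .copy() avoids pandas' SettingWithCopyWarning on the assignment below
iris_df = iris_df[iris_df["target"] != 2].copy()
# Relabel the target: versicolor (1) -> 0, setosa (0) -> 1
iris_df["target"] = iris_df["target"].\
    apply(lambda x: 0 if x == 1 else 1)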


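Now we will split the complete dataset into a training part and a testing part, so that we can train the model on the training set and use the held-out set for evaluation. The scaler is fitted on the training set only and then applied to both sets, so that no information from the test set leaks into training.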

Python3




# Split data into training and testing sets
X_train, X_test,\
    y_train, y_test = train_test_split(iris_df.drop('target',
                                                    axis=1),
                                       iris_df['target'],
                                       test_size=0.2)
  
# Standardize features
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
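As a quick sanity check on the standardization step, the following self-contained sketch repeats the split and scaling on NumPy arrays and prints the per-feature mean and standard deviation. The fixed `random_state=0` is an assumption added here for reproducibility; the code above does not set one.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same two-class subset of Iris as above, kept as NumPy arrays
iris = datasets.load_iris()
mask = iris.target != 2
X, y = iris.data[mask], iris.target[mask]

# Assumption: fixed seed so the check is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit on the training set only, then apply to both sets
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("train mean:", X_train_scaled.mean(axis=0).round(3))
print("train std: ", X_train_scaled.std(axis=0).round(3))
print("test mean: ", X_test_scaled.mean(axis=0).round(3))
print("test std:  ", X_test_scaled.std(axis=0).round(3))
```

The training features come out with mean 0 and standard deviation 1 by construction. The test features are usually close to, but not exactly, those values, because they are scaled with statistics learned from the training set.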


Model Training

Depending on the characteristics of your data, you may want to use a different kernel with the SVM classifier. In this case, we will be using the RBF kernel, which is well-suited for data that is not linearly separable. Once you have prepared your data and chosen the appropriate kernel, you can use the scikit-learn library to fit the SVM model to your data. The values of C and gamma are set when the SVC object is created, and the model is then trained with the fit() method, which takes the training data and the labels as arguments.

Python3




from sklearn.svm import SVC
  
# Create an SVM classifier with an RBF
# kernel and set values of C and gamma
model = SVC(kernel='rbf', C=1, gamma=1)
  
# Fit the model to the training data
model.fit(X_train_scaled, y_train)


Model Evaluation

After fitting the model to the training data, it is important to evaluate its performance on the testing data. This can be done using a variety of metrics, such as accuracy, precision, and recall.

Python3




# Calculate the accuracy of the model on the test data
from sklearn.metrics import accuracy_score
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
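Accuracy alone can hide how errors are distributed between the classes. As a minimal sketch of the other metrics mentioned above, the self-contained example below rebuilds the same kind of model and reports precision, recall, and the confusion matrix. The fixed `random_state` and the `stratify` argument are assumptions added for reproducibility and are not part of the code above.

```python
from sklearn import datasets
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of Iris, relabeled as in the article: setosa -> 1, versicolor -> 0
iris = datasets.load_iris()
mask = iris.target != 2
X = iris.data[mask]
y = (iris.target[mask] == 0).astype(int)

# Assumption: stratified split with a fixed seed
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

scaler = StandardScaler().fit(X_tr)
clf = SVC(kernel="rbf", C=1, gamma=1).fit(scaler.transform(X_tr), y_tr)
y_hat = clf.predict(scaler.transform(X_te))

precision = precision_score(y_te, y_hat)
recall = recall_score(y_te, y_hat)
# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat)

print("Precision:", precision)
print("Recall:", recall)
print("Confusion matrix:\n", cm)
```

Precision and recall are computed for the positive class (label 1) by default; pass `pos_label` if the other class is the one you care about.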


Now that you have trained and evaluated your model, you can use it to make predictions on new data. You can do this using the predict() method, which takes in a matrix of data and returns a corresponding array of predictions.

Python3




# Make predictions on new data
new_data = ... # new data that you want to predict on
new_data_scaled = scaler.transform(new_data)
predictions = model.predict(new_data_scaled)
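For a runnable illustration of the same steps, the sketch below uses two made-up flower measurements as the new data. These values are hypothetical and chosen only to show the expected input shape: one row per sample, with the four Iris features in the same order as the training data.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Train on the two-class subset of Iris (setosa -> 1, versicolor -> 0)
iris = datasets.load_iris()
mask = iris.target != 2
X = iris.data[mask]
y = (iris.target[mask] == 0).astype(int)

scaler = StandardScaler().fit(X)
model = SVC(kernel="rbf", C=1, gamma=1).fit(scaler.transform(X), y)

# Hypothetical measurements: sepal length, sepal width, petal length, petal width (cm)
new_data = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [6.0, 2.8, 4.5, 1.4],
])

# New data must go through the same fitted scaler before prediction
new_data_scaled = scaler.transform(new_data)
predictions = model.predict(new_data_scaled)
# Signed distance to the decision boundary, useful as a rough confidence signal
scores = model.decision_function(new_data_scaled)

print("Predictions:", predictions)
print("Decision scores:", scores)
```

Forgetting the `scaler.transform` step is a common mistake: the model was trained on standardized features, so raw measurements would be interpreted on the wrong scale.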


Hyperparameter Tuning using GridSearchCV

Depending on the results of your model evaluation, you may want to fine-tune the model by adjusting the hyperparameters or using a different kernel. For example, you can use the GridSearchCV class from the scikit-learn library to run a cross-validated grid search over different combinations of hyperparameters and choose the best-performing model. Note that SVC() uses the RBF kernel by default, so the grid below tunes C and gamma for an RBF-kernel SVM.

Python3




from sklearn.model_selection import GridSearchCV
  
# Define the parameter grid
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
  
# Create a grid search object
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)
  
# Fit the grid search object to the training data
grid.fit(X_train_scaled, y_train)
  
# Get the best parameters
best_params = grid.best_params_
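A variation worth considering is to wrap the scaler and the classifier in a scikit-learn `Pipeline` and search over that, so the scaler is refitted inside each cross-validation fold instead of once on the full training set. The sketch below is one possible arrangement, not the only way to do it; the step names `scaler` and `svc`, the fixed `random_state`, and `cv=5` are assumptions made for this example.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of Iris (setosa -> 1, versicolor -> 0)
iris = datasets.load_iris()
mask = iris.target != 2
X = iris.data[mask]
y = (iris.target[mask] == 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Scaling happens inside the pipeline, so each CV fold fits its own scaler
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1, 0.1, 0.01, 0.001],
}

grid = GridSearchCV(pipe, param_grid, cv=5, refit=True)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
# With refit=True, grid.score uses the best pipeline refitted on all training data
print("Test accuracy:", grid.score(X_test, y_test))
```

Because `refit=True`, the fitted `grid` object can be used directly for prediction on raw, unscaled inputs; the pipeline applies the scaling step itself.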


With these steps, you can use the SVM classifier and the RBF kernel in Python to build better models for your data. It’s important to keep in mind that the choice of kernel and the value of the hyperparameters can have a significant impact on the performance of the model and should be chosen carefully based on the characteristics of your data.


