
Calibration Curves

Last Updated : 14 Sep, 2021

Generally, for any classification problem, we predict the class value that has the highest probability of being the true class label. However, sometimes, we want to predict the probabilities of a data instance belonging to each class label. For example, say we are building a model to classify fruits and we have three class labels: apples, oranges, and bananas (each fruit is one of these). For any fruit, we want the probabilities of the fruit being an apple, an orange, or a banana. 
This is very useful for the evaluation of a classification model: it can help us understand how 'sure' a model is while predicting a class label and how decisive it is. A classifier is said to be calibrated when its predicted probabilities match the observed frequencies of the true labels; for example, among all instances to which a calibrated model assigns a probability of 0.8, roughly 80% should actually belong to the predicted class. The problem is that not all classification models are calibrated: some give poor estimates of the class probabilities and some do not even support probability prediction.
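
As a quick illustration (not part of the original example), most scikit-learn classifiers expose these per-class probabilities through predict_proba. The sketch below uses a synthetic three-class dataset standing in for the fruit example; the choice of dataset and classifier here is arbitrary.

Python3

# A minimal sketch of per-class probability prediction
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class problem standing in for apples / oranges / bananas
X, y = make_classification(n_samples=300, n_classes=3,
                           n_informative=4, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Each row gives the predicted probability of that sample belonging
# to class 0, 1 and 2; the three values in a row sum to 1
print(clf.predict_proba(X[:3]))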

Calibration Curves: 
Calibration curves are used to evaluate how well calibrated a classifier is, i.e., how closely its predicted probabilities match the actual fraction of positive samples. The predictions are grouped into bins; the x-axis shows the average predicted probability in each bin, and the y-axis shows the fraction of positives (the proportion of samples in the bin that truly belong to the positive class). The curve of an ideally calibrated model is a straight diagonal line from (0, 0) to (1, 1). 
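
To make the binning concrete, the short sketch below (with made-up probabilities, for illustration only) mimics what scikit-learn's calibration_curve computes with equal-width bins: the average predicted probability and the fraction of positives in each bin.

Python3

import numpy as np

# Hypothetical predicted probabilities and true labels (illustration only)
prob_pred = np.array([0.10, 0.15, 0.40, 0.45, 0.80, 0.90])
y_true = np.array([0, 0, 0, 1, 1, 1])

# Assign each prediction to one of n_bins equal-width bins over [0, 1]
n_bins = 5
edges = np.linspace(0.0, 1.0, n_bins + 1)
bin_ids = np.digitize(prob_pred, edges[1:-1])

for b in range(n_bins):
    mask = bin_ids == b
    if mask.any():
        # x-coordinate: average predicted probability in the bin
        # y-coordinate: fraction of positives (mean of true labels) in the bin
        print(b, prob_pred[mask].mean(), y_true[mask].mean())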

Plotting Calibration Curves in Python3: 
For this example, we will use a binary dataset: the popular Breast Cancer Wisconsin dataset, which ships with scikit-learn and can be loaded via sklearn.datasets.load_breast_cancer.

Code: Implementing a Support Vector Machine's calibration curve and comparing it with a perfectly calibrated model's curve.

Python3




# Importing required modules
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve
import matplotlib.pyplot as plt

# Loading dataset
dataset = load_breast_cancer()
X = dataset.data
y = dataset.target

# Splitting dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13)

# Creating and fitting model
model = SVC()
model.fit(X_train, y_train)

# SVC does not expose predict_proba by default, so use its decision
# function and rescale the scores to the [0, 1] range
prob = model.decision_function(X_test)
prob = (prob - prob.min()) / (prob.max() - prob.min())

# Creating calibration curve: fraction of positives and mean
# predicted probability for each of the 10 bins
prob_true, prob_pred = calibration_curve(y_test, prob, n_bins=10)

# Plot perfectly calibrated model (the diagonal)
plt.plot([0, 1], [0, 1], linestyle='--', label='Ideally Calibrated')

# Plot model's calibration curve
plt.plot(prob_pred, prob_true, marker='.',
         label='Support Vector Classifier')

plt.legend(loc='upper left')
plt.xlabel('Average Predicted Probability in each bin')
plt.ylabel('Ratio of positives')
plt.show()


Output: a plot of the SVC's calibration curve together with the dashed diagonal of a perfectly calibrated model.

From the graph, we can clearly see that the Support Vector Classifier is not very well calibrated. The closer a model's curve is to the perfectly calibrated model's curve (the dashed diagonal), the better calibrated it is. 
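
If a model turns out to be poorly calibrated, one common remedy (not covered in the original example, so treat this as a sketch) is to wrap it in scikit-learn's CalibratedClassifierCV, which fits a sigmoid (Platt scaling) or isotonic mapping on top of the model's scores. The snippet below reuses X_train, y_train, X_test and y_test from the code above.

Python3

from sklearn.calibration import CalibratedClassifierCV

# Wrap the SVC in sigmoid (Platt scaling) calibration
calibrated_model = CalibratedClassifierCV(SVC(), method='sigmoid', cv=5)
calibrated_model.fit(X_train, y_train)

# predict_proba is now available; its outputs are usually better calibrated
calibrated_prob = calibrated_model.predict_proba(X_test)[:, 1]
prob_true_cal, prob_pred_cal = calibration_curve(y_test, calibrated_prob,
                                                 n_bins=10)
plt.plot([0, 1], [0, 1], linestyle='--', label='Ideally Calibrated')
plt.plot(prob_pred_cal, prob_true_cal, marker='.', label='Calibrated SVC')
plt.legend(loc='upper left')
plt.show()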

Conclusion: 
Now that you know what calibration means in machine learning and how to plot a calibration curve, the next time your classifier gives unpredictable results and you can't find the cause, try plotting its calibration curve and check whether the model is well-calibrated. 
 


