
Bias and Variance in Machine Learning

Last Updated : 05 Jun, 2023

There are various ways to evaluate a machine learning model: MSE (Mean Squared Error) and Absolute Error for regression; Precision, Recall, and ROC (Receiver Operating Characteristic) for classification. In a similar way, bias and variance help us with parameter tuning and with deciding which of several candidate models fits best.

Bias is one type of error that occurs due to wrong assumptions about the data, such as assuming the data is linear when in reality it follows a complex function. Variance, on the other hand, is introduced by high sensitivity to variations in the training data. This too is a type of error, since we want our model to be robust against noise. There are two kinds of error in machine learning: reducible error and irreducible error. Bias and variance are reducible errors.
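
These components combine in the standard decomposition of a model's expected error:

\text{Total Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}

Bias and variance can be driven down by modeling choices, while the irreducible error comes from noise inherent in the data itself and sets a floor that no model can go below.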

What is Bias?

Bias is the inability of a model to capture the true relationship in the data, which produces a systematic difference between the model’s predicted values and the actual values. This difference between the actual (or expected) values and the predicted values is known as bias error, or error due to bias. Bias is a systematic error that occurs due to wrong assumptions in the machine learning process.

Let Y be the true value of a parameter, and let \hat Y be an estimator of Y based on a sample of data. Then, the bias of the estimator \hat Y is given by:

\text{Bias}(\hat Y) = E(\hat Y) - Y

where E(\hat Y) is the expected value of the estimator \hat Y. Bias measures how well the model fits the data on average.
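
To make the formula concrete, here is a minimal simulation (a hypothetical example; the uniform distribution and the sample size are arbitrary choices). The sample maximum is used to estimate the true maximum of a distribution, and the expectation E(\hat Y) is approximated by averaging the estimator over many repeated samples; the estimate comes out systematically below the true value, i.e. the estimator is biased.

Python3

# A hypothetical example: the sample maximum as a (biased) estimator of
# the true maximum of a uniform distribution.
import numpy as np

rng = np.random.default_rng(0)
true_max = 10.0                                   # the true value Y
estimates = []
for _ in range(10_000):
    sample = rng.uniform(0, true_max, size=20)    # a fresh data sample
    estimates.append(sample.max())                # the estimator Y_hat
bias = np.mean(estimates) - true_max              # Bias(Y_hat) = E(Y_hat) - Y
print('Estimated bias: %.3f' % bias)              # systematically negative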

  • Low Bias: A low bias value means fewer assumptions are made about the form of the target function. In this case, the model will match the training dataset closely.
  • High Bias: A high bias value means more assumptions are made about the target function. In this case, the model will not match the training dataset closely.

A high-bias model cannot capture the trend in the dataset. Such a model is said to underfit and has a high error rate, typically because the algorithm is too simple.

For example, a linear regression model may have a high bias if the data has a non-linear relationship.

Ways to reduce high bias in Machine Learning:

  • Use a more complex model: One of the main causes of high bias is a model that is too simple to capture the complexity of the data. In such cases we can make the model more complex, for example by increasing the number of hidden layers in a deep neural network, or by switching to a more expressive model: polynomial regression for non-linear datasets, CNNs for image processing, RNNs for sequence learning (see the sketch after this list).
  • Increase the number of features: Adding more features to the training dataset increases the complexity of the model and improves its ability to capture the underlying patterns in the data.
  • Reduce regularization of the model: Regularization techniques such as L1 or L2 regularization help prevent overfitting and improve generalization, but if the model already has high bias, reducing the strength of regularization, or removing it altogether, can improve performance.
  • Increase the size of the training data: A larger training dataset gives the model more examples to learn from, which can help reduce bias.
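
As a concrete illustration of the first remedy, here is a minimal sketch on synthetic data (the quadratic target below is a hypothetical example, not a dataset from this article): a plain linear model underfits a non-linear relationship, while adding polynomial features reduces the bias.

Python3

# A hypothetical quadratic target: a plain linear model underfits it
# (high bias), while polynomial features capture the curvature.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)   # non-linear relationship

# Too-simple model: high bias, large error even on its own training data
linear = LinearRegression().fit(X, y)
print('Linear train MSE:     %.2f' % mean_squared_error(y, linear.predict(X)))

# More complex model: polynomial features reduce the bias
poly = make_pipeline(PolynomialFeatures(degree=2),
                     LinearRegression()).fit(X, y)
print('Polynomial train MSE: %.2f' % mean_squared_error(y, poly.predict(X)))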

What is Variance?

Variance is the measure of spread in data from its mean position. In machine learning, variance is the amount by which the performance of a predictive model changes when it is trained on different subsets of the training data. More specifically, variance describes how sensitive the model is to the particular subset of training data it sees, i.e. how much its fitted function changes when it is trained on a new subset.

Let Y be the actual values of the target variable, and \hat Y be the predicted values of the target variable. Then the variance of a model can be measured as the expected value of the squared difference between the predicted values and the expected value of the predicted values.

\text{Variance} = E[(\hat Y - E[\hat Y])^2]

where E[\hat Y] is the expected value of the predicted values, averaged over all the training sets.

Variance errors are classified as either low variance or high variance:

  • Low variance: Low variance means that the model is less sensitive to changes in the training data and produces consistent estimates of the target function across different subsets of data from the same distribution. Combined with high bias, this is the underfitting case, where the model fails to generalize on both training and test data.
  • High variance: High variance means that the model is very sensitive to changes in the training data, so its estimate of the target function changes significantly when trained on different subsets of data from the same distribution. This is the overfitting case: the model performs well on the training data but poorly on new, unseen test data, because it fits the training data so closely that it fails to generalize.

Ways to reduce high variance in Machine Learning:

  • Cross-validation: By splitting the data into training and testing sets multiple times, cross-validation can help identify whether a model is overfitting or underfitting and can be used to tune hyperparameters to reduce variance.
  • Feature selection: Choosing only the relevant features decreases the model’s complexity, which can reduce the variance error.
  • Regularization: We can use L1 or L2 regularization to reduce variance in machine learning models.
  • Ensemble methods: These combine multiple models to improve generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance (see the sketch after this list).
  • Simplifying the model: Reducing the complexity of the model, such as decreasing the number of parameters or layers in a neural network, can also help reduce variance and improve generalization performance.
  • Early stopping: Early stopping prevents overfitting by halting the training of a deep learning model when performance on the validation set stops improving.
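
To illustrate the ensemble remedy, here is a minimal sketch (a hypothetical synthetic setup, not from this article): we retrain the same model on many different random training sets and measure how much its predictions at fixed test points move around. A bagged ensemble of trees shows noticeably less spread than a single deep tree.

Python3

# A hypothetical synthetic setup: measure how much predictions at fixed
# test points vary when the same model is retrained on fresh training sets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(0)
X_grid = np.linspace(-3, 3, 50).reshape(-1, 1)     # fixed evaluation points

def prediction_spread(make_model, n_repeats=30):
    """Average per-point variance of predictions across retrainings."""
    preds = []
    for _ in range(n_repeats):
        X = rng.uniform(-3, 3, size=(200, 1))          # fresh training set
        y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)   # noisy quadratic target
        preds.append(make_model().fit(X, y).predict(X_grid))
    return np.mean(np.var(preds, axis=0))

print('Single deep tree: %.3f' % prediction_spread(DecisionTreeRegressor))
print('Bagged trees:     %.3f' % prediction_spread(
    lambda: BaggingRegressor(DecisionTreeRegressor(), n_estimators=50)))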

Mathematical Derivation for Total Error

\begin{aligned} (Y-\hat Y)^2 &= (Y-E(\hat Y) + E(\hat Y) - \hat Y)^2 \\ &= (Y-E(\hat Y))^2 + (E(\hat Y) - \hat Y)^2 + 2(Y-E(\hat Y))(E(\hat Y) - \hat Y) \end{aligned}

Applying expectation to both sides, and noting that Y is treated as a fixed true value (so Y - E(\hat Y) is a constant):

\begin{aligned} E[(Y-\hat Y)^2] &= E[(Y-E(\hat Y))^2] + E[(E(\hat Y) - \hat Y)^2] + 2E[(Y-E(\hat Y))(E(\hat Y) - \hat Y)] \\ &= (Y-E(\hat Y))^2 + E[(\hat Y - E(\hat Y))^2] + 2(Y-E(\hat Y))\,E[E(\hat Y) - \hat Y] \\ &= (Y-E(\hat Y))^2 + E[(\hat Y - E(\hat Y))^2] + 2(Y-E(\hat Y))\big(E(\hat Y) - E(\hat Y)\big) \\ &= (Y-E(\hat Y))^2 + E[(\hat Y - E(\hat Y))^2] + 0 \\ &= \text{Bias}^2 + \text{Variance} \end{aligned}

The cross term vanishes because E[E(\hat Y) - \hat Y] = E(\hat Y) - E(\hat Y) = 0. (If Y itself contained random noise, an additional irreducible-error term would appear in the decomposition; it is omitted here because Y is treated as fixed.)
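
The identity above is easy to check numerically. The sketch below (a hypothetical simulation; the bias of 1.5 and the noise level are arbitrary choices) draws many realizations of an estimator around a fixed true value and confirms that the mean squared error equals the squared bias plus the variance.

Python3

# A hypothetical simulation: the bias of 1.5 and the noise level are
# arbitrary choices used only to check the identity numerically.
import numpy as np

rng = np.random.default_rng(1)
Y = 5.0                                            # fixed true value
Y_hat = Y + 1.5 + rng.normal(0, 2.0, 100_000)      # estimator: bias 1.5, sd 2

mse = np.mean((Y - Y_hat) ** 2)
bias = np.mean(Y_hat) - Y
variance = np.var(Y_hat)                           # population variance (ddof=0)
print('MSE:               %.3f' % mse)
print('Bias^2 + Variance: %.3f' % (bias ** 2 + variance))  # equal up to rounding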

Different Combinations of Bias-Variance

There can be four combinations of bias and variance:

  • High Bias, Low Variance: A model with high bias and low variance is said to be underfitting.
  • High Variance, Low Bias: A model with high variance and low bias is said to be overfitting.
  • High-Bias, High-Variance: A model with both high bias and high variance cannot capture the underlying patterns in the data (high bias) and is also too sensitive to changes in the training data (high variance). As a result, the model produces inconsistent and, on average, inaccurate predictions.
  • Low Bias, Low Variance: A model with low bias and low variance captures the underlying patterns in the data (low bias) and is not too sensitive to changes in the training data (low variance). This is the ideal scenario for a machine learning model, as it generalizes well to new, unseen data and produces consistent, accurate predictions. In practice, however, this ideal is rarely achievable.
Bias-Variance Combinations

We now know that the ideal case is low bias and low variance, but in practice it is rarely attainable. So we trade off bias against variance to achieve a balance between the two.

A model with balanced bias and variance is said to have optimal generalization performance. This means that the model is able to capture the underlying patterns in the data without overfitting or underfitting. The model is likely to be just complex enough to capture the complexity of the data, but not too complex to overfit the training data. This can happen when the model has been carefully tuned to achieve a good balance between bias and variance, by adjusting the hyperparameters and selecting an appropriate model architecture.
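
One common way to find this balance is to sweep a complexity hyperparameter and watch training and test error diverge. The sketch below (hypothetical synthetic data; the sine target and the depth values are arbitrary choices) varies the max_depth of a decision tree: very shallow trees underfit (high bias), very deep trees overfit (high variance), and an intermediate depth gives the best test error.

Python3

# A hypothetical complexity sweep on synthetic data: vary tree depth and
# watch training error fall while test error follows a U-shape.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 400)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=23)

for depth in [1, 2, 4, 8, None]:
    model = DecisionTreeRegressor(max_depth=depth,
                                  random_state=0).fit(X_train, y_train)
    print('max_depth=%-4s  train MSE: %.3f  test MSE: %.3f' % (
        depth,
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_test, model.predict(X_test))))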

The table below summarizes the typical tendencies of some common algorithms:

| Machine Learning Algorithm | Bias | Variance |
| --- | --- | --- |
| Linear Regression | High Bias | Low Variance |
| Decision Tree | Low Bias | High Variance |
| Random Forest | Low Bias | Reduced Variance (vs. a single tree) |
| Bagging | Low Bias | Reduced Variance (by averaging) |

Bias Variance Tradeoff

If the algorithm is too simple (e.g., a hypothesis with a linear equation), it may have high bias and low variance, and is thus error-prone. If the algorithm fits too complex a hypothesis (e.g., a high-degree equation), it may have high variance and low bias, and in that case predictions on new entries will not perform well. Between these two conditions lies the Bias-Variance Trade-off. This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm cannot be more complex and less complex at the same time. Graphically, the ideal tradeoff looks like this.

Bias-Variance Tradeoff

The technique by which we analyze the bias and variance components of a model’s error is known as bias-variance decomposition. Below we work through one example each for classification and regression, using the bias_variance_decomp function from the mlxtend library.

Bias Variance Decomposition for Classification and Regression

As derived in the formula above, the total error is the sum of the squared bias and the variance. We try to make sure that the bias and the variance are comparable, so that neither exceeds the other by too large a margin.

Python3

# Import the necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from mlxtend.evaluate import bias_variance_decomp
import warnings
warnings.filterwarnings('ignore')
 
# Load the dataset
X, y = load_iris(return_X_y=True)
 
# Split train and test dataset
X_train, X_test,\
    y_train, y_test = train_test_split(X, y,
                                       test_size=0.25,
                                       random_state=23,
                                       shuffle=True,
                                       stratify=y)
 
# Build the classification model
tree = DecisionTreeClassifier(random_state=123)
clf = BaggingClassifier(estimator=tree,  # `base_estimator=tree` on scikit-learn < 1.2
                        n_estimators=50,
                        random_state=23)
 
# Bias variance decompositions
avg_expected_loss, avg_bias, \
    avg_var = bias_variance_decomp(clf,
                                   X_train, y_train,
                                   X_test, y_test,
                                   loss='0-1_loss',
                                   random_seed=23)
# Print the value
print('Average expected loss: %.2f' % avg_expected_loss)
print('Average bias: %.2f' % avg_bias)
print('Average variance: %.2f' % avg_var)

Output:

Average expected loss: 0.06
Average bias: 0.05
Average variance: 0.02
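
For comparison, you can run the same decomposition on the single, unbagged tree (continuing from the code above; mlxtend's bias_variance_decomp accepts any scikit-learn-compatible estimator). Since bagging averages many trees, you would typically expect the variance term to come out lower for the ensemble than for the individual tree.

Python3

# Continuing from the code above: decompose the single, unbagged tree
tree_loss, tree_bias, tree_var = bias_variance_decomp(
    tree, X_train, y_train,
    X_test, y_test,
    loss='0-1_loss',
    random_seed=23)
print('Single tree - loss: %.2f, bias: %.2f, variance: %.2f'
      % (tree_loss, tree_bias, tree_var))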

Now let’s perform the same decomposition on a regression task and check the values of the bias and the variance.

Python3

# Load the necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from mlxtend.evaluate import bias_variance_decomp
import warnings
warnings.filterwarnings('ignore')
 
# Load the dataset
X, y = fetch_california_housing(return_X_y=True)
 
# Split train and test dataset
X_train, X_test,\
    y_train, y_test = train_test_split(X, y,
                                       test_size=0.25,
                                       random_state=23,
                                       shuffle=True)
 
# Build the regression model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation=tf.nn.relu),
    tf.keras.layers.Dense(1)
])
 
# Set optimizer and loss
optimizer = tf.keras.optimizers.Adam()
model.compile(loss='mean_squared_error',
              optimizer=optimizer)
 
# Train the model
model.fit(X_train, y_train, epochs=25, verbose=0)
# Evaluate the model on the test set (model.evaluate returns the MSE loss)
loss = model.evaluate(X_test, y_test)
print('Average: %.2f' % loss)
 
# Bias variance decompositions
avg_expected_loss, avg_bias,\
    avg_var = bias_variance_decomp(model,
                                   X_train, y_train,
                                   X_test, y_test,
                                   loss='mse',
                                   random_seed=23,
                                   epochs=5,
                                   verbose=0)
 
# Print the result
print('Average expected loss: %.2f' % avg_expected_loss)
print('Average bias: %.2f' % avg_bias)
print('Average variance: %.2f' % avg_var)

Output:

162/162 [==============================] - 0s 802us/step - loss: 0.9195
Average: 0.92
Average expected loss: 2.30
Average bias: 0.72
Average variance: 1.58

