Faces dataset decompositions in Scikit Learn

Last Updated : 14 Jan, 2024

The Faces dataset used here is the Labeled Faces in the Wild (LFW) dataset, a collection of labeled photographs of people’s faces that the well-known machine learning toolkit Scikit-Learn can download through its dataset loaders. Face recognition, facial expression analysis, and other computer vision applications are among its frequent uses.

What is Decomposition?

Decomposition is the process of breaking a complicated data matrix into smaller, easier-to-interpret parts. For high-dimensional data such as photographs, principal component analysis (PCA) is a frequently used decomposition approach. It identifies the principal components, which are linear combinations of the original features, ordered so that each one captures as much of the remaining variance in the data as possible.
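To make the idea concrete, here is a minimal sketch of PCA done by hand with NumPy on synthetic two-dimensional data. The data, the variable names, and the 30-degree rotation are all assumptions chosen for illustration; they are not part of the Faces workflow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data stretched along one axis, then rotated by 30 degrees.
points = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
data = points @ rotation.T

# PCA by hand: center the data, then eigendecompose the covariance matrix.
centered = data - data.mean(axis=0)
covariance = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(covariance)

# eigh returns eigenvalues in ascending order; sort them descending.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
components = eigenvectors[:, order].T   # one principal component per row

# Each projected coordinate is a linear combination of the original features.
projected = centered @ components.T
print("variance along each component:", projected.var(axis=0, ddof=1))
print("eigenvalues of the covariance:", eigenvalues)
```

The two printed arrays should agree, and the first component should point roughly along the direction in which the synthetic data was stretched.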

Concepts related to the topic:

  1. Principal Component Analysis (PCA): A dimensionality reduction technique that finds a dataset’s principal components, the orthogonal directions along which the data varies the most.
  2. Eigenfaces: The principal components derived by PCA are often referred to as eigenfaces in the context of face recognition.
  3. Singular Value Decomposition (SVD): A matrix factorization that can also be used for dimensionality reduction. Scikit-Learn’s PCA is computed with an SVD of the centered data.
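The link between PCA and SVD can be checked on a small example. The sketch below uses synthetic data (an assumption, not the Faces dataset) and compares the directions found by Scikit-Learn’s PCA with the right singular vectors of the centered data.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data whose columns have clearly different scales.
X_demo = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])

# SVD of the centered data: the rows of Vt are the principal directions.
X_centered = X_demo - X_demo.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

pca_demo = PCA(n_components=3, svd_solver="full").fit(X_demo)

# The sign of a singular vector is arbitrary, so compare absolute values.
same_directions = np.allclose(np.abs(pca_demo.components_), np.abs(Vt[:3]))
same_singular_values = np.allclose(pca_demo.singular_values_, S[:3])
print(same_directions, same_singular_values)
```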

Implementing Faces Dataset Decompositions

1. Import necessary libraries:

Python3

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA


The necessary libraries are imported in this step: NumPy for numerical operations, Matplotlib for plotting, and Scikit-Learn for the PCA implementation and access to the Faces dataset.

2. Load the Faces dataset:

Python3

faces_data = fetch_lfw_people(min_faces_per_person=70, resize=0.4)


The code uses Scikit-Learn’s fetch_lfw_people function to get the Labeled Faces in the Wild (LFW) dataset, which is downloaded on the first call and cached afterwards. Each photograph is resized to 40% of its original width and height, and only people with at least 70 pictures in the dataset are kept.
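The effect of resize on the image dimensions can be worked out with a little arithmetic. The sketch below assumes that each face is cropped to 125 × 94 pixels before resizing (the region selected by the loader’s default slice_ argument) and that the scaled sizes are truncated to integers; both are stated assumptions rather than something the code above verifies.

```python
# Assumption: before resizing, each face is cropped to 125 x 94 pixels
# (the region selected by the loader's default slice_ argument).
cropped_height, cropped_width = 125, 94
resize = 0.4

# Assumption: scaled sizes are truncated to whole pixels.
height = int(resize * cropped_height)
width = int(resize * cropped_width)
n_pixels = height * width
print(height, width, n_pixels)
```

Under these assumptions every image has 50 × 37 pixels, so each flattened face has 1850 features.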

3. Preprocess the data:

Python3

X = faces_data.data
n_samples, n_features = X.shape


In this stage, the feature matrix X is extracted from the dataset. Each row of X is one face image flattened into a vector of pixel intensities, so the shape of X gives the number of samples (n_samples) and the number of features (n_features).
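The relationship between an image array and a flattened feature matrix can be sketched with a small stand-in array. The array below is hypothetical and only mimics the layout (n_samples, height, width); it does not contain real faces.

```python
import numpy as np

# Hypothetical stand-in for faces_data.images: 4 grayscale images of 50 x 37.
n_images, height, width = 4, 50, 37
images = np.arange(n_images * height * width, dtype=np.float32)
images = images.reshape(n_images, height, width)

# Flatten each image into one row of the feature matrix.
X_flat = images.reshape(n_images, -1)
print(X_flat.shape)

# Reshaping a row with the same height and width restores the image.
restored = X_flat[2].reshape(height, width)
print(np.array_equal(restored, images[2]))
```

The same reshape, applied in the other direction, is what later turns principal components and reconstructed rows back into pictures.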

4. Apply PCA for decomposition:

Python3

n_components = 150
pca = PCA(n_components=n_components, svd_solver='randomized', whiten=True).fit(X)


The code sets the number of components (n_components) to 150 and fits PCA to the data with the fit method. For efficiency, it uses the randomized SVD solver, and whiten=True rescales the projected components so that each has unit variance.
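What whitening does to the projected data can be seen on a small synthetic example. This is only a sketch: the data is made up, and it uses svd_solver="full" so that the check is exact, whereas the model above uses the approximate randomized solver.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data whose features have very different scales.
X_demo = rng.normal(size=(300, 6)) * np.array([10.0, 5.0, 2.0, 1.0, 0.5, 0.1])

# svd_solver="full" keeps this small check exact.
plain = PCA(n_components=3, svd_solver="full").fit(X_demo)
white = PCA(n_components=3, svd_solver="full", whiten=True).fit(X_demo)

Z_plain = plain.transform(X_demo)
Z_white = white.transform(X_demo)

# Without whitening, the spread of each component reflects its variance.
print("plain std:   ", Z_plain.std(axis=0, ddof=1))
# With whitening, each component is rescaled to unit variance.
print("whitened std:", Z_white.std(axis=0, ddof=1))
```

Whitening discards the relative scale of the components, which can help downstream models that assume features of comparable magnitude.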

5. Reshape the components into eigenfaces:

Python3

eigenfaces = pca.components_.reshape(
    (n_components, faces_data.images.shape[1], faces_data.images.shape[2]))


In this stage, the principal components from PCA are reshaped from flat vectors back into image form, giving the eigenfaces. These eigenfaces represent the directions of highest variance in the original face pictures.

6. Plot the first 10 eigenfaces:

Python3

plt.figure(figsize=(10, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(eigenfaces[i], cmap='gray')
    plt.title(f"Eigenface {i + 1}")
plt.show()


Output:

Eigenfaces

The code uses Matplotlib to plot the first ten eigenfaces, visualizing them in a 2×5 grid.
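Because the eigenfaces are ordered by the variance they explain, the explained_variance_ratio_ attribute of a fitted PCA can guide the choice of n_components. The sketch below uses synthetic data and a 95% threshold, both of which are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 10 strong latent directions plus a little noise.
latent = rng.normal(size=(400, 10)) * np.linspace(10.0, 1.0, 10)
mixing = rng.normal(size=(10, 50))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(400, 50))

pca_demo = PCA(n_components=30, svd_solver="full").fit(X_demo)
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)

# Smallest number of components whose cumulative ratio reaches 95%.
threshold = 0.95
n_needed = int(np.searchsorted(cumulative, threshold) + 1)
print("components needed for 95% of the variance:", n_needed)
```

The same idea applies to the fitted pca object from step 4: np.cumsum(pca.explained_variance_ratio_) shows how much of the variance the first eigenfaces capture.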

7. Select random faces to reconstruct from a subset of principal components:

Python3

n_faces = 5
random_faces_indices = np.random.randint(0, n_samples, n_faces)
random_faces = X[random_faces_indices]


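Five faces are chosen at random from the dataset in this section to illustrate the reconstruction procedure. Note that np.random.randint draws indices with replacement, so the same face can occasionally be selected twice.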
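If repeatable results or guaranteed distinct faces are wanted, one option is a seeded NumPy generator with sampling without replacement. This is a sketch of an alternative, not what the code above does; the dataset size and the seed are hypothetical values.

```python
import numpy as np

n_samples_demo = 1288   # hypothetical dataset size, used only for illustration
n_faces_demo = 5

# A seeded generator makes the selection repeatable across runs.
rng = np.random.default_rng(42)

# replace=False guarantees that no index is drawn twice.
indices = rng.choice(n_samples_demo, size=n_faces_demo, replace=False)
print(indices)
```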

8. Transform faces into principal components:

Python3

faces_pca = pca.transform(random_faces)


With the previously fitted PCA model, the chosen faces are projected into the space of principal components, so each face is now described by 150 coefficients instead of its raw pixels.

9. Reconstruct faces from principal components:

Python3

faces_reconstructed = pca.inverse_transform(faces_pca)


The inverse_transform method maps the transformed principal-component coefficients back to pixel space, giving an approximation of each original face built from the retained components only.
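The reconstruction can also be written out by hand, which shows what happens when whitening is enabled. The following sketch uses synthetic data and svd_solver="full" (both assumptions) and then measures how the reconstruction error changes with the number of components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data with eight features of decreasing scale.
X_demo = rng.normal(size=(200, 8)) * np.array([8, 6, 4, 3, 2, 1, 0.5, 0.2])

pca_demo = PCA(n_components=4, svd_solver="full", whiten=True).fit(X_demo)
Z = pca_demo.transform(X_demo)

# Undo the whitening scale, map back to feature space, re-add the mean.
scale = np.sqrt(pca_demo.explained_variance_)
manual = (Z * scale) @ pca_demo.components_ + pca_demo.mean_
print(np.allclose(manual, pca_demo.inverse_transform(Z)))

# Keeping more components lowers the mean squared reconstruction error.
errors = {}
for k in (1, 4, 8):
    model = PCA(n_components=k, svd_solver="full", whiten=True).fit(X_demo)
    recon = model.inverse_transform(model.transform(X_demo))
    errors[k] = float(np.mean((X_demo - recon) ** 2))
    print(k, errors[k])
```

With all eight components kept, the reconstruction is exact up to floating-point error; with fewer, the discarded directions account for the remaining error.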

10. Visualize original and reconstructed faces:

Python3

plt.figure(figsize=(10, 3))
for i in range(n_faces):
    plt.subplot(2, n_faces, i + 1)
    plt.imshow(random_faces[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Original")
 
    plt.subplot(2, n_faces, i + 1 + n_faces)
    plt.imshow(faces_reconstructed[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Reconstructed")
plt.show()


Output:

Faces dataset decompositions

Similarly, we can perform Non-Negative Matrix Factorization (NMF).

Non-Negative Matrix Factorization (NMF)

Non-Negative Matrix Factorization (NMF) is a mathematical technique used in machine learning and data analysis for dimensionality reduction and feature extraction. It is particularly useful when the data involved has non-negative values, such as images, audio spectrograms, or text data represented as term-document matrices.

The following code snippet demonstrates how NMF can be used for facial image decomposition and reconstruction. The visualizations help in understanding the learned facial features and how well the NMF model reconstructs faces from the reduced feature space. Adjusting parameters such as the number of components (n_components) changes the quality of the reconstruction.

Python3

from sklearn.decomposition import NMF
nmf = NMF(n_components=n_components, tol=5e-3)
nmf.fit(X)  # pixel intensities are non-negative, as NMF requires

# Reshape each NMF component into an image
nmf_faces = nmf.components_.reshape(
    (n_components, faces_data.images.shape[1], faces_data.images.shape[2]))
 
# Plot the first 10 faces
plt.figure(figsize=(10, 3))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(nmf_faces[i], cmap='gray')
    plt.title(f"NMF face {i + 1}")
 
plt.show()
 
# Select random faces
n_faces = 5
random_faces_indices = np.random.randint(0, n_samples, n_faces)
random_faces = X[random_faces_indices]
 
# Transform faces
faces_nmf = nmf.transform(random_faces)
 
# Reconstruct faces
faces_reconstructed = nmf.inverse_transform(faces_nmf)
 
# Visualize original and reconstructed faces
plt.figure(figsize=(10, 3))
for i in range(n_faces):
    plt.subplot(2, n_faces, i + 1)
    plt.imshow(random_faces[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Original")
 
    plt.subplot(2, n_faces, i + 1 + n_faces)
    plt.imshow(faces_reconstructed[i].reshape(
        faces_data.images.shape[1], faces_data.images.shape[2]), cmap='gray')
    plt.title("Reconstructed")
 
plt.show()


Output:

Non-Negative Matrix Factorization (NMF)
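The effect of n_components on NMF reconstruction quality can be explored on a small synthetic matrix. This is a sketch under stated assumptions: the data is random non-negative data with rank 5, and the init, max_iter, and random_state settings are chosen here for illustration rather than taken from the code above.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic non-negative data with an exact rank-5 structure.
X_demo = rng.random((60, 5)) @ rng.random((5, 40))

errors = {}
for k in (1, 3, 5):
    model = NMF(n_components=k, init="nndsvda", max_iter=1000, random_state=0)
    W = model.fit_transform(X_demo)   # per-sample weights, all >= 0
    H = model.components_             # learned parts, all >= 0
    # W @ H is the reconstruction of X_demo in the original feature space.
    errors[k] = float(np.mean((X_demo - W @ H) ** 2))
    print(k, errors[k])
```

The error typically shrinks as more components are allowed, and both factors stay non-negative, which is why NMF components of face images tend to look like additive parts rather than signed eigenfaces.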

Conclusion

The Faces dataset and Scikit-Learn’s decomposition tools offer a practical way to understand the building blocks of facial recognition systems. The workflow shown here loads the dataset, extracts the flattened images, applies PCA to reduce dimensionality, and inspects the resulting eigenfaces and reconstructions; NMF follows the same pattern with non-negative components. The sample code is a basic demonstration of these Scikit-Learn features.


