Active Learning in Data Mining
Last Updated: 24 Mar, 2022
Active learning is an iterative form of supervised learning, usually preferred when unlabeled data is abundant but class labels are scarce or expensive to obtain. Rather than passively accepting a fixed labeled training set, the learning algorithm actively queries a user or oracle for the labels of selected tuples. The number of labeled tuples active learning needs to learn a concept is much smaller than the number required by typical supervised learning, so highly accurate models can be built from just a few labeled instances, and the labeling cost is low compared to other learning methodologies.
Active learning reaches high accuracy early in training and takes less time to train a model, because only the labeled subset of the data is used to fit the model at each step. Several strategies have been developed for active learning; one of the most efficient is the pool-based approach.
Example of the Pool-based Approach in Active Learning:
Let D be the data set, L ⊆ D the labeled subset, and U the unlabeled remainder of D. L is the initial training set, and the active learner begins by training on L. A query function is then applied to the unlabeled pool U to select one or more data samples, and the learner requests class labels for them from an oracle (for example, a human annotator). The newly labeled samples are added to L, and the active learner retrains on the enlarged labeled set using standard supervised algorithms. Active learning algorithms are evaluated by building learning curves from the training and testing sets: the model is retrained after each query round and its accuracy is plotted against the number of labels acquired.
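The loop above can be sketched in Python. This is a minimal illustration, not a production implementation: the 1-D data, the threshold classifier, and the `oracle` function standing in for a human annotator are all hypothetical, chosen only to make the train-query-relabel cycle concrete.

```python
def oracle(x):
    """Stand-in for a human annotator: the true rule labels
    points below 5.0 as class 0 and the rest as class 1."""
    return 0 if x < 5.0 else 1

def train(labeled):
    """Toy classifier: a threshold at the midpoint of the two class means."""
    means = {}
    for cls in (0, 1):
        pts = [x for x, y in labeled if y == cls]
        means[cls] = sum(pts) / len(pts)
    return (means[0] + means[1]) / 2.0

def predict(threshold, x):
    return 0 if x < threshold else 1

def pool_based_active_learning(L, U, n_queries):
    """L: initial labeled list of (x, y) pairs; U: unlabeled pool of x values.
    Each round: retrain on L, pick the pool point the current model is
    least certain about, ask the oracle for its label, and add it to L."""
    for _ in range(n_queries):
        threshold = train(L)
        # Query function: the point closest to the decision boundary
        # is the one the model is least certain how to label.
        query = min(U, key=lambda x: abs(x - threshold))
        U.remove(query)
        L.append((query, oracle(query)))
    return train(L)
```

Starting from just two labeled points, e.g. `pool_based_active_learning([(1.0, 0), (9.0, 1)], [2.0, 4.0, 4.9, 5.1, 6.0, 8.0], 3)`, the learner spends its three queries near the class boundary rather than on easy points far from it, which is the whole appeal of the pool-based strategy.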
The primary task in active learning is choosing which data tuples to query, and many algorithms and methodologies have been proposed for it. Uncertainty sampling is the most common: the active learner queries the tuples it is least certain how to label. Other strategies aim to reduce the version space, the subset of all hypotheses that are consistent with the labeled training tuples seen so far. It is also necessary to perform error detection on the training tuples to remove noise from the data.
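Uncertainty sampling can be stated in a few lines. The sketch below uses the least-confidence criterion: query the instance whose most probable predicted label has the lowest probability. The `predict_proba` callable and the toy probability table are assumptions for illustration, not part of any particular library.

```python
def least_confidence_query(pool, predict_proba):
    """Return the instance in the pool whose top predicted class
    probability is lowest, i.e. the one the model is least sure about."""
    return min(pool, key=lambda x: max(predict_proba(x)))

# Hypothetical model output: class probabilities keyed by instance id.
probs = {"a": [0.95, 0.05], "b": [0.55, 0.45], "c": [0.80, 0.20]}
query = least_confidence_query(probs, lambda x: probs[x])
# "b" is selected: its top probability, 0.55, is closest to a coin flip.
```

Variants such as margin sampling (smallest gap between the top two probabilities) or entropy sampling differ only in the key function passed to `min`.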
After error detection, the selected tuples are those expected to yield the maximum reduction in incorrect predictions, measured by how much they reduce the expected entropy over U. This approach can be very effective, but it requires far more computation, since the model must in principle be retrained for every candidate query and every possible label it might receive.
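The quantity being minimized here can be made concrete. A minimal sketch, assuming the model exposes class-probability predictions for each pool instance: the Shannon entropy of each predictive distribution is averaged over U, and an expected-error-reduction strategy scores each candidate query by how much this average would drop.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_entropy_over_pool(pool_probs):
    """Average predictive entropy over the unlabeled pool U --
    the quantity an expected-error-reduction strategy tries to shrink."""
    return sum(entropy(p) for p in pool_probs) / len(pool_probs)
```

A uniform prediction `[0.5, 0.5]` has entropy 1 bit while a certain prediction `[1.0, 0.0]` has entropy 0; the cost of the method comes from re-evaluating this average once per candidate tuple per possible label.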