
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <dc:rights>All rights reserved</dc:rights>
  <dc:source>Applied Intelligence</dc:source>
  <dc:format>application/pdf</dc:format>
  <dc:format>2354944 bytes</dc:format>
  <dc:date>2021</dc:date>
  <dc:title xml:lang="eng">Active learning using a self-correcting neural network (ALSCN)</dc:title>
  <dc:subject xml:lang="eng">Active learning; Machine learning; Convolutional neural networks (CNN); Dataset labeling</dc:subject>
  <dc:creator id="https://orcid.org/0000-0001-5010-1377">Ilić, Velibor</dc:creator>
  <dc:creator id="https://orcid.org/0000-0003-4655-5063">Tadić, Jovan</dc:creator>
  <dc:language>eng</dc:language>
  <dc:description xml:lang="eng">ABSTRACT
Data labeling represents a major obstacle in the development of new models, because the performance of machine learning
models directly depends on the quality of the datasets used to train them, and labeling requires substantial manual effort.
Labeling the entire dataset is not always necessary, and not every item in an image dataset contributes equally to the training
process. Active learning, or guided labeling, is one of the attempts to automate and speed up labeling as much as possible. In this
study we present a novel active learning algorithm (ALSCN) that contains two networks: a convolutional neural network and a
self-correcting neural network (SCN). The convolutional network is trained using only manually labeled data, and after training
it predicts labels for the unlabeled items. The SCN is trained with all available items: some of these items are
manually labeled, and the remaining items are automatically labeled by the first network. After the SCN is trained, it predicts
new labels for all available items, and the new labels are compared with the labels used for training. Items for which differences
are identified are selected for manual labeling and then added to the dataset of previously manually labeled items. After that,
the convolutional network is trained with the extended dataset and the previously described steps are repeated. Our experiments show
that a network trained using items selected by the proposed method exceeds the performance of a network trained with the same
number of items randomly selected from the set of available items. Items from the complete datasets are selected over several
iterations and used for training the models. The accuracy of the models trained with the selected items matched or exceeded the
accuracy of models trained with the entire dataset, which shows the extent of the reduction in the required manual labeling effort. The
efficiency of the presented algorithm is tested on three datasets (MNIST, Fashion MNIST, and CIFAR-10). The final results show
that manual labeling is required for only 6.11% (3667/60,000), 23.92% (14,353/60,000), and 59.4% (29,704/50,000) of the items in
the case of the MNIST, Fashion MNIST, and CIFAR-10 datasets, respectively.</dc:description>
  <dc:type>info:eu-repo/semantics/article</dc:type>
  <dc:identifier>https://unilib.phaidrabg.rs/o:2385</dc:identifier>
  <dc:identifier>doi:10.1007/s10489-021-02515-y</dc:identifier>
</oai_dc:dc>
