1 research outputs found
Becoming More Robust to Label Noise with Classifier Diversity
It is widely known in the machine learning community that class noise can be
(and often is) detrimental to inducing a model of the data. Many current
approaches use a single, often biased, measurement to determine if an instance
is noisy. A biased measure may work well on certain data sets, but it can also
be less effective on a broader set of data sets. In this paper, we present
noise identification using classifier diversity (NICD) -- a method for deriving
a less biased noise measurement and integrating it into the learning process.
To lessen the bias of the noise measure, NICD selects a diverse set of
classifiers (based on their predictions of novel instances) to determine which
instances are noisy. We examine NICD as a technique for filtering, instance
weighting, and selecting the base classifiers of a voting ensemble. We compare
NICD with several other noise handling techniques that do not consider
classifier diversity on a set of 54 data sets and 5 learning algorithms. NICD
significantly increases the classification accuracy over the other considered
approaches and is effective across a broad set of data sets and learning
algorithms.Comment: 37 pages, 10 tables, 2 figure