Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning
Several studies point out different causes of performance degradation in
supervised machine learning. Problems such as class imbalance, class overlap,
small disjuncts, noisy labels, and sparseness limit the accuracy of classification
algorithms. Although a number of approaches, whether methodologies or
algorithms, try to minimize performance degradation, they have been isolated
efforts with limited scope. Most of them focus on remediating a single problem
among many, report experimental results from only a few datasets and
classification algorithms, use insufficient measures of predictive power, and
lack the statistical validation needed to test the real benefit of the
proposed approach. This paper has two main parts. In the first part, a
novel probabilistic diagnostic model, based on identifying the signs and
symptoms of each problem, is presented. The aim is an early and correct
diagnosis of these problems, so that both the most suitable remediation
treatment and unbiased performance metrics can be selected. In the second
part, the behavior and performance of several supervised algorithms are
studied when training sets exhibit such problems. From this, the likelihood of
success of each treatment can be estimated across classifiers.