Data complexity in machine learning
We investigate the role of data complexity in the context of binary classification problems. The universal data complexity is defined for a data set as the Kolmogorov complexity of the mapping enforced by the data set. It is closely related to several existing principles used in machine learning such as Occam's razor, the minimum description length, and the Bayesian approach. The data complexity can also be defined based on a learning model, which is more realistic for applications. We demonstrate the application of the data complexity in two learning problems, data decomposition and data pruning. In data decomposition, we illustrate that a data set is best approximated by its principal subsets which are Pareto optimal with respect to the complexity and the set size. In data pruning, we show that outliers usually have high complexity contributions, and propose methods for estimating the complexity contribution. Since in practice we have to approximate the ideal data complexity measures, we also discuss the impact of such approximations.
Increasing Fairness in Compromise on Accuracy via Weighted Vote with Learning Guarantees
As bias in widely applied machine learning systems is taken more and more
seriously, researchers are troubled that increasing fairness in most cases
comes with a decrease in accuracy. To address this problem, we
present a novel analysis of the expected fairness quality via weighted vote,
suitable for both binary and multi-class classification. The analysis takes the
correction of biased predictions by ensemble members into account and provides
learning bounds that are amenable to efficient minimisation. We further propose
a pruning method based on this analysis and the concepts of domination and
Pareto optimality, which is able to increase fairness under a prerequisite of
little or even no accuracy decline. The experimental results indicate that the
proposed learning bounds are faithful and that the proposed pruning method can
indeed increase ensemble fairness without much accuracy degradation. Comment: 18 pages, 15 figures, and 6 tables
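The domination and Pareto-optimality idea mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's pruning method: the two objectives (error rate and an unfairness score, both minimised), the function names, and the sample numbers are assumptions chosen only for illustration.

```python
from typing import List, Tuple

def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def pareto_front(points: List[Tuple[float, float]]) -> List[int]:
    """Indices of points that no other point dominates (both objectives minimised)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]

# Hypothetical (error rate, unfairness) pairs for five ensemble members.
members = [(0.10, 0.30), (0.12, 0.20), (0.15, 0.25), (0.20, 0.10), (0.11, 0.35)]
kept = pareto_front(members)
print(kept)
```

Members outside the front are worse on both objectives than some retained member, so dropping them is one simple way to shrink an ensemble without giving up either objective.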
Multiobjective Sparse Ensemble Learning by Means of Evolutionary Algorithms
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link. Ensemble learning can improve the performance of individual classifiers by combining their decisions. The sparseness of ensemble learning has attracted much attention in recent years. In this paper, a novel multiobjective sparse ensemble learning (MOSEL) model is proposed. Firstly, to describe the ensemble classifiers more precisely, the detection error trade-off (DET) curve is taken into consideration. The sparsity ratio (sr) is treated as the third objective to be minimized, in addition to false positive rate (fpr) and false negative rate (fnr) minimization. MOSEL turns out to be an augmented DET (ADET) convex hull maximization problem. Secondly, several evolutionary multiobjective algorithms are exploited to find sparse ensemble classifiers with strong performance. The relationship between the sparsity and the performance of ensemble classifiers in the ADET space is explained. Thirdly, an adaptive MOSEL classifier selection method is designed to select the most suitable ensemble classifiers for a given dataset. The proposed MOSEL method is applied to the well-known MNIST datasets and a real-world remote sensing image change detection problem, and several datasets are used to test the performance of the method on this problem. Experimental results based on both the MNIST datasets and remote sensing image change detection show that MOSEL performs significantly better than conventional ensemble learning methods.
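The three objectives named in the abstract above (fpr, fnr, and sr) can be sketched as follows. The abstract does not define them precisely, so this sketch assumes a weighted vote over labels in {-1, +1} and takes the sparsity ratio to be the fraction of non-zero weights; the function name `adet_objectives` is hypothetical.

```python
import numpy as np

def adet_objectives(weights: np.ndarray, preds: np.ndarray, y: np.ndarray):
    """Return (fpr, fnr, sr) for a weighted-vote ensemble.

    weights: shape (k,), one weight per base classifier (0 means "not used").
    preds:   shape (k, n), base classifier outputs in {-1, +1}.
    y:       shape (n,), true labels in {-1, +1}.
    """
    vote = np.where(weights @ preds >= 0, 1, -1)   # ties go to the positive class
    pos, neg = (y == 1), (y == -1)
    fpr = float((vote[neg] == 1).mean()) if neg.any() else 0.0
    fnr = float((vote[pos] == -1).mean()) if pos.any() else 0.0
    sr = np.count_nonzero(weights) / len(weights)  # assumed definition of sparsity ratio
    return fpr, fnr, float(sr)

# Hypothetical example: four base classifiers, two of them switched off.
weights = np.array([1.0, 0.0, 0.5, 0.0])
preds = np.array([
    [1, 1, -1, 1],
    [-1, -1, -1, -1],
    [1, -1, -1, -1],
    [1, 1, 1, 1],
])
y = np.array([1, 1, -1, -1])
objectives = adet_objectives(weights, preds, y)
print(objectives)
```

A multiobjective search would evaluate many candidate weight vectors this way and keep those that trade off the three values well.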
Beyond neural scaling laws: beating power law scaling via data pruning
Widely observed neural scaling laws, in which error falls off as a power of
the training set size, model size, or both, have driven substantial performance
improvements in deep learning. However, these improvements through scaling
alone require considerable costs in compute and energy. Here we focus on the
scaling of error with dataset size and show how in theory we can break beyond
power law scaling and potentially even reduce it to exponential scaling instead
if we have access to a high-quality data pruning metric that ranks the order in
which training examples should be discarded to achieve any pruned dataset size.
We then test this improved scaling prediction with pruned dataset size
empirically, and indeed observe better than power law scaling in practice on
ResNets trained on CIFAR-10, SVHN, and ImageNet. Next, given the importance of
finding high-quality pruning metrics, we perform the first large-scale
benchmarking study of ten different data pruning metrics on ImageNet. We find
most existing high performing metrics scale poorly to ImageNet, while the best
are computationally intensive and require labels for every image. We therefore
developed a new simple, cheap and scalable self-supervised pruning metric that
demonstrates comparable performance to the best supervised metrics. Overall,
our work suggests that the discovery of good data-pruning metrics may provide a
viable path forward to substantially improved neural scaling laws, thereby
reducing the resource costs of modern deep learning. Comment: Outstanding Paper Award @ NeurIPS 2022. Added github link to metric
score
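The abstract above describes a pruning metric that ranks training examples so that any target dataset size can be reached by discarding in rank order. A minimal sketch of that ranking step follows. The scores are made up, the function name `prune_by_score` is hypothetical, and the choice of which end of the ranking to keep is left as a parameter because the abstract does not fix it.

```python
import numpy as np

def prune_by_score(scores: np.ndarray, keep_fraction: float,
                   keep_highest: bool = True) -> np.ndarray:
    """Return sorted indices of the examples retained after pruning.

    scores:        one pruning-metric value per training example.
    keep_fraction: fraction of the dataset to keep, in (0, 1].
    keep_highest:  keep the highest-scoring examples if True, else the lowest.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    order = np.argsort(scores, kind="stable")      # ascending by score
    chosen = order[-n_keep:] if keep_highest else order[:n_keep]
    return np.sort(chosen)

# Hypothetical metric values for five training examples.
scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3])
kept_high = prune_by_score(scores, keep_fraction=0.4)
kept_low = prune_by_score(scores, keep_fraction=0.4, keep_highest=False)
print(kept_high, kept_low)
```

Because the ranking is computed once, the same `order` can serve every pruned dataset size by changing only `keep_fraction`.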
A Diversity-Accuracy Measure for Homogenous Ensemble Selection
Several selection methods in the literature are essentially based on an evaluation function that determines whether a model M contributes positively to the performance of the whole ensemble. In this paper, we propose a method called DIversity and ACcuracy for Ensemble Selection (DIACES) using an evaluation function based on both diversity and accuracy. The method is applied to homogenous ensembles composed of C4.5 decision trees and is based on a hill climbing strategy. This allows selecting ensembles with the best compromise between maximum diversity and minimum error rate. Comparative studies show that in most cases the proposed method generates reduced-size ensembles with better performance than usual ensemble simplification methods.
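A hill climbing selection driven by an evaluation function that mixes accuracy and diversity, as described in the abstract above, might look like the sketch below. This is not the DIACES evaluation function, which the abstract does not specify. The score used here (a weighted sum of majority-vote accuracy and mean pairwise disagreement, with mixing weight `alpha`) and all names are assumptions for illustration.

```python
from itertools import combinations
import numpy as np

def evaluate(preds: np.ndarray, y: np.ndarray, alpha: float = 0.5) -> float:
    """Assumed score: alpha * majority-vote accuracy + (1 - alpha) * diversity."""
    vote = (preds.mean(axis=0) >= 0.5).astype(int)          # binary labels 0/1
    accuracy = float((vote == y).mean())
    k = preds.shape[0]
    if k < 2:
        diversity = 0.0
    else:                                                   # mean pairwise disagreement
        diversity = float(np.mean([(preds[i] != preds[j]).mean()
                                   for i, j in combinations(range(k), 2)]))
    return alpha * accuracy + (1 - alpha) * diversity

def hill_climb_select(all_preds: np.ndarray, y: np.ndarray, alpha: float = 0.5):
    """Greedy forward selection: add the best member until the score stops improving."""
    start = int(np.argmax([(p == y).mean() for p in all_preds]))
    selected = [start]
    best = evaluate(all_preds[selected], y, alpha)
    while True:
        candidates = [i for i in range(len(all_preds)) if i not in selected]
        if not candidates:
            break
        top_score, top_i = max((evaluate(all_preds[selected + [i]], y, alpha), i)
                               for i in candidates)
        if top_score <= best:
            break
        selected.append(top_i)
        best = top_score
    return selected, best

# Hypothetical predictions of four trees on six validation examples.
y = np.array([1, 1, 0, 0, 1, 0])
all_preds = np.array([
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
])
selected, best = hill_climb_select(all_preds, y)
print(selected, best)
```

The search stops at the first step where no remaining member raises the score, so it returns a local optimum rather than a guaranteed best subset.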