4,906 research outputs found
Convex Calibration Dimension for Multiclass Loss Matrices
We study consistency properties of surrogate loss functions for general
multiclass learning problems, defined by a general multiclass loss matrix. We
extend the notion of classification calibration, which has been studied for
binary and multiclass 0-1 classification problems (and for certain other
specific learning problems), to the general multiclass setting, and derive
necessary and sufficient conditions for a surrogate loss to be calibrated with
respect to a loss matrix in this setting. We then introduce the notion of
convex calibration dimension of a multiclass loss matrix, which measures the
smallest `size' of a prediction space in which it is possible to design a
convex surrogate that is calibrated with respect to the loss matrix. We derive
both upper and lower bounds on this quantity, and use these results to analyze
various loss matrices. In particular, we apply our framework to study various
subset ranking losses, and use the convex calibration dimension as a tool to
show both the existence and non-existence of various types of convex calibrated
surrogates for these losses. Our results strengthen recent results of Duchi et
al. (2010) and Calauzenes et al. (2012) on the non-existence of certain types
of convex calibrated surrogates in subset ranking. We anticipate the convex
calibration dimension may prove to be a useful tool in the study and design of
surrogate losses for general multiclass learning problems.Comment: Accepted to JMLR, pending editin
Radar-based Road User Classification and Novelty Detection with Recurrent Neural Network Ensembles
Radar-based road user classification is an important yet still challenging
task towards autonomous driving applications. The resolution of conventional
automotive radar sensors results in a sparse data representation which is tough
to recover by subsequent signal processing. In this article, classifier
ensembles originating from a one-vs-one binarization paradigm are enriched by
one-vs-all correction classifiers. They are utilized to efficiently classify
individual traffic participants and also identify hidden object classes which
have not been presented to the classifiers during training. For each classifier
of the ensemble an individual feature set is determined from a total set of 98
features. Thereby, the overall classification performance can be improved when
compared to previous methods and, additionally, novel classes can be identified
much more accurately. Furthermore, the proposed structure allows to give new
insights in the importance of features for the recognition of individual
classes which is crucial for the development of new algorithms and sensor
requirements.Comment: 8 pages, 9 figures, accepted paper for 2019 IEEE Intelligent Vehicles
Symposium (IV), Paris, France, June 201
On the Consistency of Ordinal Regression Methods
Many of the ordinal regression models that have been proposed in the
literature can be seen as methods that minimize a convex surrogate of the
zero-one, absolute, or squared loss functions. A key property that allows to
study the statistical implications of such approximations is that of Fisher
consistency. Fisher consistency is a desirable property for surrogate loss
functions and implies that in the population setting, i.e., if the probability
distribution that generates the data were available, then optimization of the
surrogate would yield the best possible model. In this paper we will
characterize the Fisher consistency of a rich family of surrogate loss
functions used in the context of ordinal regression, including support vector
ordinal regression, ORBoosting and least absolute deviation. We will see that,
for a family of surrogate loss functions that subsumes support vector ordinal
regression and ORBoosting, consistency can be fully characterized by the
derivative of a real-valued function at zero, as happens for convex
margin-based surrogates in binary classification. We also derive excess risk
bounds for a surrogate of the absolute error that generalize existing risk
bounds for binary classification. Finally, our analysis suggests a novel
surrogate of the squared error loss. We compare this novel surrogate with
competing approaches on 9 different datasets. Our method shows to be highly
competitive in practice, outperforming the least squares loss on 7 out of 9
datasets.Comment: Journal of Machine Learning Research 18 (2017
Interpretable multiclass classification by MDL-based rule lists
Interpretable classifiers have recently witnessed an increase in attention
from the data mining community because they are inherently easier to understand
and explain than their more complex counterparts. Examples of interpretable
classification models include decision trees, rule sets, and rule lists.
Learning such models often involves optimizing hyperparameters, which typically
requires substantial amounts of data and may result in relatively large models.
In this paper, we consider the problem of learning compact yet accurate
probabilistic rule lists for multiclass classification. Specifically, we
propose a novel formalization based on probabilistic rule lists and the minimum
description length (MDL) principle. This results in virtually parameter-free
model selection that naturally allows to trade-off model complexity with
goodness of fit, by which overfitting and the need for hyperparameter tuning
are effectively avoided. Finally, we introduce the Classy algorithm, which
greedily finds rule lists according to the proposed criterion. We empirically
demonstrate that Classy selects small probabilistic rule lists that outperform
state-of-the-art classifiers when it comes to the combination of predictive
performance and interpretability. We show that Classy is insensitive to its
only parameter, i.e., the candidate set, and that compression on the training
set correlates with classification performance, validating our MDL-based
selection criterion
Multilabel Consensus Classification
In the era of big data, a large amount of noisy and incomplete data can be
collected from multiple sources for prediction tasks. Combining multiple models
or data sources helps to counteract the effects of low data quality and the
bias of any single model or data source, and thus can improve the robustness
and the performance of predictive models. Out of privacy, storage and bandwidth
considerations, in certain circumstances one has to combine the predictions
from multiple models or data sources to obtain the final predictions without
accessing the raw data. Consensus-based prediction combination algorithms are
effective for such situations. However, current research on prediction
combination focuses on the single label setting, where an instance can have one
and only one label. Nonetheless, data nowadays are usually multilabeled, such
that more than one label have to be predicted at the same time. Direct
applications of existing prediction combination methods to multilabel settings
can lead to degenerated performance. In this paper, we address the challenges
of combining predictions from multiple multilabel classifiers and propose two
novel algorithms, MLCM-r (MultiLabel Consensus Maximization for ranking) and
MLCM-a (MLCM for microAUC). These algorithms can capture label correlations
that are common in multilabel classifications, and optimize corresponding
performance metrics. Experimental results on popular multilabel classification
tasks verify the theoretical analysis and effectiveness of the proposed
methods
- …