On the Consistency of Ordinal Regression Methods
Many of the ordinal regression models that have been proposed in the
literature can be seen as methods that minimize a convex surrogate of the
zero-one, absolute, or squared loss functions. A key property that allows us
to study the statistical implications of such approximations is Fisher
consistency. Fisher consistency is a desirable property for surrogate loss
functions: it implies that, in the population setting (i.e., if the
probability distribution that generates the data were available), optimizing
the surrogate would yield the best possible model. In this paper we will
characterize the Fisher consistency of a rich family of surrogate loss
functions used in the context of ordinal regression, including support vector
ordinal regression, ORBoosting, and least absolute deviation. We will see that,
for a family of surrogate loss functions that subsumes support vector ordinal
regression and ORBoosting, consistency can be fully characterized by the
derivative of a real-valued function at zero, as happens for convex
margin-based surrogates in binary classification. We also derive excess risk
bounds for a surrogate of the absolute error that generalize existing risk
bounds for binary classification. Finally, our analysis suggests a novel
surrogate of the squared error loss. We compare this novel surrogate with
competing approaches on 9 different datasets. Our method proves highly
competitive in practice, outperforming the least squares loss on 7 out of 9
datasets.

Comment: Journal of Machine Learning Research 18 (2017)
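
As a rough formalization of the consistency notion above (notation ours, not
taken from the abstract): writing p for the label distribution over the k
ordinal classes, \psi for the surrogate, and \ell for the target loss
(zero-one, absolute, or squared), Fisher consistency requires that every
population minimizer of the surrogate risk also minimize the target risk:

\[
f^{*} \in \operatorname*{arg\,min}_{f}\, \mathbb{E}_{Y \sim p}\big[\psi(Y, f)\big]
\;\Longrightarrow\;
f^{*} \in \operatorname*{arg\,min}_{f}\, \mathbb{E}_{Y \sim p}\big[\ell(Y, f)\big].
\]
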
A hybrid loss for multiclass and structured prediction
We propose a novel hybrid loss for multiclass and structured prediction problems that is a convex combination of a log loss for Conditional Random Fields (CRFs) and a multiclass hinge loss for Support Vector Machines (SVMs). We provide a sufficient condition for when the hybrid loss is Fisher consistent for classification. This condition depends on a measure of dominance between labels: specifically, the gap between the probabilities of the best label and the second-best label. We also prove that Fisher consistency is necessary for parametric consistency when learning models such as CRFs. We demonstrate empirically that the hybrid loss typically performs at least as well as, and often better than, both of its constituent losses on a variety of tasks, such as human action recognition. In doing so we also provide an empirical comparison of the efficacy of probabilistic and margin-based approaches to multiclass and structured prediction.

Qinfeng Shi, Mark Reid, Tiberio Caetano, Anton van den Hengel, and Zhenhua Wang
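
The abstract describes the hybrid loss as a convex combination of a CRF-style
log loss and a multiclass hinge loss. A minimal per-example sketch in Python
(the function name, the Crammer-Singer form of the hinge, and the mixing
weight alpha are our assumptions, not the paper's exact formulation):

import numpy as np

def hybrid_loss(scores, y, alpha=0.5):
    """Convex combination of a CRF log loss and a multiclass hinge loss."""
    # CRF log loss: negative log-likelihood under a softmax over the scores.
    log_z = np.logaddexp.reduce(scores)      # log partition function
    log_loss = log_z - scores[y]

    # Multiclass (Crammer-Singer) hinge loss: largest violated margin of a
    # competing label against the true label y.
    margins = scores - scores[y] + 1.0
    margins[y] = 0.0                         # no margin requirement against itself
    hinge_loss = max(0.0, margins.max())

    # Convex combination: alpha = 1 recovers the log loss, alpha = 0 the hinge.
    return alpha * log_loss + (1.0 - alpha) * hinge_loss

Setting alpha = 1 recovers the pure CRF objective and alpha = 0 the pure SVM
objective; intermediate values trade the two off, which is where the
label-dominance condition for Fisher consistency comes into play.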