2,768 research outputs found
Ordinal Regression by Extended Binary Classification
We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting
extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a
ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier would then bound the mislabeling cost of the ranking rule. Our framework
allows not only to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also to derive new generalization bounds for
ordinal regression from known bounds for binary classification. In addition, our framework unifies many existing ordinal regression algorithms, such as perceptron
ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages
in terms of both training speed and generalization performance over existing algorithms, which demonstrates the usefulness of our framework
Supervised Learning with Similarity Functions
We address the problem of general supervised learning when data can only be
accessed through an (indefinite) similarity function between data points.
Existing work on learning with indefinite kernels has concentrated solely on
binary/multi-class classification problems. We propose a model that is generic
enough to handle any supervised learning task and also subsumes the model
previously proposed for classification. We give a "goodness" criterion for
similarity functions w.r.t. a given supervised learning task and then adapt a
well-known landmarking technique to provide efficient algorithms for supervised
learning using "good" similarity functions. We demonstrate the effectiveness of
our model on three important super-vised learning problems: a) real-valued
regression, b) ordinal regression and c) ranking where we show that our method
guarantees bounded generalization error. Furthermore, for the case of
real-valued regression, we give a natural goodness definition that, when used
in conjunction with a recent result in sparse vector recovery, guarantees a
sparse predictor with bounded generalization error. Finally, we report results
of our learning algorithms on regression and ordinal regression tasks using
non-PSD similarity functions and demonstrate the effectiveness of our
algorithms, especially that of the sparse landmark selection algorithm that
achieves significantly higher accuracies than the baseline methods while
offering reduced computational costs.Comment: To appear in the proceedings of NIPS 2012, 30 page
Separable Convex Optimization with Nested Lower and Upper Constraints
We study a convex resource allocation problem in which lower and upper bounds
are imposed on partial sums of allocations. This model is linked to a large
range of applications, including production planning, speed optimization,
stratified sampling, support vector machines, portfolio management, and
telecommunications. We propose an efficient gradient-free divide-and-conquer
algorithm, which uses monotonicity arguments to generate valid bounds from the
recursive calls, and eliminate linking constraints based on the information
from sub-problems. This algorithm does not need strict convexity or
differentiability. It produces an -approximate solution for the
continuous problem in time
and an integer solution in time, where is
the number of decision variables, is the number of constraints, and is
the resource bound. A complexity of is also achieved
for the linear and quadratic cases. These are the best complexities known to
date for this important problem class. Our experimental analyses confirm the
good performance of the method, which produces optimal solutions for problems
with up to 1,000,000 variables in a few seconds. Promising applications to the
support vector ordinal regression problem are also investigated
On the Consistency of Ordinal Regression Methods
Many of the ordinal regression models that have been proposed in the
literature can be seen as methods that minimize a convex surrogate of the
zero-one, absolute, or squared loss functions. A key property that allows to
study the statistical implications of such approximations is that of Fisher
consistency. Fisher consistency is a desirable property for surrogate loss
functions and implies that in the population setting, i.e., if the probability
distribution that generates the data were available, then optimization of the
surrogate would yield the best possible model. In this paper we will
characterize the Fisher consistency of a rich family of surrogate loss
functions used in the context of ordinal regression, including support vector
ordinal regression, ORBoosting and least absolute deviation. We will see that,
for a family of surrogate loss functions that subsumes support vector ordinal
regression and ORBoosting, consistency can be fully characterized by the
derivative of a real-valued function at zero, as happens for convex
margin-based surrogates in binary classification. We also derive excess risk
bounds for a surrogate of the absolute error that generalize existing risk
bounds for binary classification. Finally, our analysis suggests a novel
surrogate of the squared error loss. We compare this novel surrogate with
competing approaches on 9 different datasets. Our method shows to be highly
competitive in practice, outperforming the least squares loss on 7 out of 9
datasets.Comment: Journal of Machine Learning Research 18 (2017
Convex Calibration Dimension for Multiclass Loss Matrices
We study consistency properties of surrogate loss functions for general
multiclass learning problems, defined by a general multiclass loss matrix. We
extend the notion of classification calibration, which has been studied for
binary and multiclass 0-1 classification problems (and for certain other
specific learning problems), to the general multiclass setting, and derive
necessary and sufficient conditions for a surrogate loss to be calibrated with
respect to a loss matrix in this setting. We then introduce the notion of
convex calibration dimension of a multiclass loss matrix, which measures the
smallest `size' of a prediction space in which it is possible to design a
convex surrogate that is calibrated with respect to the loss matrix. We derive
both upper and lower bounds on this quantity, and use these results to analyze
various loss matrices. In particular, we apply our framework to study various
subset ranking losses, and use the convex calibration dimension as a tool to
show both the existence and non-existence of various types of convex calibrated
surrogates for these losses. Our results strengthen recent results of Duchi et
al. (2010) and Calauzenes et al. (2012) on the non-existence of certain types
of convex calibrated surrogates in subset ranking. We anticipate the convex
calibration dimension may prove to be a useful tool in the study and design of
surrogate losses for general multiclass learning problems.Comment: Accepted to JMLR, pending editin
- …