24 research outputs found
Surrogate regret bounds for generalized classification performance metrics
We consider optimization of generalized performance metrics for binary
classification by means of surrogate losses. We focus on a class of metrics,
which are linear-fractional functions of the false positive and false negative
rates (examples of which include $F_\beta$-measure, Jaccard similarity
coefficient, AM measure, and many others). Our analysis concerns the following
two-step procedure. First, a real-valued function $f$ is learned by minimizing
a surrogate loss for binary classification on the training sample. It is
assumed that the surrogate loss is a strongly proper composite loss function
(examples of which include logistic loss, squared-error loss, exponential loss,
etc.). Then, given $f$, a threshold $\widehat{\theta}$ is tuned on a separate
validation sample, by direct optimization of the target performance metric. We
show that the regret of the resulting classifier (obtained from thresholding
$f$ on $\widehat{\theta}$) measured with respect to the target metric is
upper bounded by the regret of $f$ measured with respect to the surrogate loss.
We also extend our results to cover multilabel classification and provide
regret bounds for micro- and macro-averaging measures. Our findings are further
analyzed in a computational study on both synthetic and real data sets.
Comment: 22 pages
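The second step of the procedure described in this abstract (tuning a threshold on a validation sample by directly optimizing the target metric) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the real-valued scores have already been produced by some model trained with a surrogate loss, it uses the F-beta measure as the target metric, and the names `f_beta` and `tune_threshold` are hypothetical.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta computed from confusion-matrix counts (0.0 if undefined)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0


def tune_threshold(scores, labels, beta=1.0):
    """Pick the threshold maximizing F-beta on a validation sample.

    The classifier predicts 1 when score >= threshold. Only the observed
    score values need to be tried as candidate thresholds.
    """
    # Sort validation points by decreasing score.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    total_pos = sum(labels)
    best_theta, best_val = float("inf"), 0.0
    tp = fp = 0
    i = 0
    while i < len(pairs):
        theta = pairs[i][0]
        # Lowering the threshold to theta adds every point tied at theta.
        while i < len(pairs) and pairs[i][0] == theta:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        val = f_beta(tp, fp, total_pos - tp, beta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val


# Hypothetical validation scores and binary labels.
val_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
val_labels = [1, 1, 0, 1, 0, 0]
theta, value = tune_threshold(val_scores, val_labels)
```

Sorting once and sweeping the threshold downward keeps the search at O(n log n) for n validation points.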
Online Isotonic Regression
We consider the online version of the isotonic regression problem. Given a
set of linearly ordered points (e.g., on the real line), the learner must
predict labels sequentially at adversarially chosen positions and is evaluated
by her total squared loss compared against the best isotonic (non-decreasing)
function in hindsight. We survey several standard online learning algorithms
and show that none of them achieve the optimal regret exponent; in fact, most
of them (including Online Gradient Descent, Follow the Leader and Exponential
Weights) incur linear regret. We then prove that the Exponential Weights
algorithm played over a covering net of isotonic functions has a regret bounded
by $O\big(T^{1/3} \log^{2/3}(T)\big)$ and present a matching $\Omega(T^{1/3})$
lower bound on regret. We provide a computationally efficient version of this
algorithm. We also analyze the noise-free case, in which the revealed labels
are isotonic, and show that the bound can be improved to $O(\log T)$ or even to
$O(1)$ (when the labels are revealed in isotonic order). Finally, we extend the
analysis beyond squared loss and give bounds for entropic loss and absolute
loss.
Comment: 25 pages
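The evaluation criterion in this abstract (total squared loss compared against the best isotonic function in hindsight) can be sketched as follows. This is only an illustration of the comparator and the regret, not any of the online algorithms analyzed in the paper: the best isotonic fit is computed offline with the standard Pool Adjacent Violators procedure, and the function names are hypothetical.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y via Pool Adjacent Violators."""
    blocks = []  # each block stores [sum of labels, number of points]
    for v in y:
        blocks.append([v, 1])
        # Merge the last two blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit


def squared_loss(pred, y):
    return sum((p - v) ** 2 for p, v in zip(pred, y))


def regret(predictions, labels):
    """Learner's total squared loss minus that of the best isotonic function.

    Both lists are indexed by position in the linear order, not by the
    (possibly adversarial) order in which the labels were revealed.
    """
    return squared_loss(predictions, labels) - squared_loss(isotonic_fit(labels), labels)


# Hypothetical labels at three ordered positions, and a learner that
# always predicts 0.5.
labels = [1.0, 0.0, 1.0]
r = regret([0.5, 0.5, 0.5], labels)
```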
Generalized test utilities for long-tail performance in extreme multi-label classification
Extreme multi-label classification (XMLC) is the task of selecting a small
subset of relevant labels from a very large set of possible labels. As such, it
is characterized by long-tail labels, i.e., most labels have very few positive
instances. With standard performance measures such as precision@k, a classifier
can ignore tail labels and still report good performance. However, it is often
argued that correct predictions in the tail are more "interesting" or
"rewarding," but the community has not yet settled on a metric capturing this
intuitive concept. The existing propensity-scored metrics fall short on this
goal by confounding the problems of long-tail and missing labels. In this
paper, we analyze generalized metrics budgeted "at k" as an alternative
solution. To tackle the challenging problem of optimizing these metrics, we
formulate it in the expected test utility (ETU) framework, which aims to
optimize the expected performance on a fixed test set. We derive optimal
prediction rules and construct computationally efficient approximations with
provable regret guarantees and robustness against model misspecification. Our
algorithm, based on block coordinate ascent, scales effortlessly to XMLC
problems and obtains promising results in terms of long-tail performance.
Comment: This is the authors' version of the work accepted to NeurIPS 2023;
the final version of the paper, errors and typos corrected, and minor
modifications to improve clarity
Random permutation online isotonic regression
We revisit isotonic regression on linear orders, the problem of fitting monotonic functions to best explain the data, in an online setting. It was previously shown that online isotonic regression is unlearnable in a fully adversarial model, which led to its study in the fixed design model. Here, we instead develop the more practical random permutation model. We show that the regret is bounded above by the excess leave-one-out loss, for which we develop efficient algorithms and matching lower bounds. We also analyze the class of simple and popular forward algorithms and recommend where to look for algorithms for online isotonic regression on partial orders.
Robust Online Convex Optimization in the Presence of Outliers
We consider online convex optimization when a number k of data points are
outliers that may be corrupted. We model this by introducing the notion of
robust regret, which measures the regret only on rounds that are not outliers.
The aim for the learner is to achieve small robust regret, without knowing
where the outliers are. If the outliers are chosen adversarially, we show that
a simple filtering strategy on extreme gradients incurs O(k) additive overhead
compared to the usual regret bounds, and that this is unimprovable, which means
that k needs to be sublinear in the number of rounds. We further ask which
additional assumptions would allow for a linear number of outliers. It turns
out that the usual benign cases of independently, identically distributed
(i.i.d.) observations or strongly convex losses are not sufficient. However,
combining i.i.d. observations with the assumption that outliers are those
observations that are in an extreme quantile of the distribution, does lead to
sublinear robust regret, even though the expected number of outliers is linear.
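The notion of robust regret described in this abstract (regret measured only on rounds that are not outliers) can be illustrated as follows. This is a sketch of the definition only, not of the filtering strategy; the function names and the per-round loss values are hypothetical.

```python
def standard_regret(learner_losses, comparator_losses):
    """Sum over all rounds of the learner's loss minus the comparator's loss."""
    return sum(l - c for l, c in zip(learner_losses, comparator_losses))


def robust_regret(learner_losses, comparator_losses, outlier_rounds):
    """Same sum, restricted to rounds that are not outliers.

    outlier_rounds holds 0-based round indices. The learner does not know
    this set while playing; it is used only for evaluation.
    """
    outliers = set(outlier_rounds)
    return sum(
        l - c
        for t, (l, c) in enumerate(zip(learner_losses, comparator_losses))
        if t not in outliers
    )


# Hypothetical losses over three rounds; round 1 is a corrupted outlier.
learner = [1.0, 100.0, 2.0]
comparator = [0.5, 0.0, 1.0]
full = standard_regret(learner, comparator)
robust = robust_regret(learner, comparator, outlier_rounds=[1])
```

In this toy sequence the single outlier round dominates the standard regret, while the robust regret reflects only the two uncorrupted rounds.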
Quantum learning: optimal classification of qubit states
Pattern recognition is a central topic in Learning Theory with numerous
applications such as voice and text recognition, image analysis, and computer
diagnosis. The statistical set-up in classification is the following: we are
given an i.i.d. training set $(X_1,Y_1),\dots,(X_n,Y_n)$ where $X_i$
represents a feature and $Y_i \in \{0,1\}$ is a label attached to that
feature. The underlying joint distribution of $(X,Y)$ is unknown, but we can
learn about it from the training set and we aim at devising low error
classifiers used to predict the label of new incoming features.
Here we solve a quantum analogue of this problem, namely the classification
of two arbitrary unknown qubit states. Given a number of 'training' copies from
each of the states, we would like to 'learn' about them by performing a
measurement on the training set. The outcome is then used to design measurements
for the classification of future systems with unknown labels. We find the
asymptotically optimal classification strategy and show that typically, it
performs strictly better than a plug-in strategy based on state estimation.
The figure of merit is the excess risk which is the difference between the
probability of error and the probability of error of the optimal measurement
when the states are known, that is the Helstrom measurement. We show that the
excess risk has rate $n^{-1}$ and compute the exact constant of the rate.
Comment: 24 pages, 4 figures