Surrogate Functions for Maximizing Precision at the Top
The problem of maximizing precision at the top of a ranked list, often dubbed
Precision@k (prec@k), arises in myriad learning applications such as
ranking, multi-label classification, and learning with severe label imbalance.
However, despite its popularity, there exist significant gaps in our
understanding of this problem and its associated performance measure.
The most notable of these is the lack of a convex upper bounding surrogate
for prec@k. We also lack scalable perceptron and stochastic gradient descent
algorithms for optimizing this performance measure. In this paper we make key
contributions in these directions. At the heart of our results is a family of
truly upper bounding surrogates for prec@k. These surrogates are motivated in a
principled manner and enjoy attractive properties such as consistency with
prec@k under various natural margin/noise conditions.
These surrogates are then used to design a class of novel perceptron
algorithms for optimizing prec@k with provable mistake bounds. We also devise
scalable stochastic gradient descent style methods for this problem with
provable convergence bounds. Our proofs rely on novel uniform convergence
bounds which require an in-depth analysis of the structural properties of
prec@k and its surrogates. We conclude with experimental results comparing our
algorithms with state-of-the-art cutting plane and stochastic gradient
algorithms for maximizing prec@k.
Comment: To appear in the proceedings of the 32nd International Conference
on Machine Learning (ICML 2015).
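As a point of reference for the measure being optimized, here is a minimal sketch of how prec@k itself is computed on binary relevance labels; the paper's surrogate family and optimization algorithms are not reproduced here.

```python
import numpy as np

def prec_at_k(scores, labels, k):
    """Precision@k: fraction of relevant items among the k top-scored ones.

    scores: real-valued model scores, shape (n,)
    labels: binary relevance labels in {0, 1}, shape (n,)
    """
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return labels[top_k].sum() / k
```

Note that prec@k depends on scores only through the induced ranking, which is why it is non-convex and motivates the surrogates studied in the paper.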
Perceptron learning with random coordinate descent
A perceptron is a linear threshold classifier that separates examples with a hyperplane. It is perhaps the simplest learning model used standalone. In this paper, we propose a family of random coordinate descent algorithms for perceptron learning on binary classification problems. Unlike most perceptron learning algorithms, which require smooth cost functions, our algorithms directly minimize the training error, and usually achieve the lowest training error compared with other algorithms. The algorithms are also computationally efficient. Such advantages make them favorable for both standalone use and ensemble learning, on problems that are not linearly separable. Experiments show that our algorithms work very well with AdaBoost, and achieve the lowest test errors for half of the datasets.
Selective Sampling with Drift
Recently there has been much work on selective sampling, an online active
learning setting, in which algorithms work in rounds. On each round an
algorithm receives an input and makes a prediction. Then, it can decide whether
to query a label, and if so to update its model, otherwise the input is
discarded. Most of this work is focused on the stationary case, where it is
assumed that there is a fixed target model, and the performance of the
algorithm is compared to a fixed model. However, in many real-world
applications, such as spam prediction, the best target function may drift
gradually over time or shift abruptly from time to time. We develop a novel
selective sampling
algorithm for the drifting setting, analyze it under no assumptions on the
mechanism generating the sequence of instances, and derive new mistake bounds
that depend on the amount of drift in the problem. Simulations on synthetic and
real-world datasets demonstrate the superiority of our algorithm over other
selective sampling methods in the drifting setting.
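To make the round-by-round protocol concrete, the sketch below shows a generic margin-based selective sampling perceptron for the stationary setting, which queries the label with probability b/(b + |margin|); the paper's drift-aware algorithm and its mistake bounds are not reproduced here. The stream interface and the parameter b are illustrative assumptions.

```python
import numpy as np

def selective_sampling_perceptron(stream, dim, b=1.0, rng=None):
    """Generic margin-based selective sampling (stationary baseline,
    not the paper's drift-aware algorithm).

    stream yields (x, get_label) pairs; get_label() reveals y in {-1, +1}
    only if we decide to pay for the query.
    """
    rng = rng or np.random.default_rng(0)
    w = np.zeros(dim)
    queries = mistakes = 0
    for x, get_label in stream:
        margin = w @ x
        pred = 1 if margin >= 0 else -1
        if rng.random() < b / (b + abs(margin)):  # query more when unsure
            queries += 1
            y = get_label()
            if y != pred:
                mistakes += 1
                w += y * x  # standard perceptron update on queried mistakes
    return w, queries, mistakes
```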
Optimizing 0/1 Loss for Perceptrons by Random Coordinate Descent
The 0/1 loss is an important cost function for perceptrons. Nevertheless, it cannot be easily minimized by most existing perceptron learning algorithms. In this paper, we propose a family of random coordinate descent algorithms to directly minimize the 0/1 loss for perceptrons, and prove their convergence. Our algorithms are computationally efficient, and usually achieve the lowest 0/1 loss compared with other algorithms. Such advantages make them favorable for nonseparable real-world problems. Experiments show that our algorithms are especially useful for ensemble learning, and can achieve the lowest test error for many complex datasets when coupled with AdaBoost.
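To illustrate the idea behind the two random-coordinate-descent abstracts above, here is a simplified sketch: pick a random direction and search along it for a step that lowers the training 0/1 loss. The grid line search is a stand-in assumption; the papers use an exact line-search procedure and prove convergence.

```python
import numpy as np

def rcd_01_loss(X, y, epochs=50, rng=None):
    """Simplified random coordinate/direction descent on the 0/1 loss.

    X: (n, d) inputs (with a bias feature appended); y: labels in {-1, +1}.
    """
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    w = np.zeros(d)

    def loss(v):
        pred = np.where(X @ v >= 0, 1, -1)  # threshold predictions
        return np.mean(pred != y)           # training 0/1 error

    best = loss(w)
    for _ in range(epochs):
        direction = rng.standard_normal(d)  # random search direction
        base = w.copy()
        # crude grid line search; the papers compute the exact minimizer
        # along the direction instead
        for step in np.linspace(-2.0, 2.0, 41):
            cand = base + step * direction
            err = loss(cand)
            if err < best:                  # keep strictly better points
                best, w = err, cand
    return w, best
```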
Learning preferences for large scale multi-label problems
Although the majority of machine learning approaches aim to solve binary classification problems, several real-world applications require specialized algorithms able to handle many different classes, as in the case of single-label multi-class and multi-label classification problems. The Label Ranking framework is a generalization of the above-mentioned settings, which aims to map instances from the input space to a total order over the set of possible labels. However, these algorithms are generally more complex than binary ones, and their application to large-scale datasets can be intractable. The main contribution of this work is the proposal of a novel general online preference-based label ranking framework. The proposed framework is able to solve binary, multi-class, multi-label and ranking problems. A comparison with other baselines has been performed, showing effectiveness and efficiency on a real-world large-scale multi-label task.
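As a rough illustration of online preference-based updates for multi-label ranking (not the paper's exact update rules), the sketch below keeps one weight vector per label and corrects each violated preference pair, pushing relevant labels above irrelevant ones; the stream interface and learning rate are assumptions.

```python
import numpy as np

def pairwise_rank_perceptron(stream, dim, n_labels, lr=1.0):
    """Generic online pairwise-preference update for multi-label ranking.

    stream yields (x, relevant) with relevant a set of label indices.
    """
    W = np.zeros((n_labels, dim))  # one weight vector per label
    for x, relevant in stream:
        irrelevant = [s for s in range(n_labels) if s not in relevant]
        for r in relevant:
            for s in irrelevant:
                if W[r] @ x <= W[s] @ x:  # preference violated
                    W[r] += lr * x        # raise the relevant label
                    W[s] -= lr * x        # lower the irrelevant label
    return W
```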