421 research outputs found
Online and Stochastic Gradient Methods for Non-decomposable Loss Functions
Modern applications in sensitive domains such as biometrics and medicine
frequently require the use of non-decomposable loss functions such as
precision@k, F-measure etc. Compared to point loss functions such as
hinge-loss, these offer much more fine grained control over prediction, but at
the same time present novel challenges in terms of algorithm design and
analysis. In this work we initiate a study of online learning techniques for
such non-decomposable loss functions with an aim to enable incremental learning
as well as design scalable solvers for batch problems. To this end, we propose
an online learning framework for such loss functions. Our model enjoys several
nice properties, chief amongst them being the existence of efficient online
learning algorithms with sublinear regret and online to batch conversion
bounds. Our model is a provable extension of existing online learning models
for point loss functions. We instantiate two popular losses, prec@k and pAUC,
in our model and prove sublinear regret bounds for both of them. Our proofs
require a novel structural lemma over ranked lists which may be of independent
interest. We then develop scalable stochastic gradient descent solvers for
non-decomposable loss functions. We show that for a large family of loss
functions satisfying a certain uniform convergence property (that includes
prec@k, pAUC, and F-measure), our methods provably converge to the empirical
risk minimizer. Such uniform convergence results were not known for these
losses and we establish these using novel proof techniques. We then use
extensive experimentation on real life and benchmark datasets to establish that
our method can be orders of magnitude faster than a recently proposed cutting
plane method.Comment: 25 pages, 3 figures, To appear in the proceedings of the 28th Annual
Conference on Neural Information Processing Systems, NIPS 201
Bundle methods for regularized risk minimization with applications to robust learning
Supervised learning in general and regularized risk minimization in particular is about solving optimization problem which is jointly defined by a performance measure and a set of labeled training examples. The outcome of learning, a model, is then used mainly for predicting the labels for unlabeled examples in the testing environment. In real-world scenarios: a typical learning process often involves solving a sequence of similar problems with different parameters before a final model is identified. For learning to be successful, the final model must be produced timely, and the model should be robust to (mild) irregularities in the testing environment. The purpose of this thesis is to investigate ways to speed up the learning process and improve the robustness of the learned model. We first develop a batch convex optimization solver specialized to the regularized risk minimization based on standard bundle methods. The solver inherits two main properties of the standard bundle methods. Firstly, it is capable of solving both differentiable and non-differentiable problems, hence its implementation can be reused for different tasks with minimal modification. Secondly, the optimization is easily amenable to parallel and distributed computation settings; this makes the solver highly scalable in the number of training examples. However, unlike the standard bundle methods, the solver does not have extra parameters which need careful tuning. Furthermore, we prove that the solver has faster convergence rate. In addition to that, the solver is very efficient in computing approximate regularization path and model selection. We also present a convex risk formulation for incorporating invariances and prior knowledge into the learning problem. This formulation generalizes many existing approaches for robust learning in the setting of insufficient or noisy training examples and covariate shift. Lastly, we extend a non-convex risk formulation for binary classification to structured prediction. Empirical results show that the model obtained with this risk formulation is robust to outliers in the training examples
Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
We propose a randomized block-coordinate variant of the classic Frank-Wolfe
algorithm for convex optimization with block-separable constraints. Despite its
lower iteration cost, we show that it achieves a similar convergence rate in
duality gap as the full Frank-Wolfe algorithm. We also show that, when applied
to the dual structural support vector machine (SVM) objective, this yields an
online algorithm that has the same low iteration complexity as primal
stochastic subgradient methods. However, unlike stochastic subgradient methods,
the block-coordinate Frank-Wolfe algorithm allows us to compute the optimal
step-size and yields a computable duality gap guarantee. Our experiments
indicate that this simple algorithm outperforms competing structural SVM
solvers.Comment: Appears in Proceedings of the 30th International Conference on
Machine Learning (ICML 2013). 9 pages main text + 22 pages appendix. Changes
from v3 to v4: 1) Re-organized appendix; improved & clarified duality gap
proofs; re-drew all plots; 2) Changed convention for Cf definition; 3) Added
weighted averaging experiments + convergence results; 4) Clarified main text
and relationship with appendi
- …