355 research outputs found
Non-convex Optimization for Machine Learning
A vast majority of machine learning algorithms train their models and perform
inference by solving optimization problems. In order to capture the learning
and prediction problems accurately, structural constraints such as sparsity or
low rank are frequently imposed or else the objective itself is designed to be
a non-convex function. This is especially true of algorithms that operate in
high-dimensional spaces or that train non-linear models such as tensor models
and deep networks.
The freedom to express the learning problem as a non-convex optimization
problem gives immense modeling power to the algorithm designer, but often such
problems are NP-hard to solve. A popular workaround to this has been to relax
non-convex problems to convex ones and use traditional methods to solve the
(convex) relaxed optimization problems. However, this approach can be lossy, and
even then it presents significant challenges for large-scale optimization.
On the other hand, direct approaches to non-convex optimization have met with
resounding success in several domains and remain the methods of choice for
practitioners, as they frequently outperform relaxation-based techniques;
popular heuristics include projected gradient descent and alternating
minimization. However, these heuristics are often poorly understood in terms of
their convergence and other properties.
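As a concrete illustration of the first of these heuristics, the following is a minimal sketch of projected gradient descent for a sparsity-constrained least-squares problem, where the projection keeps the k largest-magnitude coordinates (iterative hard thresholding). The function names, step-size choice, and problem setup are illustrative assumptions, not the monograph's exact algorithms.

```python
import numpy as np

def project_sparse(x, k):
    """Project x onto the set of k-sparse vectors (keep the k largest-magnitude entries)."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def projected_gradient_descent(A, b, k, step=None, iters=200):
    """Minimize 0.5 * ||A w - b||^2 subject to ||w||_0 <= k via projected gradient steps."""
    n, d = A.shape
    if step is None:
        # Conservative step size from the spectral norm of A (an assumed default).
        step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.zeros(d)
    for _ in range(iters):
        grad = A.T @ (A @ w - b)                 # gradient of the least-squares objective
        w = project_sparse(w - step * grad, k)   # gradient step followed by projection
    return w
```

Alternating minimization follows the same template, except that the iterate is split into blocks (for instance, the two factors of a low-rank matrix) and each block is optimized in turn while the others are held fixed.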
This monograph presents a selection of recent advances that bridge a
long-standing gap in our understanding of these heuristics. The monograph will
lead the reader through several widely used non-convex optimization techniques,
as well as applications thereof. The goal of this monograph is both to
introduce the rich literature in this area and to equip the reader with
the tools and techniques needed to analyze these simple procedures for
non-convex problems.
Comment: The official publication is available from now publishers via
http://dx.doi.org/10.1561/220000005
Supervised Learning with Similarity Functions
We address the problem of general supervised learning when data can only be
accessed through an (indefinite) similarity function between data points.
Existing work on learning with indefinite kernels has concentrated solely on
binary/multi-class classification problems. We propose a model that is generic
enough to handle any supervised learning task and also subsumes the model
previously proposed for classification. We give a "goodness" criterion for
similarity functions w.r.t. a given supervised learning task and then adapt a
well-known landmarking technique to provide efficient algorithms for supervised
learning using "good" similarity functions. We demonstrate the effectiveness of
our model on three important supervised learning problems: a) real-valued
regression, b) ordinal regression and c) ranking where we show that our method
guarantees bounded generalization error. Furthermore, for the case of
real-valued regression, we give a natural goodness definition that, when used
in conjunction with a recent result in sparse vector recovery, guarantees a
sparse predictor with bounded generalization error. Finally, we report results
of our learning algorithms on regression and ordinal regression tasks using
non-PSD similarity functions and demonstrate the effectiveness of our
algorithms, especially that of the sparse landmark selection algorithm that
achieves significantly higher accuracies than the baseline methods while
offering reduced computational costs.
Comment: To appear in the proceedings of NIPS 2012, 30 pages
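The landmarking technique referenced in this abstract can be illustrated with a short sketch: each data point is embedded into a vector of its similarities to a small set of landmark points, and a standard supervised learner is then trained on that embedding. The similarity function, the random choice of landmarks, and the ridge-regularized linear fit below are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def landmark_embed(X, landmarks, similarity):
    """Map each row of X to its vector of similarities with the landmark points."""
    return np.array([[similarity(x, l) for l in landmarks] for x in X])

def similarity(x, y):
    # An indefinite (non-PSD) similarity, chosen here purely for illustration.
    return np.tanh(x @ y)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = X_train[:, 0] - 2.0 * X_train[:, 1] + 0.1 * rng.normal(size=200)

# Choose a small set of landmark points from the training data.
landmarks = X_train[rng.choice(len(X_train), size=20, replace=False)]

# Embed the data via similarities to the landmarks, then fit a ridge-regularized
# linear predictor on the embedded representation (any supervised learner would do).
Z = landmark_embed(X_train, landmarks, similarity)
lam = 1.0
w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y_train)

X_test = rng.normal(size=(50, 10))
predictions = landmark_embed(X_test, landmarks, similarity) @ w
```

A sparse landmark selection step, in the spirit of the sparse predictor discussed in the abstract, would replace the random choice of landmarks with a data-driven selection of a few informative ones.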