Online Multiple Kernel Learning for Structured Prediction
Despite the recent progress towards efficient multiple kernel learning (MKL),
the structured output case remains an open research front. Current approaches
involve repeatedly solving a batch learning problem, which makes them
inadequate for large scale scenarios. We propose a new family of online
proximal algorithms for MKL (as well as for group-lasso and variants thereof),
which overcomes that drawback. We show regret, convergence, and generalization
bounds for the proposed method. Experiments on handwriting recognition and
dependency parsing attest to the effectiveness of the approach.
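The group-lasso proximal step at the core of such online proximal algorithms has a cheap closed form. The sketch below is a generic illustration of that step (block soft-thresholding), not the paper's exact update; the group structure, regularization strength, and step size are assumptions:

```python
import numpy as np

def prox_group_lasso(w, groups, lam, eta):
    # Proximal operator of eta * lam * sum_g ||w_g||_2:
    # each group's block is shrunk toward zero by a common factor
    # (block soft-thresholding), and set to zero if its norm is small.
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        scale = max(0.0, 1.0 - eta * lam / norm) if norm > 0 else 0.0
        w[g] = scale * w[g]
    return w
```

In an online scheme, a step like this would follow each (sub)gradient update, which is what makes the per-round cost independent of the dataset size.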
Differentially Private Online Learning
In this paper, we consider the problem of preserving privacy in the online
learning setting. We study the problem in the online convex programming (OCP)
framework---a popular online learning setting with several interesting
theoretical and practical implications---while using differential privacy as
the formal privacy measure. For this problem, we distill two critical
attributes that a private OCP algorithm should have in order to provide
reasonable privacy as well as utility guarantees: 1) linearly decreasing
sensitivity, i.e., as new data points arrive their effect on the learning model
decreases, and 2) a sub-linear regret bound, a popular goodness/utility
measure of an online learning algorithm.
Given an OCP algorithm that satisfies these two conditions, we provide a
general framework to convert the given algorithm into a privacy preserving OCP
algorithm with good (sub-linear) regret. We then illustrate our approach by
converting two popular online learning algorithms into their differentially
private variants while guaranteeing sub-linear regret. Next, we
consider the special case of online linear regression problems, a practically
important class of online learning problems, for which we generalize an
approach by Dwork et al. to provide a differentially private algorithm with
low regret. Finally, we show that our online learning
framework can be used to provide differentially private algorithms for offline
learning as well. For the offline learning problem, our approach obtains
better error bounds and can handle a larger class of problems than the
existing state-of-the-art methods of Chaudhuri et al.
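As a rough illustration of the noise-addition idea behind private online learning, the sketch below perturbs the iterates of online gradient descent after clipping each gradient to bound its sensitivity. This is not the paper's calibrated mechanism: `noise_scale` and `clip` are placeholder parameters, not privacy-calibrated values.

```python
import numpy as np

def private_ogd(grads, lr=0.1, noise_scale=1.0, clip=1.0, seed=0):
    # Illustrative noisy online gradient descent: each gradient is
    # norm-clipped (bounding per-round sensitivity), and Gaussian noise
    # is added to the iterate before it is released.
    rng = np.random.default_rng(seed)
    w = np.zeros_like(grads[0])
    released = []
    for g in grads:
        g = g / max(1.0, np.linalg.norm(g) / clip)  # enforce ||g|| <= clip
        w = w - lr * g + noise_scale * rng.normal(size=w.shape)
        released.append(w.copy())
    return released
```

The two attributes distilled in the abstract map directly onto this template: clipping gives the decreasing-sensitivity property, while the underlying no-noise iteration must already enjoy sub-linear regret.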
The Case for Full-Matrix Adaptive Regularization
Adaptive regularization methods come in diagonal and full-matrix variants.
However, only the former have enjoyed widespread adoption in training
large-scale deep models. This is due to the computational overhead of
manipulating a full matrix in high dimension. In this paper, we show how to
make full-matrix adaptive regularization practical and useful. We present GGT,
a truly scalable full-matrix adaptive optimizer. At the heart of our algorithm
is an efficient method for computing the inverse square root of a low-rank
matrix. We show that GGT converges to first-order local minima, providing the
first rigorous theoretical analysis of adaptive regularization in non-convex
optimization. In preliminary experiments, GGT trains faster across a variety of
synthetic tasks and standard deep learning benchmarks.
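The low-rank inverse square root underlying a method like GGT can be sketched with ordinary linear algebra; this is a generic illustration, not the paper's exact recipe (the damping `eps` is an assumption). For a gradient window G of shape d x r with r << d, the matrix (G Gᵀ + eps I)^(-1/2) has only r nontrivial eigendirections, so it can be applied using thin-SVD factors:

```python
import numpy as np

def inv_sqrt_lowrank(G, eps=1e-4):
    # G: d x r matrix of r recent gradients (r << d).
    # Returns a function applying (G G^T + eps I)^{-1/2} to a vector,
    # using only the r nonzero singular directions of G.
    U, s, _ = np.linalg.svd(G, full_matrices=False)  # U: d x r
    inv_sqrt_eig = 1.0 / np.sqrt(s**2 + eps)

    def apply(v):
        coeff = U.T @ v
        # span(U) has eigenvalues s^2 + eps; the orthogonal
        # complement has eigenvalue eps.
        return U @ (inv_sqrt_eig * coeff) + (v - U @ coeff) / np.sqrt(eps)

    return apply
```

The key point is the cost: only d x r products and r x r spectral quantities are ever formed, never a dense d x d matrix.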
A Simple Analysis for Exp-concave Empirical Minimization with Arbitrary Convex Regularizer
In this paper, we present a simple analysis of {\bf fast rates} with {\it
high probability} of {\bf empirical minimization} for {\it stochastic composite
optimization} over a finite-dimensional bounded convex set with exponential
concave loss functions and an arbitrary convex regularization. To the best of
our knowledge, this result is the first of its kind. As a byproduct, we can
directly obtain the fast rate with {\it high probability} for exponential
concave empirical risk minimization with and without any convex regularization,
which not only extends existing results of empirical risk minimization but also
provides a unified framework for analyzing exponential concave empirical risk
minimization with and without {\it any} convex regularization. Our proof is
very simple, exploiting only the covering number of a finite-dimensional
bounded set and a concentration inequality for random vectors.
On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions
In this paper, we study the generalization properties of online learning
based stochastic methods for supervised learning problems where the loss
function is dependent on more than one training sample (e.g., metric learning,
ranking). We present a generic decoupling technique that enables us to provide
Rademacher complexity-based generalization error bounds. Our bounds are in
general tighter than those obtained by Wang et al. (COLT 2012) for the same
problem. Using our decoupling technique, we are further able to obtain fast
convergence rates for strongly convex pairwise loss functions. We are also able
to analyze a class of memory efficient online learning algorithms for pairwise
learning problems that use only a bounded subset of past training samples to
update the hypothesis at each step. Finally, in order to complement our
generalization bounds, we propose a novel memory efficient online learning
algorithm for higher order learning problems with bounded regret guarantees.
Comment: To appear in the proceedings of the 30th International Conference on Machine Learning (ICML 2013).
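A bounded-memory pairwise learner of the kind analyzed above can be sketched with a reservoir buffer. This is an illustrative scheme, not the paper's algorithm: the pairwise hinge update and the reservoir buffer policy are assumptions.

```python
import numpy as np

def online_pairwise(stream, buffer_size=8, lr=0.01, seed=0):
    # Illustrative buffered pairwise learner: keep a reservoir sample of
    # past points and update w using a pairwise hinge loss against the
    # buffer only, so memory stays bounded regardless of stream length.
    rng = np.random.default_rng(seed)
    buf, w = [], None
    for t, (x, y) in enumerate(stream):
        if w is None:
            w = np.zeros_like(x)
        for xb, yb in buf:
            if y != yb:  # only cross-class pairs contribute
                diff = (x - xb) if y > yb else (xb - x)
                if w @ diff < 1.0:  # pairwise hinge margin violated
                    w = w + lr * diff
        # reservoir sampling keeps a uniform bounded subset of the past
        if len(buf) < buffer_size:
            buf.append((x, y))
        elif rng.integers(t + 1) < buffer_size:
            buf[rng.integers(buffer_size)] = (x, y)
    return w
```

The buffer is exactly the "bounded subset of past training samples" the abstract refers to; the analysis question is how much generalization is lost by pairing only against it.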
Fast Rates of ERM and Stochastic Approximation: Adaptive to Error Bound Conditions
Error bound conditions (EBC) are properties that characterize the growth of
an objective function when a point is moved away from the optimal set. They
have recently received increasing attention in the field of optimization for
developing optimization algorithms with fast convergence. However, the studies
of EBC in statistical learning are hitherto still limited. The main
contributions of this paper are two-fold. First, we develop fast and
intermediate rates of empirical risk minimization (ERM) under EBC for risk
minimization with Lipschitz continuous, and smooth convex random functions.
Second, we establish fast and intermediate rates of an efficient stochastic
approximation (SA) algorithm for risk minimization with Lipschitz continuous
random functions, which requires only one pass of samples and adapts to
EBC. For both approaches, the convergence rates span a full spectrum
determined by the power constant in EBC, and can be even faster in special
cases of ERM. Moreover, these convergence rates are automatically adaptive without using
any knowledge of EBC. Overall, this work not only strengthens the understanding
of ERM for statistical learning but also brings new fast stochastic algorithms
for solving a broad range of statistical learning problems.
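One-pass stochastic approximation methods that adapt to error bound conditions typically proceed in stages with geometrically shrinking step sizes, restarting from an averaged iterate. The sketch below shows only that restarting mechanism; stage lengths and step sizes are illustrative, and the gradient oracle is deterministic for simplicity.

```python
import numpy as np

def restarted_sgd(grad, w0, stages=5, inner=100, eta0=0.1):
    # Illustrative restarting scheme: run (S)GD in stages, halving the
    # step size each stage and warm-starting from the previous stage's
    # average iterate.
    w = np.asarray(w0, dtype=float)
    eta = eta0
    for _ in range(stages):
        avg = np.zeros_like(w)
        cur = w.copy()
        for _ in range(inner):
            cur = cur - eta * grad(cur)
            avg += cur
        w = avg / inner  # restart from the stage average
        eta /= 2.0
    return w
```

Averaging within a stage damps the gradient noise a stochastic oracle would add, and the halving schedule is what lets such methods track the (unknown) power constant in the error bound condition.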
Regularization Techniques for Learning with Matrices
There is a growing body of learning problems for which it is natural to
organize the parameters into a matrix, so as to appropriately regularize
them under some matrix norm (in order to impose more sophisticated
prior knowledge). This work describes and analyzes a systematic method for
constructing such matrix-based, regularization methods. In particular, we focus
on how the underlying statistical properties of a given problem can help us
decide which regularization function is appropriate.
Our methodology is based on a known duality: a function is
strongly convex with respect to some norm if and only if its conjugate function
is strongly smooth with respect to the dual norm. This result has already been
found to be a key component in deriving and analyzing several learning
algorithms. We demonstrate the potential of this framework by deriving novel
generalization and regret bounds for multi-task learning, multi-class learning,
and kernel learning.
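The duality the abstract invokes is the standard strong convexity / strong smoothness correspondence, which can be stated concretely as follows (for a parameter $\beta > 0$ and Fenchel conjugate $f^*$):

```latex
f \text{ is } \beta\text{-strongly convex w.r.t. } \|\cdot\|
\iff
f^*(\theta) = \sup_{w}\,\bigl[\langle \theta, w\rangle - f(w)\bigr]
\text{ is } \tfrac{1}{\beta}\text{-strongly smooth w.r.t. } \|\cdot\|_*
```

For example, $f(w) = \tfrac{1}{2}\|w\|_2^2$ is $1$-strongly convex with respect to $\|\cdot\|_2$, and its conjugate is itself, which is $1$-strongly smooth with respect to the (self-dual) $\ell_2$ norm.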
Exploiting Problem Structure in Optimization under Uncertainty via Online Convex Optimization
In this paper, we consider two paradigms that are developed to account for
uncertainty in optimization models: robust optimization (RO) and joint
estimation-optimization (JEO). We examine recent developments on efficient and
scalable iterative first-order methods for these problems, and show that these
iterative methods can be viewed through the lens of online convex optimization
(OCO). The standard OCO framework has seen much success for its ability to
handle decision-making in dynamic, uncertain, and even adversarial
environments. Nevertheless, our applications of interest present further
flexibility in OCO via three simple modifications to standard OCO assumptions:
we introduce two new concepts of weighted regret and online saddle point
problems and study the possibility of making lookahead (anticipatory)
decisions. Our analyses demonstrate that these flexibilities introduced into
the OCO framework have significant consequences whenever they are applicable.
For example, in the strongly convex case, minimizing unweighted regret has a
known optimal bound, whereas we show that a strictly better bound is possible
when we consider weighted regret. Similarly, for the smooth case, allowing
lookahead decisions results in an improved bound compared to the standard OCO
setting. Consequently, these
OCO tools are instrumental in exploiting structural properties of functions and
resulting in improved convergence rates for RO and JEO. In certain cases, our
results for RO and JEO match the best known or optimal rates in the
corresponding problem classes without data uncertainty.
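Weighted regret simply reweights each round's loss in the comparator criterion; a correspondingly weighted online gradient descent can be sketched as below. This is an illustrative scheme only: the weight sequence and step size are assumptions, not the paper's method.

```python
import numpy as np

def weighted_ogd(grads, weights, lr=0.1):
    # Illustrative weighted OGD: round t's gradient is scaled by
    # weights[t], so later rounds can be emphasized, which is the
    # weighted-regret setting. Returns the iterate played each round.
    w = np.zeros_like(grads[0])
    iterates = []
    for g, a in zip(grads, weights):
        iterates.append(w.copy())  # play w, then observe weighted loss
        w = w - lr * a * g
    return iterates
```

Increasing weight sequences are the interesting case: they let averaged iterates concentrate on recent (more informative) rounds, which is what drives the improved rates for RO and JEO.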
Provable Guarantees for Gradient-Based Meta-Learning
We study the problem of meta-learning through the lens of online convex
optimization, developing a meta-algorithm bridging the gap between popular
gradient-based meta-learning and classical regularization-based multi-task
transfer methods. Our method is the first to simultaneously satisfy good sample
efficiency guarantees in the convex setting, with generalization bounds that
improve with task-similarity, while also being computationally scalable to
modern deep learning architectures and the many-task setting. Despite its
simplicity, the algorithm matches, up to a constant factor, a lower bound on
the performance of any such parameter-transfer method under natural task
similarity assumptions. We use experiments in both convex and deep learning
settings to verify and demonstrate the applicability of our theory.
Comment: ICML 201
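Gradient-based meta-learning of the kind discussed can be sketched with a Reptile-style update: run a few gradient steps per task, then move the meta-initialization toward each task's solution. This is an illustrative stand-in, not the paper's meta-algorithm; the inner step count and both learning rates are assumptions.

```python
import numpy as np

def reptile_style_meta(tasks, inner_steps=10, inner_lr=0.1, meta_lr=0.5):
    # tasks: list of gradient oracles, each mapping w -> grad f_i(w).
    # The meta-parameter phi is nudged toward each task's adapted
    # solution, so similar tasks pull phi toward a shared good init.
    phi = np.zeros(2)
    for grad in tasks:
        w = phi.copy()
        for _ in range(inner_steps):
            w = w - inner_lr * grad(w)  # within-task adaptation
        phi = phi + meta_lr * (w - phi)  # meta-update toward task optimum
    return phi
```

The online-convex-optimization view treats each task's adapted solution as one round of an online game over initializations, which is how regret bounds translate into task-similarity-dependent guarantees.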
On Distributed Online Classification in the Midst of Concept Drifts
In this work, we analyze the generalization ability of distributed online
learning algorithms under stationary and non-stationary environments. We derive
bounds for the excess-risk attained by each node in a connected network of
learners and study the performance advantage that diffusion strategies have
over individual non-cooperative processing. We conduct extensive simulations to
illustrate the results.
Comment: 19 pages, 14 figures, to appear in Neurocomputing, 201
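Diffusion strategies of the kind analyzed above follow an adapt-then-combine pattern: each node takes a local gradient step, then averages with its neighbors. A minimal sketch, where the combination matrix `A` and the step size are assumptions:

```python
import numpy as np

def diffusion_atc(grads, A, w0, lr=0.05, rounds=50):
    # Illustrative adapt-then-combine diffusion over a network:
    # grads[k] is node k's local gradient oracle, and A is a row-stochastic
    # combination matrix (row k weights node k's neighborhood).
    W = np.array([np.asarray(w0, float) for _ in range(A.shape[0])])
    for _ in range(rounds):
        # adapt: local gradient step at every node
        psi = np.array([W[k] - lr * grads[k](W[k]) for k in range(len(W))])
        # combine: neighborhood averaging via A
        W = A @ psi
    return W
```

Cooperation shows up in the limit: with a connected combination matrix, every node is driven toward the minimizer of the network-average cost, which is the performance advantage over non-cooperative processing.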