2,740 research outputs found
Efficient Transductive Online Learning via Randomized Rounding
Most traditional online learning algorithms are based on variants of mirror
descent or follow-the-leader. In this paper, we present an online algorithm
based on a completely different approach, tailored for transductive settings,
which combines "random playout" and randomized rounding of loss subgradients.
As an application of our approach, we present the first computationally
efficient online algorithm for collaborative filtering with trace-norm
constrained matrices. As a second application, we solve an open question
linking batch learning and transductive online learningComment: To appear in a Festschrift in honor of V.N. Vapnik. Preliminary
version presented in NIPS 201
On-line PCA with Optimal Regrets
We carefully investigate the on-line version of PCA, where in each trial a
learning algorithm plays a k-dimensional subspace, and suffers the compression
loss on the next instance when projected into the chosen subspace. In this
setting, we analyze two popular on-line algorithms, Gradient Descent (GD) and
Exponentiated Gradient (EG). We show that both algorithms are essentially
optimal in the worst-case. This comes as a surprise, since EG is known to
perform sub-optimally when the instances are sparse. This different behavior of
EG for PCA is mainly related to the non-negativity of the loss in this case,
which makes the PCA setting qualitatively different from other settings studied
in the literature. Furthermore, we show that when considering regret bounds as
function of a loss budget, EG remains optimal and strictly outperforms GD.
Next, we study the extension of the PCA setting, in which the Nature is allowed
to play with dense instances, which are positive matrices with bounded largest
eigenvalue. Again we can show that EG is optimal and strictly better than GD in
this setting
Bandits with heavy tail
The stochastic multi-armed bandit problem is well understood when the reward
distributions are sub-Gaussian. In this paper we examine the bandit problem
under the weaker assumption that the distributions have moments of order
1+\epsilon, for some . Surprisingly, moments of order 2
(i.e., finite variance) are sufficient to obtain regret bounds of the same
order as under sub-Gaussian reward distributions. In order to achieve such
regret, we define sampling strategies based on refined estimators of the mean
such as the truncated empirical mean, Catoni's M-estimator, and the
median-of-means estimator. We also derive matching lower bounds that also show
that the best achievable regret deteriorates when \epsilon <1
On the Complexity of Learning with Kernels
A well-recognized limitation of kernel learning is the requirement to handle
a kernel matrix, whose size is quadratic in the number of training examples.
Many methods have been proposed to reduce this computational cost, mostly by
using a subset of the kernel matrix entries, or some form of low-rank matrix
approximation, or a random projection method. In this paper, we study lower
bounds on the error attainable by such methods as a function of the number of
entries observed in the kernel matrix or the rank of an approximate kernel
matrix. We show that there are kernel learning problems where no such method
will lead to non-trivial computational savings. Our results also quantify how
the problem difficulty depends on parameters such as the nature of the loss
function, the regularization parameter, the norm of the desired predictor, and
the kernel matrix rank. Our results also suggest cases where more efficient
kernel learning might be possible
Online Learning with Switching Costs and Other Adaptive Adversaries
We study the power of different types of adaptive (nonoblivious) adversaries
in the setting of prediction with expert advice, under both full-information
and bandit feedback. We measure the player's performance using a new notion of
regret, also known as policy regret, which better captures the adversary's
adaptiveness to the player's behavior. In a setting where losses are allowed to
drift, we characterize ---in a nearly complete manner--- the power of adaptive
adversaries with bounded memories and switching costs. In particular, we show
that with switching costs, the attainable rate with bandit feedback is
. Interestingly, this rate is significantly worse
than the rate attainable with switching costs in the
full-information case. Via a novel reduction from experts to bandits, we also
show that a bounded memory adversary can force
regret even in the full information case, proving that switching costs are
easier to control than bounded memory adversaries. Our lower bounds rely on a
new stochastic adversary strategy that generates loss processes with strong
dependencies
Cooperative Online Learning: Keeping your Neighbors Updated
We study an asynchronous online learning setting with a network of agents. At
each time step, some of the agents are activated, requested to make a
prediction, and pay the corresponding loss. The loss function is then revealed
to these agents and also to their neighbors in the network. Our results
characterize how much knowing the network structure affects the regret as a
function of the model of agent activations. When activations are stochastic,
the optimal regret (up to constant factors) is shown to be of order
, where is the horizon and is the independence
number of the network. We prove that the upper bound is achieved even when
agents have no information about the network structure. When activations are
adversarial the situation changes dramatically: if agents ignore the network
structure, a lower bound on the regret can be proven, showing that
learning is impossible. However, when agents can choose to ignore some of their
neighbors based on the knowledge of the network structure, we prove a
sublinear regret bound, where is the clique-covering number of the network
Online Learning of Noisy Data with Kernels
We study online learning when individual instances are corrupted by
adversarially chosen random noise. We assume the noise distribution is unknown,
and may change over time with no restriction other than having zero mean and
bounded variance. Our technique relies on a family of unbiased estimators for
non-linear functions, which may be of independent interest. We show that a
variant of online gradient descent can learn functions in any dot-product
(e.g., polynomial) or Gaussian kernel space with any analytic convex loss
function. Our variant uses randomized estimates that need to query a random
number of noisy copies of each instance, where with high probability this
number is upper bounded by a constant. Allowing such multiple queries cannot be
avoided: Indeed, we show that online learning is in general impossible when
only one noisy copy of each instance can be accessed.Comment: This is a full version of the paper appearing in the 23rd
International Conference on Learning Theory (COLT 2010
- …