52 research outputs found
Generalised Mixability, Constant Regret, and Bayesian Updating
Mixability of a loss is known to characterise when constant regret bounds are
achievable in games of prediction with expert advice through the use of Vovk's
aggregating algorithm. We provide a new interpretation of mixability via convex
analysis that highlights the role of the Kullback-Leibler divergence in its
definition. This naturally generalises to what we call -mixability where
the Bregman divergence replaces the KL divergence. We prove that
losses that are -mixable also enjoy constant regret bounds via a
generalised aggregating algorithm that is similar to mirror descent.Comment: 12 page
Competing with Gaussian linear experts
We study the problem of online regression. We prove a theoretical bound on
the square loss of Ridge Regression. We do not make any assumptions about input
vectors or outcomes. We also show that Bayesian Ridge Regression can be thought
of as an online algorithm competing with all the Gaussian linear experts
Leading strategies in competitive on-line prediction
We start from a simple asymptotic result for the problem of on-line
regression with the quadratic loss function: the class of continuous
limited-memory prediction strategies admits a "leading prediction strategy",
which not only asymptotically performs at least as well as any continuous
limited-memory strategy but also satisfies the property that the excess loss of
any continuous limited-memory strategy is determined by how closely it imitates
the leading strategy. More specifically, for any class of prediction strategies
constituting a reproducing kernel Hilbert space we construct a leading
strategy, in the sense that the loss of any prediction strategy whose norm is
not too large is determined by how closely it imitates the leading strategy.
This result is extended to the loss functions given by Bregman divergences and
by strictly proper scoring rules.Comment: 20 pages; a conference version is to appear in the ALT'2006
proceeding
On-line PCA with Optimal Regrets
We carefully investigate the on-line version of PCA, where in each trial a
learning algorithm plays a k-dimensional subspace, and suffers the compression
loss on the next instance when projected into the chosen subspace. In this
setting, we analyze two popular on-line algorithms, Gradient Descent (GD) and
Exponentiated Gradient (EG). We show that both algorithms are essentially
optimal in the worst-case. This comes as a surprise, since EG is known to
perform sub-optimally when the instances are sparse. This different behavior of
EG for PCA is mainly related to the non-negativity of the loss in this case,
which makes the PCA setting qualitatively different from other settings studied
in the literature. Furthermore, we show that when considering regret bounds as
function of a loss budget, EG remains optimal and strictly outperforms GD.
Next, we study the extension of the PCA setting, in which the Nature is allowed
to play with dense instances, which are positive matrices with bounded largest
eigenvalue. Again we can show that EG is optimal and strictly better than GD in
this setting
- …