On-line PCA with Optimal Regrets
We carefully investigate the on-line version of PCA, where in each trial a
learning algorithm plays a k-dimensional subspace, and suffers the compression
loss on the next instance when projected onto the chosen subspace. In this
setting, we analyze two popular on-line algorithms, Gradient Descent (GD) and
Exponentiated Gradient (EG). We show that both algorithms are essentially
optimal in the worst case. This comes as a surprise, since EG is known to
perform sub-optimally when the instances are sparse. This different behavior of
EG in the PCA setting is mainly due to the non-negativity of the loss, which
makes the PCA setting qualitatively different from other settings studied
in the literature. Furthermore, we show that when considering regret bounds as
a function of a loss budget, EG remains optimal and strictly outperforms GD.
Next, we study an extension of the PCA setting in which Nature is allowed to
play dense instances, i.e., positive matrices with bounded largest eigenvalue.
Again, we show that EG is optimal and strictly better than GD in this setting.
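A minimal sketch of what the two updates look like in this setting, under simplifying assumptions that are not the paper's exact algorithms: the learner keeps a symmetric matrix W standing in for the complement of the played subspace, GD takes an additive step and EG a matrix-exponentiated multiplicative step, the rank-k subspace is read off deterministically from W's smallest eigenvectors instead of the randomized rounding used in the literature, and the exact projection/capping steps are replaced by a crude trace renormalisation. All names and the learning rate are illustrative.

```python
import numpy as np

def play_subspace(W, k):
    # Eigenvectors of W with the k smallest eigenvalues span the subspace
    # the learner projects instances onto (W holds mass on the complement).
    vals, vecs = np.linalg.eigh(W)
    return vecs[:, :k]                            # n x k orthonormal basis

def compression_loss(U, x):
    # Squared norm of the part of x lost when projecting onto span(U).
    return float(x @ x - (U.T @ x) @ (U.T @ x))

def gd_update(W, x, eta, n, k):
    # Additive step on the linear loss tr(W x x^T), followed by a crude
    # renormalisation standing in for projection onto {0 <= W <= I, tr W = n-k}.
    W = W - eta * np.outer(x, x)
    vals, vecs = np.linalg.eigh(W)
    vals = np.clip(vals, 0.0, 1.0)
    vals *= (n - k) / max(vals.sum(), 1e-12)
    return (vecs * vals) @ vecs.T

def eg_update(W, x, eta, n, k):
    # Matrix exponentiated-gradient step: exponentiate log(W) - eta * x x^T,
    # then rescale the trace back to n - k (eigenvalue capping omitted).
    vals, vecs = np.linalg.eigh(W)
    logW = (vecs * np.log(np.maximum(vals, 1e-12))) @ vecs.T
    mvals, mvecs = np.linalg.eigh(logW - eta * np.outer(x, x))
    new_vals = np.exp(mvals)
    new_vals *= (n - k) / new_vals.sum()
    return (mvecs * new_vals) @ mvecs.T

# Usage on random unit-norm instances.
rng = np.random.default_rng(0)
n, k, eta = 5, 2, 0.5
W_gd = W_eg = np.eye(n) * (n - k) / n
loss_gd = loss_eg = 0.0
for _ in range(200):
    x = rng.normal(size=n)
    x /= np.linalg.norm(x)
    loss_gd += compression_loss(play_subspace(W_gd, k), x)
    loss_eg += compression_loss(play_subspace(W_eg, k), x)
    W_gd = gd_update(W_gd, x, eta, n, k)
    W_eg = eg_update(W_eg, x, eta, n, k)
print(f"GD loss: {loss_gd:.2f}  EG loss: {loss_eg:.2f}")
```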
Online Isotonic Regression
We consider the online version of the isotonic regression problem. Given a
set of linearly ordered points (e.g., on the real line), the learner must
predict labels sequentially at adversarially chosen positions and is evaluated
by her total squared loss compared against the best isotonic (non-decreasing)
function in hindsight. We survey several standard online learning algorithms
and show that none of them achieve the optimal regret exponent; in fact, most
of them (including Online Gradient Descent, Follow the Leader and Exponential
Weights) incur linear regret. We then prove that the Exponential Weights
algorithm played over a covering net of isotonic functions has a regret bounded
by and present a matching
lower bound on regret. We provide a computationally efficient version of this
algorithm. We also analyze the noise-free case, in which the revealed labels
are isotonic, and show that the bound can be improved to or even to
(when the labels are revealed in isotonic order). Finally, we extend the
analysis beyond squared loss and give bounds for entropic loss and absolute
loss.
Comment: 25 pages
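A toy illustration of the covering-net idea, under heavy simplifications: every non-decreasing sequence over a small grid of label values is treated as one expert, and plain Exponential Weights with squared loss is run over that finite set. This brute-force net is exponential in the number of points and is not the paper's computationally efficient algorithm; the grid size, learning rate, and synthetic data below are illustrative assumptions.

```python
import itertools
import numpy as np

n_points = 5                                  # number of linearly ordered points
grid = np.linspace(0.0, 1.0, 6)               # discretised label values
# Every non-decreasing sequence over the grid is one expert (isotonic function).
experts = np.array(list(itertools.combinations_with_replacement(grid, n_points)))
weights = np.ones(len(experts)) / len(experts)
eta = 2.0                                     # illustrative learning rate

rng = np.random.default_rng(1)
total_loss = 0.0
for t in rng.permutation(n_points):           # adversary picks the next position
    y = float(np.clip(t / (n_points - 1) + 0.1 * rng.normal(), 0.0, 1.0))
    pred = float(weights @ experts[:, t])     # weighted-average prediction
    total_loss += (pred - y) ** 2
    # Exponential Weights update: each expert pays its own squared loss at t.
    weights *= np.exp(-eta * (experts[:, t] - y) ** 2)
    weights /= weights.sum()

print(f"cumulative squared loss: {total_loss:.3f}")
```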
Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
We apply stochastic average gradient (SAG) algorithms for training
conditional random fields (CRFs). We describe a practical implementation that
uses structure in the CRF gradient to reduce the memory requirement of this
linearly-convergent stochastic gradient method, propose a non-uniform sampling
scheme that substantially improves practical performance, and analyze the rate
of convergence of the SAGA variant under non-uniform sampling. Our experimental
results reveal that our method often significantly outperforms existing methods
in terms of the training objective, and performs as well or better than
optimally-tuned stochastic gradient methods in terms of test error.
Comment: AI/Stats 2015, 24 pages
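A minimal sketch of SAG with non-uniform, Lipschitz-proportional sampling, using L2-regularised logistic regression as a stand-in for the CRF training objective. The paper's memory-saving use of CRF gradient structure and its exact sampling mixture and step-size rules are not reproduced here; all constants and the synthetic data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 10, 1e-2
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

# Per-example Lipschitz constants of the logistic-loss gradient
# (0.25 * ||x_i||^2) plus the L2 term; sampling is proportional to these.
L = 0.25 * np.sum(X * X, axis=1) + lam
p = L / L.sum()
step = 1.0 / L.max()

def example_grad(i, w):
    # Gradient of log(1 + exp(-y_i x_i^T w)) + (lam / 2) ||w||^2 at w.
    margin = y[i] * (X[i] @ w)
    return -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w

w = np.zeros(d)
grad_memory = np.zeros((n, d))        # last gradient stored for each example
grad_sum = grad_memory.sum(axis=0)    # running sum of the stored gradients

for it in range(20 * n):
    i = rng.choice(n, p=p)            # non-uniform (Lipschitz-weighted) sampling
    g = example_grad(i, w)
    grad_sum += g - grad_memory[i]    # refresh example i's slot in the average
    grad_memory[i] = g
    w -= step * grad_sum / n          # step along the stored average gradient

full_grad = np.mean([example_grad(i, w) for i in range(n)], axis=0)
print(f"||average gradient|| after training: {np.linalg.norm(full_grad):.2e}")
```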