An Implicit Form of Krasulina's k-PCA Update without the Orthonormality Constraint
We shed new light on the two commonly used updates for the online k-PCA
problem, namely Krasulina's and Oja's updates. We show that Krasulina's update
corresponds to a projected gradient descent step on the Stiefel manifold of the
orthonormal k-frames, while Oja's update amounts to a gradient descent step
using the unprojected gradient. Following these observations, we derive a more
implicit form of Krasulina's k-PCA update, i.e., a version that uses
the information of the future gradient as much as possible. Most interestingly,
our implicit Krasulina update avoids the costly QR-decomposition step by
bypassing the orthonormality constraint. We show that the new update in fact
corresponds to an online EM step applied to a probabilistic k-PCA model. The
probabilistic view of the updates allows us to combine multiple models in a
distributed setting. We show experimentally that the implicit Krasulina update
yields superior convergence while being significantly faster. We also give
strong evidence that the new update can benefit from parallelism and is more
stable w.r.t. the tuning of the learning rate.
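To make the contrast concrete, here is a minimal NumPy sketch of the two classical updates as described above: Oja's step follows the unprojected gradient, while Krasulina's step first projects the gradient onto the tangent space of the Stiefel manifold. The function names are ours, both steps use a QR re-orthonormalization purely for illustration, and the paper's implicit, QR-free variant is not reproduced here.

```python
import numpy as np

def oja_step(W, x, lr):
    # Oja: follow the unprojected gradient x x^T W, then restore
    # orthonormality with a (costly) QR decomposition.
    G = np.outer(x, x @ W)
    Q, _ = np.linalg.qr(W + lr * G)
    return Q

def krasulina_step(W, x, lr):
    # Krasulina: project the gradient onto the tangent space of the
    # Stiefel manifold at W, i.e. use (I - W W^T) x x^T W.
    G = np.outer(x, x @ W)
    G -= W @ (W.T @ G)               # tangent-space projection
    Q, _ = np.linalg.qr(W + lr * G)  # the step the implicit variant avoids
    return Q

# Usage: W is a d x k orthonormal frame, x a streaming data point.
d, k, lr = 10, 3, 0.05
W, _ = np.linalg.qr(np.random.randn(d, k))
for _ in range(100):
    W = krasulina_step(W, np.random.randn(d), lr)
```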
A Generalized Online Mirror Descent with Applications to Classification and Regression
Online learning algorithms are fast, memory-efficient, easy to implement, and
applicable to many prediction problems, including classification, regression,
and ranking. Several online algorithms have been proposed over the past few decades,
some based on additive updates, like the Perceptron, and some on multiplicative
updates, like Winnow. A unifying perspective on the design and the analysis of
online algorithms is provided by online mirror descent, a general prediction
strategy from which most first-order algorithms can be obtained as special
cases. We generalize online mirror descent to time-varying regularizers with
generic updates. Unlike standard mirror descent, our more general formulation
also captures second-order algorithms, algorithms for composite losses, and
algorithms for adaptive filtering. Moreover, we recover, and sometimes improve,
known regret bounds as special cases of our analysis using specific
regularizers. Finally, we show the power of our approach by deriving a new
second-order algorithm with a regret bound invariant with respect to arbitrary
rescalings of individual features.
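For reference, the two update families contrasted above fall out of online mirror descent by choosing the regularizer: a quadratic regularizer yields additive updates, while an entropic one yields multiplicative updates. The following minimal sketch (function names ours) covers only these two fixed-regularizer special cases; the paper's time-varying regularizers and generic updates are not reproduced.

```python
import numpy as np

def omd_quadratic(w, grad, lr):
    # Regularizer psi(w) = 0.5 * ||w||^2: the mirror map is the
    # identity, so the update is additive (Perceptron / OGD style).
    return w - lr * grad

def omd_entropic(w, grad, lr):
    # Regularizer psi(w) = sum_i w_i log w_i on the simplex: the
    # gradient step happens in the dual (log) space, giving a
    # multiplicative (Winnow / exponentiated gradient) update.
    v = w * np.exp(-lr * grad)
    return v / v.sum()

# Usage: one round of each on a toy gradient.
w = np.full(4, 0.25)
g = np.array([0.1, -0.2, 0.05, 0.0])
w_add = omd_quadratic(w, g, lr=0.5)
w_mul = omd_entropic(w, g, lr=0.5)
```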