Passive Learning with Target Risk
In this paper we consider learning in the passive setting with a slight
modification. We assume that the target expected loss, also referred to as the
target risk, is provided to the learner in advance as prior knowledge. Unlike most
studies in learning theory, which incorporate prior knowledge only into
the generalization bounds, we are able to explicitly utilize the target risk in
the learning process. Our analysis reveals a surprising result on the sample
complexity of learning: by exploiting the target risk in the learning
algorithm, we show that when the loss function is both strongly convex and
smooth, the sample complexity reduces to O(\log(1/\epsilon)), an
exponential improvement over the sample complexity
O(1/\epsilon) for learning with strongly convex loss functions.
Furthermore, our proof is constructive and is based on a computationally
efficient stochastic optimization algorithm for such settings, which demonstrates
that the proposed algorithm is practically useful.
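
To give a sense of the gap between the two rates quoted above, the following worked comparison plugs in one illustrative accuracy level. The value of \epsilon is an assumption chosen for illustration, and constants and problem-dependent factors hidden by the O(.) notation are ignored, so the numbers indicate scale only.

```latex
\epsilon = 10^{-6}: \qquad
\frac{1}{\epsilon} = 10^{6},
\qquad
\log\frac{1}{\epsilon} = 6 \ln 10 \approx 13.8
```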
Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing the
sum of a finite number of smooth convex functions. Like stochastic gradient
(SG) methods, the SAG method's iteration cost is independent of the number of
terms in the sum. However, by incorporating a memory of previous gradient
values the SAG method achieves a faster convergence rate than black-box SG
methods. The convergence rate is improved from O(1/k^{1/2}) to O(1/k) in
general, and when the sum is strongly-convex the convergence rate is improved
from the sub-linear O(1/k) to a linear convergence rate of the form O(p^k) for
p < 1. Further, in many cases the convergence rate of the new method
is also faster than black-box deterministic gradient methods, in terms of the
number of gradient evaluations. Numerical experiments indicate that the new
algorithm often dramatically outperforms existing SG and deterministic gradient
methods, and that the performance may be further improved through the use of
non-uniform sampling strategies.
Comment: Revision from January 2015 submission. Major changes: updated
literature follow and discussion of subsequent work, additional Lemma showing
the validity of one of the formulas, somewhat simplified presentation of
Lyapunov bound, included code needed for checking proofs rather than the
polynomials generated by the code, added error regions to the numerical
experiment.
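
The gradient-memory idea described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the least-squares objective, the function name `sag_least_squares`, the synthetic data, and the step size 1/(16 L) are all assumptions chosen for the example. Each iteration evaluates one term's gradient, swaps it into a table of stored gradients, and steps along the average of the table.

```python
import numpy as np

def sag_least_squares(A, b, step, n_iters, seed=0):
    """SAG-style sketch for (1/n) * sum_i 0.5 * (a_i @ x - b_i)**2 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    grad_memory = np.zeros((n, d))  # most recent gradient stored for each term
    grad_sum = np.zeros(d)          # running sum of all stored gradients
    for _ in range(n_iters):
        i = rng.integers(n)                 # sample one term uniformly
        g_new = (A[i] @ x - b[i]) * A[i]    # gradient of term i at the current x
        grad_sum += g_new - grad_memory[i]  # replace the old entry in the running sum
        grad_memory[i] = g_new
        x -= step * grad_sum / n            # step along the average stored gradient
    return x

# Synthetic, noise-free problem (assumed data, for illustration only).
data_rng = np.random.default_rng(1)
A = data_rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

L = np.max(np.sum(A**2, axis=1))  # largest per-term gradient Lipschitz constant
x_hat = sag_least_squares(A, b, step=1.0 / (16 * L), n_iters=20000)
print(np.linalg.norm(x_hat - x_true))
```

The per-iteration cost is one term gradient plus O(d) bookkeeping, independent of the number of terms; the price is the n-by-d memory table, which for linear models could be reduced to one scalar per term.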