Gradient Descent Only Converges to Minimizers: Non-Isolated Critical Points and Invariant Regions
Given a non-convex twice differentiable cost function f, we prove that the
set of initial conditions so that gradient descent converges to saddle points
where \nabla^2 f has at least one strictly negative eigenvalue has (Lebesgue)
measure zero, even for cost functions f with non-isolated critical points,
answering an open question in [Lee, Simchowitz, Jordan, Recht, COLT2016].
Moreover, this result extends to forward-invariant convex subspaces, allowing
for weak (non-globally Lipschitz) smoothness assumptions. Finally, we produce
an upper bound on the allowable step-size.
Comment: 2 figures
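The measure-zero claim in the abstract above can be illustrated numerically. The sketch below is not taken from the paper: the test function f(x, y) = x^2/2 + y^4/4 - y^2/2, the step size 0.1, and the sampling box [-1, 1]^2 are all assumptions chosen for illustration. This f has a strict saddle at the origin (Hessian eigenvalues 1 and -1) and minimizers at (0, 1) and (0, -1), and with this step size the box is forward-invariant under the gradient map. Randomly drawn starting points should reach a minimizer, while a start on the line y = 0 (a measure-zero set) stays on it and converges to the saddle.

```python
import random


def grad(x, y):
    """Gradient of the assumed test function f(x, y) = x^2/2 + y^4/4 - y^2/2."""
    return x, y**3 - y


def gradient_descent(x, y, step=0.1, iters=2000):
    """Plain gradient descent with a fixed step size (an illustrative choice)."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


def run_random_trials(n=100, seed=0):
    """Run gradient descent from n starting points drawn uniformly from [-1, 1]^2."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        x0, y0 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        results.append(gradient_descent(x0, y0))
    return results


if __name__ == "__main__":
    finals = run_random_trials()
    at_minimizer = sum(abs(abs(y) - 1.0) < 1e-6 and abs(x) < 1e-6 for x, y in finals)
    print(f"{at_minimizer} of {len(finals)} random starts reached a minimizer")
    # A start on the stable manifold of the saddle (the line y = 0) never leaves it.
    print("start on y = 0 ends at", gradient_descent(0.5, 0.0))
```

A numerical run of this kind only illustrates the statement for one function; it says nothing about the non-isolated critical points the abstract addresses.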
The Computational Power of Optimization in Online Learning
We consider the fundamental problem of prediction with expert advice where
the experts are "optimizable": there is a black-box optimization oracle that
can be used to compute, in constant time, the leading expert in retrospect at
any point in time. In this setting, we give a novel online algorithm that
attains vanishing regret with respect to N experts in total
\widetilde{O}(\sqrt{N}) computation time. We also give a lower bound showing
that this running time cannot be improved (up to log factors) in the oracle
model, thereby exhibiting a quadratic speedup as compared to the standard,
oracle-free setting where the required time for vanishing regret is
\widetilde{\Theta}(N). These results demonstrate an exponential gap between
the power of optimization in online learning and its power in statistical
learning: in the latter, an optimization oracle---i.e., an efficient empirical
risk minimizer---allows one to learn a finite hypothesis class of size N in time
O(\log N). We also study the implications of our results for learning in
repeated zero-sum games, in a setting where the players have access to oracles
that compute, in constant time, their best-response to any mixed strategy of
their opponent. We show that the runtime required for approximating the minimax
value of the game in this setting is \widetilde{\Theta}(\sqrt{N}), yielding
again a quadratic improvement upon the oracle-free setting, where
\widetilde{\Theta}(N) is known to be tight.
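For context on the oracle-free baseline the abstract compares against, the sketch below implements the standard Hedge (exponential weights) forecaster, which touches every expert in every round. It is not the algorithm of the paper and uses no optimization oracle. The loss sequence, the number of experts, the horizon, and the helper names `hedge` and `make_losses` are assumptions made for illustration. With losses in [0, 1] and learning rate sqrt(8 ln N / T), the regret of this forecaster is at most sqrt(T ln N / 2) on any loss sequence.

```python
import math
import random


def hedge(loss_rounds, eta):
    """Exponential-weights forecaster over N experts, costing O(N) work per round.

    loss_rounds: list of per-round loss vectors with entries in [0, 1].
    Returns (expected cumulative loss of the forecaster, loss of the best expert).
    """
    n = len(loss_rounds[0])
    probs = [1.0 / n] * n
    cumulative = [0.0] * n
    forecaster_loss = 0.0
    for losses in loss_rounds:
        # Expected loss when an expert is sampled from the current distribution.
        forecaster_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
        # Multiplicative update, renormalized each round to avoid underflow.
        weights = [p * math.exp(-eta * l) for p, l in zip(probs, losses)]
        total = sum(weights)
        probs = [w / total for w in weights]
    return forecaster_loss, min(cumulative)


def make_losses(n_experts, n_rounds, seed=0):
    """Hypothetical loss sequence: i.i.d. uniform losses in [0, 1]."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_experts)] for _ in range(n_rounds)]


if __name__ == "__main__":
    n_experts, n_rounds = 16, 2000
    eta = math.sqrt(8 * math.log(n_experts) / n_rounds)
    alg_loss, best_loss = hedge(make_losses(n_experts, n_rounds), eta)
    bound = math.sqrt(n_rounds * math.log(n_experts) / 2)
    print(f"regret {alg_loss - best_loss:.2f}, worst-case bound {bound:.2f}")
```

Each round of this baseline costs time linear in N, the per-round dependence that the oracle-based result in the abstract improves on.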