PCA by Determinant Optimization has no Spurious Local Optima
Principal component analysis (PCA) is an indispensable tool in many learning
tasks that finds the best linear representation for data. Classically,
principal components of a dataset are interpreted as the directions that
preserve most of its "energy", an interpretation that is theoretically
underpinned by the celebrated Eckart-Young-Mirsky Theorem. There are yet other
ways of interpreting PCA that are rarely exploited in practice, largely because
it is not known how to reliably solve the corresponding non-convex optimisation
programs. In this paper, we consider one such interpretation of principal
components as the directions that preserve most of the "volume" of the dataset.
Our main contribution is a theorem that shows that the corresponding non-convex
program has no spurious local optima. We apply a number of solvers to confirm
this result empirically.
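
A minimal sketch of the "volume" reading, assuming a synthetic anisotropic dataset and a simple projected gradient ascent on log det(U^T Sigma U) over orthonormal U; the step size, iteration count, and comparison below are illustrative and are not the solvers used in the paper.

import numpy as np

rng = np.random.default_rng(0)
d, n, k = 10, 500, 3
X = rng.standard_normal((n, d)) @ np.diag(np.linspace(3.0, 0.5, d))  # anisotropic synthetic data
Sigma = np.cov(X, rowvar=False)                                      # sample covariance

U = np.linalg.qr(rng.standard_normal((d, k)))[0]     # random orthonormal start
for _ in range(500):
    M = U.T @ Sigma @ U
    grad = 2 * Sigma @ U @ np.linalg.inv(M)          # gradient of log det(U^T Sigma U)
    U = np.linalg.qr(U + 0.01 * grad)[0]             # ascent step, then re-orthonormalize

top_k = np.linalg.eigh(Sigma)[1][:, -k:]             # top-k eigenvectors (the "energy" view)
print("subspace gap:", np.linalg.norm(U @ U.T - top_k @ top_k.T))    # near 0 if no spurious optimum was hit
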
Diffusion Approximations for Online Principal Component Estimation and Global Convergence
In this paper, we propose to adopt diffusion approximation tools to study the
dynamics of Oja's iteration, an online stochastic gradient descent method for
principal component analysis. Oja's iteration maintains a running estimate of
the true principal component from streaming data and enjoys low time and space
complexity. We show that Oja's iteration for the top eigenvector generates a
continuous-state, discrete-time Markov chain over the unit sphere. We
characterize Oja's iteration in three phases using
diffusion approximation and weak convergence tools. Our three-phase analysis
further provides a finite-sample error bound for the running estimate, which
matches the minimax information lower bound for principal component analysis
under the additional assumption of bounded samples.
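
As a concrete reference point, here is a minimal sketch of Oja's iteration for the top eigenvector on synthetic streaming data; the spiked covariance model and step-size schedule are illustrative assumptions, not the setting analyzed in the paper.

import numpy as np

rng = np.random.default_rng(1)
d = 20
eigvals = np.r_[4.0, np.ones(d - 1)]            # spiked covariance: true top eigenvector is e_1
L = np.diag(np.sqrt(eigvals))

w = rng.standard_normal(d)
w /= np.linalg.norm(w)                          # the running estimate lives on the unit sphere
for t in range(1, 20001):
    x = L @ rng.standard_normal(d)              # one streaming sample with covariance diag(eigvals)
    eta = 1.0 / (t + 100)                       # decaying step size
    w = w + eta * x * (x @ w)                   # stochastic update: w <- w + eta * x x^T w
    w /= np.linalg.norm(w)                      # normalize back onto the sphere

print("alignment with the true top eigenvector:", abs(w[0]))   # approaches 1
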
Efficient approaches for escaping higher order saddle points in non-convex optimization
Local search heuristics for non-convex optimization are popular in applied
machine learning. However, in general it is hard to guarantee that such
algorithms even converge to a local minimum, due to the existence of
complicated saddle point structures in high dimensions. Many functions have
degenerate saddle points such that the first and second order derivatives
cannot distinguish them from local optima. In this paper we use higher order
derivatives to escape these saddle points: we design the first efficient
algorithm guaranteed to converge to a third order local optimum (while existing
techniques are at most second order). We also show that it is NP-hard to extend
this further to finding fourth order local optima.
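
A small numeric illustration of such a degenerate saddle (not the paper's algorithm), assuming the standard monkey-saddle example f(x, y) = x^3 - 3xy^2: at the origin both the gradient and the Hessian vanish, so first- and second-order tests cannot distinguish it from a local minimum, while a third-order probe along a direction exposes the descent.

import numpy as np

def f(p):
    x, y = p
    return x**3 - 3 * x * y**2

def grad(p):
    x, y = p
    return np.array([3 * x**2 - 3 * y**2, -6 * x * y])

def hess(p):
    x, y = p
    return np.array([[6 * x, -6 * y], [-6 * y, -6 * x]])

origin = np.zeros(2)
print("gradient at the origin:", grad(origin))                    # [0. 0.]  -> first-order test passes
print("Hessian eigenvalues:", np.linalg.eigvalsh(hess(origin)))   # [0. 0.]  -> second-order test passes

# Third-order probe: f is strictly negative along (-1, 0), so the origin is
# not a third-order local optimum and a higher-order method can escape it.
for t in (1e-1, 1e-2, 1e-3):
    print(f"f(-{t}, 0) = {f(np.array([-t, 0.0])):.1e}")
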