
    Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

    Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as the input dimension and the number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multilinear operations, and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer. Comment: The tensor decomposition analysis is expanded, and the analysis of ridge regression is added for recovering the parameters of the last layer of the neural network.
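
    Below is a minimal, illustrative sketch (not the authors' full pipeline) of the kind of cross-moment tensor such a method decomposes, assuming standard Gaussian inputs so that the third-order score function has a simple closed form; all variable names and parameter values here are illustrative assumptions. The CP components of the resulting tensor align with the hidden-layer weight directions up to sign and scale.

```python
# Sketch only: cross-moment tensor E[y * S3(x)] for Gaussian inputs,
# where S3 is the third-order score function of N(0, I_d). By Stein's
# identity this tensor is (approximately) symmetric and low-rank, with
# components along the hidden-layer weights of the two-layer network.
import numpy as np

def empirical_cross_moment(X, y):
    """Estimate E[y * S3(x)] from samples X (n x d) and labels y (n,)."""
    n, d = X.shape
    I = np.eye(d)
    T = np.einsum('n,ni,nj,nk->ijk', y, X, X, X) / n
    m = (y[:, None] * X).mean(axis=0)        # E[y * x]
    T -= np.einsum('i,jk->ijk', m, I)        # subtract x_i * delta_jk term
    T -= np.einsum('j,ik->ijk', m, I)        # subtract x_j * delta_ik term
    T -= np.einsum('k,ij->ijk', m, I)        # subtract x_k * delta_ij term
    return T

# Synthetic two-layer network: y = sum_h lambda_h * sigmoid(w_h . x)
rng = np.random.default_rng(0)
d, k, n = 10, 3, 20000                       # illustrative sizes
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
lam = rng.uniform(0.5, 1.5, size=k)
X = rng.standard_normal((n, d))
y = (lam * (1.0 / (1.0 + np.exp(-X @ W.T)))).sum(axis=1)

T_hat = empirical_cross_moment(X, y)
# T_hat can then be handed to any CP decomposition routine (e.g. the
# power-iteration sketch under the next abstract) to recover the rows
# of W up to sign and scale.
```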

    Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates

    In this paper, we provide local and global convergence guarantees for recovering CP (CANDECOMP/PARAFAC) tensor decompositions. The main step of the proposed algorithm is a simple alternating rank-1 update, which is the alternating version of the tensor power iteration adapted for asymmetric tensors. Local convergence guarantees are established for third-order tensors of rank $k$ in $d$ dimensions, when $k = o(d^{1.5})$ and the tensor components are incoherent. Thus, we can recover overcomplete tensor decompositions. We also strengthen the results to global convergence guarantees under the stricter rank condition $k \le \beta d$ (for an arbitrary constant $\beta > 1$) through a simple initialization procedure in which the algorithm is initialized with the top singular vectors of random tensor slices. Furthermore, approximate local convergence guarantees for $p$-th order tensors are also provided under the rank condition $k = o(d^{p/2})$. The guarantees also include a tight perturbation analysis given a noisy tensor. Comment: We have added an additional sub-algorithm to remove the (approximate) residual error left after the tensor power iteration.
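
    As a rough illustration of the main step, the following is a minimal sketch of an alternating rank-1 update loop on a synthetic third-order tensor. It uses plain random initialization rather than the slice-based initialization behind the global guarantees, and all names, sizes, and iteration counts are illustrative.

```python
# Sketch only: alternating rank-1 updates (asymmetric tensor power
# iteration) to recover one CP component of a third-order tensor.
import numpy as np

def alternating_rank1(T, iters=100, seed=0):
    """Recover one rank-1 component (u, v, w, weight) of a 3rd-order tensor."""
    rng = np.random.default_rng(seed)
    d1, d2, d3 = T.shape
    u = rng.standard_normal(d1); u /= np.linalg.norm(u)
    v = rng.standard_normal(d2); v /= np.linalg.norm(v)
    w = rng.standard_normal(d3); w /= np.linalg.norm(w)
    for _ in range(iters):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    weight = np.einsum('ijk,i,j,k->', T, u, v, w)
    return u, v, w, weight

# Example: a synthetic rank-2 asymmetric tensor with random (incoherent) components.
rng = np.random.default_rng(1)
d, k = 20, 2
A = rng.standard_normal((d, k)); A /= np.linalg.norm(A, axis=0)
B = rng.standard_normal((d, k)); B /= np.linalg.norm(B, axis=0)
C = rng.standard_normal((d, k)); C /= np.linalg.norm(C, axis=0)
T = np.einsum('ir,jr,kr->ijk', A, B, C)
u, v, w, lam = alternating_rank1(T)
# Under incoherence, (u, v, w) typically converges to one of the component
# triples up to sign; the recovered rank-1 term can be deflated from T
# before re-running to find the remaining components.
```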

    Polynomial-time Tensor Decompositions with Sum-of-Squares

    We give new algorithms based on the sum-of-squares method for tensor decomposition. Our results improve the best known running times from quasi-polynomial to polynomial for several problems, including decomposing random overcomplete 3-tensors and learning overcomplete dictionaries with constant relative sparsity. We also give the first robust analysis for decomposing overcomplete 4-tensors in the smoothed analysis model. A key ingredient of our analysis is to establish small spectral gaps in moment matrices derived from solutions to sum-of-squares relaxations. To enable this analysis, we augment sum-of-squares relaxations with spectral analogs of maximum entropy constraints. Comment: to appear in FOCS 2016.

    Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method

    We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown $n \times m$ matrix $A$ (for $m \geq n$) from examples of the form $y = Ax + e$, where $x$ is a random vector in $\mathbb{R}^m$ with at most $\tau m$ nonzero coordinates, and $e$ is a random noise vector in $\mathbb{R}^n$ with bounded magnitude. For the case $m = O(n)$, our algorithm recovers every column of $A$ within arbitrarily good constant accuracy in time $m^{O(\log m / \log(\tau^{-1}))}$, in particular achieving polynomial time if $\tau = m^{-\delta}$ for any $\delta > 0$, and time $m^{O(\log m)}$ if $\tau$ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser (at most $\sqrt{n}$ nonzero coordinates), and there were intrinsic barriers preventing these algorithms from applying for denser $x$. We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor $T$, given access to a tensor $T'$ that is $\tau$-close to $T$ in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of $T$ and $T'$ have similar structures. Our algorithm is based on a novel approach to using and analyzing the Sum of Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.
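
    To make the model concrete, here is a minimal sketch that generates samples from the $y = Ax + e$ model described above, with at most $\tau m$ nonzero coordinates in $x$ and bounded-magnitude noise. The distributional choices and parameter values are illustrative assumptions, not the paper's exact setting.

```python
# Sketch only: data generation for the sparse-coding / dictionary
# learning model y = A x + e with an n x m dictionary A (m >= n).
import numpy as np

rng = np.random.default_rng(0)
n, m, tau = 64, 128, 0.05           # illustrative sizes and sparsity level
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)      # unit-norm dictionary columns

def sample(num, noise_mag=0.01):
    """Draw `num` examples y = A x + e from the sparse-coding model."""
    s = max(1, int(tau * m))                            # at most tau*m nonzeros
    Y = np.empty((num, n))
    for t in range(num):
        x = np.zeros(m)
        support = rng.choice(m, size=s, replace=False)
        x[support] = rng.choice([-1.0, 1.0], size=s)    # e.g. Rademacher values
        e = rng.uniform(-noise_mag, noise_mag, size=n)  # bounded-magnitude noise
        Y[t] = A @ x + e
    return Y

Y = sample(1000)
# Recovering the columns of A from Y (up to sign and permutation) is the
# dictionary learning task; the paper reduces it to a noisy tensor
# decomposition solved via the sum-of-squares hierarchy.
```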

    Convolutional Dictionary Learning through Tensor Factorization

    Tensor methods have emerged as a powerful paradigm for consistent learning of many latent variable models, such as topic models, independent component analysis, and dictionary learning. Model parameters are estimated via CP decomposition of the observed higher-order input moments. However, in many domains, additional invariances such as shift invariance exist, enforced via models such as convolutional dictionary learning. In this paper, we develop novel tensor decomposition algorithms for parameter estimation of convolutional models. Our algorithm is based on the popular alternating least squares method, but with efficient projections onto the space of stacked circulant matrices. Our method is embarrassingly parallel and consists of simple operations such as fast Fourier transforms and matrix multiplications. Our algorithm converges to the dictionary much faster and more accurately than alternating minimization over filters and activation maps.
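
    As a rough illustration of the circulant structure this algorithm exploits, the sketch below shows a generic Frobenius-norm projection onto circulant matrices (wrapped-diagonal averaging) together with an FFT-based circulant matrix-vector product. It is not the authors' full algorithm, and all names are illustrative.

```python
# Sketch only: nearest-circulant projection and FFT-based circulant
# matrix-vector products, the basic operations behind the ALS scheme
# with stacked-circulant projections described above.
import numpy as np

def project_to_circulant(M):
    """First column c of the circulant matrix closest to M in Frobenius
    norm, obtained by averaging each wrapped diagonal of M."""
    n = M.shape[0]
    return np.array([np.mean([M[(i + k) % n, i] for i in range(n)])
                     for k in range(n)])

def circulant_matvec(c, x):
    """circ(c) @ x computed with FFTs: ifft(fft(c) * fft(x))."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

# Quick check against an explicit circulant matrix.
rng = np.random.default_rng(0)
n = 8
M = rng.standard_normal((n, n))
c = project_to_circulant(M)
C = np.column_stack([np.roll(c, j) for j in range(n)])   # explicit circ(c)
x = rng.standard_normal(n)
assert np.allclose(C @ x, circulant_matvec(c, x))
# Roughly, in the setting of the abstract, each ALS step solves a least
# squares problem and then projects the blocks of the solution onto the
# (stacked) circulant structure in this way, with products done via FFTs.
```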