58 research outputs found
Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method
We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown $n \times m$ matrix $A$ (for $m \geq n$) from examples of the form $y = Ax + e$, where $x$ is a random vector in $\mathbb{R}^m$ with at most $\tau m$ nonzero coordinates, and $e$ is a random noise vector in $\mathbb{R}^n$ with bounded magnitude. For the case $m = O(n)$, our algorithm recovers every column of $A$ within arbitrarily good constant accuracy in time $m^{O(\log m/\log(\tau^{-1}))}$, in particular achieving polynomial time if $\tau = m^{-\delta}$ for any $\delta > 0$, and time $m^{O(\log m)}$ if $\tau$ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser---at most $\sqrt{n}$ nonzero coordinates---and there were intrinsic barriers preventing these algorithms from applying for denser $x$.
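For concreteness, here is a minimal sketch of the generative model just described; the dimensions, sparsity level, and noise scale are illustrative assumptions, not values from the paper:

    import numpy as np

    # Sketch of the dictionary-learning generative model y = A x + e.
    # All parameter values (n, m, tau, noise scale) are illustrative.
    rng = np.random.default_rng(0)

    n, m = 64, 128          # dictionary A is n x m with m = O(n)
    tau = 0.1               # x has at most tau*m nonzero coordinates
    A = rng.standard_normal((n, m)) / np.sqrt(n)   # random dictionary

    def sample_example():
        """Draw one example y = A x + e with a tau*m-sparse x."""
        x = np.zeros(m)
        support = rng.choice(m, size=int(tau * m), replace=False)
        x[support] = rng.choice([-1.0, 1.0], size=len(support))
        e = 0.01 * rng.standard_normal(n)          # bounded-magnitude noise
        return A @ x + e

    Y = np.stack([sample_example() for _ in range(1000)])
    print(Y.shape)  # (1000, 64)
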
We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor $T$, given access to a tensor $T'$ that is $\tau$-close to $T$ in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of $T$ and $T'$ have similar structures.
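The phrase "when considered as a matrix" refers to flattening the tensor; the following sketch (with assumed sizes and noise level) computes that spectral-norm distance for a perturbed rank-one tensor:

    import numpy as np

    # Sketch: tau-closeness of tensors in the spectral norm. "Considered as
    # a matrix" means flattening the n x n x n tensor into an n x n^2 matrix.
    # Sizes and noise scale below are illustrative assumptions.
    rng = np.random.default_rng(1)
    n = 20

    a = rng.standard_normal(n); a /= np.linalg.norm(a)
    T = np.einsum('i,j,k->ijk', a, a, a)       # rank-one tensor a (x) a (x) a
    E = 0.1 * rng.standard_normal((n, n, n))   # perturbation
    T_noisy = T + E

    def spectral_dist(S, T):
        """Spectral-norm distance between the n x n^2 flattenings."""
        D = (S - T).reshape(n, n * n)
        return np.linalg.norm(D, 2)            # largest singular value

    print(spectral_dist(T_noisy, T))
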
Our algorithm is based on a novel approach to using and analyzing the Sum of
Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and
it can be viewed as an indication of the utility of this very general and
powerful tool for unsupervised learning problems.
Provable Sparse Tensor Decomposition
We propose a novel sparse tensor decomposition method, namely Tensor
Truncated Power (TTP) method, that incorporates variable selection into the
estimation of decomposition components. The sparsity is achieved via an
efficient truncation step embedded in the tensor power iteration. Our method
applies to a broad family of high dimensional latent variable models, including
high dimensional Gaussian mixture and mixtures of sparse regressions. A
thorough theoretical investigation is further conducted. In particular, we show
that the final decomposition estimator is guaranteed to achieve a local
statistical rate, and further strengthen it to the global statistical rate by
introducing a proper initialization procedure. In high dimensional regimes, the
obtained statistical rate significantly improves on those of existing non-sparse decomposition methods. The empirical advantages of TTP are confirmed in extensive simulation results and two real applications: click-through rate
prediction and high-dimensional gene clustering.
Comment: To appear in JRSS-B.
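As a rough illustration of the truncation idea, the sketch below runs a tensor power iteration with a hard-truncation step embedded in each iterate; it is our own reading of the approach, not the authors' TTP implementation, and all names and parameter values are assumptions:

    import numpy as np

    # Sketch of a truncated tensor power iteration: after each power step,
    # keep only the s largest-magnitude entries, then renormalize.
    # Illustrative only; s, the noise scale, and the restart count are ours.
    rng = np.random.default_rng(2)
    n, s = 50, 5

    v_true = np.zeros(n); v_true[:s] = 1.0 / np.sqrt(s)   # sparse component
    T = np.einsum('i,j,k->ijk', v_true, v_true, v_true)
    T += 0.005 * rng.standard_normal((n, n, n))           # noise

    def truncate(v, s):
        """Hard-truncate v to its s largest-magnitude entries, renormalize."""
        w = np.zeros_like(v)
        keep = np.argsort(np.abs(v))[-s:]
        w[keep] = v[keep]
        return w / np.linalg.norm(w)

    best, best_val = None, -np.inf
    for _ in range(20):                       # random restarts
        v = truncate(rng.standard_normal(n), s)
        for _ in range(30):
            v = truncate(np.einsum('ijk,j,k->i', T, v, v), s)  # T(I, v, v)
        val = np.einsum('ijk,i,j,k->', T, v, v, v)
        if val > best_val:
            best, best_val = v, val

    print(abs(best @ v_true))  # close to 1 when recovery succeeds
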
Improving Efficiency and Scalability of Sum of Squares Optimization: Recent Advances and Limitations
It is well-known that any sum of squares (SOS) program can be cast as a
semidefinite program (SDP) of a particular structure and that therein lies the
computational bottleneck for SOS programs, as the SDPs generated by this
procedure are large and costly to solve when the polynomials involved in the
SOS programs have a large number of variables and degree. In this paper, we
review SOS optimization techniques and present two new methods for improving
their computational efficiency. The first method leverages the sparsity of the
underlying SDP to obtain computational speed-ups. Further improvements can be
obtained if the coefficients of the polynomials that describe the problem have
a particular sparsity pattern, called chordal sparsity. The second method
bypasses semidefinite programming altogether and relies instead on solving a
sequence of more tractable convex programs, namely linear and second order cone
programs. This opens up the question as to how well one can approximate the
cone of SOS polynomials by second order representable cones. In the last part
of the paper, we present some recent negative results related to this question.
Comment: Tutorial for CDC 2017.
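To see the SOS-to-SDP reduction at work, here is a minimal sketch that certifies a classic bivariate quartic is a sum of squares by searching for a PSD Gram matrix; the use of cvxpy is our own illustrative choice, not the paper's tooling:

    import cvxpy as cp

    # Sketch: certify p(x, y) = 2x^4 + 2x^3 y - x^2 y^2 + 5y^4 is SOS by
    # writing p = z^T Q z with z = [x^2, xy, y^2] and Q PSD, then solving
    # the resulting semidefinite feasibility problem.
    Q = cp.Variable((3, 3), PSD=True)

    constraints = [
        Q[0, 0] == 2,                  # coefficient of x^4
        2 * Q[0, 1] == 2,              # coefficient of x^3 y
        2 * Q[0, 2] + Q[1, 1] == -1,   # coefficient of x^2 y^2
        2 * Q[1, 2] == 0,              # coefficient of x y^3
        Q[2, 2] == 5,                  # coefficient of y^4
    ]
    prob = cp.Problem(cp.Minimize(0), constraints)  # pure feasibility SDP
    prob.solve()
    print(prob.status)  # "optimal" => PSD Gram matrix exists => p is SOS
    print(Q.value)
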
Decomposing Overcomplete 3rd Order Tensors using Sum-of-Squares Algorithms
Tensor rank and low-rank tensor decompositions have many applications in learning and complexity theory. Most known algorithms use unfoldings of tensors and can only handle rank up to $n^{\lfloor p/2 \rfloor}$ for a $p$-th order tensor in $\mathbb{R}^{n^p}$. Previously no efficient algorithm could decompose 3rd order tensors when the rank is super-linear in the dimension. Using ideas from the sum-of-squares hierarchy, we give the first quasi-polynomial time algorithm that can decompose a random 3rd order tensor when the rank is as large as $n^{3/2}/\mathrm{polylog}(n)$.
We also give a polynomial time algorithm for certifying the injective norm of
random low rank tensors. Our tensor decomposition algorithm exploits the
relationship between injective norm and the tensor components. The proof relies
on interesting tools for decoupling random variables to prove better matrix
concentration bounds, which can be useful in other settings.
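For intuition about the unfolding barrier, this sketch (sizes are small illustrative assumptions) builds a random overcomplete symmetric 3rd order tensor with rank super-linear in the dimension and shows that its standard unfolding, an $n \times n^2$ matrix, has rank capped at $n$:

    import numpy as np

    # Sketch: a random overcomplete 3rd order tensor
    # T = sum_i a_i (x) a_i (x) a_i with rank r ~ n^{1.4} > n,
    # inside the n^{3/2} regime targeted by the paper.
    rng = np.random.default_rng(3)
    n = 16
    r = int(n ** 1.4)               # super-linear rank, below n^{3/2}

    A = rng.standard_normal((n, r))
    A /= np.linalg.norm(A, axis=0)  # unit-norm components a_1, ..., a_r

    T = np.einsum('ir,jr,kr->ijk', A, A, A)  # rank-r symmetric tensor

    # The standard unfolding is only an n x n^2 matrix, so what
    # unfolding-based methods can see is capped at rank n << r.
    M = T.reshape(n, n * n)
    print(r, np.linalg.matrix_rank(M))  # matrix rank saturates at n
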
New Dependencies of Hierarchies in Polynomial Optimization
We compare four key hierarchies for solving Constrained Polynomial
Optimization Problems (CPOP): Sum of Squares (SOS), Sum of Diagonally Dominant
Polynomials (SDSOS), Sum of Nonnegative Circuits (SONC), and the Sherali-Adams
(SA) hierarchies. We prove a collection of dependencies among these hierarchies
both for general CPOPs and for optimization problems on the Boolean hypercube.
Key results include, for the general case, that the SONC and SOS hierarchies are polynomially incomparable, while SDSOS is contained in SONC. A direct
consequence is the non-existence of a Putinar-like Positivstellensatz for
SDSOS. On the Boolean hypercube, we show as a main result that Schmüdgen-like versions of the hierarchies SDSOS*, SONC*, and SA* are polynomially equivalent. Moreover, we show that SA* is contained in any Schmüdgen-like hierarchy that provides an $O(n)$ degree bound.
Comment: 26 pages, 4 figures.