Tensor decompositions for learning latent variable models
This work considers a computationally and statistically efficient parameter
estimation method for a wide class of latent variable models---including
Gaussian mixture models, hidden Markov models, and latent Dirichlet
allocation---which exploits a certain tensor structure in their low-order
observable moments (typically, of second- and third-order). Specifically,
parameter estimation is reduced to the problem of extracting a certain
(orthogonal) decomposition of a symmetric tensor derived from the moments; this
decomposition can be viewed as a natural generalization of the singular value
decomposition for matrices. Although tensor decompositions are generally
intractable to compute, the decomposition of these specially structured tensors
can be efficiently obtained by a variety of approaches, including power
iterations and maximization approaches (similar to the case of matrices). A
detailed analysis of a robust tensor power method is provided, establishing an
analogue of Wedin's perturbation theorem for the singular vectors of matrices.
This implies a robust and computationally tractable estimation approach for
several popular latent variable models.
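The core idea can be sketched concretely for a symmetric, orthogonally decomposable third-order tensor T = Σ_i λ_i v_i⊗v_i⊗v_i: repeated power iteration u ← T(I, u, u)/‖T(I, u, u)‖ converges to one component, which is then deflated away. Below is a minimal, non-robust NumPy sketch under that assumption; the function names and defaults are illustrative, not the paper's implementation (the paper analyzes a *robust* variant of this iteration).

```python
import numpy as np

def tensor_apply(T, u):
    # Contract a symmetric 3rd-order tensor with u in two modes: T(I, u, u)
    return np.einsum('ijk,j,k->i', T, u, u)

def tensor_power_method(T, n_iter=100, tol=1e-10, rng=None):
    """Recover an orthogonal decomposition T = sum_i lam_i v_i^(x3)
    by power iteration plus deflation (plain, non-robust sketch)."""
    rng = np.random.default_rng(rng)
    d = T.shape[0]
    T = T.copy()
    lams, vecs = [], []
    for _ in range(d):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            w = tensor_apply(T, u)
            nw = np.linalg.norm(w)
            if nw < tol:          # tensor exhausted: no components left
                return np.array(lams), np.array(vecs)
            w /= nw
            if np.linalg.norm(w - u) < tol:
                u = w
                break
            u = w
        lam = np.einsum('ijk,i,j,k->', T, u, u, u)   # recovered eigenvalue
        lams.append(lam)
        vecs.append(u)
        T = T - lam * np.einsum('i,j,k->ijk', u, u, u)  # deflate
    return np.array(lams), np.array(vecs)
```

For moment tensors of the latent variable models above, an orthogonalizing (whitening) step from the second-order moment would precede this iteration.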
Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates
In this paper, we provide local and global convergence guarantees for
recovering the CP (Candecomp/Parafac) tensor decomposition. The main step of the
proposed algorithm is a simple alternating rank-1 update, the alternating
version of the tensor power iteration adapted to asymmetric tensors. Local
convergence guarantees are established for third-order tensors whose rank can
exceed the dimension, provided the tensor components are incoherent; thus, we
can recover overcomplete tensor decompositions. We also strengthen the results
to global convergence guarantees under a stricter rank condition, through a
simple initialization procedure in which the algorithm is initialized with the
top singular vectors of random tensor slices. Furthermore, approximate local
convergence guarantees for higher-order tensors are also provided under an
analogous rank condition. The guarantees also include a tight perturbation
analysis given a noisy tensor.
Comment: We have added an additional sub-algorithm to remove the (approximate)
residual error left after the tensor power iteration.
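The alternating rank-1 update itself is short: each factor is refreshed by contracting the tensor against the other two factors and normalizing. The sketch below, under the simplifying assumption of an (exactly) rank-1 input tensor, shows one such loop; the paper's full algorithm adds deflation, residual removal, and the slice-based initialization, none of which is reproduced here.

```python
import numpy as np

def alternating_rank1(T, n_iter=200, rng=None):
    """One alternating rank-1 update loop: the asymmetric analogue of
    the tensor power iteration. Sketch that recovers the dominant
    rank-1 term lam * a (x) b (x) c of a (near-)rank-1 3rd-order tensor."""
    rng = np.random.default_rng(rng)
    a = rng.standard_normal(T.shape[0]); a /= np.linalg.norm(a)
    b = rng.standard_normal(T.shape[1]); b /= np.linalg.norm(b)
    c = rng.standard_normal(T.shape[2]); c /= np.linalg.norm(c)
    for _ in range(n_iter):
        a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)   # recovered weight
    return lam, a, b, c
```

Note that each factor is determined only up to a pairwise sign flip, so comparisons should be made on the reconstructed tensor rather than on individual factors.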
Alternating least squares as moving subspace correction
In this note we take a new look at the local convergence of alternating
optimization methods for low-rank matrices and tensors. Our abstract
interpretation as sequential optimization on moving subspaces yields insightful
reformulations of some known convergence conditions that focus on the interplay
between the contractivity of classical multiplicative Schwarz methods with
overlapping subspaces and the curvature of low-rank matrix and tensor
manifolds. While the verification of the abstract conditions in concrete
scenarios remains open in most cases, we are able to provide an alternative and
conceptually simple derivation of the asymptotic convergence rate of the
two-sided block power method of numerical linear algebra for computing the
dominant singular subspaces of a rectangular matrix. This method is equivalent
to an alternating least squares method applied to a distance function. The
theoretical results are illustrated and validated by numerical experiments.
Comment: 20 pages, 4 figures
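The two-sided block power method analyzed here is the classical orthogonal iteration applied alternately from both sides: multiply by A, re-orthogonalize, multiply by Aᵀ, re-orthogonalize. A minimal NumPy sketch (names and defaults are illustrative, not the paper's code):

```python
import numpy as np

def block_power_svd(A, k, n_iter=100, rng=None):
    """Two-sided block power method for the dominant k-dimensional
    left/right singular subspaces of a rectangular matrix A (sketch).
    Converges linearly at rate governed by the gap sigma_{k+1}/sigma_k."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    V = np.linalg.qr(rng.standard_normal((n, k)))[0]  # random right start
    for _ in range(n_iter):
        U = np.linalg.qr(A @ V)[0]      # update left subspace basis
        V = np.linalg.qr(A.T @ U)[0]    # update right subspace basis
    return U, V
```

Each half-step is exactly a least-squares solve for one factor of a rank-k approximation with the other held fixed, which is the alternating least squares interpretation the note exploits.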
Graph Kernels
We present a unified framework to study graph kernels, special cases of which include the random
walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004;
Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time
complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3).
We find a spectral decomposition approach even more efficient when computing entire kernel matrices.
For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3)
time per iteration, where d is the size of the label set. By extending the necessary linear algebra to
Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels,
and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2)
time per iteration in all cases. Experiments on graphs from bioinformatics and other application
domains show that these techniques can speed up computation of the kernel by an order of magnitude
or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when
specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to
R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment
kernel of Fröhlich et al. (2006) yet provably positive semi-definite.
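The geometric random walk kernel underlying this framework can be written as k(G, G′) = qᵀ(I − λW_×)⁻¹p, where W_× is the adjacency matrix of the direct product graph. The sketch below is the naive dense-solve baseline, assuming uniform starting and stopping distributions (an assumption of this sketch); it is the O(n⁶) computation that the Sylvester-equation and spectral approaches in the paper accelerate to O(n³).

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=0.05):
    """Geometric random walk graph kernel between unlabeled graphs with
    adjacency matrices A1, A2 (dense baseline sketch). Requires
    lam < 1 / spectral_radius(kron(A1, A2)) for the series to converge."""
    n1, n2 = A1.shape[0], A2.shape[0]
    W = np.kron(A1, A2)                   # product-graph adjacency
    p = np.full(n1 * n2, 1.0 / (n1 * n2))  # uniform starting distribution
    q = np.ones(n1 * n2)                   # uniform stopping weights
    x = np.linalg.solve(np.eye(n1 * n2) - lam * W, p)  # (I - lam W)^{-1} p
    return q @ x
```

Solving the equivalent Sylvester equation X = λ A2 X A1ᵀ + P instead of forming the n²×n² Kronecker system is what yields the cubic-time algorithm described above.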
Finding a low-rank basis in a matrix subspace
For a given matrix subspace, how can we find a basis that consists of
low-rank matrices? This is a generalization of the sparse vector problem. It
turns out that when the subspace is spanned by rank-1 matrices, the matrices
can be obtained by the tensor CP decomposition. For the higher rank case, the
situation is not as straightforward. In this work we present an algorithm based
on a greedy process applicable to higher rank problems. Our algorithm first
estimates the minimum rank by applying soft singular value thresholding to a
nuclear norm relaxation, and then computes a matrix with that rank using the
method of alternating projections. We provide local convergence results, and
compare our algorithm with several alternative approaches. Applications include
data compression beyond the classical truncated SVD, computing accurate
eigenvectors of a near-multiple eigenvalue, image separation and graph
Laplacian eigenproblems.
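The second stage of the method, alternating projections between a matrix subspace and the set of rank-r matrices, is easy to sketch. The toy implementation below assumes the target rank r is already known (in the paper it is estimated first by singular value thresholding); all names are illustrative.

```python
import numpy as np

def low_rank_in_subspace(basis, r, n_iter=500, rng=None):
    """Alternating projections sketch: seek a rank-r matrix in
    span{basis[0], basis[1], ...} by alternating the orthogonal
    projection onto the (vectorized) subspace with the best rank-r
    approximation (truncated SVD). Local convergence only."""
    rng = np.random.default_rng(rng)
    shape = basis[0].shape
    # Orthonormal basis of the vectorized matrix subspace
    B = np.stack([b.ravel() for b in basis], axis=1)
    Q, _ = np.linalg.qr(B)
    X = rng.standard_normal(shape)
    for _ in range(n_iter):
        # Project onto the subspace
        X = (Q @ (Q.T @ X.ravel())).reshape(shape)
        # Project onto the rank-r matrices via truncated SVD
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :r] * s[:r]) @ Vt[:r]
        # Renormalize so the iterates do not collapse to zero
        nrm = np.linalg.norm(X)
        if nrm > 0:
            X /= nrm
    return X
```

Because the rank-r set is nonconvex, this converges only locally, which is why the paper pairs it with the nuclear-norm-based rank estimate and a greedy outer loop over basis elements.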
Dynamic Tensor Clustering
Dynamic tensor data are becoming prevalent in numerous applications. Existing
tensor clustering methods either fail to account for the dynamic nature of the
data, or are inapplicable to a general-order tensor. Also there is often a gap
between statistical guarantee and computational efficiency for existing tensor
clustering solutions. In this article, we aim to bridge this gap by proposing a
new dynamic tensor clustering method, which takes into account both sparsity
and fusion structures, and enjoys strong statistical guarantees as well as high
computational efficiency. Our proposal is based upon a new structured tensor
factorization that encourages both sparsity and smoothness in parameters along
the specified tensor modes. Computationally, we develop a highly efficient
optimization algorithm that benefits from substantial dimension reduction. In
theory, we first establish a non-asymptotic error bound for the estimator from
the structured tensor factorization. Built upon this error bound, we then
derive the rate of convergence of the estimated cluster centers, and show that
the estimated clusters recover the true cluster structures with a high
probability. Moreover, our proposed method can be naturally extended to
co-clustering of multiple modes of the tensor data. The efficacy of our
approach is illustrated via simulations and a brain dynamic functional
connectivity analysis from an Autism spectrum disorder study.
Comment: Accepted at the Journal of the American Statistical Association.
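The flavor of such a structured factorization can be conveyed with a toy rank-1 sketch: alternating power updates in which one factor is soft-thresholded to encourage sparsity and the time-mode factor is locally averaged as a crude stand-in for a fusion/smoothness penalty. This is an illustration in the spirit of the paper, not its algorithm; all names and the specific proximal steps are assumptions of this sketch.

```python
import numpy as np

def soft_threshold(x, t):
    # Elementwise soft-thresholding (proximal map of the l1 penalty)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def structured_rank1(T, sparse_frac=0.1, n_iter=50, rng=None):
    """Toy structured rank-1 factorization of a 3rd-order tensor:
    mode-1 factor is soft-thresholded (sparsity), mode-3 ("time")
    factor is smoothed by local averaging (smoothness). Sketch only."""
    rng = np.random.default_rng(rng)
    d1, d2, d3 = T.shape
    b = rng.standard_normal(d2); b /= np.linalg.norm(b)
    c = rng.standard_normal(d3); c /= np.linalg.norm(c)
    kernel = np.array([0.25, 0.5, 0.25])
    for _ in range(n_iter):
        a = np.einsum('ijk,j,k->i', T, b, c)
        a = soft_threshold(a, sparse_frac * np.max(np.abs(a)))  # sparsify
        a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c)
        b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b)
        c = np.convolve(c, kernel, mode='same')  # smooth along time mode
        c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)
    return lam, a, b, c
```

Cluster centers would then be read off from the (sparse, smooth) recovered factors, with the paper's fused penalty replacing the ad hoc averaging used here.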