Sparse and Low-rank Tensor Estimation via Cubic Sketchings
In this paper, we propose a general framework for sparse and low-rank tensor
estimation from cubic sketchings. A two-stage non-convex implementation is
developed based on sparse tensor decomposition and thresholded gradient
descent, which ensures exact recovery in the noiseless case and stable recovery
in the noisy case with high probability. The non-asymptotic analysis sheds
light on an interplay between optimization error and statistical error. The
proposed procedure is shown to be rate-optimal under certain conditions. As a
technical by-product, novel high-order concentration inequalities are derived
for studying high-moment sub-Gaussian tensors. An interesting tensor
formulation illustrates the potential application to high-order interaction
pursuit in high-dimensional linear regression. Comment: Accepted at IEEE Transactions on Information Theory.
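The second-stage update described above (thresholded gradient descent) can be sketched in a much simpler setting. The following is a minimal iterative hard-thresholding loop for recovering a sparse vector from Gaussian linear measurements; it is an illustrative analogue, not the paper's cubic-sketching estimator, and all names and dimensions are made up:

```python
import numpy as np

def hard_threshold(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

def thresholded_gd(A, y, s, n_iter=300):
    """Recover an s-sparse x from y = A @ x by gradient steps + hard thresholding."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # safe step size for 0.5*||Ax - y||^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)             # gradient of the squared loss
        x = hard_threshold(x - step * grad, s)
    return x

rng = np.random.default_rng(0)
n, p, s = 100, 40, 3
A = rng.standard_normal((n, p)) / np.sqrt(n)
x_true = np.zeros(p)
x_true[:s] = [2.0, -1.5, 1.0]
x_hat = thresholded_gd(A, A @ x_true, s)     # noiseless case: expect exact recovery
```

In the noiseless case the iterates contract toward the true sparse signal, mirroring the exact-recovery guarantee in the abstract.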
Sparse Tensor Additive Regression
Tensors are becoming prevalent in modern applications such as medical imaging
and digital marketing. In this paper, we propose a sparse tensor additive
regression (STAR) that models a scalar response as a flexible nonparametric
function of tensor covariates. The proposed model effectively exploits the
sparse and low-rank structures in the tensor additive regression. We formulate
the parameter estimation as a non-convex optimization problem, and propose an
efficient penalized alternating minimization algorithm. We establish a
non-asymptotic error bound for the estimator obtained from each iteration of
the proposed algorithm, which reveals an interplay between the optimization
error and the statistical rate of convergence. We demonstrate the efficacy of
STAR through extensive comparative simulation studies, and an application to
the click-through-rate prediction in online advertising. Comment: Accepted by the Journal of Machine Learning Research.
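The penalized-alternating-minimization strategy can be illustrated on a stripped-down problem. The sketch below fits a rank-1 bilinear regression by plain alternating least squares; it omits STAR's nonparametric components and sparsity penalty, and all dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2, n = 6, 5, 200
B_true = np.outer(rng.standard_normal(d1), rng.standard_normal(d2))  # rank-1 truth
X = rng.standard_normal((n, d1, d2))
y = np.einsum('nij,ij->n', X, B_true)        # scalar responses y_i = <X_i, B>

# Alternating minimization: with v fixed, y_i = u @ (X_i @ v) is linear in u,
# and symmetrically for v, so each half-step is an ordinary least-squares solve.
u = rng.standard_normal(d1)
v = rng.standard_normal(d2)
for _ in range(100):
    Zu = np.einsum('nij,j->ni', X, v)        # design matrix for the u-step
    u = np.linalg.lstsq(Zu, y, rcond=None)[0]
    Zv = np.einsum('nij,i->nj', X, u)        # design matrix for the v-step
    v = np.linalg.lstsq(Zv, y, rcond=None)[0]
B_hat = np.outer(u, v)
```

Each half-step is convex even though the joint problem is not, which is exactly what makes the alternating scheme tractable.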
Tensor Robust Principal Component Analysis: Better recovery with atomic norm regularization
This paper studies tensor-based Robust Principal Component Analysis (RPCA)
using atomic-norm regularization. Given the superposition of a sparse and a
low-rank tensor, we present conditions under which it is possible to exactly
recover the sparse and low-rank components. Our results improve on existing
performance guarantees for tensor-RPCA, including those for matrix RPCA. Our
guarantees also show that atomic-norm regularization provides better recovery
for tensor-structured data sets than other approaches based on matricization.
In addition to these performance guarantees, we study a nonconvex formulation
of the tensor atomic-norm and identify a class of local minima of this
nonconvex program that are globally optimal. We demonstrate the strong
performance of our approach in numerical experiments, where we show that our
nonconvex model reliably recovers tensors with ranks larger than all of their
side lengths, significantly outperforming other algorithms that require
matricization. Comment: 39 pages, 3 figures, 3 tables.
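For intuition about sparse-plus-low-rank separation, here is a matrix-case sketch using alternating hard projections: entrywise hard thresholding for the sparse part and a truncated SVD for the low-rank part. This is not the paper's atomic-norm method, and it assumes the number of corruptions is known; all sizes are illustrative:

```python
import numpy as np

def rpca_altproj(M, rank, n_sparse, n_iter=100):
    """Split M into low-rank L + sparse S by alternating hard projections."""
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Sparse step: keep the n_sparse largest-magnitude entries of M - L.
        R = M - L
        S = np.zeros_like(R)
        keep = np.argsort(np.abs(R), axis=None)[-n_sparse:]
        S.flat[keep] = R.flat[keep]
        # Low-rank step: best rank-r approximation of M - S via truncated SVD.
        U, sv, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
    return L, S

rng = np.random.default_rng(2)
d = 30
L_true = np.outer(rng.standard_normal(d), rng.standard_normal(d))  # rank-1 part
S_true = np.zeros((d, d))
spikes = rng.choice(d * d, size=8, replace=False)
S_true.flat[spikes] = 50.0                   # large, well-separated corruptions
L_hat, S_hat = rpca_altproj(L_true + S_true, rank=1, n_sparse=8)
```

When the corruptions are large and spread out, the two hard projections quickly lock onto the correct support and subspace.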
ISLET: Fast and Optimal Low-rank Tensor Regression via Importance Sketching
In this paper, we develop a novel procedure for low-rank tensor regression, namely Importance Sketching Low-rank Estimation for Tensors (ISLET). The central idea behind ISLET is importance sketching, i.e., carefully designed sketches
based on both the responses and low-dimensional structure of the parameter of
interest. We show that the proposed method is sharply minimax optimal in terms
of the mean-squared error under low-rank Tucker assumptions and under
randomized Gaussian ensemble design. In addition, if a tensor is low-rank with
group sparsity, our procedure also achieves minimax optimality. Further, we
show through numerical study that ISLET achieves comparable or better
mean-squared error performance to existing state-of-the-art methods while
having substantial storage and run-time advantages including capabilities for
parallel and distributed computing. In particular, our procedure performs reliable estimation with high-dimensional tensors and is orders of magnitude faster than baseline methods.
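The importance-sketching idea can be sketched in the matrix (order-2) case: form an initial estimate, extract its leading singular subspaces, compress each covariate onto those subspaces, and solve a small reduced regression. The code below is a simplified noiseless analogue, not the full ISLET procedure, with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(3)
d1, d2, r, n = 8, 6, 2, 400
U0 = np.linalg.qr(rng.standard_normal((d1, r)))[0]
V0 = np.linalg.qr(rng.standard_normal((d2, r)))[0]
B_true = U0 @ rng.standard_normal((r, r)) @ V0.T     # rank-r parameter
X = rng.standard_normal((n, d1, d2))
y = np.einsum('nij,ij->n', X, B_true)                # responses y_i = <X_i, B>

# Step 1: crude unstructured estimate (plain least squares on vec(B)).
B_init = np.linalg.lstsq(X.reshape(n, -1), y, rcond=None)[0].reshape(d1, d2)

# Step 2: importance sketching -- compress each covariate onto the leading
# singular subspaces (U, V) of the initial estimate.
U, _, Vt = np.linalg.svd(B_init)
U, V = U[:, :r], Vt[:r].T
Z = np.einsum('ia,nij,jb->nab', U, X, V).reshape(n, r * r)

# Step 3: solve the small (r*r)-dimensional regression, then lift back.
S = np.linalg.lstsq(Z, y, rcond=None)[0].reshape(r, r)
B_hat = U @ S @ V.T
```

The reduced problem has only r*r unknowns instead of d1*d2, which is the source of the storage and run-time advantages the abstract describes.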
Tensor Methods for Additive Index Models under Discordance and Heterogeneity
Motivated by the sampling problems and heterogeneity issues common in high-
dimensional big datasets, we consider a class of discordant additive index
models. We propose method of moments based procedures for estimating the
indices of such discordant additive index models in both low and
high-dimensional settings. Our estimators are based on factorizing certain
moment tensors and are also applicable in the overcomplete setting, where the
number of indices is more than the dimensionality of the datasets. Furthermore,
we provide rates of convergence of our estimators in both high- and low-dimensional settings. Establishing such results requires deriving tensor
operator norm concentration inequalities that might be of independent interest.
Finally, we provide simulation results supporting our theory. Our contributions extend the applicability of tensor methods to novel models, in addition to making progress on understanding the theoretical properties of such methods.
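The moment-factorization step can be illustrated with the classical tensor power method applied to an orthogonally decomposable third-order moment tensor; this is a textbook sketch rather than the paper's estimator for discordant models, with illustrative sizes and weights:

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 8, 3
A = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :k]   # orthonormal components
w = np.array([3.0, 2.0, 1.0])                             # positive weights
# Third-order moment tensor T = sum_j w_j * a_j (x) a_j (x) a_j.
T = np.einsum('j,ij,kj,lj->ikl', w, A, A, A)

def tensor_power_method(T, n_restarts=20, n_iter=100, seed=0):
    """Find the component of an odeco tensor maximizing T(v, v, v)."""
    rng = np.random.default_rng(seed)
    best_v, best_val = None, -np.inf
    for _ in range(n_restarts):
        v = rng.standard_normal(T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = np.einsum('ikl,k,l->i', T, v, v)          # power update T(I, v, v)
            v /= np.linalg.norm(v)
        val = np.einsum('ikl,i,k,l->', T, v, v, v)
        if val > best_val:
            best_v, best_val = v, val
    return best_v, best_val

v1, lam1 = tensor_power_method(T)   # should return the weight-3 component
```

Deflating (subtracting lam1 * v1⊗v1⊗v1 and repeating) recovers the remaining components one by one.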
Tensor Regression Using Low-rank and Sparse Tucker Decompositions
This paper studies a tensor-structured linear regression model with a scalar
response variable and tensor-structured predictors, in which the regression
parameters form a higher-order tensor (i.e., a multiway array). It focuses on
the task of estimating the regression tensor from realizations of the response
variable and the predictors when the number of samples is far smaller than the
number of entries in the parameter tensor. Despite the seeming ill-posedness of
this problem, it can still be solved if the parameter tensor belongs to the
space of sparse, low Tucker-rank tensors.
Accordingly, the estimation procedure is posed as a non-convex optimization
program over the space of sparse, low Tucker-rank tensors, and a tensor variant
of projected gradient descent is proposed to solve the resulting non-convex
problem. In addition, mathematical guarantees are provided that establish the
proposed method linearly converges to an appropriate solution under a certain
set of conditions. Further, an upper bound on sample complexity of tensor
parameter estimation for the model under consideration is characterized for the
special case when the individual (scalar) predictors independently draw values
from a sub-Gaussian distribution. The sample complexity bound is shown to have
a polylogarithmic dependence on the tensor dimensions and, orderwise, it matches the bound one can obtain from a heuristic
parameter counting argument. Finally, numerical experiments demonstrate the
efficacy of the proposed tensor model and estimation method on a synthetic
dataset and a collection of neuroimaging datasets pertaining to attention
deficit hyperactivity disorder. Specifically, the proposed method exhibits
better sample complexities on both synthetic and real datasets, demonstrating
the usefulness of the model and the method in sample-starved settings. Comment: 28 pages, 5 figures, 2 tables; preprint of a journal paper published in SIAM Journal on Mathematics of Data Science.
Nonconvex Optimization Meets Low-Rank Matrix Factorization: An Overview
Substantial progress has been made recently on developing provably accurate
and efficient algorithms for low-rank matrix factorization via nonconvex
optimization. While conventional wisdom often takes a dim view of nonconvex
optimization algorithms due to their susceptibility to spurious local minima,
simple iterative methods such as gradient descent have been remarkably
successful in practice. The theoretical footings, however, had been largely
lacking until recently.
In this tutorial-style overview, we highlight the important role of
statistical models in enabling efficient nonconvex optimization with
performance guarantees. We review two contrasting approaches: (1) two-stage
algorithms, which consist of a tailored initialization step followed by
successive refinement; and (2) global landscape analysis and
initialization-free algorithms. Several canonical matrix factorization problems
are discussed, including but not limited to matrix sensing, phase retrieval,
matrix completion, blind deconvolution, robust principal component analysis,
phase synchronization, and joint alignment. Special care is taken to illustrate
the key technical insights underlying their analyses. This article serves as a
testament that the integrated consideration of optimization and statistics
leads to fruitful research findings. Comment: Invited overview article.
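A minimal instance of the landscape viewpoint: for the factorized PSD fitting objective f(U) = ||UU^T - M||_F^2 / 4, there are no spurious local minima, so plain gradient descent from a random initialization finds the truth. All constants below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
d, r = 12, 2
Q = np.linalg.qr(rng.standard_normal((d, d)))[0]
U_true = Q[:, :r] * np.array([3.0, 2.0])    # ground-truth factor; eigenvalues 9 and 4
M = U_true @ U_true.T                       # observed rank-r PSD matrix

# Gradient descent on f(U) = 0.25 * ||U U^T - M||_F^2 from a random start:
# no tailored initialization, relying on the benign global landscape.
U = 0.5 * rng.standard_normal((d, r))
step = 0.02                                 # roughly 0.2 / ||M||_2
for _ in range(2000):
    U = U - step * (U @ U.T - M) @ U        # gradient of f in U
```

The recovered factor is only identified up to rotation, but the product UU^T converges to M itself.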
A Sharp Blockwise Tensor Perturbation Bound for Orthogonal Iteration
In this paper, we develop novel perturbation bounds for the high-order
orthogonal iteration (HOOI) [DLDMV00b]. Under mild regularity conditions, we
establish blockwise tensor perturbation bounds for HOOI with guarantees for
both tensor reconstruction in the Hilbert-Schmidt norm $\|\widehat{\mathcal{T}} - \mathcal{T}\|_{\mathrm{HS}}$ and mode-$k$ singular subspace estimation in the Schatten-$q$ norm
$\|\sin\Theta(\widehat{U}_k, U_k)\|_q$ for any $q \geq 1$. We show that the upper
bounds on mode-$k$ singular subspace estimation are unilateral and converge
linearly to a quantity characterized by the blockwise errors of the perturbation
and the signal strength. For the tensor reconstruction error bound, we express the
bound through a simple quantity that depends only on the perturbation and
the multilinear rank of the underlying signal. A rate-matching deterministic
lower bound for tensor reconstruction, demonstrating the optimality of
HOOI, is also provided. Furthermore, we prove that one-step HOOI (i.e., HOOI
with only a single iteration) is also optimal in terms of tensor reconstruction
and can be used to lower the computational cost. The perturbation results are
also extended to the case where only some modes of $\mathcal{T}$ have low-rank
structure. We support our theoretical results by extensive numerical studies.
Finally, we apply the new perturbation bounds for HOOI to two applications from machine learning and statistics, tensor denoising and tensor co-clustering, which demonstrate the value of the new perturbation results.
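A compact HOOI implementation helps make the setting concrete. The sketch below initializes with HOSVD and alternates SVDs of projected unfoldings; on an exactly low-Tucker-rank input it recovers the tensor to machine precision. Dimensions and ranks are illustrative:

```python
import numpy as np

def unfold(T, mode):
    """Mode-m matricization of a tensor."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T along the given mode by the matrix M (out_dim x mode_dim)."""
    return np.moveaxis(np.tensordot(T, M, axes=(mode, 1)), -1, mode)

def hooi(T, ranks, n_iter=10):
    """Higher-order orthogonal iteration for a Tucker approximation of T."""
    # HOSVD initialization: leading left singular vectors of each unfolding.
    U = [np.linalg.svd(unfold(T, m))[0][:, :r] for m, r in enumerate(ranks)]
    for _ in range(n_iter):
        for m in range(T.ndim):
            # Project every mode except m, then refresh U[m] from the unfolding.
            G = T
            for k in range(T.ndim):
                if k != m:
                    G = mode_product(G, U[k].T, k)
            U[m] = np.linalg.svd(unfold(G, m))[0][:, :ranks[m]]
    # Core tensor: project all modes.
    G = T
    for k in range(T.ndim):
        G = mode_product(G, U[k].T, k)
    return G, U

# Exact Tucker-rank (2, 2, 2) tensor: HOOI should recover it exactly.
rng = np.random.default_rng(7)
dims, ranks = (6, 5, 4), (2, 2, 2)
core = rng.standard_normal(ranks)
factors = [np.linalg.qr(rng.standard_normal((d, r)))[0] for d, r in zip(dims, ranks)]
T = core
for k, F in enumerate(factors):
    T = mode_product(T, F, k)

G, U = hooi(T, ranks)
T_hat = G
for k in range(T.ndim):
    T_hat = mode_product(T_hat, U[k], k)
```

In the perturbed setting, the reconstruction error of this loop is exactly the quantity the blockwise bounds in the abstract control.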
Optimal Sparse Singular Value Decomposition for High-dimensional High-order Data
In this article, we consider the sparse tensor singular value decomposition,
which aims for dimension reduction on high-dimensional high-order data with
certain sparsity structure. A method named Sparse Tensor Alternating
Thresholding for Singular Value Decomposition (STAT-SVD) is proposed. The
proposed procedure features a novel double projection and thresholding scheme,
which provides a sharp criterion for thresholding in each iteration. Compared
with the regular tensor SVD model, STAT-SVD permits more robust estimation under
weaker assumptions. Both the upper and lower bounds for estimation accuracy are
developed. The proposed procedure is shown to be minimax rate-optimal in a
general class of situations. Simulation studies show that STAT-SVD performs
well under a variety of configurations. We also illustrate the merits of the
proposed procedure on a longitudinal tensor dataset on European country
mortality rates. Comment: 73 pages.
Heteroskedastic PCA: Algorithm, Optimality, and Applications
Principal component analysis (PCA) and singular value decomposition (SVD) are
widely used in statistics, econometrics, machine learning, and applied
mathematics. They have been well studied in the case of homoskedastic noise, where
the noise levels of the contamination are homogeneous.
In this paper, we consider PCA and SVD in the presence of heteroskedastic
noise, which is a commonly used model for factor analysis and arises naturally
in a range of applications. We introduce a general framework for
heteroskedastic PCA and propose an algorithm called HeteroPCA, which involves
iteratively imputing the diagonal entries to remove the bias due to
heteroskedasticity. This procedure is computationally efficient and provably
optimal under the generalized spiked covariance model. A key technical step is
a deterministic robust perturbation analysis on singular subspaces, which can
be of independent interest. The effectiveness of the proposed algorithm is
demonstrated in a suite of applications, including heteroskedastic low-rank
matrix denoising, Poisson PCA, and SVD based on heteroskedastic and incomplete
data.
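The diagonal-imputation idea behind HeteroPCA can be sketched in a toy setting where the covariance is a rank-1 signal plus a heteroskedastic diagonal: discard the biased diagonal, then iteratively refill it from a low-rank fit. This is a simplified sketch, not the paper's full algorithm, and all sizes are illustrative:

```python
import numpy as np

def hetero_pca(Sigma, r, n_iter=200):
    """Iteratively impute the diagonal from a rank-r fit, removing the
    heteroskedastic bias that plain PCA absorbs into its leading subspace."""
    N = Sigma.copy()
    np.fill_diagonal(N, 0.0)                 # discard the biased diagonal
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(N)
        L = (U[:, :r] * s[:r]) @ Vt[:r]      # current rank-r fit
        np.fill_diagonal(N, np.diag(L))      # impute the diagonal from the fit
    U, _, _ = np.linalg.svd(N)
    return U[:, :r]

rng = np.random.default_rng(8)
d = 15
u = np.linalg.qr(rng.standard_normal((d, 1)))[0][:, 0]   # unit signal direction
Sigma = 5.0 * np.outer(u, u) + np.diag(rng.uniform(0.5, 3.0, size=d))
u_hat = hetero_pca(Sigma, r=1)[:, 0]
```

Because the heteroskedastic bias lives only on the diagonal, discarding and re-imputing it leaves the off-diagonal signal intact and recovers the principal direction.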