Randomized Dimension Reduction on Massive Data
Scalability of statistical estimators is of increasing importance in modern
applications and dimension reduction is often used to extract relevant
information from data. A variety of popular dimension reduction approaches can
be framed as symmetric generalized eigendecomposition problems. In this paper
we outline how taking into account the low rank structure assumption implicit
in these dimension reduction approaches provides both computational and
statistical advantages. We adapt recent randomized low-rank approximation
algorithms to provide efficient solutions to three dimension reduction methods:
Principal Component Analysis (PCA), Sliced Inverse Regression (SIR), and
Localized Sliced Inverse Regression (LSIR). A key observation in this paper is
that randomization serves a dual role, improving both computational and
statistical performance. This point is highlighted in our experiments on real
and simulated data.

Comment: 31 pages, 6 figures. Key words: dimension reduction, generalized eigendecomposition, low-rank, supervised, inverse regression, random projections, randomized algorithms, Krylov subspace method
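The randomized low-rank approximation this abstract builds on can be sketched in a few lines of NumPy. The following is an illustrative randomized PCA in the Halko-Martinsson-Tropp style, not the paper's exact algorithm; the `oversample` and `n_iter` parameters are standard tuning choices I introduce here for the sketch.

```python
import numpy as np

def randomized_pca(X, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k principal components of X via a
    randomized range finder. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                     # center the data
    n, p = Xc.shape
    # Sketch the column space of Xc with a Gaussian test matrix.
    Omega = rng.standard_normal((p, k + oversample))
    Y = Xc @ Omega
    # A few power iterations sharpen slowly decaying spectra.
    for _ in range(n_iter):
        Y = Xc @ (Xc.T @ Y)
    Q, _ = np.linalg.qr(Y)                      # orthonormal sketch basis
    # Solve the small (k + oversample)-dimensional problem exactly.
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    loadings = Vt[:k].T                         # p x k principal directions
    variances = (s[:k] ** 2) / (n - 1)          # explained variances
    return loadings, variances
```

The cost is dominated by a handful of passes over the data matrix rather than a full eigendecomposition, which is the computational advantage the abstract refers to; the statistical role of the randomization is the paper's own contribution and is not captured by this sketch.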
Adaptive Higher-order Spectral Estimators
Many applications involve estimation of a signal matrix from a noisy data
matrix. In such cases, it has been observed that estimators that shrink or
truncate the singular values of the data matrix perform well when the signal
matrix has approximately low rank. In this article, we generalize this approach
to the estimation of a tensor of parameters from noisy tensor data. We develop
new classes of estimators that shrink or threshold the mode-specific singular
values from the higher-order singular value decomposition. These classes of
estimators are indexed by tuning parameters, which we adaptively choose from
the data by minimizing Stein's unbiased risk estimate. In particular, this
procedure provides a way to estimate the multilinear rank of the underlying
signal tensor. Using simulation studies under a variety of conditions, we show
that our estimators perform well when the mean tensor has approximately low
multilinear rank, and perform competitively when the signal tensor does not
have approximately low multilinear rank. We illustrate the use of these methods
in an application to multivariate relational data.

Comment: 29 pages, 3 figures
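The shrinkage idea described here, thresholding the mode-specific singular values of a noisy tensor, can be illustrated with a minimal sketch. The soft-thresholding rule and the per-mode `lambdas` below are hypothetical stand-ins for the paper's estimator classes, whose tuning parameters are chosen by minimizing Stein's unbiased risk estimate rather than fixed by hand.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move the given mode to the front and flatten."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold: restore the original tensor shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def modewise_soft_threshold(Y, lambdas):
    """Shrink each mode's singular values by soft-thresholding.

    Illustrative sketch of mode-specific spectral shrinkage, applied
    sequentially mode by mode; not the paper's exact estimator."""
    X = Y.copy()
    for mode, lam in enumerate(lambdas):
        U, s, Vt = np.linalg.svd(unfold(X, mode), full_matrices=False)
        s = np.maximum(s - lam, 0.0)        # soft-threshold this mode's spectrum
        X = fold((U * s) @ Vt, mode, Y.shape)
    return X
```

Counting the surviving (nonzero) singular values in each mode gives a simple estimate of the multilinear rank, which mirrors the rank-estimation use of the adaptive procedure described in the abstract.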