Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On the one hand, Big Data hold great promise for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require a new computational and statistical paradigm. This
article gives an overview of the salient features of Big Data and of how these
features drive a paradigm change in statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in a high-confidence set and point out that
the exogeneity assumptions in most statistical methods for Big Data cannot be
validated due to incidental endogeneity. They can lead to wrong statistical
inferences and consequently wrong scientific conclusions.
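
The spurious correlation mentioned above can be illustrated with a small simulation. This is only a sketch under stated assumptions and is not taken from the article: the response and all predictors are drawn as independent standard normals, so every true correlation is zero, and the helper names (`pearson`, `max_abs_spurious_correlation`) and the sizes n = 50 and p = 10, 100, 1000 are hypothetical choices for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_abs_spurious_correlation(n, p, rng):
    """Largest |correlation| between a response and p predictors.

    Assumption for illustration: the response and every predictor are
    independent N(0, 1) samples of size n, so any nonzero sample
    correlation is purely spurious.
    """
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        best = max(best, abs(pearson(x, y)))
    return best


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
for p in (10, 100, 1000):
    print(p, round(max_abs_spurious_correlation(50, p, rng), 3))
```

With the sample size held fixed, the largest sample correlation tends to grow as the number of candidate predictors increases, even though no predictor is related to the response. This is the sense in which high dimensionality can produce apparent associations that are not real.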
Tensor-on-tensor regression
We propose a framework for the linear prediction of a multi-way array (i.e.,
a tensor) from another multi-way array of arbitrary dimension, using the
contracted tensor product. This framework generalizes several existing
approaches, including methods to predict a scalar outcome from a tensor, a
matrix from a matrix, or a tensor from a scalar. We describe an approach that
exploits the multi-way structure of both the predictors and the outcomes by
restricting the coefficients to have reduced CP-rank. We propose a general and
efficient algorithm for penalized least-squares estimation, which allows for a
ridge (L_2) penalty on the coefficients. The objective is shown to give the
mode of a Bayesian posterior, which motivates a Gibbs sampling algorithm for
inference. We illustrate the approach with an application to facial image data.
An R package is available at https://github.com/lockEF/MultiwayRegression.
Comment: 33 pages, 3 figures
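
The contracted tensor product and the reduced CP-rank restriction described in this abstract can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' algorithm or the API of the R package: it assumes the predictor array has shape N x P1 x ... x PL with observations on the first mode, that the coefficient array has shape P1 x ... x PL x Q1 x ... x QM, and that the prediction contracts the L shared modes. The function names and the dimensions below are hypothetical.

```python
import numpy as np


def contracted_product(X, B, n_contract):
    """Contract the last n_contract modes of X with the first n_contract modes of B."""
    return np.tensordot(X, B, axes=n_contract)


def cp_tensor(factors):
    """Build a tensor as a sum of R rank-one terms (a CP decomposition).

    factors is a list of matrices, one per mode, each of shape (dim_k, R).
    Term r is the outer product of the r-th columns of all factor matrices.
    """
    rank = factors[0].shape[1]
    shape = tuple(f.shape[0] for f in factors)
    B = np.zeros(shape)
    for r in range(rank):
        component = factors[0][:, r]
        for f in factors[1:]:
            component = np.multiply.outer(component, f[:, r])
        B += component
    return B


rng = np.random.default_rng(0)
N, P, Q, R = 5, (4, 3), (2, 6), 2  # hypothetical sizes and CP-rank

X = rng.standard_normal((N, *P))  # predictors: 5 x 4 x 3
factors = [rng.standard_normal((d, R)) for d in (*P, *Q)]
B = cp_tensor(factors)  # coefficients: 4 x 3 x 2 x 6

Y_hat = contracted_product(X, B, n_contract=len(P))
print(Y_hat.shape)  # (5, 2, 6)
```

In this example an unrestricted coefficient array would have 4 * 3 * 2 * 6 = 144 free entries, whereas the rank-2 factor matrices hold 2 * (4 + 3 + 2 + 6) = 30 numbers, which is the kind of reduction the CP-rank restriction is meant to provide. Estimating the factors with a ridge penalty, as the abstract describes, is not shown here.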