Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On the one hand, Big Data hold great promise for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottlenecks, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinctive and require new computational and statistical paradigms. This
article gives an overview of the salient features of Big Data and of how these
features drive a paradigm change in statistical and computational methods as
well as in computing architectures. We also provide various new perspectives on
Big Data analysis and computation. In particular, we emphasize the
viability of the sparsest solution in high-confidence sets and point out that
the exogeneity assumptions in most statistical methods for Big Data cannot be
validated due to incidental endogeneity. They can lead to wrong statistical
inferences and, consequently, wrong scientific conclusions.
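To make the spurious-correlation challenge concrete, the following is a minimal illustrative sketch (not taken from the article): with a modest sample size and many independent features, the largest sample correlation with an entirely unrelated outcome can be substantial purely by chance.

```python
import numpy as np

# Illustrative sketch (not from the paper): with many independent features and a
# modest sample size, the maximum sample correlation with a target variable can
# be large purely by chance -- the "spurious correlation" described above.
rng = np.random.default_rng(0)
n, p = 60, 5000                      # small sample, high dimensionality
X = rng.standard_normal((n, p))      # features, independent of y by construction
y = rng.standard_normal(n)

# Sample correlation of y with each of the p independent features.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
corrs = Xc.T @ yc / n

print(f"max |corr| among {p} independent features: {np.abs(corrs).max():.2f}")
# Typically prints a value around 0.5 even though every true correlation is 0.
```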
Generic continuous spectrum for multi-dimensional quasi-periodic Schr\"odinger operators with rough potentials
We study the multi-dimensional operator , where is the shift of the torus
\T^d. When , we show the spectrum of is almost surely purely
continuous for a.e. and generic continuous potentials. When ,
the same result holds for frequencies under an explicit arithmetic criterion.
We also show that general multi-dimensional operators with measurable
potentials do not have eigenvalues for generic
Robust Inference of Risks of Large Portfolios
We propose a bootstrap-based robust high-confidence level upper bound (Robust
H-CLUB) for assessing the risks of large portfolios. The proposed approach
exploits rank-based and quantile-based estimators, and can be viewed as a
robust extension of the H-CLUB method (Fan et al., 2015). Such an extension
allows us to handle possibly misspecified models and heavy-tailed data. Under
mixing conditions, we analyze the proposed approach and demonstrate its
advantage over the H-CLUB. We further provide thorough numerical results to
back up the developed theory. We also apply the proposed method to analyze a
stock market dataset.
Comment: 45 pages, 2 figures
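As a rough illustration of rank- and quantile-based risk estimation in this spirit, the sketch below estimates the portfolio risk w'Σw using a Kendall's-tau correlation matrix and interquartile-range scales. This is not the authors' Robust H-CLUB procedure (which additionally constructs a bootstrap-based high-confidence upper bound); all names and parameters below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import kendalltau, norm

def robust_portfolio_risk(returns, w):
    """Estimate w' Sigma w using Kendall's tau (sine transform) for correlations
    and the interquartile range for marginal scales -- a robust alternative to
    the sample covariance under heavy-tailed returns."""
    n, p = returns.shape
    # Rank-based correlation: sin(pi/2 * tau) is consistent for the Pearson
    # correlation under elliptical models and is robust to heavy tails.
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            tau, _ = kendalltau(returns[:, i], returns[:, j])
            R[i, j] = R[j, i] = np.sin(np.pi * tau / 2)
    # Quantile-based scale: IQR / (2 * Phi^{-1}(0.75)) matches the Gaussian sd.
    iqr = np.quantile(returns, 0.75, axis=0) - np.quantile(returns, 0.25, axis=0)
    scale = iqr / (2 * norm.ppf(0.75))
    Sigma = np.outer(scale, scale) * R
    return float(w @ Sigma @ w)

# Example: equal-weight portfolio of 10 heavy-tailed (Student-t) assets.
rng = np.random.default_rng(1)
rets = rng.standard_t(df=3, size=(250, 10)) * 0.01
w = np.full(10, 0.1)
print(robust_portfolio_risk(rets, w))
```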
Estimating False Discovery Proportion Under Arbitrary Covariance Dependence
Multiple hypothesis testing is a fundamental problem in high dimensional
inference, with wide applications in many scientific fields. In genome-wide
association studies, tens of thousands of tests are performed simultaneously to
find whether any SNPs are associated with certain traits, and those tests are
correlated. When test statistics are correlated, false discovery control
becomes very challenging under arbitrary dependence. In the current paper, we
propose a novel method based on principal factor approximation, which
successfully subtracts the common dependence and significantly weakens the
correlation structure, allowing us to deal with an arbitrary dependence structure. We
derive an approximate expression for false discovery proportion (FDP) in large
scale multiple testing when a common threshold is used and provide a consistent
estimate of realized FDP. This result has important applications in controlling
FDR and FDP. Our estimate of realized FDP compares favorably with Efron's
(2007) approach, as demonstrated in the simulated examples. Our approach is
further illustrated by some real data applications. We also propose a
dependence-adjusted procedure, which is more powerful than the fixed-threshold
procedure.
Comment: 51 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1012.439
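As a minimal illustration of the realized FDP that the paper seeks to estimate (this sketch is not the principal factor approximation itself), one can simulate z-statistics driven by a single common factor and observe how much the FDP at a fixed common threshold fluctuates under dependence; the sizes, loadings, and signal strength below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Sketch (not the paper's method): correlated z-statistics from a one-factor
# model, with the realized false discovery proportion (FDP) computed at a
# common two-sided threshold. Under dependence the realized FDP varies widely
# across replications, which is what motivates estimating it directly.
rng = np.random.default_rng(2)
p, p_signal, rho = 2000, 100, 0.8     # number of tests, true signals, factor loading
t = norm.ppf(1 - 0.025)               # common threshold at two-sided level 0.05

fdps = []
for _ in range(200):
    W = rng.standard_normal()                          # common factor
    z = rho * W + np.sqrt(1 - rho**2) * rng.standard_normal(p)
    z[:p_signal] += 3.0                                # shift the true signals
    rejected = np.abs(z) > t
    false_rej = rejected[p_signal:].sum()              # rejections among nulls
    fdps.append(false_rej / max(rejected.sum(), 1))    # realized FDP

print(f"FDP across replications: mean {np.mean(fdps):.2f}, sd {np.std(fdps):.2f}")
```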