RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs
Power and reproducibility are key to enabling refined scientific discoveries
in contemporary big data applications with general high-dimensional nonlinear
models. In this paper, we provide theoretical foundations on the power and
robustness for the model-free knockoffs procedure introduced recently in
Candès, Fan, Janson and Lv (2016) in the high-dimensional setting when the
covariate distribution is characterized by a Gaussian graphical model. We
establish that under mild regularity conditions, the power of the oracle
knockoffs procedure with known covariate distribution in high-dimensional
linear models is asymptotically one as the sample size goes to infinity.
Moving away from this ideal case, we suggest a modified model-free knockoffs
method, called graphical nonlinear knockoffs (RANK), to accommodate the unknown
covariate distribution. We provide theoretical justifications on the robustness
of our modified procedure by showing that the false discovery rate (FDR) is
asymptotically controlled at the target level and the power is asymptotically
one with the estimated covariate distribution. To the best of our knowledge,
this is the first formal theoretical result on the power for the knockoffs
procedure. Simulation results demonstrate that compared to existing approaches,
our method performs competitively in both FDR control and power. A real data
set is analyzed to further assess the performance of the suggested knockoffs
procedure. Comment: 37 pages, 6 tables, 9 pages supplementary material
Forecasting of commercial sales with large scale Gaussian Processes
This paper argues that applications of Gaussian processes in the fast-moving
consumer goods industry have not been discussed enough. Yet this technique can
be important because, for example, it can provide automatic feature relevance
determination, and the posterior mean can unlock insights on the data.
Significant challenges are the large size and high dimensionality of commercial
data at a point of sale. The study reviews approaches to Gaussian process
modeling for large data sets, evaluates their performance on commercial sales,
and shows the value of this type of model as a decision-making tool for
management. Comment: 10 pages, 5 figures
Adaptive Reduced Rank Regression
We study the low rank regression problem $y = Mx + \epsilon$, where $x$ and $y$
are $d_1$ and $d_2$ dimensional vectors respectively. We consider the extreme
high-dimensional setting where the number of observations $n$ is less than
$d_1 + d_2$. Existing algorithms are designed for settings where $n$ is
typically as large as $\mathrm{rank}(M)(d_1 + d_2)$. This work provides an
efficient algorithm which only involves two SVDs, and establishes statistical
guarantees on its
performance. The algorithm decouples the problem by first estimating the
precision matrix of the features, and then solving the matrix denoising
problem. To complement the upper bound, we introduce new techniques for
establishing lower bounds on the performance of any algorithm for this problem.
Our preliminary experiments confirm that our algorithm often outperforms
existing baselines, and is always at least competitive. Comment: 40 pages
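The decoupling described in this abstract (first estimate the feature precision matrix, then solve a matrix denoising problem) can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration and not the authors' implementation: the function name `two_svd_low_rank_regression` and the rank parameters `k_feat` and `k_out` are hypothetical and are taken as known here, the precision matrix is approximated by a truncated eigendecomposition of the sample covariance, and denoising is done by plain rank truncation.

```python
import numpy as np


def two_svd_low_rank_regression(X, Y, k_feat, k_out):
    """Estimate M in y = M x + noise using two truncated SVDs.

    X: (n, d1) features, Y: (n, d2) responses.
    k_feat, k_out: hypothetical rank parameters, assumed known here.
    """
    n = X.shape[0]
    # SVD 1: top principal directions of the features, which give a
    # truncated estimate of the feature covariance and its inverse.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k_feat].T                 # (d1, k_feat)
    lam = s[:k_feat] ** 2 / n           # top eigenvalues of X^T X / n
    Z = (X @ V_k) / np.sqrt(lam)        # whitened features, Z^T Z / n = I
    # Cross-moment matrix: equals M V_k lam^{1/2} plus a noise term.
    N = (Y.T @ Z) / n                   # (d2, k_feat)
    # SVD 2: denoise N by keeping its top k_out singular directions.
    U, sig, Wt = np.linalg.svd(N, full_matrices=False)
    N_k = (U[:, :k_out] * sig[:k_out]) @ Wt[:k_out]
    # Undo the whitening to return to the original feature coordinates.
    return (N_k / np.sqrt(lam)) @ V_k.T


# Small noiseless check problem (illustrative sizes only).
rng = np.random.default_rng(0)
n, d1, d2, r = 200, 5, 4, 2
M = rng.normal(size=(d2, r)) @ rng.normal(size=(r, d1))   # rank-2 ground truth
X = rng.normal(size=(n, d1))
Y = X @ M.T
M_hat = two_svd_low_rank_regression(X, Y, k_feat=d1, k_out=r)
```

The check problem is deliberately easy (noiseless, with more observations than features) so that the estimate can be compared directly with the true matrix; it does not reproduce the high-dimensional regime the abstract targets, where `k_feat` would be smaller than the number of features.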
A Computationally Efficient Projection-Based Approach for Spatial Generalized Linear Mixed Models
Inference for spatial generalized linear mixed models (SGLMMs) for
high-dimensional non-Gaussian spatial data is computationally intensive. The
computational challenge is due to the high-dimensional random effects and
because Markov chain Monte Carlo (MCMC) algorithms for these models tend to be
slow mixing. Moreover, spatial confounding inflates the variance of fixed
effect (regression coefficient) estimates. Our approach addresses both the
computational and confounding issues by replacing the high-dimensional spatial
random effects with a reduced-dimensional representation based on random
projections. Standard MCMC algorithms mix well and the reduced-dimensional
setting speeds up computations per iteration. We show, via simulated examples,
that Bayesian inference for this reduced-dimensional approach works well in
terms of both inference and prediction; our methods also compare favorably
to existing "reduced-rank" approaches. We also apply our methods to two
real-world data examples, one on bird count data and the other classifying rock
types
- …