
    RANK: Large-Scale Inference with Graphical Nonlinear Knockoffs

    Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this paper, we provide theoretical foundations on the power and robustness of the model-free knockoffs procedure introduced recently in Candès, Fan, Janson and Lv (2016) in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. Moving away from this ideal case, we propose a modified model-free knockoffs method, graphical nonlinear knockoffs (RANK), to accommodate an unknown covariate distribution. We provide theoretical justification for the robustness of the modified procedure by showing that the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one with the estimated covariate distribution. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real data set is analyzed to further assess the performance of the suggested knockoffs procedure.
    Comment: 37 pages, 6 tables, 9 pages of supplementary material
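    For intuition on the core construction underlying model-X knockoffs with Gaussian covariates, the sketch below draws a knockoff copy from the conditional Gaussian distribution given a covariance matrix. This is a minimal illustration, not the RANK procedure itself: it assumes zero-mean covariates whose covariance Sigma is already known or estimated (e.g., via a graphical-model estimator), uses the equicorrelated choice of s for a correlation-scaled Sigma, and the function name gaussian_knockoffs is ours.

    ```python
    import numpy as np

    def gaussian_knockoffs(X, Sigma, rng=None):
        """Minimal sketch of second-order Gaussian model-X knockoffs.

        Assumes zero-mean covariates with covariance Sigma scaled like a
        correlation matrix; uses the equicorrelated choice of s. An
        illustration of the general construction, not the RANK estimator.
        """
        rng = np.random.default_rng(rng)
        n, p = X.shape
        lam_min = np.linalg.eigvalsh(Sigma).min()
        s = np.full(p, min(1.0, 2.0 * lam_min))        # equicorrelated s_j
        S = np.diag(s)
        Sigma_inv = np.linalg.inv(Sigma)
        mu = X - X @ Sigma_inv @ S                     # conditional mean given X
        V = 2.0 * S - S @ Sigma_inv @ S                # conditional covariance
        L = np.linalg.cholesky(V + 1e-10 * np.eye(p))  # small jitter for stability
        return mu + rng.standard_normal((n, p)) @ L.T  # one knockoff copy per row
    ```

    The knockoff copy is then appended to the design matrix, and variable importances of originals versus knockoffs drive the FDR-controlling selection step.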

    Forecasting of commercial sales with large scale Gaussian Processes

    This paper argues that applications of Gaussian Processes to the fast-moving consumer goods industry have received too little attention. Yet the technique can be valuable: it provides, for example, automatic feature relevance determination, and the posterior mean can unlock insights into the data. Significant challenges are the large size and high dimensionality of commercial point-of-sale data. The study reviews approaches to Gaussian Process modeling for large data sets, evaluates their performance on commercial sales data, and shows the value of this type of model as a decision-making tool for management.
    Comment: 10 pages, 5 figures
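    As a toy illustration of the automatic relevance determination and posterior mean mentioned above, the sketch below fits an exact Gaussian Process with an anisotropic RBF kernel in scikit-learn. The synthetic data and kernel choice are ours; an exact GP scales cubically in the number of observations, so the large-scale approximations reviewed in the paper would replace it for real point-of-sale data.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

    # Synthetic stand-in for point-of-sale features (e.g., price, promotion, weekday, stock)
    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 4))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(400)

    # One length scale per feature (anisotropic RBF) gives ARD-style relevance
    # determination; WhiteKernel absorbs observation noise.
    kernel = ConstantKernel() * RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel()
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    mean, std = gp.predict(X[:5], return_std=True)  # posterior mean and uncertainty
    print(gp.kernel_.k1.k2.length_scale)            # large length scale => less relevant feature
    ```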

    Adaptive Reduced Rank Regression

    We study the low rank regression problem $\mathbf{y} = M\mathbf{x} + \epsilon$, where $\mathbf{x}$ and $\mathbf{y}$ are $d_1$- and $d_2$-dimensional vectors, respectively. We consider the extreme high-dimensional setting where the number of observations $n$ is less than $d_1 + d_2$. Existing algorithms are designed for settings where $n$ is typically as large as $\mathrm{rank}(M)(d_1+d_2)$. This work provides an efficient algorithm that involves only two SVDs, and establishes statistical guarantees on its performance. The algorithm decouples the problem by first estimating the precision matrix of the features and then solving a matrix denoising problem. To complement the upper bound, we introduce new techniques for establishing lower bounds on the performance of any algorithm for this problem. Our preliminary experiments confirm that our algorithm often outperforms existing baselines and is always at least competitive.
    Comment: 40 pages
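    The two-step structure described above (estimate the feature precision matrix, then denoise a matrix with a rank-truncated SVD) can be sketched as follows. This is our illustrative reading of the abstract, not the paper's exact estimator: the ridge-regularised covariance, the whitening step, and the name adaptive_rrr are assumptions.

    ```python
    import numpy as np

    def adaptive_rrr(X, Y, rank, ridge=1e-3):
        """Illustrative two-step sketch: whiten the features with an estimated
        precision matrix, then denoise the cross-moment matrix by a
        rank-truncated SVD. Ridge regularisation and names are our choices."""
        n, d1 = X.shape
        Sigma_hat = X.T @ X / n + ridge * np.eye(d1)   # regularised feature covariance
        evals, evecs = np.linalg.eigh(Sigma_hat)
        W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma_hat^{-1/2}
        Z = X @ W                                      # approximately whitened features
        C = Z.T @ Y / n                                # cross-moment to denoise
        U, s, Vt = np.linalg.svd(C, full_matrices=False)
        C_low = U[:, :rank] @ np.diag(s[:rank]) @ Vt[:rank]
        return (W @ C_low).T                           # estimate of M, shape (d2, d1)
    ```

    Calling adaptive_rrr(X, Y, rank=r) returns a rank-r estimate of M in the original coordinates; the whitening makes the subsequent truncation a plain matrix denoising step.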

    A Computationally Efficient Projection-Based Approach for Spatial Generalized Linear Mixed Models

    Inference for spatial generalized linear mixed models (SGLMMs) for high-dimensional non-Gaussian spatial data is computationally intensive. The challenge stems from the high-dimensional random effects and from the slow mixing of Markov chain Monte Carlo (MCMC) algorithms for these models. Moreover, spatial confounding inflates the variance of fixed effect (regression coefficient) estimates. Our approach addresses both the computational and confounding issues by replacing the high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. Standard MCMC algorithms mix well, and the reduced-dimensional setting speeds up computation per iteration. We show, via simulated examples, that Bayesian inference for this reduced-dimensional approach works well in terms of both inference and prediction, and that our methods compare favorably to existing "reduced-rank" approaches. We also apply our methods to two real-world data examples, one on bird count data and the other on classifying rock types.
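    One way to picture the random-projection idea above is a randomised range finder that compresses an n x n spatial covariance into an n x m basis, so the spatial random effect is modeled through a low-dimensional coefficient vector inside the MCMC. The exponential covariance, the parameter names, and the reduced_spatial_basis helper below are illustrative assumptions, not the authors' exact construction.

    ```python
    import numpy as np

    def reduced_spatial_basis(coords, m=50, phi=0.2, rng=None):
        """Sketch of compressing an n x n spatial covariance K into an n x m
        basis Phi via a randomised range finder, so w ~ N(0, K) is replaced
        by w = Phi @ delta with delta only m-dimensional."""
        rng = np.random.default_rng(rng)
        n = coords.shape[0]
        dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
        K = np.exp(-dists / phi)                       # exponential spatial covariance
        Omega = rng.standard_normal((n, m))
        Q, _ = np.linalg.qr(K @ Omega)                 # n x m orthonormal range basis
        B = Q.T @ K @ Q                                # m x m reduced covariance
        L = np.linalg.cholesky(B + 1e-8 * np.eye(m))
        return Q @ L                                   # Phi, with Phi @ Phi.T approximating K
    ```

    In an SGLMM, the linear predictor would then use X @ beta + Phi @ delta with delta ~ N(0, I_m), which keeps the MCMC updates low-dimensional.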