9 research outputs found

    Scalable Approximations of Marginal Posteriors in Variable Selection

    In many contexts, there is interest in selecting the most important variables from a very large collection, commonly referred to as support recovery or variable, feature or subset selection. There is an enormous literature proposing a rich variety of algorithms. In scientific applications, it is of crucial importance to quantify uncertainty in variable selection, providing measures of statistical significance for each variable. The overwhelming majority of algorithms fail to produce such measures. This has led to a focus in the scientific literature on independent screening methods, which examine each variable in isolation, obtaining p-values measuring the significance of marginal associations. Bayesian methods provide an alternative, with marginal inclusion probabilities used in place of p-values. Bayesian variable selection has advantages, but is computationally impractical beyond small problems. In this article, we show that approximate message passing (AMP) and Bayesian compressed regression (BCR) can be used to rapidly obtain accurate approximations to marginal inclusion probabilities in high-dimensional variable selection. Theoretical support is provided, simulation studies are conducted to assess performance, and the method is applied to a study relating brain networks to creative reasoning. Comment: 10 pages, 4 figures, PDFLaTeX, submitted to the Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS 2015)
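
    As a concrete illustration of the target quantity (not the paper's AMP or BCR algorithms), the following Python sketch computes exact marginal inclusion probabilities by enumerating all 2^p models on a toy problem; the g-prior, the known noise level, and the uniform model prior are our illustrative assumptions.

        import itertools
        import numpy as np

        rng = np.random.default_rng(0)
        n, p = 50, 8
        X = rng.standard_normal((n, p))
        beta = np.zeros(p); beta[:2] = [1.5, -1.0]
        y = X @ beta + rng.standard_normal(n)

        def log_marginal(X_s, y, g=50.0):
            # Zellner g-prior marginal likelihood (up to a constant), sigma^2 = 1.
            if X_s.shape[1] == 0:
                return -0.5 * y @ y
            P = X_s @ np.linalg.solve(X_s.T @ X_s, X_s.T)
            rss = y @ y - (g / (1 + g)) * (y @ P @ y)
            return -0.5 * X_s.shape[1] * np.log(1 + g) - 0.5 * rss

        log_post, models = [], []
        for k in range(p + 1):
            for s in itertools.combinations(range(p), k):
                models.append(s)
                log_post.append(log_marginal(X[:, list(s)], y))  # uniform model prior

        w = np.exp(np.array(log_post) - max(log_post))
        w /= w.sum()
        pip = np.zeros(p)
        for s, wt in zip(models, w):
            pip[list(s)] += wt
        print(np.round(pip, 3))  # marginal inclusion probability of each variable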

    Thresholding tests

    We derive a new class of statistical tests for generalized linear models based on thresholding point estimators. These tests can be employed whether or not the model includes more parameters than observations. For linear models, our tests rely on pivotal statistics derived from model selection techniques. Affine lasso, a new extension of the lasso, allows us to derive new tests and to develop parametric and nonparametric tests within the same framework. Our tests for generalized linear models are based on new asymptotically pivotal statistics. A composite thresholding test succeeds in achieving high power under both sparse and dense alternatives. In a simulation, we compare the level and power of these tests under sparse and dense alternative hypotheses. The thresholding tests control the nominal level better and have higher power than existing tests. Comment: 18 pages, 3 figures
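
    As a loose, hypothetical analogue (not the affine lasso), the sketch below tests the global null in a linear model using a thresholding-style statistic: the smallest lasso penalty that zeroes the solution, lambda_max = max_j |x_j'y| for unit-norm columns, calibrated by Monte Carlo with a known noise level.

        import numpy as np

        rng = np.random.default_rng(1)
        n, p = 100, 200                   # more parameters than observations is fine
        X = rng.standard_normal((n, p))
        X /= np.linalg.norm(X, axis=0)    # unit-norm columns

        def lam_max(y):
            # smallest lasso penalty for which the solution is identically zero
            return np.max(np.abs(X.T @ y))

        y = 0.8 * X[:, 0] + rng.standard_normal(n)   # sparse alternative, sigma = 1

        null_draws = np.array([lam_max(rng.standard_normal(n)) for _ in range(2000)])
        p_value = np.mean(null_draws >= lam_max(y))
        print(f"p-value = {p_value:.3f}")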

    Approximating posteriors with high-dimensional nuisance parameters via integrated rotated Gaussian approximation

    Posterior computation for high-dimensional data with many parameters can be challenging. This article focuses on a new method for approximating posterior distributions of a low- to moderate-dimensional parameter in the presence of a high-dimensional or otherwise computationally challenging nuisance parameter. The focus is on regression models and the key idea is to separate the likelihood into two components through a rotation. One component involves only the nuisance parameters, which can then be integrated out using a novel type of Gaussian approximation. We provide theory on approximation accuracy that holds for a broad class of forms of the nuisance component and priors. Applying our method to simulated and real data sets shows that it can outperform state-of-the-art posterior approximation approaches. Comment: 32 pages, 8 figures
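
    The rotation step can be made concrete with a QR decomposition; a minimal sketch, assuming a linear model y = X1 @ theta + X2 @ gamma + noise (the Gaussian integration of the nuisance block is the paper's contribution and is not reproduced here):

        import numpy as np

        rng = np.random.default_rng(2)
        n, p1, p2 = 200, 2, 500
        X1 = rng.standard_normal((n, p1))          # columns for the target parameter
        X2 = rng.standard_normal((n, p2))          # high-dimensional nuisance columns

        Q, R = np.linalg.qr(X1, mode="complete")   # Q is an n x n orthogonal matrix
        # Rotating y = X1 @ theta + X2 @ gamma + eps by Q.T splits the likelihood:
        #   (Q.T @ y)[:p1] = R[:p1] @ theta + (Q.T @ X2)[:p1] @ gamma + noise
        #   (Q.T @ y)[p1:] =                  (Q.T @ X2)[p1:] @ gamma + noise
        # because theta vanishes from the lower block:
        print(np.abs((Q.T @ X1)[p1:]).max())       # ~1e-15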

    Hierarchical correction of p-values via a tree running Ornstein-Uhlenbeck process

    Statistical testing is classically used as an exploratory tool to search for association between a phenotype and many possible explanatory variables. This approach often leads to multiple testing under dependence. We assume a hierarchical structure between tests via an Ornstein-Uhlenbeck process on a tree. The process correlation structure is used for smoothing the p-values. We design a penalized estimation of the mean of the OU process for p-value computation. The performance of the algorithm is assessed via simulations. Its ability to discover new associations is demonstrated on a metagenomic dataset. The corresponding R package is available from https://github.com/abichat/zazou. Comment: 20 pages, 8 figures
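
    In rough Python (not the zazou implementation, which works on a phylogenetic tree with an L1-type penalty), the pipeline the abstract describes is: map p-values to z-scores, smooth the z-scores under the assumed correlation structure, and read corrected p-values back off the estimated means. The covariance and ridge penalty below are stand-ins.

        import numpy as np
        from scipy import stats

        m = 6
        p_raw = np.array([0.001, 0.02, 0.03, 0.4, 0.6, 0.9])
        z = stats.norm.ppf(p_raw)          # z-scores; strongly negative = signal

        # Stand-in correlation between tests (the paper derives it from an
        # Ornstein-Uhlenbeck process running on the tree).
        idx = np.arange(m)
        Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])

        # Ridge-penalized estimate of the process mean, given covariance Sigma.
        lam = 1.0
        Sinv = np.linalg.inv(Sigma)
        mu_hat = np.linalg.solve(Sinv + lam * np.eye(m), Sinv @ z)

        p_smoothed = stats.norm.cdf(mu_hat)   # corrected p-values
        print(np.round(p_smoothed, 4))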

    Confidence Intervals and Hypothesis Testing for High-Dimensional Regression

    Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance, such as confidence intervals or p-values, for these models. We consider here the high-dimensional linear regression problem, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a `de-biased' version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. We test our method on synthetic data and a high-throughput genomic data set about riboflavin production rate. Comment: 40 pages, 4 PDF figures
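
    A minimal sketch of one common form of the de-biasing construction (a node-wise lasso for the relevant precision-matrix row, then a plug-in Gaussian interval); the penalty levels and the crude noise estimate are illustrative choices, not the authors' exact recipe.

        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(4)
        n, p, j = 200, 300, 0
        X = rng.standard_normal((n, p))
        beta = np.zeros(p); beta[0] = 1.0
        y = X @ beta + rng.standard_normal(n)

        lam = 2.0 * np.sqrt(np.log(p) / n)
        beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

        # Node-wise lasso: regress column j on the others to approximate
        # row j of the precision matrix.
        others = np.delete(np.arange(p), j)
        gamma = Lasso(alpha=lam, fit_intercept=False).fit(X[:, others], X[:, j]).coef_
        z = X[:, j] - X[:, others] @ gamma               # node-wise residual

        resid = y - X @ beta_hat
        b_j = beta_hat[j] + z @ resid / (z @ X[:, j])    # de-biased estimate
        sigma_hat = np.sqrt(resid @ resid / n)           # crude noise estimate
        se = sigma_hat * np.sqrt(z @ z) / abs(z @ X[:, j])
        print(f"95% CI for beta_{j}: [{b_j - 1.96*se:.3f}, {b_j + 1.96*se:.3f}]")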

    In Defense of the Indefensible: A Very Naive Approach to High-Dimensional Inference

    A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the resulting least squares model (such as confidence intervals and p-values), since we peeked at the data twice: once in running the lasso, and again in fitting the least squares model. However, in this paper, we show that under a certain set of assumptions, with high probability, the set of variables selected by the lasso is identical to the one selected by the noiseless lasso and is hence deterministic. Consequently, the naïve two-step approach can yield asymptotically valid inference. We utilize this finding to develop the naïve confidence interval, which can be used to draw inference on the regression coefficients of the model selected by the lasso, as well as the naïve score test, which can be used to test the hypotheses regarding the full-model regression coefficients.
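
    The two-step procedure itself is short; a sketch, with scikit-learn and statsmodels as our package choices (not the paper's):

        import numpy as np
        import statsmodels.api as sm
        from sklearn.linear_model import LassoCV

        rng = np.random.default_rng(5)
        n, p = 200, 500
        X = rng.standard_normal((n, p))
        beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
        y = X @ beta + rng.standard_normal(n)

        # Step (i): lasso selection.
        selected = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)

        # Step (ii): least squares on the selected columns, with its standard
        # (here, naive) confidence intervals.
        ols = sm.OLS(y, sm.add_constant(X[:, selected])).fit()
        print(ols.conf_int())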

    Sparse Nonlinear Regression: Parameter Estimation and Asymptotic Inference

    We study parameter estimation and asymptotic inference for sparse nonlinear regression. More specifically, we assume the data are given by $y = f(x^\top \beta^*) + \epsilon$, where $f$ is nonlinear. To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator. Unlike classical linear regression, the corresponding optimization problem is nonconvex because of the nonlinearity of $f$. In spite of the nonconvexity, we prove that under mild conditions, every stationary point of the objective enjoys an optimal statistical rate of convergence. In addition, we provide an efficient algorithm that provably converges to a stationary point. We also assess the uncertainty of the obtained estimator. Specifically, based on any stationary point of the objective, we construct valid hypothesis tests and confidence intervals for the low-dimensional components of the high-dimensional parameter $\beta^*$. Detailed numerical results are provided to back up our theory. Comment: 32 pages, 2 figures, 1 table
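
    A minimal sketch of such an estimator, fitted by proximal gradient descent (ISTA); the choice f = tanh, the step size, and the penalty level are our illustrative assumptions, not the paper's algorithm.

        import numpy as np

        rng = np.random.default_rng(6)
        n, p = 300, 100
        X = rng.standard_normal((n, p))
        beta_star = np.zeros(p); beta_star[:3] = [1.0, -1.0, 0.5]
        f = np.tanh
        fprime = lambda u: 1.0 - np.tanh(u) ** 2
        y = f(X @ beta_star) + 0.1 * rng.standard_normal(n)

        lam, step = 0.01, 0.1 / n          # penalty and step size (illustrative)
        b = np.zeros(p)
        for _ in range(500):
            u = X @ b
            grad = X.T @ ((f(u) - y) * fprime(u))   # gradient of 0.5*||y - f(Xb)||^2
            b = b - step * grad
            b = np.sign(b) * np.maximum(np.abs(b) - step * n * lam, 0.0)  # soft-threshold
        print(np.round(b[:5], 3))   # the first three coordinates are the true support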

    Confidence intervals for high-dimensional Cox models

    The purpose of this paper is to construct confidence intervals for the regression coefficients in high-dimensional Cox proportional hazards regression models where the number of covariates may be larger than the sample size. Our debiased estimator construction is similar to those in Zhang and Zhang (2014) and van de Geer et al. (2014), but the time-dependent covariates and censored risk sets introduce considerable additional challenges. Our theoretical results, which provide conditions under which our confidence intervals are asymptotically valid, are supported by extensive numerical experiments. Comment: 36 pages, 1 figure
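
    For readers who want the moving parts, here is a small numpy sketch of the Cox negative log partial likelihood and its score, assuming no tied event times; these are the ingredients any debiasing construction for this model manipulates, not the paper's estimator itself.

        import numpy as np

        def cox_neg_loglik_and_score(beta, X, time, event):
            """Negative log partial likelihood and its gradient (no ties)."""
            order = np.argsort(-time)            # sort by decreasing time
            X, event = X[order], event[order]
            eta = X @ beta
            w = np.exp(eta)
            cum_w = np.cumsum(w)                 # risk-set sums of exp(eta)
            cum_wx = np.cumsum(w[:, None] * X, axis=0)
            d = event.astype(bool)
            nll = -(eta[d] - np.log(cum_w[d])).sum()
            score = -(X[d] - cum_wx[d] / cum_w[d, None]).sum(axis=0)
            return nll, score

        rng = np.random.default_rng(7)
        n, p = 100, 5
        X = rng.standard_normal((n, p))
        time = rng.exponential(size=n)
        event = rng.integers(0, 2, size=n)
        print(cox_neg_loglik_and_score(np.zeros(p), X, time, event))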

    Testing and Confidence Intervals for High Dimensional Proportional Hazards Model

    This paper proposes a decorrelation-based approach to test hypotheses and construct confidence intervals for the low dimensional component of high dimensional proportional hazards models. Motivated by the geometric projection principle, we propose new decorrelated score, Wald and partial likelihood ratio statistics. Without assuming model selection consistency, we prove the asymptotic normality of these test statistics and establish their semiparametric optimality. We also develop new procedures for constructing pointwise confidence intervals for the baseline hazard function and baseline survival function. Thorough numerical results are provided to back up our theory. Comment: 42 pages, 4 figures, 5 tables
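
    As a hedged illustration of the decorrelation idea in the simpler linear-model setting (the paper develops it for the partial likelihood sketched above), the code below forms a decorrelated score for one coordinate by projecting out the estimated nuisance directions with a lasso; penalty levels are illustrative.

        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(8)
        n, p = 200, 300
        X = rng.standard_normal((n, p))
        beta = np.zeros(p); beta[1] = 1.0    # coordinate 0 is the target, truly null
        y = X @ beta + rng.standard_normal(n)

        lam = 2.0 * np.sqrt(np.log(p) / n)
        b_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
        b_hat[0] = 0.0                        # impose the null H0: beta_0 = 0

        # Decorrelate: regress the target column on the nuisance columns.
        w = Lasso(alpha=lam, fit_intercept=False).fit(X[:, 1:], X[:, 0]).coef_
        z = X[:, 0] - X[:, 1:] @ w

        resid = y - X @ b_hat
        sigma2 = resid @ resid / n
        U = z @ resid / np.sqrt(sigma2 * (z @ z))  # approx N(0, 1) under H0
        print(f"decorrelated score statistic = {U:.3f}")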