
    Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

    Functional brain networks are well described and estimated from data with Gaussian Graphical Models (GGMs), e.g. using sparse inverse covariance estimators. Comparing the functional connectivity of subjects in two populations calls for comparing these estimated GGMs. Our goal is to identify differences in GGMs known to have similar structure. We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on the parameters of a sparse estimator. Sparse penalties enable statistical guarantees and interpretable models even in high-dimensional, low-sample settings. Characterizing the distributions of sparse models is inherently challenging, as the penalties produce a biased estimator. Recent work invokes the sparsity assumptions to effectively remove the bias from a sparse estimator such as the lasso. These distributions can be used to give confidence intervals on edges in GGMs, and by extension on their differences. However, when comparing GGMs, these estimators do not make use of any assumed joint structure among the GGMs. Inspired by priors from brain functional connectivity, we derive the distribution of parameter differences under a joint penalty when the parameters are known to be sparse in the difference. This leads us to introduce the debiased multi-task fused lasso, whose distribution can be characterized efficiently. We then show how the debiased lasso and the debiased multi-task fused lasso can be used to obtain confidence intervals on edge differences in GGMs. We validate the proposed techniques on a set of synthetic examples as well as on a neuroimaging dataset created for the study of autism.
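To make the debiasing idea concrete, here is a minimal Python sketch of the generic debiased-graphical-lasso route to edge-difference confidence intervals (the paper's multi-task fused lasso is not reproduced here). It uses the desparsified estimator $2\hat\Theta - \hat\Theta\hat\Sigma\hat\Theta$ and its standard asymptotic entrywise variance; the function names, the penalty value `alpha`, and the use of scikit-learn's `GraphicalLasso` are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import GraphicalLasso

def debiased_precision(X, alpha=0.05):
    """Desparsified graphical lasso: 2*Theta - Theta @ Sigma @ Theta,
    with the standard asymptotic entrywise variance
    (Theta_ii * Theta_jj + Theta_ij^2) / n.
    The penalty `alpha` is a placeholder and should be tuned."""
    n = X.shape[0]
    emp_cov = np.cov(X, rowvar=False)
    theta = GraphicalLasso(alpha=alpha).fit(X).precision_
    theta_d = 2 * theta - theta @ emp_cov @ theta  # one-step bias removal
    var = (np.outer(np.diag(theta), np.diag(theta)) + theta**2) / n
    return theta_d, var

def edge_difference_ci(X1, X2, i, j, level=0.95):
    """Wald interval for Theta1[i, j] - Theta2[i, j] across two populations,
    treating the two debiased estimates as independent Gaussians."""
    t1, v1 = debiased_precision(X1)
    t2, v2 = debiased_precision(X2)
    diff = t1[i, j] - t2[i, j]
    se = np.sqrt(v1[i, j] + v2[i, j])
    z = stats.norm.ppf(0.5 + level / 2)
    return diff - z * se, diff + z * se
```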

    Familywise Error Rate Control via Knockoffs

    We present a novel method for controlling the $k$-familywise error rate ($k$-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Candès. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testing procedures, which act directly on $p$-values, knockoffs is specifically tailored to linear regression and implicitly accounts for the statistical relationships between hypothesis tests of different coefficients. We prove that knockoffs controls the $k$-FWER exactly in finite samples and show in simulations that it provides superior power to alternative procedures over a range of linear regression problems. We also discuss extensions to controlling other Type I error rates, such as the false exceedance rate, and use the method to identify candidate mutations conferring drug resistance in HIV.
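The selection step can be sketched in a few lines of Python, assuming knockoff statistics $W_j$ have already been computed from a valid knockoff construction (a positive $W_j$ is evidence that the original variable, rather than its knockoff, matters). The sign-walk stopping rule below is one plausible reading of the procedure; the calibration of $v$ (chosen from a negative-binomial tail bound to hit a target $k$-FWER level) is omitted, and the function name is illustrative.

```python
import numpy as np

def kfwer_knockoff_select(W, v):
    """Sign-walk selection on knockoff statistics W: visit variables in
    decreasing |W_j| and stop just before the v-th negative sign appears;
    the positively signed variables seen so far are selected. For null
    variables the sign of W_j is a fair coin flip, which is what bounds
    the number of false selections (a negative-binomial count)."""
    order = np.argsort(-np.abs(W))
    selected, negatives = [], 0
    for j in order:
        if W[j] < 0:
            negatives += 1
            if negatives >= v:
                break
        elif W[j] > 0:
            selected.append(j)
    return selected
```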

    Exact Post Model Selection Inference for Marginal Screening

    We develop a framework for post model selection inference, via marginal screening, in linear regression. At the core of this framework is a result that characterizes the exact distribution of linear functions of the response $y$, conditional on the model being selected (the "condition on selection" framework). This allows us to construct valid confidence intervals and hypothesis tests for regression coefficients that account for the selection procedure. In contrast to recent work in high-dimensional statistics, our results are exact (non-asymptotic) and require no eigenvalue-like assumptions on the design matrix $X$. Furthermore, the computational cost of marginal regression, constructing confidence intervals, and hypothesis testing is negligible compared to the cost of linear regression, making our methods particularly suitable for extremely large datasets. Although we focus on marginal screening to illustrate the applicability of the condition-on-selection framework, the framework is much more broadly applicable. We show how to apply it to several other selection procedures, including orthogonal matching pursuit, non-negative least squares, and marginal screening+Lasso.
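A minimal sketch of the core computation behind such condition-on-selection results, under the polyhedral form of the selection event: if selection can be written as $\{Ay \le b\}$, then conditional on selection, a linear function $\eta^\top y$ follows a Gaussian truncated to an interval computed from $A$, $b$, and $y$. The code assumes $\mathrm{Cov}(y) = \sigma^2 I$, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def truncation_interval(A, b, eta, y):
    """Polyhedral lemma: if the selection event is {A y <= b}, then
    conditional on selection (and on the part of y orthogonal to eta),
    eta @ y is a Gaussian truncated to [vlo, vup]. Assumes Cov(y) is
    sigma^2 * I, in which case sigma^2 cancels out of the interval."""
    c = eta / (eta @ eta)
    z = y - c * (eta @ y)  # component of y that does not carry eta @ y
    Ac, Az = A @ c, A @ z
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = (b - Az) / Ac
    vlo = ratios[Ac < 0].max() if np.any(Ac < 0) else -np.inf
    vup = ratios[Ac > 0].min() if np.any(Ac > 0) else np.inf
    return vlo, vup

def selective_pvalue(stat, scale, vlo, vup):
    """One-sided p-value for H0: eta @ mu = 0 under the truncated
    Gaussian; `scale` is sigma * ||eta||."""
    den = norm.cdf(vup / scale) - norm.cdf(vlo / scale)
    num = norm.cdf(vup / scale) - norm.cdf(stat / scale)
    return num / den
```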

    Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression

    We consider the problem of fitting the parameters of a high-dimensional linear regression model. In the regime where the number of parameters $p$ is comparable to or exceeds the sample size $n$, a successful approach uses an $\ell_1$-penalized least squares estimator, known as the Lasso. Unfortunately, unlike for linear estimators (e.g., ordinary least squares), no well-established method exists to compute confidence intervals or $p$-values on the basis of the Lasso estimator. Very recently, a line of work \cite{javanmard2013hypothesis, confidenceJM, GBR-hypothesis} has addressed this problem by constructing a debiased version of the Lasso estimator. In this paper, we study this approach for the random design model, under the assumption that a good estimator exists for the precision matrix of the design. Our analysis improves over the state of the art in that it establishes nearly optimal \emph{average} testing power if the sample size $n$ asymptotically dominates $s_0 (\log p)^2$, with $s_0$ being the sparsity level (number of non-zero coefficients). Earlier work obtains provable guarantees only for much larger sample sizes, namely requiring $n$ to asymptotically dominate $(s_0 \log p)^2$. In particular, for random designs with a sparse precision matrix we show that an estimator with the required properties can be computed efficiently. Finally, we evaluate this approach on synthetic data and compare it with earlier proposals.
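A minimal Python sketch of a debiased-Lasso test in this spirit, assuming (as the abstract does) that an estimate $M$ of the design's precision matrix is available: apply a one-step correction to the Lasso, then form approximate z-scores. The penalty `alpha`, the crude residual-based noise estimate, and the use of scikit-learn's `Lasso` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def debiased_lasso_test(X, y, M, alpha=0.1):
    """One-step debiasing of the Lasso:
        theta_d = theta_hat + (1/n) * M @ X.T @ (y - X @ theta_hat),
    where M estimates the precision matrix of the design. Returns the
    debiased coefficients and approximate z-scores for H0: theta_i = 0.
    The penalty `alpha` and the residual-based noise estimate are
    placeholders (e.g. the scaled Lasso gives a better sigma estimate)."""
    n = X.shape[0]
    theta = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    resid = y - X @ theta
    theta_d = theta + M @ X.T @ resid / n
    sigma_hat = np.linalg.norm(resid) / np.sqrt(n)
    cov = M @ (X.T @ X / n) @ M.T  # estimate of M Sigma M^T
    se = sigma_hat * np.sqrt(np.diag(cov) / n)
    return theta_d, theta_d / se  # z-scores are approx N(0,1) under H0
```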