Search CORE

272 research outputs found

Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

Author: Belilovsky Eugene
Blaschko Matthew B.
Varoquaux Gaël
Publication venue
Publication date: 18/11/2016
Field of study

Functional brain networks are well described and estimated from data with Gaussian Graphical Models (GGMs), e.g. using sparse inverse covariance estimators. Comparing functional connectivity of subjects in two populations calls for comparing these estimated GGMs. Our goal is to identify differences in GGMs known to have similar structure. We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator. Sparse penalties enable statistical guarantees and interpretable models even in high-dimensional and low-sample settings. Characterizing the distributions of sparse models is inherently challenging as the penalties produce a biased estimator. Recent work invokes the sparsity assumptions to effectively remove the bias from a sparse estimator such as the lasso. These distributions can be used to give confidence intervals on edges in GGMs, and by extension their differences. However, in the case of comparing GGMs, these estimators do not make use of any assumed joint structure among the GGMs. Inspired by priors from brain functional connectivity we derive the distribution of parameter differences under a joint penalty when parameters are known to be sparse in the difference. This leads us to introduce the debiased multi-task fused lasso, whose distribution can be characterized in an efficient manner. We then show how the debiased lasso and multi-task fused lasso can be used to obtain confidence intervals on edge differences in GGMs. We validate the techniques proposed on a set of synthetic examples as well as neuro-imaging dataset created for the study of autism

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-CEA

HAL-Rennes 1

Familywise Error Rate Control via Knockoffs

Author: Janson Lucas
Su Weijie
Publication venue
Publication date: 09/11/2015
Field of study

We present a novel method for controlling the

k

-familywise error rate (

k

-FWER) in the linear regression setting using the knockoffs framework first introduced by Barber and Cand\`es. Our procedure, which we also refer to as knockoffs, can be applied with any design matrix with at least as many observations as variables, and does not require knowing the noise variance. Unlike other multiple testing procedures which act directly on

p

-values, knockoffs is specifically tailored to linear regression and implicitly accounts for the statistical relationships between hypothesis tests of different coefficients. We prove that knockoffs controls the

k

-FWER exactly in finite samples and show in simulations that it provides superior power to alternative procedures over a range of linear regression problems. We also discuss extensions to controlling other Type I error rates such as the false exceedance rate, and use it to identify candidates for mutations conferring drug-resistance in HIV.Comment: 15 pages, 3 figures. Updated reference

arXiv.org e-Print Archive

ScholarlyCommons@Penn

Exact Post Model Selection Inference for Marginal Screening

Author: Lee Jason D
Taylor Jonathan E
Publication venue
Publication date: 27/02/2014
Field of study

We develop a framework for post model selection inference, via marginal screening, in linear regression. At the core of this framework is a result that characterizes the exact distribution of linear functions of the response

y

, conditional on the model being selected (``condition on selection" framework). This allows us to construct valid confidence intervals and hypothesis tests for regression coefficients that account for the selection procedure. In contrast to recent work in high-dimensional statistics, our results are exact (non-asymptotic) and require no eigenvalue-like assumptions on the design matrix

X

. Furthermore, the computational cost of marginal regression, constructing confidence intervals and hypothesis testing is negligible compared to the cost of linear regression, thus making our methods particularly suitable for extremely large datasets. Although we focus on marginal screening to illustrate the applicability of the condition on selection framework, this framework is much more broadly applicable. We show how to apply the proposed framework to several other selection procedures including orthogonal matching pursuit, non-negative least squares, and marginal screening+Lasso

arXiv.org e-Print Archive

CiteSeerX

Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression

Author: Javanmard Adel
Montanari Andrea
Publication venue
Publication date: 01/11/2013
Field of study

We consider the problem of fitting the parameters of a high-dimensional linear regression model. In the regime where the number of parameters

p

is comparable to or exceeds the sample size

n

, a successful approach uses an

\ell_1

-penalized least squares estimator, known as Lasso. Unfortunately, unlike for linear estimators (e.g., ordinary least squares), no well-established method exists to compute confidence intervals or p-values on the basis of the Lasso estimator. Very recently, a line of work \cite{javanmard2013hypothesis, confidenceJM, GBR-hypothesis} has addressed this problem by constructing a debiased version of the Lasso estimator. In this paper, we study this approach for random design model, under the assumption that a good estimator exists for the precision matrix of the design. Our analysis improves over the state of the art in that it establishes nearly optimal \emph{average} testing power if the sample size

n

asymptotically dominates

s_0 (\log p)^2

, with

s_0

being the sparsity level (number of non-zero coefficients). Earlier work obtains provable guarantees only for much larger sample size, namely it requires

n

to asymptotically dominate

(s_0 \log p)^2

. In particular, for random designs with a sparse precision matrix we show that an estimator thereof having the required properties can be computed efficiently. Finally, we evaluate this approach on synthetic data and compare it with earlier proposals.Comment: 21 pages, short version appears in Annual Allerton Conference on Communication, Control and Computing, 201

arXiv.org e-Print Archive

Crossref