Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity
Functional brain networks are well described and estimated from data with
Gaussian Graphical Models (GGMs), e.g. using sparse inverse covariance
estimators. Comparing functional connectivity of subjects in two populations
calls for comparing these estimated GGMs. Our goal is to identify differences
in GGMs known to have similar structure. We characterize the uncertainty of
differences with confidence intervals obtained using a parametric distribution
on parameters of a sparse estimator. Sparse penalties enable statistical
guarantees and interpretable models even in high-dimensional and low-sample
settings. Characterizing the distributions of sparse models is inherently
challenging as the penalties produce a biased estimator. Recent work invokes
the sparsity assumptions to effectively remove the bias from a sparse estimator
such as the lasso. These distributions can be used to give confidence intervals
on edges in GGMs, and by extension their differences. However, in the case of
comparing GGMs, these estimators do not make use of any assumed joint structure
among the GGMs. Inspired by priors from brain functional connectivity we derive
the distribution of parameter differences under a joint penalty when parameters
are known to be sparse in the difference. This leads us to introduce the
debiased multi-task fused lasso, whose distribution can be characterized in an
efficient manner. We then show how the debiased lasso and multi-task fused
lasso can be used to obtain confidence intervals on edge differences in GGMs.
We validate the techniques proposed on a set of synthetic examples as well as
a neuro-imaging dataset created for the study of autism.
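The step "confidence intervals on edges, and by extension their differences" can be illustrated with a minimal sketch. It assumes each population already has a debiased, approximately Gaussian estimate of one edge weight with a standard error, and that the two samples are independent. The function name and the numbers are hypothetical, and this is the simpler separate-estimator route, not the debiased multi-task fused lasso.

```python
from statistics import NormalDist


def difference_ci(est_a, se_a, est_b, se_b, level=0.95):
    """Confidence interval for est_a - est_b.

    Assumes both estimates are approximately Gaussian and independent,
    so the variance of the difference is the sum of the variances.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    diff = est_a - est_b
    se_diff = (se_a ** 2 + se_b ** 2) ** 0.5
    return diff - z * se_diff, diff + z * se_diff


# Hypothetical debiased weights of the same edge in two populations.
low, high = difference_ci(est_a=0.42, se_a=0.08, est_b=0.15, se_b=0.10)
edge_differs = not (low <= 0.0 <= high)
print(f"95% CI for the edge difference: [{low:.3f}, {high:.3f}]")
print("interval excludes zero:", edge_differs)
```

An interval that excludes zero flags the edge as different between the two populations at the chosen level. A joint penalty, as proposed in the abstract, would replace the two separate estimates with a single estimate of the difference.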
Familywise Error Rate Control via Knockoffs
We present a novel method for controlling the $k$-familywise error rate
($k$-FWER) in the linear regression setting using the knockoffs framework first
introduced by Barber and Candès. Our procedure, which we also refer to as
knockoffs, can be applied with any design matrix with at least as many
observations as variables, and does not require knowing the noise variance.
Unlike other multiple testing procedures which act directly on $p$-values,
knockoffs is specifically tailored to linear regression and implicitly accounts
for the statistical relationships between hypothesis tests of different
coefficients. We prove that knockoffs controls the $k$-FWER exactly in finite
samples and show in simulations that it provides superior power to alternative
procedures over a range of linear regression problems. We also discuss
extensions to controlling other Type I error rates such as the false exceedance
rate, and use it to identify candidates for mutations conferring
drug-resistance in HIV.
Comment: 15 pages, 3 figures. Updated reference
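For contrast with the knockoffs procedure, here is a minimal sketch of one of the "other multiple testing procedures which act directly on p-values": the single-step generalized Bonferroni rule of Lehmann and Romano. It rejects every hypothesis whose p-value is at most k * alpha / m, which keeps the probability of k or more false rejections at or below alpha. This is a swapped-in baseline, not the knockoffs method, and the p-values are hypothetical.

```python
def generalized_bonferroni(p_values, k, alpha=0.05):
    """Return the indices rejected at the threshold k * alpha / m.

    With k = 1 this is the classical Bonferroni correction.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    m = len(p_values)
    threshold = k * alpha / m
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for m = 8 regression coefficients.
p_values = [0.001, 0.004, 0.011, 0.020, 0.150, 0.300, 0.600, 0.900]
rejected_k1 = generalized_bonferroni(p_values, k=1)  # no false rejection tolerated
rejected_k3 = generalized_bonferroni(p_values, k=3)  # up to 2 false rejections tolerated
print(rejected_k1, rejected_k3)
```

Raising k loosens the threshold and so can only add rejections. The rule ignores any dependence between the tests, which is the gap the abstract says knockoffs addresses.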
Exact Post Model Selection Inference for Marginal Screening
We develop a framework for post model selection inference, via marginal
screening, in linear regression. At the core of this framework is a result that
characterizes the exact distribution of linear functions of the response $y$,
conditional on the model being selected ("condition on selection" framework).
This allows us to construct valid confidence intervals and hypothesis tests for
regression coefficients that account for the selection procedure. In contrast
to recent work in high-dimensional statistics, our results are exact
(non-asymptotic) and require no eigenvalue-like assumptions on the design
matrix $X$. Furthermore, the computational cost of marginal regression,
constructing confidence intervals and hypothesis testing is negligible compared
to the cost of linear regression, thus making our methods particularly suitable
for extremely large datasets. Although we focus on marginal screening to
illustrate the applicability of the condition on selection framework, this
framework is much more broadly applicable. We show how to apply the proposed
framework to several other selection procedures including orthogonal matching
pursuit, non-negative least squares, and marginal screening+Lasso.
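The "condition on selection" idea can be sketched for marginal screening as follows. Selecting the k columns with the largest absolute inner product with the response, together with their signs, is equivalent to a set of linear inequalities on the response. Holding the part of the response orthogonal to a direction eta fixed, those inequalities confine eta^T y to an interval, on which a Gaussian eta^T y is truncated. The sketch assumes isotropic Gaussian noise; all function names, the simulated data, and the choice of eta are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def marginal_screening(columns, y, k):
    """Select the k columns with the largest |x_j^T y|; return indices and signs."""
    scores = [dot(x, y) for x in columns]
    order = sorted(range(len(columns)), key=lambda j: abs(scores[j]), reverse=True)
    selected = sorted(order[:k])
    signs = {j: (1.0 if scores[j] >= 0 else -1.0) for j in selected}
    return selected, signs


def selection_constraints(columns, selected, signs):
    """Rows a such that the selection event is {y : a^T y <= 0 for every row}."""
    rows = []
    for i in selected:
        for j in range(len(columns)):
            if j in selected:
                continue
            # s_i x_i^T y >= +x_j^T y  and  s_i x_i^T y >= -x_j^T y
            for sign_j in (1.0, -1.0):
                rows.append([sign_j * xj - signs[i] * xi
                             for xi, xj in zip(columns[i], columns[j])])
    return rows


def truncation_interval(rows, y, eta):
    """Range of eta^T y allowed by the constraints when the component of y
    orthogonal to eta is held fixed (isotropic noise assumed)."""
    c = [e / dot(eta, eta) for e in eta]
    t = dot(eta, y)
    z = [yi - ci * t for yi, ci in zip(y, c)]
    lower, upper = float("-inf"), float("inf")
    for a in rows:
        ac, az = dot(a, c), dot(a, z)
        if ac < 0:
            lower = max(lower, -az / ac)
        elif ac > 0:
            upper = min(upper, -az / ac)
    return lower, t, upper


# Simulated data: only the first two columns carry signal.
random.seed(0)
n, p, k = 30, 6, 2
columns = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * a - 1.5 * b + random.gauss(0, 1) for a, b in zip(columns[0], columns[1])]

selected, signs = marginal_screening(columns, y, k)
rows = selection_constraints(columns, selected, signs)
eta = columns[selected[0]]  # illustrative direction: one selected column
lower, observed, upper = truncation_interval(rows, y, eta)
print(selected, lower, observed, upper)
```

The observed value always falls inside the interval, because the observed response satisfies every constraint. A p-value or confidence interval would then come from the Gaussian distribution truncated to that interval, which is omitted here.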
Nearly Optimal Sample Size in Hypothesis Testing for High-Dimensional Regression
We consider the problem of fitting the parameters of a high-dimensional
linear regression model. In the regime where the number of parameters $p$ is
comparable to or exceeds the sample size $n$, a successful approach uses an
$\ell_1$-penalized least squares estimator, known as Lasso. Unfortunately,
unlike for linear estimators (e.g., ordinary least squares), no
well-established method exists to compute confidence intervals or p-values on
the basis of the Lasso estimator. Very recently, a line of work
\cite{javanmard2013hypothesis, confidenceJM, GBR-hypothesis} has addressed this
problem by constructing a debiased version of the Lasso estimator. In this
paper, we study this approach for the random design model, under the assumption
that a good estimator exists for the precision matrix of the design. Our
analysis improves over the state of the art in that it establishes nearly
optimal average testing power if the sample size $n$ asymptotically
dominates $s_0 (\log p)^2$, with $s_0$ being the sparsity level (number of
non-zero coefficients). Earlier work obtains provable guarantees only for much
larger sample size, namely it requires $n$ to asymptotically dominate $(s_0 \log p)^2$.
In particular, for random designs with a sparse precision matrix we show that
an estimator thereof having the required properties can be computed
efficiently. Finally, we evaluate this approach on synthetic data and compare
it with earlier proposals.
Comment: 21 pages, short version appears in Annual Allerton Conference on
Communication, Control and Computing, 201
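The debiasing construction the abstract refers to can be written down compactly. The notation is an assumption, since the listing does not give it: $X$ is the $n \times p$ design, $y$ the response, $\hat\theta$ the Lasso solution, $M$ an estimate of the precision matrix of the design, $\hat\Sigma = X^\top X / n$, and $\hat\sigma$ a noise-level estimate.

```latex
% Debiased Lasso: add a one-step correction built from the residuals.
\hat\theta^{u} \;=\; \hat\theta \;+\; \frac{1}{n}\, M X^{\top} \bigl( y - X \hat\theta \bigr)

% Approximate Gaussian limit, which gives coordinate-wise confidence intervals:
\sqrt{n}\,\bigl( \hat\theta^{u} - \theta_0 \bigr) \;\approx\; \mathcal{N}\bigl( 0,\; \sigma^{2} M \hat\Sigma M^{\top} \bigr),
\qquad
\hat\theta^{u}_i \;\pm\; z_{1-\alpha/2}\, \hat\sigma \sqrt{ \frac{ [\, M \hat\Sigma M^{\top} ]_{ii} }{ n } }
```

The correction term removes, to first order, the shrinkage bias introduced by the penalty. This is why a good estimator of the precision matrix is the key assumption in the abstract.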