Exact Post Model Selection Inference for Marginal Screening
We develop a framework for post model selection inference, via marginal
screening, in linear regression. At the core of this framework is a result that
characterizes the exact distribution of linear functions of the response,
conditional on the model being selected (the "condition on selection" framework).
This allows us to construct valid confidence intervals and hypothesis tests for
regression coefficients that account for the selection procedure. In contrast
to recent work in high-dimensional statistics, our results are exact
(non-asymptotic) and require no eigenvalue-like assumptions on the design
matrix. Furthermore, the computational cost of marginal regression,
constructing confidence intervals and hypothesis testing is negligible compared
to the cost of linear regression, thus making our methods particularly suitable
for extremely large datasets. Although we focus on marginal screening to
illustrate the applicability of the condition on selection framework, this
framework is much more broadly applicable. We show how to apply the proposed
framework to several other selection procedures including orthogonal matching
pursuit, non-negative least squares, and marginal screening+Lasso.
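As a rough illustration of the selection step only, marginal screening can be sketched as keeping the k predictors with the largest absolute inner product with the response. The function name `marginal_screening` and the simulated data are assumptions made for this sketch; the conditional inference described in the abstract is not implemented here.

```python
import numpy as np

def marginal_screening(X, y, k):
    """Return the indices of the k columns of X with the largest |x_j^T y|.

    Hypothetical helper for illustration: it performs the selection step only
    and does not compute the post-selection confidence intervals.
    """
    scores = np.abs(X.T @ y)
    # argsort is ascending, so reverse it and keep the first k indices
    return np.argsort(scores)[::-1][:k]

# Simulated example (assumed data, not taken from the paper)
rng = np.random.default_rng(0)
n, p, k = 500, 50, 3
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize columns
beta = np.zeros(p)
beta[[4, 17, 30]] = [5.0, -4.0, 3.0]
y = X @ beta + rng.standard_normal(n)

selected = marginal_screening(X, y, k)
print(sorted(selected.tolist()))
```

Because the screening step is a single matrix-vector product, its cost is small relative to fitting a full regression, which is consistent with the computational claim in the abstract.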
Estimation and Inference for High Dimensional Generalized Linear Models: A Splitting and Smoothing Approach
The focus of modern biomedical studies has gradually shifted to explanation
and estimation of joint effects of high dimensional predictors on disease
risks. Quantifying uncertainty in these estimates may provide valuable insight
into prevention strategies or treatment decisions for both patients and
physicians. High dimensional inference, including confidence intervals and
hypothesis testing, has sparked much interest. While much work has been done in
the linear regression setting, there is a lack of literature on inference for
high dimensional generalized linear models. We propose a novel and
computationally feasible method, which accommodates a variety of outcome types,
including normal, binomial, and Poisson data. We use a "splitting and
smoothing" approach, which splits samples into two parts, performs variable
selection using one part and conducts partial regression with the other part.
Averaging the estimates over multiple random splits, we obtain the smoothed
estimates, which are numerically stable. We show that the estimates are
consistent and asymptotically normal, and we construct confidence intervals with
proper coverage probabilities for all predictors. We examine the finite sample
performance of our method by comparing it with existing methods and
applying it to a lung cancer cohort study.
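The splitting and smoothing idea described in this abstract can be sketched for the normal-outcome case as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the selection rule (top-k marginal scores), the function names `top_k_by_marginal_score` and `split_and_smooth`, and the use of ordinary least squares for the partial regression are all choices made here for illustration, and no confidence intervals are computed.

```python
import numpy as np

def top_k_by_marginal_score(X, y, k):
    # Assumed stand-in for the variable selection step; the abstract does not
    # say which selector is used.
    return np.argsort(np.abs(X.T @ y))[::-1][:k]

def split_and_smooth(X, y, k, n_splits=20, seed=0):
    """Average partial-regression estimates over random half-splits."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    estimates = np.zeros((n_splits, p))
    for b in range(n_splits):
        perm = rng.permutation(n)
        sel_idx, fit_idx = perm[: n // 2], perm[n // 2:]
        # Step 1: select variables using the first half only
        selected = top_k_by_marginal_score(X[sel_idx], y[sel_idx], k)
        # Step 2: on the second half, regress y on predictor j plus the
        # selected variables, and keep the coefficient of predictor j
        for j in range(p):
            cols = [j] + [s for s in selected if s != j]
            design = np.column_stack(
                [np.ones(len(fit_idx)), X[fit_idx][:, cols]]
            )
            coef, *_ = np.linalg.lstsq(design, y[fit_idx], rcond=None)
            estimates[b, j] = coef[1]
    # Step 3: smooth by averaging over the random splits
    return estimates.mean(axis=0)

# Simulated example (assumed data): only the first predictor has an effect
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = 2.0 * X[:, 0] + rng.standard_normal(200)
smoothed = split_and_smooth(X, y, k=3)
print(np.round(smoothed, 2))
```

Selecting on one half and estimating on the other keeps the estimation step independent of the selection step within each split; averaging over splits then reduces the variability introduced by any single random partition.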
Targeted Undersmoothing
This paper proposes a post-model selection inference procedure, called
targeted undersmoothing, designed to construct uniformly valid confidence sets
for a broad class of functionals of sparse high-dimensional statistical models.
These include dense functionals, which may potentially depend on all elements
of an unknown high-dimensional parameter. The proposed confidence sets are
based on an initially selected model and two additionally selected models, an
upper model and a lower model, which enlarge the initially selected model. We
illustrate application of the procedure in two empirical examples. The first
example considers estimation of heterogeneous treatment effects using data from
the Job Training Partnership Act of 1982, and the second example looks at
estimating profitability from a mailing strategy based on estimated
heterogeneous treatment effects in a direct mail marketing campaign. We also
provide evidence on the finite sample performance of the proposed targeted
undersmoothing procedure through a series of simulation experiments.
Statistical Inference For High Dimensional Models In Genomics And Microbiome
The human microbiome consists of all living microorganisms that are in and on the human body. Large-scale microbiome studies, such as the NIH Human Microbiome Project (HMP), have shown that this complex ecosystem has a large impact on human health in multiple ways. The analysis of these datasets leads to new statistical challenges that require the development of novel methodologies. Motivated by several microbiome studies, we develop several methods of statistical inference for high dimensional models to address the association between microbiome compositions and certain outcomes. The high dimensionality and compositional nature of microbiome data make the naive application of classical regression models invalid. To study the association between microbiome compositions and disease risk, we develop a generalized linear model with linear constraints on the regression coefficients and a related debiased procedure to obtain asymptotically unbiased and normally distributed estimates. Application of this method to an inflammatory bowel disease (IBD) study identifies several gut bacterial species that are associated with the risk of IBD. We also consider post-selection inference for models with linear equality constraints, where we develop methods for constructing confidence intervals for the non-zero coefficients selected by a Lasso-type estimator with linear constraints. These confidence intervals are shown to have the desired coverage probabilities when conditioned on the selected model. Finally, the last chapter of this dissertation presents a method for inference in high dimensional instrumental variable regression. Gene expression and phenotype association can be affected by potential unmeasured confounders, leading to biased estimates of the associations. Using genetic variants as instruments, we consider the problem of hypothesis testing for sparse IV regression models and present methods for testing both single and multiple regression coefficients. A multiple testing procedure is developed for selecting variables and is shown to control the false discovery rate. These methods are illustrated by an analysis of a yeast dataset in order to identify genes that are associated with growth in the presence of hydrogen peroxide.