Selective Inference and Learning Mixed Graphical Models
This thesis studies two problems in modern statistics. First, we study
selective inference, or inference for hypotheses that are chosen after looking
at the data. The motivating application is inference for regression coefficients
selected by the lasso. We present the Condition-on-Selection method, which allows
for valid selective inference, and study its application to the lasso and
several other selection algorithms.
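The condition-on-selection idea can be illustrated in a toy one-dimensional setting (a hedged sketch, not the lasso case from the thesis): suppose X ~ N(mu, 1) and we only test H0: mu = 0 when the selection event X > c occurs. A valid selective p-value replaces the naive tail probability with one computed under the normal distribution truncated to the selection event.

```python
from scipy.stats import norm

def selective_p_value(x, c):
    """Selective p-value for H0: mu = 0 given the selection event X > c.

    Under H0, X ~ N(0, 1) truncated to (c, inf), so the p-value is
    P(X > x | X > c) = sf(x) / sf(c).
    """
    return norm.sf(x) / norm.sf(c)

x, c = 2.5, 2.0          # observed value and selection threshold (toy numbers)
naive = norm.sf(x)       # ignores that we only tested because X > c
selective = selective_p_value(x, c)
print(naive, selective)  # the selective p-value is larger, restoring validity
```

Because the selective p-value conditions on having passed the threshold, it is always larger than the naive one, which is exactly what protects the type I error after selection.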
In the second part, we consider the problem of learning the structure of a
pairwise graphical model over continuous and discrete variables. We present a
new pairwise model for graphical models with both continuous and discrete
variables that is amenable to structure learning. In previous work, authors
have considered structure learning of Gaussian graphical models and structure
learning of discrete models. Our approach is a natural generalization of these
two lines of work to the mixed case. The penalization scheme involves a novel
symmetric use of the group-lasso norm and follows naturally from a particular
parametrization of the model. We provide conditions under which our estimator
is model selection consistent in the high-dimensional regime.
Comment: Jason D. Lee PhD Dissertation
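As a hedged illustration of such a penalization scheme (the grouping and block shapes below are assumptions for the sketch, not the thesis's exact parametrization): all parameters attached to one edge form a single group, and the penalty sums the Euclidean/Frobenius norms of these groups, so an entire edge is zeroed out at once regardless of whether its endpoints are continuous or discrete.

```python
import numpy as np

def group_lasso_penalty(edge_blocks, lam):
    """Group-lasso penalty: lam * sum of Frobenius norms of edge parameter blocks.

    edge_blocks maps an edge (s, t) to its parameter block, e.g. a scalar for a
    continuous-continuous edge, a vector for a continuous-discrete edge, and a
    matrix for a discrete-discrete edge.
    """
    return lam * sum(np.linalg.norm(B) for B in edge_blocks.values())

blocks = {
    ("x1", "x2"): np.array([[0.7]]),           # continuous-continuous: scalar
    ("x1", "z1"): np.array([0.3, -0.4]),       # continuous-discrete: vector
    ("z1", "z2"): np.array([[3.0, 4.0],
                            [0.0, 0.0]]),      # discrete-discrete: matrix
}
print(group_lasso_penalty(blocks, lam=0.1))
```

Norm-per-group (rather than an entrywise l1 norm) is what makes the selection symmetric: the edge between z1 and z2 is kept or dropped as a unit, never one entry of its matrix at a time.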
A Flexible Framework for Hypothesis Testing in High-dimensions
Hypothesis testing in the linear regression model is a fundamental
statistical problem. We consider linear regression in the high-dimensional
regime where the number of parameters exceeds the number of samples.
In order to make informative inference, we assume that the model is
approximately sparse; that is, the effect of covariates on the response can be
well approximated by conditioning on a relatively small number of covariates
whose identities are unknown. We develop a framework for testing very general
hypotheses regarding the model parameters. Our framework encompasses testing
whether the parameter lies in a convex cone, testing the signal strength, and
testing arbitrary functionals of the parameter. We show that the proposed
procedure controls the type I error, and also analyze the power of the
procedure. Our numerical experiments confirm our theoretical findings and
demonstrate that we control the false positive rate (type I error) near the nominal
level and have high power. By duality between hypothesis testing and
confidence intervals, the proposed framework can be used to obtain valid
confidence intervals for various functionals of the model parameters. For
linear functionals, the length of confidence intervals is shown to be minimax
rate optimal.
Comment: 45 pages
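The test-inversion step can be sketched generically (a minimal sketch assuming an asymptotically normal estimator of the functional with a known standard error, not the paper's specific construction): inverting the level-alpha two-sided z-test of theta = theta0 yields a (1 - alpha) confidence interval for the functional.

```python
from scipy.stats import norm

def z_confidence_interval(theta_hat, se, alpha=0.05):
    """Invert the two-sided z-test: CI = theta_hat +/- z_{1-alpha/2} * se."""
    z = norm.ppf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se

lo, hi = z_confidence_interval(theta_hat=1.0, se=0.5)
print(lo, hi)  # roughly (0.02, 1.98) at the 95% level
```

The duality runs both ways: theta0 lies inside the interval exactly when the z-test of theta = theta0 fails to reject at level alpha.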