Joint Nonparametric Precision Matrix Estimation with Confounding
We consider the problem of precision matrix estimation where, due to
extraneous confounding of the underlying precision matrix, the data are
independent but not identically distributed. While such confounding occurs in
many scientific problems, our approach is inspired by recent neuroscientific
research suggesting that brain function, as measured using functional magnetic
resonance imaging (fMRI), is susceptible to confounding by physiological noise
such as breathing and subject motion. Following the scientific motivation, we
propose a graphical model, which in turn motivates a joint nonparametric
estimator. We provide theoretical guarantees for the consistency and the
convergence rate of the proposed estimator. In addition, we demonstrate that
the optimization of the proposed estimator can be transformed into a series of
linear programming problems, and thus be efficiently solved in parallel.
Empirical results are presented using simulated and real brain imaging data,
which suggest that our approach improves precision matrix estimation, as
compared to baselines, when confounding is present.
A Flexible Framework for Hypothesis Testing in High-dimensions
Hypothesis testing in the linear regression model is a fundamental
statistical problem. We consider linear regression in the high-dimensional
regime where the number of parameters exceeds the number of samples ($p > n$).
In order to make informative inference, we assume that the model is
approximately sparse, that is, the effect of covariates on the response can be
well approximated by conditioning on a relatively small number of covariates
whose identities are unknown. We develop a framework for testing very general
hypotheses regarding the model parameters. Our framework encompasses testing
whether the parameter lies in a convex cone, testing the signal strength, and
testing arbitrary functionals of the parameter. We show that the proposed
procedure controls the type I error, and also analyze the power of the
procedure. Our numerical experiments confirm our theoretical findings and
demonstrate that we control the false positive rate (type I error) near the nominal
level, and have high power. By the duality between hypothesis testing and
confidence intervals, the proposed framework can be used to obtain valid
confidence intervals for various functionals of the model parameters. For
linear functionals, the length of confidence intervals is shown to be minimax
rate optimal.
Comment: 45 pages
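The duality between hypothesis tests and confidence intervals mentioned in this abstract can be illustrated in a much simpler setting than the paper's. The sketch below is not the authors' procedure: it assumes a one-dimensional mean with known standard deviation and a two-sided z-test, and the function names, grid step, and search width are hypothetical choices made for illustration. The interval is the set of null values that the test fails to reject.

```python
from statistics import NormalDist, mean


def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, assuming a known std. dev. sigma."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def confidence_interval_by_inversion(sample, sigma, alpha=0.05,
                                     grid_step=0.001, half_width=5.0):
    """Return the range of grid values mu0 that the z-test does not reject.

    grid_step and half_width are illustrative assumptions; the grid is
    centered at the sample mean, which is never rejected.
    """
    center = mean(sample)
    steps = int(half_width / grid_step)
    accepted = []
    for k in range(-steps, steps + 1):
        mu0 = center + k * grid_step
        if not z_test_rejects(sample, mu0, sigma, alpha):
            accepted.append(mu0)
    return min(accepted), max(accepted)
```

Up to the grid resolution, the result should agree with the textbook interval, sample mean plus or minus the critical value times sigma / sqrt(n).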
Simultaneous Inference for Pairwise Graphical Models with Generalized Score Matching
Probabilistic graphical models provide a flexible yet parsimonious framework
for modeling dependencies among nodes in networks. There is a vast literature
on parameter estimation and consistent model selection for graphical models.
However, in many applications, scientists are also interested in
quantifying the uncertainty associated with the estimated parameters and
selected models, which current literature has not addressed thoroughly. In this
paper, we propose a novel estimator for statistical inference on edge
parameters in pairwise graphical models based on the generalized Hyvärinen
scoring rule. The Hyvärinen scoring rule is especially useful in cases where the
normalizing constant cannot be obtained efficiently in a closed form, which is
a common problem for graphical models, including Ising models and truncated
Gaussian graphical models. Our estimator allows us to perform statistical
inference for general graphical models, whereas existing work mostly focuses
on statistical inference for Gaussian graphical models, where finding the
normalizing constant is computationally tractable. Under mild conditions that
are typically assumed in the literature for consistent estimation, we prove
that our proposed estimator is $\sqrt{n}$-consistent and asymptotically normal,
which allows us to construct confidence intervals and build hypothesis tests
for edge parameters. Moreover, we show how our proposed method can be applied
to test hypotheses that involve a large number of model parameters
simultaneously. We illustrate the validity of our estimator through extensive
simulation studies on a diverse collection of data-generating processes.
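To see why the Hyvärinen scoring rule sidesteps the normalizing constant, consider a toy case that is far simpler than the pairwise graphical models of the paper. The sketch below assumes a one-dimensional unnormalized density q(x; k) = exp(-k x^2 / 2) with unknown precision k; this model and the function names are illustrative assumptions, not the paper's generalized estimator. The objective uses only derivatives of log q with respect to x, so the normalizing constant never appears.

```python
import random


def score_matching_objective(k, xs):
    """Empirical Hyvarinen score for q(x; k) = exp(-k * x**2 / 2).

    d/dx log q = -k * x and d^2/dx^2 log q = -k, so the objective is
    the sample mean of 0.5 * (k * x)**2 - k.
    """
    return sum(0.5 * (k * x) ** 2 - k for x in xs) / len(xs)


def score_matching_precision(xs):
    """Closed-form minimizer of the objective above: k = 1 / mean(x**2)."""
    return len(xs) / sum(x * x for x in xs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data with standard deviation 2, i.e. true precision 0.25.
    data = [rng.gauss(0.0, 2.0) for _ in range(20000)]
    print(score_matching_precision(data))
```

Setting the derivative of the objective with respect to k to zero gives k * mean(x^2) - 1 = 0, which is where the closed form comes from.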
Two-sample inference for high-dimensional Markov networks
Markov networks are frequently used in the sciences to represent conditional
independence relationships underlying observed variables arising from a complex
system. It is often of interest to understand how an underlying network differs
between two conditions. In this paper, we develop methodology for performing
valid statistical inference for the difference between parameters of Markov networks
in a high-dimensional setting where the number of observed variables is allowed
to be larger than the sample size. Our proposal is based on the regularized
Kullback-Leibler Importance Estimation Procedure that allows us to directly
learn the parameters of the differential network, without requiring
separate or joint estimation of the individual Markov network parameters. This
allows for applications in cases where individual networks are not sparse, such
as networks that contain hub nodes, but the differential network is sparse. We
prove that our estimator is regular and its distribution can be well
approximated by a normal distribution under a wide range of data generating processes and, in
particular, is not sensitive to model selection mistakes. Furthermore, we
develop a new testing procedure for equality of Markov networks, which is based
on a max-type statistic. A valid bootstrap procedure is developed that
approximates quantiles of the test statistic. The performance of the
methodology is illustrated through extensive simulations and real data
examples.
Comment: 84 pages, 16 figures, 7 tables
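The abstract does not say which bootstrap is used, so the following is only a generic sketch of how a max-type statistic and a bootstrap quantile can fit together. It assumes a Gaussian multiplier bootstrap and a hypothetical input `rows`, where each row holds one observation's contribution to every coordinate being tested (for example, estimated influence terms). None of these names or choices come from the paper.

```python
import random
from math import sqrt


def column_means(rows):
    """Mean of each coordinate across observations."""
    n, d = len(rows), len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(d)]


def max_statistic(rows):
    """sqrt(n) times the largest absolute coordinate mean."""
    return sqrt(len(rows)) * max(abs(m) for m in column_means(rows))


def multiplier_bootstrap_quantile(rows, level=0.95, draws=500, seed=0):
    """Approximate a quantile of the max statistic under the null.

    Each draw reweights the centered rows with independent N(0, 1)
    multipliers (an assumed bootstrap scheme, chosen for illustration).
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    means = column_means(rows)
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    stats = []
    for _ in range(draws):
        weights = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sums = [sum(weights[i] * centered[i][j] for i in range(n))
                for j in range(d)]
        stats.append(max(abs(s) for s in sums) / sqrt(n))
    stats.sort()
    return stats[min(draws - 1, int(level * draws))]
```

A test of equality would then reject when `max_statistic(rows)` exceeds `multiplier_bootstrap_quantile(rows)`.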
Quantile Graphical Models: Prediction and Conditional Independence with Applications to Systemic Risk
We propose two types of Quantile Graphical Models (QGMs): Conditional
Independence Quantile Graphical Models (CIQGMs) and Prediction Quantile
Graphical Models (PQGMs). CIQGMs characterize the conditional independence of
distributions by evaluating the distributional dependence structure at each
quantile index. As such, CIQGMs can be used for validation of the graph
structure in the causal graphical models (\cite{pearl2009causality,
robins1986new, heckman2015causal}). One main advantage of these models is that
we can apply them to large collections of variables driven by non-Gaussian and
non-separable shocks. PQGMs characterize the statistical dependencies through
the graphs of the best linear predictors under asymmetric loss functions. PQGMs
make weaker assumptions than CIQGMs as they allow for misspecification. Because
of QGMs' ability to handle large collections of variables and focus on specific
parts of the distributions, we can apply them to quantify tail
interdependence. The resulting tail risk network can be used for measuring
systemic risk contributions that help make inroads in understanding
international financial contagion and dependence structures of returns under
downside market movements.
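A standard example of an asymmetric loss in quantile regression is the check (pinball) loss. The sketch below assumes that loss and restricts attention to constant predictors, a deliberately tiny special case with hypothetical function names. It is meant only to show how the quantile index tau tilts the loss so that its minimizer is a sample quantile, not to reproduce the PQGM estimator.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0 and (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant_predictor(ys, tau):
    """Sample value minimizing the total pinball loss over constants.

    The total loss is piecewise linear and convex in the constant, so a
    minimizer is always attained at one of the sample points.
    """
    return min(ys, key=lambda c: sum(pinball_loss(y - c, tau) for y in ys))
```

With tau = 0.5 the loss is symmetric and the minimizer is a median; with tau close to 1 under-prediction is penalized more heavily, which pushes the minimizer into the upper tail.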
We develop estimation and inference methods for QGMs focusing on the
high-dimensional case, where the number of variables in the graph is large
compared to the number of observations. For CIQGMs, these methods and results
include valid simultaneous choices of penalty functions, uniform rates of
convergence, and confidence regions that are simultaneously valid. We also
derive analogous results for PQGMs, which include new results for penalized
quantile regressions in high-dimensional settings to handle misspecification,
many controls, and a continuum of additional conditioning events.