5 research outputs found

    Joint Nonparametric Precision Matrix Estimation with Confounding

    Full text link
    We consider the problem of precision matrix estimation where, due to extraneous confounding of the underlying precision matrix, the data are independent but not identically distributed. While such confounding occurs in many scientific problems, our approach is inspired by recent neuroscientific research suggesting that brain function, as measured using functional magnetic resonance imagine (fMRI), is susceptible to confounding by physiological noise such as breathing and subject motion. Following the scientific motivation, we propose a graphical model, which in turn motivates a joint nonparametric estimator. We provide theoretical guarantees for the consistency and the convergence rate of the proposed estimator. In addition, we demonstrate that the optimization of the proposed estimator can be transformed into a series of linear programming problems, and thus be efficiently solved in parallel. Empirical results are presented using simulated and real brain imaging data, which suggest that our approach improves precision matrix estimation, as compared to baselines, when confounding is present

    A Flexible Framework for Hypothesis Testing in High-dimensions

    Full text link
    Hypothesis testing in the linear regression model is a fundamental statistical problem. We consider linear regression in the high-dimensional regime where the number of parameters exceeds the number of samples (p>np> n). In order to make informative inference, we assume that the model is approximately sparse, that is the effect of covariates on the response can be well approximated by conditioning on a relatively small number of covariates whose identities are unknown. We develop a framework for testing very general hypotheses regarding the model parameters. Our framework encompasses testing whether the parameter lies in a convex cone, testing the signal strength, and testing arbitrary functionals of the parameter. We show that the proposed procedure controls the type I error, and also analyze the power of the procedure. Our numerical experiments confirm our theoretical findings and demonstrate that we control false positive rate (type I error) near the nominal level, and have high power. By duality between hypotheses testing and confidence intervals, the proposed framework can be used to obtain valid confidence intervals for various functionals of the model parameters. For linear functionals, the length of confidence intervals is shown to be minimax rate optimal.Comment: 45 page

    Simultaneous Inference for Pairwise Graphical Models with Generalized Score Matching

    Full text link
    Probabilistic graphical models provide a flexible yet parsimonious framework for modeling dependencies among nodes in networks. There is a vast literature on parameter estimation and consistent model selection for graphical models. However, in many of the applications, scientists are also interested in quantifying the uncertainty associated with the estimated parameters and selected models, which current literature has not addressed thoroughly. In this paper, we propose a novel estimator for statistical inference on edge parameters in pairwise graphical models based on generalized Hyv\"arinen scoring rule. Hyv\"arinen scoring rule is especially useful in cases where the normalizing constant cannot be obtained efficiently in a closed form, which is a common problem for graphical models, including Ising models and truncated Gaussian graphical models. Our estimator allows us to perform statistical inference for general graphical models whereas the existing works mostly focus on statistical inference for Gaussian graphical models where finding normalizing constant is computationally tractable. Under mild conditions that are typically assumed in the literature for consistent estimation, we prove that our proposed estimator is n\sqrt{n}-consistent and asymptotically normal, which allows us to construct confidence intervals and build hypothesis tests for edge parameters. Moreover, we show how our proposed method can be applied to test hypotheses that involve a large number of model parameters simultaneously. We illustrate validity of our estimator through extensive simulation studies on a diverse collection of data-generating processes

    Two-sample inference for high-dimensional Markov networks

    Full text link
    Markov networks are frequently used in sciences to represent conditional independence relationships underlying observed variables arising from a complex system. It is often of interest to understand how an underlying network differs between two conditions. In this paper, we develop methodology for performing valid statistical inference for difference between parameters of Markov network in a high-dimensional setting where the number of observed variables is allowed to be larger than the sample size. Our proposal is based on the regularized Kullback-Leibler Importance Estimation Procedure that allows us to directly learn the parameters of the differential network, without requiring for separate or joint estimation of the individual Markov network parameters. This allows for applications in cases where individual networks are not sparse, such as networks that contain hub nodes, but the differential network is sparse. We prove that our estimator is regular and its distribution can be well approximated by a normal under wide range of data generating processes and, in particular, is not sensitive to model selection mistakes. Furthermore, we develop a new testing procedure for equality of Markov networks, which is based on a max-type statistics. A valid bootstrap procedure is developed that approximates quantiles of the test statistics. The performance of the methodology is illustrated through extensive simulations and real data examples.Comment: 84 pages, 16 figures, 7 table

    Quantile Graphical Models: Prediction and Conditional Independence with Applications to Systemic Risk

    Full text link
    We propose two types of Quantile Graphical Models (QGMs) --- Conditional Independence Quantile Graphical Models (CIQGMs) and Prediction Quantile Graphical Models (PQGMs). CIQGMs characterize the conditional independence of distributions by evaluating the distributional dependence structure at each quantile index. As such, CIQGMs can be used for validation of the graph structure in the causal graphical models (\cite{pearl2009causality, robins1986new, heckman2015causal}). One main advantage of these models is that we can apply them to large collections of variables driven by non-Gaussian and non-separable shocks. PQGMs characterize the statistical dependencies through the graphs of the best linear predictors under asymmetric loss functions. PQGMs make weaker assumptions than CIQGMs as they allow for misspecification. Because of QGMs' ability to handle large collections of variables and focus on specific parts of the distributions, we could apply them to quantify tail interdependence. The resulting tail risk network can be used for measuring systemic risk contributions that help make inroads in understanding international financial contagion and dependence structures of returns under downside market movements. We develop estimation and inference methods for QGMs focusing on the high-dimensional case, where the number of variables in the graph is large compared to the number of observations. For CIQGMs, these methods and results include valid simultaneous choices of penalty functions, uniform rates of convergence, and confidence regions that are simultaneously valid. We also derive analogous results for PQGMs, which include new results for penalized quantile regressions in high-dimensional settings to handle misspecification, many controls, and a continuum of additional conditioning events
    corecore