Extended Bayesian Information Criteria for Gaussian Graphical Models
Gaussian graphical models with sparsity in the inverse covariance matrix are
of significant interest in many modern applications. For the problem of
recovering the graphical structure, information criteria provide useful
optimization objectives for algorithms searching through sets of graphs or for
selection of tuning parameters of other methods such as the graphical lasso,
which is a likelihood penalization technique. In this paper we establish the
consistency of an extended Bayesian information criterion for Gaussian
graphical models in a scenario where both the number of variables p and the
sample size n grow. Compared to earlier work on the regression case, our
treatment allows for growth in the number of non-zero parameters in the true
model, which is necessary in order to cover connected graphs. We demonstrate
the performance of this criterion on simulated data when used in conjunction
with the graphical lasso, and verify that the criterion indeed performs better
than either cross-validation or the ordinary Bayesian information criterion
when p and the number of non-zero parameters q both scale with n.
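The extended BIC described above augments the ordinary BIC edge penalty with a term that grows with the number of variables p. A minimal numpy sketch of this score, in the form studied for Gaussian graphical models (function names and the toy example are illustrative; gamma = 0 recovers the ordinary BIC, and in practice one would evaluate the score along a graphical-lasso path and keep the minimizer):

```python
import numpy as np

def ebic(theta, S, n, gamma=0.5):
    """Extended BIC for a Gaussian graphical model.

    theta : estimated precision matrix (p x p)
    S     : sample covariance matrix (p x p)
    n     : sample size
    gamma : EBIC parameter in [0, 1]; gamma = 0 gives the ordinary BIC.
    """
    p = theta.shape[0]
    # Gaussian log-likelihood up to an additive constant.
    loglik = (n / 2.0) * (np.linalg.slogdet(theta)[1] - np.trace(S @ theta))
    # Edge count = non-zero off-diagonal entries (upper triangle).
    edges = np.count_nonzero(theta[np.triu_indices(p, k=1)])
    return -2.0 * loglik + edges * np.log(n) + 4.0 * edges * gamma * np.log(p)

# Toy check: with theta = S^{-1} = I there are no edges, so the
# penalty vanishes and only the likelihood term remains.
val = ebic(np.eye(3), np.eye(3), n=10, gamma=0.5)
```

Selecting the graphical-lasso tuning parameter then amounts to computing `ebic` for each estimate on a regularization path and choosing the smallest value.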
Penalized EM algorithm and copula skeptic graphical models for inferring networks for mixed variables
In this article, we consider the problem of reconstructing networks for
continuous, binary, count and discrete ordinal variables by estimating a sparse
precision matrix in Gaussian copula graphical models. We propose two
approaches: penalized extended rank likelihood with Monte Carlo
Expectation-Maximization algorithm (copula EM glasso) and copula skeptic with
pair-wise copula estimation for copula Gaussian graphical models. The proposed
approaches help to infer networks arising from nonnormal and mixed variables.
We demonstrate the performance of our methods through simulation studies and
analysis of breast cancer genomic and clinical data and maize genetics data.
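The pair-wise, rank-based estimation used by skeptic-style copula methods rests on the classical relation between Kendall's tau and the Gaussian copula correlation, corr = sin(pi * tau / 2), which avoids assuming normal marginals. A self-contained numpy sketch of that transform (names are illustrative, and ties handling for discrete or ordinal margins is omitted):

```python
import numpy as np

def tau_to_corr(x, y):
    """Rank-based estimate of a copula correlation entry.

    Computes Kendall's tau by pairwise sign concordance (O(n^2)),
    then maps it to a correlation via sin(pi * tau / 2).
    """
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    tau = 2.0 * s / (n * (n - 1))
    return np.sin(np.pi * tau / 2.0)

# Perfectly monotone pair: tau = 1, so the estimated correlation is 1.
r = tau_to_corr(np.array([1.0, 2.0, 3.0, 4.0]),
                np.array([2.0, 4.0, 6.0, 8.0]))
```

Filling a matrix with these entries and feeding it to the graphical lasso yields a precision-matrix estimate without fitting the marginals.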
Who Learns Better Bayesian Network Structures: Accuracy and Speed of Structure Learning Algorithms
Three classes of algorithms to learn the structure of Bayesian networks from
data are common in the literature: constraint-based algorithms, which use
conditional independence tests to learn the dependence structure of the data;
score-based algorithms, which use goodness-of-fit scores as objective functions
to maximise; and hybrid algorithms that combine both approaches.
Constraint-based and score-based algorithms have been shown to learn the same
structures when conditional independence and goodness of fit are both assessed
using entropy and the topological ordering of the network is known (Cowell,
2001).
In this paper, we investigate how these three classes of algorithms perform
outside the assumptions above in terms of speed and accuracy of network
reconstruction for both discrete and Gaussian Bayesian networks. We approach
this question by recognising that structure learning is defined by the
combination of a statistical criterion and an algorithm that determines how the
criterion is applied to the data. Removing the confounding effect of different
choices for the statistical criterion, we find using both simulated and
real-world complex data that constraint-based algorithms are often less
accurate than score-based algorithms, but are seldom faster (even at large
sample sizes); and that hybrid algorithms are neither faster nor more accurate
than constraint-based algorithms. This suggests that commonly held beliefs on
structure learning in the literature are strongly influenced by the choice of
particular statistical criteria rather than just by the properties of the
algorithms themselves.
Comment: 27 pages, 8 figures
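The "statistical criterion" that score-based algorithms maximise decomposes into per-node terms; for discrete networks a common choice is the BIC. A minimal sketch of one node's score given a candidate parent set (function name, data, and encoding are made up for illustration; a search procedure such as hill-climbing would sum these scores over all nodes of a candidate DAG):

```python
import numpy as np
from collections import Counter

def node_bic(data, child, parents, levels):
    """BIC score of one discrete node given a candidate parent set.

    data    : list of tuples, one per observation
    child   : index of the child variable
    parents : tuple of parent indices
    levels  : dict mapping variable index -> number of levels
    """
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # Multinomial log-likelihood of the child given its parents.
    loglik = sum(c * np.log(c / marg[pa]) for (pa, _), c in joint.items())
    # Free parameters: (levels(child) - 1) per parent configuration.
    nparams = (levels[child] - 1) * int(np.prod([levels[p] for p in parents]))
    return loglik - nparams / 2.0 * np.log(n)

data = [(0, 0), (0, 0), (1, 1), (1, 1)]
levels = {0: 2, 1: 2}
# X1 is a deterministic copy of X0, so the log-likelihood term is zero
# and only the parameter penalty remains.
score = node_bic(data, child=1, parents=(0,), levels=levels)
```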
High dimensional Sparse Gaussian Graphical Mixture Model
This paper considers the problem of networks reconstruction from
heterogeneous data using a Gaussian Graphical Mixture Model (GGMM). It is well
known that parameter estimation in this context is challenging due to large
numbers of variables coupled with the degeneracy of the likelihood. We propose
as a solution a penalized maximum likelihood technique by imposing an ℓ1
penalty on the precision matrix. Our approach shrinks the parameters, thereby
resulting in better identifiability and variable selection. We use the
Expectation Maximization (EM) algorithm which involves the graphical LASSO to
estimate the mixing coefficients and the precision matrices. We show that under
certain regularity conditions the Penalized Maximum Likelihood (PML) estimates
are consistent. We demonstrate the performance of the PML estimator through
simulations and we show the utility of our method for high dimensional data
analysis in a genomic application.
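In an EM algorithm of this kind, the E-step computes cluster responsibilities from the current mixture parameters, and the M-step would re-estimate each component's precision matrix with the graphical lasso using those responsibilities as observation weights. A sketch of the E-step written directly in terms of precision matrices (names and the toy data are illustrative; the penalized M-step is omitted):

```python
import numpy as np

def e_step(X, weights, means, precisions):
    """E-step of a Gaussian mixture parameterised by precision matrices.

    Returns the n x K matrix of responsibilities.
    """
    n, p = X.shape
    K = len(weights)
    log_dens = np.empty((n, K))
    for k in range(K):
        diff = X - means[k]
        _, logdet = np.linalg.slogdet(precisions[k])
        # Quadratic form diff_i' Theta_k diff_i for every observation i.
        quad = np.einsum('ij,jk,ik->i', diff, precisions[k], diff)
        log_dens[:, k] = np.log(weights[k]) + 0.5 * logdet - 0.5 * quad
    log_dens -= log_dens.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_dens)
    return resp / resp.sum(axis=1, keepdims=True)

# Two well-separated points: each is assigned almost entirely to the
# component centred on it.
X = np.array([[0.0, 0.0], [5.0, 5.0]])
resp = e_step(X, weights=[0.5, 0.5],
              means=[np.zeros(2), np.full(2, 5.0)],
              precisions=[np.eye(2), np.eye(2)])
```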