Generalized Permutohedra from Probabilistic Graphical Models
A graphical model encodes conditional independence relations via the Markov
properties. For an undirected graph these conditional independence relations
can be represented by a simple polytope known as the graph associahedron, which
can be constructed as a Minkowski sum of standard simplices. There is an
analogous polytope for conditional independence relations coming from a regular
Gaussian model, and it can be defined using multiinformation or relative
entropy. For directed acyclic graphical models and also for mixed graphical
models containing undirected, directed and bidirected edges, we give a
construction of this polytope, up to equivalence of normal fans, as a Minkowski
sum of matroid polytopes. Finally, we apply this geometric insight to construct
a new ordering-based search algorithm for causal inference via directed acyclic
graphical models.
Comment: Appendix B is expanded. Final version to appear in SIAM J. Discrete Math.
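As a hedged illustration of the Minkowski-sum construction referred to above, the following is the standard realization of the graph associahedron as a sum of simplices indexed by tubes (connected induced subgraphs); the notation T(G) and e_i is ours, not the abstract's:

```latex
% One standard realization (up to normal equivalence) of the graph
% associahedron of an undirected graph G on vertex set {1,...,n}:
% the Minkowski sum of the simplices spanned by the basis vectors of
% each tube t, i.e. each vertex subset inducing a connected subgraph.
P_G \;=\; \sum_{t \in \mathcal{T}(G)} \Delta_t,
\qquad
\Delta_t \;=\; \operatorname{conv}\{\, e_i : i \in t \,\}.
```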
Multisite Weather Generators Using Bayesian Networks: An Illustrative Case Study for Precipitation Occurrence
Many existing approaches for multisite weather generation try to capture several statistics of the observed data (e.g. pairwise correlations) in order to generate spatially and temporally consistent series. In this work we analyse the application of Bayesian networks to this problem, focusing on precipitation occurrence and considering a simple case study to illustrate the potential of this new approach. We use Bayesian networks to approximate the multivariate (multisite) probability distribution of observed gauge data, which is factorized according to the relevant (marginal and conditional) dependencies. This factorization allows the simulation of synthetic samples from the multivariate distribution, thus providing a sound and promising methodology for multisite precipitation series generation.
We acknowledge funding provided by the project MULTI-SDM (CGL2015-66583-R, MINECO/FEDER).
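As a rough illustration of the factorize-then-sample idea (not the authors' implementation), the sketch below fits conditional probability tables for binary rain occurrence along a small, hand-picked DAG and draws synthetic days by ancestral sampling; the sites, the DAG and the data are all hypothetical.

```python
# Minimal sketch: factorize a multi-site rain-occurrence distribution along a
# hand-fixed DAG, estimate the conditional probability tables by counting,
# and generate synthetic days by ancestral sampling. Data here are toy.
import numpy as np

rng = np.random.default_rng(0)
sites = ["A", "B", "C"]                              # hypothetical gauge sites
parents = {"A": [], "B": ["A"], "C": ["A", "B"]}     # assumed DAG structure

# toy observed occurrence data: rows = days, columns = sites (0/1)
data = rng.integers(0, 2, size=(1000, len(sites)))
idx = {s: i for i, s in enumerate(sites)}

def fit_cpt(site):
    """P(site = 1 | parent configuration), estimated with Laplace smoothing."""
    ps = parents[site]
    cpt = {}
    for cfg in np.ndindex(*(2,) * len(ps)):
        if ps:
            mask = np.all(data[:, [idx[p] for p in ps]] == cfg, axis=1)
        else:
            mask = np.ones(len(data), dtype=bool)
        cpt[cfg] = (data[mask, idx[site]].sum() + 1) / (mask.sum() + 2)
    return cpt

cpts = {s: fit_cpt(s) for s in sites}

def sample_day():
    """Ancestral sampling in topological order (A, B, C)."""
    day = {}
    for s in sites:
        cfg = tuple(day[p] for p in parents[s])
        day[s] = int(rng.random() < cpts[s][cfg])
    return day

synthetic = [sample_day() for _ in range(5)]
print(synthetic)
```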
The IBMAP approach for Markov networks structure learning
In this work we consider the problem of learning the structure of Markov
networks from data. We present an approach for tackling this problem called
IBMAP, together with an efficient instantiation of the approach: the IBMAP-HC
algorithm, designed to avoid important limitations of existing
independence-based algorithms. These algorithms proceed by performing
statistical independence tests on data, completely trusting the outcome of
each test. In practice, tests may be incorrect, resulting in potential
cascading errors and a consequent reduction in the quality of the learned
structures. IBMAP accounts for this uncertainty in the test outcomes through
a probabilistic maximum-a-posteriori approach. The approach is instantiated
in the IBMAP-HC algorithm, a structure selection strategy that performs a
polynomial heuristic local search in the space of possible structures. We
present an extensive empirical evaluation on synthetic and real data, showing
that our algorithm significantly outperforms current independence-based
algorithms in terms of data efficiency and quality of the learned structures,
with equivalent computational complexity. We also show the performance of
IBMAP-HC in a real-world knowledge discovery application: EDAs, evolutionary
algorithms that use structure learning in each generation to model the
distribution of the population. The experiments show that when IBMAP-HC is
used to learn the structure, EDAs improve their convergence to the optimum.
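The following is a schematic sketch of the kind of search the abstract describes: candidate structures are scored by how well the independence assertions they entail agree with noisy test outcomes, and a hill-climbing step keeps the best-scoring neighbour. The confidence values and the scoring rule here are simplified placeholders, not the IB-score defined in the paper.

```python
# Schematic MAP-style structure search: an edge asserts dependence, a missing
# edge asserts independence; each assertion is trusted only to the degree the
# (hypothetical) test confidences allow, and hill-climbing keeps the best
# single-edge change.
import itertools, math, random

random.seed(0)
n_vars = 5
pairs = list(itertools.combinations(range(n_vars), 2))

# Hypothetical test outcomes: for each pair of variables, an estimated
# probability that the pair is (conditionally) independent. In the real
# algorithm these confidences would be derived from tests on the data.
p_indep = {pair: random.uniform(0.05, 0.95) for pair in pairs}

def score(edges):
    """Sum of log-probabilities of the assertions entailed by the structure."""
    total = 0.0
    for pair in pairs:
        p = p_indep[pair]
        total += math.log(1.0 - p) if pair in edges else math.log(p)
    return total

def hill_climb(edges):
    """Greedy local search over single-edge additions and removals."""
    while True:
        neighbours = [edges ^ {pair} for pair in pairs]   # flip one edge
        best = max(neighbours, key=score)
        if score(best) <= score(edges):
            return edges
        edges = best

structure = hill_climb(frozenset())
print(sorted(structure), score(structure))
```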
The identification of informative genes from multiple datasets with increasing complexity
Background
In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of complex biological systems complicate the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selecting a Bayesian classifier with an appropriate level of complexity by evaluating predictive performance on independent datasets; (3) comparing the different gene selections and the influence of increasing model complexity; (4) functional analysis of the informative genes.
Results
In this paper, we identify the most appropriate model complexity using cross-validation and independent test-set validation for predicting gene expression in three published datasets related to myogenesis and muscle differentiation. Furthermore, we demonstrate that models trained on simpler datasets can be used to identify interactions among genes and to select the most informative ones. We also show that these models explain the myogenesis-related genes (genes of interest) significantly better than others (P < 0.004), since the improvement in their rankings is much more pronounced. Finally, after further evaluating our results on synthetic datasets, we show that our approach outperforms a concordance method by Lai et al. in identifying informative genes from multiple datasets of increasing complexity, while additionally modelling the interactions between genes.
Conclusions
We show that Bayesian networks derived from simpler controlled systems have better performance than those trained on datasets from more complex biological systems. Furthermore, we show that genes from the pool of differentially expressed genes that are highly predictive and consistent across independent datasets are more likely to be fundamentally involved in the biological process under study. We conclude that networks trained on simpler controlled systems, such as in vitro experiments, can be used to model and capture interactions among genes in more complex datasets, such as in vivo experiments, where these interactions would otherwise be concealed by a multitude of other ongoing events.
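As a loose illustration of step (2) of the framework, the sketch below compares classifiers of increasing complexity by cross-validation and then checks the chosen model on an independent dataset; scikit-learn models and synthetic data stand in for the paper's Bayesian classifiers and microarray datasets.

```python
# Minimal sketch of the model-complexity selection step: compare classifiers
# of increasing complexity (here, number of selected genes) by cross-validation
# on one dataset and confirm the choice on an independent dataset.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(60, 200)), rng.integers(0, 2, 60)  # toy "simple" dataset
X_indep, y_indep = rng.normal(size=(40, 200)), rng.integers(0, 2, 40)  # toy independent dataset

for k in (5, 20, 50):
    model = make_pipeline(SelectKBest(f_classif, k=k), GaussianNB())
    cv = cross_val_score(model, X_train, y_train, cv=5).mean()
    indep = model.fit(X_train, y_train).score(X_indep, y_indep)
    print(f"k={k:3d}  cv_accuracy={cv:.2f}  independent_accuracy={indep:.2f}")
```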
Evolutionary approaches for the reverse-engineering of gene regulatory networks: A study on a biologically realistic dataset
Background
Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian Network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and found the model that best fits the data without prior knowledge.
Results
We proposed various evolutionary strategies suitable for the task and tested our choices using simulated data drawn from a given bio-realistic network of 35 nodes, the so-called insulin network, which has been used in the literature for benchmarking. We assessed the inferred models against this reference to obtain statistical performance results. We then compared the performance of evolutionary algorithms using two kinds of recombination operators that operate at different scales in the graphs. We introduced a niching strategy that reinforces diversity across the population and avoids trapping the algorithm in a single local minimum in the early steps of learning. We show the limited effect of the mutation operator when niching is applied. Finally, we compared our best evolutionary approach with various well-known learning algorithms (MCMC, K2, greedy search, TPDA, MMHC) devoted to BN structure learning.
Conclusion
We studied the behaviour of an evolutionary approach enhanced by niching for the learning of gene regulatory networks with BNs. We show that this approach outperforms classical structure learning methods in elucidating the original model. These results were obtained for the learning of a bio-realistic network and, more importantly, on various small datasets. This is a suitable approach for learning transcriptional regulatory networks from real datasets without prior knowledge.
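A stripped-down sketch of the kind of evolutionary loop described above: a population of candidate DAGs is evolved by edge-flip mutation and elitist selection under a Gaussian BIC fitness. It omits recombination and niching and uses toy data, so it only shows the shape of the search, not the paper's operators or benchmark.

```python
# Evolutionary search over DAG structures: mutate adjacency matrices by
# flipping single edges (rejecting cycles), keep the best individuals by a
# Gaussian BIC fitness. Toy data; no recombination or niching.
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 300
data = rng.normal(size=(m, n))
data[:, 1] += 0.8 * data[:, 0]            # plant a couple of true dependencies
data[:, 3] += 0.8 * data[:, 1]

def is_dag(adj):
    """adj[i, j] == 1 means an edge i -> j; a directed cycle shows up as a
    nonzero diagonal in some power of the adjacency matrix."""
    power = np.eye(n)
    for _ in range(n):
        power = power @ adj
        if np.trace(power) > 0:
            return False
    return True

def fitness(adj):
    """Gaussian BIC: regress each node on its parents, penalize model size."""
    score = 0.0
    for j in range(n):
        pa = np.flatnonzero(adj[:, j])
        X = np.column_stack([np.ones(m)] + [data[:, p] for p in pa])
        resid = data[:, j] - X @ np.linalg.lstsq(X, data[:, j], rcond=None)[0]
        score += -0.5 * m * np.log(resid.var() + 1e-12) - 0.5 * X.shape[1] * np.log(m)
    return score

def mutate(adj):
    """Flip one randomly chosen edge, keeping the graph acyclic."""
    child = adj.copy()
    i, j = rng.integers(n, size=2)
    if i != j:
        child[i, j] = 1 - child[i, j]
    return child if is_dag(child) else adj

population = [np.zeros((n, n), dtype=int) for _ in range(20)]
for _ in range(200):                       # generations, elitist (mu + lambda)
    children = [mutate(individual) for individual in population]
    population = sorted(population + children, key=fitness, reverse=True)[:20]

best = population[0]
print("learned edges:", [(i, j) for i in range(n) for j in range(n) if best[i, j]])
```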
Fault Trees from Data: Efficient Learning with an Evolutionary Algorithm
Cyber-physical systems come with increasingly complex architectures and
failure modes, which complicates the task of obtaining accurate system
reliability models. At the same time, with the emergence of the (industrial)
Internet-of-Things, systems are more and more often being monitored via
advanced sensor systems. These sensors produce large amounts of data about the
components' failure behaviour, and can, therefore, be fruitfully exploited to
learn reliability models automatically. This paper presents an effective
algorithm for learning a prominent class of reliability models, namely fault
trees, from observational data. Our algorithm is evolutionary in nature;
i.e., it is an iterative, population-based, randomized search over fault-tree
structures that become increasingly consistent with the observational data.
We have evaluated our method on a large number of case studies, on both
synthetic and industrial data. Our experiments show that our algorithm
outperforms other methods and provides near-optimal results.
Comment: This paper is an extended version of the SETTA 2019 paper, Springer-Verlag.
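To make the "consistency with observational data" idea concrete, here is a small sketch of how a candidate fault tree (nested AND/OR gates over basic events) can be scored against monitoring records; the tree encoding, the fitness function and the simulated data are illustrative assumptions, not the paper's.

```python
# Sketch: evaluate candidate fault trees against observational records. A tree
# is nested ("AND"/"OR", children) gates over basic-event names; fitness is the
# fraction of records where the predicted top event matches the observed one.
import random

random.seed(0)

def evaluate(gate, record):
    """Recursively evaluate a fault tree on one record of basic-event states."""
    kind, children = gate
    values = [evaluate(c, record) if isinstance(c, tuple) else record[c]
              for c in children]
    return all(values) if kind == "AND" else any(values)

def fitness(gate, records):
    """Consistency with data: share of records where prediction == observation."""
    return sum(evaluate(gate, r) == r["TOP"] for r in records) / len(records)

# Hypothetical ground truth: the system fails if (e1 AND e2) OR e3.
def simulate_record():
    r = {e: random.random() < 0.3 for e in ("e1", "e2", "e3")}
    r["TOP"] = (r["e1"] and r["e2"]) or r["e3"]
    return r

records = [simulate_record() for _ in range(500)]

candidate_a = ("OR", [("AND", ["e1", "e2"]), "e3"])   # matches the generator
candidate_b = ("AND", ["e1", "e3"])                    # a poorer structure
print(fitness(candidate_a, records), fitness(candidate_b, records))
```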
Association analyses of the MAS-QTL data set using grammar, principal components and Bayesian network methodologies
Background
It has been shown that if genetic relationships among individuals are not taken into account in genome-wide association studies, this may lead to false positives. To address this problem, we used Genome-wide Rapid Association using Mixed Model and Regression (GRAMMAR) and principal component stratification analyses. To account for linkage disequilibrium among the significant markers, principal component loadings obtained from top markers can be included as covariates. Estimation of Bayesian networks may also be useful to investigate linkage disequilibrium among SNPs and their relation with environmental variables.
For the quantitative trait we first estimated residuals while taking polygenic effects into account. We then used a single-SNP approach to detect the most significant SNPs based on the residuals and applied principal component regression to take linkage disequilibrium among these SNPs into account. For the categorical trait we used the principal component stratification methodology to account for background effects. To correct for linkage disequilibrium we used principal component logit regression. Bayesian networks were estimated to investigate relationships among SNPs.
Results
Using the GRAMMAR and principal component stratification approach we detected around 100 significant SNPs for the quantitative trait (p < 0.05 with 1000 permutations) and 109 significant SNPs for the categorical trait (p < 0.0006 with local FDR correction). With additional principal component regression we reduced the list to 16 and 50 SNPs for the quantitative and categorical trait, respectively.
Conclusions
GRAMMAR could efficiently incorporate the information regarding random genetic effects. Principal component stratification should be used cautiously, with stringent multiple hypothesis testing correction for ancestral stratification, in association analyses for binary traits when there are systematic genetic effects such as half-sib family structures. Bayesian networks are useful to investigate relationships among SNPs and environmental variables.
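A minimal sketch of the principal component regression step described above: principal components of the top-SNP genotype matrix absorb linkage disequilibrium, and the polygenic-adjusted residuals are regressed on the leading components. The data are synthetic and the GRAMMAR mixed-model step is assumed to have already produced the residuals.

```python
# Principal component regression over the top-SNP genotype matrix: the leading
# components summarize correlated (LD) SNPs and serve as covariates for the
# polygenic-adjusted trait residuals. Toy data only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_ind, n_snp = 500, 40
genotypes = rng.integers(0, 3, size=(n_ind, n_snp)).astype(float)  # 0/1/2 coding
residuals = genotypes[:, 0] * 0.4 + rng.normal(size=n_ind)         # stand-in for GRAMMAR residuals

# Principal components of the standardized significant-SNP matrix.
G = (genotypes - genotypes.mean(0)) / genotypes.std(0)
pcs = PCA(n_components=5).fit_transform(G)

model = LinearRegression().fit(pcs, residuals)
print("R^2 of residuals on 5 PCs:", model.score(pcs, residuals))
```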
An Experimental Comparison of Hybrid Algorithms for Bayesian Network Structure Learning
We present a novel hybrid algorithm for Bayesian network structure learning, called Hybrid HPC (H2PC). It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. It is based on a subroutine called HPC, which combines ideas from incremental and divide-and-conquer constraint-based methods to learn the parents and children of a target variable. We conduct an experimental comparison of H2PC against Max-Min Hill-Climbing (MMHC), currently the most powerful state-of-the-art algorithm for Bayesian network structure learning, on several benchmarks with various data sizes. Our extensive experiments show that H2PC outperforms MMHC both in terms of goodness of fit to new data and in terms of the quality of the network structure itself, which is closer to the true dependence structure of the data. The source code (in R) of H2PC and all data sets used for the empirical tests are publicly available.
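The two-phase hybrid idea can be sketched roughly as follows: a constraint-based pass keeps only edges whose endpoints fail an independence test, and a score-based pass orients the surviving edges. The pairwise chi-square test and the local score below are crude stand-ins for HPC and the Bayesian score used in H2PC, and acyclicity checking is omitted for brevity.

```python
# Schematic two-phase hybrid: (1) keep an edge only if a chi-square test
# rejects marginal independence; (2) orient each surviving edge toward the
# direction with the larger local-score gain. Toy binary data.
import itertools
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
n, m = 4, 2000
data = np.zeros((m, n), dtype=int)
data[:, 0] = rng.integers(0, 2, m)
data[:, 2] = rng.integers(0, 2, m)
flip = (rng.random(m) < 0.15).astype(int)
data[:, 1] = (data[:, 0] + flip) % 2                    # mostly copies variable 0
keep = (rng.random(m) < 0.8).astype(int)
data[:, 3] = (data[:, 1] | data[:, 2]) & keep           # depends on 1 and 2

def local_score(j, pa):
    """BIC-style local score for binary variable j given a parent set."""
    score, n_cfg = 0.0, 0
    for cfg in itertools.product((0, 1), repeat=len(pa)):
        mask = np.all(data[:, list(pa)] == cfg, axis=1) if pa else np.ones(m, dtype=bool)
        c1 = data[mask, j].sum()
        for c in (c1, mask.sum() - c1):
            if c:
                score += c * np.log(c / mask.sum())
        n_cfg += 1
    return score - 0.5 * n_cfg * np.log(m)

# Phase 1: skeleton from pairwise chi-square tests (alpha = 0.01).
skeleton = []
for i, j in itertools.combinations(range(n), 2):
    table = np.histogram2d(data[:, i], data[:, j], bins=2)[0]
    if chi2_contingency(table)[1] < 0.01:
        skeleton.append((i, j))

# Phase 2: greedy orientation by local-score improvement.
parents = {j: set() for j in range(n)}
for i, j in skeleton:
    gain_ij = local_score(j, parents[j] | {i}) - local_score(j, parents[j])
    gain_ji = local_score(i, parents[i] | {j}) - local_score(i, parents[i])
    if gain_ij >= gain_ji:
        parents[j].add(i)
    else:
        parents[i].add(j)

print("oriented edges:", [(p, c) for c, ps in parents.items() for p in ps])
```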