Open Repository and Bibliography - Luxembourg

Comparison between Suitable Priors for Additive Bayesian Networks

Author: A Djebbari
A Gelman
AFY Poon
AP Hodges
C Zorn
D Firth
D Heckerman
DV Lindley
E Gutiérrez-Peña
EJG Pitman
FI Lewis
M Chen
M Koivisto
M Pittavino
MJ Sanchez-Vazquez
MP Ward
N Dojer
P Diaconis
R Jansen
RW Robinson
S Hartnack
Publication date: 18/09/2018

Additive Bayesian networks (ABNs) are graphical models that extend the usual Bayesian generalized linear model to multiple dependent variables through the factorisation of the joint probability distribution of the underlying variables. When fitting an ABN model, the choice of prior for the parameters is of crucial importance. If an inadequate prior, such as one that is too weakly informative, is used, data separation and data sparsity lead to issues in the model selection process. In this work, a simulation study comparing two weakly informative priors and one strongly informative prior is presented. The first weakly informative prior is a zero-mean Gaussian with a large variance, currently implemented in the R package abn. The second is a Student's t-prior specifically designed for logistic regressions. Finally, the strongly informative prior is again Gaussian, with mean equal to the true parameter value and a small variance. We compare the impact of these priors on the accuracy of the learned additive Bayesian network as a function of different parameters. We create a simulation study to illustrate Lindley's paradox based on the prior choice. We then conclude by highlighting the good performance of the informative Student's t-prior and the limited impact of Lindley's paradox. Finally, suggestions for further developments are provided. Comment: 8 pages, 4 figures
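
The data-separation issue mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's simulation setup: it assumes a one-parameter logistic regression without intercept, a toy perfectly separated dataset, and zero-mean Gaussian priors that differ only in variance. The function name `map_slope` and the data are hypothetical. Under perfect separation the maximum-likelihood slope diverges, while the posterior mode stays finite and grows as the prior becomes weaker.

```python
import math

def map_slope(xs, ys, prior_var, iters=100, tol=1e-10):
    """Posterior mode of the slope b in P(y=1|x) = sigmoid(b*x)
    under a zero-mean Gaussian prior with variance prior_var,
    found by Newton's method on the log-posterior (illustrative only)."""
    b = 0.0
    for _ in range(iters):
        grad = -b / prior_var          # gradient of the log-prior
        hess = -1.0 / prior_var        # second derivative of the log-prior
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-b * x))
            grad += x * (y - p)        # log-likelihood gradient term
            hess -= x * x * p * (1.0 - p)
        step = grad / hess
        b -= step
        if abs(step) < tol:
            break
    return b

# Toy, perfectly separated data (assumption): y = 1 exactly when x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

b_tight = map_slope(xs, ys, prior_var=1.0)     # small-variance prior
b_weak = map_slope(xs, ys, prior_var=1000.0)   # large-variance prior
print(b_tight, b_weak)
```

The weaker the prior, the further the estimate drifts toward the unbounded maximum-likelihood solution, which is one way a too weakly informative prior can distort the scores used for model selection.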

arXiv.org e-Print Archive

ZORA

BNFinder: exact and efficient method for learning Bayesian networks

Author: B. Wilczynski
Beer
Dojer
Husmeier
N. Dojer
Needham
Smith
Publication venue: Oxford University Press

Motivation: Bayesian methods are widely used in many different areas of research. Recently, they have become a very popular tool for biological network reconstruction, due to their ability to handle noisy data. Even though there are many software packages allowing for Bayesian network reconstruction, only a few of them are freely available to researchers. Moreover, they usually require at least basic programming abilities, which restricts their potential user base. Our goal was to provide software which would be freely available, efficient and usable by non-programmers.

Public Library of Science (PLOS)

Listen to genes : dealing with microarray data in the frequency domain

Author: A Claridge-Chang
AN Stepanova
B-R Kim
Diego Di Bernardo
Dongyun Yi
H Guo
H Ueda
HG McWatters
IP Androulakis
J Fan
J Qian
JCW Locke
JH Wu
Jianfeng Feng
MJ Yanovsky
MR Doyle
N Dojer
P DHaeseleer
PO Lim
PT Spellman
R Balasubramaniyan
R Cristi
Ritesh Krishna
S Kim
S Wichert
Shuixia Guo
SL Harmer
SX Guo
U Alon
Vicky Buchanan-Wollaston
W Pan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 06/04/2009

Background: We present a novel and systematic approach to analyze temporal microarray data. The approach includes normalization, clustering and network analysis of genes. Methodology: Genes are normalized using an error-model-based uniform normalization method aimed at identifying and estimating the sources of variation. The model minimizes the correlation among error terms across replicates. The normalized gene expressions are then clustered in terms of their power spectrum density. The method of complex Granger causality is introduced to reveal interactions between sets of genes. Complex Granger causality, along with partial Granger causality, is applied in both time and frequency domains to selected genes as well as to all genes to reveal the interesting networks of interactions. The approach is successfully applied to Arabidopsis leaf microarray data generated from 31,000 genes observed over 22 time points over 22 days. Three circuits are analyzed in detail: a circadian gene circuit, an ethylene circuit, and a new global circuit showing a hierarchical structure to determine the initiators of leaf senescence. Conclusions: We use a totally data-driven approach to form biological hypotheses. Clustering using power-spectrum analysis helps us identify genes of potential interest. Their dynamics can be captured accurately in the time and frequency domains using the methods of complex and partial Granger causality. With the rise in availability of temporal microarray data, such methods can be useful tools in uncovering hidden biological interactions. We show our method in a step-by-step manner with the help of toy models as well as a real biological dataset. We also analyse three distinct gene circuits of potential interest to Arabidopsis researchers.

Warwick Research Archives Portal Repository

MiniTUBA: a Web-Based Dynamic Bayesian Network Analysis System

Author: Yongqun He
Zuoshuang Xiang
Publication venue: 'IntechOpen'
Publication date: 18/08/2010

IntechOpen

Model selection in the reconstruction of regulatory networks from time-series data

Author: Barillot Emmanuel
Novikov Eugene
Publication venue: BioMed Central
Publication date: 01/01/2009

Covenant University Repository

In Silico Gene Regulatory Network of the Maurer’s Cleft Pathway in Plasmodium falciparum

Author: Adebiyi Ezekiel
Brors B.
Isewon Itunuoluwa
Oyelade O. J.
Publication date: 01/01/2015

The Maurer’s clefts (MCs) are very important for the survival of Plasmodium falciparum within an infected cell, as they are induced by the parasite itself in the erythrocyte for protein trafficking. The MCs form an interesting part of the parasite’s biology as they shed more light on how the parasite remodels the erythrocyte, leading to host pathogenesis and death. Here, we predicted and analyzed the genetic regulatory network of genes identified as belonging to the MCs using a regularized graphical Gaussian model. Our network shows four major activators, their corresponding target genes, and predicted binding sites. One of these master activators is the serine repeat antigen 5 (SERA5), predominantly expressed among the SERA multigene family of P. falciparum, which is one of the blood-stage malaria vaccine candidates. Our results provide more details about functional interactions and the regulation of the genes in the MCs’ pathway of P. falciparum.

Almae Matris Studiorum Campus

In Silico Gene Regulatory Network of the Maurer’s Cleft Pathway in Plasmodium falciparum

Author: Isewon Itunuoluwa
Oyelade O. J.
Brors B.
Adebiyi Ezekiel
Publication date: 01/01/2015

Covenant University Repository

Partial mixture model for tight clustering of gene expression time-course

Author: Li Chang-Tsun
Wilson Roland
Yuan Yinyin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Background: Tight clustering arose recently from a desire to obtain tighter and potentially more informative clusters in gene expression studies. Scattered genes with relatively loose correlations should be excluded from the clusters. However, in the literature there is little work dedicated to this area of research. On the other hand, there has been extensive use of maximum likelihood techniques for model parameter estimation. By contrast, the minimum distance estimator has been largely ignored. Results: In this paper we show the inherent robustness of the minimum distance estimator that makes it a powerful tool for parameter estimation in model-based time-course clustering. To apply minimum distance estimation, a partial mixture model that can naturally incorporate replicate information and allow scattered genes is formulated. We provide experimental results of simulated data fitting, where the minimum distance estimator demonstrates superior performance to the maximum likelihood estimator. Both biological and statistical validations are conducted on a simulated dataset and two real gene expression datasets. Our proposed partial regression clustering algorithm scores top in Gene Ontology driven evaluation, in comparison with four other popular clustering algorithms. Conclusion: For the first time, the partial mixture model is successfully extended to time-course data analysis. The robustness of our partial regression clustering algorithm proves the suitability of the combination of the partial mixture model and the minimum distance estimator in this field. We show that tight clustering not only is capable of generating a more profound understanding of the dataset under study, in accordance with established biological knowledge, but also presents interesting new hypotheses during interpretation of clustering results. In particular, we provide biological evidence that scattered genes can be relevant and are interesting subjects for study, in contrast to prevailing opinion.

Deakin Research Online

Springer - Publisher Connector

Warwick Research Archives Portal Repository

Boolean networks using the chi-square test for inferring large-scale gene regulatory networks

Author: Kim Haseong
Lee Jae K
Park Taesung
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Boolean network (BN) modeling is a commonly used method for constructing gene regulatory networks from time series microarray data. However, its major drawback is that its computation time is very high, often making the construction of large-scale gene networks impractical. We propose a variable selection method that not only reduces BN computation times significantly but also obtains optimal network constructions by using chi-square statistics for testing independence in contingency tables. RESULTS: Both the computation time and the accuracy of the network structures estimated by the proposed method are compared with those of the original BN methods on simulated and real yeast cell cycle microarray gene expression data sets. Our results reveal that the proposed chi-square testing (CST)-based BN method significantly improves the computation time, while its ability to identify all the true network mechanisms is effectively the same as that of full-search BN methods. The proposed BN algorithm is approximately 70.8 and 7.6 times faster than the original BN algorithm when the error sizes of the Best-Fit Extension problem are 0 and 1, respectively. Further, the false positive error rate of the proposed CST-based BN algorithm tends to be lower than that of the original BN. CONCLUSION: The CST-based BN method dramatically improves the computation time of the original BN algorithm. Therefore, it can efficiently infer large-scale gene regulatory network mechanisms.
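
As a rough illustration of the chi-square screening idea described in this abstract, the sketch below tests whether a candidate regulator's binary state at time t is independent of a target gene's state at time t+1. It is a minimal sketch under stated assumptions, not the authors' CST algorithm: the function names, the toy time series, and the use of a plain Pearson statistic on a 2x2 table are all illustrative choices. The threshold 3.841 is the standard 5% critical value of the chi-square distribution with one degree of freedom.

```python
def lagged_table(regulator, target):
    """2x2 table of counts: rows = regulator state at time t,
    columns = target state at time t+1 (both series are 0/1)."""
    table = [[0, 0], [0, 0]]
    for r, t in zip(regulator[:-1], target[1:]):
        table[r][t] += 1
    return table

def chi_square_2x2(table):
    """Pearson chi-square statistic for independence in a 2x2 table.
    Returns 0.0 when a row or column is empty (no evidence either way)."""
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    if 0 in rows or 0 in cols:
        return 0.0
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat

CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05

# Hypothetical toy series: the target copies the regulator with a one-step delay.
regulator = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
target = [0] + regulator[:-1]

stat = chi_square_2x2(lagged_table(regulator, target))
keep_as_candidate = stat > CRITICAL_5PCT_DF1
print(stat, keep_as_candidate)
```

Screening candidate regulators this way before an exhaustive Boolean-function search shrinks the set of input combinations to examine, which is the kind of saving in computation time the abstract reports.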

Springer - Publisher Connector