15,609 research outputs found

Bayesian variable selection and data integration for biological regulatory networks

Author: Chen Guang
Jensen Shane T.
Stoeckert Jr, Christian J.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006

A substantial focus of research in molecular biology is gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three data types in a principled variable selection framework. The gene expression data are modeled as a function of the unknown gene regulatory network, which has an informed prior distribution based upon both ChIP binding and promoter sequence data. We also present a variable weighting methodology for the principled balancing of multiple sources of prior information. We apply our procedure to the discovery of gene regulatory relationships in Saccharomyces cerevisiae (yeast), for which we can use several external sources of information to validate our results. Our inferred relationships show greater biological relevance on the external validation measures than previous data integration methods. Our model also estimates synergistic and antagonistic interactions between transcription factors, many of which are validated by previous studies. We also evaluate the results from our procedure for weighting multiple sources of prior information. Finally, we discuss our methodology in the context of previous approaches to data integration and Bayesian variable selection.

Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org
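As a rough illustration of the idea in the abstract above (an informed prior on each regulatory edge, balanced between two evidence sources and then updated by expression data), the sketch below may help. It is not the authors' model: the weighted geometric blend, the function names `combined_prior` and `posterior_inclusion`, and the use of a single log Bayes factor to summarise the expression evidence are all assumptions made for illustration.

```python
import math


def combined_prior(chip_prob, motif_prob, weight):
    """Blend two prior edge probabilities with a weight in [0, 1].

    Hypothetical form: a weighted geometric mean of the two sources,
    renormalised so the result is a valid probability.
    weight = 1 trusts only the ChIP evidence, weight = 0 only the motif evidence.
    """
    on = chip_prob ** weight * motif_prob ** (1 - weight)
    off = (1 - chip_prob) ** weight * (1 - motif_prob) ** (1 - weight)
    return on / (on + off)


def posterior_inclusion(prior, log_bayes_factor):
    """Posterior probability that an edge is present.

    Combines the prior odds with a log Bayes factor (edge vs. no edge)
    that stands in for the evidence from the expression data.
    """
    log_odds = math.log(prior / (1 - prior)) + log_bayes_factor
    return 1 / (1 + math.exp(-log_odds))


# Example: strong ChIP support, weak motif support, equal weighting,
# and expression data that favour the edge three to one.
prior = combined_prior(chip_prob=0.9, motif_prob=0.2, weight=0.5)
posterior = posterior_inclusion(prior, math.log(3.0))
```

In a full variable selection model the Bayes factor would come from comparing regressions of target expression with and without the transcription factor; here it is simply supplied as a number.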

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Discovering transcriptional modules by Bayesian data integration

Author: Antoniak
Bar-Joseph
Bernard J. de la Cruz
Bähler
Cho
Dahl
Datta
David L. Wild
Eisen
Falcon
Ferguson
Fritsch
Gasch
Gerber
Geweke
Harbison
Ideker
Ihmels
Jim E. Griffin
Kundaje
Lee
Liu
Liu
Medvedovic
Medvedovic
Qin
Rasmussen
Rasmussen
Reid
Richard S. Savage
Savage
Segal
Segal
Teh
Teh
Wild
Yao
Yeung
Zoubin Ghahramani
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010

Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs
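The abstract above builds on Dirichlet process mixtures. A minimal sketch of the Chinese restaurant process view of such a mixture, assuming a concentration parameter `alpha` and a hypothetical helper name `crp_assignment_probs`, shows how the prior lets a gene join an existing cluster or open a new one. It covers only this single building block, not the hierarchical, gene-by-gene fusion model the paper describes.

```python
def crp_assignment_probs(cluster_sizes, alpha):
    """Prior probabilities for assigning the next item under a CRP.

    cluster_sizes: number of items already in each existing cluster.
    alpha: concentration parameter (larger values favour new clusters).
    Returns one probability per existing cluster, followed by the
    probability of starting a new cluster.
    """
    total = sum(cluster_sizes) + alpha
    probs = [size / total for size in cluster_sizes]
    probs.append(alpha / total)
    return probs


# Example: two existing clusters holding 3 and 1 genes, alpha = 1.
probs = crp_assignment_probs([3, 1], alpha=1.0)
```

In a sampler these prior probabilities would be multiplied by each cluster's likelihood for the gene's data before normalising.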

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository

CUED - Cambridge University Engineering Department

Bayesian correlated clustering to integrate multiple datasets

Author: Balasubramanian
Barash
Brock
Carlson
Cheng
Cherry
Cho
Cooke
Datta
David L. Wild
Dempster
Friedman
Fritsch
Granovskaia
Green
Harbison
Hubert
Huttenhower
Ideker
Ishwaran
Jackson
Jackson
Jansen
Jim E. Griffin
Kirk
Lee
Liu
Liu
Lockhart
Mistry
Myers
Myers
Neal
Neal
Nieto-Barajas
Paul Kirk
Puig
Rand
Rasmussen
Rasmussen
Reiss
Rhodes
Richard S. Savage
Rigaut
Rogers
Rogers
Rousseau
Santisteban
Savage
Schena
Shen
Solomon
Stark
Suchard
Troyanskaya
Wei
Wong
Yeung
Yuan
Zoubin Ghahramani
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods
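To make the phrase "parameters that describe the agreement among the datasets" concrete, the sketch below computes an unnormalised prior weight for one gene's cluster labels across several datasets. The specific form (a product of per-dataset mixing weights, multiplied by `1 + phi` for each pair of datasets in which the gene has the same label) is an assumption about how such an agreement prior can be written, and the names `joint_allocation_weight`, `mixing` and `phi` are hypothetical.

```python
from itertools import combinations


def joint_allocation_weight(labels, mixing, phi):
    """Unnormalised prior weight of one gene's labels across K datasets.

    labels: cluster label of the gene in each dataset, e.g. (0, 0, 2).
    mixing: mixing[k][c] is the mixture weight of cluster c in dataset k.
    phi: dict mapping dataset pairs (k, l) with k < l to a non-negative
         agreement strength; phi = 0 makes the two datasets independent.
    """
    weight = 1.0
    for k, label in enumerate(labels):
        weight *= mixing[k][label]
    for k, l in combinations(range(len(labels)), 2):
        if labels[k] == labels[l]:
            # Matching labels are up-weighted by the pair's agreement strength.
            weight *= 1.0 + phi[(k, l)]
    return weight


# Example: two datasets, two equally weighted clusters each, agreement 2.0.
mixing = [[0.5, 0.5], [0.5, 0.5]]
phi = {(0, 1): 2.0}
same = joint_allocation_weight((0, 0), mixing, phi)
different = joint_allocation_weight((0, 1), mixing, phi)
```

With these toy numbers, matching labels carry three times the prior weight of mismatched ones, which is how a large agreement parameter pulls the two clusterings together.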

CiteSeerX

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository

How to understand the cell by breaking it: network analysis of gene perturbation screens

Author: Markowetz Florian
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 26/11/2009

Modern high-throughput gene perturbation screens are key technologies at the forefront of genetic research. Combined with rich phenotypic descriptors they enable researchers to observe detailed cellular reactions to experimental perturbations on a genome-wide scale. This review surveys the current state-of-the-art in analyzing perturbation screens from a network point of view. We describe approaches to make the step from the parts list to the wiring diagram by using phenotypes for network inference and integrating them with complementary data sources. The first part of the review describes methods to analyze one- or low-dimensional phenotypes like viability or reporter activity; the second part concentrates on high-dimensional phenotypes showing global changes in cell morphology, transcriptome or proteome.

Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

Bioinformatics tools in predictive ecology: Applications to fisheries

Author: Allan Tucker
Anvar Y.
Bishop C. M.
Bundy A.
Choi J. S.
Daniel Duplisea
Ghahramani Z.
Hand D. J.
Hartemink A. J.
Imoto S.
Langley P.
Liang S.
Pe'er D.
Pe'er D.
Pearl J.
Spirtes P.
Steele E.
Publication venue: 'The Royal Society'
Publication date: 19/01/2012

This article is made available through the Brunel Open Access Publishing Fund - Copyright © 2012 Tucker et al.

There has been a huge effort in the advancement of analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques, though this is starting to change with the adoption of state-of-the-art methods where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their ‘crossover potential’, with an application to fisheries data. In particular, the focus is on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse

Crossref

PubMed Central

Brunel University Research Archive