15,609 research outputs found
Bayesian variable selection and data integration for biological regulatory networks
A substantial focus of research in molecular biology are gene regulatory
networks: the set of transcription factors and target genes which control the
involvement of different biological processes in living cells. Previous
statistical approaches for identifying gene regulatory networks have used gene
expression data, ChIP binding data or promoter sequence data, but each of these
resources provides only partial information. We present a Bayesian hierarchical
model that integrates all three data types in a principled variable selection
framework. The gene expression data are modeled as a function of the unknown
gene regulatory network which has an informed prior distribution based upon
both ChIP binding and promoter sequence data. We also present a variable
weighting methodology for the principled balancing of multiple sources of prior
information. We apply our procedure to the discovery of gene regulatory
relationships in Saccharomyces cerevisiae (Yeast) for which we can use several
external sources of information to validate our results. Our inferred
relationships show greater biological relevance on the external validation
measures than previous data integration methods. Our model also estimates
synergistic and antagonistic interactions between transcription factors, many
of which are validated by previous studies. We also evaluate the results from
our procedure for the weighting for multiple sources of prior information.
Finally, we discuss our methodology in the context of previous approaches to
data integration and Bayesian variable selection.Comment: Published in at http://dx.doi.org/10.1214/07-AOAS130 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Discovering transcriptional modules by Bayesian data integration
Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets.
Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs
Bayesian correlated clustering to integrate multiple datasets
Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets.
Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods
How to understand the cell by breaking it: network analysis of gene perturbation screens
Modern high-throughput gene perturbation screens are key technologies at the
forefront of genetic research. Combined with rich phenotypic descriptors they
enable researchers to observe detailed cellular reactions to experimental
perturbations on a genome-wide scale. This review surveys the current
state-of-the-art in analyzing perturbation screens from a network point of
view. We describe approaches to make the step from the parts list to the wiring
diagram by using phenotypes for network inference and integrating them with
complementary data sources. The first part of the review describes methods to
analyze one- or low-dimensional phenotypes like viability or reporter activity;
the second part concentrates on high-dimensional phenotypes showing global
changes in cell morphology, transcriptome or proteome.Comment: Review based on ISMB 2009 tutorial; after two rounds of revisio
Bioinformatics tools in predictive ecology: Applications to fisheries
This article is made available throught the Brunel Open Access Publishing Fund - Copygith @ 2012 Tucker et al.There has been a huge effort in the advancement of analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques though this is starting to change with the adoption of state-of-the-art methods, where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their ‘crossover potential’ with an application to fisheries data. In particular, a focus is made on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse
- …