245 research outputs found
Bayesian variable selection and data integration for biological regulatory networks
A substantial focus of research in molecular biology is gene regulatory
networks: the set of transcription factors and target genes which control the
involvement of different biological processes in living cells. Previous
statistical approaches for identifying gene regulatory networks have used gene
expression data, ChIP binding data or promoter sequence data, but each of these
resources provides only partial information. We present a Bayesian hierarchical
model that integrates all three data types in a principled variable selection
framework. The gene expression data are modeled as a function of the unknown
gene regulatory network which has an informed prior distribution based upon
both ChIP binding and promoter sequence data. We also present a variable
weighting methodology for the principled balancing of multiple sources of prior
information. We apply our procedure to the discovery of gene regulatory
relationships in Saccharomyces cerevisiae (yeast), for which we can use several
external sources of information to validate our results. Our inferred
relationships show greater biological relevance on the external validation
measures than previous data integration methods. Our model also estimates
synergistic and antagonistic interactions between transcription factors, many
of which are validated by previous studies. We also evaluate the results of
our procedure for weighting multiple sources of prior information.
Finally, we discuss our methodology in the context of previous approaches to
data integration and Bayesian variable selection. Comment: Published in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/07-AOAS130 by the Institute of
Mathematical Statistics (http://www.imstat.org)
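As a rough illustration of the idea in this abstract (an informed prior built from two weighted sources, then updated by expression evidence), here is a minimal Python sketch. It is not the authors' model: the geometric weighting form, the function names, and the use of a single log Bayes factor to summarize the expression data are all assumptions made for this example.

```python
import math


def prior_inclusion_prob(chip_prob, motif_prob, weight):
    """Combine two prior sources into one prior probability that a
    transcription factor regulates a gene.

    Assumption: a geometric (log-linear) weighting, where weight=1 trusts
    only the ChIP-based probability and weight=0 trusts only the
    motif-based probability. The paper's actual weighting may differ.
    """
    for p in (chip_prob, motif_prob):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly between 0 and 1")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    on = chip_prob ** weight * motif_prob ** (1.0 - weight)
    off = (1.0 - chip_prob) ** weight * (1.0 - motif_prob) ** (1.0 - weight)
    return on / (on + off)


def posterior_inclusion_prob(prior, log_bayes_factor):
    """Update the prior with expression evidence, summarized here as a
    hypothetical log Bayes factor in favor of the regulatory relationship."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * math.exp(log_bayes_factor)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: strong ChIP support, weak motif support.
prior = prior_inclusion_prob(chip_prob=0.9, motif_prob=0.3, weight=0.5)
posterior = posterior_inclusion_prob(prior, log_bayes_factor=1.0)
```

In this sketch the weight plays the role of the balancing parameter between the two prior sources; a full Bayesian treatment would place a prior on it and sample it along with the network indicators.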
Annotation-based meta-analysis of microarray experiments
We are developing software applications that perform meta-analysis of microarray experiments based on standardized experiment annotations, with the aim of identifying similar experiments and clustering experiments. The applications were tested on files obtained from the ArrayExpress public repository. Annotation terms were used to compute experiment dissimilarities in order to find experiments related to a query experiment. These applications may motivate bench biologists to annotate their experiments better.
Clustering of genes into regulons using integrated modeling-COGRIM
We present a Bayesian hierarchical model and Gibbs sampling implementation that integrates gene expression, ChIP binding, and transcription factor motif data in a principled and robust fashion. COGRIM was applied to both unicellular and mammalian organisms under different scenarios of available data. In these applications, we demonstrate the ability to predict gene-transcription factor interactions with reduced numbers of false-positive findings and to make predictions beyond what is obtained when single types of data are considered.
AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments
The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute
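The annotation-based dissimilarity and query-by-example ideas described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Jaccard dissimilarity over sets of annotation terms stands in for the measures used by AnnotCompute, which the abstract does not specify, and the experiment identifiers and terms below are invented.

```python
def jaccard_dissimilarity(terms_a, terms_b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of annotation terms.

    Assumption: plain Jaccard dissimilarity, chosen for illustration only.
    """
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0  # two unannotated experiments are treated as identical
    return 1.0 - len(a & b) / len(union)


def query_by_example(query_id, annotations, top_k=3):
    """Rank all other experiments by dissimilarity to the query experiment."""
    query_terms = annotations[query_id]
    scored = [
        (jaccard_dissimilarity(query_terms, terms), exp_id)
        for exp_id, terms in annotations.items()
        if exp_id != query_id
    ]
    scored.sort()  # smallest dissimilarity first, ties broken by identifier
    return [(exp_id, dist) for dist, exp_id in scored[:top_k]]


# Hypothetical experiments annotated with controlled-vocabulary terms.
annotations = {
    "E1": {"Mus musculus", "liver", "time course"},
    "E2": {"Mus musculus", "liver", "dose response"},
    "E3": {"Homo sapiens", "blood", "disease state"},
}
ranking = query_by_example("E1", annotations)
```

The same pairwise dissimilarities could be fed to any standard clustering routine to support the clustering mode of exploration.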
K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, on-the-fly integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear winner. Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.