    Bayesian variable selection and data integration for biological regulatory networks

    A substantial focus of research in molecular biology is gene regulatory networks: the set of transcription factors and target genes which control the involvement of different biological processes in living cells. Previous statistical approaches for identifying gene regulatory networks have used gene expression data, ChIP binding data or promoter sequence data, but each of these resources provides only partial information. We present a Bayesian hierarchical model that integrates all three data types in a principled variable selection framework. The gene expression data are modeled as a function of the unknown gene regulatory network, which has an informed prior distribution based upon both ChIP binding and promoter sequence data. We also present a variable weighting methodology for the principled balancing of multiple sources of prior information. We apply our procedure to the discovery of gene regulatory relationships in Saccharomyces cerevisiae (yeast), for which we can use several external sources of information to validate our results. Our inferred relationships show greater biological relevance on the external validation measures than previous data integration methods. Our model also estimates synergistic and antagonistic interactions between transcription factors, many of which are validated by previous studies. We also evaluate the results from our procedure for weighting multiple sources of prior information. Finally, we discuss our methodology in the context of previous approaches to data integration and Bayesian variable selection. Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
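The idea of an informed prior that balances two evidence sources can be illustrated with a minimal sketch. This is not the model from the paper: the convex-combination form of the prior, the function names `edge_prior` and `edge_posterior`, and all numeric values are assumptions chosen only to show how a weight shifts the prior between ChIP binding and promoter sequence evidence before the expression data update it.

```python
def edge_prior(chip_prob, motif_prob, weight):
    """Hypothetical prior probability of a regulator-target edge.

    Assumption: a convex combination of two evidence sources, where
    `weight` in [0, 1] is the share given to the ChIP binding evidence.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * chip_prob + (1.0 - weight) * motif_prob


def edge_posterior(prior, lik_edge, lik_no_edge):
    """Bayes' rule for a binary edge indicator.

    `lik_edge` and `lik_no_edge` are the likelihoods of the expression
    data with and without the edge (illustrative inputs).
    """
    numerator = prior * lik_edge
    return numerator / (numerator + (1.0 - prior) * lik_no_edge)


# Illustrative values only: strong ChIP evidence, weak motif evidence.
prior = edge_prior(chip_prob=0.9, motif_prob=0.3, weight=0.5)
posterior = edge_posterior(prior, lik_edge=2.0, lik_no_edge=1.0)
```

Raising `weight` toward 1 makes the prior track the ChIP evidence more closely, which is the kind of trade-off a weighting procedure has to resolve.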

    Annotation-based meta-analysis of microarray experiments

    We are developing software applications that perform meta-analysis of microarray experiments based on standardized experiment annotations, with the aim of identifying and clustering similar experiments. The applications were tested on files obtained from the ArrayExpress public repository. Annotation terms were used to compute experiment dissimilarities to find experiments related to a query experiment. These applications may motivate bench biologists to annotate their experiments better.

    Clustering of genes into regulons using integrated modeling-COGRIM

    We present a Bayesian hierarchical model and Gibbs sampling implementation that integrates gene expression, ChIP binding, and transcription factor motif data in a principled and robust fashion. COGRIM was applied to both unicellular and mammalian organisms under different scenarios of available data. In these applications, we demonstrate the ability to predict gene-transcription factor interactions with fewer false-positive findings and to make predictions beyond what is obtained when single types of data are considered.

    Standardization Initiatives in the (eco)toxicogenomics Domain: A Review

    The purpose of this document is to provide readers with a resource of different ongoing standardization efforts within the ‘omics’ (genomics, proteomics, metabolomics) and related communities, with particular focus on toxicological and environmental applications. The review includes initiatives within the research community as well as in the regulatory arena. It addresses data management issues (format and reporting structures for the exchange of information) and database interoperability, highlighting key objectives, target audience and participants. A considerable amount of work still needs to be done and, ideally, collaboration should be optimized and duplication and incompatibility should be avoided where possible. The consequence of failing to deliver data standards is an escalation in the burden and cost of data management tasks.

    Stat and interferon genes identified by network analysis differentially regulate primitive and definitive erythropoiesis

    BACKGROUND: Hematopoietic ontogeny is characterized by overlapping waves of primitive, fetal definitive, and adult definitive erythroid lineages. Our aim is to identify differences in the transcriptional control of these distinct erythroid cell maturation pathways by inferring and analyzing gene-interaction networks from lineage-specific expression datasets. Inferred networks are strongly connected and do not fit a scale-free model, making it difficult to identify essential regulators using the hub-essentiality standard. RESULTS: We employed a semi-supervised machine learning approach to integrate measures of network topology with expression data to score gene essentiality. The algorithm was trained and tested on the adult and fetal definitive erythroid lineages. When applied to the primitive erythroid lineage, 144 high scoring transcription factors were found to be differentially expressed between the primitive and adult definitive erythroid lineages, including all expressed STAT-family members. Differential responses of primitive and definitive erythroblasts to a Stat3 inhibitor and IFNγ in vitro supported the results of the computational analysis. Further investigation of the original expression data revealed a striking signature of Stat1-related genes in the adult definitive erythroid network. Among the potential pathways known to utilize Stat1, interferon (IFN) signaling-related genes were expressed almost exclusively within the adult definitive erythroid network. CONCLUSIONS: In vitro results support the computational prediction that differential regulation and downstream effectors of STAT signaling are key factors that distinguish the transcriptional control of primitive and definitive erythroid cell maturation

    AnnotCompute: annotation-based exploration and meta-analysis of genomics experiments

    The ever-increasing scale of biological data sets, particularly those arising in the context of high-throughput technologies, requires the development of rich data exploration tools. In this article, we present AnnotCompute, an information discovery platform for repositories of functional genomics experiments such as ArrayExpress. Our system leverages semantic annotations of functional genomics experiments with controlled vocabulary and ontology terms, such as those from the MGED Ontology, to compute conceptual dissimilarities between pairs of experiments. These dissimilarities are then used to support two types of exploratory analysis—clustering and query-by-example. We show that our proposed dissimilarity measures correspond to a user's intuition about conceptual dissimilarity, and can be used to support effective query-by-example. We also evaluate the quality of clustering based on these measures. While AnnotCompute can support a richer data exploration experience, its effectiveness is limited in some cases, due to the quality of available annotations. Nonetheless, tools such as AnnotCompute may provide an incentive for richer annotations of experiments. Code is available for download at http://www.cbil.upenn.edu/downloads/AnnotCompute
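The query-by-example idea described above can be sketched in a few lines. This is not AnnotCompute's actual measure: Jaccard dissimilarity over sets of annotation terms is swapped in here as a simple stand-in, and the experiment identifiers, annotation terms, and function names are all hypothetical.

```python
def jaccard_dissimilarity(terms_a, terms_b):
    """1 - |A & B| / |A | B| over two sets of annotation terms."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0  # two unannotated experiments are treated as identical
    return 1.0 - len(a & b) / len(union)


def query_by_example(query_id, annotations, top_k=3):
    """Rank other experiments by dissimilarity to the query, closest first."""
    query_terms = annotations[query_id]
    scored = [
        (jaccard_dissimilarity(query_terms, terms), exp_id)
        for exp_id, terms in annotations.items()
        if exp_id != query_id
    ]
    return sorted(scored)[:top_k]


# Hypothetical experiments annotated with controlled-vocabulary terms.
experiments = {
    "E-1": {"Mus musculus", "liver", "time course"},
    "E-2": {"Mus musculus", "liver", "dose response"},
    "E-3": {"Homo sapiens", "blood", "disease state"},
}

ranked = query_by_example("E-1", experiments)
```

The same pairwise dissimilarities could feed a clustering step. As the abstract notes, any such measure is only as informative as the annotations it is computed from.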

    K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources

    The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, on-the-fly integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear winner. Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.