966 research outputs found

    ModHMM: A Modular Supra-Bayesian Genome Segmentation Method

    Genome segmentation methods are powerful tools for obtaining cell type- or tissue-specific genome-wide annotations and are frequently used to discover regulatory elements. However, traditional segmentation methods show low predictive accuracy, and their data-driven annotations have some undesirable properties. As an alternative, we developed ModHMM, a highly modular genome segmentation method. Inspired by the supra-Bayesian approach, it incorporates predictions from a set of classifiers. This allows genome segmentations to be computed using state-of-the-art methodology. We demonstrate the method on ENCODE data and show that it outperforms traditional segmentation methods not only in terms of predictive performance but also in qualitative aspects. ModHMM is therefore a valuable alternative for studying the epigenetic and regulatory landscape across and within cell types or tissues.

    Machine Learning and Genome Annotation: A Match Meant to Be?

    By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE.

    Comprehensive epigenetic landscape of rheumatoid arthritis fibroblast-like synoviocytes.

    Epigenetics contributes to the pathogenesis of immune-mediated diseases like rheumatoid arthritis (RA). Here we show the first comprehensive epigenomic characterization of RA fibroblast-like synoviocytes (FLS), including histone modifications (H3K27ac, H3K4me1, H3K4me3, H3K36me3, H3K27me3, and H3K9me3), open chromatin, RNA expression, and whole-genome DNA methylation. To address complex multidimensional relationships and reveal epigenetic regulation of RA, we perform integrative analyses using a novel unbiased method to identify genomic regions with similar profiles. Epigenomically similar regions exist in RA cells and are associated with active enhancers and promoters and with specific transcription factor binding motifs. Differentially marked genes are enriched for immunological and unexpected pathways, with "Huntington's Disease Signaling" identified as particularly prominent. We validate the relevance of this pathway to RA by showing that Huntingtin-interacting protein-1 regulates FLS invasion into matrix. This work establishes a high-resolution epigenomic landscape of RA and demonstrates the potential for integrative analyses to identify unanticipated therapeutic targets.

    Identifying genome-wide transcription units from histone modifications using EPIGENE

    With the successful completion of the human genome project and the rapid development of sequencing technologies, transcriptome annotation across multiple human cell types and tissues is now available. Accurate transcriptome annotation is critical for understanding the functional as well as the regulatory roles of genomic regions. Current methods for identifying genome-wide active transcription units (TUs) use RNA sequencing (RNA-seq). However, this approach requires large quantities of mRNA, making the identification of highly unstable regulatory RNAs (like microRNA precursors) difficult. Because inherently unstable TUs are so hard to identify, the transcriptome landscape across all cells and tissues remains incomplete. This problem can be alleviated by chromatin-based approaches, owing to a well-established correlation between transcription and histone modification. Here, I present EPIGENE, a novel chromatin segmentation method for identifying genome-wide active TUs using transcription-associated histone modifications. Unlike existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate Hidden Markov Model (HMM) that models the observed combination of histone modifications using a product of independent Bernoulli random variables to identify the chromatin state sequence underlying an active TU. Using EPIGENE, I successfully predicted genome-wide TUs across multiple human cell lines. EPIGENE-predicted TUs were enriched for RNA Polymerase II (Pol II) at the transcription start site (TSS) and in the gene body, indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE predicted TUs more precisely than existing chromatin segmentation and RNA-seq based approaches across multiple human cell lines. Using EPIGENE, I also identified 232 novel TUs in K562 and 43 novel cell-specific TUs in K562, HepG2, and IMR90, all of which were supported by Pol II ChIP-seq and nascent RNA-seq evidence.
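
    The EPIGENE abstract describes an HMM whose emissions are a product of independent Bernoulli variables, one per histone modification. The following is a minimal sketch of that emission model combined with standard Viterbi decoding, not the EPIGENE implementation: the two states, the three marks, and every probability below are hypothetical values chosen for illustration, and the constraints and semi-supervised training mentioned in the abstract are not modelled.

```python
import math

def bernoulli_log_emission(obs, probs):
    """Log-probability of a binary mark vector under independent Bernoullis.

    obs   -- tuple of 0/1 flags, one per histone mark (present/absent in a bin)
    probs -- per-mark presence probabilities for one state, each in (0, 1)
    """
    return sum(math.log(p if x else 1.0 - p) for x, p in zip(obs, probs))

def viterbi(observations, start, trans, emission_probs):
    """Most likely state path for a sequence of binary mark vectors."""
    n_states = len(start)
    # score[s] = best log-probability of any path ending in state s
    score = [
        math.log(start[s]) + bernoulli_log_emission(observations[0], emission_probs[s])
        for s in range(n_states)
    ]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # best predecessor of state s at this position
            prev = max(range(n_states), key=lambda r: score[r] + math.log(trans[r][s]))
            pointers.append(prev)
            new_score.append(
                score[prev] + math.log(trans[prev][s])
                + bernoulli_log_emission(obs, emission_probs[s])
            )
        score = new_score
        backpointers.append(pointers)
    # trace back from the best final state
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical toy model: state 0 = background, state 1 = transcribed.
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emission_probs = [[0.1, 0.1, 0.1],   # background: marks rarely present
                  [0.9, 0.8, 0.7]]   # transcribed: marks usually present
bins = [(0, 0, 0), (0, 0, 0), (1, 1, 1), (1, 1, 0), (1, 1, 1), (0, 0, 0)]
path = viterbi(bins, start, trans, emission_probs)
print(path)  # [0, 0, 1, 1, 1, 0]
```

    In this toy run the three marked bins are decoded as one contiguous transcribed segment even though the middle bin lacks one mark, because the transition probabilities penalise switching states.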

    Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

    Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge such as annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of region classification.