966 research outputs found

ModHMM: A Modular Supra-Bayesian Genome Segmentation Method

Author: Benner P.
Vingron M.
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/04/2020

Genome segmentation methods are powerful tools to obtain cell-type- or tissue-specific genome-wide annotations and are frequently used to discover regulatory elements. However, traditional segmentation methods show low predictive accuracy and their data-driven annotations have some undesirable properties. As an alternative, we developed ModHMM, a highly modular genome segmentation method. Inspired by the supra-Bayesian approach, it incorporates predictions from a set of classifiers. This allows genome segmentations to be computed using state-of-the-art methodology. We demonstrate the method on ENCODE data and show that it outperforms traditional segmentation methods not only in terms of predictive performance, but also in qualitative aspects. Therefore, ModHMM is a valuable alternative for studying the epigenetic and regulatory landscape across and within cell types or tissues.

MPG.PuRe

EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences.

Author: Ge Xinzhou
Kwon Soo Bin
Li Jingyi Jessica
Li Wei Vivian
Xie Lingjue
Zhang Haowen
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

The availability of genome-wide epigenomic datasets enables in-depth studies of epigenetic modifications and their relationships with chromatin structures and gene expression. Various alignment tools have been developed to align nucleotide or protein sequences in order to identify structurally similar regions. However, there are currently no alignment methods specifically designed for comparing multi-track epigenomic signals and detecting common patterns that may explain functional or evolutionary similarities. We propose a new local alignment algorithm, EpiAlign, designed to compare chromatin state sequences learned from multi-track epigenomic signals and to identify locally aligned chromatin regions. EpiAlign is a dynamic programming algorithm that incorporates, in a novel way, the varying lengths and frequencies of chromatin states. We demonstrate the efficacy of EpiAlign through extensive simulations and studies on real data from the NIH Roadmap Epigenomics project. EpiAlign is able to extract recurrent chromatin state patterns along a single epigenome, and many of these patterns carry cell-type-specific characteristics. EpiAlign can also detect common chromatin state patterns across multiple epigenomes, and it will serve as a useful tool to group and distinguish epigenomic samples based on genome-wide or local chromatin state patterns.

eScholarship - University of California

Machine Learning and Genome Annotation: A Match Meant to Be?

Author: Cheng Chao
Gerstein Mark
Yip Kevin Y
Publication venue: Dartmouth Digital Commons
Publication date: 29/05/2013

By its very nature, genomics produces large, high-dimensional datasets that are well suited to analysis by machine learning approaches. Here, we explain some key aspects of machine learning that make it useful for genome annotation, with illustrative examples from ENCODE

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Comprehensive epigenetic landscape of rheumatoid arthritis fibroblast-like synoviocytes.

Author: Ai Rizi
Ainsworth Richard I
Bachman Kurtis E
Bai Yuchen
Boyle David L
Ding Bo
Firestein Gary S
Gulko Percio S
Hammaker Deepa
Krishna Vinod
Laragione Teresina
Maeshima Keisuke
Nagpal Sunil
Palescandolo Emanuele
Pocalyko David
Wang Mengchi
Wang Wei
Whitaker John W
Wildberg Andre
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

Epigenetics contributes to the pathogenesis of immune-mediated diseases like rheumatoid arthritis (RA). Here we show the first comprehensive epigenomic characterization of RA fibroblast-like synoviocytes (FLS), including histone modifications (H3K27ac, H3K4me1, H3K4me3, H3K36me3, H3K27me3, and H3K9me3), open chromatin, RNA expression and whole-genome DNA methylation. To address complex multidimensional relationships and reveal the epigenetic regulation of RA, we perform integrative analyses using a novel unbiased method to identify genomic regions with similar profiles. Epigenomically similar regions exist in RA cells and are associated with active enhancers and promoters and specific transcription factor binding motifs. Differentially marked genes are enriched for immunological and unexpected pathways, with "Huntington's Disease Signaling" identified as particularly prominent. We validate the relevance of this pathway to RA by showing that Huntingtin-interacting protein-1 regulates FLS invasion into matrix. This work establishes a high-resolution epigenomic landscape of RA and demonstrates the potential for integrative analyses to identify unanticipated therapeutic targets.

Directory of Open Access Journals

eScholarship - University of California

Identifying genome-wide transcription units from histone modifications using EPIGENE

Author: Sahu Anshupa
Publication venue: Philipps-Universität Marburg
Publication date: 01/01/2021

With the successful completion of the human genome project and the rapid development of sequencing technologies, transcriptome annotation across multiple human cell types and tissues is now available. Accurate transcriptome annotation is critical for understanding the functional as well as the regulatory roles of genomic regions. Current methods for identifying genome-wide active transcription units (TUs) use RNA sequencing (RNA-seq). However, this approach requires large quantities of mRNA, making the identification of highly unstable regulatory RNAs (like microRNA precursors) difficult. As a result of this complexity in identifying inherently unstable TUs, the transcriptome landscape across all cells and tissues remains incomplete. This problem can be alleviated by chromatin-based approaches due to a well-established correlation between transcription and histone modification. Here, I present EPIGENE, a novel chromatin segmentation method for identifying genome-wide active TUs using transcription-associated histone modifications. Unlike existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate Hidden Markov Model (HMM) that models the observed combination of histone modifications using a product of independent Bernoulli random variables to identify the chromatin state sequence underlying an active TU. Using EPIGENE, I successfully predicted genome-wide TUs across multiple human cell lines. EPIGENE-predicted TUs were enriched for RNA Polymerase II (Pol II) at the transcription start site (TSS) and in the gene body, indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE predicted TUs more precisely than existing chromatin segmentation and RNA-seq based approaches across multiple human cell lines. Using EPIGENE, I also identified 232 novel TUs in K562 and 43 novel cell-specific TUs in K562, HepG2, and IMR90, all of which were supported by Pol II ChIP-seq and nascent RNA-seq evidence.

Publikations- und Dokumentenserver der Universitätsbibliothek Marburg

Unsupervised Classification for Tiling Arrays: ChIP-chip and Transcriptome

Author: Aubourg Sébastien
Brunaud Véronique
Bérard Caroline
Martin-Magniette Marie-Laure
Robin Stéphane
Publication date: 01/01/2011

Tiling arrays make possible a large-scale exploration of the genome thanks to probes that cover the whole genome at very high density, up to 2,000,000 probes. The biological questions usually addressed are either the expression difference between two conditions or the detection of transcribed regions. In this work we propose to consider both questions simultaneously as an unsupervised classification problem by modeling the joint distribution of the two conditions. In contrast to previous methods, we account for all available information on the probes as well as biological knowledge like annotation and spatial dependence between probes. Since probes are not biologically relevant units, we propose a classification rule for non-connected regions covered by several probes. Applications to transcriptomic and ChIP-chip data of Arabidopsis thaliana obtained with a NimbleGen tiling array highlight the importance of precise modeling and of region classification.

arXiv.org e-Print Archive

HAL Evry

HAL Descartes

The Ensembl Regulatory Build

Author: Daniel R Zerbino
Steven P Wilder
Nathan Johnson
Thomas Juettemann
Paul R Flicek
Publication venue: 'Springer Science and Business Media LLC'

Crossref