Search CORE

558 research outputs found

Genome-Wide Localization of Protein-DNA Binding and Histone Modification by a Bayesian Change-Point Method with ChIP-seq Data

Author: Liao W.
Mo Y.
Xing H.
Zhang M. Q.
Publication venue: Public Library of Science (PLoS)
Publication date: 26/07/2012

Next-generation sequencing (NGS) technologies have matured considerably since their introduction, and a focus has been placed on developing sophisticated analytical tools to deal with the amassing volumes of data. Chromatin immunoprecipitation sequencing (ChIP-seq), a major application of NGS, is a widely adopted technique for examining protein-DNA interactions and is commonly used to investigate epigenetic signatures of diffuse histone marks. These datasets have notoriously high variance and subtle levels of enrichment across large expanses, making enriched regions exceedingly difficult to define. Window-based heuristic models and finite-state hidden Markov models (HMMs) have been used with some success in analyzing ChIP-seq data, but with lingering limitations. To improve the ability to detect broad regions of enrichment, we developed a stochastic Bayesian Change-Point (BCP) method, which addresses some of these unresolved issues. BCP makes use of recent advances in infinite-state HMMs by obtaining explicit formulas for posterior means of read densities. These posterior means can be used to categorize the genome into enriched and unenriched segments, as is customarily done, or examined for more detailed relationships, since the underlying subpeaks are preserved rather than simplified into a binary classification. BCP performs a near-exhaustive, high-resolution search of all possible change points between different posterior means to minimize the subjectivity of window sizes, and it is computationally efficient owing to a speed-up algorithm and the explicit formulas it employs. In the absence of a well-established "gold standard" for diffuse histone mark enrichment, we corroborated BCP's island detection accuracy and reproducibility using various forms of empirical evidence.
We show that BCP is especially suited for analysis of diffuse histone ChIP-seq data but is also effective in analyzing punctate transcription factor ChIP datasets, making it widely applicable to numerous experiment types.

Cold Spring Harbor Laboratory Institutional Repository

Directory of Open Access Journals

PubMed Central

FigShare
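To make the idea of a change point between posterior means of read densities concrete, here is a minimal Python sketch. It is not BCP itself: the published method builds on infinite-state HMMs and a speed-up algorithm that this toy does not reproduce. The sketch assumes Poisson-distributed read counts per bin, a conjugate Gamma(a, b) prior on each segment's density, a single change point with a uniform prior over positions, and hypothetical function names (`change_point_posterior`, `segment_means`).

```python
import math


def log_marginal(count_sum, n_bins, a=1.0, b=1.0):
    """Log marginal likelihood of n_bins Poisson counts under a Gamma(a, rate=b)
    prior on the read density. The sum of log(x_i!) terms is omitted because it
    is identical for every candidate split and cancels when normalising."""
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + count_sum) - (a + count_sum) * math.log(b + n_bins))


def change_point_posterior(counts, a=1.0, b=1.0):
    """Posterior over a single change point tau (uniform prior over splits).
    Entry i is P(tau = i + 1 | counts); the left segment is counts[:tau]."""
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)
    total, n = prefix[-1], len(counts)
    log_post = [log_marginal(prefix[t], t, a, b)
                + log_marginal(total - prefix[t], n - t, a, b)
                for t in range(1, n)]
    peak = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]


def segment_means(counts, tau, a=1.0, b=1.0):
    """Posterior mean read density (a + S) / (b + n) on each side of tau."""
    left, right = counts[:tau], counts[tau:]
    return ((a + sum(left)) / (b + len(left)),
            (a + sum(right)) / (b + len(right)))


if __name__ == "__main__":
    bins = [1, 0, 2, 1, 0, 1, 9, 11, 8, 10, 12, 9]  # hypothetical binned read counts
    post = change_point_posterior(bins)
    tau = max(range(len(post)), key=post.__getitem__) + 1
    print(tau, segment_means(bins, tau))
```

On these toy counts the posterior peaks at tau = 6, with posterior mean densities of 6/7 (about 0.86) and 60/7 (about 8.57) reads per bin on either side. Conjugacy is what yields closed-form posterior means here, loosely echoing the explicit formulas the abstract describes.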

A Stationary Wavelet Entropy-Based Clustering Approach Accurately Predicts Gene Expression

Author: Choi I.
Nguyen N.
Vo A.
Won K. J.
Publication venue: Donald and Barbara Zucker School of Medicine Academic Works
Publication date: 01/01/2015

Studying epigenetic landscapes is important for understanding the conditions for gene regulation. Clustering is a useful approach to study epigenetic landscapes by grouping genes based on their epigenetic conditions. However, classical clustering approaches, which often use a representative value of the signals in a fixed-size window, do not fully use the information encoded in epigenetic landscapes. Clustering approaches that maximize the information extracted from epigenetic signals are necessary for better understanding gene regulatory environments. For effective clustering of multidimensional epigenetic signals, we developed a method called Dewer, which uses the entropy of the stationary wavelet transform of epigenetic signals inside enriched regions for gene clustering. Interestingly, gene expression levels were highly correlated with the entropy levels of epigenetic signals. In an assessment using gene expression, Dewer separates genes better than a window-based approach and achieved a correlation coefficient above 0.9 without any training procedure. Our results show that changes in epigenetic signals are useful for studying gene regulation.

PubMed Central

Hofstra Northwell Academic Works (Hofstra Northwell School of Medicine)
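The abstract does not give Dewer's exact entropy formula, so the following is only a sketch of one common way to turn a stationary wavelet transform into an entropy score. It assumes a Haar filter, periodic boundaries, three decomposition levels, and Shannon entropy of the share of detail energy at each level; the function names are hypothetical.

```python
import math


def haar_swt_details(signal, levels):
    """Stationary (undecimated) Haar wavelet transform with periodic boundaries.
    Returns one detail-coefficient list per level, each as long as the signal."""
    n = len(signal)
    approx = list(signal)
    s = 1 / math.sqrt(2)
    details = []
    for j in range(levels):
        step = 2 ** j  # "a trous": filter taps are spaced 2**j samples apart
        details.append([s * (approx[i] - approx[(i + step) % n]) for i in range(n)])
        approx = [s * (approx[i] + approx[(i + step) % n]) for i in range(n)]
    return details


def wavelet_energy_entropy(signal, levels=3):
    """Shannon entropy (bits) of the share of detail energy at each SWT level."""
    energies = [sum(c * c for c in d) for d in haar_swt_details(signal, levels)]
    total = sum(energies)
    if total == 0:
        return 0.0  # a flat signal has no detail energy at any level
    shares = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in shares)


if __name__ == "__main__":
    # Hypothetical binned signal over one enriched region
    region = [0, 0, 0, 0, 1, 1, 1, 1]
    print(wavelet_energy_entropy(region))
```

Under this definition a flat region, or one that fluctuates at a single scale only, scores 0, while a region whose variation is spread across scales scores higher (at most log2 of the number of levels). Per-region scores like these could then be fed to any clustering routine.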

Mapping and Functional Analysis of cis-Regulatory Elements in Mouse Photoreceptors

Author: Hughes Andrew
Publication venue: Washington University Open Scholarship
Publication date: 15/05/2019

Photoreceptors are light-sensitive neurons that mediate vision, and they are the most commonly affected cell type in genetic forms of blindness. In mice, there are two basic types of photoreceptors, rods and cones, which mediate vision in dim and bright environments, respectively. The transcription factors (TFs) that control rod and cone development have been studied in detail, but the cis-regulatory elements (CREs) through which these TFs act are less well understood. To comprehensively identify photoreceptor CREs in mice and to understand their relationship with gene expression, we performed open chromatin (ATAC-seq) and transcriptome (RNA-seq) profiling of FACS-purified rods and cones. We find that rods have significantly fewer regions of open chromatin than cones (as well as more than 60 additional cell types and tissues), and we demonstrate that this uniquely closed chromatin architecture depends on the rod master regulator Nrl. Finally, we find that regions of rod- and cone-specific open chromatin are enriched for distinct sets of TF binding sites, providing insight into the cis-regulatory grammar of these cell types. We also sought to understand how the regulatory activity of rod and cone open chromatin regions is encoded in DNA sequence. Cone-rod homeobox (CRX) is a paired-like homeodomain TF and master regulator of both rod and cone development, and CRX binding sites are by far the most enriched TF binding sites in photoreceptor CREs. The in vitro DNA binding preferences of CRX have been extensively characterized, but how well in vitro models of TF binding site affinity predict in vivo regulatory activity is not known. In addition, paired-class homeodomain TFs bind DNA as both monomers and dimers, but whether monomeric and dimeric CRX binding sites have distinct regulatory activities is not known.
To address these questions, we used a massively parallel reporter assay to quantify the activity of thousands of native and mutant CRX binding sites in explanted mouse retinas. These data reveal that dimeric CRX binding sites encode stronger enhancers than monomeric CRX binding sites. Moreover, the activity of half-sites within dimeric CRX binding sites is cooperative and spacing-dependent. In addition, saturating mutagenesis of 195 CRX binding sites reveals that, while TF binding site affinity and activity are moderately correlated across mutations within individual CREs, they are poorly correlated across mutations from distinct CREs. Accordingly, we show that accounting for baseline CRE activity improves the prediction of the effects of mutations in regulatory DNA from sequence-based models. Taken together, these data demonstrate that the activity of CRX binding sites depends on multiple layers of sequence context, providing insight into photoreceptor gene regulation and illustrating functional principles of homeodomain TF binding sites.

Washington University St. Louis: Open Scholarship
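The abstract compares in vitro models of binding site affinity with measured activity under saturating mutagenesis but does not say which affinity model was used. As a hedged illustration only, the sketch below uses a log-odds position weight matrix (PWM), one standard sequence-based affinity model, and scores every single-base substitution. The toy sites, pseudocount, uniform background, and all function names are hypothetical assumptions, not CRX data or the thesis's model.

```python
import math

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def build_log_odds_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned sites,
    scored against a uniform nucleotide background."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def best_site_score(seq, pwm):
    """Highest PWM score over every window on both strands of seq."""
    k = len(pwm)
    best = -math.inf
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            best = max(best, sum(pwm[j][strand[i + j]] for j in range(k)))
    return best


def mutation_effects(seq, pwm):
    """Predicted score change for every single-base substitution in seq,
    a toy analogue of saturating mutagenesis."""
    wild_type = best_site_score(seq, pwm)
    effects = {}
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                mutant = seq[:i] + alt + seq[i + 1:]
                effects[(i, ref, alt)] = best_site_score(mutant, pwm) - wild_type
    return effects


if __name__ == "__main__":
    toy_sites = ["TAATCC", "TAATCC", "TAATCT", "TAATCC"]  # hypothetical, not measured data
    pwm = build_log_odds_pwm(toy_sites)
    print(best_site_score("GGTAATCCGG", pwm))
```

Predicted effects like these could be correlated with measured reporter activity per CRE; the abstract's point is that such agreement holds moderately within a CRE but poorly across CREs, which a context-free PWM cannot capture on its own.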

A comparative study of supervised machine learning algorithms for the prediction of long-range chromatin interactions

Author: Divina Federico
García Pedro Manuel Martínez
García-Torres Miguel
Gómez-Vela Francisco
Vanhaeren Thomas
Vanhoof Wim
Publication venue: MDPI AG
Publication date: 24/08/2020

The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long-range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental chromatin interaction data are not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to compare the performance of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative.
Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best-suited algorithm for this task, and highlights cell-type-specific binding of transcription factors at the anchors as an important determinant of chromatin wiring mediated by cohesin. This research was funded by grant TIN2015-64776-C3-2-R from the Spanish Government and the European Regional Development Fund. Peer reviewed

Digital.CSIC

Repository of the University of Namur
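Gradient boosting, the best performer in this comparison, can be sketched from scratch in a few dozen lines. This is not the study's pipeline or library configuration (the abstract does not specify either); it is a toy assuming depth-1 trees (stumps), logistic loss with one Newton step per leaf, and a hypothetical two-column feature matrix in which column 0 might stand for an architectural-protein signal at the anchors and column 1 for a TF signal.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, r):
    """Return the (feature, threshold) split minimising the squared error of
    residuals r, or None if no feature has two distinct values."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            if not left or not right:
                continue
            sse = (sum((v - sum(left) / len(left)) ** 2 for v in left)
                   + sum((v - sum(right) / len(right)) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, f, t)
    return None if best is None else (best[1], best[2])


def train_gradient_boosting(X, y, n_rounds=20, learning_rate=0.3):
    """Gradient boosting with stumps under logistic loss; labels y are 0 or 1."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p0 / (1 - p0))  # start from the log-odds of the positive class
    F = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(v) for v in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of the log loss
        split = fit_stump(X, r)
        if split is None:
            break
        f, t = split
        leaves = {}
        for side in (True, False):
            idx = [i for i, row in enumerate(X) if (row[f] <= t) == side]
            num = sum(r[i] for i in idx)
            den = sum(p[i] * (1 - p[i]) for i in idx)
            leaves[side] = num / den if den > 1e-12 else 0.0  # one Newton step per leaf
        stumps.append((f, t, leaves))
        F = [v + learning_rate * leaves[row[f] <= t] for v, row in zip(F, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    score = base + sum(learning_rate * leaves[row[f] <= t] for f, t, leaves in stumps)
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(model, X, y):
    return sum(predict(model, row) == yi for row, yi in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Hypothetical anchor-pair features and interaction labels
    X = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.9], [0.8, 0.2], [0.9, 0.7], [0.7, 0.4]]
    y = [0, 0, 0, 1, 1, 1]
    model = train_gradient_boosting(X, y)
    print(accuracy(model, X, y))
```

Each round fits a weak learner to the current residuals and adds a damped correction, which is the mechanism that distinguishes boosting from the independently grown trees of a random forest. Training accuracy on six toy points says nothing about the roughly 95% reported in the study.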

Dissecting the genomic activity of a transcriptional regulator by the integrative analysis of omics data

Author: Balbo Gianfranco
Beccuti Marco
Cordero Francesca
De Bortoli Michele
Ferrero Giulio
Miano Valentina
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Institutional Research Information System University of Turin

Globally altered epigenetic landscape and delayed osteogenic differentiation in H3.3-G34W-mutant giant cell tumor of bone

Author: Baude Annika
Breuer Kersten
Fellenberg Jörg
Grünschläger Florian
Haas Simon
Haller Florian
Hartmann Mark
Hey Joschka
Jauch Anna
Jeltsch Albert
Jiang Chao
Kusevic Denis
Kühn Alexander
Lee Suman
Lim Jinyeong
Lindroth Anders M.
Lutsik Pavlo
Mancarella Daniela
Mayakonda Anand
Nguyen Viet Ha
Oppermann Udo
Park Joo Hyun
Park Yoon Jung
Plass Christoph
Rosemann Felix
Schlesner Matthias
Schuhmacher Maren Kirstin
Toprak Umut H.
Toth Reka
Vonficht Dominik
Weichenhan Dieter
Zustin Jozef
Öz Simin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

OPUS Augsburg

jMOSAiCS: joint analysis of multiple ChIP-seq datasets

Author: Emery H Bresnick
Hongda Li
Qiang Chang
Rajendran Sanalkumar
Sündüz Keleş
Xin Zeng
Publication venue: Springer Nature
Publication date: 01/01/2013

The ChIP-seq technique enables genome-wide mapping of in vivo protein-DNA interactions and chromatin states. Current analytical approaches for ChIP-seq analysis are largely geared towards single-sample investigations and have limited applicability in comparative settings that aim to identify combinatorial patterns of enrichment across multiple datasets. We describe a novel probabilistic method, jMOSAiCS, for jointly analyzing multiple ChIP-seq datasets. We demonstrate its usefulness with a wide range of data-driven computational experiments and with a case study of histone modifications on GATA1-occupied segments during erythroid differentiation. jMOSAiCS is open source software and can be downloaded from Bioconductor [1].

Springer - Publisher Connector

PubMed Central
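The "combinatorial patterns of enrichment across multiple datasets" in the jMOSAiCS abstract can be made concrete by enumerating every on/off pattern for a region. The sketch below is only the naive baseline: it assumes the datasets are independent and multiplies hypothetical per-dataset posterior enrichment probabilities. jMOSAiCS is described as a joint probabilistic model, so it should not be read as scoring patterns this way; the function names are also hypothetical.

```python
from itertools import product


def pattern_posteriors(enrichment_probs):
    """Probability of each on/off enrichment pattern across datasets for one region,
    assuming the datasets are independent (a naive baseline, not a joint model)."""
    patterns = {}
    for pattern in product((0, 1), repeat=len(enrichment_probs)):
        prob = 1.0
        for on, p in zip(pattern, enrichment_probs):
            prob *= p if on else 1.0 - p
        patterns[pattern] = prob
    return patterns


def most_likely_pattern(enrichment_probs):
    posteriors = pattern_posteriors(enrichment_probs)
    return max(posteriors, key=posteriors.get)


if __name__ == "__main__":
    # Hypothetical per-dataset posterior enrichment probabilities for one genomic segment
    probs = [0.9, 0.2, 0.7]
    print(most_likely_pattern(probs))  # (1, 0, 1)
```

With D datasets there are 2^D patterns, so exhaustive enumeration stays cheap for a handful of marks. A joint model improves on this baseline by letting evidence in one dataset inform the enrichment call in another.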