
2,246 research outputs found

Evaluation of Algorithm Performance in ChIP-Seq Peak Detection

Author: Facciotti Marc T.
Wilbanks Elizabeth G.
Publication venue: Public Library of Science
Publication date: 01/01/2010

Next-generation DNA sequencing coupled with chromatin immunoprecipitation (ChIP-seq) is revolutionizing our ability to interrogate whole genome protein-DNA interactions. Identification of protein binding sites from ChIP-seq data has required novel computational tools, distinct from those used for the analysis of ChIP-Chip experiments. The growing popularity of ChIP-seq spurred the development of many different analytical programs (at last count, we noted 31 open-source methods), each with some purported advantage. Given that the literature is dense and empirical benchmarking is challenging, selecting an appropriate method for ChIP-seq analysis has become a daunting task. Herein we compare the performance of eleven different peak calling programs on common empirical transcription factor datasets and measure their sensitivity, accuracy and usability. Our analysis provides an unbiased critical assessment of available technologies, and should assist researchers in choosing a suitable tool for handling ChIP-seq data.

CiteSeerX

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Differential analysis of chromatin accessibility and histone modifications for predicting mouse developmental enhancers

Author: Fan Kaili
Fu Shaliu
Gu Cuihua
Jiang Cizhong
Kundaje Anshul
Lu Aiping
Moore Jill E.
Pratt Henry E.
Purcaro Michael J.
Wang Qin
Weng Zhiping
Zhu Ruixin
Publication venue: eScholarship@UMassChan
Publication date: 22/08/2018

Enhancers are distal cis-regulatory elements that modulate gene expression. They are depleted of nucleosomes and enriched in specific histone modifications; thus, calling DNase-seq and histone mark ChIP-seq peaks can predict enhancers. We evaluated nine peak-calling algorithms for predicting enhancers validated by transgenic mouse assays. DNase and H3K27ac peaks were consistently more predictive than H3K4me1/2/3 and H3K9ac peaks. DFilter and Hotspot2 were the best DNase peak callers, while HOMER, MUSIC, MACS2, DFilter and F-seq were the best H3K27ac peak callers. We observed that using the differential DNase or H3K27ac signal between two distant tissues increased the area under the precision-recall curve (PR-AUC) of DNase peaks by 17.5-166.7% and that of H3K27ac peaks by 7.1-22.2%. We further improved this differential signal method using multiple contrast tissues. Evaluated using a blind test, the differential H3K27ac signal method substantially improved PR-AUC from 0.48 to 0.75 for predicting heart enhancers. We further validated our approach using postnatal retina and cerebral cortex enhancers identified by massively parallel reporter assays, and observed improvements for both tissues. In summary, we compared nine peak callers and devised a superior method for predicting tissue-specific mouse developmental enhancers by reranking the called peaks.
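
To make the evaluation metric concrete, here is a minimal sketch of how reranking a fixed set of peaks can change a precision-recall summary. It uses average precision, one common estimator of PR-AUC; the abstract does not say which estimator the authors used, so that choice is an assumption. The function name, scores, and labels are hypothetical illustration values, not data from the paper.

```python
def average_precision(scores, labels):
    """Average precision of a ranking (a common estimator of PR-AUC).

    scores: one ranking score per peak (higher = ranked earlier).
    labels: 1 if the peak overlaps a validated enhancer, else 0.
    """
    # Rank peaks from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            # Precision measured at the rank of each true positive.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical example: the same four peaks, ranked two different ways.
labels = [1, 0, 1, 0]
raw_signal = [0.9, 0.8, 0.7, 0.6]           # original peak scores
differential_signal = [0.9, 0.2, 0.8, 0.1]  # assumed reranking scores

ap_raw = average_precision(raw_signal, labels)
ap_reranked = average_precision(differential_signal, labels)
```

The peak set is identical in both calls; only the ordering differs, which is the sense in which reranking alone can raise a PR-based score.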

eScholarship@UMMS

Computation for ChIP-seq and RNA-seq studies

Author: Mortazavi Ali
Pepke Shirley
Wold Barbara
Publication venue: Nature Publishing Group
Publication date: 01/11/2009

Genome-wide measurements of protein-DNA interactions and transcriptomes are increasingly done by deep DNA sequencing methods (ChIP-seq and RNA-seq). The power and richness of these counting-based measurements come at the cost of routinely handling tens to hundreds of millions of reads. Whereas early adopters necessarily developed their own custom computer code to analyze the first ChIP-seq and RNA-seq datasets, a new generation of more sophisticated algorithms and software tools is emerging to assist in the analysis phase of these projects. Here we describe the multilayered analyses of ChIP-seq and RNA-seq datasets, discuss the software packages currently available to perform tasks at each layer and describe some upcoming challenges and features for future analysis tools. We also discuss how software choices and uses are affected by specific aspects of the underlying biology and data structure, including genome size, positional clustering of transcription factor binding sites, transcript discovery and expression quantification.

Caltech Authors

PeakRanger: A cloud-enabled peak caller for ChIP-seq data

Author: Feng X.
Grossman R.
Stein L. D.
Publication venue: Springer Science and Business Media LLC
Publication date: 09/05/2011

Background: Chromatin immunoprecipitation (ChIP), coupled with massively parallel short-read sequencing (seq), is used to probe chromatin dynamics. Although there are many algorithms to call peaks from ChIP-seq datasets, most are tuned either to handle punctate sites, such as transcription factor binding sites, or broad regions, such as histone modification marks; few can do both. Other algorithms are limited in their configurability, performance on large data sets, and ability to distinguish closely-spaced peaks. Results: In this paper, we introduce PeakRanger, a peak caller software package that works equally well on punctate and broad sites, can resolve closely-spaced peaks, has excellent performance, and is easily customized. In addition, PeakRanger can be run in a parallel cloud computing environment to obtain extremely high performance on very large data sets. We present a series of benchmarks to evaluate PeakRanger against 10 other peak callers, and demonstrate the performance of PeakRanger on both real and synthetic data sets. We also present real-world uses of PeakRanger, including peak calling in the modENCODE project. Conclusions: Compared to other peak callers tested, PeakRanger offers improved resolution in distinguishing extremely closely-spaced peaks. PeakRanger has above-average spatial accuracy in terms of identifying the precise location of binding events. PeakRanger also has excellent sensitivity and specificity in all benchmarks evaluated. In addition, PeakRanger offers significant improvements in run time when running on a single processor system, and very marked improvements when allowed to take advantage of the MapReduce parallel environment offered by a cloud computing resource. PeakRanger can be downloaded at the official site of the modENCODE project: http://www.modencode.org/software/ranger

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

PubMed Central


Impact of sequencing depth in ChIP-seq experiments

Author: Epstein Charles B.
Ferrari Francesco
Ho Joshua W.K.
Issner Robbyn
Jung Youngsook L.
Karpen Gary H.
Kuroda Mitzi I.
Luquette Lovelace J.
Minoda Aki
Park Peter J.
Tolstorukov Michael
Publication venue: Oxford University Press (OUP)
Publication date: 05/03/2014

In a chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiment, an important consideration in experimental design is the minimum number of sequenced reads required to obtain statistically significant results. We present an extensive evaluation of the impact of sequencing depth on identification of enriched regions for key histone modifications (H3K4me3, H3K36me3, H3K27me3 and H3K9me2/me3) using deep-sequenced datasets in human and fly. We propose to define sufficient sequencing depth as the number of reads at which detected enrichment regions increase by <1% for an additional million reads. Although the required depth depends on the nature of the mark and the state of the cell in each experiment, we observe that sufficient depth is often reached at <20 million reads for fly. For human, there are no clear saturation points for the examined datasets, but our analysis suggests 40–50 million reads as a practical minimum for most marks. We also devise a mathematical model to estimate the sufficient depth and total genomic coverage of a mark. Lastly, we find that the five algorithms tested do not agree well for broad enrichment profiles, especially at lower depths. Our findings suggest that sufficient sequencing depth and an appropriate peak-calling algorithm are essential for ensuring robustness of conclusions derived from ChIP-seq data.
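
The "sufficient depth" criterion in this abstract can be sketched as a simple saturation check on a subsampling curve. This is only an illustration of the stated <1%-per-million rule, not the authors' mathematical model. The function name, the coarse 5-million-read steps, and the region counts are hypothetical, and averaging the gain over each step is an assumption made for this sketch.

```python
def sufficient_depth(depths_millions, region_counts, threshold=0.01):
    """Return the first depth (in millions of reads) after which the number
    of detected regions grows by less than `threshold` per additional
    million reads, or None if the curve never saturates.

    depths_millions: increasing subsampling depths, in millions of reads.
    region_counts:   number of enriched regions detected at each depth.
    """
    for i in range(1, len(depths_millions)):
        step = depths_millions[i] - depths_millions[i - 1]
        previous = region_counts[i - 1]
        if step <= 0 or previous <= 0:
            continue  # skip malformed or empty points
        # Relative gain in regions, averaged per million reads in this step.
        gain_per_million = (region_counts[i] - previous) / previous / step
        if gain_per_million < threshold:
            return depths_millions[i - 1]
    return None


# Hypothetical subsampling curve (not data from the paper).
depths = [5, 10, 15, 20, 25]
regions = [1000, 1300, 1450, 1500, 1510]
saturation_point = sufficient_depth(depths, regions)
```

A `None` result corresponds to the situation the abstract describes for the human datasets, where no clear saturation point was observed within the sequenced depth.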

Harvard University - DASH

PubMed Central

eScholarship - University of California

Systematic Evaluation of Factors Influencing ChIP-Seq Fidelity

Author: A Barski
A Mortazavi
A Valouev
AP Boyle
AR Quinlan
Barbara J Wold
DA Nix
DS Johnson
E Larschan
EG Wilbanks
G Benson
G Robertson
H Ji
Housheng Hansen He
I Kozarewa
J Rozowsky
Jason D Lieb
JC Dohm
Jennifer Zieba
Joanna O Mieczkowska
JW Ho
Kevin P White
L Teytelman
Matthew Slattery
N Negre
Nicolas Negre
NU Rashid
P Kolasinska-Zwierz
Peter J Bickel
PV Kharchenko
Q Li
Qunhua Li
R Jothi
Richard M Myers
RM Myers
S Pepke
S Roy
SE Celniker
Tae-Kyung Kim
Tao Liu
TD Laajala
TS Mikkelsen
WE Johnson
X Feng
X Shirley Liu
Y Zhang
Yijun Ruan
Yiwen Chen
Yong Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

We performed a systematic evaluation of how variations in sequencing depth and other parameters influence interpretation of chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments. Using Drosophila S2 cells, we generated ChIP-seq datasets for a site-specific transcription factor (Suppressor of Hairy-wing) and a histone modification (H3K36me3). We detected a chromatin state bias: open chromatin regions yielded higher coverage, which led to false positives if not corrected. This bias had a greater effect on detection specificity than any base-composition bias. Paired-end sequencing revealed that single-end data underestimated ChIP library complexity at high coverage. The removal of reads originating at the same base reduced false positives while having little effect on detection sensitivity. Even at a depth of ~1 read/bp coverage of mappable genome, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq. Evaluation of widely used ChIP-seq analysis tools suggests that adjustments or algorithm improvements are required to handle datasets with deep coverage.
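
The duplicate-read filter mentioned in this abstract can be illustrated with a short sketch that keeps one read per start position. The abstract says only "reads originating at the same base"; treating chromosome, start coordinate, and strand together as the identity of a read is an assumption here, as are the function name and the example reads.

```python
def remove_duplicate_reads(reads):
    """Keep the first read seen at each (chromosome, start, strand) position.

    reads: iterable of (chromosome, start, strand) tuples for aligned reads.
    Returns a list with later reads at an already-seen position dropped.
    """
    seen = set()
    kept = []
    for chrom, start, strand in reads:
        key = (chrom, start, strand)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Hypothetical aligned reads (not data from the paper).
aligned = [
    ("chr2L", 100, "+"),
    ("chr2L", 100, "+"),  # same base and strand: treated as a duplicate
    ("chr2L", 100, "-"),  # same base, opposite strand: kept under this assumption
    ("chr2L", 250, "+"),
]
unique_reads = remove_duplicate_reads(aligned)
```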

Crossref

Harvard University - DASH

PubMed Central

Carolina Digital Repository

Caltech Authors