Practical guidelines for the comprehensive analysis of ChIP-seq data.

Author: Bailey Timothy
Krajewski Pawel
Ladunga Istvan
Lefebvre Celine
Li Qunhua
Liu Tao
Madrigal Pedro
Taslim Cenny
Zhang Jie
Publication venue: PLoS Comput Biol
Publication date: 01/01/2013

Mapping the chromosomal locations of transcription factors, nucleosomes, histone modifications, chromatin remodeling enzymes, chaperones, and polymerases is one of the key tasks of modern biology, as evidenced by the Encyclopedia of DNA Elements (ENCODE) Project. To this end, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard methodology. Mapping such protein-DNA interactions in vivo using ChIP-seq presents multiple challenges not only in sample preparation and sequencing but also for computational analysis. Here, we present step-by-step guidelines for the computational analysis of ChIP-seq data. We address all the major steps in the analysis of ChIP-seq data: sequencing depth selection, quality checking, mapping, data normalization, assessment of reproducibility, peak calling, differential binding analysis, controlling the false discovery rate, peak annotation, visualization, and motif analysis. At each step in our guidelines we discuss some of the software tools most frequently used. We also highlight the challenges and problems associated with each step in ChIP-seq data analysis. We present a concise workflow for the analysis of ChIP-seq data in Figure 1 that complements and expands on the recommendations of the ENCODE and modENCODE projects. Each step in the workflow is described in detail in the following sections

Crossref

DigitalCommons@University of Nebraska

Directory of Open Access Journals

PubMed Central

Apollo (Cambridge)

University of Queensland eSpace

FigShare

RACS: Rapid Analysis of ChIP-Seq data for contig based genomes

Author: Fillingham Jeffrey
Nabeel-Shah Syed
Ponce Marcelo
Saettone Alejandro
Publication venue: Springer Science and Business Media LLC
Publication date: 07/05/2019

Background: Chromatin immunoprecipitation coupled to next generation sequencing (ChIP-Seq) is a widely used technique to investigate the function of chromatin-related proteins in a genome-wide manner. ChIP-Seq generates large quantities of data which can be difficult to process and analyse, particularly for organisms with contig-based genomes. Contig-based genomes often have poor annotations for cis-elements, for example enhancers, that are important for gene expression. Poorly annotated genomes make a comprehensive analysis of ChIP-Seq data difficult, and as such standardized analysis pipelines are lacking. Methods: We report a computational pipeline that utilizes traditional High-Performance Computing techniques and open source tools for processing and analysing data obtained from ChIP-Seq. We applied our computational pipeline "Rapid Analysis of ChIP-Seq data" (RACS) to ChIP-Seq data that was generated in the model organism Tetrahymena thermophila, an example of an organism with a genome that is available in contigs. Results: To test the performance and efficiency of RACS, we performed control ChIP-Seq experiments, allowing us to rapidly eliminate false positives when analyzing our previously published data set. Our pipeline segregates the found read accumulations between genic and intergenic regions and is highly efficient for rapid downstream analyses. Conclusions: Altogether, the computational pipeline presented in this report is an efficient and highly reliable tool to analyze genome-wide ChIP-Seq data generated in model organisms with contig-based genomes. RACS is an open source computational pipeline available to download from: https://bitbucket.org/mjponce/racs or https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS Comment: Submitted to BMC Bioinformatics. Computational pipeline available at https://bitbucket.org/mjponce/rac

arXiv.org e-Print Archive

University of Toronto Research Repository

Evaluation of experimental design and computational parameter choices affecting analyses of ChIP-seq and RNA-seq data in undomesticated poplar trees.

Author: Filkov Vladimir
Groover Andrew
Liu Lijun
Missirian Victor
Zinkgraf Matthew
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

Background: One of the great advantages of next generation sequencing is the ability to generate large genomic datasets for virtually all species, including non-model organisms. It should be possible, in turn, to apply advanced computational approaches to these datasets to develop models of biological processes. In a practical sense, working with non-model organisms presents unique challenges. In this paper we discuss some of these challenges for ChIP-seq and RNA-seq experiments using the undomesticated tree species of the genus Populus. Results: We describe specific challenges associated with experimental design in Populus, including selection of optimal genotypes for different technical approaches and development of antibodies against Populus transcription factors. Execution of the experimental design included the generation and analysis of Chromatin immunoprecipitation-sequencing (ChIP-seq) data for RNA polymerase II and transcription factors involved in wood formation. We discuss criteria for analyzing the resulting datasets, determination of appropriate control sequencing libraries, evaluation of sequencing coverage needs, and optimization of parameters. We also describe the evaluation of ChIP-seq data from Populus, and discuss the comparison between ChIP-seq and RNA-seq data and biological interpretations of these comparisons. Conclusions: These and other "lessons learned" highlight the challenges but also the potential insights to be gained from extending next generation sequencing-supported network analyses to undomesticated non-model species.

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia

Author: A. J. Hartemink
A. Kundaje
A. Milosavljevic
A. Sidow
B. E. Bernstein
B. J. Wold
C. Epstein
D. Raha
F. Pauli
G. DeSalvo
G. Euskirchen
G. K. Marinov
J. A. Stamatoyannopoulos
J. B. Brown
J. D. Lieb
J. Gertz
J. Rozowsky
K. I. Fisher-Aylor
K. P. White
L. Ma
M. D. Perry
M. Gerstein
M. J. Pazin
M. Kellis
M. M. Hoffman
M. Slattery
M. Snyder
M. Y. Tolstorukov
N. Shoresh
P. Bickel
P. Cayting
P. J. Farnham
P. J. Park
P. Kheradpour
P. V. Kharchenko
Q. Li
R. M. Myers
S. Batzoglou
S. G. Landt
S. Karmakar
S. Xi
T. E. Reddy
T. Liu
V. R. Iyer
X. S. Liu
Y. Chen
Y. L. Jung
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/12/2011

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals

DSpace@MIT

Crossref

Caltech Authors

Next generation sequencing in cancer: opportunities and challenges for precision cancer medicine

Author: Fortina Paolo
Londin Eric
Paolillo Carmela
Publication venue: Informa UK Limited
Publication date: 01/01/2016

Over the past decade, testing the genes of patients and their specific cancer types has become standardized practice in medical oncology since somatic mutations, changes in gene expression and epigenetic modifications are all hallmarks of cancer. However, while cancer genetic assessment has been limited to single biomarkers to guide the use of therapies, improvements in nucleic acid sequencing technologies and implementation of different genome analysis tools have enabled clinicians to detect these genomic alterations and identify functional and disease-associated genomic variants. Next-generation sequencing (NGS) technologies have provided clues about therapeutic targets and genomic markers for novel clinical applications when standard therapy has failed. While Sanger sequencing, an accurate and sensitive approach, allows for the identification of potential novel variants, it is however limited by the single amplicon being interrogated. Similarly, quantitative and qualitative profiling of gene expression changes also represents a challenge for the cancer field. Both RT-PCR and microarrays are efficient approaches, but are limited to the genes present on the array or being assayed. This leaves vast swaths of the transcriptome, including non-coding RNAs and other features, unexplored. With the advent of the ability to collect and analyze genomic sequence data in a timely fashion and at an ever-decreasing cost, many of these limitations have been overcome and are being incorporated into cancer research and diagnostics giving patients and clinicians new hope for targeted and personalized treatment. Below we highlight the various applications of next-generation sequencing in precision cancer medicine

Archivio Istituzionale della Ricerca- Università degli Studi di Foggia

Archivio della ricerca- Università di Roma La Sapienza

NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data

Author: Dong Kai
Tong Tiejun
Wan Xiang
Zhao Hongyu
Publication date: 27/01/2015

RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze four real RNA-Seq data sets to demonstrate the advantage of our method in real-world applications
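
The idea in the abstract above, a Bayes-rule classifier built on negative binomial counts, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the class means and per-gene dispersions are already known (the paper proposes plug-in estimators for them), it ignores library-size normalization, and the function names `nb_log_pmf` and `classify` are hypothetical. The negative binomial is parameterised so that variance = mean + dispersion × mean², which is larger than the mean whenever the dispersion is positive.

```python
import math


def nb_log_pmf(x, mu, phi):
    """Log-probability of count x under a negative binomial with mean mu
    and dispersion phi, parameterised so that variance = mu + phi * mu**2."""
    r = 1.0 / phi  # "size" parameter; large r (small phi) approaches Poisson
    return (
        math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
        + r * math.log(r / (r + mu))
        + x * math.log(mu / (r + mu))
    )


def classify(counts, class_means, dispersions, priors):
    """Return (best_class_index, scores) for one sample.

    Each score is log(prior) plus the sum of per-gene negative binomial
    log-likelihoods, i.e. the log-posterior up to a shared constant,
    treating genes as independent given the class.
    """
    scores = []
    for means, prior in zip(class_means, priors):
        score = math.log(prior)
        for x, mu, phi in zip(counts, means, dispersions):
            score += nb_log_pmf(x, mu, phi)
        scores.append(score)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Illustrative, made-up values: two classes, three genes.
class_means = [[10.0, 50.0, 5.0], [40.0, 12.0, 5.0]]
dispersions = [0.2, 0.2, 0.2]
priors = [0.5, 0.5]
label, scores = classify([9, 55, 4], class_means, dispersions, priors)
```

As the dispersion shrinks toward zero, each per-gene term approaches the Poisson log-likelihood, which is one way to see the relationship between the negative binomial and Poisson classifiers that the abstract mentions.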

arXiv.org e-Print Archive

Springer - Publisher Connector