Background: Chromatin immunoprecipitation coupled to next-generation
sequencing (ChIP-Seq) is a widely used technique for investigating the function of
chromatin-related proteins genome-wide. ChIP-Seq generates large
quantities of data that can be difficult to process and analyze, particularly
for organisms with contig-based genomes. Contig-based genomes often have poor
annotations for cis-elements, such as enhancers, that are important for
gene expression. This poor annotation makes a comprehensive analysis of
ChIP-Seq data difficult, and as a result standardized analysis pipelines are
lacking. Methods: We report a computational pipeline that uses traditional
High-Performance Computing techniques and open-source tools to process and
analyze data obtained from ChIP-Seq. We applied our computational pipeline,
"Rapid Analysis of ChIP-Seq data" (RACS), to ChIP-Seq data generated in
the model organism Tetrahymena thermophila, an example of an organism whose
genome is available as contigs. Results: To test the performance and
efficiency of RACS, we performed control ChIP-Seq experiments, which allowed us to
rapidly eliminate false positives when analyzing our previously published data
set. Our pipeline separates the detected read accumulations into genic and
intergenic regions and is highly efficient, enabling rapid downstream analyses.
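The two post-processing ideas in the Results (discarding read accumulations that also appear in a control experiment, then sorting the remainder into genic and intergenic regions) can be illustrated with a minimal Python sketch. This is not the RACS implementation: it assumes that peaks, control peaks and gene annotations are already available as half-open intervals on named contigs, and the `Interval` type, the function names, the contig names and the "any overlap with a control peak" rejection rule are all hypothetical choices made for illustration. Intervals are grouped by contig because, in a contig-based genome, coordinates are only comparable within the same contig.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """Hypothetical half-open genomic interval [start, end) on a named contig."""
    contig: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same contig share at least one base."""
    return a.contig == b.contig and a.start < b.end and b.start < a.end


def index_by_contig(intervals: Iterable[Interval]) -> dict[str, list[Interval]]:
    """Group intervals by contig so each query only scans its own contig."""
    index: dict[str, list[Interval]] = defaultdict(list)
    for iv in intervals:
        index[iv.contig].append(iv)
    return index


def hits_any(query: Interval, index: dict[str, list[Interval]]) -> bool:
    """Linear scan of the query's contig; adequate for a sketch, slow at genome scale."""
    return any(overlaps(query, iv) for iv in index.get(query.contig, []))


def filter_and_classify(peaks, control_peaks, genes):
    """Drop peaks that overlap a control peak (assumed false positives),
    then split the remaining peaks into genic and intergenic lists."""
    control_idx = index_by_contig(control_peaks)
    gene_idx = index_by_contig(genes)
    genic, intergenic = [], []
    for p in peaks:
        if hits_any(p, control_idx):
            continue  # also present in the control: treated as a likely false positive
        (genic if hits_any(p, gene_idx) else intergenic).append(p)
    return genic, intergenic


if __name__ == "__main__":
    # Toy data on two hypothetical contigs.
    peaks = [Interval("scf_1", 100, 200), Interval("scf_1", 900, 950), Interval("scf_2", 10, 60)]
    control = [Interval("scf_2", 0, 50)]
    genes = [Interval("scf_1", 150, 400)]
    genic, intergenic = filter_and_classify(peaks, control, genes)
    print(genic)       # [Interval(contig='scf_1', start=100, end=200)]
    print(intergenic)  # [Interval(contig='scf_1', start=900, end=950)]
```

A production pipeline would more likely delegate these overlap queries to an established interval tool and use a sorted or tree-based index instead of the linear per-contig scan shown here.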
Conclusions: Altogether, the computational pipeline presented in this report is
an efficient and highly reliable tool to analyze genome-wide ChIP-Seq data
generated in model organisms with contig-based genomes.
RACS is an open-source computational pipeline available to download from:
https://bitbucket.org/mjponce/racs or
https://gitrepos.scinet.utoronto.ca/public/?a=summary&p=RACS
Comment: Submitted to BMC Bioinformatics.