40 research outputs found
RUbioSeq+: A multiplatform application that executes parallelized pipelines to analyse next-generation sequencing data
This is the peer-reviewed version of the following article: Computer Methods and Programs in Biomedicine 138 (2016): 73-81, which has been published in final form at http://dx.doi.org/10.1016/j.cmpb.2016.10.008. Background and objective: To facilitate routine analysis and to improve the reproducibility of the results, next-generation sequencing (NGS) analysis requires intuitive, efficient and integrated data processing pipelines. Methods: We have selected well-established software to construct a suite of automated and parallelized workflows to analyse NGS data for DNA-seq (single-nucleotide variants (SNVs) and indels), CNA-seq, bisulfite-seq and ChIP-seq experiments. Results: Here, we present RUbioSeq+, an updated and extended version of RUbioSeq, a multiplatform application that incorporates a suite of automated and parallelized workflows to analyse NGS data. This new version includes: (i) an interactive graphical user interface (GUI) that facilitates its use by both biomedical researchers and bioinformaticians, (ii) a new pipeline for ChIP-seq experiments, (iii) pair-wise comparisons (case–control analyses) for DNA-seq experiments, and (iv) improvements in the parallelized and multithreaded execution options. Results generated by our software have been experimentally validated and accepted for publication. Conclusions: RUbioSeq+ is free and open to all users at http://rubioseq.bioinfo.cnio.es/. M.R-C is funded by the BLUEPRINT Consortium (FP7/2007-2013) under grant agreement number 282510. J.M.F is funded by the INB Node 2 - CNIO, a member of Proteored - PRB2-ISCIII and is supported by grant PT13/0001, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER. H.L-F is funded by a postdoctoral fellowship from the Xunta de Galicia.
F.F-R and D.G-P are funded by the European Union's Seventh Framework Programme FP7/REGPOT 2012 2013.1 under grant agreement n° 316265 (BIOCAPS) and the "Platform of integration of intelligent techniques for analysis of biomedical information" project (TIN2013-47153-C3-3-R) financed by the Spanish Ministry of Economy and Competitiveness. C.FT is funded by the "Spanish National Youth Guarantee Implementation Plan" (2013/2016) financed by the Spanish Ministry of Economy and Competitiveness
Q&A: ChIP-seq technologies and the study of gene regulation
10.1186/1741-7007-8-56 BMC Biology 85
A computational model for histone mark propagation reproduces the distribution of heterochromatin in different human cell types
Chromatin is a highly compact and dynamic nuclear structure that consists of
DNA and associated proteins. The main organizational unit is the nucleosome,
which consists of a histone octamer with DNA wrapped around it. Histone
proteins are implicated in the regulation of eukaryote genes and they carry
numerous reversible post-translational modifications that control DNA-protein
interactions and the recruitment of chromatin binding proteins.
Heterochromatin, the transcriptionally inactive part of the genome, is densely
packed and contains histone H3 that is methylated at Lys 9 (H3K9me). The
propagation of H3K9me in nucleosomes along the DNA in chromatin is antagonized
by methylation of H3 lysine 4 (H3K4me) and acetylation of several lysines,
which are associated with euchromatin and active genes. We show that the related
histone modifications form antagonized domains on a coarse scale. These histone
marks are assumed to be initiated within distinct nucleation sites in the DNA
and to propagate bi-directionally. We propose a simple computer model that
simulates the distribution of heterochromatin in human chromosomes. The
simulations are in agreement with previously reported experimental observations
from two different human cell lines. We reproduced different types of barriers
between heterochromatin and euchromatin providing a unified model for their
function. The effects of changes in the nucleation site distribution and in
propagation rates were studied. The former occur mainly with the aim of
(de-)activation of single genes or gene groups and the latter has the power of
controlling the transcriptional programs of entire chromosomes. Generally, the
regulatory program of gene transcription is controlled by the distribution of
nucleation sites along the DNA string.
Comment: 24 pages, 9 figures, 1 table + supplementary material
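The mechanism described in this abstract (marks initiated at nucleation sites, spreading bi-directionally, and blocked by the antagonistic mark) can be illustrated with a toy simulation. The following is only a minimal sketch under stated assumptions, not the authors' model: the function name `simulate_marks`, the single-character states, the per-step spreading probabilities, and the rule that a mark spreads only onto unmarked nucleosomes are all hypothetical choices made for illustration.

```python
import random


def simulate_marks(n_nucleosomes, hetero_sites, eu_sites, steps,
                   rate_h=0.5, rate_e=0.5, seed=0):
    """Toy 1D spreading of two antagonistic marks from nucleation sites.

    'H' stands for an H3K9me-like (heterochromatin) mark, 'E' for an
    H3K4me/acetylation-like (euchromatin) mark, '.' for an unmarked
    nucleosome. All names and parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    state = ["."] * n_nucleosomes
    for i in hetero_sites:
        state[i] = "H"
    for i in eu_sites:
        state[i] = "E"
    rates = {"H": rate_h, "E": rate_e}

    for _ in range(steps):
        nxt = state[:]
        for i, mark in enumerate(state):
            if mark == ".":
                continue
            for j in (i - 1, i + 1):  # bi-directional propagation
                inside = 0 <= j < n_nucleosomes
                # A mark spreads only onto unmarked neighbours, so the two
                # marks block each other and a boundary forms where they meet.
                # If both reach the same site in one step, the source with
                # the lower index wins (an arbitrary tie-break).
                if inside and state[j] == "." and nxt[j] == ".":
                    if rng.random() < rates[mark]:
                        nxt[j] = mark
        state = nxt
    return "".join(state)


if __name__ == "__main__":
    # One heterochromatin and one euchromatin nucleation site at the two ends.
    print(simulate_marks(40, hetero_sites=[0], eu_sites=[39], steps=30))
```

In this sketch, moving the nucleation sites changes where domains start, while changing `rate_h` relative to `rate_e` shifts the position of the boundary between them, which loosely mirrors the two kinds of change the abstract distinguishes.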
Bivalent-Like Chromatin Markers Are Predictive for Transcription Start Site Distribution in Human
Deep sequencing of 5′ capped transcripts has revealed a variety of transcription initiation patterns, from narrow, focused promoters to wide, broad promoters. Attempts have already been made to model empirically classified patterns, but virtually no quantitative models for transcription initiation have been reported. Even though both genetic and epigenetic elements have been associated with such patterns, the organization of regulatory elements is largely unknown. Here, linear regression models were derived from a pool of regulatory elements, including genomic DNA features, nucleosome organization, and histone modifications, to predict the distribution of transcription start sites (TSS). Importantly, models including both active and repressive histone modification markers, e.g. H3K4me3 and H4K20me1, were consistently found to be much more predictive than models with only single-type histone modification markers, indicating the possibility of “bivalent-like” epigenetic control of transcription initiation. The nucleosome positions are proposed to be coded in the active component of such bivalent-like histone modification markers. Finally, we demonstrated that models trained on one cell type could successfully predict TSS distribution in other cell types, suggesting that these models may have a broader application range
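The comparison described above, between a linear model that uses one type of histone mark and one that combines an active and a repressive mark, can be sketched with ordinary least squares. This is a hedged illustration only: the data are synthetic and constructed so that both marks matter, and the helper names `fit_ols` and `r_squared` are hypothetical, not taken from the paper.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept, solved via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * t for x, t in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(rows, y, coef):
    """Coefficient of determination of a fitted linear model."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


# Synthetic signals (assumed for illustration): an active mark such as
# H3K4me3 and a repressive mark such as H4K20me1 at six promoters.
active = [0, 1, 2, 3, 4, 5]
repressive = [1, 0, 2, 1, 3, 0]
# Synthetic TSS-spread response built as 1 + 2*active - 3*repressive.
y = [1 + 2 * a - 3 * b for a, b in zip(active, repressive)]

both = [[a, b] for a, b in zip(active, repressive)]
single = [[a] for a in active]

r2_both = r_squared(both, y, fit_ols(both, y))
r2_single = r_squared(single, y, fit_ols(single, y))

if __name__ == "__main__":
    print(f"active + repressive: R^2 = {r2_both:.3f}")
    print(f"active only:         R^2 = {r2_single:.3f}")
```

Because the response here is generated from both marks by construction, the combined model necessarily fits better; the sketch shows the form of the comparison, not evidence for the paper's conclusion.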
A diffusion model for the coordination of DNA replication in Schizosaccharomyces pombe
The locations of proteins and epigenetic marks on the chromosomal DNA sequence are believed to demarcate the eukaryotic genome into distinct structural and functional domains that contribute to gene regulation and genome organization. However, how these proteins and epigenetic marks are organized in three dimensions remains unknown. Recent advances in proximity-ligation methodologies and high resolution microscopy have begun to expand our understanding of these spatial relationships. Here we use polymer models to examine the spatial organization of epigenetic marks, euchromatin and heterochromatin, and origins of replication within the Schizosaccharomyces pombe genome. These models incorporate data from microscopy and proximity-ligation experiments that inform on the positions of certain elements and contacts within and between chromosomes. Our results show a striking degree of compartmentalization of epigenetic and genomic features and lead to the proposal of a diffusion based mechanism, centred on the spindle pole body, for the coordination of DNA replication in S. pombe
Evaluation of Algorithm Performance in ChIP-Seq Peak Detection
Next-generation DNA sequencing coupled with chromatin immunoprecipitation (ChIP-seq) is revolutionizing our ability to interrogate whole genome protein-DNA interactions. Identification of protein binding sites from ChIP-seq data has required novel computational tools, distinct from those used for the analysis of ChIP-Chip experiments. The growing popularity of ChIP-seq spurred the development of many different analytical programs (at last count, we noted 31 open source methods), each with some purported advantage. Given that the literature is dense and empirical benchmarking challenging, selecting an appropriate method for ChIP-seq analysis has become a daunting task. Herein we compare the performance of eleven different peak calling programs on common empirical, transcription factor datasets and measure their sensitivity, accuracy and usability. Our analysis provides an unbiased critical assessment of available technologies, and should assist researchers in choosing a suitable tool for handling ChIP-seq data
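A benchmark of this kind needs some way to score a peak caller's output against a set of reference binding sites. The sketch below shows one plausible scoring scheme based on interval overlap. The definitions used here (sensitivity as the fraction of reference sites hit, precision as the fraction of called peaks that hit a reference site), the tuple layout, and the function names are assumptions for illustration and may differ from the metrics used in the paper.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def evaluate_peaks(called, reference):
    """Return (sensitivity, precision) of called peaks against reference sites."""
    # Reference sites recovered by at least one called peak.
    found = sum(any(overlaps(r, c) for c in called) for r in reference)
    # Called peaks supported by at least one reference site.
    supported = sum(any(overlaps(c, r) for r in reference) for c in called)
    sensitivity = found / len(reference) if reference else 0.0
    precision = supported / len(called) if called else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to exercise the functions.
    reference = [("chr1", 100, 200), ("chr1", 500, 600),
                 ("chr2", 100, 200), ("chr2", 900, 1000)]
    called = [("chr1", 150, 250), ("chr1", 590, 700), ("chr2", 300, 400)]
    print(evaluate_peaks(called, reference))
```

The pairwise comparison is quadratic in the number of intervals, which is fine for a small example; a real benchmark over genome-wide peak sets would normally sort the intervals or use an interval index.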
Detecting broad domains and narrow peaks in ChIP-seq data with hiddenDomains
Background: Correctly identifying genomic regions enriched with histone modifications and transcription factors is key to understanding their regulatory and developmental roles. Conceptually, these regions are divided into two categories, narrow peaks and broad domains, and different algorithms are used to identify each one. Datasets that span these two categories are often analyzed with a single program for peak calling combined with an ad hoc method for domains. Results: We developed hiddenDomains, which identifies both peaks and domains, and compared it to the leading algorithms using H3K27me3, H3K36me3, GABP, ESR1 and FOXA ChIP-seq datasets. The output from the programs was compared to qPCR-validated enriched and depleted sites, predicted transcription factor binding sites, and highly-transcribed gene bodies. With every method, hiddenDomains performed as well as, if not better than, algorithms dedicated to a specific type of analysis. Conclusions: hiddenDomains performs as well as the best domain and peak calling algorithms, making it ideal for analyzing ChIP-seq datasets, especially those that contain a mixture of peaks and domains