Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts
Background: Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is a high-throughput assay for DNA-protein-binding or post-translational chromatin/histone modifications. However, the raw microarray intensity readings themselves are not immediately useful to researchers, but require a number of bioinformatic analysis steps. Identified enriched regions need to be bioinformatically annotated and compared to related datasets by statistical methods. Results: We present a free, open-source R package Ringo that facilitates the analysis of ChIP-chip experiments by providing functionality for data import, quality assessment, normalization and visualization of the data, and the detection of ChIP-enriched genomic regions. Conclusion: Ringo integrates with other packages of the Bioconductor project, uses common data structures and is accompanied by ample documentation. It facilitates the construction of programmed analysis workflows, offers benefits in scalability, reproducibility and methodical scope of the analyses and opens up a broad selection of follow-up statistical and bioinformatic methods.
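To make "detection of ChIP-enriched genomic regions" concrete, the following is a minimal Python sketch of one generic approach: smooth the per-probe log-ratios with a running median, then report runs of consecutive probes above a threshold. It is an illustration under stated assumptions, not Ringo's implementation or API; the function names, the index-based window, and the `threshold` and `min_probes` parameters are all hypothetical.

```python
from statistics import median


def running_median(values, half_window):
    """Smooth values with a sliding median over neighbouring probes (by index)."""
    n = len(values)
    return [
        median(values[max(0, i - half_window): min(n, i + half_window + 1)])
        for i in range(n)
    ]


def enriched_regions(positions, log_ratios, threshold, half_window=1, min_probes=3):
    """Return (start, end) pairs for runs of probes whose smoothed value >= threshold.

    Assumes positions are sorted along one chromosome. The threshold and
    minimum run length are illustrative choices, not values from any package.
    """
    smoothed = running_median(log_ratios, half_window)
    regions, run = [], []
    for pos, value in zip(positions, smoothed):
        if value >= threshold:
            run.append(pos)
        else:
            if len(run) >= min_probes:
                regions.append((run[0], run[-1]))
            run = []
    # Close a run that extends to the last probe.
    if len(run) >= min_probes:
        regions.append((run[0], run[-1]))
    return regions
```

A real analysis would typically define the smoothing window in base pairs rather than probe counts and derive the threshold from an estimated null distribution of the smoothed intensities.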
Starr: Simple Tiling Array Analysis of Affymetrix ChIP-chip data
Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an
assay for DNA-protein-binding or post-translational chromatin/histone
modifications. As with all high-throughput technologies, it requires a thorough
bioinformatic processing of the data for which there is no standard yet. The
primary goal is the reliable identification and localization of genomic regions
that bind a specific protein. The second step comprises comparison of binding
profiles of functionally related proteins, or of binding profiles of the same
protein in different genetic backgrounds or environmental conditions.
Ultimately, one would like to gain a mechanistic understanding of the effects
of DNA binding events on gene expression. We present a free, open-source R
package Starr that, in combination with the package Ringo, facilitates the
comparative analysis of ChIP-chip data across experiments and across different
microarray platforms. Core features are data import, quality assessment,
normalization and visualization of the data, and the detection of ChIP-enriched
genomic regions. The use of common Bioconductor classes ensures the
compatibility with other R packages. Most importantly, Starr provides methods
for integration of complementary genomics data, e.g., it enables systematic
investigation of the relation between gene expression and DNA binding.
Analyzing ChIP-chip Data Using Bioconductor
Starr: Simple Tiling ARRay analysis of Affymetrix ChIP-chip data
Background: Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an assay used for investigating DNA-protein-binding or post-translational chromatin/histone modifications. As with all high-throughput technologies, it requires thorough bioinformatic processing of the data, for which there is no standard yet. The primary goal is to reliably identify and localize genomic regions that bind a specific protein. Further investigation compares binding profiles of functionally related proteins, or binding profiles of the same proteins in different genetic backgrounds or experimental conditions. Ultimately, the goal is to gain a mechanistic understanding of the effects of DNA binding events on gene expression. Results: We present a free, open-source R/Bioconductor package Starr that facilitates comparative analysis of ChIP-chip data across experiments and across different microarray platforms. The package provides functions for data import, quality assessment, data visualization and exploration. Starr includes high-level analysis tools such as the alignment of ChIP signals along annotated features, correlation analysis of ChIP signals with complementary genomic data, peak-finding and comparative display of multiple clusters of binding profiles. It uses standard Bioconductor classes for maximum compatibility with other software. Moreover, Starr automatically updates microarray probe annotation files by a highly efficient remapping of microarray probe sequences to an arbitrary genome. Conclusion: Starr is an R package that covers the complete ChIP-chip workflow from data processing to binding pattern detection. It focuses on the high-level data analysis, e.g., it provides methods for the integration and combined statistical analysis of binding profiles and complementary functional genomics data. Starr enables systematic assessment of binding behaviour for groups of genes that are aligned along arbitrary genomic features.
CoCAS: a ChIP-on-chip analysis suite
Motivation: High-density tiling microarrays are increasingly used in combination with ChIP assays to study transcriptional regulation. To ease the analysis of the large amounts of data generated by this approach, we have developed ChIP-on-chip Analysis Suite (CoCAS), a standalone software suite which implements optimized ChIP-on-chip data normalization, improved peak detection, as well as quality control reports. Our software handles dye swaps and replicate correlation, and connects easily with genome browsers and other peak detection algorithms. CoCAS can readily be used on the latest generation of Agilent high-density arrays. Also, the implemented peak detection methods are suitable for other datasets, including ChIP-Seq output.
ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data
Background: Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) or ChIP followed by genome tiling array analysis (ChIP-chip) have become standard technologies for genome-wide identification of DNA-binding protein target sites. A number of algorithms have been developed in parallel that allow identification of binding sites from ChIP-seq or ChIP-chip datasets and subsequent visualization in the University of California Santa Cruz (UCSC) Genome Browser as custom annotation tracks. However, summarizing these tracks can be a daunting task, particularly if there are a large number of binding sites or the binding sites are distributed widely across the genome. Results: We have developed ChIPpeakAnno as a Bioconductor package within the statistical programming environment R to facilitate batch annotation of enriched peaks identified from ChIP-seq, ChIP-chip, cap analysis of gene expression (CAGE) or any experiments resulting in a large number of enriched genomic regions. The binding sites annotated with ChIPpeakAnno can be viewed easily as a table, a pie chart or plotted in histogram form, i.e., the distribution of distances to the nearest genes for each set of peaks. In addition, we have implemented functionalities for determining the significance of overlap between replicates or binding sites among transcription factors within a complex, and for drawing Venn diagrams to visualize the extent of the overlap between replicates. Furthermore, the package includes functionalities to retrieve sequences flanking putative binding sites for PCR amplification, cloning, or motif discovery, and to identify Gene Ontology (GO) terms associated with adjacent genes. Conclusions: ChIPpeakAnno enables batch annotation of the binding sites identified from ChIP-seq, ChIP-chip, CAGE or any technology that results in a large number of enriched genomic regions within the statistical programming environment R. Allowing users to pass their own annotation data such as a different Chromatin immunoprecipitation (ChIP) preparation and a dataset from literature, or existing annotation packages, such as GenomicFeatures and BSgenome, provides flexibility. Tight integration to the biomaRt package enables up-to-date annotation retrieval from the BioMart database.
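The "distances to the nearest genes" annotation mentioned above can be illustrated with a small Python sketch. It assumes peaks and transcription start sites (TSS) on a single chromosome, ignores strand, and uses hypothetical function names; it is not the ChIPpeakAnno API, only a sketch of the underlying idea of a nearest-feature lookup by binary search.

```python
from bisect import bisect_left


def nearest_tss(peak_mid, tss_sorted):
    """Return (tss, signed_distance) for the TSS closest to peak_mid.

    tss_sorted must be a non-empty list sorted in ascending order.
    A positive distance means the peak lies downstream of the TSS in
    coordinate terms (strand is ignored in this sketch).
    """
    i = bisect_left(tss_sorted, peak_mid)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = tss_sorted[max(0, i - 1): i + 1]
    best = min(candidates, key=lambda tss: abs(peak_mid - tss))
    return best, peak_mid - best


def annotate_peaks(peaks, tss_sorted):
    """Annotate each (start, end) peak with its nearest TSS and distance."""
    annotated = []
    for start, end in peaks:
        mid = (start + end) // 2
        tss, distance = nearest_tss(mid, tss_sorted)
        annotated.append({"peak": (start, end), "nearest_tss": tss, "distance": distance})
    return annotated
```

Collecting the `distance` values over all peaks gives the kind of data a histogram of distances to the nearest gene would be drawn from.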
Nucleosomal chromatin in the mature sperm of Drosophila melanogaster
During spermiogenesis in mammals and many other vertebrate classes, histone-containing nucleosomes are replaced by protamine toroids, which can repackage chromatin at a 10 to 20-fold higher density than in a typical somatic nucleus. However, recent evidence suggests that sperm of many species, including human and mouse, retain a small compartment of nucleosomal chromatin, particularly near genes important for embryogenesis. As in mammals, spermiogenesis in the fruit fly, Drosophila melanogaster, has also been shown to undergo a programmed substitution of nucleosomes with protamine-like proteins. Using chromatin immunoprecipitation (ChIP) and whole-genome tiling array hybridization (ChIP-chip), supported by immunocytochemical evidence, we show that in a manner analogous to nucleosomal chromatin retention in mammalian spermatozoa, distinct domains packaged by the canonical histones H2A, H2B, H3 and H4 are present in the fly sperm nucleus. We also find evidence for the retention of nucleosomes with specific histone H3 trimethylation marks characteristic of chromatin repression (H3K9me3, H3K27me3) and active transcription (H3K36me3).
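Regulatorische Netzwerke der Genexpression in Herz- und Skelettmuskelzellen auf der Ebene von Histonmodifikationen und Transkriptionsfaktoren [Regulatory networks of gene expression in heart and skeletal muscle cells at the level of histone modifications and transcription factors]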
In this study the role of histone modifications and their potential
interactions with four key cardiac transcription factors in the context of
heart development and congenital heart diseases were studied. While histone
modifications influence the compaction of chromatin and consequently the
accessibility of DNA, transcription factors (TFs) are responsible for a fine
tuning of expression. Two histone acetylations and two histone methylations
were studied which were previously described to be associated with active
transcription: H3ac, H4ac, H3K4me2 and H3K4me3. The transcription factors
Gata4, Mef2a, Nkx2.5, and Srf are known to be essential for heart development
and form a regulatory subnetwork in which they regulate each other's
expression.
To gain insight into the processes regulating gene expression, the histone
modifications and transcription factors were investigated, using heart and
skeletal muscle cells as model systems. Their combinatorial occurrence was
determined by chromatin immunoprecipitation followed by detection on custom
arrays (ChIP-chip). The arrays represent the 12,625 transcription start sites
of 8,585 murine genes. This information was combined with expression array
data of the same genes and RNA interference (RNAi) experiments targeting the
investigated transcription factors.
The developed research tools such as array designs, the software package for
the analysis, the raw data and a results database were made publicly
available. The presented data set demonstrated that the average transcript
levels associated with combinations of modifications are not simply related to
those associated with individual modifications, supporting the histone code
hypothesis: Different combinations of modifications are associated with
significantly different transcript levels and the levels of these combinations
are not additively related to the levels associated with the individual
modifications. The dynamics of histone modifications during muscle cell
differentiation suggests that they may have a major function as signaling
marks for the recruitment of TFs.
The four investigated transcription factors not only regulate each other's
expression but also have a high number of coregulated target genes, many of
which are themselves transcription factors. An example is the T-box
transcription factor Tbx20, which was identified as a novel target. This
suggests that Gata4, Mef2a, Nkx2.5, and Srf can be placed at the top of
several regulatory cascades. Analysis of the binding sequences showed that the
conservation of binding sites is lower than previously suggested and
furthermore revealed a novel binding motif for Srf. The TFs were found to
frequently bind at sites of histone modifications and to mainly function as
activators of transcription. The activating potential of Gata4 and Srf was
even enhanced at sites of H3ac: possibly a consequence of the interaction
between these transcription factors and the histone acetyl transferase (HAT)
p300.
Cooperativity of stress-responsive transcription factors in core hypoxia-inducible factor binding regions
The transcriptional response driven by Hypoxia-inducible factor (HIF) is central to the adaptation to oxygen restriction. Despite recent characterization of genome-wide HIF DNA binding locations and hypoxia-regulated transcripts in different cell types, the molecular bases of HIF target selection remain unresolved. Herein, we combined multi-level experimental data and computational predictions to identify sequence motifs that may contribute to HIF target selectivity. We obtained a core set of bona fide HIF binding regions by integrating multiple HIF1 DNA binding and hypoxia expression profiling datasets. This core set exhibits evolutionarily conserved binding regions and is enriched in functional responses to hypoxia. Computational prediction of enriched transcription factor binding sites identified sequence motifs corresponding to several stress-responsive transcription factors, such as activator protein 1 (AP1), cAMP response element-binding (CREB), or CCAAT-enhancer binding protein (CEBP). Experimental validations on HIF-regulated promoters suggest a functional role of the identified motifs in modulating HIF-mediated transcription. Accordingly, transcriptional targets of these factors are over-represented in a sorted list of hypoxia-regulated genes. Altogether, our results implicate cooperativity among stress-responsive transcription factors in fine-tuning the HIF transcriptional response. This work was supported by Ministerio de Ciencia e Innovación (Spanish Ministry of Science and Innovation, MICINN) [grant number SAF2008-03147 to L. del P.], Comunidad Autónoma de Madrid [grant number S-SAL-0311_2006 to L. del P.] and the 7th Research Framework Programme of the European Union [grant number METOXIA project ref. HEALTH-F2-2009-222741] to L. del P. D.V. was a recipient of PhD funding from the Spanish Ministry of Science and Innovation [FPU programme] and the European Molecular Biology Organization [Short-Term Fellowships
Comprehensive analysis of high-throughput experiments for investigating transcription and transcriptional regulation
As the number of fully sequenced genomes grows, efforts are shifted towards investigation of functional aspects. One research focus is the transcriptome, the set of all transcribed genomic features. We aspire to understand what features constitute the transcriptome, in which context these are transcribed and how their transcription is regulated. Studies that aim to answer these questions frequently make use of high-throughput technologies that allow for investigation of multiple genomic regions, or transcribed copies of genomic regions, in parallel. In this dissertation, I present three high-throughput studies I have been involved in, in which data gained from oligo-nucleotide tiling microarrays or large-scale cDNA sequencing provided insights into the transcriptome and transcriptional regulation in the model organisms Saccharomyces cerevisiae and Mus musculus. Interpretation of such high-throughput data poses two major computational tasks. First, the primary statistical analysis includes quality assessment, data normalisation and identification of significantly affected targets, i.e. regions of the genome deemed transcribed or involved in transcriptional regulation. Second, in an integrative bioinformatic analysis, the identified targets need to be interpreted in context of the current genome annotation and related experimental results. I provide details of these individual steps as they were conducted in the three studies. For both primary and integrative analysis, functional, extensible and well-documented software is required, which implements individual analysis steps, allows for concise visualisation of intermediate and final results and facilitates the construction of automated, programmed workflows. Ideally such software is optimised with respect to scalability, reproducibility and methodical scope of the analyses. This dissertation contains details of two such software packages in the Bioconductor project, which I (co-)developed.
- …