57 research outputs found

    Starr: Simple Tiling Array Analysis of Affymetrix ChIP-chip data

    Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an assay for DNA-protein-binding or post-translational chromatin/histone modifications. As with all high-throughput technologies, it requires thorough bioinformatic processing of the data, for which there is no standard yet. The primary goal is the reliable identification and localization of genomic regions that bind a specific protein. The second step comprises comparison of binding profiles of functionally related proteins, or of binding profiles of the same protein in different genetic backgrounds or environmental conditions. Ultimately, one would like to gain a mechanistic understanding of the effects of DNA binding events on gene expression. We present a free, open-source R package Starr that, in combination with the package Ringo, facilitates the comparative analysis of ChIP-chip data across experiments and across different microarray platforms. Core features are data import, quality assessment, normalization and visualization of the data, and the detection of ChIP-enriched genomic regions. The use of common Bioconductor classes ensures compatibility with other R packages. Most importantly, Starr provides methods for integration of complementary genomics data, e.g., it enables systematic investigation of the relation between gene expression and DNA binding.

    Genomic data integration with hidden Markov models to understand transcription regulation

    Transcription is a tightly controlled process that involves the recruitment and post-translational modification of DNA-associated protein complexes, which can be mapped to the genome using high-throughput experimental assays. An accurate annotation of genomic elements such as transcription units or cis-regulatory elements such as promoters or enhancers is crucial for the use and interpretation of data generated by these assays. Thus, integrative genomic data analysis of high-throughput assays with hidden Markov models (HMMs) has become a popular tool for genome annotation. However, current algorithms are limited by unrealistic data distribution assumptions and variance models. Moreover, they are not able to assign forward or reverse direction to states or properly integrate strand-specific (e.g., RNA expression) with non-strand-specific (e.g., ChIP) data, which is essential to characterize directed processes such as transcription. In this thesis new HMM-based methods are proposed to overcome these limitations. These include (i) bidirectional HMMs (bdHMMs), which integrate strand-specific with non-strand-specific data to infer directed genomic states de novo, and (ii) GenoSTAN (Genomic STate ANnotation), an HMM using discrete probability distributions to model count data, for genome annotation from Next-Generation-Sequencing data. Both approaches are made available in the R/Bioconductor package STAN (STate ANnotation), which provides an efficient implementation that can be run on large genomes such as human.
STAN is used to derive new and improved annotations of transcription in yeast and human and to generate a map of promoters and enhancers in 127 human cell types and tissues. Integration of transcription factor binding and RNA expression data in yeast recovers the majority of transcribed loci, reveals gene-specific variations in the yeast transcription cycle, and identifies 32 new transcribed loci, a regulated initiation-elongation transition, the absence of elongation factors Ctk1 and Paf1 from a class of genes, a distinct transcription mechanism for highly expressed genes and novel DNA sequence motifs associated with transcription termination. Moreover, promoters and enhancers in 127 human cell types and tissues are mapped by integrating sequencing data from the ENCODE and Roadmap Epigenomics projects, today’s largest compendium of chromatin assays. Promoters and enhancers are identified with consistently higher accuracy and show significantly higher enrichment of complex trait-associated genetic variants than current annotations. Investigation of binding of 101 transcription factors in human K562 cells reveals common and distinctive TF binding properties of enhancers and promoters. Application of STAN to transient transcriptome sequencing (TT-Seq) data in human K562 cells recovers stable mRNAs and long intergenic non-coding RNAs, and additionally maps over 10,000 transient RNAs, including enhancer RNAs, antisense RNAs, and promoter-associated RNAs. Further analyses reveal that transient RNAs such as enhancer RNAs are short and lack U1 motifs and secondary structure. Taken together, the annotations inferred in this thesis give new insights into transcription and its regulation and will be an important resource for future research in genomics. STAN is a valuable tool for creating such annotations in other organisms as well and, as more data become available, for improving the existing ones.

    Starr: Simple Tiling ARRay analysis of Affymetrix ChIP-chip data

    Background: Chromatin immunoprecipitation combined with DNA microarrays (ChIP-chip) is an assay used for investigating DNA-protein-binding or post-translational chromatin/histone modifications. As with all high-throughput technologies, it requires thorough bioinformatic processing of the data, for which there is no standard yet. The primary goal is to reliably identify and localize genomic regions that bind a specific protein. Further investigation compares binding profiles of functionally related proteins, or binding profiles of the same proteins in different genetic backgrounds or experimental conditions. Ultimately, the goal is to gain a mechanistic understanding of the effects of DNA binding events on gene expression. Results: We present a free, open-source R/Bioconductor package Starr that facilitates comparative analysis of ChIP-chip data across experiments and across different microarray platforms. The package provides functions for data import, quality assessment, data visualization and exploration. Starr includes high-level analysis tools such as the alignment of ChIP signals along annotated features, correlation analysis of ChIP signals with complementary genomic data, peak-finding and comparative display of multiple clusters of binding profiles. It uses standard Bioconductor classes for maximum compatibility with other software. Moreover, Starr automatically updates microarray probe annotation files by a highly efficient remapping of microarray probe sequences to an arbitrary genome. Conclusion: Starr is an R package that covers the complete ChIP-chip workflow from data processing to binding pattern detection. It focuses on the high-level data analysis, e.g., it provides methods for the integration and combined statistical analysis of binding profiles and complementary functional genomics data. Starr enables systematic assessment of binding behaviour for groups of genes that are aligned along arbitrary genomic features.

    Simultaneous characterization of sense and antisense genomic processes by the double-stranded hidden Markov model

    Hidden Markov models (HMMs) have been extensively used to dissect the genome into functionally distinct regions using data such as RNA expression or DNA binding measurements. It is a challenge to disentangle processes occurring on complementary strands of the same genomic region. We present the double-stranded HMM (dsHMM), a model for the strand-specific analysis of genomic processes. We applied dsHMM to yeast using strand-specific transcription data, nucleosome data, and protein binding data for a set of 11 factors associated with the regulation of transcription. The resulting annotation recovers the mRNA transcription cycle (initiation, elongation, termination) while correctly predicting strand-specificity and directionality of the transcription process. We find that pre-initiation complex formation is an essentially undirected process, giving rise to a large number of bidirectional promoters and to pervasive antisense transcription. Notably, 12% of all transcriptionally active positions showed simultaneous activity on both strands. Furthermore, dsHMM reveals that antisense transcription is specifically suppressed by Nrd1, a yeast termination factor.

    Global DNA hypomethylation prevents consolidation of differentiation programs and allows reversion to the embryonic stem cell state.

    DNA methylation patterns change dynamically during mammalian development and lineage specification, yet little information is available about how DNA methylation affects gene expression profiles upon differentiation. Here we determine genome-wide transcription profiles during undirected differentiation of severely hypomethylated (Dnmt1⁻/⁻) embryonic stem cells (ESCs) as well as ESCs completely devoid of DNA methylation (Dnmt1⁻/⁻;Dnmt3a⁻/⁻;Dnmt3b⁻/⁻ or TKO) and assay their potential to transit in and out of the ESC state. We find that the expression of only a few genes, mainly associated with germ line function and the X chromosome, is affected in undifferentiated TKO ESCs. Upon initial differentiation as embryoid bodies (EBs), wild type, Dnmt1⁻/⁻ and TKO cells downregulate pluripotency-associated genes and upregulate lineage-specific genes, but their transcription profiles progressively diverge upon prolonged EB culture. While Oct4 protein levels are completely and homogeneously suppressed, transcription of Oct4 and Nanog is not completely silenced even at late stages in both Dnmt1⁻/⁻ and TKO EBs. Despite late wild type and Dnmt1⁻/⁻ EBs showing a much higher degree of concordant expression, after EB dissociation and replating under pluripotency-promoting conditions both Dnmt1⁻/⁻ and TKO cells, but not wild type cells, rapidly revert to expression profiles typical of undifferentiated ESCs. Thus, while DNA methylation seems not to be critical for initial activation of differentiation programs, it is crucial for permanent restriction of developmental fate during differentiation.

    Annotation of genomics data using bidirectional hidden Markov models unveils variations in Pol II transcription cycle

    DNA replication, transcription and repair involve the recruitment of protein complexes that change their composition as they progress along the genome in a directed or strand-specific manner. Chromatin immunoprecipitation in conjunction with hidden Markov models (HMMs) has been instrumental in understanding these processes, as they segment the genome into discrete states that can be related to DNA-associated protein complexes. However, current HMM-based approaches are not able to assign forward or reverse direction to states or properly integrate strand-specific (e.g., RNA expression) with non-strand-specific (e.g., ChIP) data, which is indispensable to accurately characterize directed processes. To overcome these limitations, we introduce bidirectional HMMs which infer directed genomic states from occupancy profiles de novo. Application to RNA polymerase II-associated factors in yeast and chromatin modifications in human T cells recovers the majority of transcribed loci, reveals gene-specific variations in the yeast transcription cycle and indicates the existence of directed chromatin state patterns at transcribed, but not at repressed, regions in the human genome. In yeast, we identify 32 new transcribed loci, a regulated initiation-elongation transition, the absence of elongation factors Ctk1 and Paf1 from a class of genes, a distinct transcription mechanism for highly expressed genes and novel DNA sequence motifs associated with transcription termination. We anticipate bidirectional HMMs to significantly improve the analyses of genome-associated directed processes.

    Accurate Promoter and Enhancer Identification in 127 ENCODE and Roadmap Epigenomics Cell Types and Tissues by GenoSTAN

    Accurate maps of promoters and enhancers are required for understanding transcriptional regulation. Promoters and enhancers are usually mapped by integration of chromatin assays charting histone modifications, DNA accessibility, and transcription factor binding. However, current algorithms are limited by unrealistic data distribution assumptions. Here we propose GenoSTAN (Genomic STate ANnotation), a hidden Markov model overcoming these limitations. We map promoters and enhancers for 127 cell types and tissues from the ENCODE and Roadmap Epigenomics projects, today's largest compendium of chromatin assays. Extensive benchmarks demonstrate that GenoSTAN generally identifies promoters and enhancers with significantly higher accuracy than previous methods. Moreover, GenoSTAN-derived promoters and enhancers showed significantly higher enrichment of complex trait-associated genetic variants than current annotations. Altogether, GenoSTAN provides an easy-to-use tool to define promoters and enhancers in any system, and our annotation of human transcriptional cis-regulatory elements constitutes a rich resource for future research in biology and medicine.

    SARS-CoV-2 Omicron variants BA.1 and BA.2 both show similarly reduced disease severity of COVID-19 compared to Delta, Germany, 2021 to 2022

    German national surveillance data analysis shows that hospitalisation odds associated with Omicron lineage BA.1 or BA.2 infections are up to 80% lower than with Delta infection, primarily in ≥ 35-year-olds. The proportion of vaccinated Omicron cases who were hospitalised (2.3% for both lineages) seemed lower than that of unvaccinated cases (4.4% for both lineages). Independent of vaccination status, the hospitalisation frequency among cases with Delta seemed nearly threefold higher (8.3%) than with Omicron (3.0% for both lineages), suggesting that Omicron inherently causes less severe disease.

    Cluster analysis of resistance combinations in Escherichia coli from different human and animal populations in Germany 2014-2017

    Recent findings on Antibiotic Resistance (AR) have brought renewed attention to the comparison of data on AR from human and animal sectors. This is, however, a major challenge since the data are not harmonized. This study performs a comparative analysis of data on resistance combinations in Escherichia coli (E. coli) from different routine surveillance and monitoring systems for human and different animal populations in Germany. Data on E. coli isolates were collected between 2014 and 2017 from human clinical isolates, non-clinical animal isolates from food-producing animals and food, and clinical animal isolates from food-producing and companion animals from national routine surveillance and monitoring for AR in Germany. Sixteen possible resistance combinations to four antibiotics (ampicillin, cefotaxime, ciprofloxacin and gentamicin) for these populations were used for hierarchical clustering (Euclidean and average distance). All analyses were performed with the software R 3.5.1 (RStudio 1.1.442). Data of 333,496 E. coli isolates and forty-one different human and animal populations were included in the cluster analysis. Three main clusters were detected. Within these three clusters, all human populations (intensive care unit (ICU), general ward and outpatient care) showed similar relative frequencies of the resistance combinations and clustered together. They demonstrated similarities with clinical isolates from different animal populations and most isolates from pigs from both non-clinical and clinical isolates. Isolates from healthy poultry demonstrated similarities in relative frequencies of resistance combinations and clustered together. However, they clustered separately from the human isolates. All isolates from different animal populations with low relative frequencies of resistance combinations clustered together. They also clustered separately from the human populations.
Cluster analysis has been able to demonstrate the linkage among human isolates and isolates from various animal populations based on the resistance combinations. Further analyses based on these findings might support a better one-health approach for AR in Germany.