1,033 research outputs found

    Identifying elemental genomic track types and representing them uniformly

    Background: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. Results: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. Conclusions: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.

    The Genomic HyperBrowser: an analysis web server for genome-scale data

    The immense increase in availability of genome-scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. By providing several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens up a range of genomic investigations related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.

    GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome

    Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. 
The software is available at: https://hyperbrowser.uio.no. This work was supported by the Research Council of Norway (under grant agreements 221580, 218241, and 231217/F20), by the Norwegian Cancer Society (under grant agreements 71220’PR-2006-0433 and 3485238-2013), and by the South-Eastern Norway Regional Health Authority (under grant agreement 2014041).

    Analysis of the determinants of Pol II pausing

    Pausing of transcribing RNA polymerase II (Pol II) has emerged as a general feature of gene expression in human cells. Many transcription factors, DNA sequences and chromatin characteristics have been implicated in inducing transcriptional pausing. However, the relative contributions of these factors to the observed Pol II pausing remain unclear. Furthermore, research in metazoans has mainly focused on Pol II promoter-proximal pausing, leaving the causes of pausing outside of this region unknown. To reliably detect real transcriptional pausing sites and advance the understanding of the causes of this phenomenon, we developed a pausing detection algorithm for nucleotide-resolution Pol II occupancy data. We scrutinized the characteristics and potential shortcomings of Native Elongating Transcript sequencing (NET-seq), one of the high-resolution methods of Pol II profiling, and used our observations to improve the NET-seq processing pipeline. Leveraging the improved processing pipeline and the developed pausing detection algorithm revealed widespread genome-wide Pol II pausing at nucleotide resolution in human cells. Next, we set out to identify the determinants of Pol II pausing in an unbiased manner based on the underlying DNA sequence. To predict the predisposition of a genomic site to evoke Pol II pausing, we applied a range of machine learning approaches using previously identified high-confidence pausing sites. For each of the sites, we created a large number of features, including both factors that were previously linked to transcriptional pausing and factors that had not yet been implicated in inducing pausing. Our analysis revealed DNA sequence properties underlying widespread Pol II pausing, including a new pausing motif. Interestingly, key sequence determinants of RNA polymerase pausing are shared by human cells and bacteria. 
Our study indicates that transcriptional pausing in human cells is sequence-induced and that the determinants of Pol II pausing might be evolutionarily conserved. A general feature of gene expression in human cells is the pausing of RNA polymerase II (Pol II). Various aspects, such as transcription factors, DNA sequences and chromatin properties, have been linked to this process. The relative contribution of these factors to the observed pauses is unknown. Moreover, research in metazoans has so far focused mainly on Pol II pausing during early elongation, in the promoter-proximal region. The causes of pausing outside these regions are unknown. To improve the understanding of the causes of transcriptional pausing, we developed an algorithm that processes Pol II signals and localizes pauses precisely, down to a single nucleotide. The Pol II signal measurements are generated with NET-seq (Native Elongating Transcript sequencing), a high-resolution method. While examining the method, we identified systematic errors in the measurement data, which led to adjustments in the data processing. These algorithmic improvements showed that Pol II pauses are widespread in human cells and can be observed at individual nucleotides distributed across the entire genome. For an unbiased identification of the sequence-specific factors that contribute to Pol II pausing, a range of machine learning methods was applied. Transcriptional pauses detected with high confidence were used to learn and predict predispositions in DNA segments. For each of these example regions, descriptive features are created. These include factors previously linked to transcriptional pausing as well as features with no known association. Our analysis identifies a new DNA sequence motif and other relevant sequence properties underlying Pol II pausing. Interestingly, the identified sequence properties are found both in human cells and in bacteria. Our study suggests that transcriptional pausing in human cells is sequence-dependent and evolutionarily conserved.

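    Isolierung und kultivierungsunabhängige Untersuchungen von magnetotaktischen Bakterien aus marinen und limnischen Sedimenten (Isolation and cultivation-independent investigations of magnetotactic bacteria from marine and limnic sediments)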

    In this work the diversity and distribution of magnetotactic bacteria (MTB) in Northern Germany were investigated. With the exception of extremely eutrophic environments, different MTB (spirilla, cocci, rods, vibrios) were found in all samples from marine and freshwater environments. Magnetotactic multicellular aggregates were found for the first time in the German Bight and in the Baltic Sea. During incubation in microcosms, an increasing number and a decreasing diversity of MTB were observed. In most cases the MTB population was dominated by magnetotactic cocci, and 16S rDNA analysis identified them as Alphaproteobacteria. Sequence divergences of up to 11% were found between the MTB of different microcosms, but high variability was also observed within single microcosms at different times. In one microcosm from a lake in Bremen, a magnetotactic rod (MHB-1) was enriched that is closely related to Magnetobacterium bavaricum, the only known MTB from the Nitrospira phylum. No correlation was observed between the development of a distinct MTB population in the microcosms and the original geographical location of the samples, but the heterogeneous vertical distribution of MTB indicates an adaptation to specific gradients. Most MTB (up to 98%) were restricted to anoxic sediment layers and reached up to 1.5 × 10⁷ MTB/cm³, with an abundance of up to 1% of the total cell counts, which indicates a significant influence of MTB on the microbial iron cycle. The highly effective enrichment of MTB by magnetic collection and by the race-track method was demonstrated by microscopy, Denaturing Gradient Gel Electrophoresis (DGGE) of 16S rDNA fragments and Amplified Ribosomal DNA Restriction Analysis (ARDRA). The selective enrichment of MTB allowed the construction of a genomic library which most likely contains a fragment of the mamAB cluster. From several cultivation experiments, 10 new magnetotactic spirilla were isolated. All strains are microaerophilic and members of the genus Magnetospirillum.

    A shared ‘vulnerability code’ underpins varying sources of DNA damage throughout paternal germline transmission in mouse

    During mammalian spermatogenesis, the paternal genome is extensively remodelled via replacement of histones with protamines, forming the highly compact mature sperm nucleus. Compaction occurs in post-meiotic spermatids and is accompanied by extensive double strand break (DSB) formation. We investigate the epigenomic and genomic context of mouse spermatid DSBs, identifying primary sequence motifs, secondary DNA structures and chromatin contexts associated with this damage. Consistent with previously published results, we find spermatid DSBs positively associated with short tandem repeats and LINE elements. We further show that spermatid DSBs preferentially occur in association with (CA)n, (NA)n and (RY)n repeats and in predicted Z-DNA, are not associated with G-quadruplexes, are preferentially found in regions of low histone mark coverage, and engage the remodelling/NHEJ factor BRD4. Locations incurring DSBs in spermatids also show distinct epigenetic profiles throughout later developmental stages: regions retaining histones in mature sperm, regions susceptible to oxidative damage in mature sperm, and fragile two-cell like embryonic stem cell regions bound by ZSCAN4 all co-localise with spermatid DSBs and with each other. Our results point to a common ‘vulnerability code’ unifying several types of DNA damage occurring on the paternal genome during reproduction, potentially underpinned by torsional changes during sperm chromatin remodelling.