    Expanding the repertoire of bacterial (non-)coding RNAs

    The discovery of non-protein-coding RNA (ncRNA) genes in bacteria and of their diverse regulatory modes of action has brought the experimental and computational analysis of ncRNAs into focus. Regulatory ncRNA transcripts are not translated into proteins but function directly at the RNA level. These typically small RNAs are involved in diverse processes such as (post-)transcriptional regulation and modification, translation, protein translocation, protein degradation and sequestration. Bacterial ncRNAs either arise from independent primary transcripts, or their mature sequence is generated by processing of a precursor. Besides these autonomous transcripts, RNA regulators (e.g. riboswitches and RNA thermometers) also form chimeras with protein-coding sequences: these structured regulatory elements are encoded within the messenger RNA and directly regulate the expression of their “host” gene. The quality and completeness of genome annotation is essential for all subsequent analyses. In contrast to protein-coding genes, ncRNAs lack clear statistical signals at the sequence level. Sophisticated tools have therefore been developed to identify ncRNA genes automatically. Unfortunately, these tools are not part of generic genome annotation pipelines, so every study has to start with computational searches for known ncRNA genes. Moreover, prokaryotic genome annotation lacks essential features of protein-coding genes, such as their untranslated regions. Many known ncRNAs regulate translation by base-pairing with the 5’ UTR (untranslated region) of mRNA transcripts. Eukaryotic 5’ UTRs have been routinely annotated by sequencing ESTs (expressed sequence tags) for more than a decade, whereas experimental setups that systematically identify these elements on a genome-wide scale in prokaryotes have only recently been developed. 
The first part of this thesis describes three exploratory surveys of transcript organization in pathogenic bacteria. To identify ncRNAs in Pseudomonas aeruginosa, we combined an experimental RNomics approach with ncRNA prediction. Besides already known ncRNAs, we identified six novel RNA genes and validated their expression. Global detection of transcripts by next-generation RNA sequencing has revealed an unexpectedly complex transcript organization in many bacteria. These ultra-high-throughput methods offer the appealing opportunity to analyze the complete RNA output of any species at once. The development of the differential RNA sequencing (dRNA-seq) approach enabled us to analyze the primary transcriptomes of Helicobacter pylori and Xanthomonas campestris. We generated the first comprehensive and precise transcription start site (TSS) maps for both species and provide a general framework for the analysis of dRNA-seq data. Focusing on computer-aided analysis, we developed new tools to annotate TSS, detect small protein-coding genes and infer the homology of newly detected transcripts. We discovered hundreds of TSS in intergenic regions, upstream of protein-coding genes, within operons and antisense to annotated genes. Analysis of 5’ UTRs (spanning from the TSS to the start codon of the adjacent protein-coding gene) revealed an unexpected diversity in length, ranging from zero to several hundred nucleotides. We identified about 60 ncRNA candidates in Helicobacter and about 20 in Xanthomonas and validated their expression. Among these candidates we found several small protein-coding genes that had previously evaded annotation in both species. We showed that combining dRNA-seq with computational analysis is a powerful method for examining prokaryotic transcriptomes. However, experimental setups are time-consuming and often very costly. 
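The TSS categories and the 5’ UTR definition above can be illustrated with a small sketch. It is a hypothetical helper, not one of the tools developed in the thesis: the `Gene` record, `classify_tss` and the 300-nt upstream window are assumptions chosen for illustration, and real dRNA-seq annotation applies further rules.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed window: a same-strand TSS at most this many nt upstream of a
# start codon is assigned to that gene (illustrative value only).
UPSTREAM_WINDOW = 300


@dataclass
class Gene:
    name: str
    start: int   # 1-based leftmost coordinate
    end: int     # 1-based rightmost coordinate
    strand: str  # "+" or "-"

    @property
    def start_codon(self) -> int:
        # The start codon sits at the left end on "+" and at the right end on "-".
        return self.start if self.strand == "+" else self.end


def classify_tss(pos: int, strand: str, genes: list[Gene]) -> list[tuple[str, str, Optional[int]]]:
    """Return (category, gene, 5' UTR length) for every gene a TSS relates to.

    A TSS may fall into several categories; "orphan" means none applied.
    """
    hits = []
    for g in genes:
        inside = g.start <= pos <= g.end
        if g.strand == strand:
            # Distance from TSS to start codon in the direction of transcription.
            utr = g.start_codon - pos if strand == "+" else pos - g.start_codon
            if 0 <= utr <= UPSTREAM_WINDOW:
                hits.append(("upstream", g.name, utr))
            elif inside:
                hits.append(("internal", g.name, None))
        elif inside:
            hits.append(("antisense", g.name, None))
    return hits or [("orphan", "-", None)]


genes = [Gene("geneA", 1000, 1900, "+"), Gene("geneB", 2500, 3100, "-")]
print(classify_tss(950, "+", genes))   # [('upstream', 'geneA', 50)]
print(classify_tss(1200, "-", genes))  # [('antisense', 'geneA', None)]
```

A TSS exactly at the start codon yields a 5’ UTR of length zero (a leaderless transcript), the lower end of the size range reported above.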
Another limitation of experimental approaches is that genes expressed only in specific developmental stages or under stress conditions are likely to be missed. Bioinformatic tools offer a way to overcome such constraints. General approaches usually rely on comparative genomic data and use evolutionary signatures to assess the (non-)coding potential of multiple sequence alignments. In the second part of my thesis we present our major update of the widely used ncRNA gene finder RNAz and introduce RNAcode, an efficient tool to assess the local protein-coding potential of genomic regions. RNAz has been successfully used to identify structured RNA elements in all domains of life. However, our own experience and user feedback not only demonstrated the applicability of the RNAz approach but also helped us identify limitations of the previous implementation. Using a much larger training set and a new classification model, we significantly improved the prediction accuracy of RNAz. During transcriptome analysis we repeatedly identified small protein-coding genes that had not been annotated so far. Only a few such genes are known to date, and standard protein-coding gene finders suffer from the lack of training data. To avoid an excess of false-positive predictions, gene-finding software is usually run with an arbitrary cutoff of 40-50 amino acids and therefore misses small protein-coding genes. We have implemented RNAcode, which is optimized for emerging applications not covered by standard protein-coding gene annotation software. Besides complementing classical protein gene annotation, a major application of RNAcode is the functional classification of transcribed regions. RNA sequencing analyses are likely to falsely report transcript fragments (e.g. mRNA degradation products) as non-coding, so evaluating the protein-coding potential of these fragments is essential. 
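To see how a fixed length cutoff hides small genes, consider a naive forward-strand ORF scan with a minimum-length parameter. This is a hypothetical sketch (the name `find_orfs` and the ATG-only start rule are assumptions) and far simpler than real gene finders, which rely on statistical models of coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int) -> list[tuple[int, int, int]]:
    """Forward-strand ATG..stop ORFs with at least min_aa codons (stop excluded).

    Returns (start, end, length_aa) with 0-based, end-exclusive coordinates
    that include the stop codon. Only the first ATG after a stop opens an ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                length_aa = (i - start) // 3
                if length_aa >= min_aa:  # the length cutoff
                    orfs.append((start, i + 3, length_aa))
                start = None
    return orfs


small_orf = "ATG" + "GCT" * 10 + "TAA"   # 11 codons plus a stop codon
print(find_orfs(small_orf, min_aa=10))   # [(0, 36, 11)]
print(find_orfs(small_orf, min_aa=40))   # []
```

The same 11-codon ORF is reported with a low cutoff and silently dropped with a cutoff in the 40-50 amino acid range.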
RNAcode reports local regions of high coding potential instead of complete protein-coding genes. It does not require training on known protein-coding sequences and can therefore be applied to any species. We demonstrated this by analyzing the Escherichia coli genome, whose current annotation RNAcode reproduced accurately. In this extensively studied genome, RNAcode furthermore identified novel small protein-coding genes. Using transcriptome and proteome data, we found compelling evidence that several of the identified candidates encode bona fide proteins. In summary, this thesis demonstrates that bioinformatic methods are indispensable for analyzing the huge amount of transcriptome data and identifying novel (non-)coding RNA genes. With the major update of RNAz and the implementation of RNAcode, we have helped to complete the repertoire of gene-finding software, which will help to unearth hidden treasures of the RNA World.
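Reporting local regions rather than whole genes means extracting high-scoring stretches from a series of position-wise scores. One generic way to do this is to find the contiguous segment with the maximal total score. The sketch below uses Kadane's algorithm on hypothetical per-codon scores; it does not implement RNAcode's scoring model or its significance estimation, and the function name is an assumption.

```python
def max_scoring_segment(scores: list[float]) -> tuple[float, int, int]:
    """Maximal-scoring contiguous segment via Kadane's algorithm.

    Returns (score, start, end) with end exclusive; (0.0, 0, 0) if no
    segment has a positive score.
    """
    best, best_start, best_end = 0.0, 0, 0
    current, current_start = 0.0, 0
    for i, s in enumerate(scores):
        if current <= 0:
            # A non-positive running sum can only lower the total: restart here.
            current, current_start = s, i
        else:
            current += s
        if current > best:
            best, best_start, best_end = current, current_start, i + 1
    return best, best_start, best_end


# Hypothetical per-codon scores; positive values stand for coding-like evidence.
codon_scores = [-1.0, -2.0, 3.0, 2.0, -1.0, 4.0, -10.0, 1.0]
print(max_scoring_segment(codon_scores))  # (8.0, 2, 6)
```

The segment tolerates a single negative codon (index 4) because the surrounding evidence outweighs it, which is the behaviour one wants when looking for local rather than gene-length signals.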

    From question to answer

    "That death and, by now, also dying point prominently to religious forms of communication is certainly nothing new. Nor is the observation new that death can be absorbed less and less by traditional church offerings and increasingly by individual religious self-descriptions instead. Such a description from the sociology of religion remains incomplete, however, insofar as it does not take into account the changed ways of dealing with death (cf. Saake/Nassehi 2004, Findeiß 2005). The 'problem of death' (Baumann 1992) can less and less be described as transcending death and more as the necessity of transforming (religious) answers into questions. How this happens is shown using empirical material gathered with pastoral carers. While experts on religion (Friedrich Wilhelm Graf) fear an inflation of religious meaning, this analysis reveals a practice that already profits from this inflation: precisely because everything seems possible, the religious answer also appears increasingly possible. Religious speech thus functions parasitically with respect to the 'struggle of the gods'. The more gods struggle, the more calmly a religious answer emerges. All that matters, as the material shows, is that a question is present first. And it is precisely here that the decisive structural change of religious experience in modernity can be located." (Author's abstract)

    Common Features in lncRNA Annotation and Classification: A Survey

    Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms; they affect disease progression, which establishes them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority are poorly characterized beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well at distinguishing coding sequences from other RNAs, we find that current methods are not well suited to distinguishing lncRNAs, or parts thereof, from other non-protein-coding input sequences. We conclude that distinguishing lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.
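Sequence composition is one of the feature families that such classifiers commonly draw on. The sketch below computes a normalized, overlapping k-mer profile that could serve as a feature vector; `kmer_profile` is a hypothetical name, and k = 3 is an illustrative assumption rather than a recommendation from the survey.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> dict[str, float]:
    """Relative frequencies of all 4**k overlapping DNA k-mers in seq.

    k-mers containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    # Fixed key order (AAA, AAC, ...) so profiles of different sequences align.
    return {
        "".join(kmer): counts["".join(kmer)] / total if total else 0.0
        for kmer in product("ACGT", repeat=k)
    }


profile = kmer_profile("ATGATGNATG", k=3)
print(profile["ATG"])  # 0.6
```

The fixed key order matters for classification: every sequence maps to a vector of the same 4**k dimensions, regardless of which k-mers it actually contains.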

    Transcriptional regulation of the human CD97 promoter by Sp1/Sp3 in smooth muscle cells

    The EGF-TM7 receptor CD97 shows different features of expression and function in muscle cells compared with hematopoietic and tumor cells. Since the molecular function and regulation of CD97 are poorly understood, this study aimed to define its basal transcriptional regulation in smooth muscle cells (SMCs). Computational analysis of the CD97 5′-flanking region revealed that the TATA box-lacking promoter possesses several GC-rich regions as putative Sp1/Sp3 binding sites. Transfection studies with serially deleted promoter constructs demonstrated that the minimal promoter resides in the −218/+45 fragment, which contains one of five identified GC boxes, in the leiomyosarcoma cell line SK-LMS-1 and in human bronchial smooth muscle cells (HbSMCs). Mutation of the most proximal GC site in CD97 reporter gene constructs caused a significant decrease in promoter activity. Gel shift assays and chromatin immunoprecipitation revealed that Sp1 and Sp3 bound specifically to the most proximal GC site. Furthermore, we showed that Sp1 and Sp3 overexpression activates the CD97 promoter in HEK293 cells. Our data characterize, for the first time, the activity of the human CD97 promoter, which is controlled by Sp1/Sp3 transcription factors in SMCs.
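A scan of a 5′-flanking region for GC boxes can be sketched as an exact-match search on both strands. This is a minimal sketch, assuming the frequently cited GC-box core GGGCGG as the motif; the study's actual search method is not described here, dedicated tools typically score position weight matrices instead, and the function names and example sequence are hypothetical (not CD97). Positions follow the promoter convention used in the text (e.g. −218/+45): +1 is the TSS and there is no position 0.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def to_promoter_coord(index: int, tss_index: int) -> int:
    """Convert a 0-based string index to a promoter coordinate (+1 = TSS, no 0)."""
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def find_gc_boxes(seq: str, tss_index: int, core: str = "GGGCGG") -> list[tuple[int, str]]:
    """Exact matches of a GC-box core on both strands.

    Returns (promoter coordinate of the leftmost motif base, strand).
    """
    seq = seq.upper()
    patterns = {"+": core, "-": revcomp(core)}
    n = len(core)
    hits = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        for strand, pattern in patterns.items():
            if window == pattern:
                hits.append((to_promoter_coord(i, tss_index), strand))
    return hits


# Hypothetical promoter fragment; string index 12 is taken as the TSS.
promoter = "AAGGGCGGAAAACCGCCCAAAA"
print(find_gc_boxes(promoter, tss_index=12))  # [(-10, '+'), (1, '-')]
```

Searching the reverse complement of the motif on the forward sequence finds minus-strand sites without building a second sequence.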

    Promoting cooperation means leaving behind the self-help/professional divide

    Computational RNomics of Drosophilids

    Recent experimental and computational studies have provided overwhelming evidence for a plethora of diverse transcripts that are unrelated to protein-coding genes. One subclass consists of RNAs that require distinctive secondary structure motifs to exert their biological function and hence exhibit distinctive patterns of sequence conservation characteristic of positive selection on RNA secondary structure. The deep sequencing of 12 drosophilid species coordinated by the NHGRI provides an ideal data set for comparative computational approaches to determine the genomic loci that code for evolutionarily conserved RNA motifs. This class of loci includes the majority of the known small ncRNAs as well as structured RNA motifs in mRNAs. We report here on a genome-wide survey using RNAz.
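RNAz bases its classification on two measures: a z-score of the minimum free energy (MFE), i.e. thermodynamic stability relative to randomized sequences, and the structure conservation index (SCI), the consensus MFE of the alignment divided by the mean MFE of the individual sequences. The sketch below only evaluates these two formulas on numbers supplied by the caller; it does not fold RNA, the function names are hypothetical, and, to our understanding, RNAz estimates the z-score with a regression model rather than by explicit shuffling as done here.

```python
from statistics import mean, stdev


def structure_conservation_index(consensus_mfe: float, single_mfes: list[float]) -> float:
    """SCI = consensus MFE of the alignment / mean MFE of the single sequences."""
    avg = mean(single_mfes)
    return consensus_mfe / avg if avg != 0 else 0.0


def mfe_z_score(native_mfe: float, shuffled_mfes: list[float]) -> float:
    """z = (native MFE - mean shuffled MFE) / std. dev. of shuffled MFEs.

    stdev needs at least two shuffled values.
    """
    return (native_mfe - mean(shuffled_mfes)) / stdev(shuffled_mfes)


# Hypothetical energies in kcal/mol (in practice from a folding program).
print(structure_conservation_index(-20.0, [-30.0, -20.0]))  # 0.8
print(mfe_z_score(-30.0, [-20.0, -22.0, -18.0]))            # -5.0
```

A strongly negative z-score indicates a sequence more stable than expected by chance, and an SCI close to (or above) 1 indicates that the sequences fold into a common structure.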

    deepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns

    Motivation: High-throughput sequencing methods allow whole transcriptomes to be sequenced quickly and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, these fragments form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means the only, example.
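The read patterns described above can be thought of as blocks of overlapping mapped reads. A minimal sketch of grouping reads into such blocks, assuming a simple overlap-merging rule with an optional gap tolerance; the function and parameter names are hypothetical, and this is not the block definition deepBlockAlign itself uses.

```python
def reads_to_blocks(reads: list[tuple[int, int]], max_gap: int = 0) -> list[tuple[int, int, int]]:
    """Merge mapped reads (0-based start, exclusive end) into read blocks.

    A read joins the current block if it starts no more than max_gap nt
    after the block's end. Returns (block_start, block_end, read_count).
    """
    blocks = []
    for start, end in sorted(reads):
        if blocks and start <= blocks[-1][1] + max_gap:
            b_start, b_end, count = blocks[-1]
            blocks[-1] = (b_start, max(b_end, end), count + 1)
        else:
            blocks.append((start, end, 1))
    return blocks


# Hypothetical reads from one strand of a small RNA precursor.
reads = [(100, 122), (101, 123), (100, 121), (160, 182), (161, 183)]
print(reads_to_blocks(reads))  # [(100, 123, 3), (160, 183, 2)]
```

The toy reads collapse into two blocks, reminiscent of the two-block miR-miR* pattern; a real pipeline would additionally track strand and per-position read depth.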

    Proteinortho: Detection of (Co-)orthologs in large-scale analysis

    Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics, such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools, as brute-force approaches with quadratic memory requirements become infeasible in practice. Furthermore, the rapid pace at which new data become available makes it desirable to compute genome-wide orthology relations for a given dataset rather than rely on relations listed in databases.
    Results: The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We applied Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009 and identified thirty proteins present in 99% of all bacterial proteomes.
    Conclusions: Proteinortho significantly reduces the amount of memory required for orthology analysis compared with existing tools, allowing such computations to be performed on off-the-shelf hardware.
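The plain two-genome core of the reciprocal best alignment heuristic can be sketched as follows. The input format (query, subject, score tuples, such as alignment scores from a similarity search) and the function names are assumptions; the extensions Proteinortho adds on top of this core, e.g. for co-orthologs and for many genomes at once, are not modelled.

```python
def best_hits(hits: list[tuple[str, str, float]]) -> dict[str, str]:
    """Map each query to its highest-scoring subject (on ties, the first seen wins)."""
    best: dict[str, tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b: list[tuple[str, str, float]],
                         b_vs_a: list[tuple[str, str, float]]) -> list[tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical hit tables between genome A and genome B.
a_vs_b = [("a1", "b1", 100.0), ("a1", "b2", 80.0), ("a2", "b2", 90.0), ("a3", "b1", 70.0)]
b_vs_a = [("b1", "a1", 98.0), ("b1", "a3", 75.0), ("b2", "a2", 88.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Protein a3 is excluded because its best hit b1 prefers a1; a strict best-hit rule like this is exactly what misses co-orthologs, which motivates extending the heuristic.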