19 research outputs found

    Proteinortho: Detection of (Co-)orthologs in large-scale analysis

    Background: Orthology analysis is an important part of data analysis in many areas of bioinformatics such as comparative genomics and molecular phylogenetics. The ever-increasing flood of sequence data, and hence the rapidly increasing number of genomes that can be compared simultaneously, calls for efficient software tools as brute-force approaches with quadratic memory requirements become infeasible in practice. The rapid pace at which new data become available, furthermore, makes it desirable to compute genome-wide orthology relations for a given dataset rather than relying on relations listed in databases. Results: The program Proteinortho described here is a stand-alone tool that is geared towards large datasets and makes use of distributed computing techniques when run on multi-core hardware. It implements an extended version of the reciprocal best alignment heuristic. We apply Proteinortho to compute orthologous proteins in the complete set of all 717 eubacterial genomes available at NCBI at the beginning of 2009. We identified thirty proteins present in 99% of all bacterial proteomes. Conclusions: Proteinortho significantly reduces the required amount of memory for orthology analysis compared to existing tools, allowing such computations to be performed on off-the-shelf hardware.

    Ligand-dependent tRNA processing by a rationally designed RNase P riboswitch

    Common features in lncRNA annotation and classification : a survey

    Long non-coding RNAs (lncRNAs) are widely recognized as important regulators of gene expression. Their molecular functions range from miRNA sponging to chromatin-associated mechanisms, leading to effects in disease progression and establishing them as diagnostic and therapeutic targets. Still, only a few representatives of this diverse class of RNAs are well studied, while the vast majority is poorly described beyond the existence of their transcripts. In this review we survey common in silico approaches for lncRNA annotation. We focus on the well-established sets of features used for classification and discuss their specific advantages and weaknesses. While the available tools perform very well for the task of distinguishing coding sequence from other RNAs, we find that current methods are not well suited to distinguish lncRNAs or parts thereof from other non-protein-coding input sequences. We conclude that the distinction of lncRNAs from intronic sequences and untranslated regions of coding mRNAs remains a pressing research gap.

    Beyond plug and pray : context sensitivity and in silico design of artificial neomycin riboswitches

    Gene regulation in prokaryotes often depends on RNA elements such as riboswitches or RNA thermometers located in the 5′ untranslated region of mRNA. Rearrangements of the RNA structure in response, e.g., to the binding of small molecules or ions control translational initiation or premature termination of transcription and thus mRNA expression. Such structural responses are amenable to computational modelling, making it possible to rationally design synthetic riboswitches for a given aptamer. Starting from an artificial aptamer, we construct the first synthetic transcriptional riboswitches that respond to the antibiotic neomycin. We show that the switching behaviour in vivo critically depends not only on the sequence of the riboswitch itself, but also on its sequence context. We therefore developed in silico methods to predict the impact of the context, making it possible to adapt the design and to rescue non-functional riboswitches. We furthermore analyse the influence of 5′ hairpins with varying stability on neomycin riboswitch activity. Our data highlight the limitations of a simple plug-and-play approach in the design of complex genetic circuits and demonstrate that detailed computational models significantly simplify, improve, and automate the design of transcriptional circuits. Our design software is available under a free licence on GitHub (https://github.com/xileF1337/riboswitch_design).

    Bioinformatics of prokaryotic RNAs

    The genomes of most prokaryotes give rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly large levels of anti-sense transcripts. Comprehensive surveys of prokaryotic transcriptomes, including the characterization of their non-coding components, depend heavily on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art of RNA bioinformatics, focusing on applications to prokaryotes.

    Nature

    Genome sequencing of Helicobacter pylori has revealed the potential proteins and genetic diversity of this prevalent human pathogen, yet little is known about its transcriptional organization and noncoding RNA output. Massively parallel cDNA sequencing (RNA-seq) has been revolutionizing global transcriptomic analysis. Here, using a novel differential approach (dRNA-seq) selective for the 5′ end of primary transcripts, we present a genome-wide map of H. pylori transcriptional start sites and operons. We discovered hundreds of transcriptional start sites within operons, and opposite to annotated genes, indicating that complexity of gene expression from the small H. pylori genome is increased by uncoupling of polycistrons and by genome-wide antisense transcription. We also discovered an unexpected number of ~60 small RNAs including the ε-subdivision counterpart of the regulatory 6S RNA and associated RNA products, and potential regulators of cis- and trans-encoded target messenger RNAs. Our approach establishes a paradigm for mapping and annotating the primary transcriptomes of many living species.