3,540 research outputs found

    Prediction of locally stable RNA secondary structures for genome-wide surveys

    Motivation: Recently, novel classes of functional RNAs, most prominently the miRNAs, have been discovered, strongly suggesting that further types of functional RNAs are still hidden in the recently completed genomic DNA sequences. Only few techniques are known, however, to survey genomes for such RNA genes. When sufficiently similar sequences are not available for comparative approaches, the only known remedy is to search directly for structural features. Results: We present here efficient algorithms for computing locally stable RNA structures at genome-wide scales. Both the minimum energy structure and the complete matrix of base pairing probabilities can be computed in O(N × L²) time and O(N + L²) memory in terms of the length N of the genome and the size L of the largest secondary structure motifs of interest. In practice, the 100 Mb of the complete genome of Caenorhabditis elegans can be folded within about half a day on a modern PC with a search depth of L = 100. This is sufficient, for example, for a survey for miRNAs
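To make the span-limited scaling concrete, here is a minimal sketch of a banded dynamic program. It is not the algorithm from the paper: it swaps energy minimization for simple Nussinov-style base-pair maximization, and the names `local_max_pairs`, `max_span`, and `min_loop` are hypothetical. The only idea it shares with the abstract is that base pairs may span at most L positions, which yields O(N × L²) time. For simplicity this sketch stores the whole band, so it uses O(N × L) memory rather than the O(N + L²) reported above.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def local_max_pairs(seq, max_span, min_loop=3):
    """Return band, where band[i][d] is the maximum number of base pairs
    that can form within seq[i : i + d + 1], for every d < max_span.

    Only intervals shorter than max_span are ever filled in, which is what
    keeps the running time at O(N * L^2) instead of O(N^3).
    """
    n = len(seq)
    band = [[0] * max_span for _ in range(n)]

    def get(i, j):
        # Score of the inclusive interval seq[i..j]; empty intervals score 0.
        if j <= i:
            return 0
        return band[i][j - i]

    # Fill from the 3' end so that every sub-interval is ready when needed.
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + max_span)):
            best = get(i + 1, j)  # case 1: position i stays unpaired
            # case 2: position i pairs with k, enclosing at least min_loop bases
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    best = max(best, 1 + get(i + 1, k - 1) + get(k + 1, j))
            band[i][j - i] = best
    return band


if __name__ == "__main__":
    band = local_max_pairs("GGGAAACCC", max_span=9)
    print(band[0][8])  # pairs in the full 9-nt hairpin
```

Scanning `band` for high-scoring entries gives candidate local structures; a real survey would score free energy rather than pair counts.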

    A new procedure to analyze RNA non-branching structures

    RNA structure prediction and structural motif analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with existing software into a coherent pipeline that performs two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder)

    miROrtho: computational survey of microRNA genes

    MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature ∼22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support

    smyRNA: A Novel Ab Initio ncRNA Gene Finder

    Background: Non-coding RNAs (ncRNAs) have important functional roles in the cell: for example, they regulate gene expression by establishing stable joint structures with target mRNAs via complementary sequence motifs. Sequence motifs are also important determinants of the structure of ncRNAs. Although ncRNAs are abundant, discovering novel ncRNAs in genome sequences has proven to be a hard task; in particular, past attempts at ab initio ncRNA search mostly failed, with the exception of tools that can identify microRNAs. Methodology/Principal Findings: We present a very general ab initio ncRNA gene finder that exploits differential distributions of sequence motifs between ncRNAs and background genome sequences. Conclusions/Significance: Our method, once trained on a set of ncRNAs from a given species, can be applied to the genome sequences of other organisms to find not only ncRNAs homologous to those in the training set but also others that potentially belong to novel (and perhaps unknown) ncRNA families. Availability
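The idea of scoring sequence by differential motif distributions can be illustrated with a small sketch. This is not smyRNA's actual model, which the abstract does not specify: it assumes motifs are fixed-length k-mers and scores a window by summed log-odds of k-mer frequencies in a training set of ncRNAs versus background sequence. The function names and the `pseudocount` parameter are hypothetical.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log_odds_table(ncrna_seqs, background_seqs, k, pseudocount=1.0):
    """Map each observed k-mer to log(P(kmer | ncRNA) / P(kmer | background)).

    A pseudocount keeps k-mers seen in only one set from producing
    log(0) or division by zero.
    """
    fg = kmer_counts(ncrna_seqs, k)
    bg = kmer_counts(background_seqs, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)
    return {
        m: math.log(((fg[m] + pseudocount) / fg_total)
                    / ((bg[m] + pseudocount) / bg_total))
        for m in vocab
    }


def score_window(window, table, k):
    """Sum the log-odds of all k-mers in a window; unseen k-mers score 0."""
    return sum(table.get(window[i:i + k], 0.0)
               for i in range(len(window) - k + 1))


if __name__ == "__main__":
    # Toy training data, purely illustrative.
    table = log_odds_table(["GCGCGC"], ["ATATAT"], k=2)
    print(score_window("GCGC", table, 2) > 0)  # ncRNA-like window
    print(score_window("ATAT", table, 2) < 0)  # background-like window
```

Sliding `score_window` along a genome and thresholding the score would flag candidate regions; the threshold and window length would have to be tuned on held-out data.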

    Detection of very long antisense transcripts by whole transcriptome RNA-Seq analysis of Listeria monocytogenes by semiconductor sequencing technology

    The Gram-positive bacterium Listeria monocytogenes is the causative agent of listeriosis, a severe food-borne infection characterised by abortion, septicaemia, or meningoencephalitis. L. monocytogenes causes outbreaks of febrile gastroenteritis and accounts for community-acquired bacterial meningitis in humans. Listeriosis has one of the highest mortality rates (up to 30%) of all food-borne infections. This human pathogenic bacterium is an important model organism for biomedical research to investigate cell-mediated immunity. L. monocytogenes is also one of the best characterised bacterial systems for the molecular analysis of intracellular parasitism. Recently, several transcriptomic studies have also established this ubiquitously distributed bacterium as a model for understanding mechanisms of gene regulation from the environment to the infected host at the level of mRNA and non-coding RNAs (ncRNAs). We have used semiconductor sequencing technology for RNA-seq to investigate the repertoire of listerial ncRNAs under extra- and intracellular growth conditions. Furthermore, we applied a new bioinformatic analysis pipeline for detection, comparative genomics and structural conservation to identify ncRNAs. With this work, a total of 741 locations of potential ncRNA candidates are now known for L. monocytogenes, of which 611 ncRNA candidates were identified by RNA-seq. 441 transcribed ncRNAs have never been described before. Among these, we identified novel long non-coding antisense RNAs with a length of up to 5,400 nt, e.g. opposite to genes coding for internalins, methylases or a high-affinity potassium uptake system, namely the kdpABC operon, which were confirmed by qRT-PCR analysis. RNA-seq, comparative genomics and structural conservation of L. monocytogenes ncRNAs illustrate that this human pathogen uses a large number and repertoire of ncRNAs, including novel long antisense RNAs, which could be important for intracellular survival within the infected eukaryotic host

    RScan: fast searching structural similarities for structured RNAs in large databases

    Background: Many RNAs have evolutionarily conserved secondary structures instead of primary sequences. Recently, an increasing number of methods have been developed that focus on structural alignment for finding conserved secondary structures as well as common structural motifs in pairwise or multiple sequences. A challenging task is to search quickly for similar structures for structured RNA sequences in large genomic databases, since existing methods are too slow to be used on large databases. Results: An implementation of a fast structural alignment algorithm, RScan, is proposed to fulfill the task. RScan is developed by leveraging the advantages of both hashing algorithms and local alignment algorithms. In our experiment, the average times for searching a tRNA and an rRNA in the randomized A. pernix genome are only 256 seconds and 832 seconds, respectively, using RScan, compared with 3,178 seconds and 8,951 seconds, respectively, using the existing method RSEARCH. Remarkably, RScan can handle large database queries, taking less than 4 minutes to search for similar structures for a microRNA precursor in human chromosome 21. Conclusion: These results indicate that RScan is a preferable choice for real-life applications of searching structural similarities for structured RNAs in large databases. RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.


    NOVOMIR: De Novo Prediction of MicroRNA-Coding Regions in a Single Plant-Genome

    MicroRNAs (miRNAs) are small regulatory, noncoding RNA molecules that are transcribed as primary miRNAs (pri-miRNAs) from eukaryotic genomes. At least in plants, their regulatory activity is mediated through base-pairing with protein-coding messenger RNAs (mRNAs) followed by mRNA degradation or translation repression. We describe NOVOMIR, a program for the identification of miRNA genes in plant genomes. It uses a series of filter steps and a statistical model to discriminate a pre-miRNA from other RNAs and relies neither on prior knowledge of a miRNA target nor on comparative genomics. The sensitivity and specificity of NOVOMIR for detection of pre-miRNAs from Arabidopsis thaliana are ~0.83 and ~0.99, respectively. Plant pre-miRNAs are more heterogeneous with respect to size and structure than animal pre-miRNAs. Despite these difficulties, NOVOMIR is well suited to perform searches for pre-miRNAs on a genomic scale. NOVOMIR is written in Perl and relies on two additional, free programs for prediction of RNA secondary structure (RNALFOLD, RNASHAPES)

    RNALOSS: a web server for RNA locally optimal secondary structures

    RNAomics, analogous to proteomics, concerns aspects of the secondary and tertiary structure, folding pathway, kinetics, comparison, function and regulation of all RNA in a living organism. Given recently discovered roles played by micro RNA, small interfering RNA, riboswitches, ribozymes, etc., it is important to gain insight into the folding process of RNA sequences. We describe the web server RNALOSS, which provides information about the distribution of locally optimal secondary structures, that possibly form kinetic traps in the folding process. The tool RNALOSS may be useful in designing RNA sequences which not only have low folding energy, but whose distribution of locally optimal secondary structures would suggest rapid and robust folding. Website: