367 research outputs found

Lightweight comparison of RNAs based on exact sequence–structure matches

Author: Allali
Altschul
Backofen
Bafna
Bahr
Bauer
Blin
Cannone
Evans
Gardner
Griffiths-Jones
Havgaard
Hentze
Hofacker
Hofacker
Huttenhofer
Höchsmann
Jiang
Jiang
Lin
Martineau
Mathews
Mathews
Michael Beckstette
Otto
Rolf Backofen
Sankoff
Sebastian Will
Serganov
Steffen Heyne
Torarinsson
Will
Wilm
Wilting
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence–structure comparison method which maintains exact matching substructures. Existing common substructures are treated as a whole unit, while variability is allowed between such structural motifs.

CiteSeerX

Crossref

PubMed Central

Publications at Bielefeld University

A new procedure to analyze RNA non-branching structures

Author: FISCON GIULIA
G. Iannello
P. Paci
T. Colombo
Publication venue: Bentham Science Publishers Ltd.
Publication date: 01/01/2015

RNA structure prediction and structural motifs analysis are challenging tasks in the investigation of RNA function. We propose a novel procedure to detect structural motifs shared between two RNAs (a reference and a target). In particular, we developed two core modules: (i) nbRSSP_extractor, to assign a unique structure to the reference RNA encoded by a set of non-branching structures; (ii) SSD_finder, to detect structural motifs that the target RNA shares with the reference, by means of a new score function that rewards the relative distance of the target non-branching structures compared to the reference ones. We integrated these algorithms with already existing software to reach a coherent pipeline able to perform the following two main tasks: prediction of RNA structures (integration of RNALfold and nbRSSP_extractor) and search for chains of matches (integration of Structator and SSD_finder)

Archivio della ricerca- Università di Roma La Sapienza

Freiburg RNA Tools: a web server integrating IntaRNA, ExpaRNA and LocARNA

Author: A. S. Richter
Amaral
Bauer
Bernhart
C. Smith
Frohlich
Harmanci
Mathews
Mattick
R. Backofen
Rose
S. Heyne
S. Will
Sharp
Wang
Washietl
Will
Wilm
Zuker
Publication venue: Oxford University Press
Publication date: 01/03/2010

The Freiburg RNA tools web server integrates three tools for the advanced analysis of RNA in a common web-based user interface. The tools IntaRNA, ExpaRNA and LocARNA support the prediction of RNA–RNA interaction, exact RNA matching and alignment of RNA, respectively. The Freiburg RNA tools web server and the software packages of the stand-alone tools are freely accessible at http://rna.informatik.uni-freiburg.de

DSpace@MIT

Crossref

PubMed Central

Chaining Sequence/Structure Seeds for Computing RNA Similarity

Author: Brown D.G.
Cédric Chauve
Hochsmann M.
Julien Allali
Laetitia Bourgeade
Schmiedl C.
Publication venue: Mary Ann Liebert Inc

Crossref

Geoseq: a tool for dissecting deep-sequencing datasets

Author: Cancio Anthony
George Ajish
Gurtowski James
Homann Robert
Levovitz Chaya
Sachidanandam Ravi
Shah Hardik
Publication venue: BioMed Central
Publication date: 01/01/2010

Gurtowski J, Cancio A, Shah H, et al. Geoseq: a tool for dissecting deep-sequencing datasets. BMC Bioinformatics. 2010;11(1):506. Background: Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), the Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Results: Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep-sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Conclusions: Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
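
The tile-coverage idea in this abstract can be illustrated with a toy sketch. This is not Geoseq's implementation: the function names, the step-1 overlapping tiling, the `$` separator and the naive suffix-array construction are all assumptions chosen for brevity. It only shows how a suffix array over a read library lets each tile of a reference be counted by binary search.

```python
def build_suffix_array(text):
    # Naive construction (sorts full suffixes); acceptable for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, sa, pattern):
    # Count suffixes whose first len(pattern) characters equal the pattern.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


def tile_coverage(reference, reads, tile_len, step=1):
    # Hypothetical helper: join reads with '$' (assumed absent from the
    # alphabet) so that no tile can match across a read boundary.
    text = "$".join(reads)
    sa = build_suffix_array(text)
    return [
        count_occurrences(text, sa, reference[i:i + tile_len])
        for i in range(0, len(reference) - tile_len + 1, step)
    ]


coverage = tile_coverage("ACGTAC", ["ACGT", "CGTA", "TTTT"], tile_len=3)
print(coverage)
```

In this sketch the suffix array is built once per library and reused for every tile, which is the property that makes pre-computing data for public libraries attractive.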

Crossref

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

ExpaRNA-P: simultaneous exact pattern matching and folding of RNAs

Author: Amit Mika
Backofen Rolf
Heyne Steffen
Landau Gad M.
Möhl Mathias
Otto Christina
Will Sebastian
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existing approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. Results: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known “simultaneous alignment and folding” of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P’s sequence-structure motifs. We then use the best chain as anchor constraints for the sequence-structure alignment tool LocARNA, which yields the very fast RNA alignment approach ExpLoc-P. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P.
Conclusions: ExpaRNA-P’s novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications
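
The chaining step mentioned in this abstract can be illustrated with a much simpler stand-in. The sketch below is not the generalized chaining algorithm of ExpaRNA-P: it ignores RNA structure entirely and treats each exact match as a gap-free block `(start_a, start_b, length)`, a hypothetical representation chosen for illustration. It selects the collinear, non-overlapping set of blocks with the largest total length using a quadratic dynamic program.

```python
def best_chain(matches):
    """Return (total_length, chain) for the heaviest collinear chain.

    matches: list of (start_a, start_b, length) blocks, meaning that
    a[start_a:start_a+length] matches b[start_b:start_b+length] exactly.
    A block may follow another only if it starts after the other ends
    in BOTH sequences (no overlap, no crossing).
    """
    if not matches:
        return 0, []
    # Sorting by start_a guarantees every valid predecessor comes earlier.
    ms = sorted(matches)
    best = [m[2] for m in ms]   # best chain weight ending at block t
    prev = [-1] * len(ms)       # back-pointer for chain recovery
    for t, (a, b, length) in enumerate(ms):
        for s in range(t):
            sa, sb, sl = ms[s]
            if sa + sl <= a and sb + sl <= b and best[s] + length > best[t]:
                best[t] = best[s] + length
                prev[t] = s
    end = max(range(len(ms)), key=best.__getitem__)
    total = best[end]
    chain = []
    while end != -1:
        chain.append(ms[end])
        end = prev[end]
    return total, chain[::-1]


total, chain = best_chain([(0, 0, 3), (2, 5, 4), (4, 4, 2), (7, 8, 3)])
print(total, chain)
```

A chain of this kind is what an aligner could take as anchor constraints: the anchored blocks are fixed, and only the regions between them still need to be aligned.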

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Springer - Publisher Connector

PubMed Central

Qucosa - Publikationsserver der Universität Leipzig

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization

Author: Bauer Markus
Klau Gunnar W
Reinert Knut
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. Results: We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. Conclusions: The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of inpu

CiteSeerX

Institutional Repository of the Freie Universität Berlin

Springer - Publisher Connector

Directory of Open Access Journals

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

PubMed Central

Fast genotyping of known SNPs through approximate

Author: Berger Leighton Bonnie
Shajii Ariya
Yorukoglu Deniz
Yu Yun William
Publication venue: Oxford University Press (OUP)
Publication date: 16/05/2018

Motivation: As the volume of next-generation sequencing (NGS) data increases, faster algorithms become necessary. Although speeding up individual components of a sequence analysis pipeline (e.g. read mapping) can reduce the computational cost of analysis, such approaches do not take full advantage of the particulars of a given problem. One problem of great interest, genotyping a known set of variants (e.g. dbSNP or Affymetrix SNPs), is important for characterization of known genetic traits and causative disease variants within an individual, as well as the initial stage of many ancestral and population genomic pipelines (e.g. GWAS). Results: We introduce lightweight assignment of variant alleles (LAVA), an NGS-based genotyping algorithm for a given set of SNP loci, which takes advantage of the fact that approximate matching of mid-size k-mers (with k = 32) can typically uniquely identify loci in the human genome without full read alignment. LAVA accurately calls the vast majority of SNPs in dbSNP and Affymetrix's Genome-Wide Human SNP Array 6.0 up to about an order of magnitude faster than standard NGS genotyping pipelines. For Affymetrix SNPs, LAVA has significantly higher SNP calling accuracy than existing pipelines while using as low as ∼5 GB of RAM. As such, LAVA represents a scalable computational method for population-level genotyping studies as well as a flexible NGS-based replacement for SNP arrays. Availability and Implementation: LAVA software is available at http://lava.csail.mit.edu
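
The alignment-free idea in this abstract can be sketched as a k-mer lookup. The code below is not LAVA: it uses exact rather than approximate k-mer matching, a tiny k instead of 32, and hypothetical names (`build_kmer_index`, `count_alleles`, the `snps` dictionary layout). It also does not check whether an indexed k-mer occurs elsewhere in the genome, which a real tool would have to handle.

```python
from collections import Counter


def build_kmer_index(reference, snps, k):
    """Map each k-mer covering a SNP to (snp_id, allele).

    snps: dict snp_id -> (pos, ref_allele, alt_allele), 0-based position.
    K-mers that would map to two different (snp_id, allele) keys are dropped.
    """
    index, ambiguous = {}, set()
    for snp_id, (pos, ref, alt) in snps.items():
        for allele in (ref, alt):
            seq = reference[:pos] + allele + reference[pos + 1:]
            first = max(0, pos - k + 1)
            last = min(pos, len(seq) - k)
            for start in range(first, last + 1):
                kmer = seq[start:start + k]
                key = (snp_id, allele)
                if kmer in index and index[kmer] != key:
                    ambiguous.add(kmer)
                index[kmer] = key
    for kmer in ambiguous:
        del index[kmer]
    return index


def count_alleles(reads, index, k):
    # Tally k-mer hits per (snp_id, allele) without aligning any read.
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            hit = index.get(read[i:i + k])
            if hit is not None:
                counts[hit] += 1
    return counts


reference = "ACGTTGCAAGCT"
snps = {"snp1": (5, "G", "T")}  # hypothetical SNP: G>T at position 5
index = build_kmer_index(reference, snps, k=4)
counts = count_alleles(["CGTTGCA", "GTTTCAA"], index, k=4)
print(dict(counts))
```

A genotype call would then compare the reference and alternate counts at each locus; how those counts are turned into a call is left out of this sketch.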

DSpace@MIT

Abundance of correctly folded RNA motifs in sequence space, calculated on computational grids

Author: De Sterck Hans
Knight Rob
Markel Rob
Oshmyansky Alexander
Smit Sandra
Yarus Michael
Publication venue: Oxford University Press
Publication date: 01/01/2005

Although functional RNA molecules are known to be biased in overall composition, the effects of background composition on the probability of finding a particular active site by chance have received little attention. The probability of finding a particular motif has important implications both for understanding the distribution of functional RNAs in ancient and modern organisms with varying genome compositions and for tuning SELEX pools to optimize the chance of finding specific functions. Here we develop a new method for calculating the probability of finding a modular motif containing base-paired regions, and use a computational grid to fold several hundred million random RNA sequences containing the core elements of the isoleucine aptamer and the hammerhead ribozyme to estimate the probability that a sequence containing these structural elements will fold correctly when isolated from background sequences of different compositions. We find that the two motifs are most likely to be found in distinct regions of compositional space, and that the regions of greatest abundance are influenced by the probability of finding the conserved bases, finding the flanking helices, and folding, in that order of importance. Additionally, we can refine our estimates of the number of random sequences required for a 50% probability of finding an example of each site in unbiased random pools of length 100 to 4.1 × 10^9 for the isoleucine aptamer and 1.6 × 10^10 for the hammerhead ribozyme. These figures are consistent with the facile recovery of these motifs from SELEX experiments.
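
The pool sizes quoted in this abstract can be related to a per-sequence probability with a short calculation. The sketch assumes that sequences in the pool are independent and each contains a correctly folded motif with the same probability p, so that the chance of at least one hit among N sequences is 1 - (1 - p)^N. This simple model and the function names are assumptions for illustration, not the paper's method.

```python
import math


def pool_size_needed(p, target=0.5):
    # Solve 1 - (1 - p)**N = target for N, using log1p for tiny p.
    return math.log1p(-target) / math.log1p(-p)


def per_sequence_probability(pool_size, target=0.5):
    # Inverse relation: the p implied by a pool size that reaches `target`.
    return -math.expm1(math.log1p(-target) / pool_size)


# Per-sequence probabilities implied by the pool sizes quoted in the abstract.
p_aptamer = per_sequence_probability(4.1e9)
p_hammerhead = per_sequence_probability(1.6e10)
print(p_aptamer, p_hammerhead)
```

For very small p the relation reduces to N ≈ ln(2) / p at a 50% target, so the required pool size is inversely proportional to the per-sequence probability.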

CiteSeerX

PubMed Central

eScholarship - University of California

RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs

Author: Backofen Rolf
Costa Fabrizio
Gorodkin Jan
Havgaard Jakob Hull
Junge Alexander
Miladi Milad
Seemann Stefan E.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2017

Copenhagen University Research Information System