78 research outputs found

Fast Pairwise Structural RNA Alignments by Pruning of the Dynamical Programming Matrix

Author: David Mathews
Elfar Torarinsson
Jakob H Havgaard
Jan Gorodkin
Publication venue: Public Library of Science
Publication date: 01/01/2007

It has become clear that noncoding RNAs (ncRNA) play important roles in cells, and emerging studies indicate that there might be a large number of unknown ncRNAs in mammalian genomes. There exist computational methods that can be used to search for ncRNAs by comparing sequences from different genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may not be present and pre-folding ignores the comparative information. Here, pruning of the dynamical programming matrix is presented as an alternative novel heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, thus giving the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at http://foldalign.ku.dk
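The pruning idea in the abstract above can be illustrated with a small sketch. This is not the FOLDALIGN algorithm, which aligns and folds simultaneously; it is a toy sequence-only local alignment in which any subalignment whose score does not exceed a length-dependent minimum is discarded while the matrix is filled, so it can never be extended. The function name, the scoring values, and the `min_score` threshold callback are assumptions made for illustration only.

```python
from typing import Callable, List, Optional, Tuple

Cell = Optional[Tuple[float, int]]  # (score, alignment length), or None if pruned


def pruned_local_alignment_score(
    a: str,
    b: str,
    min_score: Callable[[int], float],
    match: float = 2.0,
    mismatch: float = -1.0,
    gap: float = -2.0,
) -> Tuple[float, int]:
    """Return (best surviving score, number of pruned cells) for a toy pruned DP."""
    n, m = len(a), len(b)
    cells: List[List[Cell]] = [[None] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    pruned = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # A subalignment may always start fresh at (i, j).
            candidates = [(s, 1)]
            # Only surviving (non-pruned) neighbours can be extended.
            diag, up, left = cells[i - 1][j - 1], cells[i - 1][j], cells[i][j - 1]
            if diag is not None:
                candidates.append((diag[0] + s, diag[1] + 1))
            if up is not None:
                candidates.append((up[0] + gap, up[1] + 1))
            if left is not None:
                candidates.append((left[0] + gap, left[1] + 1))
            score, length = max(candidates)
            # Prune: discard subalignments that do not exceed the
            # length-dependent minimum score (hypothetical threshold).
            if score <= min_score(length):
                pruned += 1
                continue
            cells[i][j] = (score, length)
            best = max(best, score)
    return best, pruned


if __name__ == "__main__":
    seq = "GGGAAACCC"
    print(pruned_local_alignment_score(seq, seq, lambda length: 0.5 * length))
```

Because the threshold is checked as each cell is filled, the constraint is applied dynamically rather than fixed in advance, which is the property the abstract highlights. How aggressive the threshold can be without losing good alignments is an empirical question that this sketch does not address.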

Directory of Open Access Journals

Copenhagen University Research Information System

DotAligner: Identification and clustering of RNA structure motifs

Author: Mattick John S.
Quek Xiu Cheng
Seemann Stefan E.
Smith Martin A.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2017

The diversity of processed transcripts in eukaryotic genomes poses a challenge for the classification of their biological functions. Sparse sequence conservation in non-coding sequences and the unreliable nature of RNA structure predictions further exacerbate this conundrum. Here, we describe a computational method, DotAligner, for the unsupervised discovery and classification of homologous RNA structure motifs from a set of sequences of interest. Our approach outperforms comparable algorithms at clustering known RNA structure families, both in speed and accuracy. It identifies clusters of known and novel structure motifs from ENCODE immunoprecipitation data for 44 RNA-binding proteins.

Directory of Open Access Journals

Copenhagen University Research Information System

Structural alignment of RNA with FOLDALIGN

Author: Havgaard Jakob Hull
Publication venue: Center for Skov, Landskab og Planlægning/Københavns Universitet
Publication date: 01/01/2007

Copenhagen University Research Information System

Foldalign 2.5: multithreaded implementation for pairwise structural RNA alignment

Author: de Melo Alba C. M. A.
Gorodkin Jan
Havgaard Jakob Hull
Sundfeld Daniel
Publication venue: 'Oxford University Press (OUP)'
Publication date: 24/12/2015

Motivation: Structured RNAs can be hard to search for, as they are often not well conserved in their primary structure and are local in their genomic or transcriptomic context. Thus, the need for tools that can make local structural alignments of RNAs is only increasing. Results: To meet the demand for both large-scale screens and hands-on analysis through web servers, we present a new multithreaded version of Foldalign. We substantially improve execution time while maintaining all previous functionalities, including carrying out local structural alignments of sequences with low similarity. Furthermore, the improvements allow for comparing longer RNAs and increasing the sequence length. For example, for sequence lengths in the range 2000–6000 nucleotides, execution time improves by up to a factor of five. Availability and implementation: The Foldalign software and the web server are available at http://rth.dk/resources/foldalign Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

Copenhagen University Research Information System

From Structure Prediction to Genomic Screens for Novel Non-Coding RNAs

Author: A Ben-Hur
AF Bompfünewerer
AM Khalil
AO Harmanci
AR Gruber
AV Uzilov
AX Wang
B Knudsen
B Lewis
BW Matthews
C Warden
C Workman
D Guarnieri
D Mathews
D Sankoff
DH Mathews
DH Turner
DK Chiu
E Bonnet
E Nudler
E Rivas
E Torarinsson
EP Nawrocki
ES Andersen
F Sleutels
GardnerJPP Daub
H Jia
I Holmes
IL Hofacker
Ivo L. Hofacker
J Felsenstein
J Gorodkin
Jan Gorodkin
JC Ellis
JG Underwood
JH Havgaard
JM Watts
JP McCutcheon
JS Mattick
JS Pedersen
JW Brown
K Doshi
K Okamura
K Reiche
KC Wang
KE Deigan
KM Weeks
L Redrup
M Georges
M Guttman
M Kertesz
M Lindow
M Xie
MB Gerstein
MC Tsai
Michael Levitt
MW Hentze
N Lau
P Anandam
P Clote
P Gardner
P Larsson
P Menzel
P Schattner
PG Hawkins
PN Seibel
PP Gardner
R Nussinov
RA Gupta
RD Dowell
RJ Klein
RM Kuhn
RR Gutell
S Eddy
S Griffiths-Jones
S Siebert
S Washietl
S Will
SE Seemann
SF Altschul
SR Eddy
T Gesell
T Hung
T Lowe
T Nagano
TF Consortium
TJ Macke
UA Ørom
V Kim
V Tripathi
W Deng
W Filipowicz
W Fontana
Y Park
Y Sakakibara
Z Weinberg
Z Yao
Publication venue: Public Library of Science
Publication date: 01/01/2011

Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure consists of regular Watson-Crick and GU wobble base pairs. This and the increased number of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure-preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
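As a minimal illustration of single-sequence folding with Watson-Crick and GU wobble pairs, the sketch below uses Nussinov-style base-pair maximization. This is a swapped-in textbook technique, not the energy-directed folding the abstract refers to, and the function name and the `min_loop` parameter (minimum number of unpaired bases in a hairpin loop) are assumptions chosen for the example.

```python
# Allowed pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def max_base_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (Nussinov-style DP)."""
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = maximum number of pairs in the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

Counting pairs ignores stacking and loop energies, so this only conveys the recursive structure of the folding problem. Comparative methods add evidence from compensating base-pair changes across sequences on top of such single-sequence recursions.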

Directory of Open Access Journals

Copenhagen University Research Information System

Simultaneous alignment and folding of protein sequences

Author: A. Caprara
B.E. Shakhnovich
C.B. Do
D. Frishman
D. Sankoff
D.H. Mathews
G. Raghava
I.L. Hofacker
J. Selbig
J. Waldispuhl
J.H. Havgaard
L.R. Forrest
M. Brudno
M. Cline
M. Lomize
M. Menke
P. Bradley
P. Fariselli
P. Rice
R. Backofen
R. Doolittle
R.A. Sutormin
R.C. Edgar
R.L.J. Dunbrack
S. Henikoff
S. Will
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm’s complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise sequence alignment tools in the most difficult low sequence homology case and improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families. partiFold-Align is available at http://partiFold.csail.mit.edu

CiteSeerX

Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures

Author: Saito Yutaka
Sakakibara Yasubumi
Sato Kengo
Publication venue: BioMed Central
Publication date: 01/01/2011

Springer - Publisher Connector

A novel approach to represent and compare RNA secondary structures

Author: Ausiello G
Ferrè F
Helmer-Citterich M
Mattei E
Publication venue: Oxford University Press
Publication date: 01/01/2014

Structural information is crucial in ribonucleic acid (RNA) analysis and functional annotation; nevertheless, how to include such structural data is still a debated problem. Dot-bracket notation is the most common and simple representation for RNA secondary structures, but its simplicity also leads to ambiguity that requires further processing steps to resolve. Here we present BEAR (Brand nEw Alphabet for RNA), a new context-aware structural encoding represented by a string of characters. Each character in BEAR encodes a specific secondary structure element (loop, stem, bulge and internal loop) of a specific length. Furthermore, exploiting this informative and yet simple encoding in multiple alignments of related RNAs, we captured how much structural variation is tolerated in RNA families and converted it into transition rates among secondary structure elements. This allowed us to compute a substitution matrix for secondary structure elements called MBR (Matrix of BEAR-encoded RNA secondary structures), whose ability to align RNA secondary structures we tested. We propose BEAR and the MBR as powerful resources for RNA secondary structure analysis, comparison and classification, motif finding and phylogeny.
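To make the ambiguity of dot-bracket notation concrete, the sketch below parses a dot-bracket string and labels runs of positions with a coarse structural context and their length. It is not the BEAR alphabet, whose character set is not given here; the labels `stem_open`, `stem_close`, `hairpin` and `unpaired`, and both function names, are hypothetical and chosen only for illustration. Runs of brackets are reported as written and are not split into individual helices.

```python
from itertools import groupby
from typing import List, Tuple


def pair_table(db: str) -> List[int]:
    """Map each position to its pairing partner, or -1 if unpaired."""
    table = [-1] * len(db)
    stack: List[int] = []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            table[i], table[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return table


def structural_runs(db: str) -> List[Tuple[str, int]]:
    """Run-length encode a dot-bracket string into (label, length) pairs."""
    table = pair_table(db)
    n = len(db)
    labels: List[str] = []
    i = 0
    while i < n:
        if db[i] == "(":
            labels.append("stem_open")
            i += 1
        elif db[i] == ")":
            labels.append("stem_close")
            i += 1
        else:
            j = i
            while j < n and db[j] == ".":
                j += 1
            # A run of dots is a hairpin loop only if the flanking
            # positions are paired with each other.
            closed = i > 0 and j < n and table[i - 1] == j
            labels.extend(["hairpin" if closed else "unpaired"] * (j - i))
            i = j
    return [(label, len(list(group))) for label, group in groupby(labels)]


if __name__ == "__main__":
    print(structural_runs("((.((...)).))"))
```

The same `.` character ends up in different contexts (hairpin versus other unpaired regions) once the pairing partners are resolved, which is the kind of extra processing step the abstract says plain dot-bracket strings require.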