
INRIA a CCSD electronic archive server

HAL-Polytechnique

A simple, practical and complete O(n³/log(n))-time Algorithm for RNA folding using the Four-Russians Speedup

Author: Dan Gusfield
IL Hofacker
J Kleinberg
M Zuker
MS Waterman
P Clote
R Backofen
R Durbin
R Nussinov
SE Seemann
SL Graham
T Akutsu
TM Chan
Y Wexler
Yelena Frid
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The problem of computationally predicting the secondary structure (or folding) of RNA molecules was first introduced more than thirty years ago and yet continues to be an area of active research and development. The basic RNA-folding problem of finding a maximum-cardinality, non-crossing matching of complementary nucleotides in an RNA sequence of length n has an O(n³)-time dynamic programming solution that is widely applied. It is known that an o(n³) worst-case time solution is possible, but the published and suggested methods are complex and have not been established to be practical. Significant practical improvements to the original dynamic programming method have been introduced, but they retain the O(n³) worst-case time bound when n is the only problem parameter used in the bound. Surprisingly, the most widely used general technique for achieving a worst-case (and often practical) speedup of dynamic programming, the Four-Russians technique, has not previously been applied to the RNA-folding problem, perhaps because of technical issues in adapting the technique to RNA folding. Results: In this paper, we give a simple, complete, and practical Four-Russians algorithm for the basic RNA-folding problem, achieving a worst-case time bound of O(n³/log(n)). Conclusions: We show that this time bound can also be obtained for richer nucleotide-matching scoring schemes, and that the method achieves consistent speedups in practice. The contribution is both theoretical and practical, since the basic RNA-folding problem is often solved multiple times in the inner loop of more complex algorithms, and for long RNA molecules in the study of RNA virus genomes.
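The basic RNA-folding problem described above can be illustrated with a minimal sketch of the standard cubic-time dynamic program (a Nussinov-style recurrence). This is not the Four-Russians algorithm of the paper. It assumes only canonical A-U and G-C pairs and no minimum hairpin-loop length; both are illustrative simplifications, and the function name `nussinov_max_pairs` is hypothetical.

```python
def nussinov_max_pairs(seq: str) -> int:
    """Maximum number of non-crossing complementary base pairs in seq.

    Illustrative O(n^3) dynamic program. Assumptions: only A-U and G-C
    pairs are allowed, and there is no minimum hairpin-loop length.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # s[i][j] = best number of pairs within the substring seq[i..j] (inclusive)
    s = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            # Case 1: position j is left unpaired
            best = s[i][j - 1]
            # Case 2: position j pairs with some k in [i, j-1], splitting the interval
            for k in range(i, j):
                if (seq[k], seq[j]) in pairs:
                    left = s[i][k - 1] if k > i else 0
                    inner = s[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + inner + 1)
            s[i][j] = best
    return s[0][n - 1]
```

For example, `nussinov_max_pairs("GGGAAACCC")` returns 3 (three nested G-C pairs around the unpaired A's). Roughly speaking, the inner loop over the split point k is what makes this recurrence cubic, and it is this kind of inner maximization that a Four-Russians speedup aims to accelerate.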

Springer - Publisher Connector

Scholarly Materials And Research @ Georgia Tech

Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score

Author: A Sali
AG Murzin
AM Lisewski
AR Ortiz
AS Yang
CA Orengo
D Baker
D Kihara
F Teichert
G Vogt
G Vriend
HM Berman
IN Shindyalov
J Moult
J Skolnick
J Zhu
Jeffrey Skolnick
L Holm
M Levitt
M Novotny
ML Sierk
MS Waterman
N Siew
NN Alexandrov
P Koehl
R Kolodny
R Leplae
RB Russell
RH Lathrop
Shashi Bhushan Pandit
T Akutsu
T Shibuya
TJ Oldfield
V Alesker
WR Taylor
Y Ye
Y Zhang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/12/2008

©2008 Pandit and Skolnick; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. This article is available from: http://www.biomedcentral.com/1471-2105/9/531 doi:10.1186/1471-2105-9-531

Background: Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generating an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined measure of structure quality that accounts for both the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed; it was often found to have better accuracy and coverage than the most commonly used structural alignment programs, although there were a number of situations in which this was not true. Results: To further improve structure alignment quality, the Fr-TM-align algorithm has been developed, in which aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. To assess the structural alignment quality of Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better than that of both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) than those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments; hence, the algorithm is computationally more expensive than TM-align. Conclusion: Fr-TM-align, a new algorithm that employs fragment alignment and assembly, provides better structural alignments than TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/
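The TM-score that Fr-TM-align maximizes has a standard closed form: TM-score = (1/L) Σ_i 1/(1 + (d_i/d0)²), with d0(L) = 1.24·(L − 15)^(1/3) − 1.8 Å, where L is the length of the target structure and d_i is the distance between the i-th aligned residue pair. The minimal sketch below evaluates this for one fixed superposition, assuming the distances have already been computed; the search over seed alignments and superpositions that TM-align and Fr-TM-align perform is not shown. The 0.5 Å floor on d0 for short chains and the function names are assumptions for illustration.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Distance scale d0(L) = 1.24 * (L - 15)^(1/3) - 1.8, in Angstroms.

    Assumption: floored at 0.5 A so the value stays positive for short chains.
    """
    base = max(target_length - 15, 0)  # avoid a complex cube root for L < 15
    return max(1.24 * base ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: distance (A) between each aligned residue pair after superposition.
    target_length: length of the target structure, used for normalization.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 (far apart) and 1 (identical position)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Because each term is bounded by 1 and the sum is divided by the full target length, unaligned residues lower the score; this is how the TM-score combines distance accuracy with coverage in a single number.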

A sub-cubic time algorithm for computing the quartet distance between two general trees

Author: Anders K Kristensen
BL Allen
C Christiansen
Christian NS Pedersen
D Bryant
D Coppersmith
DF Robinson
G Estabrook
GS Brodal
Jesper Nielsen
M Steel
M Stissing
MS Waterman
Thomas Mailund
Publication venue: BioMed Central
Publication date

arXiv.org e-Print Archive

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
Wayne Delport
William Fletcher
Publication venue: Public Library of Science (PLoS)
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ONE

Public Library of Science (PLOS)

MDC Repository

Autonomy support, basic need satisfaction and the optimal functioning of adult male and female sport participants: A test of basic needs theory

Author: A Bandura
A Satorra
AL Smith
AS Waterman
BM Byrne
CD Ryff
CF Ratelle
D Gould
DP MacKinnon
E McAuley
EL Deci
GA Mageau
GC Williams
GN Holmbeck
GW Cheung
HW Marsh
J Reeve
James W. Adie
JF Hair
Joan L. Duda
KM Sheldon
L Hu
LG Pelletier
M Gagné
M Reinboth
M Standage
MS Hagger
N Ntoumanis
Nikos Ntoumanis
PM Bentler
PN Lemyre
PP Baard
RM Baron
RM Ryan
S Richer
TD Raedeke
TS Horn
V Krane
Publication venue: Springer Verlag
Publication date: 01/01/2008

Grounded in Basic Needs Theory (BNT; Ryan and Deci, American Psychologist, 55, 68–78, 2000a), the present study aimed to: (a) test a theoretically based model of coach autonomy support, motivational processes and well-/ill-being among a sample of adult sport participants, (b) discern which basic psychological need(s) mediate the link between autonomy support and well-/ill-being, and (c) explore gender invariance in the hypothesized model. Five hundred and thirty-nine participants (male = 271; female = 268; mean age = 22.75 years) completed a multi-section questionnaire tapping the targeted variables. Structural Equation Modeling (SEM) analysis revealed that coach autonomy support predicted participants’ basic need satisfaction for autonomy, competence and relatedness. In turn, basic need satisfaction predicted greater subjective vitality when engaged in sport. Participants with low levels of autonomy were more susceptible to feeling emotionally and physically exhausted from their sport investment. Autonomy and competence partially mediated the path from autonomy support to subjective vitality. Lastly, the results supported partial invariance of the model with respect to gender.

University of Birmingham Research Portal

Coventry University Pure Portal

espace@Curtin

Island method for estimating the statistical significance of profile-profile alignment scores

Author: A Dembo
A Gambin
A Poleksic
AG Murzin
Aleksandar Poleksic
D Fischer
D Przybylski
DA Debe
E Lindahl
EJ Gumbel
G Yona
H Pang
J Heringa
J Moult
J Söding
JF Collins
JF Lawless
K Ginalski
L Holm
L Rychlewski
M Frenkel-Morgenstern
MS Waterman
O Bastien
R Mott
R Olsen
RI Sadreyev
S Karlin
SF Altschul
SR Eddy
T Hulsen
TF Smith
WR Pearson
YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: In the last decade, significant progress in detecting remote similarity between protein sequences has been made by using alignment profiles in place of amino-acid strings. Unfortunately, no analytical theory is available for estimating the significance of a gapped alignment of two profiles. Many experiments suggest that the distribution of local profile-profile alignment scores is of the Gumbel form. However, estimating the distribution parameters by random simulation turns out to be computationally very expensive. Results: We demonstrate that the background distribution of profile-profile alignment scores depends heavily on the profiles' composition, and thus the distribution parameters must be estimated independently for each pair of profiles of interest. We also show that accurate estimates of the statistical parameters can be obtained using "island statistics" for profile-profile alignments. Conclusion: Island statistics can be generalized to profile-profile alignments to provide an efficient method for alignment score normalization. Since multiple island scores can be extracted from a single comparison of two profiles, the island method has a clear speed advantage over the direct shuffling method, with comparable accuracy in parameter estimates.
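Once the Gumbel parameters λ and K have been estimated for a pair of profiles (by island statistics or by shuffling), converting a raw alignment score into a significance value is direct. The minimal sketch below assumes the standard extreme-value form for local alignment scores, E = K·m·n·e^(−λS) and P(score ≥ S) = 1 − e^(−E), where m and n are the two profile lengths; the estimation step itself is not shown, and the function names and the parameter values in the usage line are hypothetical.

```python
import math


def gumbel_evalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """Expected number of chance alignments scoring >= score: E = K*m*n*exp(-lambda*score)."""
    return k * m * n * math.exp(-lam * score)


def gumbel_pvalue(score: float, lam: float, k: float, m: int, n: int) -> float:
    """P(S >= score) = 1 - exp(-E) under the Gumbel (extreme-value) model."""
    e = gumbel_evalue(score, lam, k, m, n)
    # -expm1(-E) equals 1 - exp(-E) but stays accurate when E is very small
    return -math.expm1(-e)
```

A call such as `gumbel_pvalue(40.0, lam=0.3, k=0.1, m=200, n=200)` uses made-up parameters purely to show the interface. The abstract's point is that λ and K depend on profile composition, so they would have to be re-estimated for every profile pair rather than fixed once.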

Springer - Publisher Connector

University of Northern Iowa

Using Structure to Explore the Sequence Alignment Space of Remote Homologs

Author: A Mac Sweeney
AG Murzin
AM Lesk
Andrew Kuziemko
AR Panchenko
AS Yang
B John
B Qian
B Rost
Barry Honig
CL Tang
D Chivian
D Eisenberg
D Kihara
D Petrey
Donald Petrey
DT Jones
F Melo
GJ Barton
H Chen
H Lee
H Zhou
HM Berman
I Friedberg
J Moult
J Shi
J Söding
JM Sauder
JU Bowie
L Jaroszewski
MA Marti-Renom
MA Saqi
MS Madhusudhan
MS Waterman
N Mirkovic
NC Goonesekere
P Bork
Philip E. Bourne
R Sanchez
RB Russell
RC Edgar
S Liu
SA Benner
SB Williams
T Madej
WRP Scott
Y Zhang
Publication venue: Public Library of Science
Publication date: 01/10/2011

Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is “optimal” in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are “suboptimal” in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for “modelability”, we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.

Public Library of Science (PLOS)

Asymptotic behaviour and optimal word size for exact and approximate word matches between random sequences

Author: A Barbour
A Christoffels
CJ Burden
Conrad J Burden
J Burke
JE Carpenter
L Florea
M Kimura
Miriam R Kantorovitz
MR Kantorovitz
MS Waterman
OM Melko
RA Lippert
S Vinga
SF Altschul
Sylvain Forêt
TJ Wu
W Hide
WJ Conover
WJ Kent
WR Pearson
Z Zhang
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The number of k-words shared between two sequences is a simple and efficient alignment-free sequence comparison method. This statistic, D2, has been used for the clustering of EST sequences. Sequence comparison based on D2 is extremely fast: its runtime is proportional to the size of the sequences under scrutiny, whereas alignment-based comparisons have a worst-case runtime proportional to the square of the size. Recent studies have rigorously investigated the statistical distribution of D2, and asymptotic regimes have been derived. The distribution of approximate k-word matches has also been studied. RESULTS: We have computed the optimal D2 word size for various sequence lengths, and for both perfect and approximate word matches. Kolmogorov-Smirnov tests show D2 to have a compound Poisson distribution at the optimal word size for short sequences (below 400 letters) and a normal distribution at the optimal word size for long sequences (above 1600 letters). We find that the D2 statistic outperforms BLAST in the comparison of artificially evolved sequences, and performs similarly to other methods based on exact word matches. These results obtained with randomly generated sequences are also valid for sequences derived from human genomic DNA. CONCLUSION: We have characterized the distribution of the D2 statistic at optimal word sizes. We find that the best trade-off between computational efficiency and accuracy is obtained with exact word matches. Given that our numerical tests have not included sequence shuffling, transposition or splicing, the improvements over existing methods reported here underestimate those expected in real sequences. Because of their linear runtime and known normal asymptotic behavior, D2-based methods are most appropriate for large genomic sequences.
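The D2 statistic counts matching pairs of k-word occurrences, D2 = Σ_w n_A(w)·n_B(w), where n_A(w) is the number of (overlapping) occurrences of word w in sequence A. A minimal sketch for exact word matches only; the approximate-match variant studied in the paper is not shown, and the function names are hypothetical. With hash-based counting, the work grows linearly with sequence length for a fixed k, which is the speed advantage over quadratic alignment noted in the abstract.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-word in seq."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """D2 = sum over words w of n_a(w) * n_b(w): the number of exact k-word matches."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    if len(ca) > len(cb):
        ca, cb = cb, ca  # iterate over the smaller table
    # A missing key in a Counter reads as 0, so unshared words contribute nothing
    return sum(count * cb[word] for word, count in ca.items())
```

For instance, `d2("ACGT", "ACGT", 2)` is 3 (the words AC, CG and GT each match once), while `d2("AAAA", "AAA", 2)` is 6, because each of the three AA occurrences in the first sequence matches each of the two in the second.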

Springer - Publisher Connector

The Australian National University

Optical map guided genome assembly

Author: A Gurevich
A Samad
A Valouev
AK-Y Leung
B Alipanahi
BK Stöcker
DE Jarvis
ET Dimalanta
FJ Sedlazeck
H Li
HC Lin
JM Shelton
LM Mendelowitz
MD Muggli
MS Waterman
N Daccord
N Nagarajan
R Walve
S Beier
S Koren
S Vij
W Pan
Y Dong
Publication venue
Publication date: 06/07/2020

Background: The long reads produced by third-generation sequencing technologies have significantly improved genome assembly results, but genome-wide assemblies based solely on read data still cannot be produced. Thus, for example, optical mapping data has been used to further improve genome assemblies, but it has mostly been applied in a post-processing stage after contig assembly. Results: We propose OpticalKermit, which directly integrates genome-wide optical maps into contig assembly. We show how genome-wide optical maps can be used to localize reads on the genome, and we then adapt the Kermit method, which originally incorporated genetic linkage maps into the miniasm assembler, to use this information in contig assembly. Our experimental results show that incorporating genome-wide optical maps into the contig assembly of miniasm increases NGA50 while the number of misassemblies decreases or stays the same. Furthermore, compared with the Canu assembler, OpticalKermit produces an assembly with almost three times higher NGA50 and fewer misassemblies on real A. thaliana reads. Conclusions: OpticalKermit successfully incorporates optical mapping data directly into the contig assembly of eukaryotic genomes. Our results show that this is a promising approach to improving the contiguity of genome assemblies.
Peer reviewed