
10,422 research outputs found

Back-translation for discovering distant protein homologies

Author: A. Pedersen
B. Oostra
C. Kosiol
J. Leluk
J. Raes
K. Okamura
L. Arvestad
L. Delaye
M. Clamp
M. Pellegrini
P. Harrison
P. Lio
R. Blake
S. Altschul
Y. Hahn
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences that have the best-scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples. Comment: The 9th International Workshop on Algorithms in Bioinformatics (WABI), Philadelphia, United States (2009).

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server
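
As a rough illustration of the "complete set of putative DNA sequences" mentioned in the back-translation abstract above, the sketch below counts and enumerates the DNA sequences that could encode a short peptide under the standard genetic code. It is not the paper's graph representation or its alignment algorithm, and the function names are hypothetical, chosen only for this example.

```python
from itertools import product

# Standard genetic code, laid out in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Reverse map: amino acid -> list of codons that encode it.
CODONS_FOR = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def count_back_translations(protein):
    """Number of DNA sequences that translate to `protein`."""
    n = 1
    for aa in protein:
        n *= len(CODONS_FOR[aa])
    return n


def back_translations(protein):
    """Lazily enumerate every putative coding sequence (short peptides only)."""
    for codons in product(*(CODONS_FOR[aa] for aa in protein)):
        yield "".join(codons)
```

The count grows multiplicatively with peptide length, so explicit enumeration is only practical for very short peptides; this is presumably why the abstract relies on a compact graph representation instead.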

Time Required for a Sphere to Fall Through a Funnel

Author: Altschul Brett
Crittenden S.
Sridharan J.
Publication venue: Scholar Commons
Publication date: 01/01/2014

We experimentally test a recently proposed theory of the behavior of a single frictional, inelastic, spherical particle falling under gravity through a symmetric funnel. We find that, while many qualitative results of the theory are supported by the data, the quantitative behavior of a real sphere falling through a real funnel differs from the predictions. The behavior above a 45° funnel angle, the duration, and the dependence of the duration on the initial horizontal position all show significant deviations from the predicted results. In particular, for drop positions near the gap, the duration of the fall is often significantly less than predicted for 50° and 60° funnel angles; and at a 60° funnel angle, where the data best matches the model, the R² goodness of fit is only 0.27. The fit can be significantly improved for the 60° funnel angle by relaxing the most stringent approximation of the theory, which asserts that the transition from slipping to rolling is governed by a single constant parameter, β, independent of impact speed and angle. We conclude that, although the theory captures most of the key features of the dynamics of a ball falling through a funnel, it does not do so with quantitative accuracy, indicating that for commonly encountered balls and drop heights, a more realistic model of particle collisions is required.

Directory of Open Access Journals

Scholar Commons - Institutional Repository of the University of South Carolina

HMMER web server: interactive sequence similarity searching

Author: Altschul
Altschul
Eddy
Finn
J. Clements
R. D. Finn
S. R. Eddy
Velankar
Publication venue: Oxford University Press
Publication date: 01/01/2011

HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER was available mainly as a computationally intensive UNIX command-line tool, which restricted its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, a multiple protein sequence alignment or a profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and on the ability to rapidly display tabular results, regardless of the number of matches found, and have developed graphical summaries of the search results to provide a quick, intuitive appraisal of them.

CiteSeerX

Crossref

PubMed Central

Simplified amino acid alphabets based on deviation of conditional probability from random background

Author: A. Godzik
A.G. Murzin
C.E. Schafmeister
D.S. Riddle
Di Liu
H.S. Chan
J. Wang
Ji Qi
K.W. Plaxco
L.R. Murphy
M. Munson
S. Henikoff
S. Miyazawa
S.E. Brenner
S.F. Altschul
Wei-Mou Zheng
Xin Liu
Publication venue: American Physical Society (APS)
Publication date: 01/01/2002

The primitive data for deducing the Miyazawa-Jernigan contact energy or BLOSUM score matrix consists of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of such conditional probability from random background, a scheme for reduction of the amino acid alphabet is proposed. A clear discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan and BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the obtained coarse-grained substitution matrices. It is verified that the reduced alphabets obtained preserve well the information contained in the original 20-letter alphabet. Comment: 9 pages, 3 figures.

arXiv.org e-Print Archive

Crossref

CERN Document Server

The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment

Author: Altschul
Bundschuh
Collins
Gotoh
Henikoff
J. L. Spouge
Karlin
Mott
Needleman
Robinson
S. Sheetlin
Smith
Storey
Waterman
Y. Park
Yu
Publication venue: Oxford University Press
Publication date: 06/09/2005

The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend to determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the errors required (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.

Crossref

PubMed Central
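
To make the role of the two Gumbel parameters in the abstract above concrete, the sketch below evaluates the standard Karlin-Altschul approximation, in which the expected number of chance alignments with score at least S between sequences of lengths m and n is E = k·m·n·exp(−λS), and the corresponding P-value is 1 − exp(−E). The function names are hypothetical, the numerical values of λ and k in the usage note are illustrative placeholders rather than figures from the paper, and finite-length (edge-effect) corrections are ignored.

```python
import math


def expected_hits(score, m, n, lam, k):
    """E-value: expected number of chance alignments scoring >= `score`.

    m, n : lengths of the two sequences (no edge-effect correction applied)
    lam  : Gumbel scale parameter (lambda)
    k    : Gumbel pre-factor
    """
    return k * m * n * math.exp(-lam * score)


def p_value(score, m, n, lam, k):
    """P(S >= score) = 1 - exp(-E) under the Gumbel approximation."""
    # expm1 keeps precision when E is tiny and 1 - exp(-E) would lose digits.
    return -math.expm1(-expected_hits(score, m, n, lam, k))
```

For example, `p_value(60, 300, 300, lam=0.267, k=0.041)` gives the chance probability of a score of 60 between two sequences of length 300 under those placeholder parameters; an error in k scales E linearly, whereas an error in λ enters through the exponent, which is consistent with the much tighter tolerance the abstract quotes for λ.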

Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

Author: A Kraskov
A Milosavljević
G Navarro
J Felsenstein
J Lake
J Rissanen
J Thompson
J Varre
Konrad Scheffler
L Allison
M Brudno
M Cao
M Li
M Mahoney
M Nei
M Steel
Maya Paczuski
N Bray
N Saitou
Orion Penner
P Buneman
P Lockhart
P Viola
Peter Grassberger
R Cilibrasi
R Durbin
S Altschul
S McGinnis
S Vinga
T Cover
T Lassmann
W Press
X Chen
Publication venue: Public Library of Science (PLoS)
Publication date: 19/08/2010

Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI), which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

IMT Institutional Repository
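
The "concatenating and zipping" estimate named in the abstract above can be sketched as follows. This is a minimal illustration of the usual normalized compression distance, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. It assumes zlib as the compressor, which is a stand-in chosen for convenience and not necessarily what the authors used, and it does not implement the alignment-based MI estimate or the additivity fix that the paper proposes.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest that one sequence is largely predictable from the
    other; values near 1 suggest little shared information.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Because zlib only finds repeats within a 32 KiB window, this sketch is meaningful only for short sequences; longer inputs such as whole mitochondrial genomes would need a compressor with a larger context.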

Bethe Ansatz in the Bernoulli Matching Model of Random Sequence Alignment

Author: A. M. Vershik
D. Gusfield
D. Sankoff
J. M. Hammersley
Kirone Mallick
M. Ablowitz
M. S. Waterman
R. Dubrin
R. J. Baxter
R. Wagner
S. F. Altschul
S. M. Ulam
Satya N. Majumdar
Sergei Nechaev
V. Dancik
Publication venue: American Physical Society (APS)
Publication date: 01/01/2008

For the Bernoulli Matching model of the sequence alignment problem, we apply the Bethe ansatz technique via an exact mapping to the 5-vertex model on a square lattice. Considering the terrace-like representation of the sequence alignment problem, we use the Bethe ansatz to reproduce the results for the average length of the Longest Common Subsequence in the Bernoulli approximation. In addition, we compute the average number of nucleation centers of the terraces. Comment: 14 pages, 5 figures (some points are clarified).

arXiv.org e-Print Archive

Crossref

HAL-CEA
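
The quantity averaged in the abstract above, the length of the Longest Common Subsequence (LCS), is defined by a simple recursion over prefixes of the two sequences. The sketch below is the textbook dynamic program for two concrete sequences; it is not the Bethe ansatz calculation, and the function name is hypothetical. In the Bernoulli Matching model, as it is usually described, the letter comparison in the inner loop is replaced by independent random match variables.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences `a` and `b`.

    Uses the recursion
        L[i][j] = L[i-1][j-1] + 1             if a[i] == b[j]
                  max(L[i-1][j], L[i][j-1])   otherwise
    and keeps only the previous row, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Calling `lcs_length("AGGTAB", "GXTXAYB")` finds the common subsequence "GTAB" and so returns 4.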

soaPDB: a web application for searching the Protein Data Bank, organizing results, and receiving automatic email alerts

Author: Altschul
Berman
C. A. Lesburg
Deshpande
J. S. Duca
Publication venue: Oxford University Press
Publication date

soaPDB is a web application that allows generation and organization of saved PDB searches, and offers automatic email alerts. This tool is used from a web interface to store PDB searches and results in a backend relational database. Written using the Ruby on Rails open-source web framework, soaPDB is easy to deploy, maintain and customize. soaPDB is freely available upon request for local installation and is also available at http://soapdb.dyndns.org:3000

Crossref

PubMed Central

Multi-netclust: an efficient tool for finding connected clusters in multi-parametric networks

Author: A. Kuzniar
Altschul
Enright
H. Nijveen
Holm
J. A. M. Leunissen
S. Dhir
S. Pongor
Publication venue: Oxford University Press
Publication date: 01/01/2010

Summary: Multi-netclust is a simple tool that allows users to extract connected clusters of data represented by different networks given in the form of matrices. The tool uses user-defined threshold values to combine the matrices, and uses a straightforward, memory-efficient graph algorithm to find clusters that are connected in all or in either of the networks. The tool is written in C/C++ and is available either as a form-based or as a command-line-based program running on Linux platforms. The algorithm is fast: processing a network of > 10^6 nodes and 10^8 edges takes only a few minutes on an ordinary computer.

Crossref

PubMed Central

UCL Discovery

Wageningen University & Research Publications
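
The clustering idea in the Multi-netclust abstract above (threshold each network, combine them, then extract connected clusters) can be sketched with a union-find structure. This is a small illustration under stated assumptions and not the tool's C/C++ implementation: networks are dense similarity matrices, an edge is kept when its weight is at least the threshold, and "all" or "any" decides whether an edge must pass in every network or in at least one. The function name and parameters are hypothetical.

```python
def connected_clusters(n, networks, thresholds, mode="all"):
    """Connected components over n nodes after combining thresholded networks.

    networks   : list of n x n similarity matrices (lists of lists)
    thresholds : one cutoff per network; an edge passes if weight >= cutoff
    mode       : "all" keeps edges passing in every network,
                 "any" keeps edges passing in at least one
    """
    parent = list(range(n))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    combine = all if mode == "all" else any
    for i in range(n):
        for j in range(i + 1, n):
            if combine(net[i][j] >= t for net, t in zip(networks, thresholds)):
                parent[find(i)] = find(j)

    clusters = {}
    for v in range(n):
        clusters.setdefault(find(v), []).append(v)
    return sorted(clusters.values())
```

The dense double loop is quadratic in the number of nodes, so a network of the size quoted in the abstract would need a sparse edge list instead; the union-find step itself carries over unchanged.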

A unifying framework for seed sensitivity and its application to subset seeds

Author: A. Finkelstein
A.V. Aho
B. Brejova
B. Ma
D. Brown
G. Kucherov
I.H. Yang
J. Buhler
J. Xu
J.D. Ullman
K. Choi
K.P. Choi
S. Altschul
S. Burkhardt
W. Chen
W.J. Kent
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 01/01/2004

We propose a general approach to computing seed sensitivity that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem (a set of target alignments, an associated probability distribution, and a seed model), each specified by a distinct finite automaton. The approach is then applied to a new concept of subset seeds, for which we propose an efficient automaton construction. Experimental results confirm that sensitive subset seeds can be efficiently designed using our approach, and can then be used in similarity search, producing better results than ordinary spaced seeds.

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

PubMed Central
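
For readers unfamiliar with the term, the seed sensitivity discussed in the abstract above is the probability that a seed hits a random alignment at least once. The sketch below illustrates this for ordinary spaced seeds only, using brute-force enumeration over a Bernoulli alignment model; it does not reproduce the automaton-based computation or the subset seeds proposed in the paper. The encoding ("1" for a required match in the seed and for a match column in the alignment, "0" for a don't-care position or mismatch) and the function names are assumptions made for this example.

```python
from itertools import product


def seed_hits(seed, alignment):
    """Start positions where every '1' of `seed` lies over a match ('1')."""
    span = len(seed)
    return [
        pos
        for pos in range(len(alignment) - span + 1)
        if all(s == "0" or alignment[pos + k] == "1" for k, s in enumerate(seed))
    ]


def sensitivity(seed, length, p):
    """Probability that `seed` hits a random alignment of the given length.

    Each alignment column is an independent match with probability p.
    Enumerates all 2**length alignments, so use small lengths only.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        aln = "".join(bits)
        if seed_hits(seed, aln):
            matches = aln.count("1")
            total += p ** matches * (1 - p) ** (length - matches)
    return total
```

Comparing, say, `sensitivity("111", 12, 0.7)` with `sensitivity("1101", 12, 0.7)` shows how two seeds of equal weight can differ in sensitivity; the exponential enumeration used here is the cost that an automaton-based approach is meant to avoid.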