Sequence-specific sequence comparison using pairwise statistical significance

Author: Agrawal Ankit
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2009

Sequence comparison is one of the most fundamental computational problems in bioinformatics, for which many approaches have been and are still being developed. In particular, pairwise sequence alignment forms the crux of both DNA and protein sequence comparison techniques, which in turn form the basis of many other applications in bioinformatics. Pairwise sequence alignment methods align two sequences using a substitution matrix consisting of pairwise scores for aligning different residues with each other (such as BLOSUM62), and give an alignment score for the given sequence pair. Biologists routinely use such pairwise alignment programs to identify similar, or more specifically, related sequences (those having a common ancestor). It is widely accepted that the relatedness of two sequences is better judged by the statistical significance of the alignment score than by the alignment score alone. This research addresses the problem of accurately estimating the statistical significance of a pairwise alignment for the purpose of identifying related sequences, by making the sequence comparison process more sequence-specific. The major contributions of this research work are as follows. First, sequence-specific strategies for pairwise sequence alignment are used in conjunction with sequence-specific strategies for statistical significance estimation, and accurate methods for pairwise statistical significance estimation using standard, sequence-specific, and position-specific substitution matrices are developed. Second, pairwise statistical significance is used to improve the performance of the most popular database search program, PSI-BLAST. Third, heuristics are designed and implemented that speed up pairwise statistical significance estimation by a factor of more than 200. The implementation of all the methods developed in this work is freely available online.
With the all-pervasive application of sequence alignment methods in bioinformatics using the ever-increasing sequence data, this work is expected to offer useful contributions to the research community.
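
The idea of judging a pair by the significance of its score rather than by the score itself can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a toy match/mismatch scoring scheme instead of BLOSUM62, a linear gap penalty, a null distribution obtained by shuffling one sequence, and a Gumbel (extreme value) fit by the method of moments. The function names `smith_waterman_score` and `pairwise_p_value` and all default parameter values are hypothetical.

```python
import math
import random

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score under a toy scoring scheme (assumed values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def pairwise_p_value(a, b, n_shuffles=200, seed=0):
    """Estimate P(score >= observed) for this specific pair.

    The null scores come from aligning `a` against shuffled copies of `b`;
    a Gumbel distribution is fitted to them by the method of moments
    (a simplification chosen for this sketch).
    """
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        null_scores.append(smith_waterman_score(a, letters))
    mean = sum(null_scores) / n_shuffles
    var = sum((s - mean) ** 2 for s in null_scores) / (n_shuffles - 1)
    std = math.sqrt(var)
    if std == 0.0:
        # Degenerate null distribution: fall back to a step function.
        return observed, (1.0 if observed <= mean else 0.0)
    beta = std * math.sqrt(6.0) / math.pi      # Gumbel scale
    mu = mean - 0.5772156649015329 * beta      # Gumbel location (Euler's constant)
    z = (observed - mu) / beta
    # p = 1 - exp(-exp(-z)), written with expm1 for small p-values.
    p = -math.expm1(-math.exp(min(-z, 700.0)))
    return observed, p

if __name__ == "__main__":
    seq = "ACGTTGCAAGCTAGCTAGGATCCGATCGATTAGCCATGA"
    print(pairwise_p_value(seq, seq))
```

Because the null distribution is rebuilt for each pair, the estimate depends on the composition and length of the two sequences being compared, which is the sense in which it is sequence-specific.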

Digital Repository @ Iowa State University (ISU)

Accelerating pairwise statistical significance estimation for local alignment by harvesting GPU's power

Author: A Agrawal
A Mitrophanov
A Poleksic
A Samuel
AA Schäffer
Alok Choudhary
Ankit Agrawal
C Camacho
D Honbo
DS Roos
L Ligowski
M Pagni
M Waterman
Md Mostofa Ali Patwary
ML Sierk
NVIDIA
P Aleksandar
R Mott
R O
S Altschul
S Karlin
S Manavski
S Ryoo
S Yooseph
S Zuyderduyn
Sanchit Misra
SF Altschul
SR Eddy
T Rognes
T Smith
W Liu
W Pearson
Wei-keng Liao
WR Pearson
Y Liu
Y Yu
Y Zhang
Yuhong Zhang
Zhiguang Qin
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Pairwise statistical significance of local sequence alignment using multiple parameter sets and empirical justification of parameter set change penalty

Author: A Agrawal
AA Schäffer
AK Hartmann
Ankit Agrawal
AY Mitrophanov
CA Orengo
J Rocha
M Kschischo
M Pagni
ML Sierk
MS Waterman
P Bucher
PH Sellers
R Mott
R Olsen
RF Mott
S Grossmann
S Karlin
S Kotz
S Sheetlin
S Wolfsheimer
SE Brenner
SF Altschul
TF Smith
WR Pearson
X Huang
Xiaoqiu Huang
YK Yu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Accurate estimation of the statistical significance of a pairwise alignment is an important problem in sequence comparison. Recently, a comparative study of pairwise statistical significance with database statistical significance was conducted. In this paper, we extend the earlier work on pairwise statistical significance by incorporating the use of multiple parameter sets. Results: Results for a knowledge discovery application of homology detection reveal that using multiple parameter sets for pairwise statistical significance estimates gives better coverage than using a single parameter set, at least at some error levels. Further, the results of pairwise statistical significance using multiple parameter sets are shown to be significantly better than database statistical significance estimates reported by BLAST and PSI-BLAST, and comparable to, and at times significantly better than, those of SSEARCH. Using non-zero parameter set change penalty values gives better performance than a zero penalty. Conclusion: The fact that homology detection performance does not degrade when using multiple parameter sets is strong evidence for the validity of the assumption that the alignment score distribution follows an extreme value distribution even when using multiple parameter sets. The parameter set change penalty is a useful parameter for alignment using multiple parameter sets. Pairwise statistical significance using multiple parameter sets can be effectively used to determine the relatedness of one pair (or a few pairs) of sequences without performing a time-consuming database search.

Digital Repository @ Iowa State University (ISU)

Crossref

Springer - Publisher Connector

PubMed Central

Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

Author: A Kraskov
A Milosavljević
G Navarro
J Felsenstein
J Lake
J Rissanen
J Thompson
J Varre
Konrad Scheffler
L Allison
M Brudno
M Cao
M Li
M Mahoney
M Nei
M Steel
Maya Paczuski
N Bray
N Saitou
Orion Penner
P Buneman
P Lockhart
P Viola
Peter Grassberger
R Cilibrasi
R Durbin
S Altschul
S McGinnis
S Vinga
T Cover
T Lassmann
W Press
X Chen
Publication venue: Public Library of Science (PLoS)
Publication date: 19/08/2010

Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative in the form of mutual information (MI), which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives estimates for animal mitochondrial DNA that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
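
The "concatenate and zip" estimate mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with `zlib` standing in for the compressor C; the abstract does not say which compressor was used, and the proposed additivity correction is not reproduced here. The helper `random_dna` is hypothetical and exists only to produce test input.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (compressor is an assumption)."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences."""
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)

def random_dna(length: int, seed: int) -> str:
    """Hypothetical helper: a reproducible random DNA string for demonstration."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))

if __name__ == "__main__":
    x = random_dna(2000, seed=1)
    y = random_dna(2000, seed=2)
    print("NCD(x, x) =", ncd(x, x))   # expected to be close to 0
    print("NCD(x, y) =", ncd(x, y))   # expected to be close to 1
```

A real compressor only approximates Kolmogorov complexity, which is one source of the uncontrolled errors the abstract refers to; with `zlib` the result also depends on the 32 KB window, so long sequences would need a different compressor.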

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

IMT Institutional Repository

Back-translation for discovering distant protein homologies

Author: A. Pedersen
B. Oostra
C. Kosiol
J. Leluk
J. Raes
K. Okamura
L. Arvestad
L. Delaye
M. Clamp
M. Pellegrini
P. Harrison
P. Lio
R. Blake
S. Altschul
Y. Hahn
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations between two proteins that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best-scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples. Comment: The 9th International Workshop in Algorithms in Bioinformatics (WABI), Philadelphia, United States of America (2009)
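
To make "the complete set of putative DNA sequences of each protein" concrete, the sketch below back-translates a protein into its per-residue codon choices using the standard genetic code and counts how many DNA sequences encode it. This is only an illustration of why a compact representation is needed; it is not the graph structure or alignment algorithm of the paper, and the function names are hypothetical.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def codons_for(amino_acid: str) -> list:
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)

def putative_codons(protein: str) -> list:
    """One list of candidate codons per residue: an implicit, compact
    description of every DNA sequence that translates to `protein`."""
    return [codons_for(aa) for aa in protein]

def count_back_translations(protein: str) -> int:
    """Number of distinct DNA sequences encoding `protein`."""
    return prod(len(codons) for codons in putative_codons(protein))

if __name__ == "__main__":
    print(putative_codons("MW"))
    print(count_back_translations("MLS"))
```

The count grows multiplicatively with protein length, so enumerating every putative DNA sequence explicitly quickly becomes infeasible; keeping the choices per position, as above, is the simplest compact alternative.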

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

Crossref

INRIA a CCSD electronic archive server

An FPGA-based Web server for high performance biological sequence alignment

Author: Benkrid Abdsamad
Benkrid K.
Kasap S.
Liu Ying
Publication date: 01/01/2009

Portsmouth University Research Portal (Pure)

Homology-extended sequence alignment

Author: Jaap Heringa
Jens Kleinjung
John Romein
Kuang Lin
Publication venue: Oxford University Press
Publication date: 01/01/2005

We present a profile–profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich the evolutionary information available for the alignment. For each of the sequences to be aligned, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that, owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods such as T-COFFEE and MUSCLE. We also show that, although entirely sequence-based, our novel strategy is better at aligning distant sequences than a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy, such as local structure prediction, comparative modelling and threading.

CiteSeerX

Crossref

PubMed Central

Protein sequence alignment with family-specific amino acid similarity matrices

Author: A Agrawal
A Prlić
AR Panchenko
B Qian
B Rost
C Notredame
CB Do
CN Cavasotto
G Vogt
GH Gonnet
GP Raghava
I Van Walle
Igor B Kuznetsov
IN Shindyalov
J Pei
J Söding
JD Blake
JD Thompson
JM Sauder
JS Bernardes
K Mizuguchi
L Holm
L Lo Conte
ML Sierk
MO Dayhoff
MS Johnson
RB Vilim
RC Edgar
S Henikoff
S Salem
SB Needleman
SE Brenner
SF Altschul
SR Eddy
T Müller
TF Smith
V Ahola
WR Pearson
WR Taylor
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central

Protein Fold Recognition Using Neural Networks

Author: Lin Guang
Publication date: 01/01/2003

Accurately predicting the three-dimensional (3D) structures of proteins from their amino acid sequences alone remains a challenging problem. However, using protein fold recognition tools, it is often possible to obtain good models, or at least to gain more information to aid scientists in their research. This thesis describes the development of TUNE (Threading Using Neural Networks), a fold recognition program using artificial neural network (ANN) models. A new method to generate amino acid substitution matrices is described in chapter two. It uses an ANN to generalise amino acid substitutions observed in protein structure alignments. Matrices for alignment scoring from this approach were compared with classic alignment scoring schemes. From these neural network models, a series of encoding schemes were constructed. These schemes describe the amino acid types with a few numbers. They were generated to replace the orthogonal encoding scheme, so that smaller, faster and more accurate neural network models can be applied to bioinformatics problems. The TUNE model is introduced in chapter four to measure protein sequence-structure compatibility. Given the integrated residue structural environment descriptions, the model predicts the probabilities of observing amino acid types in such environments. Using this model, a scoring function to measure the fitness of a residue in a protein structure model can be made for protein threading programs. The model in chapter two was extended by including the residue structural environment descriptions for predictions. A simple protein fold recognition program with a dynamic programming algorithm was developed using this model. The program was then tested in the fourth round of the Critical Assessment of protein Structure Prediction methods (CASP4) and produced reasonably good results.

Open Research Online

OpenGrey Repository

Optimal Sequence Alignment and Its Relationship with Phylogeny

Author: Atoosa Ghahremani
Mahmood A. Mahdavi
Publication venue: IntechOpen
Publication date: 02/11/2011

IntechOpen