Search CORE

5,349 research outputs found

Context-specific methods for sequence homology searching and alignment

Author: Biegert Andreas
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2010
Field of study

Digitale Hochschulschriften der LMU

MPG.PuRe

EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments

Author: Chiche Laurent
Gelly Jean-Christophe
Gracy Jérôme
Publication venue: BioMed Central
Publication date: 01/01/2005
Field of study

BACKGROUND: Structure-dependent substitution matrices increase the accuracy of sequence alignments when the 3D structure of one sequence is known, and are successful e.g. in fold recognition. We propose a new automated method, EvDTree, based on a decision tree algorithm, for automatic derivation of amino acid substitution probabilities from a set of sequence-structure alignments. The main advantage over other approaches is an unbiased automatic selection of the most informative structural descriptors and associated values or thresholds. This feature allows automatic derivation of structure-dependent substitution scores for any specific set of structures, without the need to empirically determine best descriptors and parameters. RESULTS: Decision trees for residue substitutions were constructed for each residue type from sequence-structure alignments extracted from the HOMSTRAD database. For each tree cluster, environment-dependent substitution profiles were derived. The resulting structure-dependent substitution scores were assessed using a criterion based on the mean ranking of observed substitution among all possible substitutions and in sequence-structure alignments. The automatically built EvDTree substitution scores provide significantly better results than conventional matrices and similar or slightly better results than other structure-dependent matrices. EvDTree has been applied to small disulfide-rich proteins as a test case to automatically derive specific substitutions scores providing better results than non-specific substitution scores. Analyses of the decision tree classifications provide useful information on the relative importance of different structural descriptors. CONCLUSIONS: We propose a fully automatic method for the classification of structural environments and inference of structure-dependent substitution profiles. We show that this approach is more accurate than existing methods for various applications. The easy adaptation of EvDTree to any specific data set opens the way for class-specific structure-dependent substitution scores which can be used in threading-based remote homology searches

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recommended from our members

Protein Fold Recognition Using Neural Networks

Author: Lin Guang
Publication venue
Publication date: 01/01/2003
Field of study

To predict accurately the three-dimensional (3D) structures of proteins from their amino acid sequences alone remains a challenging problem. However, using protein fold recognition tools, it is often possible to achieve good models or at least to gain some more information, to aid scientists in their research. This thesis describes development of TUNE (Threading Using Neural Networks), a fold recognition program using artificial neural network (ANN) models. A new method to generate amino acid substitution matrices is described in chapter two. It uses an ANN to generalise amino acid substitutions observed in protein structure alignments. Matrices for alignment scoring from this approach were compared with classic alignment scoring schemes. From these neural network models, a series of encoding schemes were constructed. These schemes describe the amino acid types with a few numbers. They were generated to replace the orthogonal encoding scheme, so that smaller, faster and more accurate neural network models can be applied on bioinformatic problems. The TUNE model was introduced in chapter four to measure protein sequence-structure compatibility. Given the integrated residue structural environment descriptions, the model predicts probabilities of observing amino acid types in such environments. Using this model, a scoring function to measure the fitness of a residue in a protein structure model can be made for protein threading programs. The model in chapter two was extended by including the residue structural environment descriptions for predictions. A simple protein fold recognition program with a dynamic programming algorithm was developed using this model. The program was then tested in the fourth round of the Critical Assessment of protein Structure Prediction methods (CASP4) and produced reasonably good results

Open Research Online (The Open University)

OpenGrey Repository

Alignment of helical membrane protein sequences using AlignMe

Author: Forrest Lucy R.
Khafizov Kamil
Stamm Marcus
Staritzbichler René
Publication venue
Publication date: 01/01/2013
Field of study

Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Hochschulschriftenserver - Universität Frankfurt am Main

FigShare

Back-translation for discovering distant protein homologies in the presence of frameshift mutations

Author: Gîrdea Marta
Kucherov Gregory
Noé Laurent
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Background: Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins ’ common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level. \ud \ud Results: We developed a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. Our implementation is freely available at http://bioinfo.lifl.fr/path/.\ud \ud Conclusions: Our approach allows to uncover evolutionary information that is not captured by traditional\ud alignment methods, which is confirmed by biologically significant example

CiteSeerX

HAL - Lille 3

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

An automatic method for assessing structural importance of amino acid positions

Author: Jones D.T.
Sadowski M.I.
Publication venue
Publication date: 01/01/2009
Field of study

Background: A great deal is known about the qualitative aspects of the sequence-structure relationship, for example that buried residues are usually more conserved between structurally similar homologues, but no attempts have been made to quantitate the relationship between evolutionary conservation at a sequence position and change to global tertiary structure. In this paper we demonstrate that the Spearman correlation between sequence and structural change is suitable for this purpose. Results: Buried residues, bends, cysteines, prolines and leucines were significantly more likely to occupy positions highly correlated with structural change than expected by chance. Some buried residues were found to be less informative than expected, particularly residues involved in active sites and the binding of small molecules. Conclusion: The correlation-based method generates predictions of structural importance for superfamily positions which agree well with previous results of manual analyses, and may be of use in automated residue annotation piplines. A PERL script which implements the method is provided

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

Author: A Kraskov
A Milosavljević
G Navarro
J Felsenstein
J Lake
J Rissanen
J Rissanen
J Thompson
J Varre
Konrad Scheffler
L Allison
M Brudno
M Brudno
M Cao
M Li
M Li
M Mahoney
M Nei
M Steel
Maya Paczuski
N Bray
N Bray
N Saitou
Orion Penner
P Buneman
P Lockhart
P Viola
Peter Grassberger
R Cilibrasi
R Durbin
S Altschul
S Altschul
S McGinnis
S Vinga
T Cover
T Lassmann
W Press
X Chen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 19/08/2010
Field of study

Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI) which is, in principle, an objective and model independent similarity measure. MI can be estimated by concatenating and zipping sequences, yielding thereby the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Due to the fact that it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments.Comment: 19 pages + 16 pages of supplementary materia

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

IMT Institutional Repository