5,946 research outputs found

    Fr-TM-align: a new protein structural alignment method based on fragment alignments and the TM-score

    Get PDF
    ©2008 Pandit and Skolnick; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. This article is available from: http://www.biomedcentral.com/1471-2105/9/531doi:10.1186/1471-2105-9-531Background: Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed that was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true. Results: To further improve structure alignment quality, the Fr-TM-align algorithm has been developed where aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. For the assessment of the structural alignment quality from Fr-TM-align in comparison to other programs such as CE and TMalign, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better in comparison to both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (~9%) and coverage (~7%) in comparison to those generated by TM-align. Fr- TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align. Conclusion: Fr-TM-align, a new algorithm that employs fragment alignment and assembly provides better structural alignments in comparison to TM-align. The source code and executables of Fr- TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Get PDF
    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI) which is, in principle, an objective and model independent similarity measure. MI can be estimated by concatenating and zipping sequences, yielding thereby the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Due to the fact that it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments.Comment: 19 pages + 16 pages of supplementary materia

    Back-translation for discovering distant protein homologies

    Get PDF
    Frameshift mutations in protein-coding DNA sequences produce a drastic change in the resulting protein sequence, which prevents classic protein alignment methods from revealing the proteins' common origin. Moreover, when a large number of substitutions are additionally involved in the divergence, the homology detection becomes difficult even at the DNA level. To cope with this situation, we propose a novel method to infer distant homology relations of two proteins, that accounts for frameshift and point mutations that may have affected the coding sequences. We design a dynamic programming alignment algorithm over memory-efficient graph representations of the complete set of putative DNA sequences of each protein, with the goal of determining the two putative DNA sequences which have the best scoring alignment under a powerful scoring system designed to reflect the most probable evolutionary process. This allows us to uncover evolutionary information that is not captured by traditional alignment methods, which is confirmed by biologically significant examples.Comment: The 9th International Workshop in Algorithms in Bioinformatics (WABI), Philadelphia : \'Etats-Unis d'Am\'erique (2009
    • …
    corecore