88 research outputs found

    Matt: Local Flexibility Aids Protein Multiple Structure Alignment

    Even when there is agreement on what measure a protein multiple structure alignment should be optimizing, finding the optimal alignment is computationally prohibitive. One approach used by many previous methods is aligned fragment pair chaining, where short structural fragments from all the proteins are aligned against each other optimally, and the final alignment chains these together in geometrically consistent ways. Ye and Godzik have recently suggested that adding geometric flexibility may help better model protein structures in a variety of contexts. We introduce the program Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments: small translations and rotations are temporarily allowed to bring sets of aligned fragments closer, even if they are physically impossible under rigid body transformations. After a dynamic programming assembly guided by these “bent” alignments, geometric consistency is restored in the final step before the alignment is output. Matt is tested against other recent multiple protein structure alignment programs on the popular Homstrad and SABmark benchmark datasets. Matt's global performance is competitive with the other programs on Homstrad, and it outperforms the other programs on SABmark, a benchmark of multiple structure alignments of proteins with more distant homology. On both datasets, Matt demonstrates an ability to better align the ends of α-helices and β-strands, an important characteristic of any structure alignment program intended to help construct a structural template library for threading approaches to the inverse protein-folding problem. The related question of whether Matt alignments can be used to distinguish distantly homologous structure pairs from pairs of proteins that are not homologous is also considered. For this purpose, a p-value score based on the length of the common core and average root mean squared deviation (RMSD) of Matt alignments is shown to largely separate decoys from homologous protein structures in the SABmark benchmark dataset. We postulate that Matt's strong performance comes from its ability to model proteins in different conformational states and, perhaps even more important, its ability to model backbone distortions in more distantly related proteins.
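    As a reminder of the RMSD quantity the abstract refers to, here is a minimal Python sketch. It is not Matt's scoring code: the function name `rmsd` is hypothetical, and the sketch assumes the two coordinate lists are already superimposed and paired residue by residue (for example, the aligned core positions of two structures).

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean squared deviation between two paired lists of (x, y, z) points.

    Assumption: the structures are already superimposed; no rotation or
    translation is fitted here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

    A score like the one described would combine such an RMSD with the number of aligned core positions; how the two are combined into a p-value is not stated in the abstract and is not reproduced here.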

    A data-mining approach for multiple structural alignment of proteins

    Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we propose studying the problem when far less information is available and far fewer assumptions can be made. We model the structural alignment of proteins as a combinatorial problem. In the problem, each protein is simply a set of points in 3D space, without sequence order information, and the objective is to discover all sufficiently large alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures such that points with similar locations in 3D space are hashed into the same bin in the hash table. The novelty is that we consider each bin as a coincidence group and mine for frequent patterns, which is a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. Then a simple heuristic is used to extend the alignments if possible. We implemented the algorithm and tested it using real protein structures. The results were compared with existing tools. They showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those existing in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture with dissimilar ones. The running time of the program was smaller than or comparable to that of the existing tools.
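    The binning step described in this abstract can be illustrated with a small Python sketch. This is a simplification under stated assumptions, not the authors' implementation: it assumes all structures are already expressed in one common coordinate frame (full geometric hashing would repeat the binning over many reference frames), and the names `hash_points`, `coincidence_groups`, `bin_size`, and `min_support` are hypothetical.

```python
import math
from collections import defaultdict

def hash_points(structures, bin_size=2.0):
    """Map each grid cell to the set of structures that have a point in it.

    structures: dict of structure name -> list of (x, y, z) points.
    Assumption: all points share one coordinate frame.
    """
    bins = defaultdict(set)
    for name, points in structures.items():
        for x, y, z in points:
            # Points with similar locations fall into the same cubic cell.
            key = (math.floor(x / bin_size),
                   math.floor(y / bin_size),
                   math.floor(z / bin_size))
            bins[key].add(name)
    return bins

def coincidence_groups(bins, min_support=2):
    """Keep only the bins shared by at least min_support structures."""
    return {key: names for key, names in bins.items() if len(names) >= min_support}

structures = {
    "A": [(0.1, 0.2, 0.3), (5.0, 5.0, 5.0)],
    "B": [(0.5, 0.9, 1.1), (9.0, 9.0, 9.0)],
}
groups = coincidence_groups(hash_points(structures))
```

    Each surviving bin plays the role of a coincidence group; a frequent-pattern miner would then look for sets of structures that co-occur across many bins.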

    An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories

    Background: Understanding how proteins fold is essential to our quest in discovering how life works at the molecular level. Current computation power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to be able to interpret and identify novel folding features from them. Results: In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high-dimensional curves coming from folding trajectories. A detailed case study on miniprotein Trp-cage [1] demonstrates that our algorithm can detect similarities at a rather low level and extract biologically meaningful folding events. Conclusion: The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For users' convenience, we provide a web server for the algorithm at http://db.cse.ohio-state.edu/EPO.

    Computational approaches to modeling the conserved structural core among distantly homologous proteins

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-103). Modern techniques in biology have produced sequence data for huge quantities of proteins, and 3-D structural information for a much smaller number of proteins. We introduce several algorithms that make use of the limited available structural information to classify and annotate proteins with structures that are unknown, but similar to solved structures. The first algorithm is actually a tool for better understanding solved structures themselves. Namely, we introduce the multiple alignment algorithm Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments. Matt temporarily allows small translations and rotations to bring sets of fragments into closer alignment than physically possible under rigid body transformation. The second algorithm, BetaWrapPro, is designed to recognize sequences of unknown structure that belong to specific all-beta fold classes. BetaWrapPro employs a "wrapping" algorithm that uses long-distance pairwise residue preferences to recognize sequences belonging to the beta-helix and the beta-trefoil classes. It uses hand-curated beta-strand templates based on solved structures. Finally, SMURF (Structural Motifs Using Random Fields) combines ideas from both these algorithms into a general method to recognize beta-structural motifs using both sequence information and long-distance pairwise correlations involved in beta-sheet formation. For any beta-structural fold, SMURF uses Matt to automatically construct a template from an alignment of solved 3-D structures. From this template, SMURF constructs a Markov random field that combines a profile hidden Markov model together with pairwise residue preferences of the type introduced by BetaWrapPro. The efficacy of SMURF is demonstrated on three beta-propeller fold classes. By Matthew Ewald Menke. Ph.D.

    The Many Faces of Protein–Protein Interactions: A Compendium of Interface Geometry

    A systematic classification of protein–protein interfaces is a valuable resource for understanding the principles of molecular recognition and for modelling protein complexes. Here, we present a classification of domain interfaces according to their geometry. Our new algorithm uses a hybrid approach of both sequential and structural features. The accuracy is evaluated on a hand-curated dataset of 416 interfaces. Our hybrid procedure achieves 83% precision and 95% recall, which improves the earlier sequence-based method by 5% on both terms. We classify virtually all domain interfaces of known structure, which results in nearly 6,000 distinct types of interfaces. In 40% of the cases, the interacting domain families associate in multiple orientations, suggesting that all the possible binding orientations need to be explored for modelling multidomain proteins and protein complexes. In general, hub proteins are shown to use distinct surface regions (multiple faces) for interactions with different partners. Our classification provides a convenient framework to query genuine gene fusion, which conserves binding orientation in both fused and separate forms. The result suggests that the binding orientations are not conserved in at least one-third of the gene fusion cases detected by a conventional sequence similarity search. We show that any evolutionary analysis on interfaces can be skewed by multiple binding orientations and multiple interaction partners. The taxonomic distribution of interface types suggests that ancient interfaces common to the three major kingdoms of life are enriched by symmetric homodimers. The classification results are online at http://www.scoppi.org

    Alternative Splicing and Protein Structure Evolution

    In recent years, there has been a dramatic increase in available experimental data across a wide range of areas in biology. For the first time, these data allow a detailed analysis of how cellular components such as genes and proteins function, of how they are linked in cellular networks, and of the history of their evolution. Bioinformatics in particular plays an important role here in preparing the data and interpreting them biologically. This doctoral thesis investigates two important areas of current bioinformatics research: the analysis of protein structure evolution and of similarities between protein structures, and the analysis of alternative splicing, an integral process in eukaryotic cells that contributes to functional diversity. In particular, this work introduces the idea of a combined analysis of the two mechanisms (structure evolution and splicing). We show that a combined view yields new insights into how structure evolution and alternative splicing, as well as a coupling of the two mechanisms, contribute to functional and structural complexity in higher organisms. The methods, hypotheses, and results presented in this thesis can contribute to our understanding of how structure evolution and alternative splicing operate in the emergence of complex organisms, so that the two traditionally separate areas of bioinformatics can benefit from each other in the future.

    ProteinDBS v2.0: a web server for global and local protein structure search

    ProteinDBS v2.0 is a web server designed for efficient and accurate comparisons and searches of structurally similar proteins from a large-scale database. It provides two comparison methods, global-to-global and local-to-local, to facilitate the searches of protein structures or substructures. ProteinDBS v2.0 applies advanced feature extraction algorithms and scalable indexing techniques to achieve a high-running speed while preserving reasonably high precision of structural comparison. The experimental results show that our system is able to return results of global comparisons in seconds from a complete Protein Data Bank (PDB) database of 152 959 protein chains and that it takes much less time to complete local comparisons from a non-redundant database of 3276 proteins than other accurate comparison methods. ProteinDBS v2.0 supports query by PDB protein ID and by new structures uploaded by users. To our knowledge, this is the only search engine that can simultaneously support global and local comparisons. ProteinDBS v2.0 is a useful tool to investigate functional or evolutional relationships among proteins. Moreover, the common substructures identified by local comparison can be potentially used to assist the human curation process in discovering new domains or folds from the ever-growing protein structure databases. The system is hosted at http://ProteinDBS.rnet.missouri.edu

    A study of carboxylic ester hydrolases: structural classification, properties, and database

    The carboxylic ester hydrolases (CEHs) are enzymes that hydrolyze an ester bond to form a carboxylic acid and an alcohol. They are one of the enzyme groups that are most explored industrially for their applications in the food, flavor, pharmaceutical, organic synthesis, and detergent industries. We classified CEHs into families and clans according to their amino acid sequences (primary structures) and three-dimensional structures (tertiary structures). Our work has established the systematic structural classification of the CEHs. Primary structures of family members are similar to each other, and their active sites and reaction mechanisms are conserved. The tertiary structures of members of each clan, which is composed of different families, remain very similar, although amino acid sequences of members of different families are not similar. CEHs were divided into 127 families by use of BLAST, with 67 families being grouped into seven clans. Multiple sequence alignment and tertiary structures superposition were used, and active sites and reaction mechanisms were analyzed. Python and Shell scripts were implemented to automate the process of comparing CEH primary and tertiary structures. A comprehensive database, CASTLE (CArboxylic eSTer hydroLasEs), may be constructed to provide the primary and tertiary structures of CEHs. This database would be available at www.castle.enzyme.iastate.edu and will be accessible to the entire biology community

    Beauty Is in the Eye of the Beholder: Proteins Can Recognize Binding Sites of Homologous Proteins in More than One Way

    Understanding the mechanisms of protein–protein interaction is a fundamental problem with many practical applications. The fact that different proteins can bind similar partners suggests that convergently evolved binding interfaces are reused in different complexes. A set of protein complexes composed of non-homologous domains interacting with homologous partners at equivalent binding sites was collected in 2006, offering an opportunity to investigate this point. We considered 433 pairs of protein–protein complexes from the ABAC database (AB and AC binary protein complexes sharing a homologous partner A) and analyzed the extent of physico-chemical similarity at the atomic and residue level at the protein–protein interface. Homologous partners of the complexes were superimposed using Multiprot, and similar atoms at the interface were quantified using a five class grouping scheme and a distance cut-off. We found that the number of interfacial atoms with similar properties is systematically lower in the non-homologous proteins than in the homologous ones. We assessed the significance of the similarity by bootstrapping the atomic properties at the interfaces. We found that the similarity of binding sites is very significant between homologous proteins, as expected, but generally insignificant between the non-homologous proteins that bind to homologous partners. Furthermore, evolutionarily conserved residues are not colocalized within the binding sites of non-homologous proteins. We could only identify a limited number of cases of structural mimicry at the interface, suggesting that this property is less generic than previously thought. Our results support the hypothesis that different proteins can interact with similar partners using alternate strategies, but do not support convergent evolution