
A data-mining approach for multiple structural alignment of proteins

Author: Chan Ho-Leung
Mamoulis Nikos
Siu Wing-Yan
Yiu Siu-Ming
Publication venue: Biomedical Informatics Publishing Group
Publication date: 01/01/2010

Comparing the 3D structures of proteins is an important but computationally hard problem in bioinformatics. In this paper, we study the problem when far less information is available and fewer assumptions can be made. We model the structural alignment of proteins as a combinatorial problem. In this model, each protein is simply a set of points in 3D space, without sequence order information, and the objective is to discover all sufficiently large alignments for any subset of the input. We propose a data-mining approach for this problem. We first perform geometric hashing of the structures such that points with similar locations in 3D space are hashed into the same bin of the hash table. The novelty is that we treat each bin as a coincidence group and mine the groups for frequent patterns, a well-studied technique in data mining. We observe that these frequent patterns are already potentially large alignments. A simple heuristic is then used to extend the alignments where possible. We implemented the algorithm and tested it on real protein structures. The results were compared with those of existing tools. They showed that the algorithm is capable of finding conserved substructures that do not preserve sequence order, especially those in protein interfaces. The algorithm can also identify conserved substructures of functionally similar structures within a mixture that includes dissimilar ones. The running time of the program was shorter than or comparable to that of the existing tools.
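The two steps named in the abstract, hashing points into bins and mining the bins for frequent patterns, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the structures are already expressed in a common reference frame, uses a plain cubic grid as the hash, and counts co-occurring subsets by brute force instead of using a real frequent-pattern miner. The names `bin_key`, `coincidence_groups`, `frequent_subsets`, `cell` and `min_support` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def bin_key(point, cell):
    """Hash a 3D point to the integer coordinates of its grid cell."""
    return tuple(int(c // cell) for c in point)


def coincidence_groups(structures, cell=1.0):
    """Map each occupied grid cell to the set of structure ids with a point in it."""
    bins = defaultdict(set)
    for sid, points in structures.items():
        for p in points:
            bins[bin_key(p, cell)].add(sid)
    return bins


def frequent_subsets(bins, min_support, min_size=2):
    """Count, for every subset of structures, the bins in which all of them co-occur.

    Brute-force enumeration (an assumption made for brevity); subsets seen in
    at least `min_support` bins are candidate alignments of that size.
    """
    counts = defaultdict(int)
    for members in bins.values():
        ids = sorted(members)
        for k in range(min_size, len(ids) + 1):
            for subset in combinations(ids, k):
                counts[subset] += 1
    return {s: n for s, n in counts.items() if n >= min_support}


# Toy input: three "proteins" given as unordered point sets (made-up coordinates).
structures = {
    "A": [(0.1, 0.2, 0.3), (1.2, 0.1, 0.4), (2.5, 2.5, 2.5)],
    "B": [(0.3, 0.4, 0.1), (1.4, 0.3, 0.2), (5.0, 5.0, 5.0)],
    "C": [(0.2, 0.1, 0.2), (9.0, 9.0, 9.0)],
}
alignments = frequent_subsets(coincidence_groups(structures, cell=1.0), min_support=2)
print(alignments)
```

In this toy input, A and B share two bins, so the pair is reported with support 2, while C co-occurs with them in only one bin. A fixed grid splits nearby points that fall on opposite sides of a cell boundary, so a practical version would need to handle neighbouring cells and the choice of reference frame.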

HKU Scholars Hub

A method for simultaneous alignment of multiple protein structures

Author: Akutsu
Akutsu
Bachar
Bashford
Berman
deBerg
Dror
Dror
Durbin
Eidhammer
Fischer
Fischer
Forman-Kay
Gribskov
Holm
Jonassen
Kosloff
Leibowitz
Leibowitz
Lemmen
Madej
Mitchel
Mizuguchi
Murzin
Nussinov
Russell
Sandak
Shatsky
Shatsky
Shatsky
Shindyalov
Taylor
Taylor
Vriend
Wu
Publication venue: Wiley

An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories

Author: A Murzin
AK Jain
C Grasso
C Guda
C Lee
C Levinthal
CA Orengo
CA Orengo
D Lupyan
D Lupyan
E Krissinel
E Sandelin
F Chiti
Hakan Ferhatosmanoglu
Hong Sun
IN Shindyalov
J Neidigh
JF Gibrat
JM Borreguero
K Kedem
L Holm
L Holm
LP Chew
M Gerstein
M Ota
M Shatsky
ME Ochagavía
MJ Sutcliffe
Motonori Ota
NV Dokholyan
P Wolynes
R Du
R Koike
SB Needleman
SW Lockless
TF Smith
VI Abkevich
W Taylor
Y Caspi
Y Ye
Yusu Wang
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Understanding how proteins fold is essential to our quest to discover how life works at the molecular level. Current computational power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to interpret these data and identify novel folding features in them. Results: In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high-dimensional curves coming from folding trajectories. A detailed case study on the miniprotein Trp-cage [1] demonstrates that our algorithm can detect similarities at a rather low level and extract biologically meaningful folding events. Conclusion: The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For users' convenience, we provide a web server for the algorithm at http://db.cse.ohio-state.edu/EPO.

Springer - Publisher Connector

Public Library of Science (PLOS)

Computational approaches to modeling the conserved structural core among distantly homologous proteins

Author: Menke Matthew Ewald, 1978-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2009

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 95-103). Modern techniques in biology have produced sequence data for huge quantities of proteins, and 3-D structural information for a much smaller number of proteins. We introduce several algorithms that make use of the limited available structural information to classify and annotate proteins with structures that are unknown, but similar to solved structures. The first algorithm is actually a tool for better understanding solved structures themselves. Namely, we introduce the multiple alignment algorithm Matt (Multiple Alignment with Translations and Twists), an aligned fragment pair chaining algorithm that, in intermediate steps, allows local flexibility between fragments. Matt temporarily allows small translations and rotations to bring sets of fragments into closer alignment than physically possible under rigid body transformation. The second algorithm, BetaWrapPro, is designed to recognize sequences of unknown structure that belong to specific all-beta fold classes. BetaWrapPro employs a "wrapping" algorithm that uses long-distance pairwise residue preferences to recognize sequences belonging to the beta-helix and the beta-trefoil classes. It uses hand-curated beta-strand templates based on solved structures. Finally, SMURF (Structural Motifs Using Random Fields) combines ideas from both these algorithms into a general method to recognize beta-structural motifs using both sequence information and long-distance pairwise correlations involved in beta-sheet formation. For any beta-structural fold, SMURF uses Matt to automatically construct a template from an alignment of solved 3-D structures. From this template, SMURF constructs a Markov random field that combines a profile hidden Markov model together with pairwise residue preferences of the type introduced by BetaWrapPro. The efficacy of SMURF is demonstrated on three beta-propeller fold classes. by Matthew Ewald Menke. Ph.D.

CiteSeerX

DSpace@MIT

The Many Faces of Protein–Protein Interactions: A Compendium of Interface Geometry

Author: Andreas Henschel
Christof Winter
Michael Schroeder
Philip E Bourne
Wan Kyu Kim
Publication venue: Public Library of Science
Publication date: 01/01/2005

A systematic classification of protein–protein interfaces is a valuable resource for understanding the principles of molecular recognition and for modelling protein complexes. Here, we present a classification of domain interfaces according to their geometry. Our new algorithm uses a hybrid approach of both sequential and structural features. The accuracy is evaluated on a hand-curated dataset of 416 interfaces. Our hybrid procedure achieves 83% precision and 95% recall, which improves the earlier sequence-based method by 5% on both terms. We classify virtually all domain interfaces of known structure, which results in nearly 6,000 distinct types of interfaces. In 40% of the cases, the interacting domain families associate in multiple orientations, suggesting that all the possible binding orientations need to be explored for modelling multidomain proteins and protein complexes. In general, hub proteins are shown to use distinct surface regions (multiple faces) for interactions with different partners. Our classification provides a convenient framework to query genuine gene fusion, which conserves binding orientation in both fused and separate forms. The result suggests that the binding orientations are not conserved in at least one-third of the gene fusion cases detected by a conventional sequence similarity search. We show that any evolutionary analysis on interfaces can be skewed by multiple binding orientations and multiple interaction partners. The taxonomic distribution of interface types suggests that ancient interfaces common to the three major kingdoms of life are enriched by symmetric homodimers. The classification results are online at http://www.scoppi.org

Digitale Hochschulschriften der LMU

Alternative Splicing and Protein Structure Evolution

Author: Birzele Fabian
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 27/01/2009

In recent years, the amount of available experimental data has risen dramatically across many areas of biology. For the first time, these data allow a detailed analysis of how cellular components such as genes and proteins function, how they are linked in cellular networks, and how they evolved. Bioinformatics in particular plays an important role in preparing these data and interpreting them biologically. This doctoral thesis examines two important areas of current bioinformatics research: the analysis of protein structure evolution and of similarities between protein structures, and the analysis of alternative splicing, an integral process in eukaryotic cells that contributes to functional diversity. In particular, this work introduces the idea of a combined analysis of the two mechanisms (structure evolution and splicing). We show that a combined view yields new insights into how structure evolution and alternative splicing, as well as a coupling of the two mechanisms, contribute to functional and structural complexity in higher organisms. The methods, hypotheses, and results presented in this thesis can contribute to our understanding of how structure evolution and alternative splicing operate in the emergence of complex organisms, so that these two traditionally separate areas of bioinformatics can benefit from each other in the future.

ProteinDBS v2.0: a web server for global and local protein structure search

Author: B. Pang
C.-R. Shyu
Carpentier
D. Korkin
D. Xu
Friedberg
Holm
Hvidsten
Martin
Murzin
N. Zhao
Ortiz
P.-H. Chi
Shatsky
Shindyalov
Shyu
S ding
Yang
Zarembinski
Publication venue: Oxford University Press

ProteinDBS v2.0 is a web server designed for efficient and accurate comparisons and searches of structurally similar proteins from a large-scale database. It provides two comparison methods, global-to-global and local-to-local, to facilitate searches of protein structures or substructures. ProteinDBS v2.0 applies advanced feature extraction algorithms and scalable indexing techniques to achieve a high running speed while preserving reasonably high precision of structural comparison. The experimental results show that our system is able to return results of global comparisons in seconds from a complete Protein Data Bank (PDB) database of 152 959 protein chains, and that it takes much less time than other accurate comparison methods to complete local comparisons from a non-redundant database of 3276 proteins. ProteinDBS v2.0 supports query by PDB protein ID and by new structures uploaded by users. To our knowledge, this is the only search engine that can simultaneously support global and local comparisons. ProteinDBS v2.0 is a useful tool for investigating functional or evolutionary relationships among proteins. Moreover, the common substructures identified by local comparison can potentially be used to assist the human curation process in discovering new domains or folds from the ever-growing protein structure databases. The system is hosted at http://ProteinDBS.rnet.missouri.edu

Digital Repository @ Iowa State University (ISU)

A study of carboxylic ester hydrolases: structural classification, properties, and database

Author: Chen Yingfei
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2015

The carboxylic ester hydrolases (CEHs) are enzymes that hydrolyze an ester bond to form a carboxylic acid and an alcohol. They are among the enzyme groups most explored industrially, with applications in the food, flavor, pharmaceutical, organic synthesis, and detergent industries. We classified CEHs into families and clans according to their amino acid sequences (primary structures) and three-dimensional structures (tertiary structures). Our work has established the systematic structural classification of the CEHs. Primary structures of family members are similar to each other, and their active sites and reaction mechanisms are conserved. The tertiary structures of members of each clan, which is composed of different families, remain very similar, although the amino acid sequences of members of different families are not similar. CEHs were divided into 127 families by use of BLAST, with 67 families being grouped into seven clans. Multiple sequence alignment and tertiary structure superposition were used, and active sites and reaction mechanisms were analyzed. Python and Shell scripts were implemented to automate the process of comparing CEH primary and tertiary structures. A comprehensive database, CASTLE (CArboxylic eSTer hydroLasEs), may be constructed to provide the primary and tertiary structures of CEHs. This database would be available at www.castle.enzyme.iastate.edu and would be accessible to the entire biology community.

Beauty Is in the Eye of the Beholder: Proteins Can Recognize Binding Sites of Homologous Proteins in More than One Way

Understanding the mechanisms of protein–protein interaction is a fundamental problem with many practical applications. The fact that different proteins can bind similar partners suggests that convergently evolved binding interfaces are reused in different complexes. A set of protein complexes composed of non-homologous domains interacting with homologous partners at equivalent binding sites was collected in 2006, offering an opportunity to investigate this point. We considered 433 pairs of protein–protein complexes from the ABAC database (AB and AC binary protein complexes sharing a homologous partner A) and analyzed the extent of physico-chemical similarity at the atomic and residue level at the protein–protein interface. Homologous partners of the complexes were superimposed using Multiprot, and similar atoms at the interface were quantified using a five class grouping scheme and a distance cut-off. We found that the number of interfacial atoms with similar properties is systematically lower in the non-homologous proteins than in the homologous ones. We assessed the significance of the similarity by bootstrapping the atomic properties at the interfaces. We found that the similarity of binding sites is very significant between homologous proteins, as expected, but generally insignificant between the non-homologous proteins that bind to homologous partners. Furthermore, evolutionarily conserved residues are not colocalized within the binding sites of non-homologous proteins. We could only identify a limited number of cases of structural mimicry at the interface, suggesting that this property is less generic than previously thought. Our results support the hypothesis that different proteins can interact with similar partners using alternate strategies, but do not support convergent evolution
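One possible reading of the similarity count and its significance test is sketched below. It is an illustration under stated assumptions, not the procedure used in the study. The two binding sites are assumed to be already superimposed; an atom counts as similar when the other site has an atom of the same class within a distance cut-off; and the null distribution is built by redrawing the class labels of one site with replacement. The class names, coordinates and the functions `similar_atoms` and `resampling_p_value` are hypothetical.

```python
import math
import random


def similar_atoms(site_a, site_b, cutoff):
    """Count atoms of site_a that have a same-class atom of site_b within cutoff."""
    count = 0
    for pos_a, cls_a in site_a:
        if any(cls_a == cls_b and math.dist(pos_a, pos_b) <= cutoff
               for pos_b, cls_b in site_b):
            count += 1
    return count


def resampling_p_value(site_a, site_b, cutoff, n_rounds=1000, seed=0):
    """Estimate how often resampled class labels match as well as the observed ones.

    Positions of site_b are kept fixed; its class labels are redrawn with
    replacement in each round (an assumed null model).
    """
    rng = random.Random(seed)
    observed = similar_atoms(site_a, site_b, cutoff)
    labels_b = [cls for _, cls in site_b]
    hits = 0
    for _ in range(n_rounds):
        redrawn = rng.choices(labels_b, k=len(site_b))
        resampled_b = [(pos, cls) for (pos, _), cls in zip(site_b, redrawn)]
        if similar_atoms(site_a, resampled_b, cutoff) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_rounds + 1)


# Toy interfaces with made-up coordinates and placeholder class labels.
site_a = [((0.0, 0.0, 0.0), "donor"),
          ((1.5, 0.0, 0.0), "aromatic"),
          ((0.0, 3.0, 0.0), "hydrophobic")]
site_b = [((0.2, 0.0, 0.0), "donor"),
          ((1.6, 0.1, 0.0), "aromatic"),
          ((0.0, 3.2, 0.0), "hydrophobic")]
observed, p_value = resampling_p_value(site_a, site_b, cutoff=1.0)
print(observed, p_value)
```

A small p-value would indicate that the observed number of matching atoms is unlikely under random label assignment, which is the sense in which a binding-site similarity could be called significant.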

CiteSeerX

Public Library of Science (PLOS)