UCL Discovery

Public Library of Science (PLOS)

A Mathematical Framework for Protein Structure Comparison

Author: A Srivastava
A Zemla
AG Murzin
Anuj Srivastava
AR Ortiz
AS Konagurthu
B Kolbeck
C Berbalk
CA Orengo
DL Theobald
E Klassen
E Krissinel
F Teichert
G Mayr
H Hasegawa
HM Berman
IN Shindyalov
J Dundas
J Ebert
J Zhang
J Zhu
JF Gibrat
Jinfeng Zhang
K Illergard
L Holm
L Lo Conte
M Levitt
M Menke
M Shatsky
MJ Sippl
N Furnham
O Dror
P Koehl
PD Dobson
QS Du
R Kolodny
R Mosca
Roland L. Dunbrack
S Kurtek
SH Joshi
SR Eddy
VA Ilyin
W Mio
Wei Liu
WR Taylor
X Zhou
Y Ye
Y Zhang
YJ Huang
Publication venue: Public Library of Science
Publication date: 03/02/2011

Comparison of protein structures is important for revealing evolutionary relationships among proteins, predicting protein functions, and predicting protein structures. Many methods have been developed to align two or more protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue is that, although many different distances are used to measure the similarity between protein structures, none of them is a proper distance when structures of different sequences are compared. Statistical approaches that treat those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, a geodesic distance, which is a proper distance on spaces of curves, can be computed for any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariances can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal population-specific variations and help improve structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set.
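To make the idea of a proper distance between curves concrete, here is a minimal sketch. It assumes the square-root velocity representation that is common in elastic shape analysis; the abstract does not name the representation, so this is an illustrative assumption and not the authors' implementation. The function names (`srvf`, `unit_norm`, `preshape_distance`) are hypothetical. The sketch removes translation and scale only; a full elastic distance would also optimize over rotations and reparametrizations, which is omitted here.

```python
import math

def srvf(points):
    """Square-root velocity representation of a discrete 3-D curve.

    Each segment velocity v is mapped to v / sqrt(|v|); zero-length
    segments map to the zero vector.
    """
    q = []
    for a, b in zip(points, points[1:]):
        v = [bj - aj for aj, bj in zip(a, b)]
        speed = math.sqrt(sum(c * c for c in v))
        scale = 1.0 / math.sqrt(speed) if speed > 0 else 0.0
        q.append([c * scale for c in v])
    return q

def unit_norm(q):
    """Rescale q to unit L2 norm, which removes the overall curve length."""
    norm = math.sqrt(sum(c * c for vec in q for c in vec))
    return [[c / norm for c in vec] for vec in q]

def preshape_distance(curve_a, curve_b):
    """Arc length on the unit sphere between two normalized representations.

    Assumes both curves have the same number of points and nonzero length.
    Rotation and reparametrization are NOT optimized in this sketch.
    """
    qa, qb = unit_norm(srvf(curve_a)), unit_norm(srvf(curve_b))
    inner = sum(x * y for va, vb in zip(qa, qb) for x, y in zip(va, vb))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, inner)))
```

Because the value is an arc length on a sphere, it is symmetric and satisfies the triangle inequality, which is the property the abstract asks of a distance. For example, a straight segment along x and one along y are a quarter circle apart under this sketch.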

Linear-time protein 3-D structure searching with insertions and deletions

Author: ACR Martin
AI Jewett
B Zhu
C Gergely
CH Chionh
D Bu
D Goldman
DG Corneil
DW Eggert
E Krissinel
F Zu-Kang
G Navarro
GH Golub
H Hasegawa
HA Kramers
HM Berman
I Eidhammer
IN Shindyalov
Jesper Jansson
JT Schwartz
KS Arun
Kunihiko Sadakane
L Holm
M Comin
M Shatsky
P Koehl
PG de Gennes
PJ Flory
RH Boyd
RH Lathrop
T Shibuya
Tetsuo Shibuya
W Kabsch
WR Taylor
Z Aung
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. Results: We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D in case the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm^(k+1)). Conclusions: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.
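The search problem stated above can be illustrated with a naive baseline. The sketch below is not the linear-time algorithm from the paper: it handles no indels, it aligns centroids only (the optimal rotation is deliberately left out to keep the example dependency-free), and it costs O(Nm) time. The function names are hypothetical.

```python
import math

def centered(coords):
    """Translate a list of 3-D points so that its centroid is the origin."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in coords]

def rmsd(a, b):
    """RMSD between equal-length coordinate lists after centroid alignment.

    Assumption: rotation is NOT optimized, so this is an upper bound on
    the superposition-minimized RMSD.
    """
    a, b = centered(a), centered(b)
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))

def naive_search(chain, query, bound):
    """Start indices of all length-m windows of chain within bound of query."""
    m = len(query)
    return [i for i in range(len(chain) - m + 1)
            if rmsd(chain[i:i + m], query) <= bound]
```

A small usage example: with a three-point straight query and a chain that runs straight and then turns, only the straight windows fall under a tight bound.

```python
query = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (3, 1, 0), (3, 2, 0)]
hits = naive_search(chain, query, 0.1)  # windows starting at 0 and 1
```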

An enhanced partial order curve comparison algorithm and its application to analyzing protein folding trajectories

Author: A Murzin
AK Jain
C Grasso
C Guda
C Lee
C Levinthal
CA Orengo
D Lupyan
E Krissinel
E Sandelin
F Chiti
Hakan Ferhatosmanoglu
Hong Sun
IN Shindyalov
J Neidigh
JF Gibrat
JM Borreguero
K Kedem
L Holm
LP Chew
M Gerstein
M Ota
M Shatsky
ME Ochagavía
MJ Sutcliffe
Motonori Ota
NV Dokholyan
P Wolynes
R Du
R Koike
SB Needleman
SW Lockless
TF Smith
VI Abkevich
W Taylor
Y Caspi
Y Ye
Yusu Wang
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Understanding how proteins fold is essential to our quest to discover how life works at the molecular level. Current computational power enables researchers to produce a huge amount of folding simulation data. Hence there is a pressing need to interpret these data and identify novel folding features in them. Results: In this paper, we model each folding trajectory as a multi-dimensional curve. We then develop an effective multiple curve comparison (MCC) algorithm, called the enhanced partial order (EPO) algorithm, to extract features from a set of diverse folding trajectories, including both successful and unsuccessful simulation runs. The EPO algorithm addresses several new challenges presented by comparing high-dimensional curves coming from folding trajectories. A detailed case study on the miniprotein Trp-cage [1] demonstrates that our algorithm can detect similarities at a rather low level and extract biologically meaningful folding events. Conclusion: The EPO algorithm is general and applicable to a wide range of applications. We demonstrate its generality and effectiveness by applying it to aligning multiple protein structures with low similarities. For users' convenience, we provide a web server for the algorithm at http://db.cse.ohio-state.edu/EPO.

Representing and comparing protein structures as paths in three-dimensional space

Author: A Andreeva
A Aszodi
A Godzik
A Harrison
A Kolinski
A Zemla
Adam Godzik
AG Murzin
AP Yamniuk
AR Ortiz
CA Orengo
D Sankoff
Degui Zhi
E Krissinel
F Pearl
G Wang
GJ Kleywegt
H Gong
Haibo Cao
HM Berman
IJ Byeon
IN Shindyalov
K Mizuguchi
L Holm
L Jaroszewski
M Shatsky
MS Waterman
Pavel Pevzner
R Kolodny
S Dietmann
S Sri Krishna
SB Needleman
ST Rao
T Can
T Madej
TF Smith
W Kabsch
WR Taylor
Y Ye
Z Li
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Most existing formulations of protein structure comparison are based on detailed atomic-level descriptions of protein structures and bypass potential insights that arise from a higher-level abstraction. RESULTS: We propose a structure comparison approach based on a simplified representation of a protein that describes its three-dimensional path by local curvature along the generalized backbone of the polypeptide. We have implemented a dynamic programming procedure that aligns curvatures of proteins by optimizing a defined sum turning angle deviation measure. CONCLUSION: Although our procedure does not directly optimize global structural similarity as measured by RMSD, our benchmarking results indicate that it recovers surprisingly well the structural similarity defined by structure classification databases and traditional structure alignment programs. In addition, our program can recognize similarities between structures with extensive conformational changes that are beyond the ability of traditional structure alignment programs. We demonstrate applications of the procedure in several contexts of structure comparison. An implementation of our procedure, CURVE, is available as a public webserver.
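The combination of a turning-angle description and dynamic programming can be sketched as follows. This is a simplified illustration and not the CURVE program: it uses the plain angle between consecutive segments as the local descriptor, the absolute angle difference as the deviation, and a constant gap penalty. The gap value and the function names are assumptions made for the example.

```python
import math

def turning_angles(path):
    """Angle (radians) between consecutive segments at each interior point.

    Assumes no two consecutive points coincide (no zero-length segments).
    """
    angles = []
    for a, b, c in zip(path, path[1:], path[2:]):
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - b[i] for i in range(3)]
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        cos_t = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        angles.append(math.acos(max(-1.0, min(1.0, cos_t))))
    return angles

def align_cost(angles_a, angles_b, gap=1.0):
    """Global alignment cost: summed |angle difference| plus gap penalties."""
    n, m = len(angles_a), len(angles_b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + abs(angles_a[i - 1] - angles_b[j - 1]),
                cost[i - 1][j] + gap,   # gap in sequence b
                cost[i][j - 1] + gap,   # gap in sequence a
            )
    return cost[n][m]
```

Turning angles do not change under rotation or translation of the whole path, so no superposition step is needed before alignment. That is one reason a representation of this kind can tolerate large conformational changes that defeat rigid-body comparison.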

eScholarship - University of California

Insertions and the emergence of novel protein structure: a structure-based phylogenetic study of insertions

Author: A Sali
AR Panchenko
BG Hall
C Blouin
C Chothia
C Notredame
Christian Blouin
D Frishman
EI Petersen
EV Koonin
GP Karev
Haiyan Jiang
I Van Walle
IN Shindyalov
J Casbon
J Felsentein
JM Chandonia
K Mizuguchi
L Aravind
L Holm
L Ribas De Pouplana
M Clamp
M Heinig
M Shatsky
MB Eisen
N Saitou
NV Dokholyan
NV Grishin
O O'Sullivan
O Poirot
P O'Donoghue
R Breitling
R Development Core Team
RB Russell
S Balaji
S Guindon
S Pascarella
SA Benner
UG Wagner
W Humphrey
WL DeLano
WR Taylor
Y Wolf
Y Ye
ZY Zhu
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: In protein evolution, the mechanism by which novel protein domains emerge is still an open question. The incremental growth of protein variable regions, produced by stochastic insertions, has the potential to generate large and complex sub-structures. In this study, a deterministic methodology is proposed to reconstruct phylogenies from protein structures and to infer insertion events in protein evolution. The analysis was performed on a broad range of SCOP domain families. Results: Phylogenies were reconstructed from protein 3D structural data. The phylogenetic trees were used to infer ancestral structures with a consensus method. From these ancestral reconstructions, 42.7% of the observed insertions are nested insertions, which are located in previous insert regions. The average size of inserts tends to increase with the insert rank or total number of insertions in the variable regions. We found that the structures of some nested inserts show complex or even domain-like fold patterns with helices, strands and loops. Furthermore, a basal level of structural innovation was found in inserts which displayed a significant structural similarity exclusively to themselves. The β-Lactamase/D-ala carboxypeptidase domain family is provided as an example to illustrate the inference of insertion events, and how the incremental growth of a variable region can generate novel structural patterns. Conclusion: Using 3D data, we proposed a method to reconstruct phylogenies. We applied the method to reconstruct the sequences of insertion events leading to the emergence of potentially novel structural elements within existing protein domains. The results suggest that structural innovation is possible via the stochastic process of insertions and rapid evolution within variable regions where inserts tend to be nested. We also demonstrate that the structure-based phylogeny enables the study of new questions relating to the evolution of protein domains and biological function.

Public Library of Science (PLOS)

Cross-Over between Discrete and Continuous Protein Structure Space: Insights into Automatic Classification and Networks of Protein Structures

Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. 
As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.ph
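The transitive property that the abstract relies on can be checked directly on a set of items with a pairwise similarity test. The sketch below counts triples in which two pairs are similar but the third is not. It is a hypothetical illustration of the idea only; it is not the violation measure the authors propose, which is defined with respect to a clustering procedure. The function name and the `similar` callback are assumptions.

```python
from itertools import combinations

def transitivity_violations(items, similar):
    """Count unordered triples where exactly two of the three pairs are similar.

    `similar(x, y)` is assumed to be a symmetric boolean predicate. With
    exactly two similar pairs, similarity of A with B and B with C fails
    to imply similarity of A with C for some labeling of the triple.
    """
    violations = 0
    for a, b, c in combinations(items, 3):
        hits = int(similar(a, b)) + int(similar(b, c)) + int(similar(a, c))
        if hits == 2:
            violations += 1
    return violations
```

For instance, with points on a line and similarity defined as a distance of at most 1, the values 0, 1 and 2 form one violating triple, since 0 and 2 are each similar to 1 but not to each other. Loosening the threshold to 2 removes the violation.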

Secretaría de Estado de Cultura

Digital.CSIC

A novel method to compare protein structures using local descriptors

Abstract. Background: Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships. Results: We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure - DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy). Conclusions: DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.

A cross-kingdom internal ribosome entry site reveals a simplified mode of internal ribosome entry

Author: Andreev DE
Belsham GJ
Dmitriev SE
Roberts LO
Royall E
Shatsky IN
Terenin IM
Publication venue: American Society for Microbiology
Publication date: 24/01/2020

University of Surrey

Messenger RNA Path Through the Procaryotic Ribosome

Author: A Evstafieva
AA Bogdanov
AE Dahlberg
AG Balakin
BS Cooperman
C Kang
D Hartz
D Moazed
EA Skripkin
EN Trifonov
H McKuskie-Olson
HM Olson
I Fiser
IN Shatsky
J Frank
J Heider
J Rinke-Appel
JA Steitz
JB Prince
LA Sylvers
M Stoffler-Melicke
MR Trempe
N Towbin
NE Broude
O Dontsova
OA Dontsova
P Melancon
P Wollenzien
R Bhangu
R Brimacombe
R Denman
S Vladimirov
W Tate
WE Tapprich
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/1993