
4 research outputs found

Linear-time protein 3-D structure searching with insertions and deletions

Author: ACR Martin
AI Jewett
B Zhu
C Gergely
CH Chionh
D Bu
D Goldman
DG Corneil
DW Eggert
E Krissinel
F Zu-Kang
G Navarro
GH Golub
H Hasegawa
HA Kramers
HM Berman
I Eidhammer
IN Shindyalov
Jesper Jansson
JT Schwartz
KS Arun
Kunihiko Sadakane
L Holm
M Comin
M Shatsky
P Koehl
PG de Gennes
PJ Flory
RH Boyd
RH Lathrop
T Shibuya
T Shibuya
Tetsuo Shibuya
W Kabsch
W Kabsch
WR Taylor
Z Aung
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. Results: We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult, but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D when the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm^(k+1)). Conclusions: Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.
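
The similarity criterion in the abstract above (RMSD at or below a fixed bound) can be illustrated with a short sketch. This is not the paper's linear-time algorithm and it ignores indels entirely: it is the naive baseline that slides a window of the query's length over a database chain and computes, for each window, the minimum RMSD over rigid motions using the standard SVD-based (Kabsch) superposition. The function names `kabsch_rmsd` and `find_similar_substructures` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Minimum RMSD between two (n, 3) coordinate arrays over all
    rotations and translations (Kabsch superposition)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (n, 3)")
    # Remove the translation by centring both point sets.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 3x3 covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc  # rotate P onto Q and compare
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def find_similar_substructures(database, query, bound):
    """Start indices of all length-m windows of `database` whose
    RMSD to `query` is at most `bound` (no indels considered)."""
    database = np.asarray(database, dtype=float)
    query = np.asarray(query, dtype=float)
    m = len(query)
    return [
        i
        for i in range(len(database) - m + 1)
        if kabsch_rmsd(database[i:i + m], query) <= bound
    ]
```

Each window costs O(m) work, so this baseline takes O(Nm) time per query even before indels are allowed, which is the kind of cost the average-case O(N) result is measured against.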

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TS-AMIR: a topology string alignment method for intensive rapid protein structure comparison

Author: A Bogan-Marta
ACR Martin
AP Singh
AR Ortiz
B Kolbeck
CH Chionh
CH Tung
E Krissinel
F Guyon
G Mayr
I Budowski-Tal
I Shindyalov
J Razmara
Jafar Razmara
JF Gibrat
L Holm
L Liao
M Carpentier
M Novotny
ML Sierk
R Kolodny
RA Bauer
S Salem
Safaai Deris
Sepideh Parvizpour
SF Altschul
T Kawabata
T Shibuya
W Kabsch
W Kabsch
WC Lo
WR Pearson
Y Zhang
Y Zhang
Z Aung
Publication venue: BioMed Central
Publication date: 01/01/2012

Abstract. Background: In structural biology, similarity analysis of protein structure is a crucial step in studying the relationship between proteins. Despite the considerable number of techniques that have been explored within the past two decades, the development of new alternative methods is still an active research area due to the need for high performance tools. Results: In this paper, we present TS-AMIR, a Topology String Alignment Method for Intensive Rapid comparison of protein structures. The proposed method works in two stages: In the first stage, the method generates a topology string based on the geometric details of secondary structure elements, and then utilizes an n-gram modelling technique over entropy concept to capture similarities in these strings. This initial correspondence map between secondary structure elements is submitted to the second stage in order to obtain the alignment at the residue level. Applying the Kabsch method, a heuristic step-by-step algorithm is adopted in the second stage to align the residues, resulting in an optimal rotation matrix and minimized RMSD. The performance of the method was assessed in different information retrieval tests and the results were compared with those of CE and TM-align, representing two geometrical tools, and YAKUSA, 3D-BLAST and SARST as three representatives of linear encoding schemes. It is shown that the method obtains a high running speed similar to that of the linear encoding schemes. In addition, the method runs about 800 and 7200 times faster than TM-align and CE respectively, while maintaining a competitive accuracy with TM-align and CE. Conclusions: The experimental results demonstrate that linear encoding techniques are capable of reaching the same high degree of accuracy as that achieved by geometrical methods, while generally running hundreds of times faster than conventional programs.
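
To give a feel for the first stage described above, here is a generic sketch of comparing two strings by their n-gram statistics with an entropy-based measure. The abstract does not specify TS-AMIR's topology alphabet or its scoring function, so everything here is an assumption: the one-letter alphabet (`H` for helix, `E` for strand), the function names, and the use of Jensen-Shannon divergence as the entropy-based comparison are stand-ins chosen for illustration, not the published method.

```python
from collections import Counter
from math import log2


def ngram_distribution(s, n):
    """Relative frequencies of the overlapping n-grams of string s."""
    if len(s) < n:
        raise ValueError("string is shorter than n")
    counts = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two distributions,
    given as dicts; the value lies between 0 and 1."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        # Kullback-Leibler divergence from a to the mixture.
        return sum(v * log2(v / mix[k]) for k, v in a.items() if v > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def ngram_similarity(a, b, n=2):
    """Similarity score: 1 for identical n-gram profiles, 0 for
    profiles that share no n-gram."""
    return 1.0 - js_divergence(ngram_distribution(a, n),
                               ngram_distribution(b, n))
```

For example, `ngram_similarity("HHEEHH", "HHEEHH")` returns `1.0`, since the two n-gram profiles coincide. Because a comparison like this only touches short strings rather than full coordinate sets, it is cheap enough to use as a filter before a residue-level superposition step.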

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Linear-Time Protein 3-D Structure Searching with Insertions and Deletions

Author: D.G. Corneil
D.W. Eggert
G. Navarro
H.A. Kramers
H.M. Berman
I. Eidhammer
J. Dayantis
J.T. Schwartz
J.W. Cooley
K.S. Arun
O. Kallenberg
P. Koehl
P.J. Flory
R.H. Boyd
T. Shibuya
T. Shibuya
W. Kabsch
W. Kabsch
Z. Aung
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Crossref

Sadakane K: Linear-time protein 3-D structure searching with insertions and deletions. Algorithms for Molecular Biology 2010

Author: Jesper Jansson
Kunihiko Sadakane
Tetsuo Shibuya
Publication date: 24/04/2020

Abstract. It becomes more and more important to search for similar structures from molecular 3-D structure databases in the structural biology of the post-genomic era. Two molecules are said to be similar if the RMSD (root mean square deviation) of the two molecules is less than or equal to some given constant bound. In this paper, we consider an important, fundamental problem of finding all the similar substructures from 3-D structure databases of chain molecules (such as proteins), with consideration of indels (i.e., insertions and deletions). The problem has been believed to be very difficult, but its computational difficulty has not been well known. In this paper, we first show that the same problem in arbitrary dimension is NP-hard. Moreover, we also propose a new algorithm that dramatically improves the average-case time complexity for the problem, in case the number of indels k is bounded by some constant. Our algorithm solves the above problem in average O(N) time, while the time complexity of the best known algorithm was O(Nm^(k+1)), for a query of size m and a database of size N.

CiteSeerX