Search CORE

8 research outputs found

A simple and fast heuristic for protein structure comparison

Author: A Caprara
A May
A Murzin
A Zemla
B Thiruv
D Barthel
D Fischer
D Goldman
D Pelta
D Zhi
David A Pelta
DM Strickland
G Lancia
H Liisa
I Eidhammer
I Shindyalov
J Leluk
Juan R González
L Chew
L Holm
Marcos Moreno Vega
N Krasnogor
N Leibowitz
P Bourne
P Hansen
P Koehl
R Development Core Team
RA Laskowski
W Taylor
W Xie
Z Aung
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Protein structure comparison is a key problem in bioinformatics. Several methods exist for comparing protein structures; one of them is solving the Maximum Contact Map Overlap problem (MAX-CMO). Although this problem can be solved with exact algorithms, researchers need approximate algorithms that obtain good-quality solutions using fewer computational resources than exact ones.
Results: We propose a variable neighborhood search metaheuristic for solving MAX-CMO. We analyze this strategy in two respects: 1) from an optimization point of view, the strategy is tested on two different datasets, obtaining errors of 3.5% (over 2702 pairs) and 1.7% (over 161 pairs) with respect to the optimal values, thus yielding highly accurate solutions in a simpler and less expensive way than exact algorithms; 2) in terms of protein structure classification, we conduct experiments on three datasets and show that it is feasible to detect structural similarities at SCOP's family and CATH's architecture levels using normalized overlap values. Some limitations, and the role of normalization, are outlined for classification at SCOP's fold level.
Conclusion: We designed, implemented and tested a new tool for solving MAX-CMO, based on a well-known metaheuristic technique. Its good balance between solution quality and computational effort makes it a valuable tool. Moreover, to the best of our knowledge, this is the first time the MAX-CMO measure has been tested at SCOP's fold and CATH's architecture levels, with encouraging results. Software is available for download at http://modo.ugr.es/jrgonzalez/msvns4maxcmo.
This work is supported by Projects HeuriCosc TIN2005-08404-C04-01 and HeuriCode TIN2005-08404-C04-03, both from the Spanish Ministry of Education and Science. JRG acknowledges financial support from Project TIC2002-04242-C03-02. The authors thank N. Krasnogor and the ProCKSi project (BB/C511764/1) for their support.
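The MAX-CMO objective can be made concrete with a small sketch. Assumptions (not taken from the paper): a contact map is a set of residue-index pairs (i, j) with i < j, an alignment is a dict from residues of the first protein to residues of the second that must be one-to-one and order-preserving, and the "normalized" score divides by the smaller contact count, which is only one plausible choice and not necessarily the paper's. All function names are hypothetical. The exhaustive search is feasible only for toy inputs; it is exactly the cost that the paper's variable neighborhood search metaheuristic is meant to avoid.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]  # residue indices (i, j) with i < j


def is_valid_alignment(mapping: Dict[int, int]) -> bool:
    """True if the mapping is one-to-one and order-preserving (non-crossing)."""
    targets = [j for _, j in sorted(mapping.items())]
    # Strictly increasing targets imply both injectivity and no crossings.
    return all(a < b for a, b in zip(targets, targets[1:]))


def contact_overlap(c1: Set[Contact], c2: Set[Contact], mapping: Dict[int, int]) -> int:
    """Count contacts of protein 1 whose aligned residue pair is also a contact in protein 2."""
    if not is_valid_alignment(mapping):
        raise ValueError("alignment must be one-to-one and order-preserving")
    shared = 0
    for i, j in c1:
        if i in mapping and j in mapping:
            # Order preservation keeps mapping[i] < mapping[j] whenever i < j.
            if (mapping[i], mapping[j]) in c2:
                shared += 1
    return shared


def normalized_overlap(c1: Set[Contact], c2: Set[Contact], mapping: Dict[int, int]) -> float:
    """Hypothetical normalization: overlap divided by the smaller contact count."""
    denom = min(len(c1), len(c2))
    return contact_overlap(c1, c2, mapping) / denom if denom else 0.0


def max_cmo_bruteforce(c1: Set[Contact], c2: Set[Contact], n1: int, n2: int):
    """Exhaustive MAX-CMO over all order-preserving alignments (toy sizes only)."""
    best, best_map = 0, {}
    for k in range(min(n1, n2) + 1):
        for left in combinations(range(n1), k):
            for right in combinations(range(n2), k):
                m = dict(zip(left, right))
                score = contact_overlap(c1, c2, m)
                if score > best:
                    best, best_map = score, m
    return best, best_map


c1 = {(0, 3), (1, 4), (0, 4)}  # toy contact map, 5 residues
c2 = {(0, 2), (1, 3), (2, 3)}  # toy contact map, 4 residues
print(contact_overlap(c1, c2, {0: 0, 1: 1, 3: 2, 4: 3}))  # 2
print(max_cmo_bruteforce(c1, c2, 5, 4)[0])                # 2
```

The number of candidate alignments grows combinatorially with protein length. That growth is why approximate methods such as the one described above trade a small optimality gap for much lower running time.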

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositorio Institucional Universidad de Granada

Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

Author: A Andreeva
A Bairoch
A Bateman
A Kelil
AP Bradley
B Rost
B Thiruv
BE Blaisdell
CH Wu
D Barthel
EM Taylor
F Pearl
F Ronquist
G Didier
G Fichant
G Reinert
GW Stuart
J Felsenstein
J Lowe
J Soppa
JM Word
JP Egan
JP Huelsenbeck
K Komatsu
KP Wu
LP Chew
M Hirano
M Sierk
N Cobbe
N Krasnogor
N Saitoh
P Ferragina
Qi Dai
S Hochreiter
S Kumar
S Vinga
SF Altschul
TD Pham
Tianming Wang
TJ Wu
W Li
Y Fujioka
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract
Background: Many proposed statistical measures can efficiently compare protein sequences to infer protein structure, function and evolutionary information. They share the same idea of using k-word frequencies of protein sequences. Given a protein sequence, the information in its related protein sequences has not, until now, been used for protein sequence comparison. This paper proposes a scheme to construct a protein 'sequence space' associated with the protein sequences related to a given protein, and compares the performance of statistical measures with and without the information in this 'sequence space'. It also presents two statistical measures for proteins: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure).
Results: We tested statistical measures, with and without protein 'sequence space', on three data sets. This offers a systematic, quantitative experimental assessment of these statistical measures and complements the available comparisons of statistical measures based on protein sequence alone. We also compared our statistical measures with alignment-based measures and with existing statistical measures. The experiments were grouped into two sets. The first, performed via ROC (receiver operating characteristic) analysis, assesses the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second assesses how well our measure performs in phylogenetic analysis. From these experiments we draw several conclusions and derive new guidelines for the use of protein 'sequence space' and statistical measures.
Conclusion: Alignment-based measures have a clear advantage when the data are highly redundant. Among the statistical measures, the novel gsm.k introduced in this article is the most efficient, followed by cos.k. When the data become less redundant, our proposed gre.k performs better, while all the other measures perform poorly on classification tasks. Almost all the statistical measures improve when they exploit the information in 'sequence space' as the word length increases, especially for less redundant data. The reasonable results of the phylogenetic analysis confirm that Gdis.k based on 'sequence space' is a reliable measure for phylogenetic analysis. In summary, our quantitative analysis shows that exploiting the information in 'sequence space' is a promising way to improve the ability of statistical measures to compare proteins.
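The k-word idea shared by these measures can be illustrated with cos.k. The sketch below assumes cos.k is the cosine of the vectors of overlapping k-word counts, which is the usual alignment-free definition. Because cosine is scale-invariant, raw counts or relative frequencies give the same value. It does not reproduce the paper's gre.k, gsm.k, Gdis.k or its 'sequence space' construction, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kword_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-words (substrings of length k) in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cos_k(a: str, b: str, k: int) -> float:
    """Cosine similarity of the k-word count vectors of two sequences."""
    ca, cb = kword_counts(a, k), kword_counts(b, k)
    # Only k-words present in both sequences contribute to the dot product.
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


print(kword_counts("ACACA", 2))            # Counter({'AC': 2, 'CA': 2})
print(cos_k("ACDEFGH", "ACDEFGK", 2))      # roughly 0.83: 5 of 6 2-words shared
```

A similarity near 1 means the two sequences have nearly proportional k-word profiles. Distance-style measures are often derived from it, for example as 1 minus the cosine.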

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Tableau-based protein substructure search using quadratic programming

Author: A Abyzov
A Caprara
A Guerler
A Harrison
AG Murzin
Alex Stivala
AM Lesk
Anthony Wirth
AP Kamat
AP Singh
AS Konagurthu
B Kolbeck
B Thiruv
BK Koo
D Fischer
D Frishman
D Gilbert
DA Pelta
E Anderson
E Krissinel
GM Torrance
HK Ho
HM Berman
I Majumdar
J Jung
J Shapiro
JA Casbon
JA Hanley
JF Gibrat
JJ Dongarra
L Holm
ML Sierk
O Carugo
Peter J Stuckey
PR Elliott
S Kirillova
S Shi
SB Needleman
SS Krishna
T Hamelryck
T Madej
T Sing
TA Davis
V Sam
W Kabsch
W Xie
Y Ye
Z Gáspári
Z Li
Publication venue: BioMed Central
Publication date: 01/05/2009

Abstract
Background: Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database.
Results: We developed an improved method for detecting substructural similarities in proteins using tableaux. Tableaux are compared efficiently by solving the quadratic program (QP) corresponding to the quadratic integer program (QIP) formulation of the extraction of maximally-similar tableaux. We compare the accuracy of the method in classifying protein folds with that of some existing techniques.
Conclusion: We find that including constraints based on the separation of secondary structure elements increases the accuracy of protein structure search using maximally-similar subtableau extraction, to a level where its accuracy is comparable or superior to that of existing techniques. We demonstrate that our implementation is able to search a structural database in a matter of hours on a standard PC.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

Nh3D: A reference dataset of non-homologous protein structures

Author: Quon G
Saldanha SA
Steipe B
Thiruv B
Publication date: 27/03/2018

Abstract
Background: The statistical analysis of protein structures requires datasets in which structural features can be considered independently distributed, i.e. not related through common ancestry, and that fulfil minimal requirements regarding the experimental quality of the structures they contain. However, non-redundant datasets based on sequence similarity invariably contain distantly related homologues. Here we provide a reference dataset of non-homologous protein domains, assuming that structural dissimilarity at the topology level is incompatible with recognizable common ancestry. The dataset is based on domains at the Topology level of the CATH database, which hierarchically classifies all protein structures. It contains the best refined representatives of each Topology level, validates structural dissimilarity and removes internally duplicated fragments. The compilation of Nh3D is fully scripted.
Results: The current Nh3D list contains 570 domains with a total of 90780 residues. It covers more than 70% of folds at the Topology level of the CATH database and represents more than 90% of the structures in the PDB that have been classified by CATH. We observe that even though all protein pairs are structurally dissimilar, some pairwise sequence identities after global alignment are greater than 30%.
Conclusion: Nh3D is freely available as a reference dataset for the statistical analysis of sequence and structure features of proteins in the PDB. Regularly updated versions of Nh3D and the corresponding PDB-formatted coordinate sets are accessible from our Web site http://www.schematikon.org

University of Toronto Research Repository

schematikon: Detailed Sequence-Structure Relationships from Mining a Non-redundant Protein Structure Database

Author: B. Steipe
B. Thiruv
C. Bystroff
C.A. Orengo
D. Frishman
D.S. Marks
E.F. Pettersen
J.S. Richardson
S. Brin
T.J. Lane
W. Kabsch
X. Liu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Crossref

Minimotif Miner, a tool for investigating protein function

Author: A Bairoch
A Bateman
A Kloczkowski
A Kreegipuu
B Thiruv
Bryan Piccirillo
Chun-Hsi Huang
DL Wheeler
G Tzivion
H Naderi-Manesh
Jacob J del Campo
JC Obenauer
Jessica H Shinn
JJ delCampo
KD Pruitt
Mark W Maciejewski
Martin R Schiller
Michael R Gryk
N Blom
P Puntervoll
Sanguthevar Rajasekaran
Snigdha Verma
Stanley R Schiller
Sudha Balla
Tanaz Faghri
ThaiBinh Luong
Vishal Thapar
WA Mohler
William A Mohler
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2006

In addition to large domains, many short motifs mediate functional post-translational modification of proteins, as well as protein-protein interactions and protein trafficking functions. We have constructed a motif database comprising 312 unique motifs and a web-based tool, Minimotif Miner (MnM), for identifying motifs in proteins. Functional motifs predicted by MnM can be ranked by several approaches, and we validated these scores by analyzing thousands of confirmed examples and by confirming the prediction of previously unidentified 14-3-3 motifs in EFF-1.

Crossref

University of Nevada, Las Vegas Repository

Arbitrary protein-protein docking targets biologically relevant interfaces

Abstract
Background: Protein-protein recognition is of fundamental importance in the vast majority of biological processes. However, it has already been shown that it is very hard to distinguish true complexes from false ones in so-called cross-docking experiments, in which binary protein complexes are separated and the isolated proteins are all docked against each other and scored. Does this result, at least in part, reflect a physical reality? False complexes could reflect possible nonspecific or weak associations.
Results: In this paper, we investigate the twilight zone of protein-protein interactions, building on an interesting outcome of cross-docking experiments: false complexes seem to favor residues from the true interaction site, suggesting that randomly chosen partners dock in a non-random fashion on protein surfaces. Here, we carry out arbitrary docking of a non-redundant data set of 198 proteins with more than 300 randomly chosen "probe" proteins. We investigate the tendency of arbitrary partners to aggregate at localized regions of the protein surfaces, the shape and compositional bias of the generated interfaces, and the potential of this property to predict biologically relevant binding sites. We show that the non-random localization of arbitrary partners after protein-protein docking is a generic feature of protein structures. The interfaces generated in this way are not systematically planar or curved, but tend to be closer than average to the center of the proteins. These results can be used to predict biological interfaces with an AUC of up to 0.69 on their own, and 0.72 when combined with evolutionary information. An appropriate choice of random partners and of the number of docking models makes this method computationally practical. We also note that nonspecific interfaces can point to alternative interaction sites in the case of proteins with multiple interfaces. We illustrate the usefulness of arbitrary docking using PEBP (phosphatidylethanolamine-binding protein), a kinase inhibitor with multiple partners.
Conclusions: An approach using arbitrary docking, and based solely on physical properties, can successfully identify biologically pertinent protein interfaces.
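The AUC values quoted above summarize a ROC analysis of interface prediction. As a hedged illustration, and not the authors' evaluation code, the sketch below computes ROC AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The per-residue scores and labels are hypothetical: for example, a score could be the fraction of arbitrary-partner docking models that contact a residue, and the label could be 1 if that residue belongs to a known interface.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue scores (e.g. fraction of arbitrary-partner docking
# models contacting each residue) and labels (1 = residue in a known interface).
scores = [0.9, 0.2, 0.6, 0.4]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))  # 0.5
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of residues, which is acceptable for a single protein, while a rank-based formulation scales better for large datasets.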

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Polymer Uncrossing and Knotting in Protein Folding, and Their Role in Minimal Folding Pathways

Author: A Das
A Ferguson
A Mallam
A Nordlund
A van der Vaart
A van Roon
AD Schuyler
AL Mallam
Ali R. Mohazab
AM Gutin
AR Fersht
AR Mohazab
AY Istomin
B Oztop
B Thiruv
BA Shoemaker
BG Wensley
C Bodenreider
C Clementi
C Dellago
C Kayatekin
CC Adams
CD Snow
CJ Cerjan
CM Dobson
D Baker
D Branduardi
D Bölinger
DJ Wales
DK Klimov
DN Ivankov
DR Flower
DW Farrell
EA Coutsias
F Ding
F Khatib
G Favrin
G Hummer
G Koczyk
G Kolesov
GR Kneller
H Maity
H Nymeyer
H Yang
H Zhou
HS Chan
HS Chung
I Byeon
J Banavar
J Martinez
J Shea
JE Shea
JI Sułkowska
JK Noel
JN Onuchic
K Koniaris
K Lindorff-Larsen
KBZ Stefan Wallin
KV Andersen
KW Plaxco
L Maragliano
LL Chavez
M Gromiha
M Lal
M Lindberg
M Oliveberg
MC Prentiss
MK Kim
ML Connolly
ML Mansfield
MM Gromiha
MR Ejtehadi
N Madras
NP King
OV Galzitskaya
P Maragakis
P Virnau
P Weinkam
PG Bolhuis
PG Wolynes
R Du
R Elber
R Potestio
RB Best
RC Lua
S Bell
S Cavagnero
S Fischer
S Gianni
S Koyama
S Wells
SK Nechaev
SS Cho
SS Plotkin
Steven S. Plotkin
SW Englander
T Ternström
TR Sosnick
TR Weikl
TS Norcross
V Daggett
VI Abkevich
VS Pande
W Kabsch
WA Eaton
WG Krebs
WR Taylor
Y Zhang
Y Zhou
Yaakov Koby Levy
Publication venue: Public Library of Science (PLoS)

Crossref