
247 research outputs found

Selective prediction of interaction sites in protein structures with THEMATICS

Author: Ko Jaeju
Murga Leonel F
Ondrechen Mary Jo
Wei Ying
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Methods are now available for the prediction of interaction sites in protein 3D structures. While many of these methods report high success rates for site prediction, often these predictions are not very selective and have low precision. Precision in site prediction is addressed using Theoretical Microscopic Titration Curves (THEMATICS), a simple computational method for the identification of active sites in enzymes. Recall and precision are measured and compared with other methods for the prediction of catalytic sites. Results: Using a test set of 169 enzymes from the original Catalytic Residue Dataset (CatRes) it is shown that THEMATICS can deliver precise, localised site predictions. Furthermore, adjustment of the cut-off criteria can improve the recall rates for catalytic residues with only a small sacrifice in precision. Recall rates for CatRes/CSA annotated catalytic residues are 41.1%, 50.4%, and 54.2% for Z score cut-off values of 1.00, 0.99, and 0.98, respectively. The corresponding precision rates are 19.4%, 17.9%, and 16.4%. The success rate for catalytic sites is higher, with correct or partially correct predictions for 77.5%, 85.8%, and 88.2% of the enzymes in the test set, corresponding to the same respective Z score cut-offs, if only the CatRes annotations are used as the reference set. Incorporation of additional literature annotations into the reference set gives total success rates of 89.9%, 92.9%, and 94.1%, again for corresponding cut-off values of 1.00, 0.99, and 0.98. False positive rates for a 75-protein test set are 1.95%, 2.60%, and 3.12% for Z score cut-offs of 1.00, 0.99, and 0.98, respectively. Conclusion: With a preferred cut-off value of 0.99, THEMATICS achieves a high success rate of interaction site prediction, about 86% correct or partially correct using CatRes/CSA annotations only and about 93% with an expanded reference set. Success rates for catalytic residue prediction are similar to those of other structure-based methods, but with substantially better precision and lower false positive rates. THEMATICS performs well across the spectrum of E.C. classes. The method requires only the structure of the query protein as input. THEMATICS predictions may be obtained via the web from structures in PDB format at: http://pfweb.chem.neu.edu/thematics/submit.html
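
The recall and precision figures quoted in this abstract can be tabulated to make the cut-off trade-off easier to see. The Python sketch below does only that, and additionally computes an F1 score (the harmonic mean of precision and recall). F1 is not reported in the abstract; it is an assumption used here purely as one possible way to summarise the trade-off.

```python
# Recall and precision (in %) as quoted in the abstract, keyed by Z score cut-off.
reported = {
    1.00: {"recall": 41.1, "precision": 19.4},
    0.99: {"recall": 50.4, "precision": 17.9},
    0.98: {"recall": 54.2, "precision": 16.4},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (an assumed summary metric)."""
    return 2 * precision * recall / (precision + recall)


# F1 (in %) for each cut-off, computed from the quoted figures only.
summary = {cutoff: f1(v["precision"], v["recall"]) for cutoff, v in reported.items()}

for cutoff, score in sorted(summary.items(), reverse=True):
    print(f"Z cut-off {cutoff:.2f}: F1 = {score:.2f}%")
```

On these figures the three cut-offs differ by little more than one point of F1, with 0.99 marginally highest.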

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Partial Order Optimum Likelihood (POOL): Maximum Likelihood Prediction of Protein Active Site Residues Using 3D Structure and Sequence Properties

Author: A Gutteridge
A Shulman-Peleg
AH Elcock
AP Bradley
C Enroth
CT Porter
D Ming
E Youn
F Glaser
F Wilcoxon
G Amitai
G Cheng
GJ Bartlett
J Ko
J Liang
JD Madura
L Xie
Leonel F. Murga
LF Murga
M Ota
M Silberstein
Mary Jo Ondrechen
Michael Levitt
MJ Best
MJ Ondrechen
MK Gilson
N Petrova
P Domingos
R Edgar
R Greaves
RA Laskowski
RA Laskowski
Ronald J. Williams
T Robertson
TA Binkowski
W Tong
W Tong
Wenxu Tong
Y Wei
Y Wei
Ying Wei
Publication venue: Public Library of Science
Publication date: 01/01/2009

A new monotonicity-constrained maximum likelihood approach, called Partial Order Optimum Likelihood (POOL), is presented and applied to the problem of functional site prediction in protein 3D structures, an important current challenge in genomics. The input consists of electrostatic and geometric properties derived from the 3D structure of the query protein alone. Sequence-based conservation information, where available, may also be incorporated. Electrostatics features from THEMATICS are combined with multidimensional isotonic regression to form maximum likelihood estimates of probabilities that specific residues belong to an active site. This allows likelihood ranking of all ionizable residues in a given protein based on THEMATICS features. The corresponding ROC curves and statistical significance tests demonstrate that this method outperforms prior THEMATICS-based methods, which in turn have been shown previously to outperform other 3D-structure-based methods for identifying active site residues. Then it is shown that the addition of one simple geometric property, the size rank of the cleft in which a given residue is contained, yields improved performance. Extension of the method to include predictions of non-ionizable residues is achieved through the introduction of environment variables. This extension results in even better performance than THEMATICS alone and constitutes to date the best functional site predictor based on 3D structure only, achieving nearly the same level of performance as methods that use both 3D structure and sequence alignment data. Finally, the method also easily incorporates such sequence alignment data, and when this information is included, the resulting method is shown to outperform the best current methods using any combination of sequence alignments and 3D structures. 
Included is an analysis demonstrating that when THEMATICS features, cleft size rank, and alignment-based conservation scores are used individually or in combination, THEMATICS features represent the single most important component of such classifiers.
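
To illustrate the monotonicity-constrained estimation this abstract describes, here is a minimal one-dimensional sketch using the pool-adjacent-violators algorithm. It is not the POOL method itself, which the abstract says uses multidimensional isotonic regression over several features; the function name and the toy labels are assumptions made for illustration.

```python
def isotonic_fit(labels):
    """Non-decreasing least-squares fit to `labels` (pool-adjacent-violators).

    `labels` must already be sorted by increasing feature value. For 0/1
    labels each fitted value is the fraction of positives in its pooled block,
    which can be read as a monotone probability estimate.
    """
    blocks = []  # each block is [sum of labels, number of items]
    for y in labels:
        blocks.append([float(y), 1])
        # Merge backwards while a block's mean exceeds the mean of the next one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Toy example: 1 marks an annotated active-site residue, 0 any other residue,
# listed in order of increasing (hypothetical) feature score.
estimates = isotonic_fit([0, 0, 1, 0, 1, 1])
print(estimates)
```

Residues could then be ranked by their fitted value, with ties inside a pooled block left unresolved in this simplified version.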

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Structure- and context-based analysis of the GxGYxYP family reveals a new putative class of glycoside hydrolase.

Author: Chang Yuanyuan
Eberhardt Ruth Y
Gilbert Harry J
Godzik Adam
Rigden Daniel J
Xu Qingping
Publication venue: eScholarship, University of California
Publication date: 01/06/2014

Background: Gut microbiome metagenomics has revealed many protein families and domains found largely or exclusively in that environment. Proteins containing the GxGYxYP domain are over-represented in the gut microbiota, and are found in Polysaccharide Utilization Loci in the gut symbiont Bacteroides thetaiotaomicron, suggesting their involvement in polysaccharide metabolism, but little else is known of the function of this domain. Results: Genomic context and domain architecture analyses support a role for the GxGYxYP domain in carbohydrate metabolism. Sparse occurrences in eukaryotes are the result of lateral gene transfer. The structure of the GxGYxYP domain-containing protein encoded by the BT2193 locus reveals two structural domains, the first composed of three divergent repeats with no recognisable homology to previously solved structures, the second a more familiar seven-stranded β/α barrel. Structure-based analyses including conservation mapping localise a presumed functional site to a cleft between the two domains of BT2193. Matching to a catalytic site template from a GH9 cellulase and other analyses point to a putative catalytic triad composed of Glu272, Asp331 and Asp333. Conclusions: We suggest that GxGYxYP-containing proteins constitute a novel glycoside hydrolase family of as yet unknown specificity.

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

E1DS: catalytic site prediction based on 1D signatures of concurrent conservation

Author: Altschul
Bartlett
C.-M. Hsu
C.-Y. Chen
Chandonia
Cheng
D. T.-H. Chang
Dundas
Hsu
Hulo
Jonassen
Jones
Jones
Kasuya
Lichtarge
Liu
Petrova
Porter
Puntervoll
Rigoutsos
Sheu
T.-Y. Chien
Thompson
Tian
Torrance
Watson
Wei
Y.-Z. Weng
Publication venue: Oxford University Press
Publication date: 02/06/2010

Large-scale automatic annotation of protein sequences remains challenging in the postgenomic era. E1DS is designed for annotating enzyme sequences based on a repository of 1D signatures. The sequence signatures are derived using a novel pattern mining approach that discovers long motifs consisting of several sequential blocks (conserved segments). Each sequential block is considerably conserved among the protein members of an EC group. Moreover, a signature includes at least three sequential blocks that are concurrently conserved, i.e. frequently observed together in sequences. In other words, a sequence signature consists of residues from multiple regions of the protein sequence, which echoes the observation that an enzyme catalytic site is usually made up of residues that are widely separated in the sequence. E1DS currently contains 5421 sequence signatures that in total cover 932 four-digit EC numbers. E1DS is evaluated on a collection of enzymes with catalytic sites annotated in the Catalytic Site Atlas. Compared with the well-known pattern database PROSITE, predictions based on E1DS signatures are more sensitive in identifying catalytic sites and the residues involved. E1DS is available at http://e1ds.ee.ncku.edu.tw/ and a mirror site can be found at http://e1ds.csbb.ntu.edu.tw/
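
The idea of a signature made of several sequential blocks that must occur together can be sketched in a few lines of Python. This is an illustrative simplification and not the E1DS implementation: blocks are treated as exact substrings (real signatures may well allow variable positions), and the function name, example sequence and example blocks are all hypothetical.

```python
def matches_signature(sequence, blocks, min_blocks=3):
    """Return True if all blocks occur in `sequence`, in order, without overlap.

    Signatures with fewer than `min_blocks` blocks are rejected, mirroring the
    "at least three sequential blocks" condition in the abstract.
    """
    if len(blocks) < min_blocks:
        return False
    position = 0
    for block in blocks:
        found = sequence.find(block, position)
        if found == -1:
            return False
        # Continue searching after the end of the block just matched.
        position = found + len(block)
    return True


# Hypothetical sequence and signature, for illustration only.
sequence = "AAGDSGGPLLLHAAKKKCWDTT"
signature = ["GDSGGP", "HAA", "CWD"]
print(matches_signature(sequence, signature))        # blocks found in order
print(matches_signature(sequence, signature[::-1]))  # same blocks, wrong order
```

Taking the earliest possible match for each block is sufficient here, because an earlier match never leaves less room for the blocks that follow.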

Crossref

PubMed Central

National Taiwan University Repository

On the Structural Context and Identification of Enzyme Catalytic Residues

Author
Publication venue: Hindawi Limited
Publication date: 01/01/2013

Crossref

Automatic prediction of catalytic residues by modeling residue structural neighborhood

Author: A Ceroni
A Humm
A Yamaguchi
AC Wallace
AE Todd
Andrea Passerini
CT Porter
E Chea
E Webb
E Youn
EF Pettersen
Elisa Cilia
G Amitai
G Bartlett
J Bernardes
J Davis
J Ebert
J Mistry
JA Capra
JC Nebel
JD Fischer
KM Borgwardt
L Xie
M Babor
M Lippi
M Ondrechen
MM Benning
N Cristianini
N Nagano
N Shu
NV Petrova
P Gherardini
RD Finn
S Kawashima
SF Altschul
T Joachims
T Zhang
W Tong
WS Valdar
Y Tang
Y Wei
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Prediction of catalytic residues is a major step in characterizing the function of enzymes. In its simpler formulation, the problem can be cast as a binary classification task at the residue level, by predicting whether the residue is directly involved in the catalytic process. The task remains hard even when structural information is available, owing to the wide range of roles a functional residue can play and to the large imbalance between the number of catalytic and non-catalytic residues. Results: We developed an effective representation of structural information by modeling spherical regions around candidate residues and extracting statistics on the properties of their content, such as physico-chemical properties, atomic density, flexibility and presence of water molecules. We trained an SVM classifier combining our features with sequence-based information and previously developed 3D features, and compared its performance with the most recent state-of-the-art approaches on different benchmark datasets. We further analyzed the discriminant power of the information provided by the presence of heterogens in the residue neighborhood. Conclusions: Our structure-based method achieves consistent improvements on all tested datasets over both sequence-based and structure-based state-of-the-art approaches. Structural neighborhood information is shown to be responsible for such results, and predicting the presence of nearby heterogens seems to be a promising direction for further improvements.
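
The spherical-neighborhood representation described in this abstract can be sketched as follows. This is only a rough illustration, not the authors' feature extraction: the function name, the input layout, the 8.0 default radius and the example coordinates are all assumptions, and only atom counts and density are computed.

```python
import math


def neighborhood_stats(center, atoms, radius=8.0):
    """Summarise the atoms lying within `radius` of `center`.

    `atoms` is a list of (element, (x, y, z)) tuples; `radius` is in the same
    units as the coordinates. The default radius is an arbitrary placeholder.
    """
    counts = {}
    for element, coords in atoms:
        if math.dist(center, coords) <= radius:
            counts[element] = counts.get(element, 0) + 1
    total = sum(counts.values())
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    return {"total": total, "density": total / volume, "by_element": counts}


# Hypothetical coordinates, for illustration only.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (3.0, 4.0, 0.0)), ("N", (10.0, 0.0, 0.0))]
stats = neighborhood_stats((0.0, 0.0, 0.0), atoms, radius=5.0)
print(stats["total"], stats["by_element"])
```

Per-residue vectors of this kind could then be passed to a classifier such as the SVM the abstract mentions.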

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DI-fusion

Active site prediction using evolutionary and structural information

Author: Aloy
Alterovitz
Altschul
Apweiler
Bagley
Baker
Bartlett
Bate
Berna
Brady
Capra
Casari
Chandonia
Davis
Edgar
Elcock
Fei Sha
Felsenstein
Fetrow
Fischer
Frey
George
Greenshtein
Gutteridge
Hastie
Hedstrom
Hedstrom
Henikoff
Hoggart
Hosmer
Huang
Hubbard
Innis
Jack F. Kirsch
Kabsch
Kimmen Sjölander
Koh
Kraut
Krem
Landau
Landgraf
Laurie
Lichtarge
Lin
Mayrose
McGrath
Michael I. Jordan
Mihalek
Mooney
Murzin
Ondrechen
Ota
Panchenko
Pazos
Peters
Petrova
Polgar
Porter
Richardson
Sankararaman
Segal
Shevade
Sriram Sankararaman
Tibshirani
Tong
van de Geer
Vàrallyay
Youn
Zhao
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.
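
The two definitions given in parentheses in this abstract translate directly into code. The sketch below is a minimal illustration; the function name and the residue labels are hypothetical and are not taken from any benchmark.

```python
def precision_recall(predicted, catalytic):
    """Precision and recall of a set of predicted catalytic residues.

    precision = fraction of predicted residues that are truly catalytic
    recall    = fraction of truly catalytic residues that were predicted
    """
    predicted, catalytic = set(predicted), set(catalytic)
    true_positives = len(predicted & catalytic)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(catalytic) if catalytic else 0.0
    return precision, recall


# Hypothetical residue labels, for illustration only.
catalytic = {"H57", "D102", "S195", "G193"}
predicted = {"H57", "S195", "A55", "L99", "K60"}
precision, recall = precision_recall(predicted, catalytic)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```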

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

Active Site Detection by Spatial Conformity and Electrostatic Analysis—Unravelling a Proteolytic Function in Shrimp Alkaline Phosphatase

Author: Bhattacharjee Swapan K.
Chakraborty Sandeep
Minda Renu
Rao Basuthkar J.
Salaye Lipika
Publication venue: Public Library of Science
Publication date: 01/01/2011

Computational methods are gaining importance as an aid in identifying active sites. Most of these methods use structural information to supplement sequence-conservation-based analyses. The development of tools that compute electrostatic potentials has further improved our ability to characterize the active site residues in proteins. We describe a computational methodology for detecting active sites based on structural and electrostatic conformity: CataLytic Active Site Prediction (CLASP). In our pipelined model, the physical 3D signature of any particular enzymatic function, as defined by its active sites, is used to obtain spatially congruent matches. While previous work has revealed that catalytic residues have large pKa deviations from standard values, we show that for a given enzymatic activity, the electrostatic potential differences (PD) between analogous residue pairs in active sites taken from different proteins of the same family are similar. False positives among spatially congruent matches are further pruned by PD analysis, in which cognate pairs with large deviations are rejected. We first present the results of active site prediction by CLASP for two enzymatic activities, β-lactamases and serine proteases, two of the most extensively investigated enzymes. The results of CLASP analysis on motifs extracted from the Catalytic Site Atlas (CSA) are also presented to demonstrate its ability to accurately classify any protein, putative or otherwise, with known structure. The source code and database are made available at www.sanchak.com/clasp/. Subsequently, we probed alkaline phosphatases (AP), a class of well-known promiscuous enzymes, for additional activities. This search led us to predict a hitherto unknown function of shrimp alkaline phosphatase (SAP), in which the protein acts as a protease. Finally, we present experimental evidence for the CLASP prediction by showing that SAP indeed has protease activity in vitro.
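
The pruning step described in this abstract, rejecting matches whose pairwise potential differences deviate strongly from those of a reference motif, might be sketched as below. This is a hedged illustration rather than the CLASP code: the function name, the list-based input, the tolerance parameter and every numeric value are assumptions, and no units are implied.

```python
from itertools import combinations


def passes_pd_filter(reference, candidate, tolerance):
    """Check a candidate match against a reference motif by potential difference.

    `reference` and `candidate` are equal-length lists of electrostatic
    potentials, ordered so that position i in one list is the analogue of
    position i in the other. The candidate is rejected if, for any residue
    pair, its potential difference deviates from the reference pair's
    difference by more than `tolerance`.
    """
    if len(reference) != len(candidate):
        raise ValueError("motifs must contain the same number of residues")
    for i, j in combinations(range(len(reference)), 2):
        reference_pd = reference[i] - reference[j]
        candidate_pd = candidate[i] - candidate[j]
        if abs(reference_pd - candidate_pd) > tolerance:
            return False
    return True


# Hypothetical potentials, for illustration only.
reference = [100.0, -50.0, 20.0]
similar = [130.0, -25.0, 45.0]    # shifted overall, but differences preserved
dissimilar = [130.0, 60.0, 45.0]  # one residue sits in a different environment
print(passes_pd_filter(reference, similar, tolerance=10.0))
print(passes_pd_filter(reference, dissimilar, tolerance=10.0))
```

Comparing differences rather than absolute potentials means a uniform offset between two proteins does not by itself cause a rejection.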

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Novel Feature for Catalytic Protein Residues Reflecting Interactions with Other Residues

Author: A BenShimon
A del Sol
A Gutteridge
A Stark
B Sterner
BK Shoichet
C Gabor
CT Porter
D La
DJ Watts
E Chea
E Youn
F Pazos
G Amitai
G Bagler
G Cheng
G Pugalenthi
GJ Bartlett
Gongbing Li
Hui Yin
I Guyon
J Ko
JA Capra
JD Fischer
Jiamin Xiao
JM Kleinberg
JW Torrance
K Goyal
KV Brinda
LH Greene
M Vendruscolo
M Vendruscolo
Mei Hu
MEJ Newman
Menglong Li
MFE Hall
MJ Ondrechen
N Tokuriki
NV Dokholyan
NV Petrova
PP Wangikar
RS Burt
S SacquinMora
S Sankararaman
SF Altschul
T Ikura
T Zhang
Vladimir Uversky
W Kabsch
WX Tong
Yizhou Li
YR Tang
Zhining Wen
Publication venue: Public Library of Science
Publication date: 01/01/2011

Owing to their potential for systematic analysis, complex networks have been widely used in proteomics. Representing a protein structure as a topology network provides novel insight into protein folding mechanisms, stability and function. Here, we develop a new feature to reveal correlations between residues using a protein structure network. In an original attempt to quantify the effects of several key residues on catalytic residues, a power function was used to model interactions between residues. The results indicate that focusing on a few residues is a feasible approach to identifying catalytic residues. The spatial environment surrounding a catalytic residue was analyzed in a layered manner. We present evidence that (i) correlation between residues is related to their distance apart: most environmental parameters of the outer layer make a smaller contribution to prediction, and (ii) catalytic residues tend to be located near key positions in enzyme folds. Feature analysis revealed satisfactory performance for our features, which were combined with several conventional features in a prediction model for catalytic residues using a comprehensive data set from the Catalytic Site Atlas. Values of 88.6% for sensitivity and 88.4% for specificity were obtained by 10-fold cross-validation. These results suggest that these features reveal the mutual dependence of residues and are promising for further study of structure-function relationships.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central