
Public Library of Science (PLOS)

Networks of High Mutual Information Define the Structural Proximity of Catalytic Sites: Implications for Catalytic Residue Identification

Author: A Rausell
B Sterner
Burkhard Rost
CA Innis
CE Shannon
CM Buslje
Cristina Marino Buslje
CT Porter
D Kristensen
D Leys
E Cilia
Elin Teppa
GB Gloor
GJ Bartlett
I Mihalek
J Bernardes
J Manning
J Swets
JE Donald
José María Delfino
L Byung-Chul
M Nielsen
Morten Nielsen
N Petrova
O Lichtarge
R Alterovitz
R Gouveia-Oliveira
R Matthew Ward
RD Finn
RK Kuipers
S Chakrabarti
S Erdin
S Sankararaman
SD Dunn
SF Altschul
SW Lockless
T Zhang
T-Y Chien
TM Cover
Tomas Di Doménico
W Tong
Y-R Tang
Z Shi
Publication venue: Public Library of Science
Publication date: 01/01/2010

Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved, and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional conserved residues from non-functional ones. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
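To make the conservation and covariation measures named above concrete, here is a minimal Python sketch that computes Shannon entropy, KL conservation and raw MI for alignment columns, plus a simplified proximity average of MI. It is not the authors' implementation: the uniform background frequencies, the absence of sequence weighting and MI corrections, and the helpers `cumulative_mi` and `proximity_mi` (including their exact definitions) are assumptions made for illustration. Structural neighbours are passed in as a precomputed mapping rather than derived from coordinates.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def column(alignment, i):
    """Residues at alignment position i, one per sequence."""
    return [seq[i] for seq in alignment]

def frequencies(col):
    """Observed amino-acid frequencies in a column, ignoring gaps and unknowns."""
    counts = Counter(c for c in col if c in AMINO_ACIDS)
    total = sum(counts.values())
    return {a: n / total for a, n in counts.items()}

def shannon_entropy(col):
    """Shannon entropy in bits; lower values mean a more conserved column."""
    return -sum(p * math.log2(p) for p in frequencies(col).values())

def kl_conservation(col, background=None):
    """KL divergence (bits) of the column distribution from a background.

    Assumption: a uniform 1/20 background unless one is supplied; a real
    pipeline would use empirical frequencies (e.g. BLOSUM62 background).
    """
    background = background or {a: 1 / 20 for a in AMINO_ACIDS}
    return sum(p * math.log2(p / background[a]) for a, p in frequencies(col).items())

def mutual_information(alignment, i, j):
    """Raw MI (bits) between columns i and j, skipping rows with a gap in either.

    No sequence weighting, pseudocounts or background correction is applied.
    """
    pairs = [(s[i], s[j]) for s in alignment
             if s[i] in AMINO_ACIDS and s[j] in AMINO_ACIDS]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (left[a] * right[b])
    return sum((c / n) * math.log2(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())

def cumulative_mi(alignment, j):
    """Hypothetical cMI: summed MI of column j with every other column."""
    length = len(alignment[0])
    return sum(mutual_information(alignment, j, k) for k in range(length) if k != j)

def proximity_mi(alignment, i, neighbours):
    """Hypothetical pMI: mean cMI over the structural neighbours of residue i.

    `neighbours` maps a column index to the set of column indices whose
    residues lie within some distance cutoff in the 3D structure.
    """
    near = neighbours.get(i, set())
    if not near:
        return 0.0
    return sum(cumulative_mi(alignment, j) for j in near) / len(near)
```

For the toy alignment `["ACDE", "ACDF", "GCHE", "GCHF"]`, column 1 is invariant (entropy 0), while columns 0 and 2 covary perfectly and share 1 bit of MI.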

CiteSeerX

Online Research Database In Technology

Prediction of protein binding sites in protein structures using hidden Markov support vector machine

Author: A Henschel
A Koike
A Kouranov
A Porollo
A Rossi
AJ Bordner
B Wang
Bin Liu
Buzhou Tang
C Chothia
C Yan
C-T Chen
C-W Cheng
H Chen
H Kim
H Neuvirth
H-X Zhou
HX Zhou
I Ezkurdia
I Res
I Tsochantaridis
J Lafferty
J Song
J-L Chung
JD Fischer
JL Chung
JR Bradford
JW Torrance
K Henrick
L Holm
L Lo Conte
L Wang
Lei Lin
LR Rabiner
M Gribskov
M Vincent
M Šikić
MH Li
N Li
NJ Burgoyne
P Fariselli
Q Dong
Qiwen Dong
S Ahmad
S Liang
S Qin
SF Altschul
T Joachims
T Zhang
TH Dang
W Kabsch
WK Kim
X-w Chen
Xiaolong Wang
Xuan Wang
Y Altun
Y Liu
Y Ofran
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. Recent research on protein binding site prediction has been based mainly on widely known machine learning techniques, such as artificial neural networks, support vector machines and conditional random fields. However, prediction performance is still too low for practical use, so new algorithms, theories and features are needed to improve it further. Results: In this study, we introduce a novel machine learning model, the hidden Markov support vector machine, for protein binding site prediction. The model treats protein binding site prediction as a sequential labelling task based on the maximum margin criterion. Common features derived from protein sequences and structures, including the protein sequence profile and residue accessible surface area, are used to train the hidden Markov support vector machine. When tested on six data sets, the method based on the hidden Markov support vector machine performs better than several state-of-the-art methods, including artificial neural networks, support vector machines and conditional random fields. Furthermore, its running time is several orders of magnitude shorter than that of the compared methods. Conclusion: The improved prediction performance and computational efficiency of the method can be attributed to three factors. First, the relation between the labels of neighbouring residues is useful for protein binding site prediction. Second, the kernel trick is very advantageous in this field. Third, with the cutting-plane algorithm, the complexity of the training step for the hidden Markov support vector machine is linear in the number of training samples.
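Treating binding site prediction as sequential labelling means the predicted label of each residue depends on its neighbours' labels as well as on its own features. The sketch below shows only the decoding step under that view: Viterbi search for the highest-scoring label sequence under a linear chain score (per-residue emission weights plus label-to-label transition weights). The weights are hand-set placeholders, the features are hypothetical, and training with the cutting-plane algorithm and kernels is not shown; this is an illustration, not the authors' code.

```python
def emission_score(w_emit, label, x):
    """Linear score of feature vector x under the weights for `label`."""
    return sum(w * xi for w, xi in zip(w_emit[label], x))

def viterbi(features, w_emit, w_trans):
    """Best label sequence under a linear chain model.

    features: per-residue feature vectors (e.g. profile, accessible area)
    w_emit[y]: emission weight vector for label y (0 = non-binding, 1 = binding)
    w_trans[p][y]: score for moving from label p to label y
    Returns (labels, score).
    """
    n_labels = len(w_emit)
    if not features:
        return [], 0.0
    score = [emission_score(w_emit, y, features[0]) for y in range(n_labels)]
    back = []  # back[t][y] = best previous label when residue t+1 has label y
    for x in features[1:]:
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + w_trans[p][y])
            new_score.append(score[best_prev] + w_trans[best_prev][y]
                             + emission_score(w_emit, y, x))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)
    last = max(range(n_labels), key=lambda y: score[y])
    best = score[last]
    labels = [last]
    for pointers in reversed(back):  # trace the best path backwards
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return labels, best
```

With features `[[2.0], [-0.3], [2.0]]` and emission weights `[[-1.0], [1.0]]`, transition weights that reward keeping the same label turn the isolated middle residue into a binding residue, whereas zero transition weights label each residue independently.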

ScholarBank@NUS

CATH functional families predict functional sites in proteins

Author: Das S
Orengo C
Scholes HM
Sen N
Publication date: 02/11/2020

MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found that FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. AVAILABILITY: https://github.com/UCL/cath-funsite-predictor. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
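The statement that conserved residues are enriched in functional sites can be quantified in a standard way: compare the fraction of conserved residues that are functional with the fraction of all residues that are functional, and ask how likely that overlap is by chance (a hypergeometric tail). This is a generic illustrative calculation, not FunSite's code; how residues are labelled "conserved" or "functional" is left as an input assumption.

```python
from math import comb

def fold_enrichment(conserved, functional, n_residues):
    """Observed/expected fraction of functional residues among conserved ones.

    conserved, functional: sets of residue indices; n_residues: protein length.
    """
    if not conserved or not functional:
        return 0.0
    observed = len(conserved & functional) / len(conserved)
    expected = len(functional) / n_residues
    return observed / expected

def hypergeometric_tail(conserved, functional, n_residues):
    """P(overlap >= observed) if the conserved set were drawn at random."""
    k = len(conserved & functional)
    n, big_k = len(conserved), len(functional)
    total = comb(n_residues, n)
    return sum(comb(big_k, x) * comb(n_residues - big_k, n - x)
               for x in range(k, min(n, big_k) + 1)) / total
```

For example, if 4 of 20 residues are conserved and 2 of the 3 functional residues are among them, the fold enrichment is (2/4)/(3/20), roughly 3.3.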

UCL Discovery

Knowledge-based annotation of small molecule binding sites in proteins

Author: Bryant Stephen H
Madej Thomas
Panchenko Anna R
Shoemaker Benjamin A
Thangudu Ratna R
Tyagi Manoj
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The study of protein-small molecule interactions is vital for understanding protein function and for practical applications in drug discovery. To benefit from the rapidly increasing structural data, it is essential to improve the tools that enable large-scale binding site prediction, with greater emphasis on their biological validity. Results: We have developed a new method for the annotation of protein-small molecule binding sites using inference by homology, which allows us to extend annotation to protein sequences for which no experimental data are available. To ensure the biological relevance of binding sites, our method clusters similar binding sites found in homologous protein structures based on their sequence and structure conservation. Binding sites that appear evolutionarily conserved among non-redundant sets of homologous proteins are given higher priority. After binding sites are clustered, position specific score matrices (PSSMs) are constructed from the corresponding binding site alignments. Together with other measures, the PSSMs are then used to rank binding sites, to assess how well they match the query and to better gauge their biological relevance. The method also provides a succinct and informative representation of observed and inferred binding sites from homologs with known three-dimensional structures, thereby providing the means to analyze the conservation and diversity of binding modes. Furthermore, the chemical properties of small molecules bound to the inferred binding sites can be used as a starting point in small molecule virtual screening. The method was validated by comparison with other binding site prediction methods and with a collection of manually curated binding site annotations. We show that our method achieves a sensitivity of 72% at predicting biologically relevant binding sites and can accurately discriminate sites that bind biological small molecules from those that bind non-biological ones.
Conclusions: A new algorithm has been developed to predict binding sites with high accuracy in terms of their biological validity. It also provides a common platform for function prediction, knowledge-based docking and small molecule virtual screening. The method can be applied even to a query sequence without a known structure. The method is available at http://www.ncbi.nlm.nih.gov/Structure/ibis/ibis.cgi.
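The PSSM step described above (building a position specific score matrix from a binding site alignment, then using it to score how well a query matches) can be sketched as follows. This is a generic log-odds PSSM with a uniform background and a flat pseudocount, both assumptions; the abstract does not specify how this method weights sequences, chooses pseudocounts or combines PSSM scores with its other measures.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(site_alignment, pseudocount=1.0, background=None):
    """Log-odds PSSM (bits) from aligned binding-site residues.

    site_alignment: equal-length strings, one per homologous binding site.
    background: frequencies for all 20 amino acids (assumed uniform by default).
    """
    background = background or {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    pssm = []
    for i in range(len(site_alignment[0])):
        counts = Counter(s[i] for s in site_alignment if s[i] in background)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({a: math.log2((counts[a] + pseudocount) / total / background[a])
                     for a in AMINO_ACIDS})
    return pssm

def score_site(pssm, query_site):
    """Sum of per-position log-odds; gaps or unknown residues contribute 0."""
    return sum(col.get(res, 0.0) for col, res in zip(pssm, query_site))
```

A query whose residues match the clustered sites (for example "HDE" against sites "HDE", "HDE", "HNE") scores well above an unrelated one such as "AAA".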

An Accurate Method for Prediction of Protein-Ligand Binding Site on Protein Surface Using SVM and Statistical Depth Function

Publication venue: Hindawi Limited
Publication date: 01/01/2013

Automatic prediction of catalytic residues by modeling residue structural neighborhood

Author: A Ceroni
A Humm
A Yamaguchi
AC Wallace
AE Todd
Andrea Passerini
CT Porter
E Chea
E Webb
E Youn
EF Pettersen
Elisa Cilia
G Amitai
G Bartlett
J Bernardes
J Davis
J Ebert
J Mistry
JA Capra
JC Nebel
JD Fischer
KM Borgwardt
L Xie
M Babor
M Lippi
M Ondrechen
MM Benning
N Cristianini
N Nagano
N Shu
NV Petrova
P Gherardini
RD Finn
S Kawashima
SF Altschul
T Joachims
T Zhang
W Tong
WS Valdar
Y Tang
Y Wei
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Prediction of catalytic residues is a major step in characterizing the function of enzymes. In its simplest formulation, the problem can be cast as a binary classification task at the residue level: predicting whether a residue is directly involved in the catalytic process. The task is hard even when structural information is available, because of the wide range of roles a functional residue can play and the large imbalance between the numbers of catalytic and non-catalytic residues. Results: We developed an effective representation of structural information by modeling spherical regions around candidate residues and extracting statistics on the properties of their content, such as physico-chemical properties, atomic density, flexibility and the presence of water molecules. We trained an SVM classifier combining our features with sequence-based information and previously developed 3D features, and compared its performance with the most recent state-of-the-art approaches on different benchmark datasets. We further analyzed the discriminant power of the information provided by the presence of heterogens in the residue neighborhood. Conclusions: Our structure-based method achieves consistent improvements on all tested datasets over both sequence-based and structure-based state-of-the-art approaches. Structural neighborhood information is shown to be responsible for these results, and predicting the presence of nearby heterogens seems to be a promising direction for further improvements.
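A few of the neighborhood statistics mentioned above (atomic density, the share of polar atoms, and the number of water molecules inside a sphere around a candidate residue) can be computed as in the sketch below. The `Atom` record, the 8 Å default radius and the choice of N and O as "polar" are illustrative assumptions; the published feature set (radii, flexibility from B-factors, heterogen features) is richer and is not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Atom:
    element: str
    coord: Tuple[float, float, float]
    is_water: bool = False

def sphere_features(center, atoms, radius=8.0):
    """Statistics on the atoms within `radius` angstroms of a residue center."""
    inside = [a for a in atoms if math.dist(center, a.coord) <= radius]
    volume = 4.0 / 3.0 * math.pi * radius ** 3
    protein = [a for a in inside if not a.is_water]
    polar = sum(1 for a in protein if a.element in ("N", "O"))
    return {
        "atom_density": len(protein) / volume,  # non-water atoms per cubic angstrom
        "polar_fraction": polar / len(protein) if protein else 0.0,
        "water_count": sum(1 for a in inside if a.is_water),
    }
```

In a full pipeline, these per-residue values would be concatenated with sequence-based features before training the classifier.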

DI-fusion

Many Local Pattern Texture Features: Which Is Better for Image-Based Multilabel Human Protein Subcellular Localization Classification?

Author: Fan Yang
Hong-Bin Shen
Ying-Ying Xu
Publication venue: Hindawi Limited
Publication date: 01/01/2014

Human protein subcellular location prediction can provide critical knowledge for understanding a protein’s function. Because significant progress has been made in digital microscopy, automated image-based protein subcellular location classification is urgently needed. In this paper, we aim to investigate more representative image features that can be used effectively for multilabel subcellular image samples. We prepared a large multilabel immunohistochemistry (IHC) image benchmark from the Human Protein Atlas database and tested the performance of different local texture features, including the completed local binary pattern, the local tetra pattern and the standard local binary pattern feature. According to our experimental results from binary relevance multilabel machine learning models, the completed local binary pattern and the local tetra pattern are more discriminative for describing IHC images than the traditional local binary pattern descriptor. The combination of these two novel local pattern features with conventional global texture features is also studied. The enhanced performance of the final binary relevance classification model trained on the combined feature space demonstrates that the different features are complementary to each other and thus capable of improving classification accuracy.
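For reference, the standard local binary pattern used as the baseline above thresholds each pixel's eight neighbours against the center pixel and packs the results into an 8-bit code; a histogram of codes then describes the texture. The sketch below implements only this basic 3x3 variant on a grayscale image given as nested lists (the neighbour ordering is an arbitrary convention); the completed LBP and local tetra pattern variants compared in the study are not shown.

```python
def lbp_code(image, r, c):
    """8-bit LBP code of pixel (r, c) from its 3x3 neighbourhood."""
    center = image[r][c]
    # Clockwise from the top-left neighbour; bit k is set if neighbour k >= center
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

For the 3x3 image with rows `[1, 2, 3]`, `[4, 5, 6]`, `[7, 8, 9]`, only the right, bottom-right, bottom and bottom-left neighbours are at least 5, giving the code 8 + 16 + 32 + 64 = 120.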

Catalytic residues in hydrolases: analysis of methods designed for ligand-binding site prediction

Author: A Armon
A Bhinge
A Eichinger
A Gutteridge
A Pingoud
A Shulman-Peleg
A Stark
AA Bliznyuk
AC Stuart
AC Wallace
AH Elcock
AJ Chalk
ATR Laurie
B Huang
B Lee
B Zhang
C Taroni
CA Orengo
CM Seibert
CT Porter
D Pantoja-Uceda
DG Levitt
DJ Vocadlo
DT-H Chang
E Kellenberger
E Youn
FX Gomis-Rüth
G Nimrod
G Pugalenthi
GG Hammes
GJ Bartlett
GJ Kleywegt
GL Holliday
GP Brady
H Yao
HM Berman
I Botos
Irena Roterman
J An
J Dundas
J Liang
J Teyra
J Weigelt
J-M Chandonia
JA Barker
JM Yon
K Henrick
K Katayanagi
K Kinoshita
K Stummeyer
K Zhang
KA Snyder
Katarzyna Prymula
KP Peters
M Bryliński
M Grabowski
M Hendlich
M Jambon
M Kanehisa
M Landau
M Levitt
M Stahl
MA Kurowski
MJ Ondrechen
MP Liang
MR Landon
N Kallenbach
O Gileadi
O Goldenberg
O Lichtarge
P Aloy
P Baldi
P Reis
PJ Hajduk
PP Wangikar
R Landgraf
RA Laskowski
RV Spriggs
S Madabushi
S Vajda
SE Brenner
T Fawcett
T Kortvelyesi
T Pupko
T Tadokoro
T Zhang
TA Binkowski
Tomasz Jadczyk
UniProt Consortium The Universal Protein Resource (UniProt)
V Siksnys
W Kabsch
Y Dou
Y Oda
Y Tsunaka
Y-R Tang
Publication venue: Springer Netherlands
Publication date: 01/01/2010

A comparison of eight tools applicable to ligand-binding site prediction is presented. The methods examined cover three types of approaches: geometrical (CASTp, PASS, Pocket-Finder), physicochemical (Q-SiteFinder, FOD) and knowledge-based (ConSurf, SuMo, WebFEATURE). The accuracy of predictions was measured against the catalytic residues documented in the Catalytic Site Atlas. The test was performed on a set of selected hydrolase chains. The results were analysed with regard to the size, polarity, secondary structure and solvent-accessible area of the predicted sites, as well as parameters commonly used in machine learning (F-measure, MCC). The relative accuracies of predictions are presented in ROC space, allowing the optimal methods to be determined by means of the ROC convex hull. Additionally, a minimum expected cost analysis was performed. Both the advantages and disadvantages of the eight methods are presented. A characterization of protein chains with respect to the level of difficulty of active site prediction is introduced, and the main reasons for failures are discussed. Overall, SuMo offers the best performance, followed by FOD, while Pocket-Finder is the best method among the geometrical approaches.
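The evaluation measures named above are standard, and a small sketch shows how they fit together: F-measure and MCC from a confusion matrix, the upper convex hull of the methods' (false positive rate, true positive rate) points in ROC space, and the expected cost of operating at one ROC point for a given class prior and misclassification costs. The function names are illustrative; the study's exact cost settings are not given in the abstract.

```python
import math

def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient (0 when undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cross(o, a, b):
    """z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def roc_convex_hull(points):
    """Upper convex hull of ROC points (fpr, tpr), anchored at (0,0) and (1,1).

    Methods on the hull are optimal for some class distribution and cost ratio.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()  # hull[-1] lies on or below the segment hull[-2] -> p
        hull.append(p)
    return hull

def expected_cost(fpr, tpr, p_pos, cost_fp, cost_fn):
    """Expected misclassification cost of operating at one ROC point."""
    return (1 - p_pos) * fpr * cost_fp + p_pos * (1 - tpr) * cost_fn
```

A method whose ROC point falls below the hull, such as (0.5, 0.3) next to (0.2, 0.6), is never the minimum-cost choice.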

Jagiellonian University Repository

TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

Author: A Schlessinger
AG de Brevern
B Rost
B Xue
C Bystroff
C Haynes
C Mooney
C Zhang
C Zheng
Christian Schönbach
D Xie
DT Jones
E Faraggi
G Helles
Geoffrey I. Webb
GN Ramachandran
GP Raghava
H Zhang
Hao Tan
HJ Dyson
HS Kang
J Cheng
J Gao
J Gsponer
J Song
Jiangning Song
JJ Ward
JS Chauhan
K Chen
L Chen
L Kurgan
M Kumar
Mingjun Wang
MJ Mizianty
MJ Rooman
MJ Wood
MK Kalita
MN Nguyen
MV Berjanskii
O Dor
O Zimmermann
P Chen
P Kountouris
P Sliz
PC Chen
R Gaudet
R Karchin
R Kuang
R Verma
S Ahmad
S Liang
S Qiu
S Wu
SF Altschul
T Ishida
T Zhang
Tatsuya Akutsu
V Vapnik
W Kabsch
W Liu
W Zhang
X Miao
X Wang
XY Pan
Y Ofran
YM Huang
Z Markovic-Housley
Z Yuan
Publication venue: Public Library of Science
Publication date: 01/01/2012

The protein backbone torsion angles Phi and Psi describe rotations around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles essentially determine the backbone geometry of proteins. Accordingly, accurate prediction of backbone torsion angles from sequence information can assist protein structure prediction. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-valued torsion angle prediction using a variety of features derived from amino acid sequences, including evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered regions, as well as other global sequence features. When evaluated on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle predictions are 27.8° and 44.6°, respectively, which are 1% and 3% lower, respectively, than those of ANGLOR, one of the state-of-the-art prediction tools. Moreover, TANGLE's predictions are significantly better than those of a random predictor built on an amino acid-specific basis, with p-values < 1.46e-147 and 7.97e-150, respectively, by the Wilcoxon signed rank test. As a complementary approach to current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and in assisting protein fold recognition by applying the predicted torsion angles as restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/
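Because Phi and Psi are periodic (-180° and 180° describe the same conformation), an MAE for torsion angles has to wrap each difference around the circle; otherwise a prediction of 170° for a true value of -170° would count as a 340° error instead of 20°. The sketch below uses the common convention min(|d| mod 360, 360 - |d| mod 360); the abstract does not say exactly how TANGLE computes its MAE, so treat this as an assumption.

```python
def angular_error(pred, true):
    """Absolute difference between two angles in degrees, wrapped to [0, 180]."""
    diff = abs(pred - true) % 360.0
    return min(diff, 360.0 - diff)

def mean_absolute_angular_error(preds, trues):
    """MAE over paired predicted and true torsion angles (degrees)."""
    if len(preds) != len(trues) or not preds:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(angular_error(p, t) for p, t in zip(preds, trues)) / len(preds)
```

For predictions `[170.0, -60.0]` against true values `[-170.0, -45.0]`, the wrapped errors are 20° and 15°, giving an MAE of 17.5°.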

Public Library of Science (PLOS)