16 research outputs found

Active site prediction using evolutionary and structural information

Author: Aloy
Alterovitz
Altschul
Apweiler
Bagley
Baker
Bartlett
Bate
Berna
Brady
Capra
Casari
Chandonia
Davis
Edgar
Elcock
Fei Sha
Felsenstein
Fetrow
Fischer
Frey
George
Greenshtein
Gutteridge
Hastie
Hedstrom
Hedstrom
Henikoff
Hoggart
Hosmer
Huang
Hubbard
Innis
Jack F. Kirsch
Kabsch
Kimmen Sjölander
Koh
Kraut
Krem
Landau
Landgraf
Laurie
Lichtarge
Lin
Mayrose
McGrath
Michael I. Jordan
Mihalek
Mooney
Murzin
Ondrechen
Ota
Panchenko
Pazos
Peters
Petrova
Polgar
Porter
Richardson
Sankararaman
Segal
Shevade
Sriram Sankararaman
Tibshirani
Tong
van de Geer
Vàrallyay
Youn
Zhao
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: The identification of catalytic residues is a key step in understanding the function of enzymes. While a variety of computational methods have been developed for this task, accuracies have remained fairly low. The best existing method exploits information from sequence and structure to achieve a precision (the fraction of predicted catalytic residues that are catalytic) of 18.5% at a corresponding recall (the fraction of catalytic residues identified) of 57% on a standard benchmark. Here we present a new method, Discern, which provides a significant improvement over the state-of-the-art through the use of statistical techniques to derive a model with a small set of features that are jointly predictive of enzyme active sites.
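
The precision and recall definitions quoted in this abstract can be illustrated with a minimal Python sketch. It is not part of Discern or of the benchmark; the function name `precision_recall` and the residue numbers are hypothetical and chosen only for illustration.

```python
def precision_recall(predicted, catalytic):
    """Return (precision, recall) for two collections of residue positions.

    precision = fraction of predicted residues that are truly catalytic
    recall    = fraction of truly catalytic residues that were predicted
    """
    predicted, catalytic = set(predicted), set(catalytic)
    true_positives = len(predicted & catalytic)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(catalytic) if catalytic else 0.0
    return precision, recall


# Hypothetical residue numbers, for illustration only.
predicted = {40, 41, 57, 102, 195}
catalytic = {57, 102, 195, 214}
p, r = precision_recall(predicted, catalytic)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of the five predictions are correct and three of the four catalytic residues are recovered. Because catalytic residues are a small minority of all residues in an enzyme, precision can stay low even when recall is moderate, which is the situation the abstract describes.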

CiteSeerX

Crossref

PubMed Central

eScholarship - University of California

SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

Author: Datta Ruchira S
Davidson John R
Hagopian Raffi
Jarvis Glen R
Samad Bushra
Sjölander Kimmen
Publication venue: eScholarship, University of California
Publication date: 29/04/2010

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/

PubMed Central

eScholarship - University of California

The binding site distance test score: a robust method for the assessment of predicted protein binding sites

Author: Daniel B. Roche
Liam J. McGuffin
Lopez
Lopez
Matthews
Oh
Sankararaman
Soro
Stuart J. Tetchner
Wass
Publication venue: Oxford University Press (OUP)
Publication date: 22/09/2010

We propose a novel method for scoring the accuracy of protein binding site predictions: the Binding-site Distance Test (BDT) score. Recently, the Matthews Correlation Coefficient (MCC) has been used to evaluate binding site predictions, both by developers of new methods and by the assessors for the community-wide prediction experiment, CASP8. Whilst being a rigorous scoring method, the MCC does not take into account how far, in 3D, the predicted residues lie from the observed binding site. Thus, an incorrectly predicted site that is nevertheless close to the observed binding site will obtain an identical score to the same number of nonbinding residues predicted at random. The MCC is also somewhat affected by the subjectivity of determining observed binding residues and the ambiguity of choosing distance cutoffs. By contrast, the BDT method produces continuous scores ranging between 0 and 1, relating to the distance between the predicted and observed residues. Residues predicted close to the binding site will score higher than those more distant, providing a better reflection of the true accuracy of predictions. The CASP8 function predictions were evaluated using both the MCC and BDT methods and the scores were compared. The BDT was found to strongly correlate with the MCC scores whilst also being less susceptible to the subjectivity of defining binding residues. We therefore suggest that this new simple score is a potentially more robust method for future evaluations of protein-ligand binding site predictions.
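
The abstract's point that the MCC ignores 3D distance can be made concrete with a short Python sketch. It implements only the standard MCC formula from confusion counts; the BDT formula itself is not given in the abstract and is not reproduced here. The helper names and residue numbers are hypothetical.

```python
import math


def confusion(predicted, observed, n_residues):
    """Count TP, FP, TN, FN for residue-level binding site predictions."""
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)
    fp = len(predicted - observed)
    fn = len(observed - predicted)
    tn = n_residues - tp - fp - fn
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; returns 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 100-residue protein with an observed site at residues 10-12.
observed = {10, 11, 12}
near_miss = {13, 14, 15}   # assumed to sit right next to the site
far_miss = {80, 81, 82}    # assumed to sit far from the site

mcc_near = mcc(*confusion(near_miss, observed, 100))
mcc_far = mcc(*confusion(far_miss, observed, 100))
print(mcc_near, mcc_far)
```

Both predictions miss every observed residue, so they produce the same confusion counts and therefore the same MCC, even though one is assumed to be adjacent to the site. A distance-aware score such as the BDT is meant to separate these two cases.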

Central Archive at the University of Reading

Crossref

Ortholog identification in the presence of domain architecture rearrangement

Author: Abascal
Addou
Altschul
Ashburner
Bairoch
Bateman
Bennett-Lovsey
Brown
Brown
Chen
Chen
Corpet
Delsuc
Dessimoz
Edgar
Eisen
G. M. Shoffner
Galperin
Gilks
Hahn
Hollich
Huelsenbeck
Jones
K. Sjolander
Kanehisa
Kaplan
Krishnamurthy
Kuzniar
Li
Meinel
O'Brien
Orengo
Pati
Pollard
Price
R. S. Datta
Saitou
Saitou
Schnoes
Servant
Sjolander
Sjolander
Sonnhammer
Storm
Storm
Tatusov
van der Heijden
Venter
Y. Shen
Zmasek
Publication venue: Oxford University Press
Publication date: 01/09/2011

Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets that share a common domain architecture or that share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area.

Crossref

PubMed Central

eScholarship - University of California

L1pred: A Sequence-Based Prediction Tool for Catalytic Residues in Enzymes with the L1-logreg Classifier

Author: A Armon
A del Sol Mesa
A Gutteridge
AR Panchenko
B Sterner
C Berezin
C Marino Buslje
C Porter
CA Innis
Chi Zhang
D La
DR Caffrey
E Chea
E Cilia
E Greenshtein
E Youn
F Glaser
G Lopez
GJ Bartlett
HM Berman
I Mayrose
I Mihalek
IA Vergara
Iddo Friedberg
J Capra
J Pei
JD Fischer
Jialiang Yang
Jun Wang
K Koh
K Wang
K Ye
KC Bahadur Dukka
L Mirny
LJ McGuffin
M Brylinski
M Landau
N Petrova
P Zhao
R Alterovitz
RM Sweet
RM Williamson
S Ahmad
S Gong
S Pande
S Sankararaman
S Sankararaman
SA van de Geer
SF Altschul
SW Zhang
T Kato
T Zhang
W Taylor
W Tong
W Valdar
XS Liu
YC Dou
YC Dou
YC Dou
Yongchao Dou
YR Tang
ZP Liu
Publication venue: Public Library of Science
Publication date: 01/01/2012

Identifying the catalytic residues is usually the first step in understanding enzyme function. Knowledge of catalytic residues is also useful for protein engineering and drug design. However, experimental identification of catalytic residues remains challenging for reasons of time and cost. Therefore, computational methods have been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, by using the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on the carefully designed datasets Data604 and Data63. With ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm, and found that all the scores contributed more or less equally to the L1pred performance.

CiteSeerX

Public Library of Science (PLOS)

Crossref

DigitalCommons@University of Nebraska

Directory of Open Access Journals

PubMed Central

Novel Feature for Catalytic Protein Residues Reflecting Interactions with Other Residues

Author: A BenShimon
A del Sol
A Gutteridge
A Stark
B Sterner
BK Shoichet
C Gabor
CT Porter
D La
DJ Watts
E Chea
E Youn
F Pazos
G Amitai
G Bagler
G Cheng
G Pugalenthi
GJ Bartlett
Gongbing Li
Hui Yin
I Guyon
J Ko
JA Capra
JD Fischer
Jiamin Xiao
JM Kleinberg
JW Torrance
K Goyal
KV Brinda
LH Greene
M Vendruscolo
M Vendruscolo
Mei Hu
MEJ Newman
Menglong Li
MFE Hall
MJ Ondrechen
N Tokuriki
NV Dokholyan
NV Petrova
PP Wangikar
RS Burt
S SacquinMora
S Sankararaman
SF Altschul
T Ikura
T Zhang
Vladimir Uversky
W Kabsch
WX Tong
Yizhou Li
YR Tang
Zhining Wen
Publication venue: Public Library of Science
Publication date: 01/01/2011

Owing to their potential for systematic analysis, complex networks have been widely used in proteomics. Representing a protein structure as a topology network provides novel insight into understanding protein folding mechanisms, stability and function. Here, we develop a new feature to reveal correlations between residues using a protein structure network. In an original attempt to quantify the effects of several key residues on catalytic residues, a power function was used to model interactions between residues. The results indicate that focusing on a few residues is a feasible approach to identifying catalytic residues. The spatial environment surrounding a catalytic residue was analyzed in a layered manner. We present evidence that (i) correlation between residues is related to their distance apart, with most environmental parameters of the outer layer making a smaller contribution to prediction, and (ii) catalytic residues tend to be located near key positions in enzyme folds. Feature analysis revealed satisfactory performance for our features, which were combined with several conventional features in a prediction model for catalytic residues using a comprehensive data set from the Catalytic Site Atlas. Values of 88.6% for sensitivity and 88.4% for specificity were obtained by 10-fold cross-validation. These results suggest that these features reveal the mutual dependence of residues and are promising for further study of the structure-function relationship.

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

Networks of High Mutual Information Define the Structural Proximity of Catalytic Sites: Implications for Catalytic Residue Identification

Author: A Rausell
B Sterner
Burkhard Rost
CA Innis
CE Shannon
CM Buslje
Cristina Marino Buslje
CT Porter
D Kristensen
D Leys
E Cilia
Elin Teppa
GB Gloor
GJ Bartlett
I Mihalek
J Bernardes
J Manning
J Swets
JE Donald
José María Delfino
L Byung-Chul
M Nielsen
Morten Nielsen
N Petrova
O Lichtarge
R Alterovitz
R Gouveia-Oliveira
R Matthew Ward
RD Finn
RK Kuipers
S Chakrabarti
S Erdin
S Sankararaman
S Sankararaman
SD Dunn
SF Altschul
SW Lockless
T Zhang
T-Y Chien
TM Cover
Tomas Di Doménico
W Tong
Y-R Tang
Z Shi
Publication venue: Public Library of Science
Publication date: 01/01/2010

Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family, making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the Catalytic Site Atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
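
To illustrate the quantities this abstract relies on, the following Python sketch computes the Shannon entropy of one alignment column and the plain mutual information between two columns. It is a textbook calculation under simplifying assumptions (no sequence weighting, no background correction, no gap handling) and is not the authors' pMI, pC or Cls pipeline. The function names and the four-sequence toy columns are hypothetical.

```python
import math
from collections import Counter


def entropy(column):
    """Shannon entropy (bits) of the residues observed in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(col_a, col_b):
    """Plain mutual information (bits) between two equally long alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )


# Toy columns from a hypothetical four-sequence alignment.
col_1 = "AAVV"
col_2 = "DDEE"   # changes together with col_1
col_3 = "DEDE"   # varies independently of col_1

h1 = entropy(col_1)
mi_covarying = mutual_information(col_1, col_2)
mi_independent = mutual_information(col_1, col_3)
print(h1, mi_covarying, mi_independent)
```

The covarying pair shares as much information as the entropy of either column, while the independent pair shares none. A proximity-averaged score in the spirit of pMI would average such values over the residues near a given position in the 3D structure, although the exact normalisation used by the authors is not stated in the abstract.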

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

In silico identification and characterization of protein-ligand binding sites

Author: McGuffin Liam J.
Roche Daniel Barry
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Protein–ligand binding site prediction methods aim to predict, from amino acid sequence, protein–ligand interactions, putative ligands, and ligand binding site residues using either sequence information, structural information, or a combination of both. In silico characterization of protein–ligand interactions has become extremely important to help determine a protein’s functionality, as in vivo-based functional elucidation is unable to keep pace with the current growth of sequence databases. Additionally, in vitro biochemical functional elucidation is time-consuming, costly, and may not be feasible for large-scale analysis, such as drug discovery. Thus, in silico prediction of protein–ligand interactions must be utilized to aid in functional elucidation. Here, we briefly discuss protein function prediction, prediction of protein–ligand interactions, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and the Continuous Automated Model EvaluatiOn (CAMEO) competitions, along with their role in shaping the field. We also discuss, in detail, our cutting-edge web-server method, FunFOLD, for the structurally informed prediction of protein–ligand interactions. Furthermore, we provide a step-by-step guide on using the FunFOLD web server and FunFOLD3 downloadable application, along with some real-world examples, where the FunFOLD methods have been used to aid functional elucidation.

Central Archive at the University of Reading

Crossref

CATH functional families predict functional sites in proteins

Author: Das S
Orengo C
Scholes HM
Sen N
Publication venue
Publication date: 02/11/2020

MOTIVATION: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein-protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). RESULTS: FunSite's prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly-available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite's performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyse which structural and evolutionary features are most predictive for functional sites. AVAILABILITY: https://github.com/UCL/cath-funsite-predictor. CONTACT: [email protected]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online

UCL Discovery