Prediction of catalytic residues using Support Vector Machine with selected protein sequence and structural properties

Author: A Andreeva
A Gutteridge
AH Elcock
AR Panchenko
B Lee
B Rost
BW Mathews
CA Innis
Cathy H Wu
CH Wu
DK Smith
GJ Bartlett
H Yao
HM Berman
IH Witten
JC Platt
JD Thompson
JS Milton
K Kinoshita
K Sjolander
M Ota
MA Hearst
MJ Ondrechen
Natalia V Petrova
O Lichtarge
P Aloy
PP Wangikar
R Kohavi
R Koradi
R Landgraf
RL Tatusov
S Chakravarty
S Jones
S Parthasarathy
S Zhu
SF Altschul
SJ Campbell
SJ Hubbard
TA Binkowski
W Kabsch
W Tian
WSJ Valdar
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The number of protein sequences deriving from genome sequencing projects is outpacing our knowledge about the function of these proteins. With the gap between experimentally characterized and uncharacterized proteins continuing to widen, it is necessary to develop new computational methods and tools for functional prediction. Knowledge of catalytic sites provides a valuable insight into protein function. Although many computational methods have been developed to predict catalytic residues and active sites, their accuracy remains low, with a significant number of false positives. In this paper, we present a novel method for the prediction of catalytic sites, using a carefully selected, supervised machine learning algorithm coupled with an optimal discriminative set of protein sequence conservation and structural properties.
RESULTS: To determine the best machine learning algorithm, 26 classifiers in the WEKA software package were compared using a benchmarking dataset of 79 enzymes with 254 catalytic residues in a 10-fold cross-validation analysis. Each residue of the dataset was represented by a set of 24 residue properties previously shown to be of functional relevance, as well as a label {+1/-1} to indicate catalytic/non-catalytic residue. The best-performing algorithm was the Sequential Minimal Optimization (SMO) algorithm, which is a Support Vector Machine (SVM). The Wrapper Subset Selection algorithm further selected seven of the 24 attributes as an optimal subset of residue properties, with sequence conservation, catalytic propensities of amino acids, and relative position on protein surface being the most important features.
CONCLUSION: The SMO algorithm with 7 selected attributes correctly predicted 228 of the 254 catalytic residues, with an overall predictive accuracy of more than 86%. Missing only 10.2% of the catalytic residues, the method captures the fundamental features of catalytic residues and can be used as a "catalytic residue filter" to facilitate experimental identification of catalytic residues for proteins with known structure but unknown function.
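The two figures quoted for catalytic residues (228 of 254 found, 10.2% missed) are two views of the same ratio. A minimal Python sketch of that arithmetic follows; the function name `recall_and_miss_rate` is illustrative and not taken from the paper. The separate "more than 86%" overall accuracy also depends on non-catalytic residue counts that the abstract does not give, so it is not reproduced here.

```python
def recall_and_miss_rate(true_positives: int, total_positives: int) -> tuple[float, float]:
    """Return (recall, miss rate) as fractions of the actual positives."""
    if total_positives <= 0:
        raise ValueError("total_positives must be positive")
    recall = true_positives / total_positives
    return recall, 1.0 - recall


# Counts quoted in the abstract: 228 of 254 catalytic residues predicted.
recall, miss = recall_and_miss_rate(228, 254)
print(f"recall = {recall:.1%}, missed = {miss:.1%}")
# prints "recall = 89.8%, missed = 10.2%"
```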

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

Author: Akin
Altschul
Andersen
Andreasen
Aurilia
Barnum
Bartolomé
Bendtsen
Benner
Benoit
Bhasin
Blum
Cai
Castanares
Chang
Choi
Crepin
D.B.R.K. Gupta Udatha
Dodd
Donaghy
Dudoit
Dysvik
Ewing
Faulds
Ferguson
Fillingham
Finn
Garcia-Conesa
García-Conesa
Garrigues
Gasteiger
Gianni Panagiotou
Giuliani
Goldstone
Hall
Han
Hatzakis
Henikoff
Hermoso
Hsu
Humberstone
Huson
Irene Kouskoumvekaki
Kaiser
Karchin
Keerthi
Kheder
Kikuzaki
Kim
Kohavi
Kohonen
Koseki
Kroon
Kumar
Lao
Larkin
Laszlo
Latha
Lee
Lesage-Meessen
Levasseur
Li
Lima
Lisbeth Olsson
MacKay
Marcotte
McAuley
Meinicke
Morris
Mukherjee
Nielsen
Noble
Nsereko
Oili
Ong
Platt
Prates
Pérez-Bercoff
Rashamuse
Record
Rost
Sancho
Sankararaman
Schrödinger Suite 2009
Schubot
Slavin
Tarbouriech
Teodoro
Thompson
Tomoko
Topakas
Tsuchiyama
Uestuen
Vafiadi
Wang
Wilkinson
Publication date: 11/08/2010

One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in the biofuel, medicine and food industries due to their ability to act on a wide range of substrates, cleaving ester bonds and synthesizing high-added-value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted to sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. A total of 365 FAE-related sequences of fungal, bacterial and plant origin were collected and clustered into distinct groups using Self Organizing Maps followed by k-means clustering, based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequences. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs.
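The clustering described above starts from descriptors computed directly from each sequence. The abstract does not specify the exact descriptor set, so the Python sketch below shows only the simplest plausible case, a 20-dimensional amino acid composition vector. The name `amino_acid_composition` and the toy sequence are assumptions for illustration, and the Self Organizing Map and k-means steps are not implemented here.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code; the ordering is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard letters (gaps, X, and so on)
    are ignored rather than counted.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Toy sequence, not a real FAE: half alanine, half cysteine.
vector = amino_acid_composition("AACC")
print(vector[:2])  # fractions for A and C
```

Each sequence becomes a fixed-length vector whose entries sum to 1, which is the kind of input a clustering method can compare across sequences of different lengths.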

Crossref

Chalmers Research

Nature Precedings

Online Research Database In Technology

Chalmers Publication Library

HKU Scholars Hub

Identification of functionally related enzymes by learning-to-rank methods

Author: Airola Antti
De Baets Bernard
Fober Thomas
Glinca Serghei
Hüllermeier Eyke
Klebe Gerhard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication date: 01/01/2014

Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

On the Structural Context and Identification of Enzyme Catalytic Residues

Publication venue: Hindawi Limited
Publication date: 01/01/2013

Crossref

Novel application of query-based qualitative predictors for characterization of solvent accessible residues in conjunction with protein sequence homology. Proceedings of the 22nd International Workshop on Database and Expert Systems Applications

Author: Gholizadeh S
Lau R
Lustig Brooke
Mishra R
Nepal R
Rose D
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2011

SJSU ScholarWorks

Computational approaches to predict protein functional families and functional sites.

Author: Abbasian M
Orengo CA
Rauer C
Sen N
Waman VP
Publication date: 01/10/2021

Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.

UCL Discovery

L1pred: A Sequence-Based Prediction Tool for Catalytic Residues in Enzymes with the L1-logreg Classifier

Author: A Armon
A del Sol Mesa
A Gutteridge
AR Panchenko
B Sterner
C Berezin
C Marino Buslje
C Porter
CA Innis
Chi Zhang
D La
DR Caffrey
E Chea
E Cilia
E Greenshtein
E Youn
F Glaser
G Lopez
GJ Bartlett
HM Berman
I Mayrose
I Mihalek
IA Vergara
Iddo Friedberg
J Capra
J Pei
JD Fischer
Jialiang Yang
Jun Wang
K Koh
K Wang
K Ye
KC Bahadur Dukka
L Mirny
LJ McGuffin
M Brylinski
M Landau
N Petrova
P Zhao
R Alterovitz
RM Sweet
RM Williamson
S Ahmad
S Gong
S Pande
S Sankararaman
SA van de Geer
SF Altschul
SW Zhang
T Kato
T Zhang
W Taylor
W Tong
W Valdar
XS Liu
YC Dou
Yongchao Dou
YR Tang
ZP Liu
Publication venue: Public Library of Science
Publication date: 01/01/2012

To understand enzyme functions, identifying the catalytic residues is a usual first step. Moreover, knowledge about catalytic residues is also useful for protein engineering and drug design. However, identifying catalytic residues experimentally remains challenging for reasons of time and cost. Therefore, computational methods have been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, by using the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on carefully designed datasets Data604 and Data63. With ten-fold cross-validation on the training dataset, Data604, L1pred achieved an area under the precision-recall curve (AUPR) of 0.2198 and an area under the ROC curve (AUC) of 0.9494. On the independent test dataset, Data63, it achieved AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm, and found that all the scores contributed more or less equally to the L1pred performance.
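The AUC values quoted above can be read as the probability that a randomly chosen catalytic residue receives a higher score than a randomly chosen non-catalytic one. The Python sketch below computes AUC directly from that pairwise definition. The function name `roc_auc` and the toy labels and scores are made up for illustration and are not L1pred outputs.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for positives (e.g. catalytic) and 0 for negatives.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: two positives, three negatives; one negative outranks one positive.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(roc_auc(toy_labels, toy_scores), 4))  # 5 of 6 pairs ordered correctly
```

This quadratic-time form is chosen for clarity; it is adequate for small examples but a rank-based computation is the usual choice for large residue sets.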

CiteSeerX

Public Library of Science (PLOS)

Crossref

DigitalCommons@University of Nebraska

Directory of Open Access Journals

PubMed Central

Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle

Author: Ambrosone
Bairoch
Bornberg-Bauer
Bourhis
Cai
Caporale
Cheng
Coeytaux
Csizmók
Doszatányi
Dosztányi
Dunker
Dyson
D’Ambrosio
Esposito
Ferron
Gaboriaud
Galzitskaya
Garbuzynskiy
Harris
Hirose
Ialenti
Kandala
Kyte
Li
Lin
Linding
Liu
Lupas
MacCallum
McDonald
Metafora
Miele
Murzin
Obradovic
Ostrowski
Pan
Prilusky
Quevillon-Cheruel
Radivojac
Ragone
Romero
Rüping
Shimizu
Sickmeier
Stiuso
Tompa
Tufano
Uversky
Vucetic
Ward
Weathers
Wolf
Wootton
Wright
Yang
Publication date: 06/12/2007

The potent immunomodulatory, anti-inflammatory and procoagulant properties of the protein no. 4 secreted from the rat seminal vesicle epithelium (SV-IV) have been previously found to be modulated by a supramolecular monomer-trimer equilibrium. More structural details that integrate experimental data into a predictive framework have recently been reported. Unfortunately, homology modelling and fold-recognition strategies were not successful in creating a theoretical model of the structural organization of SV-IV. It was inferred that the global structure of SV-IV is not similar to any protein of known three-dimensional structure. Reversing the classical approach to the sequence-structure-function paradigm, in this paper we report on novel information obtained by comparing physicochemical parameters of SV-IV with two datasets made of intrinsically unfolded and ideally globular proteins. In addition, we have analysed the SV-IV sequence with several publicly available disorder-oriented predictors. Overall, disorder predictions and a re-examination of existing experimental data strongly suggest that SV-IV needs large plasticity to efficiently interact with the different targets that characterize its multifaceted biological function and should therefore be better classified as an intrinsically disordered protein.

CiteSeerX

Crossref

Archivio della Ricerca - Università di Salerno

Open Access Repository

Nature Precedings

Automatic prediction of catalytic residues by modeling residue structural neighborhood

Author: A Ceroni
A Humm
A Yamaguchi
AC Wallace
AE Todd
Andrea Passerini
CT Porter
E Chea
E Webb
E Youn
EF Pettersen
Elisa Cilia
G Amitai
G Bartlett
J Bernardes
J Davis
J Ebert
J Mistry
JA Capra
JC Nebel
JD Fischer
KM Borgwardt
L Xie
M Babor
M Lippi
M Ondrechen
MM Benning
N Cristianini
N Nagano
N Shu
NV Petrova
P Gherardini
RD Finn
S Kawashima
SF Altschul
T Joachims
T Zhang
W Tong
WS Valdar
Y Tang
Y Wei
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Prediction of catalytic residues is a major step in characterizing the function of enzymes. In its simplest formulation, the problem can be cast as a binary classification task at the residue level, by predicting whether the residue is directly involved in the catalytic process. The task remains hard even when structural information is available, due to the rather wide range of roles a functional residue can play and to the large imbalance between the number of catalytic and non-catalytic residues.
Results: We developed an effective representation of structural information by modeling spherical regions around candidate residues, and extracting statistics on the properties of their content such as physico-chemical properties, atomic density, flexibility, and presence of water molecules. We trained an SVM classifier combining our features with sequence-based information and previously developed 3D features, and compared its performance with the most recent state-of-the-art approaches on different benchmark datasets. We further analyzed the discriminant power of the information provided by the presence of heterogens in the residue neighborhood.
Conclusions: Our structure-based method achieves consistent improvements on all tested datasets over both sequence-based and structure-based state-of-the-art approaches. Structural neighborhood information is shown to be responsible for such results, and predicting the presence of nearby heterogens seems to be a promising direction for further improvements.
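The spherical-neighborhood idea can be illustrated with one of the listed statistics, atomic density. The Python sketch below is not the authors' implementation: the function names `neighbors_within` and `atomic_density`, the 3.5 radius, and the coordinates are all assumptions chosen for illustration, and the abstract does not state which radii or atom selections the method uses.

```python
import math

Point = tuple[float, float, float]


def neighbors_within(center: Point, atoms: list[Point], radius: float) -> list[Point]:
    """Return the atoms whose Euclidean distance to `center` is at most `radius`."""
    return [atom for atom in atoms if math.dist(center, atom) <= radius]


def atomic_density(center: Point, atoms: list[Point], radius: float) -> float:
    """Number of atoms inside the sphere divided by the sphere's volume."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    volume = (4.0 / 3.0) * math.pi * radius ** 3
    return len(neighbors_within(center, atoms, radius)) / volume


# Made-up coordinates; the first atom sits at the sphere centre and is counted too.
toy_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0), (5.0, 5.0, 5.0)]
inside = neighbors_within((0.0, 0.0, 0.0), toy_atoms, 3.5)
print(len(inside))  # 3 atoms fall inside the sphere; the fourth is too far away
```

Other statistics named in the abstract (physico-chemical properties, flexibility, water molecules) would be aggregated over the same set returned by `neighbors_within`, given per-atom annotations that this sketch does not model.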

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

DI-fusion