Search CORE

30 research outputs found

How to find simple and accurate rules for viral protease cleavage specificities

Author: A Grakoui
A Kontijevskis
A Narayanan
A Urbani
AA Kolykhalov
AD Kwong
B Keil
BA Malcolm
BE Turk
BM Dunn
C Howson
CM Overall
Daniel Garwicz
DJC MacKay
E Berry
EB Fowlkes
H Eizert
H Neurath
HB Shen
I Schechter
Ian Jarman
IH Jarman
J Shi
JK Stoller
K Fujikawa
K Li
L You
L You
Liwen You
MR Attwood
NA Thornberry
O Schilling
Paulo JG Lisboa
R Bartenschlager
R Bartenschlager
R Rönn
R Zhang
RA Poorman
RE Stauber
SC Pettit
SH Yang
SM Best
SS Leinbach
SY Kim
T Rögnvaldsson
TA Etchells
Terence A Etchells
Thorsteinn Rögnvaldsson
X Hou
YH Kou
ZR Yang
ZR Yang
ZR Yang
ZR Yang
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Proteases of human pathogens are becoming increasingly important drug targets, hence it is necessary to understand their substrate specificity and to interpret this knowledge in practically useful ways. New methods are being developed that produce large amounts of cleavage information for individual proteases and some have been applied to extract cleavage rules from data. However, the hitherto proposed methods for extracting rules have been neither easy to understand nor very accurate. To be practically useful, cleavage rules should be accurate, compact, and expressed in an easily understandable way. Results A new method is presented for producing cleavage rules for viral proteases with seemingly complex cleavage profiles. The method is based on orthogonal search-based rule extraction (OSRE) combined with spectral clustering. It is demonstrated on substrate data sets for human immunodeficiency virus type 1 (HIV-1) protease and hepatitis C (HCV) NS3/4A protease, showing excellent prediction performance for both HIV-1 cleavage and HCV NS3/4A cleavage, agreeing with observed HCV genotype differences. New cleavage rules (consensus sequences) are suggested for HIV-1 and HCV NS3/4A cleavages. The practical usability of the method is also demonstrated by using it to predict the location of an internal cleavage site in the HCV NS3 protease and to correct the location of a previously reported internal cleavage site in the HCV NS3 protease. The method is fast to converge and yields accurate rules, on par with previous results for HIV-1 protease and better than previous state-of-the-art for HCV NS3/4A protease. Moreover, the rules are fewer and simpler than previously obtained with rule extraction methods. Conclusion A rule extraction methodology by searching for multivariate low-order predicates yields results that significantly outperform existing rule bases on out-of-sample data, but are more transparent to expert users. The approach yields rules that are easy to use and useful for interpreting experimental data.</p

Lund University Publications

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Högskolebiblioteket i Halmstad Publikationer

Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy

addresses: School of Biosciences, University of Exeter, Exeter EX4 5DE, UK. [email protected]: PMCID: PMC2777180types: Journal Article; Research Support, Non-U.S. Gov't© 2009 Yang; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Tyrosine sulfation is one of the most important posttranslational modifications. Due to its relevance to various disease developments, tyrosine sulfation has become the target for drug design. In order to facilitate efficient drug design, accurate prediction of sulfotyrosine sites is desirable. A predictor published seven years ago has been very successful with claimed prediction accuracy of 98%. However, it has a particularly low sensitivity when predicting sulfotyrosine sites in some newly sequenced proteins

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Research Exeter

Prediction of Protein Domain with mRMR Feature Selection and Analysis

Author: AA Schaffer
AG Murzin
AK Dunker
AM Moses
AP Elhammer
B Saffari
Bi-Qing Li
Bin Xue
BQ Li
CA Orengo
D Chivian
D Li
DE Kim
E Angov
EC Mbamala
G Pugalenthi
GP Zhou
GP Zhou
H Ingolfsson
H Mohabatkar
H Peng
HB Shen
HB Shen
I Walsh
ID Campbell
IH Witten
J Chen
J Cheng
J Cheng
J Cheng
J Eickholt
J Lin
J Liu
J Liu
J Wang
JD Qiu
JE Gewehr
JJ Chou
JR Schnell
K Peng
K Shameer
K Wang
Kai-Yan Feng
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KC Chou
KK Kandaswamy
Kuo-Chen Chou
L Breiman
L Chen
L Holm
Le-Le Hu
Lei Chen
M Esmaeili
M Hayat
M Suyama
MJ Berardi
MK Yoon
N Nagarajan
N von Ohsen
NM Goldenberg
P Mundra
P Tompa
P Wang
PE Wright
PK Nielsen
Q Gu
R Apweiler
R Bondugula
R Guerois
R Linding
RA George
RA Poorman
S Gong
S Kawashima
S Roy
SC Jia
SF Altschul
SM Reynolds
T Ebina
T Huang
TA Holland
W Li
W Zhao
WR Atchley
WZ Lin
X Xiao
X Xiao
X Xiao
X Xiao
X Xiao
X Xiao
X Xiao
Y Zhang
YD Cai
YD Li
Yu-Dong Cai
YX Li
Z He
Z Qiu
ZC Wu
ZC Wu
Publication venue: Public Library of Science
Publication date: 01/01/2012
Field of study

The domains are the structural and functional units of proteins. With the avalanche of protein sequences generated in the postgenomic age, it is highly desired to develop effective methods for predicting the protein domains according to the sequences information alone, so as to facilitate the structure prediction of proteins and speed up their functional annotation. However, although many efforts have been made in this regard, prediction of protein domains from the sequence information still remains a challenging and elusive problem. Here, a new method was developed by combing the techniques of RF (random forest), mRMR (maximum relevance minimum redundancy), and IFS (incremental feature selection), as well as by incorporating the features of physicochemical and biochemical properties, sequence conservation, residual disorder, secondary structure, and solvent accessibility. The overall success rate achieved by the new method on an independent dataset was around 73%, which was about 28–40% higher than those by the existing method on the same benchmark dataset. Furthermore, it was revealed by an in-depth analysis that the features of evolution, codon diversity, electrostatic charge, and disorder played more important roles than the others in predicting protein domains, quite consistent with experimental observations. It is anticipated that the new method may become a high-throughput tool in annotating protein domains, or may, at the very least, play a complementary role to the existing domain prediction methods, and that the findings about the key features with high impacts to the domain prediction might provide useful insights or clues for further experimental investigations in this area. Finally, it has not escaped our notice that the current approach can also be utilized to study protein signal peptides, B-cell epitopes, HIV protease cleavage sites, among many other important topics in protein science and biomedicine

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

FigShare