
Automatic structure classification of small proteins using random forest

Author: A Andreeva
AG Murzin
AV Levitin
C Hadley
CHQ Ding
E Ie
G Zhanga
H Shen
HM Berman
I Chung
I Melvin
IH Witten
J Cheng
J Wu
JE Gewehr
JF Gibrat
Jonathan D Hirst
JR Quinlan
K Chen
KC Chou
L Breiman
L Holm
L Kurgan
M Gerstein
MB Swindells
MTA Shamim
O Çamoğlu
P Baldi
P Han
P Jain
P Klein
Pooja Jain
S Kim
S Mile
S Vinga
SE Brenner
SE Hamby
SF Altschul
SP Kanaan
SS Krishna
U Hobohm
V Sam
W Kabsch
X Chen
XM Zhao
Y Cai
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract

Background: Random forest, an ensemble-based supervised machine learning algorithm, is used to predict the SCOP structural classification of a target structure from the similarity of its structural descriptors to those of a template structure with the same number of secondary structure elements (SSEs). An initial assessment of random forest is carried out on domains consisting of three SSEs. Its usability for classifying larger domains is then demonstrated by applying it to domains consisting of four, five and six SSEs.

Results: Random forest, trained on SCOP version 1.69, achieves a predictive accuracy of up to 94% on an independent, non-overlapping test set derived from SCOP version 1.73. For classification to the SCOP Class, Fold, Superfamily or Family levels, the predictive quality of the model, measured by the Matthews correlation coefficient (MCC), ranged from 0.61 to 0.83. As the number of constituent SSEs increases, the MCC for classification to the different structural levels decreases.

Conclusions: The utility of random forest in classifying domains from the place-holder classes of SCOP to the true Class, Fold, Superfamily or Family levels is demonstrated. Issues such as the introduction of a new structural level in SCOP and the merger of singleton levels can also be addressed using random forest. A real-world scenario is mimicked by predicting the classification of protein structures from the PDB that have yet to be assigned to the SCOP classification hierarchy.
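The abstract reports model quality as a Matthews correlation coefficient (MCC). As a point of reference, the binary MCC is (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below computes it per class in a one-vs-rest fashion. It is illustrative only: it assumes a one-vs-rest treatment of each SCOP label, which the abstract does not specify. The function names and the example labels are hypothetical and are not taken from the paper.

```python
import math
from typing import Hashable, Sequence


def mcc_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal total is zero. This is a common
    convention because the coefficient is otherwise undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mcc_one_vs_rest(y_true: Sequence[Hashable],
                    y_pred: Sequence[Hashable],
                    label: Hashable) -> float:
    """MCC for one class label, treating every other label as negative.

    Hypothetical helper: the per-class scheme is an assumption, not the
    paper's documented procedure.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == label and p == label:
            tp += 1          # correctly assigned to this class
        elif t != label and p != label:
            tn += 1          # correctly kept out of this class
        elif p == label:
            fp += 1          # predicted this class, but true class differs
        else:
            fn += 1          # true class, but predicted something else
    return mcc_binary(tp, tn, fp, fn)


if __name__ == "__main__":
    # Hypothetical SCOP class labels, for illustration only.
    true = ["a", "a", "b", "b", "c", "a"]
    pred = ["a", "b", "b", "b", "c", "a"]
    for cls in sorted(set(true)):
        print(cls, round(mcc_one_vs_rest(true, pred, cls), 3))
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). For this reason it is a stricter summary than raw accuracy when the class sizes are unbalanced.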

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Identifying anticancer peptides by using a generalized chaos game representation

Author: A Fiser
A Tyagi
B Liao
C Cortes
C Xu
CC Chang
D Suna
E Paradis
F Sievers
FM Li
G Chang
G Fang
G Wang
H Nakashima
H Wu
HJ Jeffrey
HJ Yu
HS Chan
JD Thompson
Jiaguo Liu
JS Almeida
JY Shi
JY Yang
K Chen
KA Dill
KC Chou
L Zhang
Li Ge
M Randić
Matthias Dehmer
MJ Ford
MTA Shamim
N Saitou
O Robinson
P Deschavanne
P He
P Welch
PA He
Pa He
PJ Deschavanne
R Singh
Ry Luo
S Basu
S Matsuda
SS Sahu
SST Yau
T Hoang
W Chen
W Lam
W Li
W Tanchotsrinon
WM Fitch
Y Liu
YH Yao
Yusen Zhang
YZ Chen
Z Hajisharifi
Z Mu
ZG Yu
Publication venue: Springer Science and Business Media LLC

Crossref