12,755 research outputs found
DeepSF: deep convolutional neural network for mapping protein sequences to folds
Motivation
Protein fold recognition is an important problem in structural
bioinformatics. Almost all traditional fold recognition methods use sequence
(homology) comparison to indirectly predict the fold of a tar get protein based
on the fold of a template protein with known structure, which cannot explain
the relationship between sequence and fold. Only a few methods had been
developed to classify protein sequences into a small number of folds due to
methodological limitations, which are not generally useful in practice.
Results
We develop a deep 1D-convolution neural network (DeepSF) to directly classify
any protein se quence into one of 1195 known folds, which is useful for both
fold recognition and the study of se quence-structure relationship. Different
from traditional sequence alignment (comparison) based methods, our method
automatically extracts fold-related features from a protein sequence of any
length and map it to the fold space. We train and test our method on the
datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On
the independent testing dataset curated from SCOP2.06, the classification
accuracy is 77.0%. We compare our method with a top profile profile alignment
method - HHSearch on hard template-based and template-free modeling targets of
CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is
14.5%-29.1% higher than HHSearch on template-free modeling targets and
4.5%-16.7% higher on hard template-based modeling targets for top 1, 5, and 10
predicted folds. The hidden features extracted from sequence by our method is
robust against sequence mutation, insertion, deletion and truncation, and can
be used for other protein pattern recognition problems such as protein
clustering, comparison and ranking.Comment: 28 pages, 13 figure
- …