Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model
Recently, exciting progress has been made on protein contact prediction, but
the predicted contacts for proteins without many sequence homologs are still of
low quality and not very useful for de novo structure prediction. This paper
presents a new deep learning method that predicts contacts by integrating both
evolutionary coupling (EC) and sequence conservation information through an
ultra-deep neural network formed by two deep residual networks. This deep
neural network allows us to model very complex sequence-contact relationships as
well as long-range inter-contact correlation. Our method greatly outperforms
existing contact prediction methods and leads to much more accurate
contact-assisted protein folding. Tested on three datasets of 579 proteins, the
average top L long-range prediction accuracy obtained by our method, the
representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21
and 0.30, respectively; the average top L/10 long-range accuracy of our method,
CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding
using our predicted contacts as restraints can yield correct folds (i.e.,
TMscore>0.6) for 203 test proteins, while that using MetaPSICOV- and
CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively.
Further, our contact-assisted models have much better quality than
template-based models. Using our predicted contacts as restraints, we can (ab
initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast,
when the training proteins of our method are used as templates, homology
modeling can only do so for 10 of them. One interesting finding is that even if
we do not train our prediction models with any membrane proteins, our method
works very well on membrane protein prediction. Finally, in the recent blind CAMEO
benchmark, our method successfully folded 5 test proteins with a novel fold.
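The "top L" and "top L/10" long-range accuracies quoted above can be illustrated with a minimal sketch. It assumes the common convention that a residue pair is long-range when its sequence separation is at least 24; the abstract does not state the threshold, and the function name and arguments are hypothetical.

```python
def top_long_range_accuracy(pred, true_contacts, k=1, min_sep=24):
    """Fraction of the top L/k predicted long-range pairs that are true contacts.

    pred: L x L nested list of predicted contact probabilities.
    true_contacts: set of (i, j) pairs with i < j that are native contacts.
    min_sep: minimum |i - j| counted as long-range (24 is an assumed
             convention, not taken from the abstract).
    """
    L = len(pred)
    # Collect every long-range residue pair once (i < j) with its score.
    candidates = [
        (pred[i][j], i, j)
        for i in range(L)
        for j in range(i + min_sep, L)
    ]
    # Rank by predicted probability, highest first.
    candidates.sort(key=lambda t: t[0], reverse=True)
    # Keep the top L/k pairs (at least one).
    top = candidates[:max(1, L // k)]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if (i, j) in true_contacts)
    return hits / len(top)
```

Calling the function with k=1 gives the top L accuracy and with k=10 the top L/10 accuracy for a single protein; the figures in the abstract are averages over test proteins.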
Distance-based Protein Folding Powered by Deep Learning
Contact-assisted protein folding has made very good progress, but two
challenges remain. One is accurate contact prediction for proteins lacking many
sequence homologs and the other is that time-consuming folding simulation is
often needed to predict good 3D models from predicted contacts. We show that
the protein distance matrix can be predicted well by deep learning and then
directly used to construct 3D models without folding simulation at all. Using
distance geometry to construct 3D models from our predicted distance matrices,
we successfully folded 21 of the 37 CASP12 hard targets with a median family
size of 58 effective sequence homologs within 4 hours on a Linux computer of 20
CPUs. In contrast, contacts predicted by direct coupling analysis (DCA) cannot
fold any of them in the absence of folding simulation and the best CASP12 group
folded 11 of them by integrating predicted contacts into complex,
fragment-based folding simulation. The rigorous experimental validation on 15
CASP13 targets shows that, among the 3 hardest targets with new folds, our
distance-based folding servers successfully folded 2 large ones with <150
sequence homologs while the other servers failed on all three, and that our ab
initio folding server also predicted the best, high-quality 3D model for a
large homology modeling target. Further experimental validation in CAMEO shows
that our ab initio folding server predicted the correct fold for a membrane protein
of a new fold with 200 residues and 229 sequence homologs, while all the other
servers failed. These results imply that deep learning offers an efficient and
accurate solution for ab initio folding on a personal computer.
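To illustrate how coordinates can be built directly from a distance matrix without folding simulation, here is a minimal sketch using classical multidimensional scaling. This is a textbook technique swapped in for illustration, not necessarily the distance-geometry procedure the authors used; it assumes a complete, symmetric matrix of point distances, and the function name is hypothetical.

```python
import numpy as np

def coords_from_distances(D, dim=3):
    """Recover coordinates (up to rotation, translation and reflection)
    from a complete symmetric matrix of pairwise distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    # The largest `dim` eigenpairs of the Gram matrix give the embedding.
    eigvals, eigvecs = np.linalg.eigh(G)
    idx = np.argsort(eigvals)[::-1][:dim]
    # Clip small negative eigenvalues that noisy distances can produce.
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale
```

Distances alone cannot distinguish a structure from its mirror image, so a sketch like this leaves chirality undetermined, and a real pipeline would also have to cope with noisy or uncertain predicted distances.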
DeepSF: deep convolutional neural network for mapping protein sequences to folds
Motivation
Protein fold recognition is an important problem in structural
bioinformatics. Almost all traditional fold recognition methods use sequence
(homology) comparison to indirectly predict the fold of a target protein based
on the fold of a template protein with known structure, which cannot explain
the relationship between sequence and fold. Only a few methods have been
developed to classify protein sequences into a small number of folds due to
methodological limitations, which are not generally useful in practice.
Results
We develop a deep 1D-convolution neural network (DeepSF) to directly classify
any protein sequence into one of 1195 known folds, which is useful for both
fold recognition and the study of sequence-structure relationship. Different
from traditional sequence alignment (comparison) based methods, our method
automatically extracts fold-related features from a protein sequence of any
length and maps it to the fold space. We train and test our method on the
datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On
the independent testing dataset curated from SCOP2.06, the classification
accuracy is 77.0%. We compare our method with a top profile-profile alignment
method, HHSearch, on hard template-based and template-free modeling targets of
CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is
14.5%-29.1% higher than HHSearch on template-free modeling targets and
4.5%-16.7% higher on hard template-based modeling targets for top 1, 5, and 10
predicted folds. The hidden features extracted from sequence by our method are
robust against sequence mutation, insertion, deletion and truncation, and can
be used for other protein pattern recognition problems such as protein
clustering, comparison and ranking.
Comment: 28 pages, 13 figures
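The idea of mapping a sequence of any length to a fixed set of fold classes can be sketched with one 1D convolution followed by global max pooling. This is a toy illustration under stated assumptions, not the DeepSF architecture: it uses one-hot amino acid input only, a single convolution layer, random untrained weights, and hypothetical function names. Only the 1195 output classes come from the abstract.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a sequence of standard amino acids as a (length, 20) array."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AMINO_ACIDS.index(aa)] = 1.0
    return x

def conv1d_relu(x, W, b):
    """'Valid' 1D convolution over sequence positions followed by ReLU.
    x: (length, in_ch); W: (width, in_ch, out_ch); b: (out_ch,)."""
    width = W.shape[0]
    n_out = x.shape[0] - width + 1
    out = np.stack([
        np.tensordot(x[i:i + width], W, axes=([0, 1], [0, 1])) + b
        for i in range(n_out)
    ])
    return np.maximum(out, 0.0)

def fold_probabilities(seq, W, b, W_out, b_out):
    """Return a probability vector over fold classes for one sequence."""
    h = conv1d_relu(one_hot(seq), W, b)   # (length - width + 1, filters)
    pooled = h.max(axis=0)                # fixed size regardless of length
    logits = pooled @ W_out + b_out
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

# Random (untrained) weights, for shape illustration only.
rng = np.random.default_rng(0)
N_FOLDS = 1195                            # number of folds from the abstract
W = rng.normal(size=(5, 20, 8))           # filter width 5, 8 filters (assumed)
b = np.zeros(8)
W_out = rng.normal(size=(8, N_FOLDS))
b_out = np.zeros(N_FOLDS)

p_short = fold_probabilities("MKTAYIAKQR", W, b, W_out, b_out)
p_long = fold_probabilities("MKTAYIAKQRQISFVKSHFSRQ", W, b, W_out, b_out)
```

Because the pooling step collapses the position axis, both calls return a vector of the same size even though the inputs differ in length. The sequence must be at least as long as the filter width.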
Single Muscle Fiber Proteomics Reveals Fiber-Type-Specific Features of Human Muscle Aging
Skeletal muscle is a key tissue in human aging, which affects different muscle fiber types unequally. We developed a highly sensitive single muscle fiber proteomics workflow to study human aging and show that the senescence of slow and fast muscle fibers is characterized by diverging metabolic and protein quality control adaptations. Whereas mitochondrial content declines with aging in both fiber types, glycolysis and glycogen metabolism are upregulated in slow but downregulated in fast muscle fibers. Aging mitochondria decrease expression of the redox enzyme monoamine oxidase A. Slow fibers upregulate a subset of actin and myosin chaperones, whereas an opposite change happens in fast fibers. These changes in metabolism and sarcomere quality control may be related to the ability of slow, but not fast, muscle fibers to maintain their mass during aging. We conclude that single muscle fiber analysis by proteomics can elucidate pathophysiology in a sub-type-specific manner.