5,968 research outputs found

    Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model

    Full text link
    Recently exciting progress has been made on protein contact prediction, but the predicted contacts for proteins without many sequence homologs is still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual networks. This deep neural network allows us to model very complex sequence-contact relationship as well as long-range inter-contact correlation. Our method greatly outperforms existing contact prediction methods and leads to much more accurate contact-assisted protein folding. Tested on three datasets of 579 proteins, the average top L long-range prediction accuracy obtained our method, the representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints can yield correct folds (i.e., TMscore>0.6) for 203 test proteins, while that using MetaPSICOV- and CCMpred-predicted contacts can do so for only 79 and 62 proteins, respectively. Further, our contact-assisted models have much better quality than template-based models. Using our predicted contacts as restraints, we can (ab initio) fold 208 of the 398 membrane proteins with TMscore>0.5. By contrast, when the training proteins of our method are used as templates, homology modeling can only do so for 10 of them. One interesting finding is that even if we do not train our prediction models with any membrane proteins, our method works very well on membrane protein prediction. Finally, in recent blind CAMEO benchmark our method successfully folded 5 test proteins with a novel fold

    Distance-based Protein Folding Powered by Deep Learning

    Full text link
    Contact-assisted protein folding has made very good progress, but two challenges remain. One is accurate contact prediction for proteins lack of many sequence homologs and the other is that time-consuming folding simulation is often needed to predict good 3D models from predicted contacts. We show that protein distance matrix can be predicted well by deep learning and then directly used to construct 3D models without folding simulation at all. Using distance geometry to construct 3D models from our predicted distance matrices, we successfully folded 21 of the 37 CASP12 hard targets with a median family size of 58 effective sequence homologs within 4 hours on a Linux computer of 20 CPUs. In contrast, contacts predicted by direct coupling analysis (DCA) cannot fold any of them in the absence of folding simulation and the best CASP12 group folded 11 of them by integrating predicted contacts into complex, fragment-based folding simulation. The rigorous experimental validation on 15 CASP13 targets show that among the 3 hardest targets of new fold our distance-based folding servers successfully folded 2 large ones with <150 sequence homologs while the other servers failed on all three, and that our ab initio folding server also predicted the best, high-quality 3D model for a large homology modeling target. Further experimental validation in CAMEO shows that our ab initio folding server predicted correct fold for a membrane protein of new fold with 200 residues and 229 sequence homologs while all the other servers failed. These results imply that deep learning offers an efficient and accurate solution for ab initio folding on a personal computer

    DeepSF: deep convolutional neural network for mapping protein sequences to folds

    Get PDF
    Motivation Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a tar get protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods had been developed to classify protein sequences into a small number of folds due to methodological limitations, which are not generally useful in practice. Results We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein se quence into one of 1195 known folds, which is useful for both fold recognition and the study of se quence-structure relationship. Different from traditional sequence alignment (comparison) based methods, our method automatically extracts fold-related features from a protein sequence of any length and map it to the fold space. We train and test our method on the datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 77.0%. We compare our method with a top profile profile alignment method - HHSearch on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 14.5%-29.1% higher than HHSearch on template-free modeling targets and 4.5%-16.7% higher on hard template-based modeling targets for top 1, 5, and 10 predicted folds. The hidden features extracted from sequence by our method is robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking.Comment: 28 pages, 13 figure

    Single Muscle Fiber Proteomics Reveals Fiber-Type-Specific Features of Human Muscle Aging

    Get PDF
    Skeletal muscle is a key tissue in human aging, which affects different muscle fiber types unequally. We developed a highly sensitive single muscle fiber proteomics workflow to study human aging and show that the senescence of slow and fast muscle fibers is characterized by diverging metabolic and protein quality control adaptations. Whereas mitochondrial content declines with aging in both fiber types, glycolysis and glycogen metabolism are upregulated in slow but downregulated in fast muscle fibers. Aging mitochondria decrease expression of the redox enzyme monoamine oxidase A. Slow fibers upregulate a subset of actin and myosin chaperones, whereas an opposite change happens in fast fibers. These changes in metabolism and sarcomere quality control may be related to the ability of slow, but not fast, muscle fibers to maintain their mass during aging. We conclude that single muscle fiber analysis by proteomics can elucidate pathophysiology in a sub-type-specific manner
    • …
    corecore