11 research outputs found

    Accurate Prediction of Protein Structural Class

    Because of the increasing gap between the data from sequencing and structural genomics, the accurate prediction of the structural class of a protein domain solely from the primary sequence has remained a challenging problem in structural biology. Traditional sequence-based predictors generally select several sequence features and then feed them directly into a classification program to identify the structural class. The current best sequence-based predictor achieved an overall accuracy of 74.1% when tested on 25PDB, a widely used non-homologous benchmark dataset. In the present work, we built a multiple linear regression (MLR) model to convert the 440-dimensional (440D) sequence feature vector extracted from the Position Specific Scoring Matrix (PSSM) of a protein domain to a 4-dimensional (4D) structural feature vector, which could then be used to predict the four major structural classes. We performed 10-fold cross-validation and jackknife tests of the method on a large non-homologous dataset containing 8,244 domains distributed among the four major classes. Our approach outperformed all of the existing sequence-based methods, with an overall accuracy of 83.1%, which is even higher than that of methods based on predicted secondary structure.
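The 440D-to-4D regression step described in this abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not say, that the 4D target is a one-hot class indicator and that the predicted class is the largest of the four regression outputs. The function names `fit_mlr` and `predict_class` are hypothetical, and synthetic Gaussian data stand in for real PSSM-derived features.

```python
import numpy as np

def fit_mlr(features, labels, n_classes=4):
    """Least-squares fit of a linear map from feature vectors to 4D targets.

    Assumption: targets are one-hot class indicators (not stated in the abstract).
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # intercept column
    Y = np.eye(n_classes)[labels]                               # one-hot targets
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def predict_class(W, features):
    """Map features to the 4D vector and pick the largest component (assumed rule)."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.argmax(X @ W, axis=1)

# Synthetic stand-in for 440D PSSM features: four well-separated clusters.
rng = np.random.default_rng(0)
n_per_class, dim = 200, 440
centers = rng.normal(size=(4, dim))
labels = np.repeat(np.arange(4), n_per_class)
features = centers[labels] + 0.1 * rng.normal(size=(labels.size, dim))

W = fit_mlr(features, labels)
pred = predict_class(W, features)
print("training accuracy:", (pred == labels).mean())
```

The accuracy printed here is measured on the training data of a toy problem and says nothing about the figures reported in the abstract; a faithful evaluation would use cross-validation on real domains.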

    Prediction of protein structural classes for low-homology sequences based on predicted secondary structure

    Background: Prediction of protein structural classes (α, β, α + β and α/β) from amino acid sequences is of great importance, as it benefits the study of protein function, regulation and interactions. Many methods have been developed for high-homology protein sequences, with prediction accuracies of up to 90%. However, for low-homology sequences whose average pairwise sequence identity lies between 20% and 40%, these methods perform relatively poorly, with prediction accuracy often below 60%. Results: We propose a new method to predict protein structural classes on the basis of features extracted from the predicted secondary structures of proteins rather than directly from their amino acid sequences. It first uses PSIPRED to predict the secondary structure for each protein sequence. Then, the chaos game representation is employed to represent the predicted secondary structure as two time series, from which we generate a comprehensive set of 24 features using recurrence quantification analysis, K-string based information entropy and segment-based analysis. The resulting feature vectors are finally fed into a simple yet powerful Fisher's discriminant algorithm for the prediction of protein structural classes. We tested the proposed method on three low-homology benchmark datasets and achieved overall prediction accuracies of 82.9%, 83.1% and 81.3%, respectively. Comparisons with ten existing methods showed that our method consistently performs better on all the tested datasets, with overall accuracy improvements ranging from 2.3% to 27.5%. A web server that implements the proposed method is freely available at http://www1.spms.ntu.edu.sg/~chenxin/RKS_PPSC/. Conclusion: The high prediction accuracy achieved by our proposed method is attributed to the design of a comprehensive feature set on the predicted secondary structure sequences, which is capable of characterizing the sequence order information, local interactions of the secondary structural elements, and spatial arrangements of α helices and β strands. Thus, it is a valuable method for predicting protein structural classes, particularly for low-homology amino acid sequences.
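One of the feature families named in this abstract, K-string based information entropy, can be sketched as follows. The abstract does not define it, so this sketch assumes it is the Shannon entropy of the overlapping length-k substrings of a predicted secondary structure string. The function name `k_string_entropy`, the H/E/C alphabet and the example string are all hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from math import log2

def k_string_entropy(ss, k):
    """Shannon entropy (in bits) of the overlapping length-k substrings of ss.

    Assumption: this is one plausible reading of "K-string based information
    entropy"; the source abstract gives no formula.
    """
    if k < 1 or k > len(ss):
        raise ValueError("k must be between 1 and len(ss)")
    counts = Counter(ss[i:i + k] for i in range(len(ss) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# Hypothetical predicted secondary structure (H = helix, E = strand, C = coil).
predicted_ss = "CCHHHHHHCCEEEEECCHHHHCC"
features = [k_string_entropy(predicted_ss, k) for k in (1, 2, 3)]
print(features)
```

Entropies for several values of k could then be concatenated with other features before classification, though which values of k the authors used is not stated here.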

    Comparative analysis of essential collective dynamics and NMR-derived flexibility profiles in evolutionarily diverse prion proteins

    Collective motions on ns-µs time scales are known to have a major impact on protein folding, stability, binding and enzymatic efficiency. It is also believed that these motions may have an important role in the early stages of prion protein misfolding and prion disease. In an effort to accurately characterize these motions and their potential influence on misfolding and prion disease transmissibility, we have conducted a combined analysis of molecular dynamics simulations and NMR-derived flexibility measurements over a diverse range of prion proteins. Using a recently developed numerical formalism, we have analyzed the essential collective dynamics (ECD) for prion proteins from eight different species: human, cow, elk, cat, hamster, chicken, turtle and frog. We also compared the numerical results with flexibility profiles generated by the random coil index (RCI) from NMR chemical shifts. Prion protein backbone flexibility profiles derived from experimental NMR data and from theoretical computations show strong agreement with each other, demonstrating that it is possible to predict the observed RCI profiles employing the numerical ECD formalism. Interestingly, flexibility differences in the loop between the second β strand (S2) and the second α helix (HB) appear to distinguish prion proteins from species that are susceptible to prion disease from those that are resistant. Our results show that the different levels of flexibility in the S2-HB loop in various species are predictable via the ECD method, indicating that ECD may be used to identify disease-resistant variants of prion proteins, as well as the influence of prion protein mutations on disease susceptibility or misfolding propensity.