72 research outputs found

    Protein Structure Prediction: Knowledge-based Approaches for Loop Prediction and Model Quality Assessment

    Knowledge of the three-dimensional structure of proteins is of vital importance for understanding their function and for the rational development of new drugs. Homology modelling is currently the most successful method for the prediction of the structure of a protein from its sequence. A structural model is thereby built by incorporating information from experimentally solved proteins showing an evolutionary relationship to the target protein. The accurate prediction of loop regions which frequently contribute to the functional specificity of proteins as well as the assessment of the quality of the models are major determinants of the applicability of the generated models in order to answer biological questions. The modelling pipeline established in the course of this work is able to produce very accurate models as shown in a recent community-wide blind test experiment: From 18 processed protein structure prediction test cases, 3 very good models have been submitted (rank 2, 4 and 6 of over 130 participating groups) and the vast majority of the remaining models was above the community average. The loop modelling routine relies on a comprehensive database of fragments extracted from known protein structures. After the selection of fragments from the database, a variety of filters are applied in order to reduce the number of fragments. In contrast to other knowledge-based loop prediction methods described in the literature, which mostly perform a ranking based on the geometrical fit of the fragments to the anchor groups in the protein, the present method ranks the remaining candidates with an all-atom statistical potential scoring function which investigates the compatibility of the loop including side chains with its structural environment. 
On a large test set of over 200 loops, the loop prediction method is able to model loops with median root mean square deviation per loop length below 1 angstrom for loops up to a length of 7 residues if all fragments originating from proteins sharing more than 50% sequence identity to the proteins of the test set are excluded. On the same data set, the present method outperforms 3 out of 4 commercial loop modelling programs tested in this work. Furthermore, a composite scoring function is presented that consists of 3 statistical potential terms covering the major aspects of protein stability and two additional terms describing the agreement between predicted features of the sequence and calculated characteristics of the model. The scoring function performs significantly better than five well-established methods in the discrimination of good from bad models based on a comprehensive test set of 22,420 models and represents a valuable tool for the assessment of the quality of protein models.
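
The evaluation measure named in this abstract, the median root mean square deviation (RMSD) per loop length, can be illustrated with a minimal sketch. It assumes that predicted and native loop atoms are already paired and expressed in the same coordinate frame (no superposition is performed); the function names and the toy coordinates are hypothetical and are not taken from the work itself.

```python
import math
from collections import defaultdict
from statistics import median


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points, in angstroms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def median_rmsd_by_length(results):
    """Group (loop_length, rmsd) pairs by loop length and take the median of each group."""
    by_length = defaultdict(list)
    for length, value in results:
        by_length[length].append(value)
    return {length: median(values) for length, values in sorted(by_length.items())}


# Hypothetical three-atom fragment: one atom is displaced by 1 angstrom.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 0.0)]
loop_rmsd = rmsd(native, predicted)

# Hypothetical per-loop results as (loop length, RMSD) pairs.
summary = median_rmsd_by_length([(4, 0.4), (4, 0.6), (4, 0.9), (7, 0.8), (7, 1.2)])
```

Reporting the median rather than the mean keeps a few badly modelled loops from dominating the summary for a given loop length.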

    Toward the estimation of the absolute quality of individual protein structure models

    Motivation: Quality assessment of protein structures is an important part of experimental structure validation and plays a crucial role in protein structure prediction, where the predicted models may contain substantial errors. Most current scoring functions are primarily designed to rank alternative models of the same sequence supporting model selection, whereas the prediction of the absolute quality of an individual protein model has received little attention in the field. However, reliable absolute quality estimates are crucial to assess the suitability of a model for specific biomedical applications. Results: In this work, we present a new absolute measure for the quality of protein models, which provides an estimate of the ‘degree of nativeness’ of the structural features observed in a model and describes the likelihood that a given model is of comparable quality to experimental structures. Model quality estimates based on the QMEAN scoring function were normalized with respect to the number of interactions. The resulting scoring function is independent of the size of the protein and may therefore be used to assess both monomers and entire oligomeric assemblies. Model quality scores for individual models are then expressed as ‘Z-scores’ in comparison to scores obtained for high-resolution crystal structures. We demonstrate the ability of the newly introduced QMEAN Z-score to detect experimentally solved protein structures containing significant errors, as well as to evaluate theoretical protein models. In a comprehensive QMEAN Z-score analysis of all experimental structures in the PDB, membrane proteins accumulate on one side of the score spectrum and thermostable proteins on the other. Proteins from the thermophilic organism Thermotoga maritima received significantly higher QMEAN Z-scores in a pairwise comparison with their homologous mesophilic counterparts, underlining the significance of the QMEAN Z-score as an estimate of protein stability.
Availability: The Z-score calculation has been integrated in the QMEAN server available at: http://swissmodel.expasy.org/qmean. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
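
The Z-score idea in this abstract, expressing a model's score relative to the scores of high-resolution crystal structures, can be sketched as below. This is a generic standard-score calculation under stated assumptions, not the server's implementation: the reference scores are made-up numbers, and the normalization by number of interactions mentioned in the abstract is assumed to have been applied already.

```python
from statistics import mean, stdev


def z_score(model_score, reference_scores):
    """Number of standard deviations by which a model's score differs from the reference mean."""
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)  # sample standard deviation of the reference set
    return (model_score - mu) / sigma


# Hypothetical normalized scores of high-resolution reference structures.
reference = [0.70, 0.75, 0.80, 0.85, 0.90]

# A hypothetical model scoring well below the reference mean gets a strongly negative Z-score.
model_z = z_score(0.60, reference)
```

A Z-score near zero indicates a score typical of the reference structures, while a strongly negative value flags a model (or an experimental structure) whose features look unlike those of the reference set.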

    QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information

    ABSTRACT: BACKGROUND: The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus. RESULTS: Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient between predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins, each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations.
We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach. CONCLUSION: Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust depends strongly on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
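
The combination described in this abstract, ranking models with a single-model score first and then computing the structural consensus only against the top-ranked subset, can be sketched as follows. This is a simplified illustration, not the published QMEANclust algorithm: the unweighted mean, the `keep_fraction` parameter, the function name and the toy numbers are all assumptions, and the pairwise similarities stand in for a structural measure such as GDT_TS scaled to [0, 1].

```python
def prefiltered_consensus(single_model_scores, similarity, keep_fraction=0.5):
    """Score each model by its mean similarity to the best-ranked subset of models.

    single_model_scores: one score per model (higher is better).
    similarity: symmetric matrix, similarity[i][j] in [0, 1] between models i and j.
    """
    n = len(single_model_scores)
    if n < 2:
        raise ValueError("at least two models are needed for a consensus score")
    # Keep at least two reference models so every model has a partner other than itself.
    n_keep = min(n, max(2, round(n * keep_fraction)))
    ranked = sorted(range(n), key=lambda i: single_model_scores[i], reverse=True)
    reference = ranked[:n_keep]
    scores = []
    for i in range(n):
        partners = [j for j in reference if j != i]  # never compare a model with itself
        scores.append(sum(similarity[i][j] for j in partners) / len(partners))
    return scores


# Hypothetical ensemble: models 0 and 1 score well individually and resemble each other.
single_scores = [0.8, 0.7, 0.3, 0.2]
pairwise = [
    [1.0, 0.9, 0.4, 0.3],
    [0.9, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.8],
    [0.3, 0.3, 0.8, 1.0],
]
consensus = prefiltered_consensus(single_scores, pairwise)
```

In the toy ensemble, models 2 and 3 form their own tight cluster, but because the reference subset is chosen by the single-model score they are compared only against models 0 and 1 and therefore receive lower consensus scores.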

    Improving your target-template alignment with MODalign

    Summary: MODalign is an interactive web-based tool aimed at helping protein structure modelers to inspect and manually modify the alignment between the sequences of a target protein and of its template(s). It interactively computes, displays and, upon modification of the target-template alignment, updates the multiple sequence alignments of the two protein families, their conservation score, secondary structure and solvent accessibility values, and local quality scores of the implied three-dimensional model(s). Although it has been designed to simplify the target-template alignment step in modeling, it is suitable for all cases where a sequence alignment needs to be inspected in the context of other biological information. Availability and implementation: Freely available on the web at http://modorama.biocomputing.it/modalign. Website implemented in HTML and JavaScript with all major browsers supported. Contact: [email protected]

    QMEAN server for protein model quality estimation

    Model quality estimation is an essential component of protein structure prediction, since ultimately the accuracy of a model determines its usefulness for specific applications. Usually, in the course of protein structure prediction a set of alternative models is produced, from which subsequently the most accurate model has to be selected. The QMEAN server provides access to two scoring functions successfully tested at the eighth round of the community-wide blind test experiment CASP. The user can choose between the composite scoring function QMEAN, which derives a quality estimate on the basis of the geometrical analysis of single models, and the clustering-based scoring function QMEANclust which calculates a global and local quality estimate based on a weighted all-against-all comparison of the models from the ensemble provided by the user. The web server performs a ranking of the input models and highlights potentially problematic regions for each model. The QMEAN server is available at http://swissmodel.expasy.org/qmean

    Accurate classification of secondary progression in multiple sclerosis using a decision tree

    BACKGROUND: The absence of reliable imaging or biological markers of phenotype transition in multiple sclerosis (MS) makes assignment of current phenotype status difficult. OBJECTIVE: The authors sought to determine whether clinical information can be used to accurately assign current disease phenotypes. METHODS: Data from the clinical visits of 14,387 MS patients in Sweden were collected. Classifying algorithms based on several demographic and clinical factors were examined. Results obtained from the best classifier when predicting neurologist-recorded disease classification were replicated in an independent cohort from British Columbia and were compared to a previously published algorithm and the clinical judgment of three neurologists. RESULTS: A decision tree (the classifier) containing only the most recently available Expanded Disability Status Scale (EDSS) score and age obtained 89.3% (95% confidence interval (CI): 88.8-89.8) classification accuracy, defined as concordance with the latest reported status. Validation in the independent cohort resulted in 82.0% (95% CI: 81.0-83.1) accuracy. A previously published classification algorithm with slight modifications achieved 77.8% (95% CI: 77.1-78.4) accuracy. With the complete patient history of 100 patients, three neurologists obtained 84.3% accuracy compared with 85% for the classifier using the same data. CONCLUSION: The classifier can be used to standardize definitions of disease phenotype across different cohorts. Clinically, this model could assist neurologists by providing additional information.
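
As a worked check on how an accuracy and its 95% confidence interval relate, the sketch below computes a normal-approximation (Wald) interval. The abstract does not say which interval method or which denominator was used, so both are assumptions here: the accuracy of 89.3% is taken to be computed over all 14,387 patients, and the function name is hypothetical.

```python
import math


def wald_interval(accuracy, n, z=1.96):
    """Normal-approximation confidence interval for a proportion (z=1.96 for about 95%)."""
    standard_error = math.sqrt(accuracy * (1.0 - accuracy) / n)
    return accuracy - z * standard_error, accuracy + z * standard_error


# Assumed inputs: reported accuracy and the cohort size stated in the abstract.
low, high = wald_interval(0.893, 14387)
```

Under these assumptions the interval comes out at roughly 88.8% to 89.8%, which is consistent with the reported values, although the authors may have used a different method.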

    Harmonizing Definitions for Progression Independent of Relapse Activity in Multiple Sclerosis: A Systematic Review

    IMPORTANCE: Emerging evidence suggests that progression independent of relapse activity (PIRA) is a substantial contributor to long-term disability accumulation in relapsing-remitting multiple sclerosis (RRMS). To date, there is no uniform agreed-upon definition of PIRA, limiting the comparability of published studies. OBJECTIVE: To summarize the current evidence about PIRA based on a systematic review, to discuss the various terminologies used in the context of PIRA, and to propose a harmonized definition for PIRA for use in clinical practice and future trials. EVIDENCE REVIEW: A literature search was conducted using the search terms multiple sclerosis, PIRA, progression independent of relapse activity, silent progression, and progression unrelated to relapses in PubMed, Embase, Cochrane, and Web of Science, published between January 1990 and December 2022. FINDINGS: Of 119 identified single records, 48 eligible studies were analyzed. PIRA was reported to occur in roughly 5% of all patients with RRMS per annum, causing at least 50% of all disability accrual events in typical RRMS. The proportion of PIRA vs relapse-associated worsening increased with age, longer disease duration, and, despite lower absolute event numbers, potent suppression of relapses by highly effective disease-modifying therapy. However, different studies used various definitions of PIRA, rendering the comparability of studies difficult. CONCLUSION AND RELEVANCE: PIRA is the most frequent manifestation of disability accumulation across the full spectrum of traditional MS phenotypes, including clinically isolated syndrome and early RRMS. The harmonized definition suggested here may improve the comparability of results in current and future cohorts and data sets