TM-score of the final (lowest energy) model against top-L long-range contact score.
<p>The score is derived from the length of the protein, the total number of predicted contacts and the fraction of satisfied predicted long-range (sequence separation >23 residues) contacts. The Spearman correlation coefficient (ρ) is 0.77.</p>
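One ingredient of the score, the fraction of satisfied long-range contacts, can be sketched as follows. This is only an illustration: the caption does not give the formula that combines it with protein length and contact count, so that step is omitted. The function name, the 8 Å distance cutoff and the toy coordinates are assumptions; only the >23-residue separation comes from the caption.

```python
from math import dist


def long_range_satisfaction(contacts, coords, min_separation=23, cutoff=8.0):
    """Fraction of predicted long-range contacts satisfied in a model.

    contacts: iterable of (i, j) residue index pairs (0-based).
    coords: one (x, y, z) tuple per residue, e.g. C-beta positions.
    min_separation: pairs with |i - j| > min_separation count as long-range
        (the >23 threshold is taken from the caption).
    cutoff: distance in angstroms below which a contact counts as satisfied
        (8.0 is an assumption, not stated in the caption).
    """
    long_range = [(i, j) for i, j in contacts if abs(i - j) > min_separation]
    if not long_range:
        return 0.0
    satisfied = sum(
        1 for i, j in long_range if dist(coords[i], coords[j]) < cutoff
    )
    return satisfied / len(long_range)


# Hypothetical toy model: a 40-residue hairpin, two strands 5 angstroms apart.
coords = [
    (float(k), 0.0, 0.0) if k < 20 else (float(39 - k), 5.0, 0.0)
    for k in range(40)
]
# Two long-range pairs and one short-range pair (the latter is ignored).
predicted = [(0, 39), (0, 30), (2, 10)]
fraction = long_range_satisfaction(predicted, coords)
print(fraction)
```

Short-range pairs are filtered out before counting, so they affect neither the numerator nor the denominator.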
Sample results of FRAGFOLD without contacts, with the contacts-only methodology, and with both statistical and contact potentials.
<p>The TM-score is given below each structure. 1hh8A (first column) is a case where the TM-score of the structure folded without contacts is higher than that obtained by FRAGFOLD with the contact potential (0.59 and 0.58, respectively). Targets 1bkrA (second column) and 1svyA (third column) show a progression in TM-score from FRAGFOLD using only statistical potentials (top row), through contacts-only folding (second row), to folding with both statistical and contact-derived potentials (third row). Such a progression is expected and is observed in most cases across the test set.</p>
Folding with and without the use of predicted contacts.
<p><b>A</b>. TM-scores of the best top-5 predictions (selected by calculated final energy) obtained without contacts (no contacts) and with the residue-residue contact (RRCON) term are compared (combined <i>all</i> and <i>sequential</i> contacts; explained in the text). Three results, shown above the diagonal, are significantly better (TM-score difference >0.05) without the use of contacts: 1hh8A, 1m4jA and 1m8aA. <b>B</b>. Contacts-only best top-5 TM-scores compared with <i>combined</i> contacts FRAGFOLD results (best top-5 by energy). <b>C</b>. <i>Combined</i> RRCON results compared with no-contacts results, assessed by the best TM-score among the top-5 largest clusters. <b>D</b>. <i>Combined</i> RRCON TM-scores against contacts-only TM-scores (best top-5 clusters). Diagonal lines indicate identical results. Vertical dashed lines indicate the correct-prediction boundary (TM-score ≥0.5). The area below the diagonal and to the right of the dashed line encompasses all correct predictions. Targets are grouped by fold: green squares – α-proteins, red triangles – β-proteins, diamonds – α+β and α/β proteins. Overall, 100 of the 150 targets were correctly predicted.</p>
Model quality assessment on the basis of a combined score (<i>CS</i>) derived from the long-range contact satisfaction score and the mean inter-residue TM-score in an ensemble of models.
<p>Data cover the full 150-protein dataset.</p>
Number of sequences in Pfam version 26 in comparison to the growth since version 25.
<p>The upper line (red) indicates new families not present in version 25; the lower points (black) indicate steady growth in the size of existing families. Not all data are shown. <b>A</b>. Region of up to 500 sequences, below the capabilities of most contact prediction methods. <b>B</b>. Region of up to 40,000 sequences. Some families decrease in size (negative values on the ordinate), which might be attributed to the redefinition of some families. Family sizes range up to more than 288,000 sequences (COX1 cytochrome c oxidase family), but with low density.</p>
Improvements in predictions.
<p>Comparison of the fractions of correctly predicted models (TM-score ≥0.5, or ≥0.4 where noted) among best, best top-5 and best top-5≥0.4 TM-scores. Best top-5 results are analyzed as two groups: selected on the basis of calculated final energy (<i>energy</i>) and on the basis of cluster size (<i>clustering</i>). The following are compared: results without residue-residue contacts; results with residue-residue contacts only; results with all predicted contacts included for the whole duration of the simulation (<i>all</i>); results with contacts included sequentially as the simulation proceeds (<i>sequential</i>); and combined results taking advantage of both approaches. Best results are the predictions with the highest TM-score in the whole generated ensemble; best top-5 is the highest TM-score among the 5 lowest-energy models (or 5 largest clusters) in an ensemble.</p>
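The "best" and "best top-5" (energy) selections described above can be illustrated with a minimal sketch. The function names and the toy ensemble are hypothetical; the TM-score ≥0.5 correctness threshold and the rule "highest TM-score among the 5 lowest-energy models" are taken from the caption. The cluster-based variant is not shown.

```python
def best_overall(models):
    """Highest TM-score in the whole ensemble.

    models: list of (energy, tm_score) tuples.
    """
    return max(tm for _, tm in models)


def best_of_top_n(models, n=5):
    """Highest TM-score among the n lowest-energy models."""
    lowest_energy = sorted(models, key=lambda m: m[0])[:n]
    return max(tm for _, tm in lowest_energy)


def is_correct(tm_score, threshold=0.5):
    """A prediction counts as correct when TM-score >= threshold."""
    return tm_score >= threshold


# Hypothetical ensemble of (final energy, TM-score) pairs, not real data.
ensemble = [
    (-120.0, 0.41),
    (-118.5, 0.55),
    (-117.0, 0.38),
    (-110.2, 0.47),
    (-109.9, 0.35),
    (-90.0, 0.62),  # best model overall, but outside the 5 lowest energies
]

best = best_overall(ensemble)
best_top5 = best_of_top_n(ensemble)
print(best, best_top5, is_correct(best_top5))
```

In this toy ensemble the best model overall is not among the five lowest-energy models, which is why the two measures are reported separately.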
Sampling and contact-related problems.
<p>Only cases that can be clearly assigned to one of these two categories are shown.</p><p>*Obtained from the SCOP database <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0092197#pone.0092197-LoConte1" target="_blank">[35]</a> and verified in CATH <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0092197#pone.0092197-Cuff1" target="_blank">[36]</a>.</p><p>#Supplying real contacts extracted from PDB does not ensure a correct prediction.</p><p>##PDB contacts enable correct prediction; the shortage or incorrectness of contact information results in poor prediction.</p><p>SCOP: β, CATH: α+β.</p>
Growth of Pfam holdings from version 20.
<p><b>A</b>. Increase in median family size. <b>B</b>. Percentage of Pfam families with size above the sequence length thresholds: 250, 500, 1000 and 2000 residues. In all cases exponential growth may be observed. Currently (version 26) the median family size is 248 and 34% of families hold more than 500 sequences.</p>
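The two statistics plotted here, median family size and the percentage of families above each threshold, could be computed along these lines. This is a sketch under assumptions: the function name and the family sizes are invented for illustration, and the thresholds are compared against the number of sequences in each family, which is one reading of the caption.

```python
from statistics import median


def family_size_summary(sizes, thresholds=(250, 500, 1000, 2000)):
    """Return the median family size and the percentage of families
    holding more sequences than each threshold.

    sizes: number of sequences in each family.
    """
    if not sizes:
        raise ValueError("at least one family is required")
    percent_above = {
        t: 100.0 * sum(1 for s in sizes if s > t) / len(sizes)
        for t in thresholds
    }
    return median(sizes), percent_above


# Hypothetical family sizes, not taken from any Pfam release.
sizes = [10, 120, 248, 600, 3000]
median_size, percent_above = family_size_summary(sizes)
print(median_size, percent_above)
```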
Accuracy of predictions based on the total inter-residue TM-score and long-range contact score.
<p>ROC curves are plotted at different TM-score cut-offs. TPR – true positive rate, FPR – false positive rate. The diagonal dashed line indicates the random prediction boundary.</p>
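For readers unfamiliar with how such a curve is built, the sketch below computes (FPR, TPR) points for one TM-score cut-off: a model is a true positive when its quality score passes the threshold and its TM-score reaches the cut-off. The function name and the data are hypothetical; this is a generic ROC construction, not necessarily the exact procedure behind the figure.

```python
def roc_points(quality_scores, tm_scores, tm_cutoff=0.5):
    """Return ROC points (FPR, TPR), one per distinct score threshold.

    quality_scores: predicted quality of each model (higher is better).
    tm_scores: true TM-score of each model.
    tm_cutoff: models with TM-score >= tm_cutoff are the positives.
    """
    positives = [tm >= tm_cutoff for tm in tm_scores]
    n_pos = sum(positives)
    n_neg = len(positives) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative model")

    points = [(0.0, 0.0)]  # threshold above every score: nothing accepted
    for threshold in sorted(set(quality_scores), reverse=True):
        accepted = [s >= threshold for s in quality_scores]
        tp = sum(1 for a, p in zip(accepted, positives) if a and p)
        fp = sum(1 for a, p in zip(accepted, positives) if a and not p)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical quality scores and TM-scores for five models.
quality = [0.9, 0.8, 0.6, 0.4, 0.2]
tm = [0.7, 0.6, 0.3, 0.55, 0.2]
points = roc_points(quality, tm, tm_cutoff=0.5)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

Calling the function again with a different `tm_cutoff` gives the curve for another cut-off.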
TM-score of the final (lowest energy) model against mean pair-wise TM-score within the model's ensemble.
<p>A good correlation (Spearman's ρ = 0.73) emerges from the results. An inter-residue TM-score >0.26 is likely to produce a model with TM-score >0.5.</p>
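The mean pairwise TM-score of an ensemble, and the >0.26 rule of thumb from the caption, can be sketched as follows. The function name and the matrix values are hypothetical, and the pairwise matrix is assumed to be symmetric and precomputed by an external TM-score tool.

```python
from itertools import combinations
from statistics import mean


def mean_pairwise_tm(pairwise):
    """Mean TM-score over all distinct model pairs in an ensemble.

    pairwise: square matrix where pairwise[i][j] is the TM-score between
        models i and j (assumed symmetric; the diagonal is ignored).
    """
    n = len(pairwise)
    if n < 2:
        raise ValueError("at least two models are required")
    return mean(pairwise[i][j] for i, j in combinations(range(n), 2))


# Hypothetical pairwise TM-scores for a three-model ensemble.
pairwise = [
    [1.0, 0.3, 0.2],
    [0.3, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
ensemble_score = mean_pairwise_tm(pairwise)
# Threshold from the caption: above 0.26 a TM-score >0.5 model is likely.
likely_correct = ensemble_score > 0.26
print(round(ensemble_score, 3), likely_correct)
```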