TM-score of the final (lowest energy) model against top-L long-range contact score.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

The score is derived from the length of the protein, the total number of predicted contacts, and the fraction of satisfied predicted long-range (>23 residues) contacts. The Spearman correlation coefficient (ρ) is 0.77.
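One ingredient of this score, the fraction of satisfied predicted long-range contacts, can be sketched as follows. The caption does not give the formula that combines it with protein length and contact count, so only the fraction is shown. The function name, the 8 Å distance cutoff and the use of one representative atom per residue are assumptions made for illustration, not details taken from the source.

```python
from math import dist


def long_range_satisfaction(coords, predicted_contacts, min_separation=24, cutoff=8.0):
    """Fraction of predicted long-range contacts satisfied in a model.

    coords: list of (x, y, z) tuples, one representative atom per residue
            (assumption: e.g. C-beta; the source does not specify).
    predicted_contacts: iterable of (i, j) zero-based residue index pairs.
    min_separation: 24 encodes "sequence separation > 23 residues".
    cutoff: distance in angstroms below which a contact counts as satisfied
            (assumed value, not from the source).
    """
    long_range = [(i, j) for i, j in predicted_contacts if abs(i - j) >= min_separation]
    if not long_range:
        return 0.0
    satisfied = sum(1 for i, j in long_range if dist(coords[i], coords[j]) <= cutoff)
    return satisfied / len(long_range)
```

Short-range pairs are filtered out before counting, so they affect neither the numerator nor the denominator.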

The Francis Crick Institute

Sample results of FRAGFOLD without contacts, with the contacts-only methodology, and with both statistical and contact potentials.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

The TM-score is given below each structure. The first column shows 1hh8A, a case where the TM-score of the no-contacts structure is higher than that of FRAGFOLD with the contact potential (0.59 and 0.58, respectively). Targets 1bkrA (second column) and 1svyA (third column) show a progression in TM-score from FRAGFOLD using only statistical potentials (top row), through contacts-only folding (second row), to folding with both statistical and contact-derived potentials (third row). This progression is expected and is observed in most cases across the test set.

The Francis Crick Institute

Folding with and without the use of predicted contacts.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

A. TM-scores of the best top-5 predictions (selected by calculated final energy) without contacts (no contacts) and with the residue-residue contact (RRCON) term (combined all and sequential contacts; explained in the text). Three targets, located above the diagonal, score significantly better (TM-score difference >0.05) without contacts: 1hh8A, 1m4jA and 1m8aA. B. Contacts-only best top-5 TM-scores compared with combined-contacts FRAGFOLD results (best top-5 by energy). C. Combined RRCON results compared with no-contacts results, assessed by the best TM-score among the top-5 largest clusters. D. Combined RRCON TM-score against contacts-only TM-score (best top-5 clusters). Diagonal lines indicate identical results. Vertical dashed lines indicate the correct-prediction boundary (TM-score ≥0.5). The area below the diagonal and to the right of the dashed line encompasses all correct predictions. Targets are grouped by fold: green squares, α-proteins; red triangles, β-proteins; diamonds, α+β and α/β proteins. Overall, 100 of 150 targets were correctly predicted.

The Francis Crick Institute

Model quality assessment on the basis of a combined score (CS) derived from long-range contact satisfaction score and mean inter-residues TM-score in an ensemble of models.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

Data cover the full 150-protein dataset.

The Francis Crick Institute

Number of sequences in Pfam version 26 in comparison to the growth since version 25.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

The upper line (red) indicates new families not present in version 25; the lower points (black) indicate steady growth in family size. Not all data are shown. A. Region of up to 500 sequences, below the capabilities of most contact prediction methods. B. Region of up to 40,000 sequences. Some families decrease in size (negative values on the ordinate), which might be attributed to the redefinition of some families. Family sizes range up to over 288,000 sequences (COX1 cytochrome c oxidase family), but with low density.

The Francis Crick Institute

Improvements in predictions.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

Comparison of the fractions of correctly predicted models (TM-score ≥0.5, or ≥0.4 when noted) among best, best top-5 and best top-5≥0.4 TM-scores. Best top-5 results are analyzed as two groups: selected by calculated final energy (energy) and selected by cluster size (clustering). Results are compared without residue-residue contacts, with residue-residue contacts only, and with contacts added as follows: all predicted contacts included for the whole duration of the simulation (all), contacts included sequentially as the simulation proceeds (sequential), and combined results taking advantage of both approaches. Best results are the predictions with the highest TM-score in the whole generated ensemble; best top-5 is the highest TM-score among the 5 lowest-energy models (or 5 largest clusters) in an ensemble.
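The "best" and "best top-5 (energy)" selections described in this caption can be illustrated with a minimal sketch. It assumes each model in an ensemble is represented as a hypothetical `(energy, tm_score)` pair; the function names are illustrative and are not taken from FRAGFOLD.

```python
def best_overall(models):
    """Highest TM-score in the whole ensemble ("best")."""
    return max(tm for _, tm in models)


def best_top5_by_energy(models, n=5):
    """Highest TM-score among the n lowest-energy models ("best top-5")."""
    lowest_energy = sorted(models, key=lambda m: m[0])[:n]
    return max(tm for _, tm in lowest_energy)


def is_correct(tm_score, threshold=0.5):
    """A prediction counts as correct when its TM-score reaches the threshold."""
    return tm_score >= threshold


# Hypothetical ensemble of (energy, TM-score) pairs, for illustration only.
ensemble = [(-10.0, 0.30), (-9.0, 0.55), (-8.0, 0.20),
            (-7.0, 0.40), (-6.0, 0.10), (-5.0, 0.90)]
print(best_overall(ensemble), best_top5_by_energy(ensemble))
```

The gap between the two values shows why the caption reports them separately: the best model in an ensemble is not necessarily among those that the energy ranking would select.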

The Francis Crick Institute

Sampling and contact-related problems.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

Only cases where a clear allocation to one of these two cases can be made are shown. * Obtained from the SCOP database [35] and verified in CATH [36]. # Supplying real contacts extracted from PDB does not ensure a correct prediction. ## PDB contacts enable correct prediction; the shortage or incorrectness of contact information results in poor prediction. SCOP: β, CATH: α+β.

The Francis Crick Institute

Growth of Pfam holdings from version 20.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

A. Increase in median family size. B. Percentage of Pfam families with size above the sequence length thresholds: 250, 500, 1000 and 2000 residues. Exponential growth may be observed in all cases. Currently (version 26), the median family size is 248 and 34% of families hold more than 500 sequences.

The Francis Crick Institute

Accuracy of predictions based on the total inter-residue TM-score and long-range contact score.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

ROC curves are plotted at different TM-score cut-offs. TPR: true positive rate; FPR: false positive rate. The diagonal dashed line indicates the random-prediction boundary.
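How a ROC curve of this kind can be built for a given TM-score cut-off is sketched below. It assumes each target has a quality score (for example a contact score) and the TM-score of its final model; a target counts as positive when its TM-score reaches the cut-off. The function name and the inputs are illustrative, not taken from the source.

```python
def roc_points(quality_scores, tm_scores, tm_cutoff=0.5):
    """Return (FPR, TPR) points obtained by sweeping a threshold over quality_scores.

    A target is a true positive when its model TM-score >= tm_cutoff and its
    quality score is at or above the current threshold.
    """
    labels = [tm >= tm_cutoff for tm in tm_scores]
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative target")

    points = [(0.0, 0.0)]
    # Sweep thresholds from the strictest to the most permissive.
    for threshold in sorted(set(quality_scores), reverse=True):
        tp = sum(1 for s, lab in zip(quality_scores, labels) if s >= threshold and lab)
        fp = sum(1 for s, lab in zip(quality_scores, labels) if s >= threshold and not lab)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Calling the function with different `tm_cutoff` values gives one curve per cut-off, matching the layout the caption describes.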

The Francis Crick Institute

TM-score of the final (lowest energy) model against mean pair-wise TM-score within the model's ensemble.

Author: David T. Jones (59366)
Tomasz Kosciolek (538142)

A good correlation (Spearman's ρ = 0.73) emerges from the results. An inter-residue TM-score >0.26 is likely to produce a model with TM-score >0.5.

The Francis Crick Institute