15 research outputs found

Sensitivity and specificity for random forest tests applied to peptide-MHC binding scores for vaccine classification of Benchmark dataset.

Author: John T. Ellis (114797)
Paul J. Kennedy (114795)
Stephen J. Goodswen (114793)

Abbreviations: (R) = target variable e.g. 1 or 0 in training data randomly changed for each protein, HE = hold-out dataset error (%) i.e. error when predicting 30% of training data, OE = overall error (%) i.e. percentage of incorrect predictions, SN = sensitivity (%) = true positives/(true positives + false negatives), SP = specificity (%) = true negatives/(true negatives + false positives).

(a) Cross-validation involved a random sample of 70% from the training dataset to build the predictive model, with the remaining 30% used for testing. This was repeated 10 times and the predictions averaged (predictions for the same input data fluctuate unless a random seed is set initially).

(b) Benchmark are proteins from published studies with known or expected T-cell responses (source species: T. gondii) – 100% of the training data was used to build the predictive model.

Note: Number of input variables used to build the predictive model = 304 (i.e. number of allele-peptide length combinations derived from 76 common alleles).
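The SN, SP and OE definitions above, together with the repeated 70%/30% hold-out split, can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the use of Python's standard `random` module, and the default seed are assumptions, and the random forest model itself is not shown.

```python
import random


def classification_metrics(tp, fp, tn, fn):
    """Return (SN, SP, OE) as percentages from confusion-matrix counts."""
    sn = 100.0 * tp / (tp + fn)                    # sensitivity
    sp = 100.0 * tn / (tn + fp)                    # specificity
    oe = 100.0 * (fp + fn) / (tp + fp + tn + fn)   # overall error
    return sn, sp, oe


def holdout_splits(n_items, repeats=10, train_fraction=0.7, seed=0):
    """Yield (train_indices, test_indices) for repeated random hold-out.

    A fixed seed (an assumption here) makes the splits reproducible.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * n_items)
    for _ in range(repeats):
        shuffled = list(range(n_items))
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]
```

The metric function raises `ZeroDivisionError` if a class has no members, so a real evaluation would need to handle that case explicitly.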

FigShare

Schematic representation of gene prediction evaluation at the nucleotide level.

Author: John T. Ellis (114797)
Paul J. Kennedy (114795)
Stephen J. Goodswen (114793)

Abbreviations: C = coding nucleotide located on exon, N = non-coding nucleotide located on intron, TP = true positive, FP = false positive, TN = true negative, and FN = false negative.
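A minimal sketch of how the nucleotide-level counts might be tallied, assuming the actual and predicted annotations are given as equal-length strings of `C` and `N` labels (this encoding and the function name are assumptions for illustration, not taken from the source):

```python
def nucleotide_confusion(actual, predicted):
    """Count TP, FP, TN and FN over per-nucleotide C/N labels."""
    if len(actual) != len(predicted):
        raise ValueError("annotations must cover the same nucleotides")
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == "C":
            # Predicted coding: correct only if the nucleotide really is coding.
            counts["TP" if a == "C" else "FP"] += 1
        else:
            # Predicted non-coding: a miss if the nucleotide is actually coding.
            counts["FN" if a == "C" else "TN"] += 1
    return counts
```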

FigShare

Example of rule-based approach applied to highest affinity peptide on each test protein.

Author: John T. Ellis (114797)
Paul J. Kennedy (114795)
Stephen J. Goodswen (114793)

Proteins are listed in ascending order based on the lowest IC50 (nM) binding affinity score. A threshold value (e.g. 1.5) is applied to the score to segregate the list into two classifications. Below the threshold is ‘YES’ for vaccine candidacy and above is ‘NO’. The rule-based classification is compared with the expected classification to determine performance accuracy. The threshold value is derived from a trial-and-error approach with the intention to classify the greatest number of true positives and negatives.
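The rule described above could be sketched roughly as follows. The function names and protein identifiers are hypothetical, and treating a score exactly equal to the threshold as ‘NO’ is an assumption, since the caption does not say how ties are handled.

```python
def classify_by_threshold(lowest_ic50, threshold=1.5):
    """Rank proteins by lowest IC50 score and call YES below the threshold."""
    ranked = sorted(lowest_ic50.items(), key=lambda item: item[1])
    return [
        (protein, score, "YES" if score < threshold else "NO")
        for protein, score in ranked
    ]


def accuracy(calls, expected):
    """Percentage of rule-based calls that match the expected classification."""
    correct = sum(1 for protein, _, call in calls if expected[protein] == call)
    return 100.0 * correct / len(calls)


# Hypothetical scores and expected labels, for illustration only.
scores = {"P1": 2.3, "P2": 0.8, "P3": 1.2, "P4": 5.0}
expected = {"P1": "NO", "P2": "YES", "P3": "NO", "P4": "NO"}
calls = classify_by_threshold(scores)
```

A trial-and-error search for the threshold would simply repeat this for several candidate values and keep the one with the highest accuracy.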

FigShare

Plot of conservation scores computed for binding peptides along a protein (UniProtKB ID: P13664).

Author: John T. Ellis (114797)
Paul J. Kennedy (114795)
Stephen J. Goodswen (114793)

Each circle represents the amino acid conservation score computed at a sliding window. The window is of length 9 and slides one residue at a time. The colour of the circle represents binding affinities against 76 common MHC alleles computed at each window. A window (i.e. a peptide) can theoretically bind to all 76 alleles and colours are therefore plotted in a set order: no, low, intermediate, and high affinity. For example, a dark blue circle for low affinity indicates there are no intermediate or high affinity peptides at the window; however, a green circle for high affinity provides no indication of other affinities at the same window. Mean conservation = 0.7805; median conservation = 0.7946. For protein P13664 (Major surface antigen p30) 54.6% high, 56% intermediate, and 55.9% low binders have conservation scores below the mean. The study shows that vaccine candidates are significantly more likely to have either a greater number of less conserved peptides or a lower total conservation score than non-vaccine candidates.
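The windowing and colour precedence described above might be sketched as below. This reflects one reading of the caption (the strongest affinity class at a window determines the visible colour); the names are hypothetical, and the conservation score itself is not computed because the caption does not define it.

```python
# Plotting order from the caption: later classes are drawn over earlier ones.
AFFINITY_ORDER = ["no", "low", "intermediate", "high"]


def sliding_windows(sequence, length=9):
    """Return (start, peptide) pairs, sliding one residue at a time."""
    return [
        (start, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


def plotted_affinity(per_allele_classes):
    """Return the affinity class that ends up visible for one window."""
    return max(per_allele_classes, key=AFFINITY_ORDER.index)
```

A sequence shorter than the window length yields no windows.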

FigShare

Schematic representation of gene prediction evaluation at the exon level.

Author: John T. Ellis (114797)
Paul J. Kennedy (114795)
Stephen J. Goodswen (114793)

Exons are represented by shaded rectangles. Introns are represented by the adjoining solid lines. Abbreviations: TP = true positive, FP = false positive, and FN = false negative.
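One possible way to tally exon-level counts is sketched below. It assumes that a predicted exon is a true positive only when both of its coordinates exactly match an actual exon; the caption does not state the matching criterion, so this rule, the `(start, end)` tuple encoding, and the function name are all assumptions.

```python
def exon_confusion(actual_exons, predicted_exons):
    """Count TP, FP and FN over exons given as (start, end) tuples."""
    actual = set(actual_exons)
    predicted = set(predicted_exons)
    return {
        "TP": len(actual & predicted),   # predicted exons that match exactly
        "FP": len(predicted - actual),   # predicted exons with no exact match
        "FN": len(actual - predicted),   # actual exons that were not predicted
    }
```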

FigShare

Comparison of test genes not identified by gene finders.

Author: John T. Ellis (114797)
Paul J. Kennedy (114795)
Stephen J. Goodswen (114793)

++Number of groups of test genes not found in which the test genes are located consecutively along the chromosome.

The highest number of test genes in a consecutive group.

FigShare

Number of matching predicted genes with 299 test genes using BLASTN (with 250, 500, and 1000 training genes).

Author: John T. Ellis (114797)
Paul J. Kennedy (114795)
Stephen J. Goodswen (114793)

Abbreviations: gl = GlimmerHMM; aug = AUGUSTUS. N/A = not applicable – the AUGUSTUS training program does not give the option to control the number of bases that precede and follow the coding segment (CDS) sequence of the training genes.

Number of predicted genes that align entirely or partly with the test genes and meet the criteria E-value = 0 and 100% coverage – a value in brackets is the number of predicted genes that are exactly the same as the test genes i.e. each exon genomic coordinate is the same.

++Number of predicted genes that align to the same test gene i.e. the predicted gene is only a part of the entire test gene and there can be one or more predictions per test gene.

The values underlined indicate the highest number of matches for each gene finder.

FigShare