15 research outputs found

    Sensitivity and specificity for random forest tests applied to peptide-MHC binding scores for vaccine classification of Benchmark dataset.

    Abbreviations: (R) = target variable (e.g. 1 or 0) in the training data randomly changed for each protein; HE = hold-out dataset error (%), i.e. the error when predicting 30% of the training data; OE = overall error (%), i.e. the percentage of incorrect predictions; SN = sensitivity (%) = true positives/(true positives + false negatives); SP = specificity (%) = true negatives/(true negatives + false positives).
    a: Cross-validation used a random sample of 70% of the training dataset to build the predictive model and the remaining 30% for testing. This was repeated 10 times and the predictions averaged (predictions for the same input data fluctuate unless a random seed is set initially).
    b: Benchmark proteins are from published studies with known or expected T-cell responses (source species: T. gondii); 100% of the training data was used to build the predictive model.
    Note: Number of input variables used to build the predictive model = 304 (i.e. the number of allele-peptide length combinations derived from 76 common alleles).
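    The SN, SP, and OE definitions above can be illustrated with a minimal Python sketch. The function name and the example labels are hypothetical and are not taken from the study; 1 is assumed to mean "vaccine candidate".

```python
def classification_metrics(expected, predicted):
    """Return (SN %, SP %, OE %) for two equal-length lists of 0/1 labels."""
    pairs = list(zip(expected, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)  # true positives
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)  # false negatives
    tn = sum(1 for e, p in pairs if e == 0 and p == 0)  # true negatives
    fp = sum(1 for e, p in pairs if e == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sn = 100.0 * tp / (tp + fn)          # sensitivity
    sp = 100.0 * tn / (tn + fp)          # specificity
    oe = 100.0 * (fp + fn) / len(pairs)  # overall error
    return sn, sp, oe


# Hypothetical labels, for illustration only.
expected = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
sn, sp, oe = classification_metrics(expected, predicted)
print(sn, sp, oe)
```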

    Schematic representation of gene prediction evaluation at the nucleotide level.

    Abbreviations: C = coding nucleotide located on exon, N = non-coding nucleotide located on intron, TP = true positive, FP = false positive, TN = true negative, and FN = false negative.
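    As a rough illustration of nucleotide-level evaluation, the sketch below counts TP, FP, TN, and FN by comparing an annotated and a predicted labelling position by position. The string encoding ("C"/"N" per nucleotide), the function name, and the example sequences are assumptions made for illustration; the schematic itself is not reproduced here.

```python
from collections import Counter


def nucleotide_level_counts(annotated, predicted):
    """Count TP/FP/TN/FN over two equal-length strings of 'C' and 'N'."""
    if len(annotated) != len(predicted):
        raise ValueError("labellings must have the same length")
    counts = Counter(TP=0, FP=0, TN=0, FN=0)
    for a, p in zip(annotated, predicted):
        if a == "C" and p == "C":
            counts["TP"] += 1  # coding nucleotide predicted as coding
        elif a == "N" and p == "C":
            counts["FP"] += 1  # non-coding nucleotide predicted as coding
        elif a == "N" and p == "N":
            counts["TN"] += 1  # non-coding nucleotide predicted as non-coding
        elif a == "C" and p == "N":
            counts["FN"] += 1  # coding nucleotide missed by the prediction
        else:
            raise ValueError("labels must be 'C' or 'N'")
    return counts


# Hypothetical 12-nucleotide example.
counts = nucleotide_level_counts("NNCCCCNNCCNN", "NCCCCNNNCCCN")
print(dict(counts))
```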

    Example of rule-based approach applied to highest affinity peptide on each test protein.

    Proteins are listed in ascending order of their lowest IC50 (nM) binding affinity score. A threshold value (e.g. 1.5) is applied to the score to split the list into two classes: proteins below the threshold are classified ‘YES’ for vaccine candidacy and those above it ‘NO’. The rule-based classification is compared with the expected classification to determine accuracy. The threshold value is chosen by trial and error so as to classify the greatest number of true positives and true negatives.
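    A minimal sketch of this thresholding rule follows. The protein names, scores, and expected labels are hypothetical, and treating a score exactly equal to the threshold as ‘NO’ is an assumption, since the caption does not cover that case.

```python
def classify_by_threshold(lowest_score_by_protein, threshold=1.5):
    """Rank proteins by lowest score (ascending) and label each YES/NO."""
    ranked = sorted(lowest_score_by_protein.items(), key=lambda item: item[1])
    # Assumption: a score equal to the threshold is labelled 'NO'.
    return {name: ("YES" if score < threshold else "NO") for name, score in ranked}


def accuracy(predicted, expected):
    """Fraction of proteins whose predicted label matches the expected one."""
    matches = sum(1 for name in expected if predicted[name] == expected[name])
    return matches / len(expected)


# Hypothetical data, for illustration only.
scores = {"protA": 0.8, "protB": 2.3, "protC": 1.2, "protD": 5.0}
expected = {"protA": "YES", "protB": "NO", "protC": "NO", "protD": "NO"}

predicted = classify_by_threshold(scores, threshold=1.5)
print(predicted, accuracy(predicted, expected))
```

    A trial-and-error search would repeat this for several candidate thresholds and keep the one with the highest accuracy.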

    Plot of conservation scores computed for binding peptides along a protein (UniProtKB ID: P13664).

    Each circle represents the amino acid conservation score computed at a sliding window. The window is of length 9 and slides one residue at a time. The colour of the circle represents the binding affinities against 76 common MHC alleles computed at each window. A window (i.e. a peptide) can theoretically bind to all 76 alleles, so colours are plotted in a set order: no, low, intermediate, and high affinity. For example, a dark blue circle for low affinity indicates that there are no intermediate or high affinity peptides at the window; however, a green circle for high affinity gives no indication of other affinities at the same window. Mean conservation = 0.7805; median conservation = 0.7946. For protein P13664 (Major surface antigen p30), 54.6% of high, 56% of intermediate, and 55.9% of low binders have conservation scores below the mean. The study shows that vaccine candidates are significantly more likely to have either a greater number of less conserved peptides or a lower total conservation score than non-vaccine candidates.
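    The sliding-window scan can be sketched as below. The caption does not say how a window's score is derived from individual residues, so averaging per-residue scores is an assumption, as are the function names and the toy input; the sketch also ignores the split by affinity class.

```python
def window_conservation(residue_scores, window=9):
    """Score every length-`window` peptide, sliding one residue at a time.

    Assumption: a window's score is the mean of its per-residue scores.
    """
    n_windows = len(residue_scores) - window + 1
    return [
        sum(residue_scores[i:i + window]) / window
        for i in range(max(n_windows, 0))
    ]


def percent_below_mean(window_scores):
    """Percentage of windows whose score is below the mean window score."""
    mean = sum(window_scores) / len(window_scores)
    below = sum(1 for score in window_scores if score < mean)
    return 100.0 * below / len(window_scores)


# Toy per-residue scores for a 10-residue sequence (two windows of length 9).
residue_scores = [1.0] * 9 + [0.0]
windows = window_conservation(residue_scores)
print(windows, percent_below_mean(windows))
```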

    Schematic representation of gene prediction evaluation at the exon level.

    Exons are represented by shaded rectangles. Introns are represented by the adjoining solid lines. Abbreviations: TP = true positive, FP = false positive, and FN = false negative.
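    For illustration, exon-level counts could be computed as in the sketch below. It assumes the common convention that a predicted exon is a true positive only when both of its boundaries match an annotated exon exactly; the caption does not state the matching rule, and the coordinates are hypothetical.

```python
def exon_level_counts(annotated_exons, predicted_exons):
    """Count TP/FP/FN over exons given as (start, end) coordinate pairs.

    Assumption: only exact boundary matches count as true positives.
    """
    annotated = set(annotated_exons)
    predicted = set(predicted_exons)
    return {
        "TP": len(annotated & predicted),  # predicted exons matching an annotation
        "FP": len(predicted - annotated),  # predicted exons with no exact match
        "FN": len(annotated - predicted),  # annotated exons that were not predicted
    }


# Hypothetical coordinates, for illustration only.
annotated = [(100, 200), (300, 400), (500, 650)]
predicted = [(100, 200), (300, 420), (700, 800)]
print(exon_level_counts(annotated, predicted))
```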

    Comparison of test genes not identified by gene finders.

    ++ Number of groups of test genes not found in which the test genes are located consecutively along the chromosome.
    The highest number of test genes in a consecutive group.

    Number of matching predicted genes with 299 test genes using BLASTN (with 250, 500, and 1000 training genes).

    Abbreviations: gl = GlimmerHMM; aug = AUGUSTUS.
    N/A = not applicable: the AUGUSTUS training program does not give the option to control the number of bases that precede and follow the coding segment (CDS) sequence of the training genes.
    Number of predicted genes that align entirely or partly with the test genes and meet the criteria E-value = 0 and 100% coverage. A value in brackets is the number of predicted genes that are exactly the same as the test genes, i.e. each exon genomic coordinate is the same.
    ++ Number of predicted genes that align to the same test gene, i.e. the predicted gene is only a part of the entire test gene and there can be one or more predictions per test gene.
    The underlined values indicate the highest number of matches for each gene finder.