    The error rate of different genotype callers for different call rates.

    <p>The SFS method is the method described in the main text. The MAF method is based on first obtaining a maximum likelihood estimate of the allele frequency, and then using the estimated allele frequency to define priors for genotype calling. The GC-max method is based on calling the genotype with the highest posterior probability. The GC-ratio method is based on calling genotypes depending on the ratio of the likelihoods of the most likely and second most likely genotypes. The jagged behavior of some of the curves is a consequence of the discrete nature of the data, i.e. an individual carries a discrete number of copies of the minor allele. Ten individuals are simulated for 50,000 variable sites with allele frequencies (<i>p</i>) distributed proportionally to 1/<i>p</i> and an error rate of 0.5%. Results for other error rates are shown in Figure S2.</p>
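    The GC-max and GC-ratio calling rules in the caption above can be sketched in a few lines. The sketch below is illustrative only (the paper's analyses use R scripts; Python is used here for brevity): it assumes a simple binomial read model with a fixed per-base error rate, Hardy-Weinberg priors computed from a known allele frequency for GC-max, and a hypothetical likelihood-ratio threshold of 10 for GC-ratio. None of these specifics are taken from the paper.

    ```python
    import math

    ERROR_RATE = 0.005  # per-base error rate used in the caption's simulation

    def genotype_likelihoods(n_ref, n_alt, eps=ERROR_RATE):
        """Binomial read likelihoods for genotypes 0, 1, 2 (copies of the minor allele)."""
        n = n_ref + n_alt
        return [math.comb(n, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref
                for p_alt in (eps, 0.5, 1 - eps)]

    def call_gc_max(n_ref, n_alt, freq):
        """GC-max: genotype with the highest posterior under Hardy-Weinberg priors."""
        priors = [(1 - freq)**2, 2 * freq * (1 - freq), freq**2]
        post = [l * p for l, p in zip(genotype_likelihoods(n_ref, n_alt), priors)]
        return max(range(3), key=lambda g: post[g])

    def call_gc_ratio(n_ref, n_alt, threshold=10.0):
        """GC-ratio: call only if the best genotype likelihood exceeds the
        runner-up by a (hypothetical) factor of `threshold`; otherwise no call."""
        ranked = sorted(enumerate(genotype_likelihoods(n_ref, n_alt)),
                        key=lambda t: t[1], reverse=True)
        (best_g, best_l), (_, second_l) = ranked[0], ranked[1]
        return best_g if second_l == 0 or best_l / second_l >= threshold else None
    ```

    For example, ten reference reads and no alternative reads are called homozygous reference by both rules, while four reference reads against one alternative read fall below the hypothetical ratio threshold and remain uncalled.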

    ROC curves for different SNP callers.

    <p>Data for 10 individuals were simulated assuming (A) a sequencing depth of 2 with a raw sequencing error rate of 1% and (B) a depth of 5 with a raw sequencing error rate of 5%. The SFS method is the main method described in the text. The GC method is based on genotype calling using the genotype with the highest posterior probability. The LR method is based on a likelihood ratio test of the hypothesis that the allele frequency is zero. The SFS-based method and the LR method have similar performance except for very high error rates, where the SFS method tends to be somewhat better. Both methods generally perform much better than the GC method. The difference would be even larger in larger panels of individuals. Simulations under other conditions can be found in Figure S1.</p>
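    The LR method in the caption above tests whether the allele frequency is zero. A minimal sketch of such a test, assuming a binomial read model per individual, Hardy-Weinberg genotype priors, and maximization of the frequency over a simple grid (the grid search and the error model are illustrative assumptions, not the authors' implementation):

    ```python
    import math

    def site_loglik(read_counts, freq, eps=0.01):
        """Log-likelihood of one site's reads across individuals, given an
        allele frequency, summing over the three possible genotypes."""
        priors = [(1 - freq)**2, 2 * freq * (1 - freq), freq**2]
        total = 0.0
        for n_ref, n_alt in read_counts:
            n = n_ref + n_alt
            site = sum(prior * math.comb(n, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref
                       for prior, p_alt in zip(priors, (eps, 0.5, 1 - eps)))
            total += math.log(site)
        return total

    def lr_statistic(read_counts, eps=0.01, grid=200):
        """LR statistic for H0: allele frequency is zero, with the alternative
        maximized over a frequency grid (a crude stand-in for a real optimizer)."""
        best = max(site_loglik(read_counts, k / grid, eps) for k in range(grid + 1))
        return 2 * (best - site_loglik(read_counts, 0.0, eps))
    ```

    Sites with no alternative reads yield a statistic of zero, while consistent alternative reads across individuals drive the statistic up, which is the basis for ranking candidate SNPs.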

    Prediction error curves.

    <p>Performance of the three strategies and the null model. The gray lines represent the performance of the respective prediction model in each of the 100 bootstrap cross-validation steps. The solid lines represent the mean bootstrap cross-validation performance and the dashed lines represent the apparent performance.</p>

    Model evaluation.

    <p>Extracts from the R script used for evaluating the random forest model in the VAML Nugenob game. The elements of the list RfPredOob are obtained as described in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0006287#pone-0006287-g002" target="_blank">Figure 2</a>. The other two strategies are evaluated similarly.</p>

    LASSO model.

    <p>Extracts from the R script that TAG used for building the LASSO model. The shrinkage parameter s is obtained as described in the text.</p>
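    The caption above refers to an R script. As a rough illustration of what fitting a LASSO with a given shrinkage parameter involves, here is a minimal coordinate-descent sketch in Python. Treating s as a plain L1 penalty weight is an assumption for illustration; R packages such as lars parameterize the shrinkage differently (e.g. as a fraction of the full L1 norm), so this is not the script the caption describes.

    ```python
    import numpy as np

    def soft_threshold(z, s):
        """Soft-thresholding operator, the building block of LASSO coordinate descent."""
        return np.sign(z) * max(abs(z) - s, 0.0)

    def lasso_fit(X, y, s, n_iter=500):
        """Coordinate descent for: minimize (1/2n)||y - X b||^2 + s * ||b||_1."""
        n, p = X.shape
        beta = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0) / n  # per-column scaling factors
        for _ in range(n_iter):
            for j in range(p):
                # partial residual with coordinate j removed
                resid = y - X @ beta + X[:, j] * beta[j]
                rho = X[:, j] @ resid / n
                beta[j] = soft_threshold(rho, s) / col_sq[j]
        return beta
    ```

    With s = 0 the update reduces to ordinary least squares on each coordinate; increasing s shrinks small coefficients exactly to zero, which is how the shrinkage parameter controls variable selection.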

    Random forest model.

    <p>Extracts from the R script that THP used for building the random forest model. The number of trees (NT) and the number of variables tried at each split (MT) are obtained as described in the text.</p>
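    The caption above again refers to an R script. As a heavily simplified illustration of the two tuning parameters, the sketch below grows NT bagged single-split trees (stumps) and tries MT randomly chosen variables at each split. A real random forest grows full trees, so this shows only the bagging/feature-subsampling mechanism, not the model THP built.

    ```python
    import numpy as np

    def fit_stump(X, y, feat_ids):
        """Best single-split regression stump over the given feature subset."""
        best = None
        for j in feat_ids:
            for t in np.unique(X[:, j])[:-1]:
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left.mean(), right.mean())
        if best is None:  # degenerate bootstrap sample: no split possible
            j = feat_ids[0]
            return (j, X[0, j], y.mean(), y.mean())
        return best[1:]

    def fit_forest(X, y, nt, mt, seed=0):
        """NT bagged stumps, each trying MT randomly chosen variables per split."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        forest = []
        for _ in range(nt):
            rows = rng.integers(0, n, size=n)              # bootstrap sample
            feats = rng.choice(p, size=mt, replace=False)  # MT candidate variables
            forest.append(fit_stump(X[rows], y[rows], feats))
        return forest

    def forest_predict(forest, x):
        """Average the stump predictions, as a forest averages its trees."""
        return float(np.mean([lm if x[j] <= t else rm for j, t, lm, rm in forest]))
    ```

    Larger NT stabilizes the averaged prediction, while MT controls how decorrelated the individual trees are; both are typically tuned, as the caption indicates.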

    Game setup in R.

    <p>Extracts from the R script used for setting up the VAML Nugenob game.</p>

    Results of the VAML Nugenob game.

    <p>Continuous rank probability scores for the three strategies and the null model that ignores all predictors. The bootstrap cross-validation error is based on 100 bootstrap subsamples of size 80 drawn without replacement from the 99 subjects.</p>
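    The resampling scheme described in the caption, 100 subsamples of size 80 drawn without replacement from 99 subjects with the left-out subjects used for scoring, can be sketched generically. The fitting and scoring callables below are placeholders, not the continuous rank probability score computation used in the game.

    ```python
    import numpy as np

    def bootstrap_cv_error(X, y, fit, score, n_boot=100, sub_size=80, seed=0):
        """Average test error over subsamples drawn without replacement:
        fit on `sub_size` subjects, score on the subjects left out."""
        rng = np.random.default_rng(seed)
        n = len(y)
        errors = []
        for _ in range(n_boot):
            train = rng.choice(n, size=sub_size, replace=False)
            test = np.setdiff1d(np.arange(n), train)  # the left-out subjects
            model = fit(X[train], y[train])
            errors.append(score(model, X[test], y[test]))
        return float(np.mean(errors))
    ```

    The null model of the caption corresponds to a `fit` that ignores X entirely, e.g. one that returns the mean outcome of the training subsample.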