    The error rate of different genotype callers for different call rates.

    <p>The SFS method is the method described in the main text. The MAF method is based on first obtaining a maximum likelihood estimate of the allele frequency, and then using the estimated allele frequency to define priors for genotype calling. The GC-max method is based on calling the genotype with the highest posterior probability. The GC-ratio method is based on calling genotypes depending on the ratio of the likelihoods of the most likely and second most likely genotypes. The jagged behavior of some of the curves is a consequence of the discrete nature of the data, i.e. an individual carries a discrete number of copies of the minor allele. Ten individuals are simulated for 50,000 variable sites with allele frequencies (<i>p</i>) distributed proportionally to 1/<i>p</i> and an error rate of 0.5%. Results for other error rates are shown in Figure S2.</p>
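    The GC-max and GC-ratio calling rules in the caption above can be sketched in a few lines. The sketch below is illustrative only (the paper's analyses use R scripts; Python is used here for brevity): it assumes a simple binomial read model with a fixed per-base error rate, Hardy-Weinberg priors computed from a known allele frequency for GC-max, and a hypothetical likelihood-ratio threshold of 10 for GC-ratio. None of these specifics are taken from the paper.

    ```python
    import math

    ERROR_RATE = 0.005  # per-base error rate used in the caption's simulation

    def genotype_likelihoods(n_ref, n_alt, eps=ERROR_RATE):
        """Binomial read likelihoods for genotypes 0, 1, 2 (copies of the minor allele)."""
        n = n_ref + n_alt
        return [math.comb(n, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref
                for p_alt in (eps, 0.5, 1 - eps)]

    def call_gc_max(n_ref, n_alt, freq):
        """GC-max: genotype with the highest posterior under Hardy-Weinberg priors."""
        priors = [(1 - freq)**2, 2 * freq * (1 - freq), freq**2]
        post = [l * p for l, p in zip(genotype_likelihoods(n_ref, n_alt), priors)]
        return max(range(3), key=lambda g: post[g])

    def call_gc_ratio(n_ref, n_alt, threshold=10.0):
        """GC-ratio: call only if the best genotype likelihood exceeds the
        runner-up by a (hypothetical) factor of `threshold`; otherwise no call."""
        ranked = sorted(enumerate(genotype_likelihoods(n_ref, n_alt)),
                        key=lambda t: t[1], reverse=True)
        (best_g, best_l), (_, second_l) = ranked[0], ranked[1]
        return best_g if second_l == 0 or best_l / second_l >= threshold else None
    ```

    For example, ten reference reads and no alternative reads are called homozygous reference by both rules, while four reference reads against one alternative read fall below the hypothetical ratio threshold and remain uncalled.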

    ROC curves for different SNP callers.

    <p>Data for 10 individuals were simulated assuming (A) a sequencing depth of 2 with a raw sequencing error rate of 1% and (B) a depth of 5 with a raw sequencing error rate of 5%. The SFS method is the main method described in the text. The GC method is based on genotype calling using the genotype with the highest posterior probability. The LR method is based on a likelihood ratio test of the hypothesis that the allele frequency is zero. The SFS-based method and the LR method have similar performance except for very high error rates, where the SFS method tends to be somewhat better. Both methods generally perform much better than the GC method. The difference would be even larger in larger panels of individuals. Simulations under other conditions can be found in Figure S1.</p>
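    The LR method in the caption above tests whether the allele frequency is zero. A minimal sketch of such a test, assuming a binomial read model per individual, Hardy-Weinberg genotype priors, and maximization of the frequency over a simple grid (the grid search and the error model are illustrative assumptions, not the authors' implementation):

    ```python
    import math

    def site_loglik(read_counts, freq, eps=0.01):
        """Log-likelihood of one site's reads across individuals, given an
        allele frequency, summing over the three possible genotypes."""
        priors = [(1 - freq)**2, 2 * freq * (1 - freq), freq**2]
        total = 0.0
        for n_ref, n_alt in read_counts:
            n = n_ref + n_alt
            site = sum(prior * math.comb(n, n_alt) * p_alt**n_alt * (1 - p_alt)**n_ref
                       for prior, p_alt in zip(priors, (eps, 0.5, 1 - eps)))
            total += math.log(site)
        return total

    def lr_statistic(read_counts, eps=0.01, grid=200):
        """LR statistic for H0: allele frequency is zero, with the alternative
        maximized over a frequency grid (a crude stand-in for a real optimizer)."""
        best = max(site_loglik(read_counts, k / grid, eps) for k in range(grid + 1))
        return 2 * (best - site_loglik(read_counts, 0.0, eps))
    ```

    Sites with no alternative reads yield a statistic of zero, while consistent alternative reads across individuals drive the statistic up, which is the basis for ranking candidate SNPs.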

    Prediction error curves.

    <p>Performance of the three strategies and the null model. The gray lines represent the performance of the respective prediction model in each of the 100 bootstrap cross-validation steps. The solid lines represent the mean bootstrap cross-validation performance and the dashed lines represent the apparent performance.</p>

    Model evaluation.

    <p>Extracts from the R script used for evaluating the random forest model in the VAML Nugenob game. The elements of the list RfPredOob are obtained as described in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0006287#pone-0006287-g002" target="_blank">Figure 2</a>. The other two strategies are evaluated similarly.</p>

    LASSO model.

    <p>Extracts from the R script that TAG used for building the LASSO model. The shrinkage parameter s is obtained as described in the text.</p>
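    The caption above refers to an R script. As a rough illustration of what fitting a LASSO with a given shrinkage parameter involves, here is a minimal coordinate-descent sketch in Python. Treating s as a plain L1 penalty weight is an assumption for illustration; R packages such as lars parameterize the shrinkage differently (e.g. as a fraction of the full L1 norm), so this is not the script the caption describes.

    ```python
    import numpy as np

    def soft_threshold(z, s):
        """Soft-thresholding operator, the building block of LASSO coordinate descent."""
        return np.sign(z) * max(abs(z) - s, 0.0)

    def lasso_fit(X, y, s, n_iter=500):
        """Coordinate descent for: minimize (1/2n)||y - X b||^2 + s * ||b||_1."""
        n, p = X.shape
        beta = np.zeros(p)
        col_sq = (X ** 2).sum(axis=0) / n  # per-column scaling factors
        for _ in range(n_iter):
            for j in range(p):
                # partial residual with coordinate j removed
                resid = y - X @ beta + X[:, j] * beta[j]
                rho = X[:, j] @ resid / n
                beta[j] = soft_threshold(rho, s) / col_sq[j]
        return beta
    ```

    With s = 0 the update reduces to ordinary least squares on each coordinate; increasing s shrinks small coefficients exactly to zero, which is how the shrinkage parameter controls variable selection.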

    Random forest model.

    <p>Extracts from the R script that THP used for building the random forest model. The number of trees (NT) and the number of variables tried at each split (MT) are obtained as described in the text.</p>
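    The caption above again refers to an R script. As a heavily simplified illustration of the two tuning parameters, the sketch below grows NT bagged single-split trees (stumps) and tries MT randomly chosen variables at each split. A real random forest grows full trees, so this shows only the bagging/feature-subsampling mechanism, not the model THP built.

    ```python
    import numpy as np

    def fit_stump(X, y, feat_ids):
        """Best single-split regression stump over the given feature subset."""
        best = None
        for j in feat_ids:
            for t in np.unique(X[:, j])[:-1]:
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
                if best is None or sse < best[0]:
                    best = (sse, j, t, left.mean(), right.mean())
        if best is None:  # degenerate bootstrap sample: no split possible
            j = feat_ids[0]
            return (j, X[0, j], y.mean(), y.mean())
        return best[1:]

    def fit_forest(X, y, nt, mt, seed=0):
        """NT bagged stumps, each trying MT randomly chosen variables per split."""
        rng = np.random.default_rng(seed)
        n, p = X.shape
        forest = []
        for _ in range(nt):
            rows = rng.integers(0, n, size=n)              # bootstrap sample
            feats = rng.choice(p, size=mt, replace=False)  # MT candidate variables
            forest.append(fit_stump(X[rows], y[rows], feats))
        return forest

    def forest_predict(forest, x):
        """Average the stump predictions, as a forest averages its trees."""
        return float(np.mean([lm if x[j] <= t else rm for j, t, lm, rm in forest]))
    ```

    Larger NT stabilizes the averaged prediction, while MT controls how decorrelated the individual trees are; both are typically tuned, as the caption indicates.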

    Game setup in R.

    <p>Extracts from the R script used for setting up the VAML Nugenob game.</p>

    Results of the VAML Nugenob game.

    <p>Continuous rank probability scores for the three strategies and the null model that ignores all predictors. The bootstrap cross-validation error is based on 100 bootstrap subsamples of size 80 drawn without replacement from the 99 subjects.</p>
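    The resampling scheme described in the caption, 100 subsamples of size 80 drawn without replacement from 99 subjects with the left-out subjects used for scoring, can be sketched generically. The fitting and scoring callables below are placeholders, not the continuous rank probability score computation used in the game.

    ```python
    import numpy as np

    def bootstrap_cv_error(X, y, fit, score, n_boot=100, sub_size=80, seed=0):
        """Average test error over subsamples drawn without replacement:
        fit on `sub_size` subjects, score on the subjects left out."""
        rng = np.random.default_rng(seed)
        n = len(y)
        errors = []
        for _ in range(n_boot):
            train = rng.choice(n, size=sub_size, replace=False)
            test = np.setdiff1d(np.arange(n), train)  # the left-out subjects
            model = fit(X[train], y[train])
            errors.append(score(model, X[test], y[test]))
        return float(np.mean(errors))
    ```

    The null model of the caption corresponds to a `fit` that ignores X entirely, e.g. one that returns the mean outcome of the training subsample.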