DTA under-represents uncertainty and lacks statistical efficiency.

Abstract

<p>To test the accuracy of the 95% credible intervals produced by (a) DTA, (b) MTT and (c) BASTA, we simulated and analysed 100 datasets under the two-population “Continental” model with even sampling of 100 individuals per subpopulation. We provided the true genealogy to BEAST2, as if it had been estimated without error; in this scenario the methods are expected to achieve their best accuracy. The migration rates between the subpopulations were simulated for each dataset from a prior distribution, and we compared the “true” ratio <i>f</i><sub>1,2</sub>/<i>f</i><sub>2,1</sub> (horizontal axis) to the point estimate (posterior median; vertical axis, points) and the 95% credible interval (2.5 and 97.5 percentiles; error bars). The results show a weak correlation between the truth and the point estimates for DTA, compared to MTT and BASTA, indicating poor statistical efficiency. The percentage of datasets in which the 95% credible interval contained the truth revealed that DTA was poorly calibrated compared to MTT, BASTA and the theoretical target of 95%. The mean migration rate was high (<math><mover><mi>f</mi><mo>‾</mo></mover><mo>=</mo><mn>5.0</mn></math>). The dashed line indicates the hypothetical optimal estimate. The numbers of MCMC steps for DTA, MTT and BASTA were 10<sup>6</sup>, 2 × 10<sup>5</sup> and 10<sup>5</sup> respectively, chosen to achieve similar running times (approximately 180, 200 and 150 seconds per replicate, respectively).</p>
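<p>The two summaries used here, interval coverage and the correlation between the truth and the point estimates, can be sketched as follows. This is a minimal illustration only, not the authors' analysis code: the function names and the four example replicates are hypothetical, and taking the correlation on the log scale of the ratio is an assumption.</p>

```python
import math

def coverage(truth, lower, upper):
    """Fraction of replicates whose interval [lower, upper] contains the true value."""
    hits = sum(lo <= t <= hi for t, lo, hi in zip(truth, lower, upper))
    return hits / len(truth)

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for four replicates (not data from the study):
# true ratio f_{1,2}/f_{2,1}, posterior median, and 2.5 / 97.5 percentiles.
truth = [0.5, 1.0, 2.0, 4.0]
median = [0.6, 0.9, 2.5, 3.0]
lower = [0.3, 0.5, 2.2, 1.5]
upper = [1.2, 1.8, 5.0, 6.0]

cov = coverage(truth, lower, upper)
# Assumption: ratios are compared on the log scale.
r = pearson([math.log(t) for t in truth], [math.log(m) for m in median])

print(f"coverage = {cov:.2f}, correlation = {r:.2f}")
```

<p>For a well-calibrated method, the coverage computed over many replicates should be close to the nominal 95%; a correlation near 1 indicates that the point estimates track the truth closely.</p>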
