A Model-Based Bayesian Estimation of the Rate of Evolution of VNTR Loci in Mycobacterium tuberculosis
Variable number of tandem repeats (VNTR) typing is widely used for studying the bacterial cause of tuberculosis. Knowledge of the rate of mutation of VNTR loci facilitates the study of the evolution and epidemiology of Mycobacterium tuberculosis. Previous studies have applied population genetic models to estimate the mutation rate, leading to estimates varying widely from around to per locus per year. Resolving this issue using more detailed models and statistical methods would lead to improved inference in the molecular epidemiology of tuberculosis. Here, we use a model-based approach that incorporates two alternative forms of a stepwise mutation process for VNTR evolution within an epidemiological model of disease transmission. Using this model in a Bayesian framework, we estimate the mutation rate of VNTR in M. tuberculosis from four published data sets of VNTR profiles from Albania, Iran, Morocco and Venezuela. In the first variant, the mutation rate increases linearly with respect to repeat numbers (linear model); in the second, the mutation rate is constant across repeat numbers (constant model). We find that under the constant model, the mean mutation rate per locus is (95% CI: ,) and under the linear model, the mean mutation rate per locus per repeat unit is (95% CI: ,). These new estimates represent a high rate of mutation at VNTR loci compared to previous estimates. To compare the two models, we use posterior predictive checks to ascertain which of the two is better able to reproduce the observed data. From this procedure we find that the linear model performs better than the constant model. The general framework we use allows the analysis to be extended to more complex models in the future.
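The difference between the constant and linear variants described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the symmetric gain-or-loss step, and the floor of one repeat are assumptions made for illustration, and the sketch ignores the epidemiological transmission model entirely.

```python
import random

def locus_mutation_rate(repeats, rate, model):
    """Per-locus mutation rate for a locus carrying `repeats` repeat units."""
    if model == "constant":
        return rate            # same rate whatever the repeat number
    if model == "linear":
        return rate * repeats  # rate grows in proportion to repeat number
    raise ValueError(f"unknown model: {model!r}")

def mutate_locus(repeats, rng):
    """Apply one stepwise mutation: gain or lose a single repeat unit.

    Assumption (not from the source): steps are symmetric and a locus
    never drops below one repeat.
    """
    if repeats <= 1:
        return repeats + 1
    return repeats + rng.choice((-1, 1))

def simulate_locus(repeats, rate, model, years, rng):
    """Simulate one locus for `years`, drawing exponential waiting times."""
    t = 0.0
    while True:
        # Waiting time to the next mutation at the current repeat number.
        t += rng.expovariate(locus_mutation_rate(repeats, rate, model))
        if t > years:
            return repeats
        repeats = mutate_locus(repeats, rng)
```

Under the linear variant, `rate` plays the role of the per-locus, per-repeat-unit rate, so a locus with five repeats mutates five times as fast as a locus with one.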
Summary of data sets analysed in this study.
*per 100,000 per year. Data from [57] (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1002573#pcbi.1002573-World2).
Marginal posterior distributions for and using simulated data.
Plots show the marginal posterior distribution of (left) and (right) using four simulated data sets generated from the constant (left) and linear (right) VNTR models. The known values of and used to generate the data, and , are indicated by vertical lines.
Further posterior predictive model checks.
Scatterplots of the posterior predictive distributions of (the maximum range of repeat numbers over loci) versus (the intercept at one repeat) under the linear model, for each observed dataset. The indicates the statistics derived from the observed dataset.
Marginal posterior estimates for , and .
Here is the per-locus mutation rate for a locus with a single repeat under the linear model; is the same quantity scaled by the mean number of repeats observed in the sample; is the per-locus mutation rate for any repeat number under the constant model.
Transition rates in the stochastic model.
*If an existing genotype is re-created by mutation, the count of that genotype is incremented instead. Note that the increment occurs before the assignment .
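The bookkeeping in this footnote can be sketched as follows. This is a hedged illustration only: the function name `apply_mutation`, the representation of genotypes as tuples of repeat numbers, and the assumption that a mutation event removes one case from the parent genotype are not taken from the source.

```python
from collections import Counter

def apply_mutation(counts, parent, child):
    """Move one case from genotype `parent` to genotype `child`.

    `counts` maps each genotype (a tuple of repeat numbers) to its number
    of cases. Assumption: a mutation converts one case of the parent
    genotype into the child genotype.
    """
    if counts[parent] <= 0:
        raise ValueError("parent genotype has no cases to mutate")
    counts[parent] -= 1
    if counts[parent] == 0:
        del counts[parent]  # drop genotypes that no longer have any cases
    # A new genotype starts at a count of 1; if mutation re-creates an
    # existing genotype, that genotype's count is incremented instead.
    counts[child] += 1
```

Using a `Counter` means the new-genotype and re-created-genotype cases share one code path, because a missing key is treated as a count of zero.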
Bayesian posterior estimates for mutation rate.
Posterior predictive model checks.
Scatterplots of the posterior predictive distributions of (the difference between maximum and minimum range of repeat numbers over loci), versus (the same quantity substituting variance for range). Columns represent constant (left) and linear (right) models. Rows represent the Albanian dataset (top), artificially generated data from the constant model (middle) and artificially generated data from the linear model (bottom). The indicates the statistics derived from the observed dataset.
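The summary statistics named in the posterior predictive captions can be computed from a table of VNTR profiles roughly as below. This is a sketch under stated assumptions: the function names and dictionary keys are hypothetical, each profile is taken to be a list of repeat numbers with one entry per locus, and population variance is used because the captions do not say which variance estimator was applied.

```python
from statistics import pvariance

def per_locus(profiles, spread):
    """Apply `spread` to the repeat numbers observed at each locus."""
    # zip(*profiles) turns rows (isolates) into columns (loci).
    return [spread(column) for column in zip(*profiles)]

def value_range(values):
    """Range of repeat numbers: largest minus smallest."""
    return max(values) - min(values)

def summary_statistics(profiles):
    """Summary statistics of the kind used in the predictive checks."""
    ranges = per_locus(profiles, value_range)
    variances = per_locus(profiles, pvariance)  # assumption: population variance
    return {
        # Maximum range of repeat numbers over loci.
        "max_range": max(ranges),
        # Difference between the maximum and minimum range over loci.
        "range_spread": max(ranges) - min(ranges),
        # Same quantity with variance substituted for range.
        "variance_spread": max(variances) - min(variances),
    }
```

In a posterior predictive check, the same function would be applied to each simulated data set and to the observed data, and the observed statistics compared against the cloud of simulated ones.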