Large and finite sample properties of a maximum-likelihood estimator for multiplicity of infection
Reliable measures of transmission intensities can be incorporated into metrics for monitoring disease-control interventions. Genetic (molecular) measures like multiplicity of infection (MOI) have several advantages compared with traditional measures, e.g., R_0. Here, we investigate the properties of a maximum-likelihood approach to estimate MOI and pathogen-lineage frequencies. By verifying regularity conditions, we prove asymptotic unbiasedness, consistency and efficiency of the estimator. Finite-sample properties concerning bias and variance are evaluated over a comprehensive parameter range by a systematic simulation study. Moreover, the estimator’s sensitivity to model violations is studied. The estimator performs well for realistic sample sizes and parameter ranges. In particular, the lineage-frequency estimates are almost unbiased independently of sample size. The MOI estimate’s bias vanishes with increasing sample size, but might be substantial if the sample size is too small. The estimator’s variance matrix agrees well with the Cramér-Rao lower bound, even for small sample sizes. The numerical and analytical results of this study can be used for study design. This is exemplified by a malaria data set from Venezuela. It is shown how the results can be used to determine the sample size necessary to achieve certain performance goals. An R script implementing the likelihood method and a simulation algorithm for study design is available as S1 File, alongside documentation (S2 File) and example data (S3 File).
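As a rough illustration of the kind of likelihood such an estimator maximizes, the sketch below assumes the conditional (zero-truncated) Poisson model mentioned in the figure captions: the number of lineage copies per infection is Poisson with parameter λ conditioned on being at least 1, each copy is drawn independently according to the lineage frequencies p, and only the presence or absence of each lineage is observed. Under these assumptions the probability of a detection pattern is the product of (e^(λ p_k) − 1) over the detected lineages, divided by (e^λ − 1). The function names and the parameterization by λ are illustrative choices and are not taken from the S1 File.

```python
import math
from itertools import combinations


def mean_moi(lam):
    """Mean of a zero-truncated Poisson(lam): lam / (1 - exp(-lam))."""
    return lam / (-math.expm1(-lam))


def pattern_probability(present, lam, freqs):
    """Probability that exactly the lineages indexed in `present` are detected.

    Assumes a zero-truncated Poisson number of lineage copies per infection
    (an assumption of this sketch) and presence/absence observations only.
    """
    prob = 1.0 / math.expm1(lam)
    for k in present:
        prob *= math.expm1(lam * freqs[k])
    return prob


def log_likelihood(lam, freqs, counts, n_samples):
    """Log-likelihood of n_samples infections.

    counts[k] is the number of samples in which lineage k was detected.
    """
    ll = -n_samples * math.log(math.expm1(lam))
    for n_k, p_k in zip(counts, freqs):
        if n_k > 0:
            ll += n_k * math.log(math.expm1(lam * p_k))
    return ll


# Sanity check: the probabilities of all non-empty detection patterns sum to 1.
example_freqs = [0.5, 0.3, 0.2]
pattern_total = sum(
    pattern_probability(subset, 1.2, example_freqs)
    for size in range(1, len(example_freqs) + 1)
    for subset in combinations(range(len(example_freqs)), size)
)
```

The log-likelihood depends on the data only through the per-lineage detection counts and the sample size, which is why a numerical optimizer over λ and p is cheap to run in this setting.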
Measures of variation.
(A)-(B) CV of the MLE of ψ in % (i.e. ×100) as a function of ψ for the conditional Poisson model. The dashed line is the respective prediction based on the Cramér-Rao lower bound. Almost identical pictures are obtained for the other models (conditional Poisson, shifted Poisson, conditional binomial and shifted binomial). Panels are for different p (with different lineage numbers n). (C)-(D) CV for lineage frequencies. Shown is the theoretical prediction, which is almost indistinguishable from the curves obtained by simulation, for all models. Panels are for different p (with different lineage numbers n). (E)-(F) Average Euclidean distance between the MLE and the true parameter p. Shown are the curves obtained from the conditional Poisson model. Panels are for different p (with different lineage numbers n).
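The two summary measures in this caption can be sketched as follows, assuming the CV is the sample standard deviation of the estimates across simulation replicates divided by their mean, and that the distance in panels (E)-(F) is averaged over replicates. Both function names are hypothetical, and the exact definitions used for the figure may differ.

```python
import math
import statistics


def cv_percent(estimates):
    """Coefficient of variation in %: 100 * sample std. deviation / mean."""
    return 100.0 * statistics.stdev(estimates) / statistics.mean(estimates)


def mean_euclidean_distance(freq_estimates, true_freqs):
    """Average Euclidean distance between estimated frequency vectors and the truth."""
    distances = [math.dist(estimate, true_freqs) for estimate in freq_estimates]
    return statistics.mean(distances)
```

Here `estimates` would hold one scalar estimate per simulated data set, and `freq_estimates` one frequency vector per simulated data set.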
Results for Venezuela data.
Shown are sample size N, MLEs for and , an estimate for the square root of the Cramér-Rao lower bound for ψ, the probabilities of obtaining an irregular data set derived from (5) using the estimates and λ_me for the actual sample size N (q), sample size N = 300 (q_300) and N = 400 (q_400), respectively. The estimate λ_me is the median of the 8 estimates for which (regular data). In pairwise comparisons these eight estimates were not found to be significantly different at a 5% level based on pairwise likelihood-ratio tests provided in [43].
Dependence on true model.
Shown is the bias as a function of the true parameter ψ, for n = 4 and a balanced lineage-frequency distribution. The underlying true models are the shifted Poisson, shifted binomial, conditional binomial and uniform models (cf. Appendix F in S4 File) in panels (A), (B), (C), (D), respectively.
Table_1_A maximum-likelihood method to estimate haplotype frequencies and prevalence alongside multiplicity of infection from SNP data.XLSX
The introduction of genomic methods facilitated standardized molecular disease surveillance. For instance, SNP barcodes in Plasmodium vivax and Plasmodium falciparum malaria allow the characterization of haplotypes, their frequencies and prevalence to reveal temporal and spatial transmission patterns. A confounding factor is the presence of multiple genetically distinct pathogen variants within the same infection, known as multiplicity of infection (MOI). Disregarding ambiguous information, as usually done in ad-hoc approaches, leads to less confident and biased estimates. We introduce a statistical framework to obtain maximum-likelihood estimates (MLE) of haplotype frequencies and prevalence alongside MOI from malaria SNP data, i.e., multiple biallelic marker loci. The number of model parameters increases geometrically with the number of genetic markers considered, and no closed-form solution exists for the MLE. Therefore, the MLE needs to be derived numerically. We use the Expectation-Maximization (EM) algorithm to derive the maximum-likelihood estimates, an efficient and easy-to-implement algorithm that yields a numerically stable solution. We also derive expressions for haplotype prevalence based on either all or just the unambiguous genetic information and compare both approaches. The latter corresponds to a biased ad-hoc estimate of prevalence. We assess the performance of our estimator by systematic numerical simulations assuming realistic sample sizes and various scenarios of transmission intensity. For reasonable sample sizes and numbers of loci, the method has little bias. As an example, we apply the method to a dataset from Cameroon on sulfadoxine-pyrimethamine resistance in P. falciparum malaria. The method is not confined to malaria and can be applied to any infectious disease with similar transmission behavior. An easy-to-use implementation of the method as an R script is provided.
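To give a feel for how an EM iteration works in this setting, the sketch below treats a deliberately simplified case: a single marker with several lineages and a zero-truncated Poisson number of lineage copies per infection. It is not the multi-locus haplotype algorithm described in the abstract. The missing data are the unobserved copy numbers of each detected lineage and the number of hypothetical lineage-free samples removed by the truncation. The function name, the starting value and the input checks are assumptions of this sketch.

```python
import math


def em_single_marker(counts, n_samples, lam0=1.0, tol=1e-12, max_iter=10_000):
    """EM for theta_k = lam * p_k in a single-marker, zero-truncated Poisson model.

    counts[k] is the number of samples in which lineage k was detected.
    Returns (lam, freqs). Assumes every lineage is seen in some but not all
    samples and that at least one sample carries more than one lineage.
    """
    if any(n_k <= 0 or n_k >= n_samples for n_k in counts):
        raise ValueError("each lineage must be detected in some but not all samples")
    if sum(counts) <= n_samples:
        raise ValueError("no multiple infections observed; the estimate degenerates")

    theta = [lam0 / len(counts)] * len(counts)
    for _ in range(max_iter):
        lam = sum(theta)
        # E-step, part 1: observed samples plus the expected number of
        # unobserved lineage-free samples, N / (1 - exp(-lam)).
        completed_samples = n_samples / (-math.expm1(-lam))
        # E-step, part 2 and M-step: the expected copy number of lineage k,
        # given that it is present, is theta_k / (1 - exp(-theta_k)); the new
        # theta_k is the expected total copy number per completed sample.
        new_theta = [
            n_k * (t / (-math.expm1(-t))) / completed_samples
            for n_k, t in zip(counts, theta)
        ]
        converged = max(abs(a - b) for a, b in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break

    lam = sum(theta)
    return lam, [t / lam for t in theta]


lam_hat, freqs_hat = em_single_marker([70, 50, 20], 100)
```

At convergence the fitted probability of detecting lineage k, (1 − e^(−λ p_k)) / (1 − e^(−λ)), matches its observed fraction of samples, which is the stationarity condition of the likelihood in this simplified model.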
Data_Sheet_5_A maximum-likelihood method to estimate haplotype frequencies and prevalence alongside multiplicity of infection from SNP data.zip
Table_2_A maximum-likelihood method to estimate haplotype frequencies and prevalence alongside multiplicity of infection from SNP data.XLSX
Image_1_A maximum-likelihood method to estimate haplotype frequencies and prevalence alongside multiplicity of infection from SNP data.tif
Data_Sheet_3_A maximum-likelihood method to estimate haplotype frequencies and prevalence alongside multiplicity of infection from SNP data.zip
Data_Sheet_4_A maximum-likelihood method to estimate haplotype frequencies and prevalence alongside multiplicity of infection from SNP data.pdf