Regression approaches for Approximate Bayesian Computation
This book chapter introduces regression approaches and regression adjustment
for Approximate Bayesian Computation (ABC). Regression adjustment adjusts
parameter values after rejection sampling in order to account for the imperfect
match between simulations and observations. Imperfect match between simulations
and observations can be more pronounced when there are many summary statistics,
a phenomenon known as the curse of dimensionality. Because of this imperfect
match, credibility intervals obtained with regression approaches can be
inflated compared to true credibility intervals. The chapter presents the main
concepts underlying regression adjustment. A theorem that compares theoretical
properties of posterior distributions obtained with and without regression
adjustment is presented. Last, a practical application of regression adjustment
in population genetics shows that regression adjustment shrinks posterior
distributions compared to rejection approaches, which is a solution to avoid
inflated credibility intervals. Comment: Book chapter, published in Handbook of Approximate Bayesian
Computation 201
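The regression adjustment described in this abstract can be illustrated with a minimal sketch. It is not the chapter's own code: the toy model (a normal mean with a uniform prior, summarised by the sample mean), the function name `abc_rejection_with_adjustment` and all parameter values are assumptions chosen for illustration, and the regression is an unweighted linear fit rather than a kernel-weighted one.

```python
import numpy as np

def abc_rejection_with_adjustment(s_obs, n_sims=20000, n_keep=2000, n_obs=20, seed=0):
    """Toy ABC: rejection sampling followed by linear regression adjustment."""
    rng = np.random.default_rng(seed)
    # Draw parameters from an assumed Uniform(-5, 5) prior.
    theta = rng.uniform(-5.0, 5.0, size=n_sims)
    # Summary statistic: mean of n_obs draws from N(theta, 1),
    # simulated directly from its sampling distribution N(theta, 1/n_obs).
    s = rng.normal(theta, 1.0 / np.sqrt(n_obs))
    # Rejection step: keep the n_keep simulations closest to the observation.
    keep = np.argsort(np.abs(s - s_obs))[:n_keep]
    theta_acc, s_acc = theta[keep], s[keep]
    # Regression step: fit theta = a + b * (s - s_obs) on the accepted draws.
    design = np.column_stack([np.ones(n_keep), s_acc - s_obs])
    coef, *_ = np.linalg.lstsq(design, theta_acc, rcond=None)
    # Adjustment: move each accepted theta to where it would sit at s = s_obs.
    theta_adj = theta_acc - coef[1] * (s_acc - s_obs)
    return theta_acc, theta_adj

theta_rej, theta_adj = abc_rejection_with_adjustment(s_obs=0.3)
print("rejection sd:", theta_rej.std(), "adjusted sd:", theta_adj.std())
```

In this toy setting the adjusted sample is narrower than the plain rejection sample, which mirrors the shrinkage the abstract reports for the population genetics application.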
Efficient Forward Simulation of Fisher-Wright Populations with Stochastic Population Size and Neutral Single Step Mutations in Haplotypes
In both population genetics and forensic genetics it is important to know how
haplotypes are distributed in a population. Simulation of population dynamics
helps facilitate research on the distribution of haplotypes. In forensic
genetics, the haplotypes can for example consist of lineage markers such as
short tandem repeat loci on the Y chromosome (Y-STR). A dominating model for
describing population dynamics is the simple, yet powerful, Fisher-Wright
model. We describe an efficient algorithm for exact forward simulation of
Fisher-Wright populations (rather than an approximation such as the coalescent model).
The efficiency comes from convenient data structures by changing the
traditional view from individuals to haplotypes. The algorithm is implemented
in the open-source R package 'fwsim' and is able to simulate very large
populations. We focus on a haploid model and assume stochastic population size
with flexible growth specification, no selection, a neutral single step
mutation process, and self-reproducing individuals. These assumptions make the
algorithm ideal for studying lineage markers such as Y-STR. Comment: 17 pages, 6 figure
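The idea of tracking haplotypes rather than individuals can be sketched as follows. This is an illustrative Python sketch, not the 'fwsim' implementation: the function name `simulate_fisher_wright`, the Poisson offspring model used for the stochastic population size, and the default values of `growth` and `mu` are all assumptions.

```python
from collections import Counter
import numpy as np

def simulate_fisher_wright(init_haplotype, n_init, generations,
                           growth=1.0, mu=0.003, seed=0):
    """Forward simulation that stores a count per haplotype, not per individual."""
    rng = np.random.default_rng(seed)
    n_loci = len(init_haplotype)
    # Probability that a child mutates at one or more loci.
    p_any = 1.0 - (1.0 - mu) ** n_loci
    pop = Counter({tuple(init_haplotype): n_init})
    for _gen in range(generations):
        nxt = Counter()
        for hap, count in pop.items():
            # Assumed offspring model: each individual has Poisson(growth)
            # children, so a haplotype with `count` carriers has
            # Poisson(growth * count) children and the total size is random.
            n_children = int(rng.poisson(growth * count))
            n_mut = int(rng.binomial(n_children, p_any))
            if n_children > n_mut:
                nxt[hap] += n_children - n_mut  # unmutated copies, handled in bulk
            for _child in range(n_mut):
                # Draw the mutated loci conditional on at least one mutation
                # (simple rejection sampling; mutants are rare).
                mask = rng.random(n_loci) < mu
                while not mask.any():
                    mask = rng.random(n_loci) < mu
                # Neutral single step mutation: +1 or -1 repeat, equally likely.
                steps = rng.choice((-1, 1), size=n_loci) * mask
                nxt[tuple(int(a + d) for a, d in zip(hap, steps))] += 1
        pop = nxt
    return pop

pop = simulate_fisher_wright((14, 12, 28), n_init=1000, generations=20)
print(len(pop), "distinct haplotypes among", sum(pop.values()), "individuals")
```

Because unmutated children are added with a single counter update, the cost per generation scales with the number of distinct haplotypes and mutants rather than with the population size.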
MixtureTree: a program for constructing phylogeny
Background: MixtureTree v1.0 is a Linux based program (written in C++) which implements an algorithm based on mixture models for reconstructing phylogeny from binary sequence data, such as single-nucleotide polymorphisms (SNPs). In addition to the mixture algorithm with three different optimization options, the program also implements a bootstrap procedure with majority-rule consensus. Results: The MixtureTree program, written in C++, is a Linux based package. The User's Guide and source code will be available at http://math.asu.edu/~scchen/MixtureTree.html. Conclusions: The efficiency of the mixture algorithm is relatively higher than that of some classical methods, such as the Neighbor-Joining, Maximum Parsimony and Maximum Likelihood methods. Shortcomings of the mixture tree algorithm, such as being time consuming, can be mitigated by implementing other revised Expectation-Maximization (EM) algorithms instead of the traditional EM algorithm.
Error-prone polymerase activity causes multinucleotide mutations in humans
About 2% of human genetic polymorphisms have been hypothesized to arise via
multinucleotide mutations (MNMs), complex events that generate SNPs at multiple
sites in a single generation. MNMs have the potential to accelerate the pace at
which single genes evolve and to confound studies of demography and selection
that assume all SNPs arise independently. In this paper, we examine clustered
mutations that are segregating in a set of 1,092 human genomes, demonstrating
that MNMs become enriched as large numbers of individuals are sampled. We
leverage the size of the dataset to deduce new information about the allelic
spectrum of MNMs, estimating the percentage of linked SNP pairs that were
generated by simultaneous mutation as a function of the distance between the
affected sites and showing that MNMs exhibit a high percentage of transversions
relative to transitions. These findings are reproducible in data from multiple
sequencing platforms. Among tandem mutations that occur simultaneously at
adjacent sites, we find an especially skewed distribution of ancestral and
derived dinucleotides, with , and their reverse complements making up 36% of the total. These
same mutations dominate the spectrum of tandem mutations produced by the
upregulation of low-fidelity Polymerase in mutator strains of S.
cerevisiae that have impaired DNA excision repair machinery. This suggests that
low-fidelity DNA replication by Pol is at least partly responsible for
the MNMs that are segregating in the human population, and that useful
information about the biochemistry of MNM can be extracted from ordinary
population genomic data. We incorporate our findings into a mathematical model
of the multinucleotide mutation process that can be used to correct
phylogenetic and population genetic methods for the presence of MNMs.
Genomic signatures of population decline in the malaria mosquito Anopheles gambiae
Population genomic features such as nucleotide diversity and linkage disequilibrium are expected to be strongly shaped by changes in population size, and might therefore be useful for monitoring the success of a control campaign. In the Kilifi district of Kenya, there has been a marked decline in the abundance of the malaria vector Anopheles gambiae subsequent to the rollout of insecticide-treated bed nets. To investigate whether this decline left a detectable population genomic signature, simulations were performed to compare the effect of population crashes on nucleotide diversity, Tajima's D, and linkage disequilibrium (as measured by the population recombination parameter ρ). Linkage disequilibrium and ρ were estimated for An. gambiae from Kilifi and compared to values for Anopheles arabiensis and Anopheles merus at the same location, and for An. gambiae in a location 200 km from Kilifi. In the first simulations, ρ changed more rapidly after a population crash than the other statistics, and is therefore a more sensitive indicator of recent population decline. In the empirical data, linkage disequilibrium extends 100-1000 times further, and ρ is 100-1000 times smaller, for the Kilifi population of An. gambiae than for any of the other populations. There were also significant runs of homozygosity in many of the individual An. gambiae mosquitoes from Kilifi. These results support the hypothesis that the recent decline in An. gambiae was driven by the rollout of bed nets. Measuring population genomic parameters in a small sample of individuals before, during and after vector or pest control may be a valuable method of tracking the effectiveness of interventions.
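Two of the statistics named in this abstract, nucleotide diversity and Tajima's D, can be computed from a matrix of haplotypes with the standard formulas. The sketch below is illustrative only: the 0/1 haplotype matrices are made-up toy data, the function names are assumptions, and estimating ρ is not covered.

```python
import numpy as np

def nucleotide_diversity(haps):
    """Mean number of pairwise differences; rows are haplotypes, columns are sites."""
    haps = np.asarray(haps)
    n = haps.shape[0]
    k = haps.sum(axis=0)  # derived-allele count at each site
    return float(np.sum(2.0 * k * (n - k)) / (n * (n - 1)))

def tajimas_d(haps):
    """Tajima's D from a 0/1 haplotype matrix (nan if no site is segregating)."""
    haps = np.asarray(haps)
    n = haps.shape[0]
    k = haps.sum(axis=0)
    seg = int(np.sum((k > 0) & (k < n)))  # number of segregating sites
    if seg == 0:
        return float("nan")
    i = np.arange(1, n)
    a1 = np.sum(1.0 / i)
    a2 = np.sum(1.0 / i**2)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    pi = nucleotide_diversity(haps)
    return float((pi - seg / a1) / np.sqrt(e1 * seg + e2 * seg * (seg - 1)))

# Toy data: 10 haplotypes, 5 sites.
singletons = np.zeros((10, 5), dtype=int)
singletons[np.arange(5), np.arange(5)] = 1  # every variant carried by one haplotype
balanced = np.zeros((10, 5), dtype=int)
balanced[:5, :] = 1                         # every variant carried by half the sample

print(tajimas_d(singletons), tajimas_d(balanced))
```

An excess of rare variants, as in `singletons`, gives a negative D, while an excess of intermediate-frequency variants, as in `balanced`, gives a positive D.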
Yule-generated trees constrained by node imbalance
The Yule process generates a class of binary trees which is fundamental to
population genetic models and other applications in evolutionary biology. In
this paper, we introduce a family of sub-classes of ranked trees, called
Omega-trees, which are characterized by imbalance of internal nodes. The degree
of imbalance is defined by an integer 0 <= w. For caterpillars, the extreme
case of unbalanced trees, w = 0. Under models of neutral evolution, for
instance the Yule model, trees with small w are unlikely to occur by chance.
Indeed, imbalance can be a signature of permanent selection pressure, such as
observable in the genealogies of certain pathogens. From a mathematical point
of view it is interesting to observe that the space of Omega-trees maintains
several statistical invariants although it is drastically reduced in size
compared to the space of unconstrained Yule trees. Using generating functions,
we study here some basic combinatorial properties of Omega-trees. We focus on
the distribution of the number of subtrees with two leaves. We show that
expectation and variance of this distribution match those for unconstrained
trees already for very small values of w.
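The unconstrained baseline mentioned in this abstract, the number of subtrees with two leaves (cherries) in a Yule tree, is easy to simulate. The sketch below covers only unconstrained Yule trees, since the abstract does not give the full definition of Omega-trees; the function name `yule_cherry_counts` and the parameter values are assumptions. It uses the fact that a Yule tree grows by splitting a uniformly chosen leaf, so only the cherry count needs to be tracked.

```python
import numpy as np

def yule_cherry_counts(n_leaves, n_trees, seed=0):
    """Cherry counts for n_trees independent Yule trees with n_leaves leaves each."""
    rng = np.random.default_rng(seed)
    # Every tree starts as a single cherry (two leaves).
    cherries = np.ones(n_trees, dtype=np.int64)
    for n in range(2, n_leaves):
        # A uniformly chosen leaf sits in a cherry with probability 2c/n.
        # Splitting it destroys that cherry and creates a new one (no change);
        # splitting any other leaf adds one cherry.
        in_cherry = rng.random(n_trees) < 2 * cherries / n
        cherries += ~in_cherry
    return cherries

counts = yule_cherry_counts(n_leaves=30, n_trees=20000)
print("mean:", counts.mean(), "variance:", counts.var())
```

For unconstrained Yule trees the known expectation and variance of the cherry count are n/3 and 2n/45 for n leaves (for n at least 5), so the simulated values for n = 30 should land close to 10 and 4/3.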