Genealogies of rapidly adapting populations
The genetic diversity of a species is shaped by its recent evolutionary
history and can be used to infer demographic events or selective sweeps. Most
inference methods are based on the null hypothesis that natural selection is a
weak or infrequent evolutionary force. However, many species, particularly
pathogens, are under continuous pressure to adapt in response to changing
environments. A statistical framework for inference from diversity data of such
populations is currently lacking. Toward this goal, we explore the properties
of genealogies in a model of continual adaptation in asexual populations. We
show that lineages trace back to a small pool of highly fit ancestors, in which
almost simultaneous coalescence of more than two lineages frequently occurs.
While such multiple mergers are unlikely under the neutral coalescent, they
create a unique genetic footprint in adapting populations. The site frequency
spectrum of derived neutral alleles, for example, is non-monotonic and has a
peak at high frequencies, whereas Tajima's D becomes more and more negative
with increasing sample size. Since multiple-merger coalescents emerge in many
models of rapid adaptation, we argue that they should be considered a
null model for adapting populations.
Comment: to appear in PNAS
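The diagnostics mentioned above can be made concrete with a short sketch that computes Tajima's D from an unfolded site frequency spectrum; the input spectra in the example are made-up illustrations, not data from the paper.

```python
import math

def tajimas_d(n, xi):
    """Tajima's D from an unfolded site frequency spectrum.

    n  -- sample size
    xi -- xi[k] is the number of segregating sites whose derived allele
          is carried by exactly k+1 of the n samples (k = 0 .. n-2)
    """
    S = sum(xi)  # number of segregating sites
    # mean pairwise diversity (pi) recovered from the spectrum
    pi = sum(c * (k + 1) * (n - k - 1) for k, c in enumerate(xi)) * 2.0 / (n * (n - 1))
    # standard Tajima (1989) normalising constants
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# under the neutral expectation xi_k ~ theta/k the statistic vanishes,
# while a spectrum dominated by rare alleles drives it negative
neutral = [10.0 / k for k in range(1, 20)]
skewed = [100.0] + [0.0] * 18  # every segregating site a singleton
```

Consistent with the abstract, `tajimas_d(20, neutral)` is zero while `tajimas_d(20, skewed)` is strongly negative, and the skew produced by multiple mergers pushes D further down as samples grow.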
A strategy analysis for genetic association studies with known inbreeding
Background: Association studies aim to identify the genetic variants related to a specific disease through statistical multiple hypothesis testing or segregation analysis in pedigrees. Such studies have been very successful for Mendelian monogenic disorders, but less so in identifying genetic variants related to complex diseases, whose onset depends on interactions between different genes and the environment. Current technology makes it possible to genotype more than a million markers, and this number has been growing rapidly in recent years with imputation based on template sets and whole-genome sequencing. Data of this type introduce a great amount of noise into the statistical analysis and usually require a large number of samples. Current methods seldom take into account gene-gene and gene-environment interactions, which are fundamental especially in complex diseases. In this paper we propose to use a non-parametric additive model, which accounts for interactions of unknown order, to detect the genetic variants related to diseases. Although this is not new to the current literature, we show that in an isolated population, where the most closely related subjects also share most of their genome, additive models may be improved if the available genealogical tree is taken into account. Specifically, we form a sample of cases and controls with the highest inbreeding by means of the Hungarian method, and estimate the set of genes and environmental variables associated with the disease by means of Random Forest.
Results: We have evidence, from statistical theory, simulations, and two applications, that the proposed procedure eliminates stratification between cases and controls and has sufficient precision in identifying genetic variants responsible for a disease. The procedure has been applied successfully to beta-thalassemia, a well-known Mendelian disease, and to common asthma, for which we have identified candidate susceptibility genes. Some of these candidate genes have also been linked to common asthma in the current literature.
Conclusions: The data-analysis approach, based on selecting the most closely related cases and controls along with the Random Forest model, is a powerful tool for detecting genetic variants associated with a disease in isolated populations. Moreover, this method also provides a prediction model that accurately estimates the unknown disease status and can be used generally to build test kits for a wide class of Mendelian diseases.
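The pairing step described above can be sketched as an assignment problem. The kinship matrix and its values below are hypothetical, and the exhaustive search merely stands in for the Hungarian method, which solves the same problem in polynomial time (for real data one would use, e.g., `scipy.optimize.linear_sum_assignment` on the negated matrix).

```python
import itertools
import math

def best_pairing(kinship):
    """Pair each case with a distinct control so total kinship is maximal.

    kinship[i][j] is a (hypothetical) relatedness estimate between case i
    and control j. Brute force keeps this sketch dependency-free; the
    Hungarian method gives the same optimum efficiently.
    """
    n_cases, n_controls = len(kinship), len(kinship[0])
    best_perm, best_total = None, -math.inf
    for perm in itertools.permutations(range(n_controls), n_cases):
        total = sum(kinship[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_perm, best_total = list(perm), total
    return best_perm, best_total

# toy matrix: case 0 is most related to control 0, case 1 to control 1
pairing, total = best_pairing([[0.5, 0.1],
                               [0.2, 0.4]])
```

Maximising total kinship within pairs is what suppresses stratification: each case is compared against the control whose genetic background it shares most.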
Stochastic modelling, Bayesian inference, and new in vivo measurements elucidate the debated mtDNA bottleneck mechanism
Dangerous damage to mitochondrial DNA (mtDNA) can be ameliorated during
mammalian development through a highly debated mechanism called the mtDNA
bottleneck. Uncertainty surrounding this process limits our ability to address
inherited mtDNA diseases. We produce a new, physically motivated, generalisable
theoretical model for mtDNA populations during development, allowing the first
statistical comparison of proposed bottleneck mechanisms. Using approximate
Bayesian computation and mouse data, we find most statistical support for a
combination of binomial partitioning of mtDNAs at cell divisions and random
mtDNA turnover, meaning that the debated exact magnitude of mtDNA copy number
depletion is flexible. New experimental measurements from a wild-derived mtDNA
pairing in mice confirm the theoretical predictions of this model. We
analytically solve a mathematical description of this mechanism, computing
probabilities of mtDNA disease onset, efficacy of clinical sampling strategies,
and effects of potential dynamic interventions, thus developing a quantitative
and experimentally supported stochastic theory of the bottleneck.
Comment: Main text: 14 pages, 5 figures; Supplement: 17 pages, 4 figures; Total: 31 pages, 9 figures
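A toy simulation of the mechanism favoured above, binomial partitioning at cell divisions combined with random mtDNA turnover, illustrates how between-lineage heteroplasmy variance is generated. All parameter values here are illustrative, not the paper's fitted values.

```python
import random

def simulate_lineage(n_copies, h0, divisions, turnovers, rng):
    """Heteroplasmy of one germline lineage: random turnover between
    divisions plus binomial partitioning at each division."""
    mut = round(n_copies * h0)
    wild = n_copies - mut
    for _ in range(divisions):
        # random turnover: replicate one molecule chosen at random, then
        # degrade one, keeping copy number constant (a Moran-type step)
        for _ in range(turnovers):
            if rng.random() < mut / (wild + mut):
                mut += 1
            else:
                wild += 1
            if rng.random() < mut / (wild + mut):
                mut -= 1
            else:
                wild -= 1
        # binomial partitioning: each molecule independently follows the
        # tracked daughter with probability 1/2; replication then restores
        # the copy number (a common simplification)
        mut = 2 * sum(rng.random() < 0.5 for _ in range(mut))
        wild = 2 * sum(rng.random() < 0.5 for _ in range(wild))
    total = wild + mut
    return mut / total if total else 0.0

rng = random.Random(1)
hs = [simulate_lineage(200, 0.3, 5, 50, rng) for _ in range(1000)]
mean_h = sum(hs) / len(hs)
var_h = sum((h - mean_h) ** 2 for h in hs) / len(hs)
```

Drift leaves the mean heteroplasmy near its starting value while spreading variance across lineages, which is the population-level signature of the bottleneck.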
Selection strategies for randomly partitioned genetic replicators
The amplification cycle of many replicators (natural or artificial) involves
the use of a host compartment, inside which the replicator expresses the
phenotypic compounds necessary to carry out its genetic replication. For
example, viruses infect cells, where they express their own proteins and
replicate. In this process, the host cell boundary limits the diffusion of the
viral protein products, thereby ensuring that phenotypic compounds, such as
proteins, promote the replication of the genes that encoded them. This role of
maintaining spatial co-localization, also called genotype-phenotype linkage, is
a critical function of compartments in natural selection. In most cases,
however, individual replicating elements are not distributed systematically
among the hosts but are randomly partitioned. Depending on the replicator-to-host
ratio, more than one variant may thus occupy some compartments, blurring the
genotype-phenotype linkage and affecting the effectiveness of natural
selection. We derive selection equations for a variety of such random multiple
occupancy situations, in particular considering the effect of replicator
population polymorphism and internal replication dynamics. We conclude that the
deleterious effect of random multiple occupancy on selection is relatively
benign, and may even vanish completely in some specific cases. In addition,
given that higher mean occupancy allows larger populations to be channeled
through the selection process, and thus provides a better exploration of
phenotypic diversity, we show that it may represent a valid strategy in both
natural and technological settings.
Comment: 36 pages, 7 figures
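The blurring of genotype-phenotype linkage under random partitioning can be shown with a toy round of selection. The fitness values and the complete-phenotype-sharing rule below are made-up assumptions for illustration, not the selection equations derived in the text.

```python
import random

def one_round(freq_fast, n_replicators, n_hosts, rng):
    """One selection round with random partitioning of replicators into
    host compartments. Inside a compartment the phenotype is shared, so
    every genome replicates at the mean fitness of its co-occupants."""
    W_FAST, W_SLOW = 1.5, 1.0
    hosts = [[] for _ in range(n_hosts)]
    for _ in range(n_replicators):
        fitness = W_FAST if rng.random() < freq_fast else W_SLOW
        hosts[rng.randrange(n_hosts)].append(fitness)   # random partitioning
    fast_offspring = slow_offspring = 0.0
    for occupants in hosts:
        if not occupants:
            continue
        shared = sum(occupants) / len(occupants)        # blurred linkage
        for fitness in occupants:
            if fitness == W_FAST:
                fast_offspring += shared
            else:
                slow_offspring += shared
    return fast_offspring / (fast_offspring + slow_offspring)

rng = random.Random(7)
f_low = one_round(0.5, 1000, 10000, rng)   # mean occupancy 0.1: linkage kept
f_high = one_round(0.5, 1000, 10, rng)     # mean occupancy 100: linkage lost
```

At low mean occupancy most occupied hosts hold a single genome, so selection acts at nearly full strength; at high occupancy the shared phenotype averages fitness across co-occupying variants and the per-round frequency change shrinks towards zero.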
Bayesian modeling of recombination events in bacterial populations
Background: We consider the discovery of recombinant segments, jointly with their origins, within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available recombination-detection methods capable of probabilistic characterization of uncertainty have limited applicability in practice as the number of strains in a data set increases.
Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and of estimating the marginal posterior probabilities of putative origins over the sites.
Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven
housekeeping genes of 120 strains of the genus Burkholderia are used to illustrate the possibilities
offered by our approach. The software is freely available for download at URL http://web.abo.fi/fak/
mnf//mate/jc/software/brat.html
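The idea of marginal posterior origin probabilities along a sequence can be illustrated with a deliberately simplified sketch: a first-order Markov chain of origins over sites, smoothed by forward-backward. This is not BRAT's actual model or algorithm; the per-site emission likelihoods, the `stay` parameter and the uniform prior are all illustrative assumptions.

```python
def origin_posteriors(emissions, stay=0.95):
    """Marginal posterior over candidate origins at each site.

    emissions[t][j] -- likelihood of the data at site t under origin j
    stay            -- probability that adjacent sites share an origin
    """
    n_sites, k = len(emissions), len(emissions[0])
    switch = (1.0 - stay) / (k - 1)
    # forward pass, normalised per site for numerical stability
    fwd = []
    prev = [1.0 / k] * k                       # uniform prior over origins
    for t in range(n_sites):
        cur = []
        for j in range(k):
            if t == 0:
                mass = prev[j]
            else:
                mass = sum(prev[i] * (stay if i == j else switch)
                           for i in range(k))
            cur.append(mass * emissions[t][j])
        z = sum(cur)
        prev = [c / z for c in cur]
        fwd.append(prev)
    # backward pass
    bwd = [[1.0] * k for _ in range(n_sites)]
    for t in range(n_sites - 2, -1, -1):
        raw = [sum((stay if i == j else switch) * emissions[t + 1][j] * bwd[t + 1][j]
                   for j in range(k)) for i in range(k)]
        z = sum(raw)
        bwd[t] = [r / z for r in raw]
    # combine and renormalise into per-site posteriors
    posts = []
    for f, b in zip(fwd, bwd):
        unnorm = [x * y for x, y in zip(f, b)]
        z = sum(unnorm)
        posts.append([u / z for u in unnorm])
    return posts

# two candidate origins; the first ten sites favour origin 0, the rest origin 1
post = origin_posteriors([[0.9, 0.1]] * 10 + [[0.1, 0.9]] * 10)
```

The spatial coupling is what distinguishes this family of models from site-by-site tests: evidence from neighbouring sites is pooled, so each site's posterior reflects the whole putative recombinant segment.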
On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo
Approximate Bayesian computation (ABC) has gained popularity over the past few years for the analysis of complex models arising in population genetics, epidemiology and systems biology. Sequential Monte Carlo (SMC) approaches have become workhorses in ABC. Here we discuss how to construct the perturbation kernels required in ABC SMC approaches in order to build a sequence of distributions that starts from a suitably defined prior and converges towards the unknown posterior. We derive optimality criteria for different kernels, based on the Kullback-Leibler divergence between a distribution and the distribution of the perturbed particles. We show that for many complicated posterior distributions, locally adapted kernels tend to show the best performance. We find that the moderate added cost of adapting kernel functions is easily regained through the higher acceptance rate. We demonstrate the computational efficiency gains in a range of toy examples that illustrate some of the challenges faced in real-world applications of ABC, before turning to two demanding parameter-inference problems in molecular biology, which highlight the huge increases in efficiency that can be gained from the choice of optimal kernels. We conclude with a general discussion of the rational choice of perturbation kernels in ABC SMC settings.
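A minimal ABC SMC sketch shows where the perturbation kernel enters. The model (a normal mean with uniform prior), the tolerance schedule, and the kernel scale (twice the variance of the previous population, one of the adaptive choices discussed) are illustrative assumptions; importance weights are omitted to keep the sketch short, whereas a full implementation reweights particles at every generation.

```python
import random
import statistics

def abc_smc(observed, epsilons, n_particles, rng):
    """Toy ABC SMC for the mean of a Normal(theta, 1) model with a
    Uniform(-10, 10) prior and a Gaussian perturbation kernel whose
    scale adapts to the previous population."""
    def simulate(theta):                   # summary statistic: sample mean
        return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20.0

    particles = None
    for eps in epsilons:                   # decreasing tolerance schedule
        accepted = []
        scale = None if particles is None else (2.0 * statistics.pvariance(particles)) ** 0.5
        while len(accepted) < n_particles:
            if particles is None:
                theta = rng.uniform(-10.0, 10.0)               # sample prior
            else:
                theta = rng.gauss(rng.choice(particles), scale)  # perturb
                if not -10.0 <= theta <= 10.0:
                    continue               # zero prior density: reject
            if abs(simulate(theta) - observed) < eps:
                accepted.append(theta)
        particles = accepted
    return particles

rng = random.Random(0)
posterior = abc_smc(2.0, [2.0, 1.0, 0.5, 0.2], 200, rng)
```

The trade-off discussed in the abstract lives in `scale`: too wide a kernel wastes simulations on rejected proposals, too narrow a kernel explores the posterior slowly, and adapting it to the current population buys back the cost through a higher acceptance rate.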