Inference of Ancestral Recombination Graphs through Topological Data Analysis
The recent explosion of genomic data has underscored the need for
interpretable and comprehensive analyses that can capture complex phylogenetic
relationships within and across species. Recombination, reassortment and
horizontal gene transfer constitute examples of pervasive biological phenomena
that cannot be captured by tree-like representations. Starting from hundreds of
genomes, we are interested in the reconstruction of potential evolutionary
histories leading to the observed data. Ancestral recombination graphs
represent potential histories that explicitly accommodate recombination and
mutation events across orthologous genomes. However, they are computationally
costly to reconstruct, usually being infeasible for more than a few tens of
genomes. Recently, Topological Data Analysis (TDA) methods have been proposed
as robust and scalable methods that can capture the genetic scale and frequency
of recombination. We build upon previous TDA developments for detecting and
quantifying recombination, and present a novel framework that can be applied to
hundreds of genomes and can be interpreted in terms of minimal histories of
mutation and recombination events, quantifying the scales and identifying the
genomic locations of recombinations. We implement this framework in a software
package, called TARGet, and apply it to several examples, including small
migration between different populations, human recombination, and horizontal
evolution in finches inhabiting the Galápagos Islands.
Comment: 33 pages, 12 figures. The accompanying software, instructions and
example files used in the manuscript can be obtained from
https://github.com/RabadanLab/TARGe
Stochastic modelling, Bayesian inference, and new in vivo measurements elucidate the debated mtDNA bottleneck mechanism
Dangerous damage to mitochondrial DNA (mtDNA) can be ameliorated during
mammalian development through a highly debated mechanism called the mtDNA
bottleneck. Uncertainty surrounding this process limits our ability to address
inherited mtDNA diseases. We produce a new, physically motivated, generalisable
theoretical model for mtDNA populations during development, allowing the first
statistical comparison of proposed bottleneck mechanisms. Using approximate
Bayesian computation and mouse data, we find most statistical support for a
combination of binomial partitioning of mtDNAs at cell divisions and random
mtDNA turnover, meaning that the debated exact magnitude of mtDNA copy number
depletion is flexible. New experimental measurements from a wild-derived mtDNA
pairing in mice confirm the theoretical predictions of this model. We
analytically solve a mathematical description of this mechanism, computing
probabilities of mtDNA disease onset, efficacy of clinical sampling strategies,
and effects of potential dynamic interventions, thus developing a quantitative
and experimentally-supported stochastic theory of the bottleneck.
Comment: Main text: 14 pages, 5 figures; Supplement: 17 pages, 4 figures;
Total: 31 pages, 9 figures
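The mechanism favoured in the abstract above (binomial partitioning of mtDNAs at cell divisions combined with random mtDNA turnover) can be illustrated with a minimal simulation sketch. This is not the authors' model: the function names, the deterministic doubling before each division, and the fixed number of turnover events per cell cycle are all assumptions made for illustration.

```python
import random


def partition_binomial(copies, rng):
    """Binomial partitioning: each molecule independently enters the
    tracked daughter cell with probability 1/2."""
    return sum(1 for _ in range(copies) if rng.random() < 0.5)


def turnover_step(mutant, wildtype, rng):
    """One random turnover event (assumed form): a randomly chosen molecule
    is degraded and a randomly chosen survivor is replicated, so the total
    copy number is unchanged."""
    total = mutant + wildtype
    if total < 2:
        return mutant, wildtype
    # Degrade one molecule, chosen uniformly at random.
    if rng.random() < mutant / total:
        mutant -= 1
    else:
        wildtype -= 1
    # Replicate one of the remaining total - 1 molecules.
    if rng.random() < mutant / (total - 1):
        mutant += 1
    else:
        wildtype += 1
    return mutant, wildtype


def simulate_heteroplasmy(mutant, wildtype, divisions, turnovers, rng):
    """Follow one cell lineage and return its final mutant fraction,
    or None if the lineage loses all of its mtDNA."""
    for _ in range(divisions):
        for _ in range(turnovers):
            mutant, wildtype = turnover_step(mutant, wildtype, rng)
        # Assumption: every molecule is copied once before the cell divides.
        mutant = partition_binomial(2 * mutant, rng)
        wildtype = partition_binomial(2 * wildtype, rng)
        if mutant + wildtype == 0:
            return None
    return mutant / (mutant + wildtype)
```

Running `simulate_heteroplasmy(500, 500, 10, 50, random.Random(1))` over many seeds gives a spread of final mutant fractions around the starting value, which is the kind of cell-to-cell variance a bottleneck mechanism has to explain.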
Efficient data augmentation for fitting stochastic epidemic models to prevalence data
Stochastic epidemic models describe the dynamics of an epidemic as a disease
spreads through a population. Typically, only a fraction of cases are observed
at a set of discrete times. The absence of complete information about the time
evolution of an epidemic gives rise to a complicated latent variable problem in
which the state space size of the epidemic grows large as the population size
increases. This makes analytically integrating over the missing data infeasible
for populations of even moderate size. We present a data augmentation Markov
chain Monte Carlo (MCMC) framework for Bayesian estimation of stochastic
epidemic model parameters, in which measurements are augmented with
subject-level disease histories. In our MCMC algorithm, we propose each new
subject-level path, conditional on the data, using a time-inhomogeneous
continuous-time Markov process with rates determined by the infection histories
of other individuals. The method is general, and may be applied, with minimal
modifications, to a broad class of stochastic epidemic models. We present our
algorithm in the context of multiple stochastic epidemic models in which the
data are binomially sampled prevalence counts, and apply our method to data
from an outbreak of influenza in a British boarding school.
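The observation model named in the abstract above, binomially sampled prevalence counts, can be sketched as a log-likelihood that a data augmentation MCMC step would evaluate for a proposed latent epidemic path. The function names and the single detection probability `rho` are assumptions for illustration, not the authors' notation or code.

```python
import math


def binomial_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    if k < 0 or k > n:
        return -math.inf
    if p == 0.0:
        return 0.0 if k == 0 else -math.inf
    if p == 1.0:
        return 0.0 if k == n else -math.inf
    return (math.log(math.comb(n, k))
            + k * math.log(p)
            + (n - k) * math.log1p(-p))


def prevalence_loglik(observed, infected, rho):
    """Log-likelihood of observed prevalence counts given the latent number
    of infected individuals at each observation time, assuming each infected
    individual is detected independently with probability rho."""
    return sum(binomial_logpmf(y, i, rho) for y, i in zip(observed, infected))
```

A proposed path that implies fewer infected individuals than were observed at some time gets a log-likelihood of negative infinity, so a Metropolis-Hastings step would always reject it.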
Past and present cosmic structure in the SDSS DR7 main sample
We present a chrono-cosmography project, aiming at the inference of the four
dimensional formation history of the observed large scale structure from its
origin to the present epoch. To do so, we perform a full-scale Bayesian
analysis of the northern galactic cap of the Sloan Digital Sky Survey (SDSS)
Data Release 7 main galaxy sample, relying on a fully probabilistic, physical
model of the non-linearly evolved density field. Besides inferring initial
conditions from observations, our methodology naturally and accurately
reconstructs non-linear features at the present epoch, such as walls and
filaments, corresponding to high-order correlation functions generated by
late-time structure formation. Our inference framework self-consistently
accounts for typical observational systematic and statistical uncertainties
such as noise, survey geometry and selection effects. We further account for
luminosity dependent galaxy biases and automatic noise calibration within a
fully Bayesian approach. As a result, this analysis provides highly-detailed
and accurate reconstructions of the present density field on scales larger than
Mpc, constrained by SDSS observations. This approach also leads to
the first quantitative inference of plausible formation histories of the
dynamic large scale structure underlying the observed galaxy distribution. The
results described in this work constitute the first full Bayesian non-linear
analysis of the cosmic large scale structure with the demonstrated capability
of uncertainty quantification. Some of these results will be made publicly
available along with this work. The level of detail of inferred results and the
high degree of control on observational uncertainties pave the path towards
high precision chrono-cosmography, the subject of simultaneously studying the
dynamics and the morphology of the inhomogeneous Universe.
Comment: 27 pages, 9 figures