Fitting stochastic epidemic models to gene genealogies using linear noise approximation
Phylodynamics is a set of population genetics tools that aim at
reconstructing demographic history of a population based on molecular sequences
of individuals sampled from the population of interest. One important task in
phylodynamics is to estimate changes in (effective) population size. When
applied to infectious disease sequences such estimation of population size
trajectories can provide information about changes in the number of infections.
To model changes in the number of infected individuals, current phylodynamic
methods use non-parametric approaches, parametric approaches, and stochastic
modeling in conjunction with likelihood-free Bayesian methods. The first class
of methods yields results that are hard to interpret epidemiologically. The
second class of methods provides estimates of important epidemiological
parameters, such as infection and removal/recovery rates, but ignores variation
in the dynamics of infectious disease spread. The third class of methods is the
most advantageous statistically, but relies on computationally intensive
particle filtering techniques that limit its applications. We propose a
Bayesian model that combines phylodynamic inference and stochastic epidemic
models, and achieves computational tractability by using a linear noise
approximation (LNA), a technique that allows us to approximate probability
densities of stochastic epidemic model trajectories. LNA opens the door for
using modern Markov chain Monte Carlo tools to approximate the joint posterior
distribution of the disease transmission parameters and of high dimensional
vectors describing unobserved changes in the stochastic epidemic model
compartment sizes (e.g., numbers of infectious and susceptible individuals). We
apply our estimation technique to Ebola genealogies estimated using viral
genetic data from the 2014 epidemic in Sierra Leone and Liberia.
Comment: 43 pages, 6 figures in the main text
Distributed top-k aggregation queries at large
Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
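For readers unfamiliar with the TPUT baseline named in this abstract, the following is a minimal single-process sketch of the three-phase uniform-threshold scheme as it is commonly described. It is not the paper's optimized method (operator trees, adaptive scan depths, and source sampling are not shown). The function name `tput_top_k`, the dictionary-per-node data layout, and the assumption of non-negative scores aggregated by summation are all illustrative choices, not taken from the source.

```python
import heapq
from collections import defaultdict


def _partial_sums(reported):
    """Sum, per item, the scores the coordinator has received so far."""
    sums = defaultdict(float)
    for per_node in reported:
        for item, score in per_node.items():
            sums[item] += score
    return sums


def _kth_largest(values, k):
    """k-th largest value, or 0.0 if fewer than k values are known."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0.0


def tput_top_k(nodes, k):
    """nodes: list of dicts mapping item -> non-negative local score.

    Returns the k items with the largest summed score as (item, total) pairs.
    """
    m = len(nodes)
    if m == 0 or k <= 0:
        return []

    # Phase 1: every node reports its local top-k items.
    reported = [
        dict(heapq.nlargest(k, local.items(), key=lambda kv: kv[1]))
        for local in nodes
    ]
    tau1 = _kth_largest(_partial_sums(reported).values(), k)

    # Phase 2: every node reports all items scoring at least tau1 / m.
    threshold = tau1 / m
    for per_node, local in zip(reported, nodes):
        per_node.update({i: s for i, s in local.items() if s >= threshold})
    sums = _partial_sums(reported)
    tau2 = _kth_largest(sums.values(), k)

    # Prune: an unreported local score is below the threshold, so
    # partial sum + threshold * (nodes that did not report) is an upper bound.
    candidates = []
    for item, partial in sums.items():
        missing = sum(1 for per_node in reported if item not in per_node)
        if partial + missing * threshold >= tau2:
            candidates.append(item)

    # Phase 3: fetch exact scores for the surviving candidates only.
    totals = {
        item: sum(local.get(item, 0.0) for local in nodes)
        for item in candidates
    }
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
```

In a real deployment each phase would be one network round trip per node; here the `nodes` list stands in for the remote sources so that the pruning logic can be read in isolation.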
Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP
for a second stage also relying on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computational efficiency of at least a factor of fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. The call to random forests will undoubtedly extend the range of
dataset sizes and model complexities that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf available on CRAN.
Comment: 39 pages, 15 figures, 6 tables
Does foraging efficiency vary with colony size in the fairy martin Petrochelidon ariel?
Colonial breeding occurs in a wide range of taxa; however, the advantages promoting its evolution and maintenance remain poorly understood. In many avian species, breeding colonies vary in size by several orders of magnitude, and one approach to investigating the evolution of coloniality has been to examine how potential costs and benefits vary with colony size. Several hypotheses predict that foraging efficiency may improve with colony size, through benefits associated with social foraging and information exchange. However, it is argued that competition for limited food resources will also increase with colony size, potentially reducing foraging success. Here we use a number of measures (brood feeding rates, chick condition and survival, and adult condition) to estimate foraging efficiency in the fairy martin Petrochelidon ariel, across a range of colony sizes in a single season (17 colonies, size range 28-139 pairs). Brood provisioning rates were collected from multiple colonies simultaneously using an electronic monitoring system, controlling for temporal variation in environmental conditions. Provisioning rate was correlated with nestling condition, though we found no clear relationship between provisioning rate and colony size for either male or female parents. However, chicks were generally in worse condition and broods more likely to fail or experience partial loss in larger colonies. Moreover, the average condition of adults declined with colony size. Overall, these findings suggest that foraging efficiency declines with colony size in fairy martins, supporting the increased competition hypothesis. However, other factors, such as an increased ectoparasite load in large colonies or a change in the composition of phenotypes with colony size, may have also contributed to these patterns.