
    Fitting stochastic epidemic models to gene genealogies using linear noise approximation

    Phylodynamics is a set of population genetics tools that aim to reconstruct the demographic history of a population from molecular sequences of individuals sampled from that population. One important task in phylodynamics is to estimate changes in (effective) population size. When applied to infectious disease sequences, such estimates of population size trajectories can provide information about changes in the number of infections. To model changes in the number of infected individuals, current phylodynamic methods use non-parametric approaches, parametric approaches, and stochastic modeling in conjunction with likelihood-free Bayesian methods. The first class of methods yields results that are hard to interpret epidemiologically. The second class provides estimates of important epidemiological parameters, such as infection and removal/recovery rates, but ignores variation in the dynamics of infectious disease spread. The third class is the most advantageous statistically, but relies on computationally intensive particle filtering techniques that limit its applications. We propose a Bayesian model that combines phylodynamic inference and stochastic epidemic models, and achieves computational tractability by using a linear noise approximation (LNA), a technique that allows us to approximate probability densities of stochastic epidemic model trajectories. LNA opens the door to using modern Markov chain Monte Carlo tools to approximate the joint posterior distribution of the disease transmission parameters and of high-dimensional vectors describing unobserved changes in the stochastic epidemic model compartment sizes (e.g., numbers of infectious and susceptible individuals). We apply our estimation technique to Ebola genealogies estimated using viral genetic data from the 2014 epidemic in Sierra Leone and Liberia. Comment: 43 pages, 6 figures in the main text

    Distributed top-k aggregation queries at large

    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
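
    For orientation, the TPUT framework named in the abstract above can be sketched as a three-phase threshold scheme. The sketch below is a minimal single-process illustration of that prior baseline only, not of the optimizations the paper proposes. It assumes non-negative local scores and sum aggregation; the function names (`tput_top_k`, `partial_sums`, `kth_largest`) and the dict-per-node data layout are hypothetical choices made for this example.

```python
import heapq
from collections import defaultdict


def partial_sums(reported):
    """Sum, per item, the scores that nodes have reported so far."""
    sums = defaultdict(float)
    for node_report in reported:
        for item, score in node_report.items():
            sums[item] += score
    return sums


def kth_largest(sums, k):
    """k-th largest partial sum, or 0 if fewer than k items are known."""
    top = heapq.nlargest(k, sums.values())
    return top[-1] if len(top) == k else 0.0


def tput_top_k(nodes, k):
    """nodes: list of dicts mapping item -> non-negative local score.

    Returns the k items with the largest summed score as (item, total) pairs.
    """
    m = len(nodes)

    # Phase 1: each node reports its local top-k items.
    reported = [
        dict(heapq.nlargest(k, node.items(), key=lambda kv: kv[1]))
        for node in nodes
    ]
    # The k-th largest partial sum is a lower bound on the true k-th total.
    threshold = kth_largest(partial_sums(reported), k) / m

    # Phase 2: each node reports every item scoring at least the threshold.
    for node_report, node in zip(reported, nodes):
        node_report.update({i: s for i, s in node.items() if s >= threshold})
    sums = partial_sums(reported)
    tau2 = kth_largest(sums, k)
    # Any score a node did not report is below the threshold, which gives an
    # upper bound on each item's total; prune items that cannot reach tau2.
    candidates = [
        item for item, s in sums.items()
        if s + threshold * sum(item not in r for r in reported) >= tau2
    ]

    # Phase 3: fetch exact scores for the surviving candidates.
    exact = {i: sum(node.get(i, 0.0) for node in nodes) for i in candidates}
    return heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])
```

    Calling `tput_top_k(nodes, k)` with one score dictionary per node returns the same items a full scan would, while in a real deployment only the reported entries would cross the network. The uniform threshold (one value for all nodes) is the part that data-adaptive scan depths, as described in the abstract, would replace.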

    Reliable ABC model choice via random forests

    Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP to a second stage that also relies on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least a factor of fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of dataset sizes and model complexities that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf, available on CRAN. Comment: 39 pages, 15 figures, 6 tables

    Does foraging efficiency vary with colony size in the fairy martin Petrochelidon ariel?

    Colonial breeding occurs in a wide range of taxa; however, the advantages promoting its evolution and maintenance remain poorly understood. In many avian species, breeding colonies vary in size by several orders of magnitude, and one approach to investigating the evolution of coloniality has been to examine how potential costs and benefits vary with colony size. Several hypotheses predict that foraging efficiency may improve with colony size, through benefits associated with social foraging and information exchange. However, it is argued that competition for limited food resources will also increase with colony size, potentially reducing foraging success. Here we use a number of measures (brood feeding rates, chick condition and survival, and adult condition) to estimate foraging efficiency in the fairy martin Petrochelidon ariel, across a range of colony sizes in a single season (17 colonies, size range 28-139 pairs). Brood provisioning rates were collected from multiple colonies simultaneously using an electronic monitoring system, controlling for temporal variation in environmental conditions. Provisioning rate was correlated with nestling condition, though we found no clear relationship between provisioning rate and colony size for either male or female parents. However, chicks were generally in worse condition, and broods were more likely to fail or experience partial loss, in larger colonies. Moreover, the average condition of adults declined with colony size. Overall, these findings suggest that foraging efficiency declines with colony size in fairy martins, supporting the increased competition hypothesis. However, other factors, such as an increased ectoparasite load in large colonies or a change in the composition of phenotypes with colony size, may also have contributed to these patterns.