
    Semi-automatic selection of summary statistics for ABC model choice

    A central statistical goal is to choose between alternative explanatory models of data. In many modern applications, such as population genetics, it is not possible to apply standard methods based on evaluating the likelihood functions of the models, as these are numerically intractable. Approximate Bayesian computation (ABC) is a commonly used alternative in such situations. ABC simulates data x for many parameter values under each model and compares them to the observed data x_obs. More weight is placed on models under which S(x) is close to S(x_obs), where S maps data to a vector of summary statistics. Previous work has shown that the choice of S is crucial to the efficiency and accuracy of ABC. This paper provides a method to select good summary statistics for model choice. It uses a preliminary step in which many x values are simulated from all models and regressions are fitted to these simulations with the model indicator as the response. The fitted model-weight estimators are then used as S in an ABC analysis. Theoretical results are given to justify this as approximating low-dimensional sufficient statistics. A substantive application is presented: choosing between competing coalescent models of demographic growth for Campylobacter jejuni in New Zealand using multi-locus sequence typing data.
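
    As a rough illustration of the pipeline described above, the sketch below simulates from two toy models, fits a logistic regression with the model index as the response, and uses the fitted model probabilities as the summary statistic S inside an ABC rejection sampler for model choice. The simulators, prior, raw features, and tolerance are placeholders, not those used in the paper.

```python
# Rough sketch of semi-automatic summary statistics for ABC model choice.
# The two toy simulators, the raw features, the prior and the tolerance are
# placeholders, not the coalescent models or settings used in the paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def simulate(model, n_obs=50):
    """Toy simulators: model 0 = exponential data, model 1 = log-normal data."""
    theta = rng.uniform(0.5, 2.0)                       # placeholder prior on theta
    return rng.exponential(theta, n_obs) if model == 0 else rng.lognormal(0.0, theta, n_obs)

def raw_features(x):
    return np.array([x.mean(), x.std(), np.median(x), np.log(x).mean()])

# Preliminary step: simulate from all models, regress the model index on features.
models = rng.integers(0, 2, size=5000)
features = np.array([raw_features(simulate(m)) for m in models])
clf = LogisticRegression(max_iter=1000).fit(features, models)

def S(x):
    """Summary statistic = fitted model-weight estimator."""
    return clf.predict_proba(raw_features(x).reshape(1, -1))[0]

x_obs = rng.exponential(1.3, 50)                        # pretend observed data
s_obs = S(x_obs)

# ABC rejection for model choice: keep draws whose S(x) is close to S(x_obs).
accepted = []
for _ in range(20000):
    m = int(rng.integers(0, 2))
    if np.linalg.norm(S(simulate(m)) - s_obs) < 0.1:    # placeholder tolerance
        accepted.append(m)
post = np.bincount(accepted, minlength=2) / max(len(accepted), 1)
print("approximate posterior model probabilities:", post)
```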

    Optimal detection of changepoints with a linear computational cost

    We consider the problem of detecting multiple changepoints in large data sets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example in genetics as we analyse larger regions of the genome, or in finance as we observe time series over longer periods. We consider the common approach of detecting changepoints through minimising a cost function over possible numbers and locations of changepoints. This includes several established procedures for detecting changepoints, such as penalised likelihood and minimum description length. We introduce a new method for finding the minimum of such cost functions, and hence the optimal number and location of changepoints, that has a computational cost which, under mild conditions, is linear in the number of observations. This compares favourably with existing methods for the same problem, whose computational cost can be quadratic or even cubic. In simulation studies we show that our new method can be orders of magnitude faster than these alternative exact methods. We also compare with the Binary Segmentation algorithm for identifying changepoints, showing that the exactness of our approach can lead to substantial improvements in the accuracy of the inferred segmentation of the data. Comment: 25 pages, 4 figures; to appear in the Journal of the American Statistical Association.
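
    The sketch below illustrates the general idea of minimising a penalised cost by dynamic programming while pruning candidate changepoints, for a Gaussian change-in-mean cost with a BIC-like penalty. It is a simplified stand-in for the method described; the cost function and penalty value are illustrative choices.

```python
# Sketch of penalised-cost changepoint detection with pruning of candidates.
# Gaussian change-in-mean cost and the BIC-like penalty are illustrative;
# the paper's exact cost functions and conditions may differ.
import numpy as np

def pelt_mean(y, beta=None):
    n = len(y)
    beta = 2 * np.log(n) if beta is None else beta       # placeholder penalty
    S1 = np.concatenate(([0.0], np.cumsum(y)))            # prefix sums
    S2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(a, b):                                        # segment y[a:b], b exclusive
        s, s2, m = S1[b] - S1[a], S2[b] - S2[a], b - a
        return s2 - s * s / m                              # -2*loglik up to constants, unit variance

    F = np.full(n + 1, np.inf); F[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + beta for s in candidates]
        best = int(np.argmin(vals))
        F[t], last[t] = vals[best], candidates[best]
        # Prune candidate changepoints that can never be optimal in the future.
        candidates = [s for s, v in zip(candidates, vals) if v - beta <= F[t]] + [t]
    # Backtrack the optimal changepoints.
    cps, t = [], n
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 150), rng.normal(1, 1, 250)])
print(pelt_mean(y))   # changepoints expected near 200 and 350
```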

    Particle Approximations of the Score and Observed Information Matrix for Parameter Estimation in State Space Models With Linear Computational Cost

    Poyiadjis et al. (2011) show how particle methods can be used to estimate both the score and the observed information matrix for state space models. These methods either suffer from a computational cost that is quadratic in the number of particles, or produce estimates whose variance increases quadratically with the amount of data. This paper introduces an alternative approach for estimating these terms at a computational cost that is linear in the number of particles. The method is derived using a combination of kernel density estimation, to avoid the particle degeneracy that causes the quadratically increasing variance, and Rao-Blackwellisation. Crucially, we show the method is robust to the choice of bandwidth within the kernel density estimation, as it has good asymptotic properties regardless of this choice. Our estimates of the score and observed information matrix can be used within both online and batch procedures for estimating the parameters of state space models. Empirical results show improved parameter estimates compared to existing methods at a significantly reduced computational cost. Supplementary materials, including code, are available.
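
    The sketch below illustrates, for a toy linear-Gaussian state space model, how a particle filter can carry per-particle score statistics at a cost linear in the number of particles, with the previous statistics shrunk towards their weighted mean before being updated. The shrinkage step is a simplified stand-in for the kernel density idea in the abstract; the Rao-Blackwellised version and the observed information estimate are not shown.

```python
# Sketch: O(N) particle estimate of the score for a toy linear-Gaussian state
# space model, with shrinkage of the per-particle gradient statistics towards
# their weighted mean (a simplified stand-in for the kernel density idea in the
# abstract; the full Rao-Blackwellised method is omitted).
import numpy as np

rng = np.random.default_rng(2)
phi_true, sx, sy, T, N = 0.8, 1.0, 1.0, 500, 1000

# Simulate data from the model x_t = phi*x_{t-1} + sx*eps, y_t = x_t + sy*eta.
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi_true * x[t - 1] + sx * rng.normal()
y = x + sy * rng.normal(size=T)

def score_estimate(phi, lam=0.95):
    """Bootstrap particle filter carrying per-particle score statistics alpha."""
    xp = rng.normal(0.0, sx, N)               # particles at time 0
    alpha = np.zeros(N)                       # per-particle gradient statistics
    logw = -0.5 * (y[0] - xp) ** 2 / sy ** 2
    for t in range(1, T):
        w = np.exp(logw - logw.max()); w /= w.sum()
        a = rng.choice(N, size=N, p=w)        # multinomial resampling (ancestors)
        xprev, aprev = xp[a], alpha[a]
        xp = phi * xprev + sx * rng.normal(size=N)
        grad_f = (xp - phi * xprev) * xprev / sx ** 2   # d/dphi of log transition density
        abar = np.sum(w * alpha)                        # weighted mean of previous statistics
        alpha = lam * aprev + (1 - lam) * abar + grad_f # shrink, then accumulate
        logw = -0.5 * (y[t] - xp) ** 2 / sy ** 2        # observation density (phi-free)
    w = np.exp(logw - logw.max()); w /= w.sum()
    return np.sum(w * alpha)

print("rough score estimate at the true parameter:", score_estimate(phi_true))
```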

    The Time Machine: A Simulation Approach for Stochastic Trees

    We consider a simulation technique for stochastic trees. One of the most important areas in computational genetics is the calculation, and subsequent maximization, of the likelihood function associated with such models. This typically relies on importance sampling (IS) and sequential Monte Carlo (SMC) techniques. The approach proceeds by simulating the tree backward in time, from the observed data to a most recent common ancestor (MRCA). However, in many cases the computational time and the variance of the estimators are too high for standard approaches to be useful. In this paper we propose to stop the simulation before the MRCA is reached, which yields biased estimates of the likelihood surface. The bias is investigated from a theoretical point of view. Results from simulation studies are also given to investigate the trade-off between loss of accuracy, savings in computing time, and variance reduction. Comment: 22 pages, 5 figures.
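
    The toy sketch below mirrors the stop-early idea in a much simpler setting: the likelihood of a number of segregating sites under the infinite-sites model is estimated by simulating coalescent branch lengths, either all the way to the MRCA or only until m lineages remain, with the un-simulated part replaced by its expectation. This only illustrates the structure and the resulting bias/cost trade-off; the paper works with importance sampling for full tree likelihoods.

```python
# Toy illustration of stopping a backward-in-time simulation early: the
# likelihood of s segregating sites under the infinite-sites model is
# E[ Poisson(s; theta * L / 2) ] over the total coalescent tree length L.
# Stopping when m > 1 lineages remain and plugging in the expected remaining
# length gives a cheaper but biased estimate. This mirrors the structure only;
# the paper's estimators and bias analysis are different and more general.
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)

def partial_tree_length(n, m=1):
    """Total branch length accumulated while going from n down to m lineages."""
    L = 0.0
    for k in range(n, m, -1):                            # epoch with k lineages
        L += k * rng.exponential(2.0 / (k * (k - 1)))    # T_k ~ Exp(k(k-1)/2)
    return L

def likelihood_estimate(s, n, theta, reps=20000, m=1):
    tail = 2.0 * sum(1.0 / j for j in range(1, m))       # E[length of the un-simulated part]
    est = 0.0
    for _ in range(reps):
        L = partial_tree_length(n, m) + tail
        est += poisson.pmf(s, theta * L / 2.0)           # mutations ~ Poisson(theta*L/2)
    return est / reps

# m = 1: simulate to the MRCA (unbiased); m = 10: stop early (biased, cheaper).
print(likelihood_estimate(s=5, n=50, theta=2.0, m=1))
print(likelihood_estimate(s=5, n=50, theta=2.0, m=10))
```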

    Bayesian computation via empirical likelihood

    Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choice of the ABC parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The BCel algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models. Comment: 21 pages, 12 figures; revised version with a new title.
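
    The sketch below shows the basic ingredients of sampling with empirical likelihood weights: parameters are drawn from the prior, each draw is weighted by its profile empirical likelihood (computed via the convex dual in the Lagrange multiplier), and an effective sample size is reported. The mean-type estimating equation and the normal prior are illustrative; the paper's BCel algorithm uses problem-specific constraints and adaptive importance sampling.

```python
# Sketch of Bayesian computation with empirical likelihood (EL) weights:
# parameters are drawn from the prior, each draw is weighted by its profile
# empirical likelihood, and the effective sample size is monitored. The
# mean-type estimating equation and the normal prior are illustrative choices,
# not the constraints used in the paper's BCel algorithm.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
data = rng.normal(1.5, 1.0, 100)            # pretend observed data

def log_el_ratio(theta, x):
    """Profile log empirical likelihood ratio for the constraint E[x - theta] = 0."""
    g = x - theta                           # estimating-equation values g_i(theta)
    def dual(lam):                          # convex dual in the Lagrange multiplier
        u = 1.0 + lam[0] * g
        return 1e10 if np.any(u <= 1e-8) else -np.sum(np.log(u))
    res = minimize(dual, x0=[0.0], method="Nelder-Mead")
    return res.fun                          # equals sum_i log(n * w_i) at the optimum

# Importance sampling from the prior with EL weights.
M = 2000
thetas = rng.normal(0.0, 3.0, M)            # placeholder prior N(0, 3^2)
logw = np.array([log_el_ratio(t, data) for t in thetas])
w = np.exp(logw - logw.max()); w /= w.sum()

ess = 1.0 / np.sum(w ** 2)                  # effective sample size diagnostic
print("posterior mean estimate:", np.sum(w * thetas), " ESS:", ess)
```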

    INTEGRAL/SPI data segmentation to retrieve sources intensity variations

    Context. The INTEGRAL/SPI X/γ-ray spectrometer (20 keV–8 MeV) is an instrument for which recovering source intensity variations is not straightforward and can constitute a difficulty for data analysis. In most cases, determining the source intensity changes between exposures is largely based on a priori information. Aims. We propose techniques that help to overcome this difficulty and make this step more systematic. In addition, the constructed “synthetic” light curves should permit us to obtain a sky model that describes the data better and optimizes the source signal-to-noise ratios. Methods. For this purpose, the time intensity variation of each source was modeled as a combination of piecewise segments of time during which a given source exhibits a constant intensity. To optimize the signal-to-noise ratios, the number of segments was minimized. We present a first method that takes advantage of existing time series obtained from another instrument on board the INTEGRAL observatory; a data segmentation algorithm was then used to synthesize the time series into segments. The second method no longer needs external light curves, but relies solely on the SPI raw data; for this, we developed a specific algorithm that involves the SPI transfer function. Results. The time segmentation algorithms developed here address a difficulty inherent to the SPI instrument, namely the intensity variations of sources between exposures, and they allow us to obtain more information about the sources’ behavior.
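
    The sketch below illustrates the segmentation idea on a toy count light curve: a reference time series is split into as few constant-intensity segments as a penalised Poisson likelihood supports (here via plain binary segmentation), and the resulting “synthetic” light curve is constant within each segment. This is a generic stand-in, not the SPI-specific algorithms, which also involve the instrument transfer function.

```python
# Illustration of building a "synthetic" piecewise-constant light curve: a
# reference count light curve (e.g. per-exposure rates) is segmented into as
# few constant-intensity pieces as a penalised Poisson likelihood supports,
# using plain binary segmentation. Generic stand-in, not the SPI method.
import numpy as np

def seg_loglik(counts):
    """Maximised Poisson log-likelihood of one constant-rate segment (data-only terms dropped)."""
    s, m = counts.sum(), len(counts)
    return 0.0 if s == 0 else s * np.log(s / m) - s

def binary_segmentation(counts, penalty):
    segments = [(0, len(counts))]
    changepoints = []
    while True:
        best_gain, best = 0.0, None
        for (a, b) in segments:
            base = seg_loglik(counts[a:b])
            for t in range(a + 1, b):
                gain = seg_loglik(counts[a:t]) + seg_loglik(counts[t:b]) - base
                if gain > best_gain:
                    best_gain, best = gain, (a, t, b)
        if best is None or best_gain <= penalty:
            break
        a, t, b = best
        segments.remove((a, b)); segments += [(a, t), (t, b)]
        changepoints.append(t)
    return sorted(changepoints)

rng = np.random.default_rng(5)
rates = np.repeat([20.0, 35.0, 15.0], [60, 40, 80])      # toy source intensity per exposure
counts = rng.poisson(rates)
cps = binary_segmentation(counts, penalty=2 * np.log(len(counts)))
# Synthetic light curve: constant intensity within each detected segment.
edges = [0] + cps + [len(counts)]
synthetic = np.concatenate([np.full(b - a, counts[a:b].mean()) for a, b in zip(edges[:-1], edges[1:])])
print(cps, synthetic[:5])
```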

    Sequential quasi-Monte Carlo: Introduction for Non-Experts, Dimension Reduction, Application to Partly Observed Diffusion Processes

    SMC (Sequential Monte Carlo) is a class of Monte Carlo algorithms for filtering and related sequential problems. Gerber and Chopin (2015) introduced SQMC (Sequential quasi-Monte Carlo), a QMC version of SMC. This paper has two objectives: (a) to introduce Sequential Monte Carlo to the QMC community, whose members are usually less familiar with state-space models and particle filtering; (b) to extend SQMC to the filtering of continuous-time state-space models, where the latent process is a diffusion. A recurring point in the paper is the notion of dimension reduction, that is, how to implement SQMC in such a way that it provides good performance despite the high dimension of the problem. Comment: to be published in the proceedings of MCQMC 2016.
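
    The sketch below shows a single SQMC-style step for a univariate linear-Gaussian model, where the Hilbert sort reduces to an ordinary sort: scrambled Sobol' points drive both the choice of ancestors (through the inverse CDF of the sorted weights) and the propagation (through the inverse Gaussian CDF). It is a simplified illustration of the idea, not the full algorithm of the paper.

```python
# Sketch of one SQMC-style step for a univariate linear-Gaussian state space
# model. In one dimension the Hilbert sort reduces to an ordinary sort; the
# scrambled Sobol' points drive both the ancestor choice (inverse CDF of the
# sorted weights) and the propagation (inverse Gaussian CDF). Simplified
# illustration only; see Gerber and Chopin (2015) for the full algorithm.
import numpy as np
from scipy.stats import norm, qmc

phi, sx, sy, N = 0.9, 1.0, 0.5, 2 ** 10
rng = np.random.default_rng(6)
x = rng.normal(0.0, 1.0, N)                     # current particle positions
logw = -0.5 * ((1.2 - x) / sy) ** 2             # weights for a pretend observation y_t = 1.2
w = np.exp(logw - logw.max()); w /= w.sum()

def sqmc_step(x, w, y_next):
    order = np.argsort(x)                       # 1-D "Hilbert" ordering of the particles
    cdf = np.cumsum(w[order]); cdf[-1] = 1.0
    u = qmc.Sobol(d=2, scramble=True, seed=7).random(len(x))
    u = u[np.argsort(u[:, 0])]                  # order the QMC points by their first coordinate
    anc = order[np.searchsorted(cdf, u[:, 0])]  # ancestors via inverse CDF of the sorted weights
    x_new = phi * x[anc] + sx * norm.ppf(u[:, 1])    # propagate via the inverse Gaussian CDF
    logw_new = -0.5 * ((y_next - x_new) / sy) ** 2   # reweight by the observation density
    w_new = np.exp(logw_new - logw_new.max())
    return x_new, w_new / w_new.sum()

x, w = sqmc_step(x, w, y_next=0.8)
print("filtering mean estimate:", np.sum(w * x))
```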

    A computationally efficient, high-dimensional multiple changepoint procedure with application to global terrorism incidence

    Detecting changepoints in datasets with many variates is a data science challenge of increasing importance. Motivated by the problem of detecting changes in the incidence of terrorism from a global terrorism database, we propose a novel approach to multiple changepoint detection in multivariate time series. Our method, which we call SUBSET, is a model-based approach that uses a penalised likelihood to detect changes for a wide class of parametric settings. We provide theory that guides the choice of penalties for SUBSET and shows that it has high power to detect changes regardless of whether only a few variates or many variates change. Empirical results show that SUBSET outperforms many existing approaches for detecting changes in mean in Gaussian data; additionally, unlike these alternative methods, it can easily be extended to non-Gaussian settings, such as those appropriate for modelling counts of terrorist events.
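
    The sketch below conveys the kind of evidence aggregation involved: per-variate CUSUM statistics at each candidate change location are combined both densely (a standardised sum over all variates) and sparsely (a sum of thresholded contributions), so that changes affecting many or only a few variates can both be picked up. The thresholds are ad hoc illustrations; this is not the SUBSET procedure itself.

```python
# Sketch of detecting a single change in mean in multivariate data by combining
# per-variate CUSUM evidence in two ways: a "dense" sum over all variates and a
# "sparse" sum of thresholded contributions. This mimics the goal of detecting
# changes whether few or many variates change, but it is a generic illustration
# with ad hoc thresholds, not the penalised-likelihood SUBSET procedure.
import numpy as np

def cusum_sq(y, tau):
    """Squared CUSUM statistic for a mean change at tau (unit variance assumed), one per variate."""
    n = y.shape[0]
    left, right = y[:tau].mean(axis=0), y[tau:].mean(axis=0)
    return tau * (n - tau) / n * (left - right) ** 2     # ~ chi^2_1 per variate under H0

def detect_change(y, hard_threshold=None):
    n, d = y.shape
    hard = 2 * np.log(d) if hard_threshold is None else hard_threshold
    best = (-np.inf, None)
    for tau in range(10, n - 10):                         # avoid very short segments
        c = cusum_sq(y, tau)
        dense = (c.sum() - d) / np.sqrt(2 * d)            # standardised sum over all variates
        sparse = np.maximum(c - hard, 0.0).sum()          # only variates with strong evidence
        stat = max(dense, sparse)
        if stat > best[0]:
            best = (stat, tau)
    return best                                           # (evidence, estimated changepoint)

rng = np.random.default_rng(8)
d, n = 100, 400
y = rng.normal(size=(n, d))
y[250:, :3] += 2.0                                        # change in only 3 of 100 variates
print(detect_change(y))                                   # changepoint estimate expected near 250
```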