
    How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories

    Reconstruction of population histories is a central problem in population genetics. Existing coalescent-based methods, like the seminal work of Li and Durbin (Nature, 2011), attempt to solve this problem using sequence data but have no rigorous guarantees. Determining the amount of data needed to correctly reconstruct population histories is a major challenge. Using a variety of tools from information theory, the theory of extremal polynomials, and approximation theory, we prove new sharp information-theoretic lower bounds on the problem of reconstructing population structure -- the history of multiple subpopulations that merge, split and change sizes over time. Our lower bounds are exponential in the number of subpopulations, even when reconstructing recent histories. We demonstrate the sharpness of our lower bounds by providing algorithms for distinguishing and learning population histories with matching dependence on the number of subpopulations. Along the way and of independent interest, we essentially determine the optimal number of samples needed to learn an exponential mixture distribution information-theoretically, proving the upper bound by analyzing natural (and efficient) algorithms for this problem. Comment: 38 pages, appeared in RECOMB 201
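    The exponential mixture learning problem mentioned at the end of the abstract can be made concrete with a small sketch. The EM-style fit below is only a generic illustration of estimating the weights and rates of a k-component exponential mixture from samples; the function name and the initialisation are our own, and it is not the sample-optimal procedure analysed in the paper.

```python
import numpy as np

def fit_exponential_mixture(x, k=2, n_iter=200, seed=0):
    """Fit a k-component exponential mixture to positive data x with plain EM.

    Returns (weights, rates). Generic illustration only, not the
    information-theoretically optimal algorithm from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    weights = np.full(k, 1.0 / k)
    # Rough initial rates: reciprocals of randomly chosen data quantiles.
    rates = 1.0 / np.quantile(x, rng.uniform(0.1, 0.9, size=k))
    for _ in range(n_iter):
        # E-step: responsibility of component j for observation i.
        dens = weights * rates * np.exp(-np.outer(x, rates))   # shape (n, k)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: closed-form updates of weights and rates.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        rates = nk / (resp * x[:, None]).sum(axis=0)
    return weights, rates

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.concatenate([rng.exponential(1.0, 5000), rng.exponential(5.0, 5000)])
    print(fit_exponential_mixture(data, k=2))
```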

    Transient Bayesian inference for short and long-tailed GI/G/1 queueing systems

    In this paper, we describe how to make Bayesian inference for the transient behaviour and busy period in a single-server system with general and unknown distributions for the service and interarrival times. The dense family of Coxian distributions is used to model the arrival and service processes. This distribution model is reparametrized so that it is possible to define a non-informative prior which allows for the approximation of heavy-tailed distributions. Reversible jump Markov chain Monte Carlo methods are used to estimate the predictive distributions of the interarrival and service times. Our procedure for estimating the system measures is based on recent results for known parameters, which are frequently implemented using symbolic packages. Alternatively, we propose a simple numerical technique that can be performed at every MCMC iteration so that we can estimate interesting measures, such as the transient queue length distribution. We illustrate our approach with simulated and real queues.
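    For reference, the Coxian family used here is easy to simulate directly: a customer passes through a sequence of exponential phases and leaves the system after each phase with some probability. The sketch below only illustrates sampling from such a distribution (the parameter names are ours); it is not the reversible jump sampler of the paper.

```python
import numpy as np

def sample_coxian(rates, cont_probs, size, rng=None):
    """Draw samples from a Coxian distribution.

    rates[i]      : rate of exponential phase i
    cont_probs[i] : probability of continuing from phase i to phase i+1
                    (the last phase always absorbs)
    Illustrative only; parameter names are ours, not the paper's.
    """
    rng = rng or np.random.default_rng()
    rates = np.asarray(rates, dtype=float)
    cont_probs = np.asarray(cont_probs, dtype=float)
    out = np.empty(size)
    for n in range(size):
        t, phase = 0.0, 0
        while True:
            t += rng.exponential(1.0 / rates[phase])
            if phase == len(rates) - 1 or rng.random() > cont_probs[phase]:
                break
            phase += 1
        out[n] = t
    return out

# Example: a 3-phase Coxian that mimics a long-tailed service time.
samples = sample_coxian(rates=[2.0, 0.5, 0.05], cont_probs=[0.3, 0.1], size=10_000)
print(samples.mean(), np.quantile(samples, 0.99))
```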

    On the accuracy of phase-type approximations of heavy-tailed risk models

    Numerical evaluation of ruin probabilities in the classical risk model is an important problem. If claim sizes are heavy-tailed, such evaluations are challenging, and an attractive way to overcome this is to approximate the claim sizes with a phase-type distribution. What is not clear, though, is how many phases are enough to achieve a specified accuracy in the approximation of the ruin probability. The goals of this paper are to investigate the number of phases required so that we can achieve a pre-specified accuracy for the ruin probability and to provide error bounds. Also, in the special case of a completely monotone claim size distribution, we develop an algorithm to estimate the ruin probability by approximating the excess claim size distribution with a hyperexponential one. Finally, we compare our approximation with the heavy traffic and heavy tail approximations. Comment: 24 pages, 13 figures, 8 tables, 38 reference
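    As a rough sanity check on any such approximation, the ruin probability in the classical (compound Poisson) risk model can always be estimated by simulation. The sketch below does this over a finite horizon for hyperexponential claims; the function and parameter names are ours, and it only produces a reference value to compare against, not the paper's error bounds.

```python
import numpy as np

def ruin_prob_mc(u, c, lam, weights, rates, horizon=200.0, n_paths=5_000, seed=0):
    """Monte Carlo estimate of the finite-horizon ruin probability in the
    classical risk model: surplus = u + c*t - aggregate claims, with claims
    arriving as a Poisson process with rate lam and claim sizes drawn from a
    hyperexponential distribution (mixture weights/rates).  Sketch only."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    rates = np.asarray(rates, dtype=float)
    ruined = 0
    for _ in range(n_paths):
        t, surplus = 0.0, float(u)
        while True:
            dt = rng.exponential(1.0 / lam)        # time until next claim
            t += dt
            if t > horizon:
                break
            surplus += c * dt                      # premium income since last claim
            comp = rng.choice(len(rates), p=weights)
            surplus -= rng.exponential(1.0 / rates[comp])
            if surplus < 0:                        # ruin can only occur at a claim instant
                ruined += 1
                break
    return ruined / n_paths

# Example: mean claim size 0.95, premium rate 1.2 (positive safety loading).
print(ruin_prob_mc(u=10.0, c=1.2, lam=1.0, weights=[0.9, 0.1], rates=[2.0, 0.2]))
```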

    The pseudo-self-similar traffic model: application and validation

    Since the early 1990s, a variety of studies have shown that network traffic, both for local- and wide-area networks, has self-similar properties. This led to new approaches in network traffic modelling, because most traditional traffic approaches result in the underestimation of performance measures of interest. Instead of developing completely new traffic models, a number of researchers have proposed adapting traditional traffic modelling approaches to incorporate aspects of self-similarity. The motivation for doing so is the hope of being able to reuse techniques and tools that have been developed in the past and with which experience has been gained. One such approach is the so-called pseudo self-similar traffic model. This model is appealing, as it is easy to understand and easily embedded in Markovian performance evaluation studies. In applying this model in a number of cases, we observed various problems which we initially thought were particular to these specific cases. However, we have recently been able to show that these problems are fundamental to the pseudo self-similar traffic model. In this paper we review the pseudo self-similar traffic model and discuss its fundamental shortcomings. As far as we know, this is the first paper that discusses these shortcomings formally. We also report on ongoing work to overcome some of these problems.

    A stochastic evolutionary model generating a mixture of exponential distributions

    Recent interest in human dynamics has stimulated the investigation of the stochastic processes that explain human behaviour in various contexts, such as mobile phone networks and social media. In this paper, we extend the stochastic urn-based model proposed in [FENN15] so that it can generate mixture models, in particular a mixture of exponential distributions. The model is designed to capture the dynamics of survival analysis, traditionally employed in clinical trials and reliability analysis in engineering, and more recently in the analysis of large data sets recording human dynamics. The mixture modelling approach, which is relatively simple and well understood, is very effective in capturing heterogeneity in data. We provide empirical evidence for the validity of the model, using a data set of popular search engine queries collected over a period of 114 months. We show that the survival function of these queries is closely matched by the exponential mixture solution of our model.
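    As a concrete illustration of the kind of fit reported in the last sentence, a two-component exponential mixture survival function can be matched to an empirical survival curve with a standard least-squares fit. The sketch below is a generic version of that step (function names and starting values are our own assumptions), not the urn-model solution derived in the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def mix_exp_survival(t, w, lam1, lam2):
    """Survival function of a two-component exponential mixture:
    S(t) = w*exp(-lam1*t) + (1-w)*exp(-lam2*t)."""
    return w * np.exp(-lam1 * t) + (1.0 - w) * np.exp(-lam2 * t)

def fit_survival(durations):
    """Fit the mixture survival curve to the empirical survival function of
    observed durations (e.g. query lifetimes).  Illustrative sketch only."""
    x = np.sort(np.asarray(durations, dtype=float))
    emp_surv = 1.0 - np.arange(1, len(x) + 1) / len(x)   # empirical S(t) at the order statistics
    p0 = (0.5, 1.0 / np.median(x), 0.1 / np.median(x))   # rough starting values
    params, _ = curve_fit(mix_exp_survival, x, emp_surv, p0=p0,
                          bounds=([0, 0, 0], [1, np.inf, np.inf]))
    return params  # (w, lam1, lam2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.exponential(2.0, 3000), rng.exponential(20.0, 1000)])
    print(fit_survival(data))
```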

    TVaR-based capital allocation with copulas

    Because of regulatory initiatives from supervisory authorities, such as the European Solvency II reform, and because of recent economic events, insurance companies need to consolidate their capital reserves, with coherent amounts allocated to the whole company and to each line of business. The present study considers an insurance portfolio consisting of several lines of risk which are linked by a copula, and aims to evaluate not only the capital allocation for the overall portfolio but also the contribution of each risk to the aggregate. We use the tail value at risk (TVaR) as the risk measure. The handy form of the FGM copula permits an exact expression for the TVaR of the sum of the risks and for the TVaR-based allocations when claim amounts are exponentially distributed or distributed as a mixture of exponentials. We first examine the bivariate model and then the multivariate case. We also show how to approximate the TVaR of the aggregate risk and the contribution of each risk when using any copula.
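    The last sentence refers to approximating the TVaR and the contributions under an arbitrary copula; for the bivariate FGM case with exponential marginals this is easy to check by simulation. Below is a minimal Monte Carlo sketch using the standard conditional-inverse sampler for the FGM copula; the function names are ours and the paper's exact expressions are not reproduced here.

```python
import numpy as np

def sample_fgm_exponential(theta, rate1, rate2, n, seed=0):
    """Sample (X1, X2) with exponential marginals coupled by a bivariate FGM
    copula C(u,v) = uv[1 + theta(1-u)(1-v)], theta in [-1, 1], using the
    conditional-inverse method.  Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)
    p = rng.uniform(size=n)
    a = theta * (1.0 - 2.0 * u)
    a_safe = np.where(np.abs(a) < 1e-12, 1.0, a)            # avoid division by ~0
    v_quad = ((1 + a_safe) - np.sqrt((1 + a_safe) ** 2 - 4 * a_safe * p)) / (2 * a_safe)
    v = np.where(np.abs(a) < 1e-12, p, v_quad)              # independence limit when a ~ 0
    x1 = -np.log1p(-u) / rate1                              # exponential quantile transform
    x2 = -np.log1p(-v) / rate2
    return x1, x2

def tvar_allocation(x1, x2, level=0.99):
    """TVaR of S = X1 + X2 and the TVaR-based contributions E[X_i | S > VaR]."""
    s = x1 + x2
    tail = s > np.quantile(s, level)
    return s[tail].mean(), x1[tail].mean(), x2[tail].mean()

x1, x2 = sample_fgm_exponential(theta=0.5, rate1=1.0, rate2=0.5, n=500_000)
tvar_s, c1, c2 = tvar_allocation(x1, x2)
print(f"TVaR(S) = {tvar_s:.3f} = {c1:.3f} + {c2:.3f}")
```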

    Lagrangian Velocity Correlations and Absolute Dispersion in the Midlatitude Troposphere

    Employing daily wind data from the ECMWF, we perform passive particle advection to estimate the Lagrangian velocity correlation functions (LVCFs) associated with the midlatitude tropospheric flow. In particular, we decompose the velocity field into time-mean and transient (or eddy) components to better understand the nature of the LVCFs. A closely related quantity, the absolute dispersion (AD), is also examined. Given the anisotropy of the flow, meridional and zonal characteristics are considered separately. The zonal LVCF is seen to be non-exponential. In fact, for intermediate timescales it can be interpreted either as a power law of the form $\tau^{-\alpha}$ with $0 < \alpha < 1$ or as a sum of exponentials with differing timescales, the two interpretations being equivalent. More importantly, the long-time correlations in the zonal flow result in a superdiffusive zonal AD regime. On the other hand, the meridional LVCF decays rapidly to zero. Before approaching zero, the meridional LVCF shows a region of negative correlation, a consequence of the presence of planetary-scale Rossby waves. As a result the meridional AD, apart from showing the classical asymptotic ballistic and diffusive regimes, displays transient subdiffusive behaviour. Comment: Revised version. Submitted to JA
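    For readers who want to reproduce these diagnostics, both quantities have simple estimators once particle trajectories are available. The sketch below computes an LVCF (with each trajectory's time mean removed) and the absolute dispersion from arrays of velocities and positions, run on synthetic red-noise data; it is a generic estimator, not the ECMWF advection setup of the paper.

```python
import numpy as np

def lagrangian_vel_corr(vel, max_lag):
    """Lagrangian velocity correlation function from particle velocities
    vel with shape (n_particles, n_times).  The time mean is removed from
    each trajectory so only the transient (eddy) component is correlated."""
    eddy = vel - vel.mean(axis=1, keepdims=True)
    var = (eddy ** 2).mean()
    corr = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        corr[lag] = (eddy[:, :eddy.shape[1] - lag] * eddy[:, lag:]).mean() / var
    return corr

def absolute_dispersion(pos):
    """Absolute dispersion <(x(t) - x(0))^2> from positions with shape
    (n_particles, n_times)."""
    disp = pos - pos[:, :1]
    return (disp ** 2).mean(axis=0)

# Example with a synthetic red-noise (AR(1)) velocity field.
rng = np.random.default_rng(0)
v = np.zeros((500, 400))
for t in range(1, 400):
    v[:, t] = 0.95 * v[:, t - 1] + rng.normal(size=500)
x = np.cumsum(v, axis=1)
print(lagrangian_vel_corr(v, 50)[:5])
print(absolute_dispersion(x)[-1])
```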

    Bayesian estimation of ruin probabilities with heterogeneous and heavy-tailed insurance claim size distribution

    This paper describes a Bayesian approach to inference for risk reserve processes with an unknown claim size distribution. A flexible model based on mixtures of Erlang distributions is proposed to approximate the special features frequently observed in insurance claim sizes, such as long tails and heterogeneity. A Bayesian density estimation approach for the claim sizes is implemented using reversible jump Markov chain Monte Carlo methods. An advantage of the considered mixture model is that it belongs to the class of phase-type distributions, and therefore explicit evaluations of the ruin probabilities are possible. Furthermore, from a statistical point of view, the parametric structure of the mixture of Erlang distributions offers some advantages compared with the whole over-parameterized family of phase-type distributions. Given the observed claim arrivals and claim sizes, we show how to estimate the ruin probabilities, as a function of the initial capital, and how to obtain predictive intervals which give a measure of the uncertainty in the estimates.
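    The tractability argued for above rests on the fact that a mixture of Erlang distributions with a common rate has a simple closed-form density and survival function. The small sketch below evaluates both (and could feed, for instance, a simulation-based check of ruin probabilities); the function names and the common-rate parameterisation are our own illustration, not the paper's RJMCMC implementation.

```python
import numpy as np
from math import factorial

def erlang_mixture_pdf(x, weights, shapes, rate):
    """Density of a mixture of Erlang distributions with a common rate:
    f(x) = sum_j w_j * rate^k_j * x^(k_j - 1) * exp(-rate*x) / (k_j - 1)!"""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for w, k in zip(weights, shapes):
        out += w * rate ** k * x ** (k - 1) * np.exp(-rate * x) / factorial(k - 1)
    return out

def erlang_mixture_survival(x, weights, shapes, rate):
    """Survival function: each Erlang(k, rate) component has
    S(x) = exp(-rate*x) * sum_{n=0}^{k-1} (rate*x)^n / n!"""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for w, k in zip(weights, shapes):
        partial = sum((rate * x) ** n / factorial(n) for n in range(k))
        out += w * np.exp(-rate * x) * partial
    return out

# Example: a two-component mixture whose second component carries the long tail.
xs = np.linspace(0, 50, 6)
print(erlang_mixture_survival(xs, weights=[0.8, 0.2], shapes=[1, 10], rate=0.5))
```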