508 research outputs found
How Many Subpopulations is Too Many? Exponential Lower Bounds for Inferring Population Histories
Reconstruction of population histories is a central problem in population
genetics. Existing coalescent-based methods, like the seminal work of Li and
Durbin (Nature, 2011), attempt to solve this problem using sequence data but
have no rigorous guarantees. Determining the amount of data needed to correctly
reconstruct population histories is a major challenge. Using a variety of tools
from information theory, the theory of extremal polynomials, and approximation
theory, we prove new sharp information-theoretic lower bounds on the problem of
reconstructing population structure -- the history of multiple subpopulations
that merge, split and change sizes over time. Our lower bounds are exponential
in the number of subpopulations, even when reconstructing recent histories. We
demonstrate the sharpness of our lower bounds by providing algorithms for
distinguishing and learning population histories with matching dependence on
the number of subpopulations. Along the way and of independent interest, we
essentially determine the optimal number of samples needed to learn an
exponential mixture distribution information-theoretically, proving the upper
bound by analyzing natural (and efficient) algorithms for this problem. Comment: 38 pages, Appeared in RECOMB 201
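The mixture-learning task mentioned at the end of the abstract, fitting a mixture of exponential distributions from samples, can be illustrated with a plain EM iteration. This is a minimal sketch under my own naming and initialisation, not the authors' algorithm:

```python
import math
import random

def em_exponential_mixture(data, k=2, iters=100, seed=0):
    """Fit p(x) = sum_j w_j * r_j * exp(-r_j * x) to positive data by EM."""
    rng = random.Random(seed)
    mean = sum(data) / len(data)
    # spread the initial rates around the reciprocal of the sample mean
    rates = [1.0 / (mean * (0.5 + rng.random())) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in data:
            dens = [w * r * math.exp(-r * x) for w, r in zip(weights, rates)]
            s = sum(dens)
            resp.append([d / s for d in dens])
        # M-step: closed-form updates for weights and rates
        for j in range(k):
            nj = sum(row[j] for row in resp)
            weights[j] = nj / len(data)
            rates[j] = nj / sum(row[j] * x for row, x in zip(resp, data))
    return weights, rates
```

A useful sanity property of these updates: after any M-step the fitted mixture mean, sum of w_j / r_j, equals the sample mean exactly.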
Transient Bayesian inference for short and long-tailed GI/G/1 queueing systems
In this paper, we describe how to make Bayesian inference for the transient behaviour and busy period of a single-server system with general and unknown distributions for the service and interarrival times. The dense family of Coxian distributions is used to model the arrival and service processes. This distribution model is reparametrized so that it is possible to define a non-informative prior which allows for the approximation of heavy-tailed distributions. Reversible jump Markov chain Monte Carlo methods are used to estimate the predictive distributions of the interarrival and service times. Our procedure for estimating the system measures is based on recent results for known parameters, which are frequently implemented using symbolic packages. Alternatively, we propose a simple numerical technique that can be performed at every MCMC iteration so that we can estimate interesting measures, such as the transient queue length distribution. We illustrate our approach with simulated and real queues.
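The Coxian family used above is easy to simulate directly: a chain of exponential phases, with absorption possible after each phase. A minimal sampler, as a sketch (this is not the paper's RJMCMC machinery, and the names are mine):

```python
import random

def sample_coxian(rates, cont_probs, rng):
    """Draw one variate from a Coxian distribution.

    rates[i]      -- exponential rate of phase i
    cont_probs[i] -- probability of moving on from phase i to phase i + 1
                     (len(cont_probs) == len(rates) - 1); otherwise absorb.
    """
    total = 0.0
    for i, rate in enumerate(rates):
        total += rng.expovariate(rate)
        if i == len(rates) - 1 or rng.random() >= cont_probs[i]:
            break  # absorbed after phase i
    return total
```

With a single phase this is a plain exponential; with all continuation probabilities equal to one it reduces to an Erlang distribution, which is one way to see why the family is dense among positive distributions.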
On the accuracy of phase-type approximations of heavy-tailed risk models
Numerical evaluation of ruin probabilities in the classical risk model is an
important problem. If claim sizes are heavy-tailed, then such evaluations are
challenging. To overcome this, an attractive way is to approximate the claim
sizes with a phase-type distribution. What is not clear though is how many
phases are enough in order to achieve a specific accuracy in the approximation
of the ruin probability. The goals of this paper are to investigate the number
of phases required so that we can achieve a pre-specified accuracy for the ruin
probability and to provide error bounds. Also, in the special case of a
completely monotone claim size distribution we develop an algorithm to estimate
the ruin probability by approximating the excess claim size distribution with a
hyperexponential one. Finally, we compare our approximation with the heavy
traffic and heavy tail approximations. Comment: 24 pages, 13 figures, 8 tables, 38 reference
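For hyperexponential claim sizes, the ruin probability the abstract targets can be cross-checked by Monte Carlo through the Pollaczeck-Khinchine representation: ruin occurs iff a geometric number of ladder heights, drawn from the integrated-tail distribution, exceeds the initial capital. A rough sketch with my own helper names, not the paper's algorithm:

```python
import random

def ruin_prob_pk(u, rho, weights, rates, n_sims=20000, seed=0):
    """Monte Carlo ruin probability psi(u) via Pollaczeck-Khinchine:
    psi(u) = P(sum of N ladder heights > u), N ~ Geometric(1 - rho).
    For hyperexponential claims (weights, rates), the ladder-height
    (integrated-tail) law is again hyperexponential, with component
    weights proportional to weights[i] / rates[i]."""
    rng = random.Random(seed)
    mean = sum(w / r for w, r in zip(weights, rates))
    ladder_w = [(w / r) / mean for w, r in zip(weights, rates)]
    hits = 0
    for _ in range(n_sims):
        s = 0.0
        while rng.random() < rho:          # one more ladder epoch w.p. rho
            x = rng.random()
            j, acc = len(ladder_w) - 1, 0.0
            for idx, lw in enumerate(ladder_w):
                acc += lw
                if x < acc:                # pick a ladder-height component
                    j = idx
                    break
            s += rng.expovariate(rates[j])
            if s > u:
                break
        if s > u:
            hits += 1
    return hits / n_sims
```

In the single-component case this can be validated against the classical closed form psi(u) = rho * exp(-rate * (1 - rho) * u) for exponential claims.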
The pseudo-self-similar traffic model: application and validation
Since the early 1990s, a variety of studies have shown that network traffic, both for local- and wide-area networks, has self-similar properties. This led to new approaches in network traffic modelling because most traditional traffic models result in the underestimation of performance measures of interest. Instead of developing completely new traffic models, a number of researchers have proposed to adapt traditional traffic modelling approaches to incorporate aspects of self-similarity. The motivation for doing so is the hope of being able to reuse techniques and tools that have been developed in the past and with which experience has been gained. One such approach is the so-called pseudo-self-similar traffic model. This model is appealing, as it is easy to understand and is easily embedded in Markovian performance evaluation studies. In applying this model in a number of cases, we observed various problems which we initially thought were particular to those specific cases. However, we have recently been able to show that these problems are fundamental to the pseudo-self-similar traffic model. In this paper we review the pseudo-self-similar traffic model and discuss its fundamental shortcomings. As far as we know, this is the first paper that discusses these shortcomings formally. We also report on ongoing work to overcome some of these problems.
A stochastic evolutionary model generating a mixture of exponential distributions
Recent interest in human dynamics has stimulated the investigation of the stochastic processes that explain human behaviour in various contexts, such as mobile phone networks and social media.
In this paper, we extend the stochastic urn-based model proposed in \cite{FENN15} so that it can generate mixture models,
in particular, a mixture of exponential distributions.
The model is designed to capture the dynamics of survival analysis, traditionally employed in clinical trials, reliability analysis in engineering, and more recently in the analysis of large data sets recording human dynamics. The mixture modelling approach, which is relatively simple and well understood, is very effective in capturing heterogeneity in data.
We provide empirical evidence for the validity of the model, using a data set of popular search engine queries collected over a period of 114 months. We show that the survival function of these queries is closely matched by the exponential mixture solution of our model.
TVaR-based capital allocation with copulas
Because of regulation projects from control organizations, such as the European Solvency II reform, and recent economic events, insurance companies need to consolidate their capital reserves with coherent amounts allocated to the whole company and to each line of business. The present study considers an insurance portfolio consisting of several lines of risk that are linked by a copula and aims to evaluate not only the capital allocation for the overall portfolio but also the contribution of each risk to the aggregate. We use the tail value at risk (TVaR) as the risk measure. The handy form of the FGM copula permits exact expressions for the TVaR of the sum of the risks and for the TVaR-based allocations when claim amounts are exponentially distributed or distributed as a mixture of exponentials. We first examine the bivariate model and then the multivariate case. We also show how to approximate the TVaR of the aggregate risk and the contribution of each risk when using any copula.
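As a numerical cross-check of such allocations, the TVaR of the aggregate and the per-risk contributions can be estimated by simulating the FGM copula directly; the conditional-inversion sampler below is the standard one for the FGM family, while the function and parameter names are my own:

```python
import math
import random

def fgm_exp_tvar(theta, beta1, beta2, kappa=0.95, n=20000, seed=0):
    """Monte Carlo TVaR of S = X1 + X2 and the TVaR-based contributions,
    X1 ~ Exp(beta1), X2 ~ Exp(beta2), joined by an FGM copula with
    parameter theta in [-1, 1] (conditional-inversion sampling)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u, w = rng.random(), rng.random()
        a = theta * (1.0 - 2.0 * u)
        if abs(a) < 1e-12:
            v = w
        else:
            # smaller root of a*v^2 - (1 + a)*v + w = 0 lies in [0, 1]
            v = ((1.0 + a) - math.sqrt((1.0 + a) ** 2 - 4.0 * a * w)) / (2.0 * a)
        xs.append(-math.log(1.0 - u) / beta1)
        ys.append(-math.log(1.0 - v) / beta2)
    order = sorted(range(n), key=lambda i: xs[i] + ys[i])
    tail = order[int(kappa * n):]            # worst (1 - kappa) share of outcomes
    m = len(tail)
    tvar = sum(xs[i] + ys[i] for i in tail) / m
    contrib1 = sum(xs[i] for i in tail) / m  # E[X1 | S > VaR_kappa(S)]
    contrib2 = sum(ys[i] for i in tail) / m
    return tvar, contrib1, contrib2
```

By construction the two contributions sum to the aggregate TVaR, which is the defining property of a TVaR-based allocation.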
Lagrangian Velocity Correlations and Absolute Dispersion in the Midlatitude Troposphere
Employing daily wind data from the ECMWF, we perform passive particle
advection to estimate the Lagrangian velocity correlation functions (LVCF)
associated with the midlatitude tropospheric flow. In particular we decompose
the velocity field into time mean and transient (or eddy) components to better
understand the nature of the LVCFs. A closely related quantity, the absolute dispersion (AD), is also examined.
Given the anisotropy of the flow, meridional and zonal characteristics are
considered separately. The zonal LVCF is seen to be non-exponential. In fact,
for intermediate timescales it can be interpreted either as a power law or as a sum of exponentials with differing timescales, the two interpretations being equivalent. More importantly
the long time correlations in the zonal flow result in a superdiffusive zonal
AD regime. On the other hand, the meridional LVCF decays rapidly to zero.
Before approaching zero the meridional LVCF shows a region of negative
correlation - a consequence of the presence of planetary scale Rossby waves. As
a result the meridional AD, apart from showing the classical asymptotic
ballistic and diffusive regimes, displays transient subdiffusive behaviour. Comment: Revised version. Submitted to JA
Bayesian estimation of ruin probabilities with heterogeneous and heavy-tailed insurance claim size distribution
This paper describes a Bayesian approach to inference for risk reserve processes with an unknown claim size distribution. A flexible model based on mixtures of Erlang distributions is proposed to approximate the special features frequently observed in insurance claim sizes, such as long tails and heterogeneity. A Bayesian density estimation approach for the claim sizes is implemented using reversible jump Markov chain Monte Carlo methods. An advantage of the considered mixture model is that it belongs to the class of phase-type distributions, and hence explicit evaluation of the ruin probabilities is possible. Furthermore, from a statistical point of view, the parametric structure of the Erlang mixture offers some advantages compared with the whole over-parameterized family of phase-type distributions. Given the observed claim arrivals and claim sizes, we show how to estimate the ruin probabilities, as a function of the initial capital, and predictive intervals which give a measure of the uncertainty in the estimates.
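A mixture of Erlangs with a common rate is a phase-type law with an elementary density, which is what makes the explicit ruin-probability evaluations possible. The density itself is a one-liner (a sketch only; the Bayesian fitting is the paper's contribution and is not reproduced here):

```python
import math

def erlang_mixture_pdf(x, weights, shapes, rate):
    """f(x) = sum_j w_j * rate^k_j * x^(k_j - 1) * exp(-rate * x) / (k_j - 1)!
    for a mixture of Erlang(k_j, rate) components sharing one rate."""
    return sum(
        w * rate ** k * x ** (k - 1) * math.exp(-rate * x) / math.factorial(k - 1)
        for w, k in zip(weights, shapes)
    )
```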