On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source

Author: Grégory Nuel
Publication venue: 'Applied Probability Trust'
Publication date: 22/09/2009

In this paper, we develop an explicit formula for computing the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source. We derive efficient algorithms that handle both low- and high-complexity patterns and both homogeneous and heterogeneous Markov models. We then apply these results to the distribution of DNA patterns in genomic sequences, where we show that moment-based developments (namely Edgeworth's expansion and Gram-Charlier type B series) improve the reliability of common asymptotic approximations such as the Gaussian and Poisson approximations.
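As a small illustration of the simplest case only (the first moment, k = 1), the expected number of overlapping occurrences of a pattern in a stationary, homogeneous first-order Markov sequence is the number of start positions times the probability of reading the pattern at one position. The sketch below is not the paper's algorithm; the function names, the power-iteration solver, and the assumption that the chain is ergodic and started from its stationary distribution are all illustrative choices.

```python
def stationary_distribution(P, iters=10_000, tol=1e-13):
    """Approximate the stationary row vector mu (mu P = mu) by power iteration.

    Assumes P is a row-stochastic matrix (list of lists) of an ergodic chain.
    """
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(mu, nxt)) < tol:
            return nxt
        mu = nxt
    return mu


def expected_count(pattern, P, alphabet, n):
    """Expected number of (overlapping) occurrences of `pattern` in a
    length-n sequence from a stationary first-order Markov chain."""
    idx = {a: i for i, a in enumerate(alphabet)}
    mu = stationary_distribution(P)
    # Probability that the pattern starts at any fixed position.
    p = mu[idx[pattern[0]]]
    for a, b in zip(pattern, pattern[1:]):
        p *= P[idx[a]][idx[b]]
    # Linearity of expectation over the n - len(pattern) + 1 start positions.
    return max(n - len(pattern) + 1, 0) * p


if __name__ == "__main__":
    P = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical two-letter chain
    print(expected_count("ab", P, "ab", 10))
```

Higher moments require accounting for overlaps between occurrences, which is where the formulas in the paper come in.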

arXiv.org e-Print Archive

HAL Descartes

String Matching and 1d Lattice Gases

Author: A. D. Barbour, A. Dembo, B. Prum, D. Achlioptas, D. E. Knuth, E. Rivals, F. Gürsey, G. E. Uhlenbeck, G. Reinert, H. Harborth, H. S. Wilf, I. Fudos, I. Z. Fisher, J. Kleffe, Jane F. Gentleman, L. Goldstein, L. J. Guibas, M. Mézard, M. Régnier, M. S. Waterman, M. X. Geske, Muhittin Mungan, O. Chrysaphinou, P. Pevzner, R. Monasson, S. B. Boyer, S. Karlin, S. Kirkpatrick, S. Robin, S. Schbath, W. Feller, Y. Fu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/08/2005

We calculate the probability distributions for the number of occurrences $n$ of a given $l$-letter word in a random string of $k$ letters. Analytical expressions for the distribution are known for the asymptotic regimes (i) $k \gg r^l \gg 1$ (Gaussian) and (ii) $k, l \to \infty$ such that $k/r^l$ is finite (Compound Poisson). However, it is known that these distributions do not work well in the intermediate regime $k \gtrsim r^l \gtrsim 1$. We show that the problem of calculating the string matching probability can be cast as determining the configurational partition function of a 1d lattice gas with interacting particles, so that the matching probability becomes the grand-partition sum of the lattice gas, with the number of particles corresponding to the number of matches. We perform a virial expansion of the effective equation of state and obtain the probability distribution. Our result reproduces the behavior of the distribution in all regimes. We are also able to show analytically how the limiting distributions arise. Our analysis builds on the fact that the effective interactions between the particles consist of a relatively strong core of size $l$, the word length, followed by a weak, exponentially decaying tail. We find that the asymptotic regimes correspond to the case where the tail of the interactions can be neglected, while in the intermediate regime it needs to be kept in the analysis. Our results are readily generalized to the case where the random strings are generated by more complicated stochastic processes, such as a non-uniform letter probability distribution or Markov chains. We show that in these cases the tails of the effective interactions can be made even more dominant, thus rendering the asymptotic approximations less accurate in such a regime.

Comment: 44 pages and 8 figures. Major revision of previous version. The lattice gas analogy has been worked out in full, including virial expansion and equation of state. This constitutes the main part of the paper now. Connections with existing work are made and references should be up to date now. To be submitted for publication.
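For small strings, the distribution of the match count can be computed exactly by brute-force dynamic programming, which gives a reference point for any approximation. The sketch below is not the lattice-gas method of the abstract; it assumes independent, uniformly distributed letters, counts overlapping occurrences, and uses hypothetical function names. It tracks a KMP-style matching state together with the running count.

```python
from collections import defaultdict


def build_automaton(word, alphabet):
    """Transition table: state s = length of the longest prefix of `word`
    that is a suffix of the text read so far (0..len(word))."""
    l = len(word)
    delta = []
    for s in range(l + 1):
        row = {}
        for a in alphabet:
            cand = word[:s] + a
            # Longest suffix of cand that is also a prefix of word.
            t = min(len(cand), l)
            while t > 0 and cand[-t:] != word[:t]:
                t -= 1
            row[a] = t
        delta.append(row)
    return delta


def count_distribution(word, alphabet, k):
    """Exact distribution {n: P(n occurrences)} of overlapping occurrences
    of `word` in a uniform random string of k letters."""
    delta = build_automaton(word, alphabet)
    l = len(word)
    p = 1.0 / len(alphabet)
    dist = {(0, 0): 1.0}  # (automaton state, matches so far) -> probability
    for _ in range(k):
        nxt = defaultdict(float)
        for (s, c), pr in dist.items():
            for a in alphabet:
                t = delta[s][a]
                # Reaching state l means one more occurrence just ended.
                nxt[(t, c + (1 if t == l else 0))] += pr * p
        dist = nxt
    out = defaultdict(float)
    for (_, c), pr in dist.items():
        out[c] += pr
    return dict(out)


if __name__ == "__main__":
    print(count_distribution("aa", "ab", 3))
```

The state space grows with both the word length and the maximum count, so this is practical only as a check on short strings.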

arXiv.org e-Print Archive

Crossref

An R Implementation of the Polya-Aeppli Distribution

Author: Conrad J. Burden
Publication date: 11/06/2014

An efficient implementation of the Polya-Aeppli, or geometric compound Poisson, distribution in the statistical programming language R is presented. The implementation is available as the package polyaAeppli and consists of functions for the mass function, cumulative distribution function, quantile function and random variate generation, with those parameters conventionally provided for standard univariate probability distributions in the stats package in R.

Comment: 9 pages, 2 figures
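To make the definition concrete, here is a minimal Python sketch of the mass function of a geometric compound Poisson variable. It does not reproduce the polyaAeppli package or its parameter conventions; the parameterization is an assumption chosen for illustration: a Poisson(lam) number of clumps, each of size Y with P(Y = y) = (1 - theta) * theta**(y - 1) for y = 1, 2, ...

```python
import math


def polya_aeppli_pmf(n, lam, theta):
    """P(N = n) for N = Y_1 + ... + Y_M, M ~ Poisson(lam),
    Y_i i.i.d. geometric on {1, 2, ...} with parameter theta (assumed
    convention, see lead-in). Direct summation; suitable for small n only."""
    if n < 0:
        return 0.0
    if n == 0:
        return math.exp(-lam)
    total = 0.0
    for j in range(1, n + 1):
        # j clumps summing to n: negative-binomial weight for the sizes.
        total += (
            lam ** j / math.factorial(j)
            * math.comb(n - 1, j - 1)
            * (1.0 - theta) ** j
            * theta ** (n - j)
        )
    return math.exp(-lam) * total


if __name__ == "__main__":
    print([round(polya_aeppli_pmf(n, 2.0, 0.3), 6) for n in range(5)])
```

With theta = 0 every clump has size 1 and the distribution reduces to the ordinary Poisson, which gives a quick sanity check. An efficient implementation would avoid the direct sum for large n.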

arXiv.org e-Print Archive

CiteSeerX

Calculation of aggregate loss distributions

Author: Pavel V. Shevchenko
Publication date: 01/01/2010

Estimation of the operational risk capital under the Loss Distribution Approach requires evaluation of aggregate (compound) loss distributions, which is one of the classic problems in risk theory. Closed-form solutions are not available for the distributions typically used in operational risk. However, with modern computer processing power, these distributions can be calculated virtually exactly using numerical methods. This paper reviews numerical algorithms that can be successfully used to calculate the aggregate loss distributions. In particular, Monte Carlo, Panjer recursion and Fourier transformation methods are presented and compared. Also, several closed-form approximations based on moment matching and asymptotic results for heavy-tailed distributions are reviewed.
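Of the methods named above, Panjer recursion is the easiest to show in a few lines. The sketch below covers only the textbook case of a Poisson claim count with a severity distribution already discretized onto the integers 0, 1, 2, ...; the function name and the list-based input format are illustrative assumptions, not taken from the paper.

```python
import math


def panjer_compound_poisson(lam, severity, n_max):
    """Aggregate loss pmf g[0..n_max] for S = X_1 + ... + X_N with
    N ~ Poisson(lam) and P(X = j) = severity[j] (zero beyond the list)."""

    def fx(j):
        return severity[j] if j < len(severity) else 0.0

    # P(S = 0): Poisson pgf evaluated at P(X = 0).
    g = [math.exp(-lam * (1.0 - fx(0)))]
    for n in range(1, n_max + 1):
        # Panjer step for the Poisson case: g(n) = (lam / n) * sum_j j f(j) g(n - j).
        s = sum(j * fx(j) * g[n - j] for j in range(1, n + 1))
        g.append(lam / n * s)
    return g


if __name__ == "__main__":
    # Hypothetical severity: loss of 1, 2 or 3 units with equal probability.
    g = panjer_compound_poisson(2.0, [0.0, 1 / 3, 1 / 3, 1 / 3], 10)
    print([round(x, 6) for x in g])
```

In practice the continuous severity has to be discretized first, and the choice of grid width drives the accuracy of the result.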

arXiv.org e-Print Archive

CiteSeerX

Macquarie University ResearchOnline