On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source
In this paper, we develop an explicit formula for computing the first k
moments of the random count of a pattern in a multi-states sequence generated
by a Markov source. We derive efficient algorithms that handle both
low- and high-complexity patterns and both homogeneous and heterogeneous Markov
models. We then apply these results to the distribution of DNA patterns in
genomic sequences, where we show that moment-based developments (namely
Edgeworth's expansion and Gram-Charlier type B series) improve the
reliability of common asymptotic approximations such as Gaussian or Poisson
approximations.
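As a rough illustration of the quantity discussed in this abstract, the sketch below computes only the first moment (the expected count) of a pattern in a sequence generated by a homogeneous first-order Markov chain. It is not the paper's algorithm, which covers the first k moments and heterogeneous models. The function name `expected_pattern_count`, the integer encoding of states, and the arguments `P` (transition matrix) and `mu` (initial distribution) are assumptions made for this example.

```python
import numpy as np

def expected_pattern_count(pattern, P, mu, n):
    """Expected number of (possibly overlapping) occurrences of `pattern`
    in X_0, ..., X_{n-1}, a homogeneous first-order Markov chain.

    pattern : sequence of integer state indices (hypothetical encoding)
    P       : transition matrix, P[a, b] = Prob(X_{t+1} = b | X_t = a)
    mu      : distribution of X_0
    """
    P = np.asarray(P, dtype=float)
    dist = np.asarray(mu, dtype=float)  # distribution of X_t, updated below
    m = len(pattern)
    # Probability of completing the pattern given that its first letter matched.
    chain = 1.0
    for a, b in zip(pattern, pattern[1:]):
        chain *= P[a, b]
    total = 0.0
    # Linearity of expectation: sum the occurrence probability at each start.
    for _ in range(n - m + 1):
        total += dist[pattern[0]] * chain
        dist = dist @ P  # advance the marginal distribution by one step
    return total

# Usage: two equiprobable, independent states, pattern "01", length 10.
uniform = [[0.5, 0.5], [0.5, 0.5]]
mean_count = expected_pattern_count([0, 1], uniform, [0.5, 0.5], 10)
```

Higher moments require the joint probabilities of overlapping occurrences, which this sketch deliberately leaves out.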
String Matching and 1d Lattice Gases
We calculate the probability distributions for the number of occurrences
of a given letter word in a random string of letters. Analytical
expressions for the distribution are known for the asymptotic regimes (i) (Gaussian) and such that is finite
(Compound Poisson). However, it is known that these distributions do not work
well in the intermediate regime . We show that the
problem of calculating the string matching probability can be cast as
determining the configurational partition function of a 1d lattice gas with
interacting particles so that the matching probability becomes the
grand-partition sum of the lattice gas, with the number of particles
corresponding to the number of matches. We perform a virial expansion of the
effective equation of state and obtain the probability distribution. Our result
reproduces the behavior of the distribution in all regimes. We are also able to
show analytically how the limiting distributions arise. Our analysis builds on
the fact that the effective interactions between the particles consist of a
relatively strong core of size , the word length, followed by a weak,
exponentially decaying tail. We find that the asymptotic regimes correspond to
the case where the tail of the interactions can be neglected, while in the
intermediate regime they need to be kept in the analysis. Our results are
readily generalized to the case where the random strings are generated by more
complicated stochastic processes such as a non-uniform letter probability
distribution or Markov chains. We show that in these cases the tails of the
effective interactions can be made even more dominant, thus rendering the
asymptotic approximations less accurate in such a regime.
Comment: 44 pages and 8 figures. Major revision of previous version. The
lattice gas analogy has been worked out in full, including virial expansion
and equation of state. This now constitutes the main part of the paper.
Connections with existing work are made and references should now be up to
date. To be submitted for publication.
An R Implementation of the Polya-Aeppli Distribution
An efficient implementation of the Polya-Aeppli, or geometric compound
Poisson, distribution in the statistical programming language R is presented.
The implementation is available as the package polyaAeppli and consists of
functions for the mass function, cumulative distribution function, quantile
function and random variate generation with those parameters conventionally
provided for standard univariate probability distributions in the stats package
in R.
Comment: 9 pages, 2 figures
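For orientation, a minimal Python sketch of a Polya-Aeppli mass function follows. It is not the polyaAeppli package's code and does not use its interface. It assumes the distribution is the sum of a Poisson(`lam`) number of geometric variables on {1, 2, ...} with P(Y = k) = (1 - p) p^(k-1). This parametrization, and the name `polya_aeppli_pmf`, are assumptions and may differ from the package's conventions.

```python
from math import comb, exp, factorial

def polya_aeppli_pmf(x, lam, p):
    """P(X = x) for X = Y_1 + ... + Y_N, N ~ Poisson(lam),
    Y_i geometric on {1, 2, ...} with P(Y = k) = (1 - p) * p**(k - 1).

    Direct summation over the number of summands n; intended for modest x
    only (the factorial grows quickly), not as an efficient implementation.
    """
    if x < 0:
        return 0.0
    if x == 0:
        return exp(-lam)  # no summands at all
    # A sum of n such geometric variables equals x with probability
    # C(x-1, n-1) * (1-p)**n * p**(x-n)  (shifted negative binomial).
    return exp(-lam) * sum(
        lam**n / factorial(n) * comb(x - 1, n - 1) * (1 - p)**n * p**(x - n)
        for n in range(1, x + 1)
    )

# Usage: probabilities for x = 0..60 with lam = 2, p = 0.3.
probs = [polya_aeppli_pmf(x, 2.0, 0.3) for x in range(61)]
```

With p = 0 every summand equals 1, so the function reduces to the ordinary Poisson mass function, which gives a quick sanity check.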
Calculation of aggregate loss distributions
Estimation of the operational risk capital under the Loss Distribution
Approach requires evaluation of aggregate (compound) loss distributions which
is one of the classic problems in risk theory. Closed-form solutions are not
available for the distributions typically used in operational risk. However,
with modern computer processing power, these distributions can be calculated
virtually exactly using numerical methods. This paper reviews numerical
algorithms that can be successfully used to calculate the aggregate loss
distributions. In particular, Monte Carlo, Panjer recursion and Fourier
transformation methods are presented and compared. Also, several closed-form
approximations based on moment matching and asymptotic results for heavy-tailed
distributions are reviewed.
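Of the methods named in this abstract, Panjer recursion is the simplest to sketch. The example below covers only the compound Poisson case with a severity distribution already discretized onto the integers 0, 1, 2, ...; it is an illustration under those assumptions, not the paper's implementation, and the name `panjer_compound_poisson` and its arguments are hypothetical.

```python
import math

def panjer_compound_poisson(lam, severity, s_max):
    """Mass function g[0..s_max] of S = X_1 + ... + X_N, N ~ Poisson(lam).

    severity : severity[k] = P(X = k) on the integers 0, 1, 2, ...
               (assumed already discretized)
    """
    # Pad the severity mass function with zeros up to index s_max.
    f = list(severity) + [0.0] * max(0, s_max + 1 - len(severity))
    g = [0.0] * (s_max + 1)
    # P(S = 0): either no losses, or every loss has size zero.
    g[0] = math.exp(-lam * (1.0 - f[0]))
    # Panjer recursion for the Poisson case (a = 0, b = lam).
    for s in range(1, s_max + 1):
        g[s] = lam / s * sum(k * f[k] * g[s - k] for k in range(1, s + 1))
    return g

# Usage: losses of size 1 or 2 with equal probability, on average one loss.
agg = panjer_compound_poisson(1.0, [0.0, 0.5, 0.5], 60)
```

The recursion costs O(s_max^2) operations, which is one reason Fourier-transform methods are often considered for fine discretizations.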