On empirical cumulant generating functions of code lengths for individual sequences
We consider the problem of lossless compression of individual sequences using
finite-state (FS) machines, from the perspective of the best achievable
empirical cumulant generating function (CGF) of the code length, i.e., the
normalized logarithm of the empirical average of the exponentiated code length.
Since the probabilistic CGF is minimized in terms of the Rényi entropy of the
source, one of the motivations of this study is to derive an
individual-sequence analogue of the Rényi entropy, in the same way that the
FS compressibility is the individual-sequence counterpart of the Shannon
entropy. We consider the CGF of the code-length both from the perspective of
fixed-to-variable (F-V) length coding and the perspective of
variable-to-variable (V-V) length coding, where the latter turns out to yield a
better result, that coincides with the FS compressibility. We also extend our
results to compression with side information, available at both the encoder and
decoder. In this case, the V-V version no longer coincides with the FS
compressibility, but results in a different complexity measure.
Comment: 15 pages; submitted for publication
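For context, the probabilistic result alluded to above is Campbell's bound, which relates the exponential average of code lengths to the Rényi entropy; a sketch in illustrative notation (the symbols are not taken from the paper):

```latex
% Campbell's bound: for any uniquely decodable code with lengths l(x)
% and any rho > 0, the exponential average length is bounded below by
% the Renyi entropy of order alpha = 1/(1+rho):
\frac{1}{\rho}\log_2\!\sum_{x} p(x)\, 2^{\rho\,\ell(x)} \;\ge\; H_{\alpha}(p),
\qquad \alpha = \frac{1}{1+\rho},
\qquad H_{\alpha}(p) = \frac{1}{1-\alpha}\log_2\!\sum_{x} p(x)^{\alpha}.
```

As rho tends to zero, alpha tends to one and the bound recovers the usual Shannon-entropy bound on the average length, which is the sense in which the CGF criterion generalizes the classical one.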
Generating artificial light curves: Revisited and updated
The production of artificial light curves with known statistical and
variability properties is of great importance in astrophysics. Consolidating
the confidence levels during cross-correlation studies, understanding the
artefacts induced by sampling irregularities, establishing detection limits for
future observatories are just some of the applications of simulated data sets.
Currently, the widely used methodology of amplitude and phase randomisation is
able to produce artificial light curves which have a given underlying power
spectral density (PSD) but which are strictly Gaussian distributed. This
restriction is a significant limitation, since the majority of observed light
curves, e.g. from active galactic nuclei, X-ray binaries, and gamma-ray bursts,
show strong deviations from Gaussianity, exhibiting `burst-like' events that
yield long-tailed probability distribution functions (PDFs). In this
study we propose a simple method which is able to precisely reproduce light
curves which match both the PSD and the PDF of either an observed light curve
or a theoretical model. The PDF can be representative of either the parent
distribution or the actual distribution of the observed data, depending on the
study to be conducted for a given source. The final artificial light curves
contain all of the statistical and variability properties of the observed
source or theoretical model, i.e. the same PDF and PSD. Within the
framework of Reproducible Research, the code, together with the illustrative
example used in this manuscript, are both made publicly available in the form
of an interactive Mathematica notebook.
Comment: Accepted for publication in MNRAS. The paper is 23 pages long and
contains 21 figures and 2 tables. The Mathematica notebook can be found on
the web as part of this paper (Online Material) or at
http://www.astro.soton.ac.uk/~de1e08/ArtificialLightCurves
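The amplitude-and-phase-randomisation baseline the abstract refers to (in the spirit of Timmer & König) can be sketched as follows. This is a minimal illustration of the Gaussian-only method being improved upon, not the paper's PDF-matching algorithm, and the function name and power-law PSD are assumptions for the example:

```python
import numpy as np

def simulate_lightcurve(n, beta=2.0, seed=0):
    """Phase-randomisation sketch: draw Fourier amplitudes from a
    power-law PSD ~ f^-beta with random phases, then inverse FFT.
    The result matches the target PSD but is strictly Gaussian."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n, d=1.0)[1:]   # positive frequencies (skip DC)
    psd = freqs ** (-beta)                  # assumed power-law PSD
    # Real and imaginary parts drawn as Gaussians scaled by sqrt(PSD/2)
    re = rng.standard_normal(freqs.size) * np.sqrt(psd / 2)
    im = rng.standard_normal(freqs.size) * np.sqrt(psd / 2)
    spectrum = np.concatenate(([0.0], re + 1j * im))
    lc = np.fft.irfft(spectrum, n=n)
    return (lc - lc.mean()) / lc.std()      # zero mean, unit variance

lc = simulate_lightcurve(1024)
```

Because the output is a linear combination of Gaussian draws, its PDF is Gaussian by construction, which is exactly the restriction the proposed method removes by additionally matching a target long-tailed PDF.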
Spectral identification and estimation of mixed causal-noncausal invertible-noninvertible models
This paper introduces new techniques for estimating, identifying and
simulating mixed causal-noncausal invertible-noninvertible models. We propose a
framework that integrates high-order cumulants, merging both the spectrum and
bispectrum into a single estimation function. The model that most adequately
represents the data under the assumption that the error term is i.i.d. is
selected. Our Monte Carlo study reveals unbiased parameter estimates and a high
frequency with which correct models are identified. We illustrate our strategy
through an empirical analysis of returns from 24 Fama-French emerging market
stock portfolios. The findings suggest that each portfolio displays noncausal
dynamics, producing white noise residuals devoid of conditional heteroscedastic
effects.
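A purely noncausal AR(1), the simplest member of the mixed causal-noncausal family discussed above, can be simulated with a backward recursion; this is an illustrative sketch (the function name and Student-t innovation choice are assumptions, not the paper's setup):

```python
import numpy as np

def simulate_noncausal_ar1(psi, n, burn=500, seed=0):
    """Sketch of a purely noncausal AR(1): x_t = psi * x_{t+1} + eps_t
    with |psi| < 1, i.e. the forward-looking MA representation
    x_t = sum_j psi^j eps_{t+j}. Non-Gaussian (here Student-t)
    innovations are what make causal and noncausal dynamics
    distinguishable, since both share the same autocovariances."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_t(df=3, size=n + burn)
    x = np.zeros(n + burn)
    for t in range(n + burn - 2, -1, -1):   # recurse backwards in time
        x[t] = psi * x[t + 1] + eps[t]
    return x[:n]                            # discard the burn-in tail

x = simulate_noncausal_ar1(0.6, 1000)
```

The backward recursion mirrors the usual forward AR(1) simulation; discarding the last `burn` observations removes the influence of the zero terminal condition.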
On Nonlinear Compression Costs: When Shannon Meets Rényi
In compression problems, the minimum average codeword length is achieved by Shannon entropy, and efficient coding schemes such as Arithmetic Coding (AC) achieve optimal compression. In contrast, when minimizing the exponential average length, Rényi entropy emerges as a compression lower bound. This paper presents a novel approach that extends and applies the AC model to achieve results that are arbitrarily close to Rényi's lower bound. While rooted in a theoretical framework assuming independent and identically distributed symbols, empirical testing of this generalized AC model on a Wikipedia dataset with correlated symbols reveals significant performance enhancements over its classical counterpart when considering the exponential average. The paper also demonstrates an intriguing equivalence between minimizing the exponential average and minimizing the likelihood of exceeding a predetermined threshold in codeword length. An extensive experimental comparison between generalized and classical AC unveils a remarkable reduction, by several orders of magnitude, in the fraction of codewords surpassing the specified threshold in the Wikipedia dataset.
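The two lower bounds contrasted above can be computed directly; a minimal sketch with a made-up toy distribution (the function names and the value t = 1 are assumptions for illustration):

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy in bits: lower bound on the ordinary average length."""
    return -np.sum(p * np.log2(p))

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha in bits: by Campbell's theorem, with
    alpha = 1/(1+t) it lower-bounds the exponential average length at t."""
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

p = np.array([0.5, 0.25, 0.125, 0.125])  # toy source distribution
t = 1.0                                  # exponent in the exponential average
alpha = 1 / (1 + t)                      # Campbell's order, here 0.5
H = shannon_entropy(p)                   # 1.75 bits
H_a = renyi_entropy(p, alpha)            # larger than H for alpha < 1
```

For alpha < 1 the Rényi entropy exceeds the Shannon entropy, which is why codes tuned to the ordinary average are suboptimal under the exponential average: the stricter Rényi bound is what the generalized AC scheme approaches.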