346 research outputs found

    On empirical cumulant generating functions of code lengths for individual sequences

    We consider the problem of lossless compression of individual sequences using finite-state (FS) machines, from the perspective of the best achievable empirical cumulant generating function (CGF) of the code length, i.e., the normalized logarithm of the empirical average of the exponentiated code length. Since the probabilistic CGF is minimized in terms of the Rényi entropy of the source, one of the motivations of this study is to derive an individual-sequence analogue of the Rényi entropy, in the same way that the FS compressibility is the individual-sequence counterpart of the Shannon entropy. We consider the CGF of the code length both from the perspective of fixed-to-variable (F-V) length coding and that of variable-to-variable (V-V) length coding, where the latter turns out to yield a better result that coincides with the FS compressibility. We also extend our results to compression with side information available at both the encoder and decoder. In this case, the V-V version no longer coincides with the FS compressibility, but results in a different complexity measure.
    Comment: 15 pages; submitted for publication
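The central quantity here, the empirical CGF of the code length, can be sketched numerically. A minimal sketch, assuming a base-2 exponent and normalization by the parameter lambda; the paper's exact definition may differ:

```python
import math

def empirical_cgf(lengths, lam):
    """Normalized log of the empirical average of exponentiated code lengths:
    (1/lam) * log2( (1/N) * sum_i 2^(lam * l_i) ).
    As lam -> 0 this approaches the ordinary average length, and by Jensen's
    inequality it is non-decreasing in lam (long codewords dominate).
    """
    n = len(lengths)
    avg = sum(2.0 ** (lam * l) for l in lengths) / n
    return math.log2(avg) / lam

lengths = [3, 3, 4, 5, 2]               # hypothetical codeword lengths
print(empirical_cgf(lengths, 0.01))     # close to the mean length 3.4
print(empirical_cgf(lengths, 2.0))      # weighted toward the longest codeword
```

The monotonicity in lambda is what makes the CGF a family of complexity measures interpolating between the average and the worst-case code length.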

    Generating artificial light curves: Revisited and updated

    The production of artificial light curves with known statistical and variability properties is of great importance in astrophysics. Consolidating confidence levels in cross-correlation studies, understanding the artefacts induced by sampling irregularities, and establishing detection limits for future observatories are just some of the applications of simulated data sets. Currently, the widely used methodology of amplitude and phase randomisation is able to produce artificial light curves which have a given underlying power spectral density (PSD) but which are strictly Gaussian distributed. This restriction is a significant limitation, since the majority of light curves, e.g. of active galactic nuclei, X-ray binaries, and gamma-ray bursts, show strong deviations from Gaussianity, exhibiting `burst-like' events that yield long-tailed probability distribution functions (PDFs). In this study we propose a simple method which is able to precisely reproduce light curves that match both the PSD and the PDF of either an observed light curve or a theoretical model. The PDF can be representative of either the parent distribution or the actual distribution of the observed data, depending on the study to be conducted for a given source. The final artificial light curves contain all of the statistical and variability properties of the observed source or theoretical model, i.e. the same PDF and PSD. Within the framework of Reproducible Research, the code, together with the illustrative example used in this manuscript, is made publicly available in the form of an interactive Mathematica notebook.
    Comment: Accepted for publication in MNRAS. The paper is 23 pages long and contains 21 figures and 2 tables. The Mathematica notebook can be found on the web as part of this paper (Online Material) or at http://www.astro.soton.ac.uk/~de1e08/ArtificialLightCurves
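The classical amplitude-and-phase-randomisation method that this abstract improves upon can be sketched as follows; the power-law PSD index and unit time step are illustrative assumptions, not the paper's setup. Drawing Fourier amplitudes from the target PSD with random phases and inverting yields a time series with the right PSD but a strictly Gaussian PDF, which is exactly the limitation motivating the paper:

```python
import numpy as np

def simulate_gaussian_lc(n, beta=2.0, seed=0):
    """Amplitude/phase randomisation sketch: Fourier amplitudes are drawn
    from a power-law PSD P(f) ~ f^(-beta) with uniformly random phases,
    then inverted. The output is Gaussian-distributed by construction,
    so long-tailed (burst-like) PDFs cannot be reproduced this way.
    """
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n, d=1.0)[1:]        # skip the zero frequency
    psd = freqs ** (-beta)
    # Independent Gaussian real/imaginary parts give uniform random phases
    re = rng.normal(size=freqs.size) * np.sqrt(psd / 2)
    im = rng.normal(size=freqs.size) * np.sqrt(psd / 2)
    spec = np.concatenate(([0.0], re + 1j * im))  # zero-mean light curve
    return np.fft.irfft(spec, n)

lc = simulate_gaussian_lc(1024)
```

The proposed method instead iterates between the Fourier domain (to enforce the PSD) and the amplitude domain (to enforce an arbitrary target PDF).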

    Spectral identification and estimation of mixed causal-noncausal invertible-noninvertible models

    This paper introduces new techniques for estimating, identifying and simulating mixed causal-noncausal invertible-noninvertible models. We propose a framework that integrates high-order cumulants, merging both the spectrum and the bispectrum into a single estimation function. The model that most adequately represents the data under the assumption that the error term is i.i.d. is selected. Our Monte Carlo study reveals unbiased parameter estimates and a high frequency with which correct models are identified. We illustrate our strategy through an empirical analysis of returns from 24 Fama-French emerging market stock portfolios. The findings suggest that each portfolio displays noncausal dynamics, producing white noise residuals devoid of conditional heteroscedastic effects.
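As a minimal sketch of the noncausal dynamics the abstract refers to, a purely noncausal AR(1), x_t = phi * x_{t+1} + eps_t, can be simulated by recursing backward in time. The coefficient, burn-in length, and Student-t errors below are illustrative choices, not the paper's specification; non-Gaussian errors are what make causal and noncausal representations distinguishable, which is why second-order spectra alone cannot identify them and higher-order cumulants such as the bispectrum are needed:

```python
import numpy as np

def simulate_noncausal_ar1(n, phi=0.7, seed=0, burn=500):
    """Sketch of a purely noncausal AR(1): x_t = phi * x_{t+1} + eps_t,
    |phi| < 1, simulated backward from a zero terminal condition with a
    burn-in so the retained sample is approximately stationary.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_t(df=3, size=n + burn)  # heavy-tailed i.i.d. errors
    x = np.zeros(n + burn)
    for t in range(n + burn - 2, -1, -1):      # recurse backward in time
        x[t] = phi * x[t + 1] + eps[t]
    return x[:n]

x = simulate_noncausal_ar1(2000)
```

Such a process is second-order indistinguishable from a causal AR(1) with the same phi, so the spectrum matches while the bispectrum differs.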

    On Nonlinear Compression Costs: When Shannon Meets Rényi

    In compression problems, the minimum average codeword length is given by the Shannon entropy, and efficient coding schemes such as Arithmetic Coding (AC) achieve optimal compression. In contrast, when minimizing the exponential average length, the Rényi entropy emerges as a compression lower bound. This paper presents a novel approach that extends and applies the AC model to achieve results that are arbitrarily close to Rényi's lower bound. While rooted in a theoretical framework assuming independent and identically distributed symbols, empirical testing of this generalized AC model on a Wikipedia dataset with correlated symbols reveals significant performance enhancements over its classical counterpart when considering the exponential average. The paper also demonstrates an intriguing equivalence between minimizing the exponential average and minimizing the likelihood of exceeding a predetermined threshold in codeword length. An extensive experimental comparison between generalized and classical AC unveils a remarkable reduction, by several orders of magnitude, in the fraction of codewords surpassing the specified threshold in the Wikipedia dataset.
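The exponential average length and its Rényi lower bound (Campbell's bound, with Rényi order alpha = 1/(1+t)) can be illustrated on a toy distribution; the distribution and integer codeword lengths below are hypothetical, chosen only so the arithmetic is exact:

```python
import math

def exp_avg_length(p, lengths, t):
    """Campbell's exponential average: (1/t) * log2( sum_i p_i * 2^(t*l_i) )."""
    return math.log2(sum(pi * 2.0 ** (t * li) for pi, li in zip(p, lengths))) / t

def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha: (1/(1-alpha)) * log2( sum_i p_i^alpha )."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

# Illustrative dyadic distribution with Shannon-optimal integer lengths
p = [0.5, 0.25, 0.125, 0.125]
shannon_lengths = [1, 2, 3, 3]           # optimal for the ordinary average
t = 1.0
alpha = 1.0 / (1.0 + t)                  # Rényi order tied to the exponent t
print(exp_avg_length(p, shannon_lengths, t) >= renyi_entropy(p, alpha))  # True
```

Lengths tuned to the ordinary average overshoot the Rényi bound here, which is the gap the generalized AC scheme in the paper closes.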

    Bit-Interleaved Coded Modulation
