Source coding with escort distributions and Renyi entropy bounds
We discuss the relevance of escort distributions and Rényi entropy in the
context of source coding. We first recall a source coding theorem by Campbell
relating a generalized measure of length to the Rényi-Tsallis entropy. We
show that the associated optimal codes can be obtained using considerations on
escort distributions. We propose a new family of length measures involving
escort distributions and show that these generalized lengths are also
bounded below by the Rényi entropy. Furthermore, we find that the standard
Shannon code lengths are optimal for the new generalized length measures,
whatever the entropic index. Finally, we show that there exists in this setting
an interplay between standard and escort distributions.
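As a concrete illustration (not code from the paper), the escort distribution and the Rényi entropy appearing in Campbell's bound can be sketched in a few lines of Python; the distribution `p` and the orders below are arbitrary choices for demonstration.

```python
import math

def escort(p, q):
    """Escort distribution: P_i = p_i**q / sum_j p_j**q."""
    w = [pi ** q for pi in p]
    z = sum(w)
    return [wi / z for wi in w]

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (in bits), alpha != 1."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

p = [0.5, 0.25, 0.125, 0.125]      # an arbitrary source distribution
print(escort(p, 2.0))              # escort distribution of order q = 2
print(renyi_entropy(p, 0.5))       # Renyi entropy of order alpha = 1/2
```

As q tends to 1 the escort reduces to the original distribution, and the Rényi entropy tends to the Shannon entropy.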
On some entropy functionals derived from R\'enyi information divergence
We consider the maximum entropy problems associated with Rényi entropy,
subject to two kinds of constraints on expected values. The constraints
considered are a constraint on the standard expectation and a constraint on
the generalized expectation as encountered in nonextensive statistics. The
optimum maximum entropy probability distributions, which can exhibit
power-law behaviour, are derived and characterized. The Rényi entropy of the
optimum distributions can be viewed as a function of the constraint. This
defines two families of entropy functionals in the space of possible expected
values. General properties of these functionals, including nonnegativity,
minimum, and convexity, are documented. Their relationships as well as numerical
aspects are also discussed. Finally, we work out some specific cases for the
reference measure and recover, in a limit case, some well-known entropies.
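For orientation, the two constraint types can be written as follows (a standard formulation in nonextensive statistics; the paper's exact notation and normalization may differ). With entropic index $q$, the standard and generalized (escort) expectations of an observable $x$ are

```latex
\mathbb{E}[x] = \sum_i p_i x_i = m
\qquad\text{or}\qquad
\mathbb{E}_q[x] = \frac{\sum_i p_i^{\,q}\, x_i}{\sum_j p_j^{\,q}} = m ,
```

and maximizing the Rényi (or Tsallis) entropy under such a constraint yields a $q$-exponential, power-law-type solution of the form

```latex
p(x) \;\propto\; \bigl(1 - (1-q)\,\beta\,(x - m)\bigr)_{+}^{\frac{1}{1-q}} ,
```

which reduces to the usual exponential MaxEnt solution as $q \to 1$.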
An amended MaxEnt formulation for deriving Tsallis factors, and associated issues
An amended MaxEnt formulation for systems displaced from the conventional
MaxEnt equilibrium is proposed. This formulation involves the minimization of
the Kullback-Leibler divergence to a reference distribution (or, equivalently,
the maximization of Shannon entropy), subject to a constraint that involves a
second reference distribution and tunes the new equilibrium. In this setting,
the equilibrium distribution is the generalized escort distribution associated
with the two reference distributions. Taking into account an additional
constraint, an observable given by a statistical mean, leads to the
maximization of Rényi/Tsallis entropy subject to that constraint. Two natural
scenarios for this observation constraint are considered, and the classical
and generalized constraints of nonextensive statistics are recovered. The
solutions to the maximization of Rényi entropy subject to the two types of
constraints are derived. These optimum distributions, which are Lévy-like, are
self-referential. We then propose two 'alternate' (but effectively computable)
dual functions whose maximization enables identification of the optimum
parameters. Finally, a duality between solutions and the underlying Legendre
structure are presented. Comment: Presented at MaxEnt2006, Paris, France, July 10-13, 2006
Wavelet-Based Entropy Measures to Characterize Two-Dimensional Fractional Brownian Fields
The aim of this work was to extend the results of Perez et al. (Physica A (2006), 365 (2), 282–288) to the two-dimensional (2D) fractional Brownian field. In particular, we defined the Shannon entropy using the wavelet spectrum, from which the Hurst exponent is estimated by regressing the logarithm of the squared coefficients over the resolution levels. Using the same methodology, we also defined two other entropies in 2D: the Tsallis and Rényi entropies. A simulation study was performed to show the ability of the method to characterize 2D (in this case, α = 2) self-similar processes.
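As a rough one-dimensional sketch of the general recipe (not the authors' 2D implementation), one can compute a Haar wavelet energy spectrum, the Shannon entropy of its normalization, and a Hurst-type slope estimate; the scaling relation log2(E_j) ≈ (2H+1)·j + c used below is the standard 1D fractional-Brownian-motion behaviour, assumed here for illustration.

```python
import math

def haar_detail_energies(x):
    """Mean squared Haar detail coefficient at each dyadic level (finest first)."""
    energies = []
    approx = list(x)
    while len(approx) >= 2:
        n = len(approx) // 2
        detail = [(approx[2*i] - approx[2*i + 1]) / math.sqrt(2) for i in range(n)]
        approx = [(approx[2*i] + approx[2*i + 1]) / math.sqrt(2) for i in range(n)]
        energies.append(sum(d * d for d in detail) / n)
    return energies

def wavelet_shannon_entropy(energies):
    """Shannon entropy (bits) of the normalized wavelet energy spectrum."""
    total = sum(energies)
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in probs)

def hurst_from_energies(energies):
    """Least-squares slope of log2(E_j) versus level j, mapped to H = (slope - 1) / 2."""
    pts = [(j, math.log2(e)) for j, e in enumerate(energies, start=1) if e > 0]
    mx = sum(j for j, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    slope = (sum((j - mx) * (y - my) for j, y in pts)
             / sum((j - mx) ** 2 for j, _ in pts))
    return (slope - 1.0) / 2.0
```

A real analysis of a 2D field would use a two-dimensional wavelet transform and regress over the resolution levels of the field, but the estimation step is the same linear regression.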
The information-theoretic meaning of Gagliardo--Nirenberg type inequalities
Gagliardo--Nirenberg inequalities are interpolation inequalities which were
proved independently by Gagliardo and Nirenberg in the late fifties. In recent
years, their connections with theoretical aspects of information theory and
nonlinear diffusion equations have made it possible to obtain some of them in
optimal form, recovering both the sharp constants and the explicit form of the
optimizers. In this note, in the light of this recent research, we review the
main connections between Shannon-type entropies, diffusion equations and a
class of these inequalities.
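For reference, a standard form of the Gagliardo--Nirenberg interpolation inequality on $\mathbb{R}^d$ reads as follows (the note's precise class of inequalities may be narrower than this general statement):

```latex
\|u\|_{L^r(\mathbb{R}^d)} \;\le\; C\, \|\nabla u\|_{L^p(\mathbb{R}^d)}^{\theta}\, \|u\|_{L^q(\mathbb{R}^d)}^{1-\theta},
\qquad
\frac{1}{r} \;=\; \theta\Bigl(\frac{1}{p} - \frac{1}{d}\Bigr) + (1-\theta)\,\frac{1}{q},
\quad \theta \in [0,1].
```

The "optimal form" results mentioned above determine the sharp constant $C$ and the extremal functions $u$ for particular exponent families.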
On empirical cumulant generating functions of code lengths for individual sequences
We consider the problem of lossless compression of individual sequences using
finite-state (FS) machines, from the perspective of the best achievable
empirical cumulant generating function (CGF) of the code length, i.e., the
normalized logarithm of the empirical average of the exponentiated code length.
Since the probabilistic CGF is minimized in terms of the R\'enyi entropy of the
source, one of the motivations of this study is to derive an
individual-sequence analogue of the R\'enyi entropy, in the same way that the
FS compressibility is the individual-sequence counterpart of the Shannon
entropy. We consider the CGF of the code-length both from the perspective of
fixed-to-variable (F-V) length coding and the perspective of
variable-to-variable (V-V) length coding, where the latter turns out to yield a
better result, which coincides with the FS compressibility. We also extend our
results to compression with side information, available at both the encoder and
decoder. In this case, the V-V version no longer coincides with the FS
compressibility, but results in a different complexity measure. Comment: 15 pages; submitted for publication
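The probabilistic statement alluded to above can be checked numerically (this sketch ignores integer-length constraints and is not the individual-sequence version studied in the paper): with ideal lengths taken from the escort distribution of order α = 1/(1+λ), the normalized CGF of the code length equals the Rényi entropy of that order.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (bits), alpha != 1."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)

def cgf_code_length(p, lengths, lam):
    """Normalized CGF: (1/lam) * log2 of the average exponentiated code length."""
    return math.log2(sum(pi * 2.0 ** (lam * li) for pi, li in zip(p, lengths))) / lam

def ideal_lengths(p, lam):
    """Ideal (non-integer) lengths l_i = -log2 P_i, P the escort of order 1/(1+lam)."""
    alpha = 1.0 / (1.0 + lam)
    w = [pi ** alpha for pi in p]
    z = sum(w)
    return [-math.log2(wi / z) for wi in w]

p = [0.5, 0.25, 0.125, 0.125]  # arbitrary demo distribution
lam = 1.0
lengths = ideal_lengths(p, lam)
# The minimized CGF coincides with the Renyi entropy of order 1/(1+lam):
print(cgf_code_length(p, lengths, lam), renyi_entropy(p, 1.0 / (1.0 + lam)))
```

The identity follows by substituting the escort probabilities into the CGF, which collapses the sum to the normalizing constant of the escort distribution.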
Generalizations of Fano's Inequality for Conditional Information Measures via Majorization Theory
Fano's inequality is one of the most elementary, ubiquitous, and important
tools in information theory. Using majorization theory, Fano's inequality is
generalized to a broad class of information measures, which contains those of
Shannon and R\'{e}nyi. When specialized to these measures, it recovers and
generalizes the classical inequalities. Key to the derivation is the
construction of an appropriate conditional distribution inducing a desired
marginal distribution on a countably infinite alphabet. The construction is
based on the infinite-dimensional version of Birkhoff's theorem proven by
Révész [Acta Math. Hungar. 1962, 3, 188–198], and the
constraint of maintaining a desired marginal distribution is similar to
coupling in probability theory. Using our Fano-type inequalities for Shannon's
and R\'{e}nyi's information measures, we also investigate the asymptotic
behavior of the sequence of Shannon's and R\'{e}nyi's equivocations when the
error probabilities vanish. This asymptotic behavior provides a novel
characterization of the asymptotic equipartition property (AEP) via Fano's
inequality. Comment: 44 pages, 3 figures
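For context, the classical (Shannon) form of Fano's inequality that the paper generalizes is, for an estimate $\hat{X}$ of $X$ from $Y$ with error probability $P_e = \Pr[\hat{X} \neq X]$ over a finite alphabet $\mathcal{X}$:

```latex
H(X \mid Y) \;\le\; h_b(P_e) + P_e \log\bigl(|\mathcal{X}| - 1\bigr),
\qquad
h_b(t) = -t\log t - (1-t)\log(1-t).
```

On the countably infinite alphabets considered in the paper this finite-alphabet form must be modified, which is where the majorization-based construction enters.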
Strong and Weak Optimizations in Classical and Quantum Models of Stochastic Processes
Among the predictive hidden Markov models that describe a given stochastic
process, the {\epsilon}-machine is strongly minimal in that it minimizes every
R\'enyi-based memory measure. Quantum models can be smaller still. In contrast
with the {\epsilon}-machine's unique role in the classical setting, however,
among the class of processes described by pure-state hidden quantum Markov
models, there are those for which there does not exist any strongly minimal
model. Quantum memory optimization then depends on which memory measure best
matches a given problem circumstance. Comment: 14 pages, 14 figures; http://csc.ucdavis.edu/~cmg/compmech/pubs/uemum.ht
Sequence information gain based motif analysis
Background: The detection of regulatory regions in candidate sequences is essential for understanding the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information-theoretic metrics for finding regulatory sequences in promoter regions.

Results: This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package) and MotifRegressor, and with previous work such as Qresiduals projections and information-theoretic detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that in 70% of the studied Transcription Factor Binding Sites the SIGMA detector performs better and behaves more robustly than the compared methods, while requiring a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in modelling the non-linear covariability in the binding motif positions.

Conclusions: Sequence Information Gain based Motif Analysis is a generalisation of a non-linear, information-theoretic model for the detection of cis-regulatory sequences. This generalisation allows transcription factor binding sites to be detected with maximum performance irrespective of the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.