On Match Lengths, Zero Entropy and Large Deviations - with Application to Sliding Window Lempel-Ziv Algorithm
The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence
times and match lengths has been studied from various perspectives in the
information theory literature. In this paper, we undertake a finer study of
these quantities under two different scenarios: i) zero entropy sources,
which are characterized by strong long-term memory, and ii) processes with
weak memory, as described through various mixing conditions.
For zero entropy sources, a general statement on match length is obtained. It
is used in the proof of almost sure optimality of Fixed Shift Variant of
Lempel-Ziv (FSLZ) and SWLZ algorithms given in the literature. Through an example
of stationary and ergodic processes generated by an irrational rotation we
establish that for a window of size , a compression ratio given by
where depends on and approaches 1 as
, is obtained under the application of FSLZ and SWLZ
algorithms. Also, we give a general expression for the compression ratio for a
class of stationary and ergodic processes with zero entropy.
Next, we extend the study of Ornstein and Weiss on the asymptotic behavior of
the normalized version of recurrence times and establish the large
deviation property (LDP) for a class of mixing processes. Also, an estimator
of entropy based on recurrence times is proposed, for which a large deviation
principle is proved for sources satisfying similar mixing conditions.
Comment: accepted to appear in IEEE Transactions on Information Theor
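The recurrence-time idea behind this abstract can be illustrated with a minimal sketch. It uses the classical Ornstein-Weiss relation, under which log2(R_n)/n approaches the entropy rate for a stationary ergodic source, where R_n is the first time the initial n-block reappears. This is not the estimator proposed in the paper, whose normalization is not given in the abstract; the function names are hypothetical.

```python
import math

def first_recurrence_time(seq, n):
    """Smallest k >= 1 with seq[k:k+n] == seq[0:n], or None if the block never recurs."""
    block = seq[:n]
    for k in range(1, len(seq) - n + 1):
        if seq[k:k + n] == block:
            return k
    return None

def recurrence_entropy_estimate(seq, n):
    """Rough entropy-rate estimate in bits per symbol: log2(R_n) / n.

    Assumption: plain Ornstein-Weiss style estimate, not the paper's estimator.
    """
    r = first_recurrence_time(seq, n)
    if r is None:
        return None  # the sample is too short for the block to recur
    return math.log2(r) / n
```

For a periodic sequence such as "01" repeated many times, the recurrence time stays at 2 however large n is, so the estimate shrinks toward zero as n grows, which is consistent with a zero entropy source.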
Guessing based on length functions
A guessing wiretapper's performance on a Shannon cipher system is analyzed
for a source with memory. Close relationships between guessing functions and
length functions are first established. Subsequently, asymptotically optimal
encryption and attack strategies are identified and their performances analyzed
for sources with memory. The performance metrics are exponents of guessing
moments and probability of large deviations. The metrics are then characterized
for unifilar sources. Universal asymptotically optimal encryption and attack
strategies are also identified for unifilar sources. Guessing in the increasing
order of Lempel-Ziv coding lengths is proposed for finite-state sources, and
shown to be asymptotically optimal. Finally, competitive optimality properties
of guessing in the increasing order of description lengths and Lempel-Ziv
coding lengths are demonstrated.
Comment: 16 pages, Submitted to IEEE Transactions on Information Theory,
Special issue on Information Theoretic Security, Simplified proof of
Proposition
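Guessing in increasing order of Lempel-Ziv coding lengths can be sketched as follows. The sketch makes two assumptions that are not from the paper: the LZ78 phrase count stands in for the coding length, and ties are broken lexicographically. All function names are hypothetical, and the exhaustive enumeration is only practical for short strings.

```python
from itertools import product

def lz78_phrase_count(s):
    """Number of phrases in the LZ78 incremental parsing of s."""
    seen = set()
    current = ""
    count = 0
    for ch in s:
        current += ch
        if current not in seen:
            seen.add(current)
            count += 1
            current = ""
    # A leftover suffix that repeats an earlier phrase counts as one more phrase.
    return count + (1 if current else 0)

def guessing_order(n, alphabet="01"):
    """All length-n strings, sorted by phrase count (a proxy for LZ code length)."""
    candidates = ["".join(t) for t in product(alphabet, repeat=n)]
    return sorted(candidates, key=lambda s: (lz78_phrase_count(s), s))

def guess_number(target, alphabet="01"):
    """1-based position of target in the guessing order."""
    return guessing_order(len(target), alphabet).index(target) + 1
```

Strings that parse into few phrases are guessed first, so a highly regular target is found after comparatively few guesses.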
An Information-Theoretic Test for Dependence with an Application to the Temporal Structure of Stock Returns
Information theory provides ideas for conceptualising information and
measuring relationships between objects. It has found wide application in the
sciences, but economics and finance have made surprisingly little use of it. We
show that time series data can usefully be studied as information -- by noting
the relationship between statistical redundancy and dependence, we are able to
use the results of information theory to construct a test for joint dependence
of random variables. The test is in the same spirit as those developed by
Ryabko and Astola (2005, 2006a,b), but differs from these in that we add extra
randomness to the original stochastic process. It uses data compression to
estimate the entropy rate of a stochastic process, which allows it to measure
dependence among sets of random variables, as opposed to the existing
econometric literature that uses entropy and finds itself restricted to
pairwise tests of dependence. We show how serial dependence may be detected in
S&P500 and PSI20 stock returns over different sample periods and frequencies.
We apply the test to synthetic data to judge its ability to recover known
temporal dependence structures.
Comment: 22 pages, 7 figure
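The link between compressibility and dependence can be illustrated with a short sketch. It is not the authors' test: it assumes the series has already been discretized into integer symbols in the range 0 to 255, uses zlib as a stand-in compressor, and compares the series with a shuffled copy whose marginal distribution is the same but whose temporal order is destroyed. The function names are hypothetical.

```python
import random
import zlib

def compressed_bits_per_symbol(symbols):
    """Compressed size in bits per symbol; symbols must be ints in 0..255."""
    data = bytes(symbols)
    return 8 * len(zlib.compress(data, 9)) / len(data)

def redundancy_gap(symbols, seed=0):
    """Bits per symbol saved by the original ordering relative to a shuffle.

    A clearly positive gap suggests temporal dependence. This is an
    illustrative heuristic, not a calibrated statistical test.
    """
    shuffled = list(symbols)
    random.Random(seed).shuffle(shuffled)
    return compressed_bits_per_symbol(shuffled) - compressed_bits_per_symbol(symbols)
```

For example, a strictly periodic series such as `[0, 1, 2, 3] * 500` compresses far better than its shuffle, so its gap is positive, while an independent series should show a gap near zero.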
Pattern matching in Lempel-Ziv compressed strings: fast, simple, and deterministic
Countless variants of the Lempel-Ziv compression are widely used in many
real-life applications. This paper is concerned with a natural modification of
the classical pattern matching problem inspired by the popularity of such
compression methods: given an uncompressed pattern s[1..m] and a Lempel-Ziv
representation of a string t[1..N], does s occur in t? Farach and Thorup gave a
randomized O(n log^2(N/n) + m) time solution for this problem, where n is the size
of the compressed representation of t. We improve their result by developing a
faster and fully deterministic O(n log(N/n) + m) time algorithm with the same
space complexity. Note that for highly compressible texts, log(N/n) might be of
order n, so for such inputs the improvement is very significant. A (tiny)
fragment of our method can be used to give an asymptotically optimal solution
for the substring hashing problem considered by Farach and Muthukrishnan.
Comment: submitte
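To make the quantities N and n concrete, here is a minimal sketch of a greedy LZ77-style factorization in which each phrase is either a fresh character or the longest substring that also starts at an earlier position, with overlap allowed. The abstract does not specify which Lempel-Ziv variant the paper uses, so this textbook-style variant and the name `lz77_factorize` are assumptions. The quadratic search is for illustration only.

```python
def lz77_factorize(text):
    """Greedy LZ77-style factorization of text into a list of phrases."""
    phrases = []
    i = 0
    while i < len(text):
        best = 0
        # Try every earlier start position j and extend the match as far as possible.
        for j in range(i):
            length = 0
            while i + length < len(text) and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # no earlier match: emit a single literal character
        phrases.append(text[i:i + step])
        i += step
    return phrases
```

Here N is `len(text)` and n is `len(lz77_factorize(text))`. A run of a single repeated character factorizes into just two phrases whatever its length, which is the kind of highly compressible input where n is far smaller than N.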