Analysis and Applications of the T-complexity
T-codes are variable-length self-synchronizing codes introduced by Titchener in 1984. T-code codewords are constructed recursively from a finite alphabet using an algorithm called T-augmentation, resulting in excellent self-synchronization properties. An algorithm called T-decomposition parses a given sequence into a series of T-prefixes and finds a T-code set in which the sequence is encoded as a longest codeword. T-decomposition has both similarities to and differences from the conventional LZ78 incremental parsing. The LZ78 incremental parsing algorithm sequentially parses a given sequence into consecutive distinct subsequences (words) such that each word consists of the longest matching word parsed previously followed by a literal symbol. The LZ-complexity is then defined as the number of words. By contrast, T-decomposition parses a given sequence into a series of T-prefixes, each of which consists of the recursive concatenation of the longest matching T-prefix parsed previously and a literal symbol, and it has to access the whole sequence every time it determines a T-prefix. Like the LZ-complexity, the T-complexity of a sequence is defined as the number of T-prefixes; however, the T-complexity of a given sequence generally tends to be smaller than its LZ-complexity. The first part of the thesis presents our contributions to the theory of T-codes. To determine T-prefixes sequentially, we devise a new T-decomposition algorithm using forward parsing. Both the T-complexity profile obtained from the forward T-decomposition and the LZ-complexity profile can be derived in a unified way using a differential equation method. The method focuses on the increase of the average codeword length of a code tree. The resulting formulas are confirmed to coincide with those of previous studies. In general, the magnitude of the T-complexity of a given sequence s indicates its degree of randomness. 
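The LZ78 incremental parsing described above, whose word count is the LZ-complexity, can be sketched in a few lines. This is a minimal illustration, not the thesis's implementation: the names `lz78_parse` and `lz_complexity` are hypothetical, and counting a trailing phrase that repeats an earlier word as one more word is an assumption about the convention.

```python
def lz78_parse(seq: str) -> list[str]:
    """Split seq into LZ78 words: longest previously parsed word + one literal symbol."""
    dictionary: set[str] = set()  # previously parsed words (prefix-closed)
    words: list[str] = []
    current = ""
    for symbol in seq:
        current += symbol
        # Keep extending while the phrase still matches a previously parsed word;
        # the first unseen phrase is "longest match + one literal symbol".
        if current not in dictionary:
            dictionary.add(current)
            words.append(current)
            current = ""
    if current:
        # Assumption: a trailing phrase that repeats an earlier word still counts.
        words.append(current)
    return words


def lz_complexity(seq: str) -> int:
    """LZ-complexity = number of LZ78 words."""
    return len(lz78_parse(seq))


print(lz78_parse("1011010100010"))  # ['1', '0', '11', '01', '010', '00', '10']
print(lz_complexity("1011010100010"))  # 7
```

The parser only ever looks at the phrase being extended and the words already parsed, which is the sequential property that the forward T-decomposition in the thesis aims to recover for T-prefixes.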
However, there exist interesting sequences that have much larger T-complexities than any random sequence. We investigate the maximum T-complexity sequences and the maximum LZ-complexity sequences using various techniques, including those of the test suite released by the U.S. National Institute of Standards and Technology (NIST), and find that the maximum T-complexity sequences are less random than the maximum LZ-complexity sequences. The second part of the thesis presents applications. We consider two: data compression and randomness testing. First, we propose a new data compression scheme based on T-codes that uses a dictionary method in which every phrase added to the dictionary has a recursive structure similar to that of T-codes. Our scheme compresses every file in the Calgary Corpus more efficiently than previous T-code-based schemes and UNIX compress, a variant of LZ78 (LZW). Next, we introduce a randomness test based on the T-complexity. The Lempel-Ziv (LZ) randomness test based on the LZ-complexity was recently excluded officially from the NIST test suite because, for random sequences of length 10^6 (the most commonly used length), the distribution of its P-values is strictly discrete. Our test solves this problem because the T-complexity yields an almost ideal continuous uniform distribution of P-values for random sequences of length 10^6. 
The proposed test outperforms the NIST LZ test, a modified LZ test proposed by Doganaksoy and Göloglu, and all other tests included in the NIST test suite in detecting undesirable pseudorandom sequences generated by a multiplicative congruential generator (MCG) and non-random byte sequences Y = Y_0, Y_1, Y_2, ..., where Y_{3i} and Y_{3i+1} are random but Y_{3i+2} is given by (Y_{3i} + Y_{3i+1}) mod 2^8.
Report number: 甲26141 ; Date of degree conferral: 2010-03-24 ; Degree category: doctorate by coursework ; Degree type: Doctor (Science) ; Diploma number: 博創域第558号 ; Graduate school and department: Graduate School of Frontier Sciences, Division of Transdisciplinary Sciences, Department of Complexity Science and Engineering
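The non-random byte sequence used as a test case above can be generated as follows. This is a sketch under assumptions: `os.urandom` stands in for the unspecified random source, and `correlated_bytes` is a hypothetical name. It makes the defect concrete: one byte in three is fully determined by the two bytes before it.

```python
import os
from typing import Callable


def correlated_bytes(n_triples: int, rng_bytes: Callable[[int], bytes] = os.urandom) -> bytes:
    """Y_{3i}, Y_{3i+1} random; Y_{3i+2} = (Y_{3i} + Y_{3i+1}) mod 2^8."""
    out = bytearray()
    for _ in range(n_triples):
        a, b = rng_bytes(2)  # iterating over bytes yields two ints in 0..255
        out += bytes((a, b, (a + b) % 256))
    return bytes(out)


y = correlated_bytes(4)
# Every third byte is determined by the two bytes that precede it.
assert all(y[3 * i + 2] == (y[3 * i] + y[3 * i + 1]) % 256 for i in range(4))
```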
Algorithmic Complexity for Short Binary Strings Applied to Psychology: A Primer
Since human randomness production has been studied and widely used to assess
executive functions (especially inhibition), many measures have been suggested
to assess the degree to which a sequence is random-like. However, each of them
focuses on one feature of randomness, so authors must use multiple
measures. Here we describe and advocate the use of the accepted universal
measure of randomness based on algorithmic complexity, by means of a novel,
previously presented technique that uses the definition of algorithmic
probability. A re-analysis of the classical Radio Zenith data in light of
the proposed measure and methodology is provided as a case study of an
application.
Comment: To appear in Behavior Research Methods
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
The relationship between the Bayesian approach and the minimum description
length approach is established. We sharpen and clarify the general modeling
principles MDL and MML, abstracted as the ideal MDL principle and defined from
Bayes's rule by means of Kolmogorov complexity. The basic condition under which
the ideal principle should be applied is encapsulated as the Fundamental
Inequality, which in broad terms states that the principle is valid when the
data are random relative to every contemplated hypothesis, and these
hypotheses are in turn random relative to the (universal) prior. Basically, the ideal
principle states that the prior probability associated with the hypothesis
should be given by the algorithmic universal probability, and that the sum of the
negative log universal probability of the model and the negative log of the
probability of the data given the model should be minimized. If we restrict the
model class to finite sets, then applying the ideal principle yields Kolmogorov's
minimal sufficient statistic. In general we show that data compression is
almost always the best strategy, both in hypothesis identification and
prediction.
Comment: 35 pages, LaTeX. Submitted to IEEE Trans. Inform. Theory
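The ideal MDL principle chooses the hypothesis H that minimizes -log2 P(H) - log2 P(D | H). The universal prior it calls for is not computable, so the sketch below swaps in an explicit, hypothetical prior over a few Bernoulli(p) hypotheses for a binary string. It shows only the two-part code-length trade-off, not the paper's construction; `HYPOTHETICAL_PRIOR`, `two_part_code_length`, and `mdl_select` are illustrative names.

```python
import math

# Hypothetical prior over Bernoulli(p) hypotheses (stands in for the universal prior).
HYPOTHETICAL_PRIOR = {0.5: 0.5, 0.25: 0.125, 0.75: 0.125, 0.1: 0.125, 0.9: 0.125}


def two_part_code_length(data: str, p: float, prior: float) -> float:
    """Bits for -log2 P(H) - log2 P(D | H), with H = Bernoulli(p) on a 0/1 string."""
    ones = data.count("1")
    zeros = len(data) - ones
    model_bits = -math.log2(prior)  # cost of describing the hypothesis
    data_bits = -(ones * math.log2(p) + zeros * math.log2(1 - p))  # cost of data given H
    return model_bits + data_bits


def mdl_select(data: str, prior: dict[float, float]) -> float:
    """Return the p whose total two-part code length is smallest."""
    return min(prior, key=lambda p: two_part_code_length(data, p, prior[p]))


print(mdl_select("1110111011111101", HYPOTHETICAL_PRIOR))  # 0.75
print(mdl_select("01" * 8, HYPOTHETICAL_PRIOR))            # 0.5
```

For the first string (13 ones, 3 zeros), p = 0.75 costs about 14.4 bits in total against 17 bits for the a priori more probable p = 0.5, so the extra model bits pay for themselves in shorter data encoding.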
Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method
Kolmogorov-Chaitin complexity has long been believed to be impossible to
approximate when it comes to short sequences (e.g. of length 5-50). However,
with the newly developed coding theorem method, the complexity of strings
of length 2-11 can now be numerically estimated. We present the theoretical
basis of algorithmic complexity for short strings (ACSS) and describe an
R-package providing functions based on ACSS that will cover psychologists'
needs and improve upon previous methods in three ways: (1) ACSS is now
available not only for binary strings, but for strings based on up to 9
different symbols, (2) ACSS no longer requires time-consuming computing, and
(3) a new approach based on ACSS gives access to an estimation of the
complexity of strings of any length. Finally, three illustrative examples show
how these tools can be applied to psychology.
Comment: to appear in "Behavior Research Methods", 14 pages in journal
format, R package at http://cran.r-project.org/web/packages/acss/index.htm
Postprocessing for quantum random number generators: entropy evaluation and randomness extraction
Quantum random-number generators (QRNGs) can offer a means to generate
information-theoretically provable random numbers, in principle. In practice,
unfortunately, the quantum randomness is inevitably mixed with classical
randomness due to classical noise. To distill this quantum randomness, one
needs to quantify the randomness of the source and apply a randomness
extractor. Here, we propose a generic framework for evaluating quantum
randomness of real-life QRNGs by min-entropy, and apply it to two different
existing quantum random-number systems in the literature. Moreover, we provide
a guideline for QRNG data postprocessing, for which we implement two
information-theoretically provable randomness extractors: the Toeplitz-hashing
extractor and Trevisan's extractor.
Comment: 13 pages, 2 figures
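The two postprocessing steps named above, quantifying min-entropy and applying a Toeplitz-hashing extractor, can be sketched in pure Python. This is an illustration under stated assumptions, not the paper's framework: the output length uses a leftover-hash-lemma style bound m = n*h - 2*log2(1/eps) with h the assumed min-entropy per raw bit, the function names are hypothetical, and Trevisan's extractor is not shown.

```python
import math


def min_entropy(probs: list[float]) -> float:
    """H_min = -log2(max_x P(x)) of a source distribution, in bits."""
    return -math.log2(max(probs))


def output_length(n: int, h_min_per_bit: float, eps: float) -> int:
    """Assumed leftover-hash-lemma style bound: m = n*h - 2*log2(1/eps)."""
    return max(0, math.floor(n * h_min_per_bit - 2 * math.log2(1 / eps)))


def toeplitz_extract(raw: list[int], seed: list[int], m: int) -> list[int]:
    """Compute T @ raw over GF(2) for the m x n Toeplitz matrix defined by seed."""
    n = len(raw)
    if len(seed) != n + m - 1:
        raise ValueError("seed must have n + m - 1 bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            # T[i][j] depends only on i - j, which is what makes T Toeplitz.
            acc ^= seed[i - j + n - 1] & raw[j]
        out.append(acc)
    return out


print(min_entropy([0.5, 0.5]))                         # 1.0
print(output_length(1000, 0.5, 2 ** -20))              # 460
print(toeplitz_extract([1, 0, 1, 1], [0, 1, 1, 0, 1], 2))  # [1, 1]
```

Because the matrix is fixed by only n + m - 1 seed bits rather than n*m, Toeplitz hashing is cheap in seed and fast to evaluate, which is part of why it is a common practical choice.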
Kolmogorov Complexity in perspective. Part I: Information Theory and Randomness
We survey diverse approaches to the notion of information: from Shannon
entropy to Kolmogorov complexity. Two of the main applications of Kolmogorov
complexity are presented: randomness and classification. The survey is divided
into two parts in the same volume. Part I is dedicated to information theory and
the mathematical formalization of randomness based on Kolmogorov complexity.
This last application goes back to the 1960s and 1970s with the work of
Martin-Löf, Schnorr, Chaitin, and Levin, and has gained new impetus in recent
years.
Comment: 40 pages
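The contrast between Shannon entropy and Kolmogorov complexity surveyed above can be made concrete with a small sketch. Order-0 Shannon entropy sees only symbol frequencies, while the length of a compressed encoding is a computable upper bound on Kolmogorov complexity (up to an additive constant for the decompressor). Using `zlib` as the compressor is an assumption made for illustration; the function names are hypothetical.

```python
import math
import zlib
from collections import Counter


def empirical_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of the byte frequencies, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def compressed_bits(data: bytes) -> int:
    """Length of a zlib encoding in bits: an upper-bound proxy for K(data)."""
    return 8 * len(zlib.compress(data, 9))


periodic = b"ab" * 500
print(empirical_entropy(periodic))  # 1.0
print(compressed_bits(periodic) < empirical_entropy(periodic) * len(periodic))  # True
```

The periodic string looks maximally uncertain to a frequency count (1 bit per byte, 1000 bits in total), yet it has a very short description, which the compressor partly captures; this gap is what motivates a complexity-based notion of randomness.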