Time-universal data compression and prediction
Suppose there is a large file which should be transmitted (or stored) and
there are several (say, m) admissible data-compressors. It seems natural to try
all the compressors and then choose the best, i.e. the one that gives the
shortest compressed file. Then transfer (or store) the index number of the best
compressor (it requires log m bits) and the compressed file. The only problem
is the time, which increases substantially due to the need to compress the file m
times (in order to find the best compressor). We propose a method that encodes
the file with the optimal compressor, but uses a relatively small additional
time: the ratio of this extra time and the total time of calculation can be
limited by an arbitrary positive constant.
Generally speaking, in many situations it may be necessary to find the best data
compressor out of a given set, which is often done by comparing them
empirically. One of the goals of this work is to turn such a selection process
into a part of the data compression method itself, automating and optimizing it.
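As a rough illustration, here is a minimal sketch of the baseline scheme the abstract describes: compress with every method, keep the shortest output, and prepend the index of the winning compressor. The set of compressors (stdlib ones) and all names below are illustrative assumptions, not the paper's; the paper's contribution is precisely to avoid the m-fold compression cost this baseline incurs.

```python
import bz2
import lzma
import zlib

# Illustrative set of m admissible compressors (not the paper's choice).
COMPRESSORS = [zlib.compress, bz2.compress, lzma.compress]
DECOMPRESSORS = [zlib.decompress, bz2.decompress, lzma.decompress]

def encode_with_best(data: bytes) -> bytes:
    """Baseline: compress with every method, keep the shortest output,
    and prepend the index of the best compressor (log m bits in theory,
    rounded up to one byte here for simplicity)."""
    outputs = [c(data) for c in COMPRESSORS]
    best = min(range(len(outputs)), key=lambda i: len(outputs[i]))
    return bytes([best]) + outputs[best]

def decode(blob: bytes) -> bytes:
    """Read the compressor index, then decompress the rest."""
    return DECOMPRESSORS[blob[0]](blob[1:])

if __name__ == "__main__":
    msg = b"abracadabra " * 1000
    enc = encode_with_best(msg)
    assert decode(enc) == msg
    print(f"{len(msg)} -> {len(enc)} bytes, compressor #{enc[0]}")
```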
Using Information Theory to Study the Efficiency and Capacity of Caching in the Computer Networks
Nowadays computer networks use different kinds of memory whose speeds and
capacities vary widely. So-called caching methods are intended to use these
different kinds of memory in such a way that frequently used data are stored in
the faster memory, whereas infrequently used data are stored in the slower
memory. We address the problems of estimating the caching efficiency and its
capacity. We define the efficiency and capacity of caching and suggest a method
for their estimation based on the analysis of the kinds of accessible memory.
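The abstract does not state the formal definitions, but the basic trade-off behind caching can be illustrated by the standard effective-access-time calculation for a two-level memory. This is a generic textbook quantity, not the paper's measure; the numbers below are made up.

```python
def effective_access_time(hit_rate: float, t_fast: float, t_slow: float) -> float:
    """Average time per access for a two-level memory:
    hits are served by the fast memory, misses by the slow one."""
    return hit_rate * t_fast + (1.0 - hit_rate) * t_slow

def caching_speedup(hit_rate: float, t_fast: float, t_slow: float) -> float:
    """Speedup relative to using the slow memory alone."""
    return t_slow / effective_access_time(hit_rate, t_fast, t_slow)

# Example: 90% of requests hit a cache that is 10x faster than main memory.
print(caching_speedup(0.9, 1.0, 10.0))   # ~5.26x
```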
Fast Enumeration of Combinatorial Objects
The problem of ranking can be described as follows. We have a set of
combinatorial objects, such as, say, the k-subsets of n things, and we can
imagine that they have been arranged in some list, say lexicographically, and
we want to have a fast method for obtaining the rank of a given object in the
list. This problem is widely known in Combinatorial Analysis, Computer Science
and Information Theory. Ranking is closely connected with the hashing problem,
especially with perfect hashing and with the generation of random combinatorial
objects. In Information Theory the ranking problem is closely connected with
so-called enumerative encoding, which may be described as follows: there is a
set of words $S$, and an enumerative code has to one-to-one encode every word
$u \in S$ by a binary word $code(u)$. The length of $code(u)$ must be the same
for all $u \in S$. Clearly, $|code(u)| \ge \log |S|$. (Here and below
$\log x \equiv \log_2 x$.) The suggested method allows an exponential growth of
the speed of encoding and decoding for all combinatorial enumeration problems
considered, including the enumeration of permutations, compositions and others.
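For concreteness, here is the textbook ranking/unranking of k-subsets via the combinatorial number system (in colexicographic order). This is the straightforward method, shown only to make the problem statement concrete; it does not reproduce the paper's fast enumeration scheme.

```python
from math import comb

def rank_subset(subset) -> int:
    """Rank of a k-subset of {0,...,n-1} in colexicographic order,
    via the combinatorial number system: rank = sum_i C(c_i, i)."""
    return sum(comb(c, i) for i, c in enumerate(sorted(subset), start=1))

def unrank_subset(rank: int, k: int) -> list:
    """Inverse map: recover the k-subset with the given colex rank."""
    subset = []
    for i in range(k, 0, -1):
        c = i - 1
        while comb(c + 1, i) <= rank:   # largest c with C(c, i) <= rank
            c += 1
        subset.append(c)
        rank -= comb(c, i)
    return sorted(subset)

# Round trip: {2, 4, 7} has colex rank C(2,1) + C(4,2) + C(7,3) = 43.
assert rank_subset({2, 4, 7}) == 43
assert unrank_subset(43, 3) == [2, 4, 7]
```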
The Imaginary Sliding Window As a New Data Structure for Adaptive Algorithms
The scheme of the sliding window is known in Information Theory, Computer
Science, the theory of prediction, and statistics. Let a source with
unknown statistics generate some word $x_1 x_2 \ldots x_t \ldots$ in some
alphabet $A$. For every moment $t$, one stores the word ("window")
$x_{t-w+1} x_{t-w+2} \ldots x_t$, where $w$, $w < t$, is called the
"window length". In the theory of universal coding, the code of $x_{t+1}$
depends on the source statistics estimated from the window; in the problem of
prediction, each letter $x_{t+1}$ is predicted using information from the window,
etc. After that, the letter $x_{t+1}$ is included in the window on the right,
while $x_{t-w+1}$ is removed from it. This is the sliding window scheme.
This scheme has two merits: it allows one i) to estimate the source statistics
quite precisely and ii) to adapt the code in case of a change in the source
statistics. However, this scheme has a defect, namely, the necessity to store
the window (i.e., the word $x_{t-w+1} \ldots x_t$), which requires a large
amount of memory for large $w$. A new scheme named "the Imaginary Sliding
Window" (ISW) is constructed. The gist of this scheme is that not the oldest
element $x_{t-w+1}$ but rather a randomly chosen one is removed from the window.
This allows one to retain both merits of the sliding window while not storing
the window itself, thus significantly decreasing the required memory.
Comment: Published in Problems of Information Transmission, 1996, v. 32.
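A minimal sketch of the ISW idea as the abstract describes it: only per-letter counts of the window are kept (O(|A| log w) bits instead of O(w log |A|)), and on overflow a letter chosen at random according to the current counts is deleted instead of the oldest one. The class and method names are mine, not the paper's.

```python
import random

class ImaginarySlidingWindow:
    """Stores only per-letter counts of a 'window' of size w. Deleting
    a uniformly random window position means letter a is removed with
    probability counts[a] / w, so no actual window needs to be stored."""

    def __init__(self, w: int, alphabet) -> None:
        self.w = w
        self.counts = {a: 0 for a in alphabet}
        self.size = 0

    def push(self, letter) -> None:
        if self.size == self.w:
            # Remove a random letter, proportionally to its count.
            victim = random.choices(list(self.counts),
                                    weights=list(self.counts.values()))[0]
            self.counts[victim] -= 1
            self.size -= 1
        self.counts[letter] += 1
        self.size += 1

    def freq(self, letter) -> float:
        """Estimated frequency of `letter` in the (imaginary) window."""
        return self.counts[letter] / max(self.size, 1)

isw = ImaginarySlidingWindow(w=100, alphabet="01")
for bit in "0110" * 500:
    isw.push(bit)
print(isw.freq("1"))   # close to 0.5 on this input
```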
Applications of Universal Source Coding to Statistical Analysis of Time Series
We show how universal codes can be used for solving some of the most
important statistical problems for time series. By definition, a universal code
(or a universal lossless data compressor) can compress any sequence generated
by a stationary and ergodic source asymptotically to the Shannon entropy,
which, in turn, is the best achievable compression ratio for lossless data
compressors. We consider finite-alphabet and real-valued time series and the
following problems: estimation of the limiting probabilities for finite-alphabet
time series and estimation of the density for real-valued time series; on-line
prediction, regression, and classification (or problems with side information)
for both types of time series; and the following hypothesis-testing problems:
goodness-of-fit (or identity) testing and testing of serial
independence. It is important to note that all problems are considered in the
framework of classical mathematical statistics and, on the other hand, everyday
methods of data compression (or archivers) can be used as a tool for the
estimation and testing. It turns out that quite often the suggested methods
and tests are more powerful than known ones when applied in practice.
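As an illustration of using everyday archivers for one of these problems, here is a compression-based classifier (a "problem with side information"), with bz2 standing in for the universal code. The corpora and the exact scoring rule are illustrative assumptions, not the paper's construction.

```python
import bz2

def compressed_len(data: bytes) -> int:
    """Length in bytes of the bz2-compressed data."""
    return len(bz2.compress(data))

def classify(sample: bytes, class_corpora: dict) -> str:
    """Assign the sample to the class whose corpus it extends most
    cheaply, i.e. minimize len(C(corpus + sample)) - len(C(corpus)):
    a practical proxy for the conditional code length."""
    def cost(corpus: bytes) -> int:
        return compressed_len(corpus + sample) - compressed_len(corpus)
    return min(class_corpora, key=lambda name: cost(class_corpora[name]))

corpora = {
    "english": b"the quick brown fox jumps over the lazy dog " * 50,
    "digits":  b"3141592653589793238462643383279502884197 " * 50,
}
print(classify(b"she sells sea shells by the sea shore", corpora))
# expected: english
```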
Two-faced processes and random number generators
We describe random processes (with a binary alphabet) whose entropy is less
than 1 (per letter), but which mimic a truly random process, i.e., by
definition, the generated sequence can be interpreted as the result of flips of
a fair coin with sides labeled 0 and 1. This makes it possible to construct
random number generators that possess theoretical guarantees, which, in turn,
is important for applications such as those in cryptography.
Using Information Theory to Study the Efficiency and Capacity of Computers and Similar Devices
We address the problems of estimating the computer efficiency and the
computer capacity. We define the computer efficiency and capacity and suggest a
method for their estimation, based on the analysis of processor instructions
and kinds of accessible memory. It is shown how the suggested method can be
applied to estimate the computer capacity. In particular, this consideration
gives a new look at the organization of computer memory. The obtained
results can be of interest for practical applications.
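The abstract does not define the computer capacity, but its framing suggests an analogy with Shannon's capacity of a discrete noiseless channel. Under that assumption (mine, to be checked against the paper), the capacity is the base-2 logarithm of the largest root of $\sum_i X^{-t_i} = 1$, where $t_i$ are instruction execution times; the sketch below computes it by bisection on a toy instruction set.

```python
from math import log2

def computer_capacity(instruction_times) -> float:
    """Capacity in bits per cycle under the (assumed) noiseless-channel
    analogy: C = log2(X0), where X0 > 1 is the unique root of
        sum_i X**(-t_i) = 1
    and t_i is the execution time (in cycles) of instruction i."""
    def f(x: float) -> float:
        return sum(x ** -t for t in instruction_times) - 1.0
    lo, hi = 1.0 + 1e-12, len(instruction_times) + 1.0
    for _ in range(100):              # bisection; f is decreasing for x > 1
        mid = (lo + hi) / 2.0
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return log2((lo + hi) / 2.0)

# Toy instruction set: three 1-cycle instructions and one 2-cycle one.
# The result lies between log2(3) and log2(4), as expected.
print(computer_capacity([1, 1, 1, 2]))   # ~1.72 bits per cycle
```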
Compression-based methods for nonparametric density estimation, on-line prediction, regression and classification for time series
We address the problem of nonparametric estimation of characteristics for
stationary and ergodic time series. We consider finite-alphabet time series and
real-valued ones and the following four problems: i) estimation of the
(limiting) probability (or estimation of the density for real-valued time
series), ii) on-line prediction, iii) regression and iv) classification (or
so-called problems with side information). We show that so-called archivers (or
data compressors) can be used as a tool for solving these problems. In
particular, firstly, it is proven that any so-called universal code (or
universal data compressor) can be used as a basis for constructing
asymptotically optimal methods for the above problems. (By definition, a
universal code can "compress" any sequence generated by a stationary and
ergodic source asymptotically to the Shannon entropy of the source.) And,
secondly, we show experimentally that estimates, which are based on practically
used methods of data compression, have a reasonable precision
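A sketch of problem i): a universal code $\varphi$ induces the measure $P(u) \propto 2^{-|\varphi(u)|}$, so next-symbol probabilities can be estimated from the code lengths of each possible continuation. Here zlib stands in for the universal code; all names are illustrative, and the byte granularity of a real archiver makes this a coarse approximation of the theory.

```python
import zlib

def next_symbol_probs(history: bytes, alphabet: bytes) -> dict:
    """Estimate P(a | history) as proportional to 2^(-|C(history + a)|),
    where |C(u)| is the compressed length of u in bits."""
    lens = {a: 8 * len(zlib.compress(history + bytes([a]), 9))
            for a in alphabet}
    base = min(lens.values())                 # shift to avoid underflow
    weights = {a: 2.0 ** (base - l) for a, l in lens.items()}
    total = sum(weights.values())
    return {chr(a): w / total for a, w in weights.items()}

history = b"011011011011011011011011011011"
print(next_symbol_probs(history, b"01"))
# The estimate should lean toward continuing the 011 pattern,
# within the coarse resolution of a byte-oriented archiver.
```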
Application of the Computer Capacity to the Analysis of Processors Evolution
The notion of computer capacity was proposed in 2012, and this quantity has
been estimated for computers of different kinds.
In this paper we show that, when designing new processors, the manufacturers
change the parameters that affect the computer capacity. This allows us to
predict the values of parameters of future processors. As the main example we
use Intel processors, due to the availability of detailed descriptions of all
their technical characteristics.
Prediction of Large Alphabet Processes and Its Application to Adaptive Source Coding
The problem of predicting a sequence $x_1 x_2 \ldots$ generated by a discrete
source with unknown statistics is considered. Each letter $x_{t+1}$ is
predicted using information on the word $x_1 x_2 \ldots x_t$ only. In fact,
this is a classical problem which has received much attention; its history can
be traced back to Laplace. We address the case where each $x_i$ belongs to some
large (or even infinite) alphabet. A method is presented whose precision is
greater than that of known algorithms, where precision is estimated by the
Kullback-Leibler divergence. The results can readily be translated into
results about adaptive coding.
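Since the abstract traces the problem back to Laplace, here is the classical Laplace predictor together with the Kullback-Leibler divergence used as the precision measure. This is the baseline the paper improves on in the large-alphabet regime, not the paper's own method; the example data are made up.

```python
from collections import Counter
from math import log2

def laplace_predictor(history: str, alphabet: str) -> dict:
    """Laplace's rule of succession: P(x_{t+1} = a | x_1...x_t)
    = (count(a) + 1) / (t + |A|). For large alphabets most counts
    are zero, which is precisely the regime the paper targets."""
    counts = Counter(history)
    t, m = len(history), len(alphabet)
    return {a: (counts[a] + 1) / (t + m) for a in alphabet}

def kl_divergence(p: dict, q: dict) -> float:
    """D(p || q) in bits, the precision measure named in the abstract."""
    return sum(p[a] * log2(p[a] / q[a]) for a in p if p[a] > 0)

true_p = {"a": 0.7, "b": 0.2, "c": 0.1, "d": 0.0}
estimate = laplace_predictor("aababaaacaa", "abcd")
print(kl_divergence(true_p, estimate))
```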