Search CORE

9 research outputs found

On asymptotically optimal tests for random number generators

Author: Ryabko Boris
Publication venue
Publication date: 10/12/2019
Field of study

The problem of constructing effective statistical tests for random number generators (RNG) is considered. Currently, statistical tests for RNGs are a mandatory part of cryptographic information protection systems, but their effectiveness is mainly estimated based on experiments with various RNGs. We find an asymptotic estimate for the p-value of an optimal test in the case where the alternative hypothesis is a known stationary ergodic source, and then describe a family of tests each of which has the same asymptotic estimate of the p-value for any (unknown) stationary ergodic source

arXiv.org e-Print Archive

Cryptology ePrint Archive

Fast and Efficient Entropy Coding Architectures for Massive Data Compression

Author: Aulí Llinàs Francesc
Publication venue
Publication date: 01/01/2023
Field of study

The compression of data is fundamental to alleviating the costs of transmitting and storing massive datasets employed in myriad fields of our society. Most compression systems employ an entropy coder in their coding pipeline to remove the redundancy of coded symbols. The entropy-coding stage needs to be efficient, to yield high compression ratios, and fast, to process large amounts of data rapidly. Despite their widespread use, entropy coders are commonly assessed for some particular scenario or coding system. This work provides a general framework to assess and optimize different entropy coders. First, the paper describes three main families of entropy coders, namely those based on variable-to-variable length codes (V2VLC), arithmetic coding (AC), and tabled asymmetric numeral systems (tANS). Then, a low-complexity architecture for the most representative coder(s) of each family is presented-more precisely, a general version of V2VLC, the MQ, M, and a fixed-length version of AC and two different implementations of tANS. These coders are evaluated under different coding conditions in terms of compression efficiency and computational throughput. The results obtained suggest that V2VLC and tANS achieve the highest compression ratios for most coding rates and that the AC coder that uses fixed-length codewords attains the highest throughput. The experimental evaluation discloses the advantages and shortcomings of each entropy-coding scheme, providing insights that may help to select this stage in forthcoming compression systems

Diposit Digital de Documents de la UAB

Average Redundancy for Known Sources: Ubiquitous Trees in Source Coding

Author: Wojciech Szpankowski
Publication venue: Discrete Mathematics & Theoretical Computer Science
Publication date: 01/01/2008
Field of study

Analytic information theory aims at studying problems of information theory using analytic techniques of computer science and combinatorics. Following Hadamard's precept, these problems are tackled by complex analysis methods such as generating functions, Mellin transform, Fourier series, saddle point method, analytic poissonization and depoissonization, and singularity analysis. This approach lies at the crossroad of computer science and information theory. In this survey we concentrate on one facet of information theory (i.e., source coding better known as data compression), namely the

\textit{redundancy rate}

problem. The redundancy rate problem determines by how much the actual code length exceeds the optimal code length. We further restrict our interest to the

\textit{average}

redundancy for

\textit{known}

sources, that is, when statistics of information sources are known. We present precise analyses of three types of lossless data compression schemes, namely fixed-to-variable (FV) length codes, variable-to-fixed (VF) length codes, and variable-to-variable (VV) length codes. In particular, we investigate average redundancy of Huffman, Tunstall, and Khodak codes. These codes have succinct representations as

\textit{trees}

, either as coding or parsing trees, and we analyze here some of their parameters (e.g., the average path from the root to a leaf)

Directory of Open Access Journals

Universal Source Coding in the Non-Asymptotic Regime

Author
Publication venue
Publication date: 01/01/2018
Field of study

abstract: Fundamental limits of fixed-to-variable (F-V) and variable-to-fixed (V-F) length universal source coding at short blocklengths is characterized. For F-V length coding, the Type Size (TS) code has previously been shown to be optimal up to the third-order rate for universal compression of all memoryless sources over finite alphabets. The TS code assigns sequences ordered based on their type class sizes to binary strings ordered lexicographically. Universal F-V coding problem for the class of first-order stationary, irreducible and aperiodic Markov sources is first considered. Third-order coding rate of the TS code for the Markov class is derived. A converse on the third-order coding rate for the general class of F-V codes is presented which shows the optimality of the TS code for such Markov sources. This type class approach is then generalized for compression of the parametric sources. A natural scheme is to define two sequences to be in the same type class if and only if they are equiprobable under any model in the parametric class. This natural approach, however, is shown to be suboptimal. A variation of the Type Size code is introduced, where type classes are defined based on neighborhoods of minimal sufficient statistics. Asymptotics of the overflow rate of this variation is derived and a converse result establishes its optimality up to the third-order term. These results are derived for parametric families of i.i.d. sources as well as Markov sources. Finally, universal V-F length coding of the class of parametric sources is considered in the short blocklengths regime. The proposed dictionary which is used to parse the source output stream, consists of sequences in the boundaries of transition from low to high quantized type complexity, hence the name Type Complexity (TC) code. For large enough dictionary, the

\epsilon

-coding rate of the TC code is derived and a converse result is derived showing its optimality up to the third-order term.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository

Space Efficient Encodings for Bit-strings, Range queries and Related Problems

Author: 조승범
Publication venue: 서울대학교 대학원
Publication date: 01/02/2016
Field of study

학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. Srinivasa Rao Satti.In this thesis, we design and implement various space efficient data structures. Most of these structures use spaces close to the information-theoretic lower bound while supporting the queries efficiently. In particular, this thesis is concerned with the data structures for four problems: (i) supporting \rank{} and \select{} queries on compressed bit strings, (ii) nearest larger neighbor problem, (iii) simultaneous encodings for range and next/previous larger/smaller value queries, and (iv) range \topk{} queries on two-dimensional arrays. We first consider practical implementations of \emph{compressed} bitvectors, which support \rank{} and \select{} operations on a given bit-string, while storing the bit-string in compressed form~\cite{DBLP:conf/dcc/JoJORS14}. Our approach relies on \emph{variable-to-fixed} encodings of the bit-string, an approach that has not yet been considered systematically for practical encodings of bitvectors. We show that this approach leads to fast practical implementations with low \emph{redundancy} (i.e., the space used by the bitvector in addition to the compressed representation of the bit-string), and is a flexible and promising solution to the problem of supporting \rank{} and \select{} on moderately compressible bit-strings, such as those encountered in real-world applications. Next, we propose space-efficient data structures for the nearest larger neighbor problem~\cite{IWOCA2014,walcom-JoRS15}. Given a sequence of

n

elements from a total order, and a position in the sequence, the nearest larger neighbor (\NLV{}) query returns the position of the element which is closest to the query position, and is larger than the element at the query position. The problem of finding all nearest larger neighbors has attracted interest due to its applications for parenthesis matching and in computational geometry~\cite{AsanoBK09,AsanoK13,BerkmanSV93}. We consider a data structure version of this problem, which is to preprocess a given sequence of elements to construct a data structure that can answer \NLN{} queries efficiently. For one-dimensional arrays, we give time-space tradeoffs for the problem on \textit{indexing model}. For two-dimensional arrays, we give an optimal encoding with constant query on \textit{encoding model}. We also propose space-efficient encodings which support various range queries, and previous and next smaller/larger value queries~\cite{cocoonJS15}. Given a sequence of

n

elements from a total order, we obtain a

4.088n + o(n)

-bit encoding that supports all these queries where

n

is the length of input array. For the case when we need to support all these queries in constant time, we give an encoding that takes

4.585n + o(n)

bits. This improves the

5.08n+o(n)

-bit encoding obtained by encoding the colored

2d

-Min and

2d

-Max heaps proposed by Fischer~\cite{Fischer11}. We extend the original DFUDS~\cite{BDMRRR05} encoding of the colored

2d

-Min and

2d

-Max heap that supports the queries in constant time. Then, we combine the extended DFUDS of

2d

-Min heap and

2d

-Max heap using the Min-Max encoding of Gawrychowski and Nicholson~\cite{Gawry14} with some modifications. We also obtain encodings that take lesser space and support a subset of these queries. Finally, we consider the various encodings that support range \topk{} queries on a two-dimensional array containing elements from a total order. For an

m \times n

array, we first propose an optimal encoding for answering one-sided \topk{} queries, whose query range is restricted to

[1 \dots m][1 \dots a]

, for

1 \le a \le n

. Next, we propose an encoding for the general \topk{} queries that takes

m^2\lg{{(k+1)n \choose n}} + m\lg{m}+o(n)

bits. This generalizes the \topk{} encoding of Gawrychowski and Nicholson~\cite{Gawry14}.Chapter 1 Introduction 1 1.1 Computational model 2 1.1.1 Encoding and indexing models 2 1.2 Contribution of the thesis 3 1.3 Organization of the thesis 5 Chapter 2 Preliminaries 7 Chapter 3 Compressed bit vectors based on variable-to-fixed encodings 10 3.1 Introduction 10 3.2 Bit-vectors using V2F coding 14 3.3 V2F compression algorithms for bit-strings 16 3.3.1 Tunstall code 16 3.3.2 Enumerative codes 19 3.3.3 LZW algorithm 23 3.3.4 Empirical evaluation of the compressors 23 3.4 Practical implementation of bitvectors based on V2F compression. 26 3.4.1 Testing Methodology 29 3.4.2 Results of Empirical Evaluation 33 3.5 Future works 35 Chapter 4 Space Efficient Data Structures for Nearest Larger Neighbor 39 4.1 Introduction 39 4.2 Indexing NLV queries on 1D arrays 43 4.3 Encoding NLN queries on2D binary arrays 44 4.4 Encoding NLN queries for general 2D arrays 50 4.4.1 2D NLN in the encoding model–distinct case 50 4.4.2 2D NLN in the encoding model–general case 53 4.5 Open problems 63 Chapter 5 Simultaneous encodings for range and next/previous larger/smaller value queries 64 5.1 Introduction 64 5.2 Preliminaries 67 5.2.1 2d-Min heap 69 5.2.2 Encoding range min-max queries 72 5.3 Extended DFUDS for colored 2d-Min heap 75 5.4 Encoding colored 2d-Min and 2d-Max heaps 80 5.4.1 Combined data structure for DCMin(A) and DCMax(A) 82 5.4.2 Encoding colored 2d-Min and 2d-Max heaps using less space 88 5.5 Open problems 89 Chapter 6 Encoding Two-dimensional range Top-k queries 90 6.1 Introduction 90 6.2 Encoding one-sided range Top-k queries on 2D array 92 6.3 Encoding general range Top-k queries on 2D array 95 6.4 Open problems 99 Chapter 7 Conculsion 100 Bibliography 103 요약 112Docto

SNU Open Repository and Archive