
    Optimal prefix codes for pairs of geometrically-distributed random variables

    Optimal prefix codes are studied for pairs of independent, integer-valued symbols emitted by a source with a geometric probability distribution of parameter $q$, $0<q<1$. By encoding pairs of symbols, it is possible to reduce the redundancy penalty of symbol-by-symbol encoding, while preserving the simplicity of the encoding and decoding procedures typical of Golomb codes and their variants. It is shown that optimal codes for these so-called two-dimensional geometric distributions are \emph{singular}, in the sense that a prefix code that is optimal for one value of the parameter $q$ cannot be optimal for any other value of $q$. This is in sharp contrast to the one-dimensional case, where codes are optimal for positive-length intervals of the parameter $q$. Thus, in the two-dimensional case, it is infeasible to give a compact characterization of optimal codes for all values of the parameter $q$, as was done in the one-dimensional case. Instead, optimal codes are characterized for a discrete sequence of values of $q$ that provide good coverage of the unit interval. Specifically, optimal prefix codes are described for $q=2^{-1/k}$ ($k\ge 1$), covering the range $q\ge 1/2$, and $q=2^{-k}$ ($k>1$), covering the range $q<1/2$. The described codes produce the expected reduction in redundancy with respect to the one-dimensional case, while maintaining low complexity coding operations. Comment: To appear in IEEE Transactions on Information Theory.
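
As a point of reference for the one-dimensional case mentioned in the abstract, the following is a minimal sketch of a classical Golomb code, not of the paper's two-dimensional codes. It assumes the source emits $n \ge 0$ with probability $(1-q)q^n$ and picks the Golomb parameter as the smallest $m$ with $q^m + q^{m+1} \le 1$ (the Gallager-Van Voorhis condition). The function names `golomb_parameter` and `golomb_encode` are illustrative only.

```python
def golomb_parameter(q: float) -> int:
    """Smallest m with q**m + q**(m+1) <= 1 (one-dimensional optimality condition)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    m = 1
    while q ** m + q ** (m + 1) > 1.0:
        m += 1
    return m


def golomb_encode(n: int, m: int) -> str:
    """Encode n >= 0 as a unary quotient followed by a truncated-binary remainder."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    quotient, remainder = divmod(n, m)
    bits = "1" * quotient + "0"          # unary part, terminated by a 0
    if m == 1:
        return bits                      # no remainder bits are needed
    b = (m - 1).bit_length()             # b = ceil(log2(m)) for m >= 2
    cutoff = (1 << b) - m                # remainders below cutoff use b - 1 bits
    if remainder < cutoff:
        return bits + format(remainder, f"0{b - 1}b")
    return bits + format(remainder + cutoff, f"0{b}b")


if __name__ == "__main__":
    for k in (1, 2, 3):
        q = 2 ** (-1 / k)
        m = golomb_parameter(q)
        print(k, m, [golomb_encode(n, m) for n in range(5)])
```

Under these assumptions, $q = 2^{-1/k}$ selects $m = k$, which is why that family of parameter values is a natural grid for $q \ge 1/2$ in the one-dimensional setting.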

    Tight Bounds on the Rényi Entropy via Majorization with Applications to Guessing and Compression

    This paper provides tight bounds on the Rényi entropy of a function of a discrete random variable with a finite number of possible values, where the considered function is not one-to-one. To that end, a tight lower bound on the Rényi entropy of a discrete random variable with a finite support is derived as a function of the size of the support, and the ratio of the maximal to minimal probability masses. This work was inspired by the recently published paper by Cicalese et al., which is focused on the Shannon entropy, and it strengthens and generalizes the results of that paper to Rényi entropies of arbitrary positive orders. In view of these generalized bounds and the works by Arikan and Campbell, non-asymptotic bounds are derived for guessing moments and lossless data compression of discrete memoryless sources. Comment: The paper was published in the Entropy journal (special issue on Probabilistic Methods in Information Theory, Hypothesis Testing, and Coding), vol. 20, no. 12, paper no. 896, November 22, 2018. Online available at https://www.mdpi.com/1099-4300/20/12/89
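
To make the quantity in this abstract concrete, here is a small sketch that evaluates the Rényi entropy of order $\alpha > 0$ using the standard definition $H_\alpha(P) = \frac{1}{1-\alpha}\log_2 \sum_x P(x)^\alpha$, with the Shannon entropy as the $\alpha \to 1$ limit. It also applies a non-injective map to a distribution. It does not implement the paper's bounds; the helper names `renyi_entropy` and `pushforward` and the example distribution are assumptions made for illustration.

```python
import math
from collections import defaultdict


def renyi_entropy(probs, alpha: float) -> float:
    """Rényi entropy of order alpha > 0, in bits (Shannon entropy when alpha == 1)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    p = [x for x in probs if x > 0]
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    if math.isclose(alpha, 1.0):
        return -sum(x * math.log2(x) for x in p)
    return math.log2(sum(x ** alpha for x in p)) / (1.0 - alpha)


def pushforward(probs, f):
    """Distribution of f(X) when X takes value i with probability probs[i]."""
    merged = defaultdict(float)
    for i, mass in enumerate(probs):
        merged[f(i)] += mass
    return list(merged.values())


if __name__ == "__main__":
    p = [0.4, 0.3, 0.2, 0.1]
    q = pushforward(p, lambda i: i // 2)   # a map that is not one-to-one
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, renyi_entropy(p, alpha), renyi_entropy(q, alpha))
```

Merging probability masses through a deterministic map cannot increase the Rényi entropy, so the second printed value in each row is no larger than the first. The abstract concerns how tightly that gap can be bounded.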

    Lower Bounds for Oblivious Near-Neighbor Search

    We prove an $\Omega(d \lg n/ (\lg\lg n)^2)$ lower bound on the dynamic cell-probe complexity of statistically $\mathit{oblivious}$ approximate-near-neighbor search ($\mathsf{ANN}$) over the $d$-dimensional Hamming cube. For the natural setting of $d = \Theta(\log n)$, our result implies an $\tilde{\Omega}(\lg^2 n)$ lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for $\mathsf{ANN}$. This is the first super-logarithmic $\mathit{unconditional}$ lower bound for $\mathsf{ANN}$ against general (non black-box) data structures. We also show that any oblivious $\mathit{static}$ data structure for decomposable search problems (like $\mathsf{ANN}$) can be obliviously dynamized with $O(\log n)$ overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980). Comment: 28 pages.

    Degrees and distances in random and evolving Apollonian networks

    This paper studies Random and Evolving Apollonian networks (RANs and EANs) in d dimensions for any d>=2, i.e. dynamically evolving random d-dimensional simplices viewed as graphs inside an initial d-dimensional simplex. We determine the limiting degree distribution in RANs and show that it follows a power law tail with exponent tau=(2d-1)/(d-1). We further show that the degree distribution in EANs converges to the same degree distribution if the simplex-occupation parameter in the n-th step of the dynamics satisfies q_n->0 and sum_{n=0}^infty q_n = infty. This result gives a rigorous proof of the conjecture of Zhang et al. that EANs tend to show similar behavior to RANs once the occupation parameter q->0. We also determine the asymptotic behavior of shortest paths in RANs and EANs for arbitrary dimension d. For RANs we show that the shortest path between two uniformly chosen vertices (typical distance), the flooding time of a uniformly picked vertex and the diameter of the graph after n steps all scale as a constant times log n. We determine the constants for all three cases and prove a central limit theorem (CLT) for the typical distances. We prove a similar CLT for typical distances in EANs.

    Probabilistic Shaping for Finite Blocklengths: Distribution Matching and Sphere Shaping

    In this paper, we provide for the first time a systematic comparison of distribution matching (DM) and sphere shaping (SpSh) algorithms for short blocklength probabilistic amplitude shaping. For asymptotically large blocklengths, constant composition distribution matching (CCDM) is known to generate the target capacity-achieving distribution. As the blocklength decreases, however, the resulting rate loss diminishes the efficiency of CCDM. We claim that for such short blocklengths and over the additive white Gaussian noise (AWGN) channel, the objective of shaping should be reformulated as obtaining the most energy-efficient signal space for a given rate (rather than matching distributions). In light of this interpretation, multiset-partition DM (MPDM), enumerative sphere shaping (ESS) and shell mapping (SM) are reviewed as energy-efficient shaping techniques. Numerical results show that MPDM and SpSh have smaller rate losses than CCDM. SpSh, whose sole objective is to maximize energy efficiency, is shown to have the minimum rate loss among all. We provide simulation results of the end-to-end decoding performance showing that up to 1 dB improvement in power efficiency over uniform signaling can be obtained with MPDM and SpSh at blocklengths around 200. Finally, we present a discussion on the complexity of these algorithms from the perspective of latency, storage and computations. Comment: 18 pages, 10 figures.
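
The claim that the CCDM rate loss shrinks with blocklength can be illustrated with a short calculation. The sketch below assumes the usual description of CCDM: a fixed composition of $n$ amplitudes, $k = \lfloor \log_2 (\text{number of sequences with that composition}) \rfloor$ input bits, and rate loss $H(P) - k/n$, where $P$ is the empirical distribution of the composition. The function names and the example compositions are illustrative and are not taken from the paper.

```python
import math


def ccdm_input_bits(composition) -> int:
    """floor(log2(multinomial coefficient)) for the given symbol counts."""
    n = sum(composition)
    count = math.factorial(n)
    for c in composition:
        count //= math.factorial(c)      # each division is exact
    return count.bit_length() - 1        # floor(log2(count)) for count >= 1


def ccdm_rate_loss(composition) -> float:
    """Entropy of the empirical distribution minus the achieved rate k/n, in bits."""
    n = sum(composition)
    entropy = -sum((c / n) * math.log2(c / n) for c in composition if c > 0)
    return entropy - ccdm_input_bits(composition) / n


if __name__ == "__main__":
    # Same target distribution (uniform over two amplitudes) at growing blocklength.
    for half in (2, 50, 500):
        print(2 * half, ccdm_rate_loss([half, half]))
```

Under these assumptions the loss is 0.5 bit per amplitude at blocklength 4 and falls as the blocklength grows, which matches the qualitative statement in the abstract.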

    Exact Common Information

    This paper introduces the notion of exact common information, which is the minimum description length of the common randomness needed for the exact distributed generation of two correlated random variables $(X,Y)$. We introduce the quantity $G(X;Y)=\min_{X\to W \to Y} H(W)$ as a natural bound on the exact common information and study its properties and computation. We then introduce the exact common information rate, which is the minimum description rate of the common randomness for the exact generation of a 2-DMS $(X,Y)$. We give a multiletter characterization for it as the limit $\bar{G}(X;Y)=\lim_{n\to \infty}(1/n)G(X^n;Y^n)$. While in general $\bar{G}(X;Y)$ is greater than or equal to the Wyner common information, we show that they are equal for the Symmetric Binary Erasure Source. We do not know, however, if the exact common information rate has a single letter characterization in general.

    Engineering Compressed Static Functions and Minimal Perfect Hash Functions

    \emph{Static functions} are data structures meant to store arbitrary mappings from finite sets to integers; that is, given a universe of items $U$, a set of $n \in \mathbb{N}$ pairs $(k_i,v_i)$ where $k_i \in S \subset U$, $|S|=n$, and $v_i \in \{0, 1, \ldots, m-1\}$, $m \in \mathbb{N}$, a static function will retrieve $v_i$ given $k_i$ (usually in constant time). When every key is mapped to a different value, the function is called a \emph{perfect hash function}, and when $n=m$ the data structure yields an injective numbering $S\to \lbrace 0,1, \ldots, n-1 \rbrace$; this mapping is called a \emph{minimal perfect hash function} (MPHF). Big data brought back one of the most critical challenges that computer scientists have been tackling during the last fifty years, that is, analyzing large amounts of data that do not fit in main memory. While for small keysets these mappings can be easily implemented using hash tables, this solution does not scale well for bigger sets. Static functions and MPHFs break the information-theoretical lower bound of storing the set $S$ because they are allowed to return \emph{any} value if the queried key is not in the original keyset. The classical construction technique for static functions achieves just $O(nb)$ bits of space, where $b=\log(m)$, and the one for MPHFs $O(n)$ bits of space (always with constant access time). All these features make static functions and MPHFs powerful techniques when handling, for instance, large sets of strings, and they are essential building blocks of space-efficient data structures such as (compressed) full-text indexes, monotone MPHFs, Bloom filter-like data structures, and prefix-search data structures. The biggest challenge of this construction technique involves lowering the multiplicative constants hidden inside the asymptotic space bounds while keeping feasible construction times. In this thesis, we take advantage of a recent result in random linear systems theory regarding the ratio between the number of variables and the number of equations, and in perfect hash data structures, to achieve practical static functions with the lowest space bounds so far, and construction time comparable with widely used techniques. The new results, however, require solving linear systems that need more than a simple triangulation process, as happens in current state-of-the-art solutions. The main challenge in making such structures usable is mitigating the cubic running time of Gaussian elimination at construction time.
To this purpose, we introduce novel techniques based on \emph{broadword programming} and a heuristic derived from \emph{structured Gaussian elimination}. We obtain data structures that are significantly smaller than commonly used hypergraph-based constructions while maintaining or improving lookup times and keeping construction feasible. We then apply these improvements to another kind of structure: \emph{compressed static hash functions}. The theoretical construction technique for this kind of data structure uses variable-length prefix-free codes to encode the set of values. Adopting this solution, we can reduce the space usage of each element to (essentially) the entropy of the list of output values of the function. This, however, requires solving an even bigger linear system of equations, and the time required to build the structure increases. In this thesis, we present the first engineered implementation of compressed hash functions. For example, we were able to store a function with geometrically distributed output, with parameter $p=0.5$, in just $2.28$ bits per key, independently of the key set, with a construction time double that of a state-of-the-art non-compressed function, which requires $\approx\log \log n$ bits per key, where $n$ is the number of keys, and similar lookup time. We can also store a function with an output following a Zipfian distribution with parameters $s=2$ and $N=10^6$ in just $2.75$ bits per key, whereas a non-compressed function would require more than $20$, with a threefold increase in construction time and significantly faster lookups.
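
The 2.28 bits per key reported for the geometric case can be compared with the entropy of the output distribution. The sketch below assumes the outputs follow $P(i) = p(1-p)^i$ for $i \ge 0$, for which the entropy is $h_b(p)/p$ with $h_b$ the binary entropy function. The exact convention used in the thesis is not stated in the abstract, although for $p = 0.5$ the common conventions give the same value. The function names are illustrative.

```python
import math


def geometric_entropy_bits(p: float) -> float:
    """Entropy in bits of P(i) = p * (1 - p)**i, i >= 0, via the closed form h_b(p) / p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    binary_entropy = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return binary_entropy / p


def truncated_entropy_bits(p: float, terms: int = 200) -> float:
    """Numerical check: sum -P(i) * log2 P(i) over the first `terms` outcomes."""
    total = 0.0
    for i in range(terms):
        mass = p * (1.0 - p) ** i
        if mass > 0.0:
            total -= mass * math.log2(mass)
    return total


if __name__ == "__main__":
    entropy = geometric_entropy_bits(0.5)
    print(entropy, truncated_entropy_bits(0.5))
    print("overhead of 2.28 bits/key over the entropy:", 2.28 - entropy)
```

Under this assumption the entropy is 2 bits per key, so the reported figure would sit about 0.28 bits above it.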

    Capacity, coding and interference cancellation in multiuser multicarrier wireless communications systems

    Multicarrier modulation and multiuser systems have generated a great deal of research during the last decade. Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier modulation generated with the inverse Discrete Fourier Transform, which has been adopted for standards in wireless and wire-line communications. Multiuser wireless systems using multicarrier modulation suffer from the effects of dispersive fading channels, which create multi-access, inter-symbol, and inter-carrier interference (MAI, ISI, ICI). Nevertheless, channel dispersion also provides diversity, which can be exploited and has the potential to increase robustness against fading. Multiuser multi-carrier systems can be implemented using Orthogonal Frequency Division Multiple Access (OFDMA), a flexible orthogonal multiplexing scheme that can implement time and frequency division multiplexing, and using multicarrier code division multiple access (MC-CDMA). Coding, interference cancellation, and resource sharing schemes to improve the performance of multiuser multicarrier systems on wireless channels were addressed in this dissertation. Performance of multiple access schemes applied to a downlink multiuser wireless system was studied from an information theory perspective and from a more practical perspective. For time, frequency, and code division, implemented using OFDMA and MC-CDMA, the system outage capacity region was calculated for a correlated fading channel. It was found that receiver complexity determines which scheme offers larger capacity regions, and that OFDMA results in a better compromise between complexity and performance than MC-CDMA. From the more practical perspective of bit error rate, the effects of channel coding and interleaving were investigated. 
Results in terms of coding bounds as well as simulation were obtained, showing that OFDMA-based orthogonal multiple access schemes are more sensitive to the effectiveness of the code to provide diversity than non-orthogonal, MC-CDMA-based schemes. While cellular multiuser schemes suffer mainly from MAI, OFDM-based broadcasting systems suffer from ICI, in particular when operating as a single frequency network (SFN). It was found that for SFN the performance of a conventional OFDM receiver rapidly degrades when transmitters have frequency synchronization errors. Several methods based on linear and decision-feedback ICI cancellation were proposed and evaluated, showing improved robustness against ICI. System function characterization of time-variant dispersive channels is important for understanding their effects on single carrier and multicarrier modulation. Using time-frequency duality it was shown that MC-CDMA and DS-CDMA are strictly dual on dispersive channels. This property was used to derive optimal matched filter structures, and to determine a criterion for the selection of spreading sequences for both DS-CDMA and MC-CDMA. The analysis of multiple antenna systems provided a unified framework for the study of DS-CDMA and MC-CDMA on time and frequency dispersive channels, which can also be used to compare their performance.