Optimal prefix codes for pairs of geometrically-distributed random variables
Optimal prefix codes are studied for pairs of independent, integer-valued
symbols emitted by a source with a geometric probability distribution of
parameter $q$, $0<q<1$. By encoding pairs of symbols, it is possible to
reduce the redundancy penalty of symbol-by-symbol encoding, while preserving
the simplicity of the encoding and decoding procedures typical of Golomb codes
and their variants. It is shown that optimal codes for these so-called
two-dimensional geometric distributions are \emph{singular}, in the sense that
a prefix code that is optimal for one value of the parameter cannot be
optimal for any other value of $q$. This is in sharp contrast to the
one-dimensional case, where codes are optimal for positive-length intervals of
the parameter $q$. Thus, in the two-dimensional case, it is infeasible to give
a compact characterization of optimal codes for all values of the parameter
$q$, as was done in the one-dimensional case. Instead, optimal codes are
characterized for a discrete sequence of values of $q$ that provide good
coverage of the unit interval. Specifically, optimal prefix codes are described
for two families of parameter values that together cover the unit interval.
The described codes produce the expected reduction in redundancy with respect
to the one-dimensional case, while maintaining low-complexity coding
operations.
Comment: To appear in IEEE Transactions on Information Theory
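The one-dimensional Golomb codes this abstract builds on encode a nonnegative integer as a unary quotient followed by a truncated-binary remainder. A minimal sketch of the classic encoder (the paper's two-dimensional codes are different; this is only the baseline scheme):

```python
def golomb_encode(n: int, m: int) -> str:
    """Classic Golomb code with parameter m: unary quotient, then a
    truncated-binary remainder. Returns the codeword as a bit string."""
    q, r = divmod(n, m)
    bits = "1" * q + "0"              # quotient in unary, terminated by a 0
    if m == 1:
        return bits                   # remainder is always 0; nothing to emit
    b = m.bit_length()                # number of bits for the long remainders
    if m & (m - 1) == 0:              # m a power of two: plain binary (Rice code)
        return bits + format(r, f"0{b - 1}b")
    cutoff = (1 << b) - m             # truncated binary: small remainders get b-1 bits
    if r < cutoff:
        return bits + format(r, f"0{b - 1}b")
    return bits + format(r + cutoff, f"0{b}b")
```

For a geometric source with parameter $q$, Gallager and van Voorhis showed how to pick the optimal $m$ as a function of $q$; encoding pairs of symbols, as in the paper, further trims the residual redundancy of this symbol-by-symbol scheme.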
Tight Bounds on the R\'enyi Entropy via Majorization with Applications to Guessing and Compression
This paper provides tight bounds on the R\'enyi entropy of a function of a
discrete random variable with a finite number of possible values, where the
considered function is not one-to-one. To that end, a tight lower bound on the
R\'enyi entropy of a discrete random variable with a finite support is derived
as a function of the size of the support, and the ratio of the maximal to
minimal probability masses. This work was inspired by the recently published
paper by Cicalese et al., which is focused on the Shannon entropy, and it
strengthens and generalizes the results of that paper to R\'enyi entropies of
arbitrary positive orders. In view of these generalized bounds and the works by
Arikan and Campbell, non-asymptotic bounds are derived for guessing moments and
lossless data compression of discrete memoryless sources.
Comment: The paper was published in the Entropy journal (special issue on
Probabilistic Methods in Information Theory, Hypothesis Testing, and Coding),
vol. 20, no. 12, paper no. 896, November 22, 2018. Available online at
https://www.mdpi.com/1099-4300/20/12/89
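For reference, the Rényi entropy of order $\alpha$ that these bounds concern is $H_\alpha(P) = \frac{1}{1-\alpha} \log_2 \sum_i p_i^\alpha$, which recovers the Shannon entropy in the limit $\alpha \to 1$ and is nonincreasing in $\alpha$. A small sketch (the function name is illustrative, not from the paper):

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha, in bits, of a probability vector p.
    alpha = 1 is treated as the Shannon-entropy limit."""
    if alpha == 1:
        return -sum(x * math.log2(x) for x in p if x > 0)
    return math.log2(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)
```

On the uniform distribution all orders coincide (every $H_\alpha$ equals $\log_2$ of the support size), which is the boundary case of the support-size/probability-ratio bounds the paper derives.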
Lower Bounds for Oblivious Near-Neighbor Search
We prove an $\Omega(d \log n / (\log \log n)^2)$ lower bound on the dynamic
cell-probe complexity of statistically oblivious
approximate-near-neighbor search (ANN) over the $d$-dimensional
Hamming cube. For the natural setting of $d = \Theta(\log n)$, our result
implies an $\tilde{\Omega}(\log^2 n)$ lower bound, which is a quadratic
improvement over the highest (non-oblivious) cell-probe lower bound for
ANN. This is the first super-logarithmic
lower bound for ANN against general (non black-box) data structures.
We also show that any oblivious data structure for
decomposable search problems (like ANN) can be obliviously dynamized
with $O(\log n)$ overhead in update and query time, strengthening a classic
result of Bentley and Saxe (Algorithmica, 1980).
Comment: 28 pages
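The Bentley–Saxe logarithmic method referenced above keeps static substructures of doubling sizes and rebuilds them on insertion, so a query combines logarithmically many partial answers. A minimal sketch for a decomposable problem; the `build`/`query`/`combine` hooks are hypothetical, not an API from the paper:

```python
class LogarithmicMethod:
    """Bentley-Saxe dynamization of a static structure for a decomposable
    search problem: answer(A u B) = combine(answer(A), answer(B))."""

    def __init__(self, build, query, combine, identity):
        self.build, self.query_one = build, query
        self.combine, self.identity = combine, identity
        self.levels = []  # levels[i] holds (structure, items) over 2**i items, or None

    def insert(self, item):
        carry = [item]
        i = 0
        # Cascade like binary addition: merge filled levels into one rebuild.
        while i < len(self.levels) and self.levels[i] is not None:
            carry += self.levels[i][1]
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = (self.build(carry), carry)

    def query(self, q):
        ans = self.identity
        for level in self.levels:
            if level is not None:
                ans = self.combine(ans, self.query_one(level[0], q))
        return ans

# Example: dynamized membership, with frozenset as the "static" structure.
ds = LogarithmicMethod(build=frozenset,
                       query=lambda s, q: q in s,
                       combine=lambda a, b: a or b,
                       identity=False)
```

Each element is rebuilt $O(\log n)$ times overall, which is the source of the logarithmic overhead in update and query time; the paper's contribution is showing this transformation can be carried out obliviously.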
Degrees and distances in random and evolving Apollonian networks
This paper studies Random and Evolving Apollonian networks (RANs and EANs)
in dimension d for any d >= 2, i.e., dynamically evolving random d-dimensional
simplices viewed as graphs inside an initial d-dimensional simplex. We
determine the limiting degree distribution in RANs and show that it follows a
power-law tail with exponent tau = (2d-1)/(d-1). We further show that the degree
distribution in EANs converges to the same degree distribution if the
simplex-occupation parameter in the n-th step of the dynamics is q_n -> 0 and
sum_{n=0}^infty q_n = infty. This result gives a rigorous proof of the
conjecture of Zhang et al. that EANs tend to show behavior similar to RANs once
the occupation parameter q -> 0. We also determine the asymptotic behavior of
shortest paths in RANs and EANs for arbitrary dimension d. For RANs we show
that the shortest path between two uniformly chosen vertices (typical
distance), the flooding time of a uniformly picked vertex, and the diameter of
the graph after n steps all scale as a constant times log n. We determine the
constants for all three cases and prove a central limit theorem for the typical
distances. We prove a similar CLT for typical distances in EANs.
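The RAN growth rule described above (pick a uniformly random active face, connect a new vertex to its corners) is simple to simulate for d = 2, where the vertex, edge, and face counts follow directly from the dynamics. A rough sketch, not code from the paper:

```python
import random

def random_apollonian_network(steps, seed=1):
    """Grow a 2-dimensional RAN: at each step pick a uniformly random active
    triangular face, place a new vertex inside it, and connect it to the
    face's three corners, splitting the face into three new active faces."""
    rng = random.Random(seed)
    faces = [(0, 1, 2)]                    # active faces of the triangulation
    edges = {(0, 1), (0, 2), (1, 2)}
    num_vertices = 3
    for _ in range(steps):
        i = rng.randrange(len(faces))
        a, b, c = faces[i]
        v = num_vertices
        num_vertices += 1
        faces[i] = (a, b, v)               # replace the chosen face ...
        faces.extend([(a, c, v), (b, c, v)])  # ... by three new ones
        edges.update({(a, v), (b, v), (c, v)})
    return num_vertices, edges, faces
```

After n steps the graph has 3 + n vertices, 3 + 3n edges, and 1 + 2n active faces; tallying the degree sequence of a large run lets one eyeball the power-law tail with exponent tau = (2d-1)/(d-1) = 3 for d = 2.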
Probabilistic Shaping for Finite Blocklengths: Distribution Matching and Sphere Shaping
In this paper, we provide for the first time a systematic comparison of
distribution matching (DM) and sphere shaping (SpSh) algorithms for short
blocklength probabilistic amplitude shaping. For asymptotically large
blocklengths, constant composition distribution matching (CCDM) is known to
generate the target capacity-achieving distribution. As the blocklength
decreases, however, the resulting rate loss diminishes the efficiency of CCDM.
We claim that for such short blocklengths and over the additive white
Gaussian noise (AWGN) channel, the objective of shaping should be reformulated as obtaining
the most energy-efficient signal space for a given rate (rather than matching
distributions). In light of this interpretation, multiset-partition DM (MPDM),
enumerative sphere shaping (ESS) and shell mapping (SM), are reviewed as
energy-efficient shaping techniques. Numerical results show that MPDM and SpSh
have smaller rate losses than CCDM. SpSh--whose sole objective is to maximize
the energy efficiency--is shown to have the minimum rate loss amongst all. We
provide simulation results of the end-to-end decoding performance showing that
up to 1 dB improvement in power efficiency over uniform signaling can be
obtained with MPDM and SpSh at blocklengths around 200. Finally, we present a
discussion on the complexity of these algorithms from the perspective of
latency, storage and computations.
Comment: 18 pages, 10 figures
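The CCDM rate loss discussed above can be made concrete: a constant-composition code of blocklength N with symbol counts (n_1, ..., n_k) can address at most the multinomial number of sequences, so the per-symbol loss relative to the entropy of the target distribution is H(P) - (1/N) log2 [N! / (n_1! ... n_k!)]. A small illustration (not the authors' code):

```python
from math import comb, log2

def ccdm_rate_loss(counts):
    """Per-symbol rate loss of constant-composition distribution matching:
    entropy of the empirical distribution minus the rate actually
    addressable by sequences of that exact composition."""
    n = sum(counts)
    entropy = -sum(c / n * log2(c / n) for c in counts if c)
    sequences = 1                 # multinomial coefficient n! / prod(counts!)
    remaining = n
    for c in counts:
        sequences *= comb(remaining, c)
        remaining -= c
    return entropy - log2(sequences) / n
```

The loss vanishes as the blocklength grows, e.g. it is about 0.354 bits/symbol for the binary composition (2, 2) but about 0.075 for (20, 20); this short-blocklength penalty is exactly what motivates the energy-efficiency view of shaping taken in the paper.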
Exact Common Information
This paper introduces the notion of exact common information, which is the
minimum description length of the common randomness needed for the exact
distributed generation of two correlated random variables $(X, Y)$. We introduce
the quantity $G(X; Y) = \min_{X - W - Y} H(W)$ as a natural bound on the exact
common information and study its properties and computation. We then introduce
the exact common information rate, which is the minimum description rate of the
common randomness for the exact generation of a 2-DMS $(X, Y)$. We give a
multiletter characterization for it as the limit
$\bar{G}(X; Y) = \lim_{n \to \infty} \frac{1}{n} G(X^n; Y^n)$. While in general
$\bar{G}(X; Y)$ is greater than or equal to the Wyner common information, we show
that they are equal for the Symmetric Binary Erasure Source. We do not know,
however, if the exact common information rate has a single-letter
characterization in general.
Engineering compressed static functions and minimal perfect hash functions
\emph{Static functions} are data structures meant to store arbitrary mappings from finite sets to integers; that is, given a universe of items $U$ and a set of $n$ pairs $(k_i, v_i)$ with $k_i \in U$, a static function retrieves the value $v_i$ associated with a given key $k_i$ (usually, in constant time). When every key is mapped to a different value, the function is called a \emph{perfect hash function}, and when the data structure yields an injective numbering of the $n$ keys into $\{0, \dots, n-1\}$, the mapping is called a \emph{minimal perfect hash function} (MPHF). Big data brought back one of the most critical challenges that computer scientists have been tackling during the last fifty years, that is, analyzing large amounts of data that do not fit in main memory. While for small keysets these mappings can be easily implemented using hash tables, this solution does not scale well to bigger sets. Static functions and MPHFs break the information-theoretical lower bound on storing the key set because they are allowed to return \emph{any} value if the queried key is not in the original keyset. The classical construction techniques achieve space within a small constant factor of the optimum for both static functions and MPHFs, always with constant access time. All these features make static functions and MPHFs powerful techniques when handling, for instance, large sets of strings, and they are essential building blocks of space-efficient data structures such as (compressed) full-text indexes, monotone MPHFs, Bloom filter-like data structures, and prefix-search data structures. The biggest challenge of these construction techniques is lowering the multiplicative constants hidden inside the asymptotic space bounds while keeping construction times feasible.
In this thesis, we take advantage of recent results in random linear systems theory regarding the ratio between the number of variables and the number of equations, and in perfect-hash data structures, to achieve practical static functions with the lowest space bounds so far, and construction times comparable with widely used techniques. The new results, however, require solving linear systems that need more than the simple triangulation process used in current state-of-the-art solutions. The main challenge in making such structures usable is mitigating the cubic running time of Gaussian elimination at construction time. To this purpose, we introduce novel techniques based on \emph{broadword programming} and a heuristic derived from \emph{structured Gaussian elimination}. We obtain data structures that are significantly smaller than commonly used hypergraph-based constructions while maintaining or improving the lookup times and keeping construction feasible. We then apply these improvements to another kind of structure: \emph{compressed static functions}. The theoretical construction technique for this kind of data structure uses variable-length prefix-free codes to encode the set of values. Adopting this solution, we can reduce the space usage of each element to (essentially) the entropy of the list of output values of the function. The price is an even bigger linear system of equations, so the time required to build the structure increases. In this thesis, we present the first engineered implementation of compressed static functions. For example, we were able to store a function with geometrically distributed output in close to the entropy of the output per key, independently of the key set, with a construction time double with respect to that of a state-of-the-art non-compressed function, which stores each value in a fixed number of bits, and with similar lookup time.
We can also store a function whose output follows a Zipfian distribution in just a few bits per key, close to the entropy of the output list, whereas a non-compressed function would require considerably more, with a threefold increase in construction time and significantly faster lookups.
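The space target of such a compressed static function is the empirical entropy of the output list rather than a fixed width per key. For a Zipfian output with exponent s over n ranks that target is easy to compute; a back-of-the-envelope sketch, not the thesis code:

```python
from math import log2

def zipf_entropy_bits(n, s):
    """Entropy in bits/key of a Zipf(s) distribution over ranks 1..n --
    the space a compressed static function aims for, versus roughly
    ceil(log2 n) bits/key for a plain fixed-width static function."""
    weights = [1.0 / k ** s for k in range(1, n + 1)]
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights)
```

For a sharply skewed output (large s) the entropy is a small fraction of the fixed-width cost, e.g. under 3 bits/key for s = 2 over 1000 values versus about 10 bits fixed-width, which is the gap the compressed construction exploits.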
Capacity, coding and interference cancellation in multiuser multicarrier wireless communications systems
Multicarrier modulation and multiuser systems have generated a great deal of research during the last decade. Orthogonal Frequency Division Multiplexing (OFDM) is a multicarrier modulation generated with the inverse Discrete Fourier Transform, which has been adopted for standards in wireless and wire-line communications. Multiuser wireless systems using multicarrier modulation suffer from the effects of dispersive fading channels, which create multi-access, inter-symbol, and inter-carrier interference (MAI, ISI, ICI). Nevertheless, channel dispersion also provides diversity, which can be exploited and has the potential to increase robustness against fading. Multiuser multi-carrier systems can be implemented using Orthogonal Frequency Division Multiple Access (OFDMA), a flexible orthogonal multiplexing scheme that can implement time and frequency division multiplexing, and using multicarrier code division multiple access (MC-CDMA). Coding, interference cancellation, and resource sharing schemes to improve the performance of multiuser multicarrier systems on wireless channels were addressed in this dissertation.
Performance of multiple access schemes applied to a downlink multiuser wireless system was studied from an information theory perspective and from a more practical perspective. For time, frequency, and code division, implemented using OFDMA and MC-CDMA, the system outage capacity region was calculated for a correlated fading channel. It was found that receiver complexity determines which scheme offers larger capacity regions, and that OFDMA results in a better compromise between complexity and performance than MC-CDMA. From the more practical perspective of bit error rate, the effects of channel coding and interleaving were investigated. Results in terms of coding bounds as well as simulations were obtained, showing that OFDMA-based orthogonal multiple access schemes are more sensitive to the effectiveness of the code to provide diversity than non-orthogonal, MC-CDMA-based schemes.
While cellular multiuser schemes suffer mainly from MAI, OFDM-based broadcasting systems suffer from ICI, in particular when operating as a single frequency network (SFN). It was found that for SFN the performance of a conventional OFDM receiver rapidly degrades when transmitters have frequency synchronization errors. Several methods based on linear and decision-feedback ICI cancellation were proposed and evaluated, showing improved robustness against ICI.
System function characterization of time-variant dispersive channels is important for understanding their effects on single carrier and multicarrier modulation. Using time-frequency duality it was shown that MC-CDMA and DS-CDMA are strictly dual on dispersive channels. This property was used to derive optimal matched filter structures, and to determine a criterion for the selection of spreading sequences for both DS-CDMA and MC-CDMA. The analysis of multiple antenna systems provided a unified framework for the study of DS-CDMA and MC-CDMA on time and frequency dispersive channels, which can also be used to compare their performance.
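The OFDM principle the dissertation builds on (IDFT modulation plus a cyclic prefix, which turns a dispersive channel into independent flat subchannels equalized by one complex tap each) can be sketched in a few lines. The channel taps below are illustrative, not taken from the dissertation:

```python
import cmath

def dft(x, sign=-1):
    """Naive DFT; sign=+1 gives the inverse kernel (without the 1/N factor)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

def ofdm_transceive(symbols, channel, cp_len):
    """Modulate one OFDM block with the IDFT, prepend a cyclic prefix,
    pass it through a dispersive FIR channel, then demodulate with the
    DFT and equalize each subcarrier with a single complex tap."""
    N = len(symbols)
    time = [v / N for v in dft(symbols, sign=+1)]   # IDFT modulation
    tx = time[-cp_len:] + time                       # cyclic prefix
    rx = [sum(channel[j] * tx[i - j]
              for j in range(len(channel)) if 0 <= i - j)
          for i in range(len(tx))]                   # FIR channel (linear conv.)
    rx = rx[cp_len:cp_len + N]                       # drop the prefix
    Y = dft(rx)                                      # back to frequency domain
    H = dft(channel + [0.0] * (N - len(channel)))    # channel frequency response
    return [y / h for y, h in zip(Y, H)]             # one-tap equalization
```

As long as the cyclic prefix is at least as long as the channel memory, the linear convolution seen by the retained samples is circular, so each subcarrier k is simply scaled by H[k] and the transmitted symbols are recovered exactly; with frequency synchronization errors, as in the SFN scenario above, this orthogonality breaks and ICI appears.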