    Lower Bounds on the Redundancy of Huffman Codes with Known and Unknown Probabilities

    In this paper we provide a method to obtain tight lower bounds on the minimum redundancy achievable by a Huffman code when the probability distribution underlying an alphabet is only partially known. In particular, we address the case where the occurrence probabilities are unknown for some of the symbols in an alphabet. Bounds can be obtained for alphabets of a given size, for alphabets of up to a given size, and for alphabets of arbitrary size. The method runs on a Computer Algebra System and yields closed-form numbers for all results. Finally, we show the potential of the proposed method to shed some light on the structure of the minimum redundancy achievable by the Huffman code.
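
For orientation, the quantity being bounded can be computed directly when every probability is known. The sketch below is not the paper's method (which handles partially known distributions with a Computer Algebra System); it only illustrates redundancy as expected Huffman codeword length minus entropy. The function names `huffman_lengths` and `redundancy` are hypothetical, chosen for this example.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return binary Huffman codeword lengths for a fully known distribution."""
    if len(probs) == 1:
        return [1]
    # Heap items: (subtree probability, unique tie-breaker, symbol indices).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths


def redundancy(probs):
    """Expected Huffman codeword length minus Shannon entropy, in bits."""
    lengths = huffman_lengths(probs)
    expected_length = sum(p * n for p, n in zip(probs, lengths))
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return expected_length - entropy
```

A dyadic distribution such as `[0.5, 0.25, 0.25]` gives zero redundancy, while a skewed two-symbol source such as `[0.9, 0.1]` gives roughly 0.53 bits, since one bit per symbol is spent against an entropy of about 0.47 bits.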

    The placement of the head that maximizes predictability. An information theoretic approach

    The minimization of the length of syntactic dependencies is a well-established principle of word order and the basis of a mathematical theory of word order. Here we complete that theory from the perspective of information theory, adding a competing word order principle: the maximization of predictability of a target element. These two principles are in conflict: to maximize the predictability of the head, the head should appear last, which maximizes the costs with respect to dependency length minimization. The implications of such a broad theoretical framework to understand the optimality, diversity and evolution of the six possible orderings of subject, object and verb are reviewed. Comment: in press in Glottometrics

    Optimal Prefix Codes for Infinite Alphabets with Nonlinear Costs

    Let $P = \{p(i)\}$ be a measure of strictly positive probabilities on the set of nonnegative integers. Although the countable number of inputs prevents usage of the Huffman algorithm, there are nontrivial $P$ for which known methods find a source code that is optimal in the sense of minimizing expected codeword length. For some applications, however, a source code should instead minimize one of a family of nonlinear objective functions, $\beta$-exponential means, those of the form $\log_a \sum_i p(i) a^{n(i)}$, where $n(i)$ is the length of the $i$th codeword and $a$ is a positive constant. Applications of such minimizations include a novel problem of maximizing the chance of message receipt in single-shot communications ($a<1$) and a previously known problem of minimizing the chance of buffer overflow in a queueing system ($a>1$). This paper introduces methods for finding codes optimal for such exponential means. One method applies to geometric distributions, while another applies to distributions with lighter tails. The latter algorithm is applied to Poisson distributions and both are extended to alphabetic codes, as well as to minimizing maximum pointwise redundancy. The aforementioned application of minimizing the chance of buffer overflow is also considered. Comment: 14 pages, 6 figures, accepted to IEEE Trans. Inform. Theory
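
As a small illustration of the objective in this abstract, the sketch below evaluates the exponential mean $\log_a \sum_i p(i) a^{n(i)}$ for a finite list of probabilities and codeword lengths. It only evaluates the cost of a given code; it does not implement the paper's optimization methods, and the function name `exponential_mean_cost` is an assumption made for this example.

```python
import math


def exponential_mean_cost(probs, lengths, a):
    """Evaluate log_a(sum_i p(i) * a**n(i)) for a finite code.

    probs   -- probabilities p(i)
    lengths -- codeword lengths n(i), same order as probs
    a       -- positive constant other than 1 (log base and growth factor)
    """
    if a <= 0 or a == 1:
        raise ValueError("a must be positive and different from 1")
    if len(probs) != len(lengths):
        raise ValueError("probs and lengths must have the same length")
    total = sum(p * a ** n for p, n in zip(probs, lengths))
    return math.log(total, a)
```

With $a > 1$ long codewords are penalized more heavily than under expected length, and with $a < 1$ the emphasis shifts the other way. For example, probabilities `[0.5, 0.25, 0.25]` with lengths `[1, 2, 2]` have expected length 1.5, but the cost at `a = 2` is $\log_2 3$, about 1.585.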

    'The frozen accident' as an evolutionary adaptation: A rate distortion theory perspective on the dynamics and symmetries of genetic coding mechanisms

    We survey some interpretations and related issues concerning the frozen hypothesis due to F. Crick and how it can be explained in terms of several natural mechanisms involving error correction codes, spin glasses, symmetry breaking and the characteristic robustness of genetic networks. The approach to most of these questions involves using elements of Shannon's rate distortion theory incorporating a semantic system which is meaningful for the relevant alphabets and vocabulary implemented in transmission of the genetic code. We apply the fundamental homology between information source uncertainty and the free energy density of a thermodynamical system with respect to transcriptional regulators and the communication channels of sequence/structure in proteins. This leads to the suggestion that the frozen accident may have been a type of evolutionary adaptation.

    A spin glass model for reconstructing nonlinearly encrypted signals corrupted by noise

    An encryption of a signal $\mathbf{s}\in\mathbb{R}^N$ is a random mapping $\mathbf{s}\mapsto \mathbf{y}=(y_1,\ldots,y_M)^T\in \mathbb{R}^M$ which can be corrupted by an additive noise. Given the Encryption Redundancy Parameter (ERP) $\mu=M/N\ge 1$, the signal strength parameter $R=\sqrt{\sum_i s_i^2/N}$, and the ('bare') noise-to-signal ratio (NSR) $\gamma\ge 0$, we consider the problem of reconstructing $\mathbf{s}$ from its corrupted image by a Least Square Scheme for a certain class of random Gaussian mappings. The problem is equivalent to finding the configuration of minimal energy in a certain version of spherical spin glass model, with squared Gaussian-distributed random potential. We use the Parisi replica symmetry breaking scheme to evaluate the mean overlap $p_{\infty}\in [0,1]$ between the original signal and its recovered image (known as 'estimator') as $N\to \infty$, which is a measure of the quality of the signal reconstruction. We explicitly analyze the general case of linear-quadratic family of random mappings and discuss the full $p_{\infty}(\gamma)$ curve. When nonlinearity exceeds a certain threshold but redundancy is not yet too big, the replica symmetric solution is necessarily broken in some interval of NSR. We show that encryptions with a nonvanishing linear component permit reconstructions with $p_{\infty}>0$ for any $\mu>1$ and any $\gamma<\infty$, with $p_{\infty}\sim \gamma^{-1/2}$ as $\gamma\to \infty$. In contrast, for the case of purely quadratic nonlinearity, for any ERP $\mu>1$ there exists a threshold NSR value $\gamma_c(\mu)$ such that $p_{\infty}=0$ for $\gamma>\gamma_c(\mu)$, making the reconstruction impossible. The behaviour close to the threshold is given by $p_{\infty}\sim (\gamma_c-\gamma)^{3/4}$ and is controlled by the replica symmetry breaking mechanism. Comment: 33 pages, 5 figures