
    Universal Compression of Power-Law Distributions

    English words and the outputs of many other natural processes are well-known to follow a Zipf distribution. Yet this thoroughly-established property has never been shown to help compress or predict these important processes. We show that the expected redundancy of Zipf distributions of order α > 1 is roughly the 1/α power of the expected redundancy of unrestricted distributions. Hence for these orders, Zipf distributions can be better compressed and predicted than was previously known. Unlike the expected case, we show that worst-case redundancy is roughly the same for Zipf and for unrestricted distributions. Hence Zipf distributions have significantly different worst-case and expected redundancies, making them the first natural distribution class shown to have such a difference.
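
    As a toy illustration of the compressibility claim (and only that; this is not the paper's redundancy analysis), the following Python sketch samples from a truncated Zipf distribution and compares the rate of an off-the-shelf compressor against the source entropy. The alphabet size, order, and choice of zlib are arbitrary assumptions.

        # Toy sketch: how close does a generic compressor get to the
        # entropy of a Zipf source?  All parameters are illustrative.
        import math, random, zlib

        def zipf_pmf(k, alpha):
            w = [i ** -alpha for i in range(1, k + 1)]
            z = sum(w)
            return [x / z for x in w]

        k, alpha, n = 256, 1.5, 100_000
        p = zipf_pmf(k, alpha)
        entropy = -sum(pi * math.log2(pi) for pi in p)   # bits/symbol

        random.seed(0)
        seq = bytes(random.choices(range(k), weights=p, k=n))
        rate = 8 * len(zlib.compress(seq, 9)) / n        # bits/symbol

        print(f"entropy {entropy:.3f} b/sym, zlib rate {rate:.3f} b/sym")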

    Logics for Unranked Trees: An Overview

    Labeled unranked trees are used as a model of XML documents, and logical languages for them have been studied actively over the past several years. Such logics have different purposes: some are better suited for extracting data, some for expressing navigational properties, and some make it easy to relate complex properties of trees to the existence of tree automata for those properties. Furthermore, logics differ significantly in their model-checking properties, their automata models, and their behavior on ordered and unordered trees. In this paper we present a survey of logics for unranked trees.
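
    To make "navigational property" concrete, here is a minimal hypothetical sketch: an unranked tree encoded as (label, children) pairs, with a recursive check of one such property. Neither the encoding nor the property is taken from the survey.

        # Check a simple navigational property on a labeled unranked tree:
        # "every node labeled 'a' has at least one child labeled 'b'".
        # The tuple encoding (label, list_of_children) is an assumption.
        def every_a_has_b_child(node):
            label, children = node
            ok = label != "a" or any(c[0] == "b" for c in children)
            return ok and all(every_a_has_b_child(c) for c in children)

        t = ("a", [("b", []), ("a", [("b", []), ("c", [])])])
        print(every_a_has_b_child(t))   # True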

    Fingerprinting with Minimum Distance Decoding

    This work adopts an information theoretic framework for the design of collusion-resistant coding/decoding schemes for digital fingerprinting. More specifically, the minimum distance decision rule is used to identify 1 out of t pirates. Achievable rates, under this detection rule, are characterized in two distinct scenarios. First, we consider the averaging attack where a random coding argument is used to show that the rate 1/2 is achievable with t=2 pirates. Our study is then extended to the general case of arbitrary t, highlighting the underlying complexity-performance tradeoff. Overall, these results establish the significant performance gains offered by minimum distance decoding as compared to other approaches based on orthogonal codes and correlation detectors. In the second scenario, we characterize the achievable rates, with minimum distance decoding, under any collusion attack that satisfies the marking assumption. For t=2 pirates, we show that the rate 1 − H(0.25) ≈ 0.188 is achievable using an ensemble of random linear codes. For t ≥ 3, the existence of a non-resolvable collusion attack, with minimum distance decoding, for any non-zero rate is established. Inspired by our theoretical analysis, we then construct coding/decoding schemes for fingerprinting based on the celebrated Belief-Propagation framework. Using an explicit repeat-accumulate code, we obtain a vanishingly small probability of misidentification at rate 1/3 under averaging attack with t=2. For collusion attacks which satisfy the marking assumption, we use a more sophisticated accumulate-repeat-accumulate code to obtain a vanishingly small misidentification probability at rate 1/9 with t=2. These results represent a marked improvement over the best available designs in the literature.
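
    A hedged sketch of the first scenario: t=2 pirates average their binary fingerprints and a minimum (Euclidean) distance decoder accuses the user with the closest codeword. The random code and all parameters are illustrative stand-ins, not the paper's construction.

        # Averaging attack with t=2 pirates, decoded by minimum
        # Euclidean distance over a random binary code (toy parameters).
        import random

        random.seed(1)
        n_users, code_len = 16, 64
        code = [[random.randint(0, 1) for _ in range(code_len)]
                for _ in range(n_users)]

        p1, p2 = 3, 7                                  # colluding users
        forgery = [(code[p1][i] + code[p2][i]) / 2 for i in range(code_len)]

        def dist2(cw, y):                              # squared distance
            return sum((c - v) ** 2 for c, v in zip(cw, y))

        # With high probability the closest codeword belongs to a pirate.
        accused = min(range(n_users), key=lambda u: dist2(code[u], forgery))
        print("accused:", accused, "guilty:", accused in (p1, p2))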

    A Computational Framework for Efficient Error Correcting Codes Using an Artificial Neural Network Paradigm.

    The quest for an efficient computational approach to neural connectivity problems has undergone a significant evolution in the last few years. The current best systems are far from equaling human performance, especially when a program of instructions is executed sequentially as in a von Neumann computer. On the other hand, neural net models are potential candidates for parallel processing since they explore many competing hypotheses simultaneously using massively parallel nets composed of many computational elements connected by links with variable weights. Thus, neural network modeling must be complemented by deep insight into how to embed algorithms for an error correcting paradigm in order to gain the advantage of parallel computation. In this dissertation, we construct a neural network for single error detection and correction in linear codes. Then we present an error-detecting paradigm in the framework of neural networks. We consider the problem of error detection for systematic unidirectional codes which are assumed to contain double or triple errors. The generalization of the network construction for error-detecting codes is discussed with a heuristic algorithm. We also describe models of the code construction, detection and correction of t-EC/d-ED/AUED (t-Error Correcting/d-Error Detecting/All Unidirectional Error Detecting) codes, which are more general codes in the error correcting paradigm.
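
    For intuition, the classical computation such a network would emulate is syndrome decoding. Below is a sketch for the Hamming(7,4) code, chosen purely for illustration; the dissertation's network construction itself is not reproduced here.

        # Single-error correction by syndrome decoding of Hamming(7,4).
        # Column j of H is the binary representation of j, so a nonzero
        # syndrome directly names the erroneous bit position.
        H = [[1, 0, 1, 0, 1, 0, 1],
             [0, 1, 1, 0, 0, 1, 1],
             [0, 0, 0, 1, 1, 1, 1]]

        def correct(word):
            syndrome = [sum(h * w for h, w in zip(row, word)) % 2 for row in H]
            pos = syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2]
            if pos:                       # pos == 0 means no error found
                word[pos - 1] ^= 1        # flip the erroneous bit
            return word

        received = [0, 0, 1, 0, 0, 0, 0]  # all-zero codeword, bit 3 flipped
        print(correct(received))          # -> [0, 0, 0, 0, 0, 0, 0]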

    Accounting for molecular stochasticity in systematic revisions: species limits and phylogeny of Paroaria

    Different frameworks have been proposed for using molecular data in systematic revisions, but there is ongoing debate on their applicability, merits and shortcomings. In this paper we examine the fit between morphological and molecular data in the systematic revision of Paroaria, a group of conspicuous songbirds endemic to South America. We delimited species based on examination of > 600 specimens, and developed distance-gap, and distance- and character-based coalescent simulations to test species limits with molecular data. The morphological and molecular data collected were then analyzed using parsimony, maximum likelihood, and Bayesian phylogenetics. The simulations evaluated the new species limits better than genetic distances alone did. Species diversity within Paroaria had been underestimated by 60%, and the revised genus comprises eight species. Phylogenetic analyses consistently recovered a congruent topology for the most recently derived species in the genus, but the most basal divergences were not resolved with these data. The systematic and phylogenetic hypotheses developed here are relevant both to setting conservation priorities and to understanding the biogeography of South America.
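
    A minimal sketch of the distance-gap idea, with invented numbers: a species limit is supported when the largest within-taxon genetic distance falls below the smallest distance to the nearest neighboring taxon.

        # Invented pairwise distances; a "barcode gap" supports the limit.
        within  = [0.002, 0.004, 0.003]   # distances inside the taxon
        between = [0.021, 0.034, 0.028]   # distances to nearest neighbor
        print("distance gap present:", max(within) < min(between))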

    Check-hybrid GLDPC Codes: Systematic Elimination of Trapping Sets and Guaranteed Error Correction Capability

    In this paper, we propose a new approach to constructing a class of check-hybrid generalized low-density parity-check (CH-GLDPC) codes which are free of small trapping sets. The approach is based on converting some selected check nodes involved in a trapping set into super checks corresponding to a 2-error correcting component code. Specifically, we pursue two main goals in constructing the check-hybrid codes. First, based on the knowledge of the trapping sets of the global LDPC code, single parity checks are replaced by super checks to disable the trapping sets. We show that by converting specific single check nodes, denoted critical checks, to super checks in a trapping set, the parallel bit flipping (PBF) decoder corrects the errors on a trapping set and hence eliminates the trapping set. The second goal is to minimize the rate loss caused by introducing the super checks, by finding the minimum number of such critical checks. We also present an algorithm to find critical checks in a trapping set of a column-weight 3 LDPC code and then provide upper bounds on the minimum number of such critical checks such that the decoder corrects all error patterns on elementary trapping sets. Moreover, we provide a fixed set for a class of constructed check-hybrid codes. The guaranteed error correction capability of the CH-GLDPC codes is also studied. We show that a CH-GLDPC code in which each variable node is connected to 2 super checks corresponding to a 2-error correcting component code corrects up to 5 errors. The results are also extended to column-weight 4 LDPC codes. Finally, we investigate the elimination of trapping sets of a column-weight 3 LDPC code under the Gallager B decoding algorithm and generalize the results obtained for the PBF decoder to the Gallager B decoding algorithm.
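
    For reference, a minimal parallel bit-flipping decoder might look like the sketch below, which in each iteration simultaneously flips every bit whose unsatisfied checks outnumber its satisfied ones. The parity-check matrix is a toy example, not a CH-GLDPC construction.

        # Parallel bit flipping (PBF), majority-rule variant, on a toy code.
        import numpy as np

        H = np.array([[1, 1, 0, 1, 0, 0],
                      [0, 1, 1, 0, 1, 0],
                      [1, 0, 0, 0, 1, 1],
                      [0, 0, 1, 1, 0, 1]])

        def pbf(y, max_iter=10):
            x = y.copy()
            for _ in range(max_iter):
                syn = H @ x % 2                  # unsatisfied checks
                if not syn.any():
                    break                        # valid codeword reached
                unsat = H.T @ syn                # per-bit unsatisfied count
                deg = H.sum(axis=0)              # per-bit check degree
                x = (x + (2 * unsat > deg)) % 2  # flip all majority-unsat bits
            return x

        y = np.array([1, 0, 0, 0, 0, 0])         # all-zero word, one error
        print(pbf(y))                            # -> [0 0 0 0 0 0]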

    On Pseudocodewords and Improved Union Bound of Linear Programming Decoding of HDPC Codes

    In this paper, we present an improved union bound on the Linear Programming (LP) decoding performance of binary linear codes transmitted over an additive white Gaussian noise channel. The bounding technique is based on a second-order Bonferroni-type inequality in probability theory, and the bound is minimized using Prim's minimum spanning tree algorithm. The bound calculation needs the fundamental cone generators of a given parity-check matrix rather than only their weight spectrum, but involves relatively low computational complexity. It is targeted at high-density parity-check codes, where the number of generators is extremely large and the generators are spread densely in Euclidean space. We explore the generator density and compare different parity-check matrix representations; this density determines how much the proposed bound improves on the conventional LP union bound. The paper also presents a complete pseudo-weight distribution of the fundamental cone generators for the BCH[31,21,5] code.
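
    The second-order Bonferroni (Hunter-type) bound can be sketched as follows: the plain union bound is tightened by subtracting the pairwise-intersection weight of a best spanning tree over the error events, found here by Prim-style greedy growth of a maximum-weight tree (equivalently, Prim's minimum spanning tree on negated weights). All probabilities below are invented placeholders.

        # Hunter's bound: P(union A_i) <= sum_i P(A_i)
        #                 - max over spanning trees T of
        #                   sum over (i,j) in T of P(A_i & A_j).
        def hunter_bound(p, pj):
            """p[i] = P(A_i); pj[i][j] = P(A_i & A_j) (symmetric)."""
            n = len(p)
            in_tree, subtracted = {0}, 0.0
            while len(in_tree) < n:       # Prim-style greedy tree growth
                i, j = max(((a, b) for a in in_tree
                            for b in range(n) if b not in in_tree),
                           key=lambda e: pj[e[0]][e[1]])
                subtracted += pj[i][j]
                in_tree.add(j)
            return sum(p) - subtracted

        p  = [0.10, 0.08, 0.05]
        pj = [[0,     0.02,  0.01],
              [0.02,  0,     0.015],
              [0.01,  0.015, 0]]
        print(hunter_bound(p, pj))        # 0.195 < union bound 0.23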