
    Capacity Upper Bounds for Deletion-Type Channels

    We develop a systematic approach, based on convex programming and real analysis, for obtaining upper bounds on the capacity of the binary deletion channel and, more generally, channels with i.i.d. insertions and deletions. Besides the classical deletion channel, we give special attention to the Poisson-repeat channel introduced by Mitzenmacher and Drinea (IEEE Transactions on Information Theory, 2006). Our framework can be applied to obtain capacity upper bounds for any repetition distribution (the deletion and Poisson-repeat channels correspond to the special cases of Bernoulli and Poisson distributions). Our techniques essentially reduce the task of proving capacity upper bounds to maximizing a univariate, real-valued, and often concave function over a bounded interval. We show the following: 1. The capacity of the binary deletion channel with deletion probability $d$ is at most $(1-d)\log\varphi$ for $d\geq 1/2$, and, assuming the capacity function is convex, is at most $1-d\log(4/\varphi)$ for $d<1/2$, where $\varphi=(1+\sqrt{5})/2$ is the golden ratio. This is the first nontrivial capacity upper bound for any value of $d$ outside the limiting case $d\to 0$ that is fully explicit and proved without computer assistance. 2. We derive the first set of capacity upper bounds for the Poisson-repeat channel. 3. We derive several novel upper bounds on the capacity of the deletion channel. All upper bounds are maxima of efficiently computable, concave, univariate real functions over a bounded domain. In turn, we upper bound these functions in terms of explicit elementary and standard special functions, whose maxima can be found even more efficiently (and sometimes analytically, for example for $d=1/2$).
Along the way, we develop several new techniques of potentially independent interest in information theory, probability, and mathematical analysis. Comment: Minor edits, In Proceedings of 50th Annual ACM SIGACT Symposium on the Theory of Computing (STOC), 201
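The explicit bound in item 1 of the abstract above can be evaluated directly. The following is a minimal sketch, not the authors' code: the function name `explicit_deletion_bound` is hypothetical, and logarithms are assumed to be base 2 so that the result is in bits per channel use. The branch for $d<1/2$ is only valid under the convexity assumption stated in the abstract.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def explicit_deletion_bound(d: float) -> float:
    """Explicit capacity upper bound for the binary deletion channel.

    Assumption: logs are base 2 (capacity in bits per channel use).
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("deletion probability must lie in [0, 1]")
    if d >= 0.5:
        # Unconditional branch: (1 - d) * log(phi)
        return (1 - d) * math.log2(PHI)
    # Conditional branch (assumes the capacity function is convex):
    # 1 - d * log(4 / phi)
    return 1 - d * math.log2(4 / PHI)


if __name__ == "__main__":
    for d in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(f"d = {d:.2f}  bound = {explicit_deletion_bound(d):.4f}")
```

The two branches agree at $d=1/2$, since $1-\tfrac{1}{2}\log(4/\varphi)=\tfrac{1}{2}\log\varphi$ when logs are base 2, so the piecewise bound is continuous there.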

    An Upper Bound on the Capacity of non-Binary Deletion Channels

    We derive an upper bound on the capacity of non-binary deletion channels. Although binary deletion channels have received significant attention over the years, and many upper and lower bounds on their capacity have been derived, such studies for the non-binary case are largely missing. The state of the art is the following: as a trivial upper bound, the capacity of an erasure channel with the same input alphabet as the deletion channel can be used, and as a lower bound the results by Diggavi and Grossglauser are available. In this paper, we derive the first non-trivial non-binary deletion channel capacity upper bound and reduce the gap with the existing achievable rates. To derive the results we first prove an inequality between the capacity of a 2K-ary deletion channel with deletion probability $d$, denoted by $C_{2K}(d)$, and the capacity of the binary deletion channel with the same deletion probability, $C_2(d)$, that is, $C_{2K}(d)\leq C_2(d)+(1-d)\log(K)$. Then by employing some existing upper bounds on the capacity of the binary deletion channel, we obtain upper bounds on the capacity of the 2K-ary deletion channel. We illustrate via examples the use of the new bounds and discuss their asymptotic behavior as $d \rightarrow 0$. Comment: accepted for presentation in ISIT 201
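The inequality $C_{2K}(d)\leq C_2(d)+(1-d)\log(K)$ turns any upper bound on the binary capacity into one for the 2K-ary channel. A minimal sketch of that step follows, with hypothetical function names and logarithms assumed to be base 2. The binary bound is passed in as a number, because the abstract does not say which existing binary bounds the authors used.

```python
import math


def nonbinary_upper_bound(binary_bound: float, d: float, K: int) -> float:
    """Bound on C_{2K}(d) from any valid upper bound on C_2(d).

    Implements C_{2K}(d) <= C_2(d) + (1 - d) * log(K), logs base 2 (assumed).
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("deletion probability must lie in [0, 1]")
    if K < 1:
        raise ValueError("K must be a positive integer")
    return binary_bound + (1 - d) * math.log2(K)


def erasure_upper_bound(d: float, alphabet_size: int) -> float:
    """Trivial bound: capacity of an erasure channel with the same alphabet."""
    return (1 - d) * math.log2(alphabet_size)


if __name__ == "__main__":
    d, K = 0.7, 2  # 4-ary deletion channel
    trivial_binary = 1 - d  # trivial bound on C_2(d)
    print(nonbinary_upper_bound(trivial_binary, d, K))  # matches the erasure bound
    print(erasure_upper_bound(d, 2 * K))
```

Plugging in the trivial binary bound $1-d$ recovers the erasure bound $(1-d)\log(2K)$ exactly, so the gain over the trivial 2K-ary bound equals whatever gain the chosen binary bound has over $1-d$.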

    A Note on the Deletion Channel Capacity

    We consider memoryless channels with deletion errors, defined by a stochastic channel matrix allowing for bit dropouts, in which transmitted bits are either independently deleted with probability $d$ or left unchanged with probability $1-d$. Such channels are information stable, hence their Shannon capacity exists. However, computation of the channel capacity is formidable, and only some upper and lower bounds on the capacity exist. In this paper, we first show a simple result that the parallel concatenation of two different independent deletion channels with deletion probabilities $d_1$ and $d_2$, in which every input bit is transmitted either over the first channel with probability $\lambda$ or over the second one with probability $1-\lambda$, is nothing but another deletion channel with deletion probability $d=\lambda d_1+(1-\lambda)d_2$. We then provide an upper bound on the concatenated deletion channel capacity $C(d)$ in terms of the weighted average of $C(d_1)$, $C(d_2)$ and the parameters of the three channels. An interesting consequence of this bound is that $C(\lambda d_1+(1-\lambda))\leq \lambda C(d_1)$, which enables us to provide an improved upper bound on the capacity of the i.i.d. deletion channels, i.e., $C(d)\leq 0.4143(1-d)$ for $d\geq 0.65$. This generalizes the asymptotic result by Dalai, as it remains valid for all $d\geq 0.65$. Using the same approach we are also able to improve upon existing upper bounds on the capacity of the deletion/substitution channel. Comment: Submitted to the IEEE Transactions on Information Theor
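The consequence $C(\lambda d_1+(1-\lambda))\leq \lambda C(d_1)$ is the case $d_2=1$ of the concatenation result. It extends a bound at one deletion probability $d_1$ linearly to every $d\geq d_1$: solving $d=\lambda d_1+(1-\lambda)$ gives $\lambda=(1-d)/(1-d_1)$. A minimal sketch with hypothetical function names follows. The starting value $0.4143(1-0.65)$ at $d_1=0.65$ is an assumption made for illustration, chosen to be consistent with the stated result.

```python
def mixed_deletion_probability(lam: float, d1: float, d2: float) -> float:
    """Deletion probability of the parallel concatenation of two channels."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * d1 + (1 - lam) * d2


def extend_bound(bound_at_d1: float, d1: float, d: float) -> float:
    """Bound on C(d) for d >= d1 implied by an upper bound on C(d1).

    Uses C(lam*d1 + (1 - lam)) <= lam * C(d1) with lam = (1 - d) / (1 - d1).
    """
    if not (0.0 <= d1 < 1.0 and d1 <= d <= 1.0):
        raise ValueError("requires 0 <= d1 < 1 and d1 <= d <= 1")
    lam = (1 - d) / (1 - d1)
    return lam * bound_at_d1


if __name__ == "__main__":
    d1 = 0.65
    bound_at_d1 = 0.4143 * (1 - d1)  # illustrative starting point (assumed)
    for d in (0.65, 0.8, 0.95):
        print(f"d = {d:.2f}  C(d) <= {extend_bound(bound_at_d1, d1, d):.4f}")
```

With this starting value the extension reproduces $0.4143(1-d)$ for every $d\geq 0.65$, which is why a single bound at $d_1$ yields a bound over the whole interval.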

    Write Channel Model for Bit-Patterned Media Recording

    We propose a new write channel model for bit-patterned media recording that reflects the data dependence of write synchronization errors. It is shown that this model accommodates both substitution-like errors and insertion-deletion errors whose statistics are determined by an underlying channel state process. We study information theoretic properties of the write channel model, including the capacity, symmetric information rate, Markov-1 rate and the zero-error capacity. Comment: 11 pages, 12 figures, journa

    Models and information-theoretic bounds for nanopore sequencing

    Nanopore sequencing is an emerging technology for sequencing DNA, which can read long fragments of DNA (~50,000 bases), in contrast to most current short-read sequencing technologies, which can only read hundreds of bases. While nanopore sequencers can acquire long reads, the high error rates (20%-30%) pose a technical challenge. In a nanopore sequencer, a DNA strand is migrated through a nanopore and current variations are measured. The DNA sequence is inferred from this observed current pattern using an algorithm called a base-caller. In this paper, we propose a mathematical model for the "channel" from the input DNA sequence to the observed current, and calculate bounds on the information extraction capacity of the nanopore sequencer. This model incorporates impairments like (non-linear) inter-symbol interference, deletions, as well as random response. These information bounds have a two-fold application: (1) The decoding rate with a uniform input distribution can be used to calculate the average size of the plausible list of DNA sequences given an observed current trace. This bound can be used to benchmark existing base-calling algorithms, as well as serving as a performance objective to design better nanopores. (2) When the nanopore sequencer is used as a reader in a DNA storage system, the storage capacity is quantified by our bounds.

    Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers

    Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) what is the number of 'typical' sequences within the distortion bound induced by indel errors; (iii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to reduce the distortion bound so that only one typical sequence exists within the distortion bound. Our results provide a number of important insights: (i) the maximum length of a sequence that can be accurately reconstructed in the presence of indel and substitution errors is relatively small; (ii) the number of typical sequences within the distortion bound is large; and (iii) replicated extrusion is an effective technique for unique reconstruction. In particular, we show that the number of replicas is a slow (logarithmic) function of sequence length, implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Our model considers indel and substitution errors separately. In this sense, it can be viewed as providing (tight) bounds on reconstruction lengths and repetitions for accurate reconstruction when the two error modes are considered in a single model. Comment: 12 pages, 5 figure