
An Upper Bound on the Capacity of non-Binary Deletion Channels

Author: Duman Tolga M.
Rahmati Mojtaba
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

We derive an upper bound on the capacity of non-binary deletion channels. Although binary deletion channels have received significant attention over the years, and many upper and lower bounds on their capacity have been derived, such studies for the non-binary case are largely missing. The state of the art is the following: as a trivial upper bound, the capacity of an erasure channel with the same input alphabet as the deletion channel can be used, and as a lower bound the results by Diggavi and Grossglauser are available. In this paper, we derive the first non-trivial non-binary deletion channel capacity upper bound and reduce the gap with the existing achievable rates. To derive the results we first prove an inequality between the capacity of a $2K$-ary deletion channel with deletion probability $d$, denoted by $C_{2K}(d)$, and the capacity of the binary deletion channel with the same deletion probability, $C_2(d)$, that is, $C_{2K}(d) \leq C_2(d) + (1-d)\log(K)$. Then by employing some existing upper bounds on the capacity of the binary deletion channel, we obtain upper bounds on the capacity of the $2K$-ary deletion channel. We illustrate via examples the use of the new bounds and discuss their asymptotic behavior as $d \rightarrow 0$. Comment: accepted for presentation in ISIT 201
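As a rough illustration of how the inequality in this abstract could be applied, the sketch below turns any upper bound on the binary capacity $C_2(d)$ into an upper bound on $C_{2K}(d)$ and compares it with the trivial erasure-channel bound. The function names, the base-2 logarithm (capacity in bits per channel use), and the numeric value passed as `c2_upper` in the last call are assumptions made for the example; they are not taken from the paper.

```python
import math


def erasure_upper_bound(d: float, alphabet_size: int) -> float:
    """Trivial bound: capacity (in bits) of an erasure channel with the same alphabet."""
    return (1.0 - d) * math.log2(alphabet_size)


def nonbinary_upper_bound(d: float, K: int, c2_upper: float) -> float:
    """Bound on C_{2K}(d) via C_{2K}(d) <= C_2(d) + (1 - d) * log2(K).

    c2_upper is any valid upper bound on the binary capacity C_2(d),
    supplied by the caller (assumed to be in bits).
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("deletion probability d must lie in [0, 1]")
    if K < 1:
        raise ValueError("K must be a positive integer")
    return c2_upper + (1.0 - d) * math.log2(K)


d, K = 0.1, 2  # a 4-ary deletion channel with deletion probability 0.1

trivial = erasure_upper_bound(d, 2 * K)

# Plugging in the trivial binary bound C_2(d) <= 1 - d recovers the erasure bound.
from_trivial_binary = nonbinary_upper_bound(d, K, c2_upper=1.0 - d)

# Hypothetical placeholder for a tighter binary bound; NOT a published value.
placeholder_c2_upper = 0.5
from_tighter_binary = nonbinary_upper_bound(d, K, c2_upper=placeholder_c2_upper)

print(trivial, from_trivial_binary, from_tighter_binary)
```

With the trivial binary bound $1-d$, the result equals $(1-d)\log_2(2K)$, so any improvement over the erasure bound comes entirely from using a tighter bound on $C_2(d)$.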

arXiv.org e-Print Archive

Bilkent University Institutional Repository

Efficient File Synchronization: a Distributed Source Coding Approach

Author: Ma Nan
Ramchandran Kannan
Tse David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/07/2011

The problem of reconstructing a source sequence in the presence of decoder side-information that is mis-synchronized to the source due to deletions is studied in a distributed source coding framework. Motivated by practical applications, the deletion process is assumed to be bursty and is modeled by a Markov chain. The minimum rate needed to reconstruct the source sequence with high probability is characterized in terms of an information theoretic expression, which is interpreted as the amount of information in the deleted content and the locations of deletions, minus "nature's secret", that is, the uncertainty of the locations given the source and side-information. For small bursty deletion probability, the asymptotic expansion of the minimum rate is computed. Comment: 9 pages, 2 figures. A shorter version will appear in IEEE International Symposium on Information Theory (ISIT), 201
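To make the bursty deletion model in this abstract more concrete, here is a minimal simulation sketch. It assumes a two-state Markov chain (deleting or not deleting) with hypothetical parameters `p_start` (probability that a burst begins) and `p_continue` (probability that a burst continues); the abstract does not specify the chain, so this parameterization is only one plausible reading.

```python
import random


def bursty_deletion(source, p_start, p_continue, rng):
    """Pass `source` through a two-state Markov deletion process.

    Assumed model (not from the paper): each symbol is deleted with
    probability p_start if the previous symbol was kept, and with
    probability p_continue if the previous symbol was deleted.
    Returns (kept_symbols, deletion_mask).
    """
    kept = []
    mask = []
    deleting = False  # state of the chain before the first symbol
    for symbol in source:
        p_delete = p_continue if deleting else p_start
        deleting = rng.random() < p_delete
        mask.append(deleting)
        if not deleting:
            kept.append(symbol)
    return kept, mask


source = list(range(50))
side_info, mask = bursty_deletion(source, p_start=0.1, p_continue=0.8,
                                  rng=random.Random(1))
print(len(source) - len(side_info), "symbols deleted")
```

Here `side_info` plays the role of the mis-synchronized decoder side-information, and `mask` records the deletion locations, which the decoder does not observe directly.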

arXiv.org e-Print Archive

Crossref

Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers

Author: Duda Jarek
Grama Ananth
Szpankowski Wojciech
Publication date: 11/01/2016

Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) what is the number of 'typical' sequences within the distortion bound induced by indel errors; (iii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to reduce the distortion bound so that only one typical sequence exists within the distortion bound. Our results provide a number of important insights: (i) the maximum length of a sequence that can be accurately reconstructed in the presence of indel and substitution errors is relatively small; (ii) the number of typical sequences within the distortion bound is large; and (iii) replicated extrusion is an effective technique for unique reconstruction. In particular, we show that the number of replicas is a slowly growing (logarithmic) function of sequence length, implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Our model considers indel and substitution errors separately. In this sense, it can be viewed as providing (tight) bounds on reconstruction lengths and repetitions for accurate reconstruction when the two error modes are considered in a single model. Comment: 12 pages, 5 figure

arXiv.org e-Print Archive

CiteSeerX