Capacity Upper Bounds for Deletion-Type Channels
We develop a systematic approach, based on convex programming and real
analysis, for obtaining upper bounds on the capacity of the binary deletion
channel and, more generally, channels with i.i.d. insertions and deletions.
Other than the classical deletion channel, we give special attention to the
Poisson-repeat channel introduced by Mitzenmacher and Drinea (IEEE Transactions
on Information Theory, 2006). Our framework can be applied to obtain capacity
upper bounds for any repetition distribution (the deletion and Poisson-repeat
channels corresponding to the special cases of Bernoulli and Poisson
distributions). Our techniques essentially reduce the task of proving capacity
upper bounds to maximizing a univariate, real-valued, and often concave
function over a bounded interval. We show the following:
1. The capacity of the binary deletion channel with deletion probability $d$
is at most $(1-d)\log\varphi$ for $d \geq 1/2$, and, assuming the capacity
function is convex, is at most $1-d\log(4/\varphi)$ for $d < 1/2$, where
$\varphi = (1+\sqrt{5})/2$ is the golden ratio. This is the first nontrivial
capacity upper bound for any value of $d$ outside the limiting case $d \to 0$
that is fully explicit and proved without computer assistance.
2. We derive the first set of capacity upper bounds for the Poisson-repeat
channel.
3. We derive several novel upper bounds on the capacity of the deletion
channel. All upper bounds are maximums of efficiently computable, and concave,
univariate real functions over a bounded domain. In turn, we upper bound these
functions in terms of explicit elementary and standard special functions, whose
maximums can be found even more efficiently (and sometimes, analytically, for
example for $d = 1/2$).
Along the way, we develop several new techniques of potentially independent
interest in information theory, probability, and mathematical analysis.
Comment: Minor edits, In Proceedings of 50th Annual ACM SIGACT Symposium on
the Theory of Computing (STOC), 201
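As a rough illustration of item 1 above, the explicit bound can be evaluated numerically. This is only a sketch under stated assumptions: the formulas $(1-d)\log\varphi$ for $d \geq 1/2$ and $1-d\log(4/\varphi)$ for $d < 1/2$ are taken from the published abstract, the logarithm is assumed to be base 2 (capacity in bits per channel use), and the function name `deletion_capacity_upper_bound` is hypothetical.

```python
import math

# Golden ratio, phi = (1 + sqrt(5)) / 2.
PHI = (1 + math.sqrt(5)) / 2


def deletion_capacity_upper_bound(d: float) -> float:
    """Explicit upper bound on binary deletion channel capacity, in bits.

    Assumption: log is base 2. The branch for d < 1/2 is only valid if the
    capacity function is convex, as the abstract states.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("deletion probability must lie in [0, 1]")
    if d >= 0.5:
        # Unconditional bound: (1 - d) * log2(phi).
        return (1 - d) * math.log2(PHI)
    # Conditional bound (assumes convexity): 1 - d * log2(4 / phi).
    return 1 - d * math.log2(4 / PHI)


if __name__ == "__main__":
    for d in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"d = {d:.1f}: capacity <= {deletion_capacity_upper_bound(d):.4f}")
```

The two branches agree at $d = 1/2$, and both stay below the trivial erasure-channel bound $1-d$.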
An Upper Bound on the Capacity of non-Binary Deletion Channels
We derive an upper bound on the capacity of non-binary deletion channels.
Although binary deletion channels have received significant attention over the
years, and many upper and lower bounds on their capacity have been derived,
such studies for the non-binary case are largely missing. The state of the art
is the following: the capacity of an erasure channel with the same input
alphabet as the deletion channel serves as a trivial upper bound, and the
results by Diggavi and Grossglauser provide a lower bound. In this paper, we
derive the first non-trivial non-binary deletion channel capacity upper bound
and reduce the gap with the existing achievable rates. To derive the results we
first prove an inequality between the capacity of a 2K-ary deletion channel
with deletion probability $d$, denoted by $C_{2K}(d)$, and the capacity of the
binary deletion channel with the same deletion probability, $C_2(d)$, that is,
$C_{2K}(d) \leq C_2(d) + (1-d)\log(K)$. Then by employing some existing upper
bounds on the capacity of the binary deletion channel, we obtain upper bounds
on the capacity of the 2K-ary deletion channel. We illustrate via examples the
use of the new bounds and discuss their asymptotic behavior as $d \to 0$.
Comment: accepted for presentation in ISIT 201
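The way a binary bound is lifted to the 2K-ary channel can be sketched as follows. The sketch assumes the inequality takes the form $C_{2K}(d) \leq C_2(d) + (1-d)\log(K)$ with base-2 logarithms, as in the published abstract. All function names are hypothetical, and the binary bound plugged in here, $(1-d)\log_2\varphi$ for $d \geq 1/2$, is just one example of an existing binary upper bound, not necessarily the one the authors use.

```python
import math
from typing import Callable


def erasure_bound(d: float, alphabet_size: int) -> float:
    """Trivial upper bound: capacity of an erasure channel, in bits."""
    return (1 - d) * math.log2(alphabet_size)


def nonbinary_bound(d: float, K: int, binary_bound: Callable[[float], float]) -> float:
    """Assumed form of the inequality: C_2K(d) <= C_2(d) + (1 - d) * log2(K)."""
    return binary_bound(d) + (1 - d) * math.log2(K)


def golden_ratio_binary_bound(d: float) -> float:
    """Example binary bound (1 - d) * log2(phi); assumed valid only for d >= 1/2."""
    phi = (1 + math.sqrt(5)) / 2
    return (1 - d) * math.log2(phi)


if __name__ == "__main__":
    d, K = 0.6, 2  # 4-ary deletion channel
    print("trivial erasure bound:", erasure_bound(d, 2 * K))
    print("lifted binary bound:  ", nonbinary_bound(d, K, golden_ratio_binary_bound))
```

Plugging the trivial binary bound $1-d$ into `nonbinary_bound` recovers the erasure bound exactly, so any improvement comes entirely from the quality of the binary bound supplied.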
A Note on the Deletion Channel Capacity
Memoryless channels with deletion errors, as defined by a stochastic channel
matrix allowing for bit dropouts, are considered, in which transmitted bits are
either independently deleted with probability $d$ or unchanged with probability
$1-d$. Such channels are information stable, hence their Shannon capacity
exists. However, computation of the channel capacity is formidable, and only
some upper and lower bounds on the capacity exist. In this paper, we first show
a simple result that the parallel concatenation of two different independent
deletion channels with deletion probabilities $d_1$ and $d_2$, in which every
input bit is either transmitted over the first channel with probability
$\lambda$ or over the second one with probability $1-\lambda$, is nothing
but another deletion channel with deletion probability
$d = \lambda d_1 + (1-\lambda) d_2$. We then provide an upper bound on the
concatenated deletion channel capacity $C(d)$ in terms of the weighted average
of $C(d_1)$, $C(d_2)$ and the parameters of the three channels. An interesting
consequence of this bound is that
$C(\lambda d_1 + (1-\lambda)) \leq \lambda C(d_1)$, which
enables us to provide an improved upper bound on the capacity of the i.i.d.
deletion channels, i.e., $C(d) \leq 0.4143(1-d)$ for $d \geq 0.65$. This
generalizes the asymptotic result by Dalai as it remains valid for all
$d \geq 0.65$. Using the same approach we are also able to improve upon
existing upper bounds on the capacity of the deletion/substitution channel.
Comment: Submitted to the IEEE Transactions on Information Theor
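The parallel-concatenation claim can be checked empirically with a small Monte Carlo sketch. It assumes the mixing probability is $\lambda$ and the resulting deletion probability is $d = \lambda d_1 + (1-\lambda) d_2$. The function names are hypothetical, and the simulation only checks the marginal deletion rate; independence across bits holds by construction.

```python
import random
from typing import List


def mixed_deletion_channel(bits: List[int], d1: float, d2: float,
                           lam: float, rng: random.Random) -> List[int]:
    """Send each bit over channel 1 (prob. lam) or channel 2 (prob. 1 - lam)."""
    received = []
    for b in bits:
        # Pick the deletion probability of the channel this bit is routed to.
        d = d1 if rng.random() < lam else d2
        # The chosen channel deletes the bit with probability d.
        if rng.random() >= d:
            received.append(b)
    return received


def empirical_deletion_rate(n: int, d1: float, d2: float,
                            lam: float, seed: int = 0) -> float:
    """Fraction of n random input bits deleted by the mixed channel."""
    rng = random.Random(seed)
    bits = [rng.getrandbits(1) for _ in range(n)]
    received = mixed_deletion_channel(bits, d1, d2, lam, rng)
    return 1 - len(received) / n


if __name__ == "__main__":
    d1, d2, lam = 0.2, 0.8, 0.3
    print("predicted:", lam * d1 + (1 - lam) * d2)
    print("empirical:", empirical_deletion_rate(200_000, d1, d2, lam))
```

With a large number of input bits, the empirical rate should sit close to the predicted $\lambda d_1 + (1-\lambda) d_2$.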
Write Channel Model for Bit-Patterned Media Recording
We propose a new write channel model for bit-patterned media recording that
reflects the data dependence of write synchronization errors. It is shown that
this model accommodates both substitution-like errors and insertion-deletion
errors whose statistics are determined by an underlying channel state process.
We study information theoretic properties of the write channel model, including
the capacity, symmetric information rate, Markov-1 rate and the zero-error
capacity.
Comment: 11 pages, 12 figures, journa
Models and information-theoretic bounds for nanopore sequencing
Nanopore sequencing is an emerging technology for sequencing DNA, which
can read long fragments of DNA (~50,000 bases) in contrast to most current
short-read sequencing technologies which can only read hundreds of bases. While
nanopore sequencers can acquire long reads, the high error rates (20%-30%) pose
a technical challenge. In a nanopore sequencer, a DNA strand is passed through a
nanopore and current variations are measured. The DNA sequence is inferred from
this observed current pattern using an algorithm called a base-caller. In this
paper, we propose a mathematical model for the "channel" from the input DNA
sequence to the observed current, and calculate bounds on the information
extraction capacity of the nanopore sequencer. This model incorporates
impairments like (non-linear) inter-symbol interference, deletions, as well as
random response. These information bounds have a two-fold application: (1) The
decoding rate with a uniform input distribution can be used to calculate the
average size of the plausible list of DNA sequences given an observed current
trace. This bound can be used to benchmark existing base-calling algorithms, as
well as to serve as a performance objective for designing better nanopores. (2) When
the nanopore sequencer is used as a reader in a DNA storage system, the storage
capacity is quantified by our bounds.
Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers
Nanopore sequencers are emerging as promising new platforms for
high-throughput sequencing. As with other technologies, sequencer errors pose a
major challenge for their effective use. In this paper, we present a novel
information theoretic analysis of the impact of insertion-deletion (indel)
errors in nanopore sequencers. In particular, we consider the following
problems: (i) for given indel error characteristics and rate, what is the
probability of accurate reconstruction as a function of sequence length; (ii)
what is the number of 'typical' sequences within the distortion bound induced
by indel errors; (iii) using replicated extrusion (the process of passing a DNA
strand through the nanopore), what is the number of replicas needed to reduce
the distortion bound so that only one typical sequence exists within the
distortion bound.
Our results provide a number of important insights: (i) the maximum length of
a sequence that can be accurately reconstructed in the presence of indel and
substitution errors is relatively small; (ii) the number of typical sequences
within the distortion bound is large; and (iii) replicated extrusion is an
effective technique for unique reconstruction. In particular, we show that the
number of replicas is a slow function (logarithmic) of sequence length --
implying that through replicated extrusion, we can sequence large reads using
nanopore sequencers. Our model considers indel and substitution errors
separately. In this sense, it can be viewed as providing (tight) bounds on
reconstruction lengths and repetitions for accurate reconstruction when the two
error modes are considered in a single model.
Comment: 12 pages, 5 figure