A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics
From the output produced by a memoryless deletion channel acting on a uniformly
random input of known length n, one obtains a posterior distribution on the
channel input. The difference between the Shannon entropy of this distribution
and that of the uniform prior measures the amount of information about the
channel input conveyed by an output of length m, and it is natural to ask for
which outputs this is extremized. This question was posed in a previous work,
where it was conjectured on the basis of experimental data that the entropy of
the posterior is minimized by the constant strings 000... and 111... and
maximized by the alternating strings 0101... and 1010..., respectively. In the
present work we confirm the minimization conjecture in the asymptotic limit
using results from hidden word statistics. We show how the
analytic-combinatorial methods of Flajolet, Szpankowski and Vallée for the
hidden pattern matching problem can be applied to resolve the case of fixed
output length m and n → ∞, by obtaining estimates for the entropy in terms of
the moments of the posterior distribution and establishing its minimization via
a measure of autocorrelation.
Comment: 11 pages, 2 figures
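For tiny sizes the posterior in question can be computed by brute force. The sketch below is illustrative only (the binary alphabet, the input length n = 8, and the output length m = 4 are assumptions chosen for tractability): each candidate input is weighted by its number of subsequence embeddings of the output, which is proportional to the posterior under a uniform prior, and the resulting entropies for a constant versus an alternating output are compared.

```python
from itertools import product
from math import log2

def embeddings(x, y):
    # Number of ways y occurs as a subsequence of x (standard DP).
    dp = [1] + [0] * len(y)
    for c in x:
        for j in range(len(y), 0, -1):
            if y[j - 1] == c:
                dp[j] += dp[j - 1]
    return dp[len(y)]

def posterior_entropy(y, n):
    # Posterior over uniformly random binary inputs of length n given an
    # observed deletion-channel output y: P(x | y) is proportional to the
    # number of embeddings of y in x (the per-deletion-pattern probability
    # factors cancel for a fixed output length).
    weights = [embeddings("".join(x), y) for x in product("01", repeat=n)]
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights if w)

h_const = posterior_entropy("0000", 8)  # constant output
h_alt = posterior_entropy("0101", 8)    # alternating output
```

Already at these toy sizes the constant output concentrates the posterior on inputs with many zeros, while the alternating output spreads it out, matching the conjectured minimization/maximization.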
Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers
Nanopore sequencers are emerging as promising new platforms for
high-throughput sequencing. As with other technologies, sequencer errors pose a
major challenge for their effective use. In this paper, we present a novel
information theoretic analysis of the impact of insertion-deletion (indel)
errors in nanopore sequencers. In particular, we consider the following
problems: (i) for given indel error characteristics and rate, what is the
probability of accurate reconstruction as a function of sequence length; (ii)
what is the number of `typical' sequences within the distortion bound induced
by indel errors; (iii) using replicated extrusion (the process of passing a DNA
strand through the nanopore), what is the number of replicas needed to reduce
the distortion bound so that only one typical sequence exists within the
distortion bound.
Our results provide a number of important insights: (i) the maximum length of
a sequence that can be accurately reconstructed in the presence of indel and
substitution errors is relatively small; (ii) the number of typical sequences
within the distortion bound is large; and (iii) replicated extrusion is an
effective technique for unique reconstruction. In particular, we show that the
number of replicas is a slow function (logarithmic) of sequence length --
implying that through replicated extrusion, we can sequence large reads using
nanopore sequencers. Our model considers indel and substitution errors
separately. In this sense, it can be viewed as providing (tight) bounds on
reconstruction lengths and repetitions for accurate reconstruction when the two
error modes are considered in a single model.
Comment: 12 pages, 5 figures
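Problem (ii) above, counting the sequences within the deletion-induced distortion bound, can be made tangible with a brute-force sketch (the binary alphabet, string length 16, and t = 2 deletions are illustrative choices, not the paper's setting):

```python
def deletion_ball(x, t):
    # All distinct strings reachable from x by exactly t deletions,
    # computed by applying one deletion at a time to the running set.
    cur = {x}
    for _ in range(t):
        cur = {s[:i] + s[i + 1:] for s in cur for i in range(len(s))}
    return cur

alt = "01" * 8                 # alternating string of length 16
ball = deletion_ball(alt, 2)   # 2-deletion distortion ball
```

Even for this short alternating string the 2-deletion ball contains on the order of a hundred distinct sequences, whereas a constant string's ball collapses to a single sequence, illustrating why the typical-set count in (ii) is large for most sequences.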
Symbol Synchronization for Diffusive Molecular Communication Systems
Symbol synchronization refers to the estimation of the start of a symbol
interval and is needed for reliable detection. In this paper, we develop a
symbol synchronization framework for molecular communication (MC) systems where
we consider some practical challenges which have not been addressed in the
literature yet. In particular, we take into account that in MC systems, the
transmitter may not be equipped with an internal clock and may not be able to
emit molecules with a fixed release frequency. Such restrictions hold for
practical nanotransmitters, e.g. modified cells, where the lengths of the
symbol intervals may vary due to the inherent randomness in the availability of
food and energy for molecule generation, the process for molecule production,
and the release process. To address this issue, we propose to employ two types
of molecules, one for synchronization and one for data transmission. We derive
the optimal maximum likelihood (ML) symbol synchronization scheme as a
performance upper bound. Since ML synchronization entails high complexity, we
also propose two low-complexity synchronization schemes, namely a peak
observation-based scheme and a threshold-trigger scheme, which are suitable for
MC systems with limited computational capabilities. Our simulation results
reveal the effectiveness of the proposed synchronization~schemes and suggest
that the end-to-end performance of MC systems significantly depends on the
accuracy of symbol synchronization.
Comment: This paper has been accepted for presentation at the IEEE International Conference on Communications (ICC) 201
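The two low-complexity schemes mentioned above reduce to very simple detectors. A sketch on a synthetic received count sequence follows; the pulse shape, scaling, and threshold are illustrative assumptions, not the paper's channel model:

```python
import numpy as np

# Synthetic counts of synchronization molecules at the receiver: a
# diffusion-like pulse with Poisson counting noise (all parameters
# here are invented for illustration).
t = np.arange(1, 200, dtype=float)
mean_counts = 5e5 * t ** -1.5 * np.exp(-100.0 / t)
rng = np.random.default_rng(0)
observed = rng.poisson(mean_counts)

def peak_sync(counts):
    """Peak observation-based scheme: estimate the symbol start from the
    sample index of the maximum observed count (the channel's known
    time-to-peak is then subtracted)."""
    return int(np.argmax(counts))

def threshold_sync(counts, threshold):
    """Threshold-trigger scheme: declare detection at the first sample
    whose count reaches a preset threshold; no buffering needed."""
    above = np.flatnonzero(np.asarray(counts) >= threshold)
    return int(above[0]) if above.size else None
```

The threshold trigger fires earlier than the peak detector and needs no look-back, at the cost of sensitivity to the threshold choice; both avoid the likelihood computations of the ML scheme.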
Achievable Information Rates and Concatenated Codes for the DNA Nanopore Sequencing Channel
The errors occurring in DNA-based storage are correlated in nature, which is
a direct consequence of the synthesis and sequencing processes. In this paper,
we consider the finite-memory nanopore channel model recently introduced by
Hamoum et al., which captures the inherent memory of the channel. We derive the
maximum a posteriori (MAP) decoder for this channel model. The derived MAP
decoder allows us to compute achievable information rates for the true DNA
storage channel under a mismatched decoder matched to the finite-memory
nanopore channel model, and to quantify the performance loss incurred by
assuming a small memory length, and hence limited decoding complexity.
Furthermore, the derived MAP
decoder can be used to design error-correcting codes tailored to the DNA
storage channel. We show that a concatenated coding scheme with an outer
low-density parity-check code and an inner convolutional code yields excellent
performance.
Comment: This paper has been accepted and is awaiting publication at the Information Theory Workshop (ITW) 202
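The channel's "inherent memory" means the error statistics at one position depend on neighboring input symbols. The toy below is a heavily simplified stand-in, not the Hamoum et al. model: substitutions become more likely inside homopolymer runs, a well-known nanopore failure mode (the function and all probabilities are invented for illustration):

```python
import random

def memory_channel(bits, k, p_base=0.01, p_run=0.2, seed=0):
    """Toy substitution channel with memory: a bit is flipped with
    probability p_run when the previous k bits form a homopolymer run,
    and with probability p_base otherwise. A crude stand-in for
    context-dependent nanopore errors, not the actual channel model."""
    rng = random.Random(seed)
    out = []
    for i, b in enumerate(bits):
        ctx = bits[max(0, i - k):i]
        p = p_run if len(ctx) == k and len(set(ctx)) == 1 else p_base
        out.append(b ^ (rng.random() < p))   # flip with probability p
    return out
```

Because the error rate depends on the input context, a memoryless mismatched decoder underestimates the channel, which is exactly the gap the achievable-rate computation in the abstract quantifies.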
Error correction for asynchronous communication and probabilistic burst deletion channels
Short-range wireless communication with low-power small-size sensors has been broadly applied in many areas such as environmental observation and biomedical and health care monitoring. However, such applications require a wireless sensor operating in always-on mode, which increases the power consumption of sensors significantly. Asynchronous communication is an emerging low-power approach for these applications because, compared to Nyquist-based synchronous signal processing, it offers the potential for significant power savings when recording sparse continuous-time signals, together with a smaller hardware footprint and lower circuit complexity.
In this dissertation, the classical Nyquist-based synchronous signal sampling is replaced by asynchronous sampling strategies, i.e., sampling via level crossing (LC) sampling and time encoding. Novel forward error correction schemes for sensor communication based on these sampling strategies are proposed, where the dominant errors consist of pulse deletions and insertions, and where encoding is required to take place in an instantaneous fashion. For LC sampling the presented scheme consists of a combination of an outer systematic convolutional code, an embedded inner marker code, and power-efficient frequency-shift keying modulation at the sensor node. Decoding is first obtained via a maximum a posteriori (MAP) decoder for the inner marker code, which achieves synchronization for the insertion and deletion channel, followed by MAP decoding for the outer convolutional code. By iteratively decoding marker and convolutional codes along with interleaving, a significant reduction in the expected end-to-end distortion between original and reconstructed signals can be obtained compared to non-iterative processing. Besides investigating the rate trade-off between marker and convolutional codes, it is shown that residual redundancy in the asynchronously sampled source signal can be successfully exploited in combination with redundancy only from a marker code. This provides a new low-complexity alternative for deletion and insertion error correction compared to using explicit redundancy. For time encoding, only the pulse timing is of relevance at the receiver, and the outer channel code is replaced by a quantizer to represent the relative position of the pulse timing. Numerical simulations show that LC sampling outperforms time encoding in the low to moderate signal-to-noise ratio regime by a large margin.
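Level-crossing sampling, the first strategy above, emits an event only when the input crosses a level of an amplitude grid. A minimal sketch, assuming a uniform grid and one event per crossed level (the quantizer details in the dissertation may differ):

```python
import math

def lc_sample(signal, delta):
    """Level-crossing sampler: emit (sample index, +1/-1) each time the
    signal crosses the next level of a uniform grid with spacing delta.
    Only this sparse event stream is transmitted, not Nyquist samples."""
    events = []
    level = round(signal[0] / delta)          # nearest starting level
    for i, v in enumerate(signal[1:], start=1):
        while v >= (level + 1) * delta:       # upward crossing(s)
            level += 1
            events.append((i, +1))
        while v <= (level - 1) * delta:       # downward crossing(s)
            level -= 1
            events.append((i, -1))
    return events

# Toy input: one period of a sine, densely sampled.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
events = lc_sample(sine, 0.1)
```

Each event carries one direction bit plus its timing, which is exactly the pulse stream whose deletions and insertions the marker and convolutional codes described above must protect.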
In the second part of this dissertation, a new burst deletion correction scheme tailored to low-latency applications such as high-read/write-speed non-volatile memory is proposed. An exemplary version is given by racetrack memory, where information is stored in cells and data reading is performed by many read ports or heads. In order to read the information, the cells shift towards their closest heads in the same direction and at the same speed, so that a block of bits (i.e., a non-binary symbol) is read by multiple heads in parallel during each shift of the cells. If the cells shift by more than one cell location, consecutive (burst) non-binary symbol deletions occur.
In practical systems, the maximal length of consecutive non-binary deletions is limited. Existing schemes for this scenario leverage non-binary de Bruijn sequences to perfectly locate deletions. In contrast, this work proposes binary marker patterns in combination with a new soft-decision decoding scheme. In this scheme, deletions are soft-located by assigning a posteriori probabilities to the location of every burst deletion event, and the affected positions are replaced by erasures. The resulting errors are then corrected by an outer channel code. Compared to schemes based on non-binary de Bruijn sequences, the proposed scheme in general achieves a higher communication rate.
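A hard-decision caricature of the marker idea can show how a burst deletion is localized; the marker pattern, frame length, and burst position below are hypothetical, and the dissertation's decoder instead assigns a posteriori probabilities and emits erasures for an outer code:

```python
MARKER = [1, 1]          # illustrative marker pattern
PERIOD = 4               # data bits between consecutive markers

def insert_markers(data):
    # Append the marker after every PERIOD data bits.
    out = []
    for i in range(0, len(data), PERIOD):
        out += data[i:i + PERIOD] + MARKER
    return out

def locate_burst(received):
    """Return the index of the first frame whose marker bits mismatch,
    i.e. the frame in which the burst deletion is presumed to have
    occurred (all later markers are shifted out of place). Hard-decision
    sketch only; soft location would score every candidate position."""
    frame = PERIOD + len(MARKER)
    for j in range(len(received) // frame):
        pos = j * frame + PERIOD
        if received[pos:pos + len(MARKER)] != MARKER:
            return j
    return None

# Demo: delete a burst of 3 bits starting inside frame 5 (hypothetical).
data = [0] * 40                      # 10 frames of all-zero data
tx = insert_markers(data)
burst_start, burst_len = 5 * (PERIOD + len(MARKER)) + 1, 3
rx = tx[:burst_start] + tx[burst_start + burst_len:]
```

In the full scheme the flagged frame would be replaced by erasures and handed to the outer channel code; using soft probabilities instead of the first hard mismatch also degrades gracefully when data bits happen to mimic the marker.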
Error-correction on non-standard communication channels
Many communication systems are poorly modelled by the standard channels assumed in the information theory literature, such as the binary symmetric channel or the additive white Gaussian noise channel. Real systems suffer from additional problems including time-varying noise, cross-talk, synchronization errors and latency constraints. In this thesis, low-density parity-check codes and codes related to them are applied to non-standard channels. First, we look at time-varying noise modelled by a Markov channel. A low-density parity-check code decoder is modified to give an improvement of over 1 dB. Secondly, novel codes based on low-density parity-check codes are introduced which produce transmissions with Pr(bit = 1) ≠ Pr(bit = 0). These non-linear codes are shown to be good candidates for multi-user channels with crosstalk, such as optical channels. Thirdly, a channel with synchronization errors is modelled by random uncorrelated insertion or deletion events at unknown positions. Marker codes formed from low-density parity-check codewords with regular markers inserted within them are studied. It is shown that a marker code with iterative decoding has performance close to the bounds on the channel capacity, significantly outperforming other known codes. Finally, coding for a system with latency constraints is studied. For example, if a telemetry system involves a slow channel, some error correction is often needed quickly, whilst the code should be able to correct remaining errors later. A new code is formed from the intersection of a convolutional code with a high-rate low-density parity-check code. The convolutional code has good early decoding performance, and the high-rate low-density parity-check code efficiently cleans up remaining errors after receiving the entire block. Simulations of the block code show a gain of 1.5 dB over a standard NASA code.
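The synchronization-error channel studied in the third part, with random uncorrelated insertions and deletions at unknown positions, is easy to simulate. A common toy parameterization follows (illustrative, not necessarily the thesis's exact model):

```python
import random

def ids_channel(bits, p_ins, p_del, seed=1):
    """Random uncorrelated synchronization-error channel: before each
    input bit, a uniformly random bit is inserted with probability
    p_ins, and the input bit itself is deleted with probability p_del.
    The receiver sees the output with no position information."""
    rng = random.Random(seed)
    out = []
    for b in bits:
        if rng.random() < p_ins:
            out.append(rng.randrange(2))   # spurious inserted bit
        if rng.random() >= p_del:
            out.append(b)                  # input bit survives
    return out
```

Marker codes recover synchronization on this channel by giving the iterative decoder regularly spaced anchor points whose observed positions drift by the net insertion/deletion count.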