4,390 research outputs found

    A Proof of Entropy Minimization for Outputs in Deletion Channels via Hidden Word Statistics

    From the output produced by a memoryless deletion channel from a uniformly random input of known length n, one obtains a posterior distribution on the channel input. The difference between the Shannon entropy of this distribution and that of the uniform prior measures the amount of information about the channel input which is conveyed by the output of length m, and it is natural to ask for which outputs this is extremized. This question was posed in a previous work, where it was conjectured on the basis of experimental data that the entropy of the posterior is minimized and maximized by the constant strings 000… and 111… and the alternating strings 0101… and 1010…, respectively. In the present work we confirm the minimization conjecture in the asymptotic limit using results from hidden word statistics. We show how the analytic-combinatorial methods of Flajolet, Szpankowski and Vallée for dealing with the hidden pattern matching problem can be applied to resolve the case of fixed output length and n → ∞, by obtaining estimates for the entropy in terms of the moments of the posterior distribution and establishing its minimization via a measure of autocorrelation. Comment: 11 pages, 2 figures
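
As a rough illustration of the quantity described in the abstract above, the following Python sketch brute-forces the posterior entropy for very small n. It assumes i.i.d. deletions, in which case the likelihood of an output y given an input x is proportional to the number of ways y occurs in x as a subsequence, so the deletion probability cancels from the posterior. The function names `count_embeddings` and `posterior_entropy` are hypothetical and not taken from the paper.

```python
from itertools import product
from math import log2


def count_embeddings(x: str, y: str) -> int:
    """Number of ways y occurs in x as a (not necessarily contiguous) subsequence."""
    # ways[j] = number of embeddings of y[:j] in the prefix of x read so far
    ways = [1] + [0] * len(y)
    for ch in x:
        # iterate j downwards so each character of x is used at most once per embedding
        for j in range(len(y), 0, -1):
            if y[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[len(y)]


def posterior_entropy(y: str, n: int) -> float:
    """Shannon entropy (bits) of the posterior over binary inputs of length n.

    Assumes a uniform prior and i.i.d. deletions, so the posterior weight of
    each input x is proportional to count_embeddings(x, y).
    """
    weights = [count_embeddings("".join(x), y) for x in product("01", repeat=n)]
    total = sum(weights)
    return -sum((w / total) * log2(w / total) for w in weights if w > 0)


if __name__ == "__main__":
    n = 4
    for y in ("00", "01", "10", "11"):
        print(y, round(posterior_entropy(y, n), 3))
```

For n = 4 this gives about 3.146 bits for the constant output 00 and about 3.324 bits for the alternating output 01, which matches the conjectured ordering at this tiny size. The enumeration over all 2^n inputs is only feasible for small n.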

    Fundamental Bounds and Approaches to Sequence Reconstruction from Nanopore Sequencers

    Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information-theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) what is the number of 'typical' sequences within the distortion bound induced by indel errors; (iii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to reduce the distortion bound so that only one typical sequence exists within the distortion bound. Our results provide a number of important insights: (i) the maximum length of a sequence that can be accurately reconstructed in the presence of indel and substitution errors is relatively small; (ii) the number of typical sequences within the distortion bound is large; and (iii) replicated extrusion is an effective technique for unique reconstruction. In particular, we show that the number of replicas is a slowly growing (logarithmic) function of sequence length, implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Our model considers indel and substitution errors separately. In this sense, it can be viewed as providing (tight) bounds on reconstruction lengths and repetitions for accurate reconstruction when the two error modes are considered in a single model. Comment: 12 pages, 5 figures

    Symbol Synchronization for Diffusive Molecular Communication Systems

    Symbol synchronization refers to the estimation of the start of a symbol interval and is needed for reliable detection. In this paper, we develop a symbol synchronization framework for molecular communication (MC) systems where we consider some practical challenges which have not been addressed in the literature yet. In particular, we take into account that in MC systems, the transmitter may not be equipped with an internal clock and may not be able to emit molecules with a fixed release frequency. Such restrictions hold for practical nanotransmitters, e.g., modified cells, where the lengths of the symbol intervals may vary due to the inherent randomness in the availability of food and energy for molecule generation, the process for molecule production, and the release process. To address this issue, we propose to employ two types of molecules, one for synchronization and one for data transmission. We derive the optimal maximum likelihood (ML) symbol synchronization scheme as a performance upper bound. Since ML synchronization entails high complexity, we also propose two low-complexity synchronization schemes, namely a peak observation-based scheme and a threshold-trigger scheme, which are suitable for MC systems with limited computational capabilities. Our simulation results reveal the effectiveness of the proposed synchronization schemes and suggest that the end-to-end performance of MC systems significantly depends on the accuracy of symbol synchronization. Comment: This paper has been accepted for presentation at IEEE International Conference on Communications (ICC) 201

    Achievable Information Rates and Concatenated Codes for the DNA Nanopore Sequencing Channel

    The errors occurring in DNA-based storage are correlated in nature, which is a direct consequence of the synthesis and sequencing processes. In this paper, we consider the memory-k nanopore channel model recently introduced by Hamoum et al., which models the inherent memory of the channel. We derive the maximum a posteriori (MAP) decoder for this channel model. The derived MAP decoder allows us to compute achievable information rates for the true DNA storage channel assuming a mismatched decoder matched to the memory-k nanopore channel model, and to quantify the loss in performance assuming a small memory length, and hence limited decoding complexity. Furthermore, the derived MAP decoder can be used to design error-correcting codes tailored to the DNA storage channel. We show that a concatenated coding scheme with an outer low-density parity-check code and an inner convolutional code yields excellent performance. Comment: This paper has been accepted and is awaiting publication in Information Theory Workshop (ITW) 202

    Error correction for asynchronous communication and probabilistic burst deletion channels

    Short-range wireless communication with low-power, small-size sensors has been broadly applied in many areas, such as environmental observation and biomedical and health care monitoring. However, such applications require a wireless sensor operating in always-on mode, which increases the power consumption of sensors significantly. Asynchronous communication is an emerging low-power approach for these applications because it offers greater potential for power savings when recording sparse continuous-time signals, a smaller hardware footprint, and a lower circuit complexity compared to Nyquist-based synchronous signal processing. In this dissertation, the classical Nyquist-based synchronous signal sampling is replaced by asynchronous sampling strategies, i.e., level crossing (LC) sampling and time encoding. Novel forward error correction schemes for sensor communication based on these sampling strategies are proposed, where the dominant errors consist of pulse deletions and insertions, and where encoding is required to take place in an instantaneous fashion. For LC sampling, the presented scheme consists of a combination of an outer systematic convolutional code, an embedded inner marker code, and power-efficient frequency-shift keying modulation at the sensor node. Decoding is first performed via a maximum a posteriori (MAP) decoder for the inner marker code, which achieves synchronization for the insertion and deletion channel, followed by MAP decoding for the outer convolutional code. By iteratively decoding marker and convolutional codes along with interleaving, a significant reduction in terms of the expected end-to-end distortion between original and reconstructed signals can be obtained compared to non-iterative processing.
    Besides investigating the rate trade-off between marker and convolutional codes, it is shown that residual redundancy in the asynchronously sampled source signal can be successfully exploited in combination with redundancy only from a marker code. This provides a new low-complexity alternative for deletion and insertion error correction compared to using explicit redundancy. For time encoding, only the pulse timing is of relevance at the receiver, and the outer channel code is replaced by a quantizer to represent the relative position of the pulse timing. Numerical simulations show that LC sampling outperforms time encoding in the low to moderate signal-to-noise ratio regime by a large margin. In the second part of this dissertation, a new burst deletion correction scheme tailored to low-latency applications such as high-read/write-speed non-volatile memory is proposed. An exemplary version is given by racetrack memory, where the element of information is stored in a cell, and data reading is performed by many read ports or heads. In order to read the information, multiple cells shift to their closest heads in the same direction and at the same speed, which means a block of bits (i.e., a non-binary symbol) is read by multiple heads in parallel during a shift of the cells. If the cells shift by more than one cell location, this causes consecutive (burst) non-binary symbol deletions. In practical systems, the maximal length of consecutive non-binary deletions is limited. Existing schemes for this scenario leverage non-binary de Bruijn sequences to perfectly locate deletions. In contrast, in this work binary marker patterns in combination with a new soft-decision decoder scheme are proposed. In this scheme, deletions are soft located by assigning a posteriori probabilities for the location of every burst deletion event and are replaced by erasures. Then, the resulting errors are further corrected by an outer channel code.
    The advantage of such a scheme over non-binary de Bruijn sequences is that it generally increases the communication rate.

    Error-correction on non-standard communication channels

    Many communication systems are poorly modelled by the standard channels assumed in the information theory literature, such as the binary symmetric channel or the additive white Gaussian noise channel. Real systems suffer from additional problems including time-varying noise, cross-talk, synchronization errors and latency constraints. In this thesis, low-density parity-check codes and codes related to them are applied to non-standard channels. First, we look at time-varying noise modelled by a Markov channel. A low-density parity-check code decoder is modified to give an improvement of over 1 dB. Secondly, novel codes based on low-density parity-check codes are introduced which produce transmissions with Pr(bit = 1) ≠ Pr(bit = 0). These non-linear codes are shown to be good candidates for multi-user channels with crosstalk, such as optical channels. Thirdly, a channel with synchronization errors is modelled by random uncorrelated insertion or deletion events at unknown positions. Marker codes formed from low-density parity-check codewords with regular markers inserted within them are studied. It is shown that a marker code with iterative decoding has performance close to the bounds on the channel capacity, significantly outperforming other known codes. Finally, coding for a system with latency constraints is studied. For example, if a telemetry system involves a slow channel, some error correction is often needed quickly, whilst the code should be able to correct remaining errors later. A new code is formed from the intersection of a convolutional code with a high-rate low-density parity-check code. The convolutional code has good early decoding performance and the high-rate low-density parity-check code efficiently cleans up remaining errors after receiving the entire block. Simulations of the block code show a gain of 1.5 dB over a standard NASA code.
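
To make the marker-code idea in the abstract above more concrete, here is a minimal Python sketch of inserting a fixed marker at regular intervals and checking whether the markers still sit where expected after a deletion. The marker pattern "01", the block length, and all function names are assumptions chosen for illustration. The thesis describes iterative probabilistic decoding, which this hard-decision check does not attempt to reproduce.

```python
def insert_markers(bits: str, period: int, marker: str = "01") -> str:
    """Append `marker` after every full block of `period` payload bits."""
    out = []
    for i in range(0, len(bits), period):
        block = bits[i:i + period]
        out.append(block)
        if len(block) == period:  # no marker after a trailing partial block
            out.append(marker)
    return "".join(out)


def strip_markers(stream: str, period: int, marker: str = "01") -> str:
    """Recover the payload from a stream that suffered no insertions or deletions."""
    step = period + len(marker)
    return "".join(stream[i:i + period] for i in range(0, len(stream), step))


def marker_mismatches(stream: str, period: int, marker: str = "01") -> list:
    """Indices of blocks whose expected marker position does not hold the marker.

    A mismatch hints that synchronization was lost somewhere at or before
    that block; it does not locate the insertion or deletion exactly.
    """
    step = period + len(marker)
    positions = range(period, len(stream), step)
    return [k for k, i in enumerate(positions)
            if stream[i:i + len(marker)] != marker]


if __name__ == "__main__":
    payload = "00001111"
    sent = insert_markers(payload, period=4)
    received = sent[1:]  # simulate deletion of the first transmitted bit
    print(sent, marker_mismatches(sent, 4))
    print(received, marker_mismatches(received, 4))
```

In this example the intact stream reports no mismatches, while the stream with one deleted bit reports a mismatch at both marker positions, showing how a single deletion shifts every later marker.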