
    A Lower Bound on the List-Decodability of Insdel Codes

    For codes equipped with metrics such as the Hamming metric, the symbol-pair metric, or the cover metric, the Johnson bound guarantees list-decodability: it provides a lower bound on the list-decoding radius of a code in terms of its relative minimum distance δ, list size L, and alphabet size q. For the study of list-decodability of codes with insertion and deletion errors (we call such codes insdel codes), it is natural to ask whether a Johnson-type bound exists as well; this is an open problem. The problem was first investigated by Wachter-Zeh, and the result was amended by Hayashi and Yasunaga, who derived a lower bound on the list-decodability of insdel codes. The main purpose of this paper is to move a step further towards solving this open problem. In this work, we provide a new lower bound for the list-decodability of an insdel code. As a consequence, we show that, unlike the Johnson bound for codes under other metrics, which is tight, the bound on the list-decodability of insdel codes given by Hayashi and Yasunaga is not tight. Our main idea is to show that if an insdel code with a given Levenshtein distance d is not list-decodable with list size L, then its list-decoding radius is lower bounded by a quantity involving L and d; equivalently, if the list-decoding radius is below this bound, the code must be list-decodable with list size L. At the end of the paper, we use this bound to derive insdel list-decodability bounds for various well-known codes, a question that had not been extensively studied before.
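
    As a small concrete illustration of the definition in play, the following sketch brute-forces insdel list-decodability for a toy binary code; the toy code, radius, and list size are hypothetical choices for illustration, not the paper's construction. Levenshtein distance is taken here in the insdel sense, counting insertions and deletions only, so d(a, b) = |a| + |b| − 2·LCS(a, b).

```python
# Hypothetical toy example: brute-force check that a code is (t, L)-list-
# decodable under insertions/deletions, i.e., that every received word lies
# within insdel distance t of at most L codewords.
from itertools import product

def lcs_len(a: str, b: str) -> int:
    # textbook LCS dynamic program
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def insdel_dist(a: str, b: str) -> int:
    # insertions/deletions only, no substitutions
    return len(a) + len(b) - 2 * lcs_len(a, b)

def is_list_decodable(code, t, L, n):
    # enumerate every word reachable from a length-n codeword by <= t edits
    for m in range(max(0, n - t), n + t + 1):
        for bits in product("01", repeat=m):
            w = "".join(bits)
            if sum(insdel_dist(w, c) <= t for c in code) > L:
                return False
    return True

code = ["000000", "000111", "111000", "111111"]  # minimum insdel distance 6
print(is_list_decodable(code, t=2, L=1, n=6))    # True, since 2*t < 6
```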

    Error correction for asynchronous communication and probabilistic burst deletion channels

    Short-range wireless communication with low-power, small-size sensors has been broadly applied in many areas, such as environmental observation and biomedical and health-care monitoring. However, such applications require a wireless sensor operating in always-on mode, which significantly increases the power consumption of the sensors. Asynchronous communication is an emerging low-power approach for these applications because, compared to Nyquist-based synchronous signal processing, it offers the potential for significant power savings when recording sparse continuous-time signals, a smaller hardware footprint, and lower circuit complexity. In this dissertation, classical Nyquist-based synchronous signal sampling is replaced by asynchronous sampling strategies, namely level-crossing (LC) sampling and time encoding. Novel forward error correction schemes for sensor communication based on these sampling strategies are proposed, where the dominant errors consist of pulse deletions and insertions, and where encoding is required to take place in an instantaneous fashion. For LC sampling, the presented scheme consists of a combination of an outer systematic convolutional code, an embedded inner marker code, and power-efficient frequency-shift keying modulation at the sensor node. Decoding first applies a maximum a posteriori (MAP) decoder to the inner marker code, which achieves synchronization over the insertion and deletion channel, followed by MAP decoding of the outer convolutional code. By iteratively decoding the marker and convolutional codes along with interleaving, a significant reduction in the expected end-to-end distortion between the original and reconstructed signals can be obtained compared to non-iterative processing. Besides investigating the rate trade-off between the marker and convolutional codes, it is shown that residual redundancy in the asynchronously sampled source signal can be successfully exploited in combination with redundancy from a marker code alone, providing a new low-complexity alternative to explicit redundancy for deletion and insertion error correction. For time encoding, only the pulse timing is of relevance at the receiver, and the outer channel code is replaced by a quantizer that represents the relative position of the pulse timing. Numerical simulations show that LC sampling outperforms time encoding in the low-to-moderate signal-to-noise-ratio regime by a large margin.

    In the second part of this dissertation, a new burst-deletion correction scheme tailored to low-latency applications, such as high-read/write-speed non-volatile memory, is proposed. An exemplary instance is racetrack memory, where each element of information is stored in a cell and data reading is performed by multiple read ports, or heads. To read the information, multiple cells shift towards their closest head in the same direction and at the same speed, so that a block of bits (i.e., a non-binary symbol) is read by multiple heads in parallel during one shift of the cells. If the cells shift by more than a single cell location, consecutive (burst) non-binary symbol deletions occur. In practical systems, the maximal length of a run of consecutive non-binary deletions is limited. Existing schemes for this scenario leverage non-binary de Bruijn sequences to locate deletions exactly. In contrast, this work proposes binary marker patterns in combination with a new soft-decision decoding scheme: deletions are soft-located by assigning a posteriori probabilities to the location of every burst-deletion event and are replaced by erasures, and the resulting errors are then corrected by an outer channel code. Such a scheme has the advantage over non-binary de Bruijn sequences that it in general increases the communication rate.
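
    A minimal sketch of the marker idea both schemes build on (the dissertation couples markers with MAP/soft-decision decoding; the marker pattern, period, and greedy resynchronization rule below are hypothetical simplifications):

```python
# Hypothetical simplification: periodic markers let a decoder re-acquire
# symbol timing after insertions or deletions.
MARKER = "001"   # hypothetical marker pattern
PERIOD = 8       # data bits between consecutive markers
WINDOW = 2       # maximum timing drift (in bits) searched at each marker

def insert_markers(bits: str) -> str:
    # append the marker after every PERIOD data bits
    out = []
    for i in range(0, len(bits), PERIOD):
        out.append(bits[i:i + PERIOD])
        out.append(MARKER)
    return "".join(out)

def resync_decode(received: str) -> str:
    # greedy hard-decision resync: after each tentative data block, search
    # +/- WINDOW positions for the marker and realign there (a MAP decoder
    # would instead weigh all drift hypotheses probabilistically)
    data, pos = [], 0
    while pos + PERIOD <= len(received):
        data.append(received[pos:pos + PERIOD])
        nominal = pos + PERIOD                  # expected start of next marker
        for d in sorted(range(-WINDOW, WINDOW + 1), key=abs):
            j = nominal + d
            if j >= 0 and received[j:j + len(MARKER)] == MARKER:
                nominal = j
                break
        pos = nominal + len(MARKER)
    return "".join(data)

msg = "1011001110001011"            # 16 data bits -> two blocks
rx = insert_markers(msg)
rx = rx[:5] + rx[6:]                # channel deletes one bit in block 1
print(resync_decode(rx))            # block 1 is corrupted, but the decoder
                                    # realigns at the marker and recovers
                                    # block 2 ("10001011") intact; an outer
                                    # code would then repair block 1
```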

    Efficient Linear and Affine Codes for Correcting Insertions/Deletions

    This paper studies linear and affine error-correcting codes for correcting synchronization errors such as insertions and deletions. We call such codes linear/affine insdel codes. Linear codes that can correct even a single deletion are limited to an information rate of at most 1/2 (achieved by the trivial 2-fold repetition code). Previously, it was (erroneously) reported that, more generally, no non-trivial linear codes correcting k deletions exist, i.e., that the (k+1)-fold repetition code and its rate of 1/(k+1) are essentially optimal for any k. We disprove this and show the existence of binary linear codes of length n and rate just below 1/2 capable of correcting Ω(n) insertions and deletions. This identifies rate 1/2 as a sharp threshold for recovery from deletions for linear codes, and reopens the quest for a better understanding of the capabilities of linear codes for correcting insertions/deletions. We prove novel outer bounds and existential inner bounds for the rate vs. (edit) distance trade-off of linear insdel codes. We complement our existential results with an efficient synchronization-string-based transformation that converts any asymptotically good linear code for Hamming errors into an asymptotically good linear code for insdel errors. Lastly, we show that the rate-1/2 limitation does not hold for affine codes by giving an explicit affine code of rate 1−ε which can efficiently correct a constant fraction of insdel errors.
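
    The rate-1/2 baseline mentioned above, the 2-fold repetition code, admits a short folklore single-deletion decoder via run-length parity; a sketch (not the paper's construction):

```python
# Folklore sketch: in a 2-fold repetition codeword every run has even length,
# so a single deletion leaves exactly one odd-length run, which the decoder
# repairs before reading one bit per pair.
from itertools import groupby

def encode(msg: str) -> str:
    return "".join(b + b for b in msg)            # rate 1/2

def decode_one_deletion(rx: str) -> str:
    runs = [(b, len(list(g))) for b, g in groupby(rx)]
    repaired = "".join(b * (l + l % 2) for b, l in runs)  # fix the odd run
    return repaired[0::2]                         # first bit of each pair

cw = encode("10110")                 # '1100111100'
rx = cw[:4] + cw[5:]                 # channel deletes one bit
print(decode_one_deletion(rx))       # -> '10110'
```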

    String Measures: Computational Complexity and Related Problems in Communication

    Strings are fundamental objects in computer science. Modern applications such as text processing, bioinformatics, and distributed data storage systems often need to deal with very large strings. These applications motivate the study of the computational complexity of string-related problems, as well as a better understanding of edit operations on strings in general. In this thesis, we study several problems related to edit-type string measures and error-correcting codes for edit errors, i.e., insertions and deletions. The results presented in this thesis can be roughly partitioned into two parts. The first part concerns the space complexity of computing or approximating string measures. We study three classical string measures: edit distance (ED), longest common subsequence (LCS), and longest increasing subsequence (LIS). Our first main result shows that all three string measures can be approximated to within a 1+o(1) multiplicative factor using only polylogarithmic space in polynomial time. We further study ED and LCS in the asymmetric streaming model introduced by Saks and Seshadhri (SODA, 2013). This model can be viewed as intermediate between the random access model and the standard streaming model: one has streaming access to one of the input strings and random access to the other. For both ED and LCS, we present new algorithms as well as several space lower bounds in the asymmetric streaming model. The second part of our results is about locally decodable codes (LDCs) that can tolerate edit errors. LDCs are a class of error-correcting codes that allow quick recovery of a message symbol by looking at only a few positions of the encoded message (codeword). LDCs for Hamming errors have been extensively studied, while comparatively little is known about LDCs for edit errors. In this thesis, we present exponential lower bounds for LDCs that can tolerate edit errors. In particular, we show that 2-query linear LDCs for edit errors do not exist, and that the codeword length of any constant-query LDC for edit errors must be exponential. These bounds exhibit a strict separation between Hamming errors and edit errors. We also introduce the notion of LDCs with randomized encoding, which can be viewed as a relaxation of standard LDCs, and give constructions of LDCs with randomized encoding that achieve significantly better rate-query trade-offs.
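
    For reference, the textbook dynamic programs for two of the three measures appear below; the thesis's first result is that 1+o(1)-factor approximations of these are possible in polylogarithmic space, far below the Θ(n) space even these rolling-array exact versions use.

```python
# Textbook reference DPs (not the thesis's algorithms): exact ED and LCS with
# rolling one-row tables, i.e., Theta(n^2) time and Theta(n) space.

def edit_distance(a: str, b: str) -> int:
    # insertions, deletions, and substitutions at unit cost
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # delete ca
                                     dp[j - 1] + 1,      # insert cb
                                     prev + (ca != cb))  # substitute/match
    return dp[len(b)]

def lcs(a: str, b: str) -> int:
    dp = [0] * (len(b) + 1)
    for ca in a:
        prev = 0
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], max(dp[j], dp[j - 1], prev + (ca == cb))
    return dp[len(b)]

print(edit_distance("kitten", "sitting"), lcs("kitten", "sitting"))  # 3 4
```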

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Coding for storage and testing

    The problem of reconstructing strings from substring information has found many applications due to its importance in genomic data sequencing and DNA- and polymer-based data storage. Motivated by platforms that use chains of binary synthetic polymers as the recording media and read the content via tandem mass spectrometers, we propose a new family of codes that allows for both unique string reconstruction and correction of multiple mass errors. We first consider the paradigm in which the masses of substrings of the input string form the evidence set, and study two approaches. The first approach pertains to asymmetric errors, where error correction is achieved by introducing redundancy that scales linearly with the number of errors and logarithmically with the length of the string. The proposed construction allows the string to be uniquely reconstructed based only on its erroneous substring composition multiset. The asymptotic code rate of the scheme is one, and decoding is accomplished via a simplified version of the Backtracking algorithm used for the Turnpike problem. For symmetric errors, we use a polynomial characterization of the mass information and adapt polynomial evaluation code constructions to this setting. In the process, we develop new efficient decoding algorithms for a constant number of composition errors.

    The second part of this dissertation addresses a practical paradigm that requires reconstructing mixtures of strings based on the union of the compositions of their prefixes and suffixes, as generated by mass spectrometry devices. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide upper and lower bounds on the asymptotic rate of the underlying codebooks. Our code constructions combine properties of binary B_h and Dyck strings and can be extended to accommodate missing substrings in the pool.

    In the final chapter of this dissertation, we focus on group testing. We begin with a review of the gold-standard testing protocol for Covid-19, real-time reverse-transcription PCR, and of its properties and associated measurement data, such as amplification curves, that can guide the development of appropriate and accurate adaptive group-testing protocols. We then examine various off-the-shelf group testing methods for Covid-19 and identify their strengths and weaknesses for the application at hand. Finally, we present a collection of new analytical results for adaptive semiquantitative group testing with combinatorial priors, including performance bounds, algorithmic solutions, and noisy testing protocols. The worst-case paradigm extends and improves upon prior work on semiquantitative group testing with and without specialized PCR noise models.
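
    To make the prefix-suffix evidence set from the second paradigm concrete, a small sketch (all names hypothetical): the composition of a binary string records only its number of 0s and 1s, which is what a mass measurement reveals, and a string and its reversal always produce identical evidence, one reason codebook constraints such as the B_h and Dyck-string properties above are needed.

```python
# Hypothetical illustration of prefix-suffix composition evidence.
from collections import Counter

def composition(s: str):
    # (#zeros, #ones): the order of the symbols is lost
    return (s.count("0"), s.count("1"))

def prefix_suffix_evidence(s: str) -> Counter:
    # multiset of the compositions of all prefixes and all suffixes
    pieces = [s[:i] for i in range(1, len(s) + 1)] + \
             [s[-i:] for i in range(1, len(s) + 1)]
    return Counter(composition(p) for p in pieces)

# a string and its reversal can never be distinguished from this data alone:
print(prefix_suffix_evidence("0010") == prefix_suffix_evidence("0100"))  # True
```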

    Design of large polyphase filters in the Quadratic Residue Number System

    • …