An Iteratively Decodable Tensor Product Code with Application to Data Storage
The error pattern correcting code (EPCC) can be constructed to provide a
syndrome decoding table targeting the dominant error events of an inter-symbol
interference channel at the output of the Viterbi detector. For the size of the
syndrome table to be manageable and the list of possible error events to be
reasonable in size, the codeword length of the EPCC must be kept short. However,
the rate of such a short code is too low for hard-drive applications. To
accommodate the required large redundancy, it is possible to record only a
highly compressed function of the EPCC parity bits, obtained by taking the
tensor product of the EPCC with a symbol-correcting code. In this paper, we show
that the proposed
tensor error-pattern correcting code (T-EPCC) is linear-time encodable, and we
also devise a low-complexity soft iterative decoding algorithm for the tensor
product of EPCC with q-ary LDPC codes (T-EPCC-qLDPC). Simulation results show
that T-EPCC-qLDPC achieves performance comparable to single-level qLDPC with a
1/2 KB sector at a 50% reduction in decoding complexity. Moreover, 1 KB
T-EPCC-qLDPC surpasses the performance of 1/2 KB single-level qLDPC at the same
decoder complexity.
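To make the tensor-product construction concrete, the following is a minimal
illustrative sketch of the idea it builds on (a Wolf-style tensor product), not
the paper's actual code: the parity-check matrix H1, the block length, and the
function names are hypothetical. Each inner block's syndrome with respect to a
short inner code is treated as a symbol, and only an outer code's parity on the
resulting syndrome string would be recorded.

    import numpy as np

    # Hypothetical parity-check matrix of a short inner code (toy 3x7 example).
    H1 = np.array([[1, 0, 1, 0, 1, 0, 1],
                   [0, 1, 1, 0, 0, 1, 1],
                   [0, 0, 0, 1, 1, 1, 1]])

    def inner_syndromes(bits, H1):
        """Split the data into blocks of length n1 and compute each block's syndrome over GF(2)."""
        n1 = H1.shape[1]
        blocks = np.asarray(bits).reshape(-1, n1)
        return blocks.dot(H1.T) % 2          # one (n1 - k1)-bit syndrome per block

    # Example: inner_syndromes(np.zeros(21, dtype=int), H1) yields three all-zero syndromes.
    # A full encoder would treat each syndrome row as a symbol, encode the syndrome
    # string with an outer symbol-correcting code (a q-ary LDPC code in the paper),
    # and record only the outer parity symbols, i.e. a compressed function of the
    # inner parity information.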
Deletion codes in the high-noise and high-rate regimes
The noise model of deletions poses significant challenges in coding theory,
with basic questions like the capacity of the binary deletion channel still
being open. In this paper, we study the harder model of worst-case deletions,
with a focus on constructing efficiently decodable codes for the two extreme
regimes of high-noise and high-rate. Specifically, we construct polynomial-time
decodable codes with the following trade-offs (for any ε > 0):
(1) Codes that can correct a fraction 1-ε of deletions with rate poly(ε) over an
alphabet of size poly(1/ε);
(2) Binary codes of rate 1-Õ(√ε) that can correct a fraction ε of deletions; and
(3) Binary codes that can be list decoded from a fraction (1/2-ε) of deletions
with rate poly(ε).
Our work is the first to achieve the qualitative goals of correcting a deletion
fraction approaching 1 over bounded alphabets, and correcting a constant
fraction of bit deletions with rate approaching 1. The above results bring our
understanding of deletion code constructions in these regimes to a level
comparable to that for worst-case errors.
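As a toy illustration of the error model (an editorial sketch, not code from the
paper; the function and variable names are hypothetical): a code corrects a
fraction ε of worst-case deletions if every codeword remains uniquely decodable
from any subsequence obtained by deleting at most ⌊εn⌋ of its n symbols.

    # Worst-case deletions: an adversary chooses which positions to delete,
    # and the decoder only sees the surviving subsequence, with no markers
    # indicating where symbols were removed.
    def apply_deletions(codeword, deleted_positions):
        """Return the subsequence remaining after deleting the given positions."""
        drop = set(deleted_positions)
        return [c for i, c in enumerate(codeword) if i not in drop]

    # Example: apply_deletions([0, 1, 1, 0, 1, 0], {1, 4}) returns [0, 1, 0, 0].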
Advances in Compression using Probabilistic Models
The increasing demand for data transmission and storage necessitates the use of efficient compression methods. Compression algorithms work by mapping data to a more compact representation from which the original data can be recovered. To operate efficiently, they need to capture the characteristics of the data distribution, which can be difficult, especially for high-dimensional data.
One emerging solution lies in applying probabilistic machine learning to capture the data distribution in an unsupervised manner. Once a probabilistic model for the data is defined, variational inference can be used to infer its parameters from data. Variational inference is closely related to the optimal compression size, as stated by Hinton's bits-back argument: the evidence lower bound, the objective optimized by variational inference, corresponds to a lower bound on the optimal compression size of the average datapoint. However, current compression methods rely on variational inference merely as a heuristic, and they do not approach its postulated efficiency. In this thesis, we present principled and practical algorithms that get closer to this limit. After discussing our approach, we demonstrate its efficacy in image compression and model compression.
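For reference, the bound referred to above can be written out explicitly (standard material, stated here in generic notation rather than the thesis's own):

    \log p(x) \;\ge\; \mathbb{E}_{q(z \mid x)}\big[\log p(x, z) - \log q(z \mid x)\big] \;=\; \mathrm{ELBO}(x),

and the bits-back argument shows that a latent-variable model can, in principle, code a datapoint x at an expected cost of

    \mathbb{E}_{q(z \mid x)}\big[-\log p(x \mid z)\big] \;+\; \mathbb{E}_{q(z \mid x)}\big[-\log p(z)\big] \;-\; H\big(q(z \mid x)\big) \;=\; -\mathrm{ELBO}(x)

nats: the encoder pays for x given z and for z under the prior, but recovers ("gets back") the entropy of q(z | x) in auxiliary bits.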
First, we focus on image compression, where we use a variational autoencoder to learn a mapping between the images and their unobserved, latent representations. We propose a stochastic coding scheme to encode the latent representation, from which the original image can be approximately reconstructed. Next, we look at the compression of deep learning models. We use variational inference to approximate the posterior distribution of the weights in a neural network, and apply our stochastic coding scheme to encode a weight configuration. Finally, we investigate a connection between variational inference and our compression algorithm. We show that a technique we used for compression can improve variational inference by generating samples from a highly flexible posterior approximation, without significantly increasing the computational costs.
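A minimal numerical sketch of the quantity being targeted (an editorial toy example with a one-dimensional Gaussian model; the function names, the decoder, and all parameters are hypothetical and not taken from the thesis): the negative evidence lower bound, estimated by Monte Carlo, is the idealized compressed size in bits.

    import numpy as np

    def ideal_code_length_bits(x, enc_mu, enc_sigma, decode_mean,
                               prior_sigma=1.0, n_samples=10000, seed=0):
        """Monte-Carlo estimate of -ELBO(x) in bits for a toy 1-D Gaussian latent model."""
        rng = np.random.default_rng(seed)
        z = rng.normal(enc_mu, enc_sigma, n_samples)                # z ~ q(z|x)
        log_q = -0.5 * ((z - enc_mu) / enc_sigma) ** 2 - np.log(enc_sigma * np.sqrt(2 * np.pi))
        log_prior = -0.5 * (z / prior_sigma) ** 2 - np.log(prior_sigma * np.sqrt(2 * np.pi))
        mu_x = decode_mean(z)                                       # mean of p(x|z), unit variance
        log_lik = -0.5 * (x - mu_x) ** 2 - np.log(np.sqrt(2 * np.pi))
        elbo_nats = np.mean(log_lik + log_prior - log_q)
        return -elbo_nats / np.log(2)                               # convert nats to bits

    # Example with a "decoder" that simply scales the latent variable:
    # ideal_code_length_bits(x=1.3, enc_mu=0.6, enc_sigma=0.7, decode_mean=lambda z: 2.0 * z)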
Density Evolution for Asymmetric Memoryless Channels
Density evolution is one of the most powerful analytical tools for
low-density parity-check (LDPC) codes and graph codes with message passing
decoding algorithms. With channel symmetry as one of its fundamental
assumptions, density evolution (DE) has been widely and successfully applied to
different channels, including binary erasure channels, binary symmetric
channels, binary additive white Gaussian noise channels, etc. This paper
generalizes density evolution to non-symmetric memoryless channels, which in
turn broadens its applicability to general memoryless channels, e.g., Z-channels,
composite white Gaussian noise channels, etc. The central theorem underpinning
this generalization is the convergence to perfect projection for any fixed-size
supporting tree. A new iterative formula of the same complexity is then
presented, and the theorems needed to establish performance concentration are
developed. Several properties of the new density evolution method are
explored, including stability results for general asymmetric memoryless
channels. Simulations, code optimizations, and possible new applications
suggested by this new density evolution method are also provided. This result
is also used to prove the typicality of linear LDPC codes among the coset code
ensemble when the minimum check node degree is sufficiently large. It is shown
that the convergence to perfect projection is essential to the belief
propagation algorithm even when only symmetric channels are considered. Hence
the proof of the convergence to perfect projection serves also as a completion
of the theory of classical density evolution for symmetric memoryless channels.
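As context for what density evolution computes, the following is a minimal
sketch of the classical symmetric case that the paper generalizes: the
erasure-probability recursion for a (dv, dc)-regular LDPC ensemble on the binary
erasure channel. This is a textbook formula, not the paper's asymmetric
generalization, and the parameter choices below are illustrative only.

    # Classical density evolution on the BEC:
    #   x_{l+1} = eps * (1 - (1 - x_l)**(dc - 1))**(dv - 1),
    # where x_l is the erasure probability of a variable-to-check message at iteration l.
    def bec_de_fixed_point(eps, dv=3, dc=6, iters=2000):
        x = eps
        for _ in range(iters):
            x = eps * (1.0 - (1.0 - x) ** (dc - 1)) ** (dv - 1)
        return x

    def bec_threshold(dv=3, dc=6, tol=1e-6):
        """Largest channel erasure probability for which the recursion is driven to zero."""
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if bec_de_fixed_point(mid, dv, dc) < 1e-9:
                lo = mid          # decoding succeeds: the threshold is higher
            else:
                hi = mid          # decoding gets stuck: the threshold is lower
        return lo

    # For the (3,6)-regular ensemble this returns approximately 0.429, the
    # well-known belief-propagation threshold of that ensemble on the BEC.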
Coding against synchronisation and related errors
In this thesis, we study aspects of coding against synchronisation errors, such as deletions and replications, and related errors. Synchronisation errors are a source of fundamental open problems in information theory, because they introduce correlations between output symbols even when input symbols are independently distributed. We focus on random errors, and consider two complementary problems:
We study the optimal rate of reliable information transmission through channels with synchronisation and related errors (the channel capacity). Unlike simpler error models, the capacity of such channels is unknown. We first consider the geometric sticky channel, which replicates input bits according to a geometric distribution. Previously, bounds on its capacity were known only via numerical methods, which do not aid our conceptual understanding of this quantity. We derive sharp analytical capacity upper bounds which approach, and sometimes surpass, numerical bounds. This opens the door to a mathematical treatment of its capacity. We consider also the geometric deletion channel, combining deletions and geometric replications. We derive analytical capacity upper bounds, and notably prove that the capacity is bounded away from the maximum when the deletion probability is small, meaning that this channel behaves differently than related well-studied channels in this regime. Finally, we adapt techniques developed to handle synchronisation errors to derive improved upper bounds and structural results on the capacity of the discrete-time Poisson channel, a model of optical communication.
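A toy simulation of the geometric sticky channel described above (an editorial
sketch; the exact parameterisation of the replication distribution is an
assumption and is not taken from the thesis):

    import random

    def geometric_sticky_channel(bits, p, rng=random):
        """Each input bit is emitted k times, k >= 1, with P(k) = p * (1 - p)**(k - 1)."""
        out = []
        for b in bits:
            copies = 1
            while rng.random() > p:      # keep replicating with probability 1 - p
                copies += 1
            out.extend([b] * copies)
        return out

    # Example: geometric_sticky_channel([0, 1, 1, 0], p=0.5) might return
    # [0, 0, 1, 1, 1, 0]. Runs of identical bits are stretched, and the receiver
    # cannot tell which copies are replicas, which is what makes synchronisation hard.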
Motivated by portable DNA-based storage and trace reconstruction, we introduce and study the coded trace reconstruction problem, where the goal is to design efficiently encodable high-rate codes whose codewords can be efficiently reconstructed from few reads corrupted by deletions. Remarkably, we design such n-bit codes with rate 1-O(1/log n) that require exponentially fewer reads than average-case trace reconstruction algorithms.
GROTESQUE: Noisy Group Testing (Quick and Efficient)
Group testing refers to the problem of identifying (with high probability) a
(small) subset of D defectives from a (large) set of N items via a "small"
number of "pooled" tests. For ease of presentation, in this work we focus on the
regime where D = O(N^(1-δ)) for some δ > 0. The tests may be noiseless or noisy,
and the testing procedure may be adaptive (the pool defining a test may depend
on the outcomes of previous tests) or non-adaptive (each test is performed
independently of the outcomes of other tests). A rich body
of literature demonstrates that Θ(D log(N)) tests are information-theoretically
necessary and sufficient for the group-testing problem, and provides algorithms
that achieve this performance. However, it is only recently that reconstruction
algorithms with computational complexity that is sub-linear in N have started
being investigated (recent work by [GurI:04, IndN:10, NgoP:11] gave some of the
first such algorithms). In the
scenario with adaptive tests with noisy outcomes, we present the first scheme
that is simultaneously order-optimal (up to small constant factors) in both the
number of tests and the decoding complexity (O(D log(N)) in both performance
metrics). The total number of stages of our adaptive algorithm is "small"
(O(log(D))). Similarly, in the scenario with non-adaptive tests with noisy
outcomes, we present the first scheme that is simultaneously near-optimal in
both the number of tests and the decoding complexity (via an algorithm that
requires O(D log(D) log(N)) tests and has a decoding complexity of {}). Finally,
we present an adaptive algorithm that only requires 2 stages, and for which both
the number of tests and the decoding complexity scale as {}. For all three
settings, the probability of error of our algorithms scales as O(1/poly(D)).
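To make the measurement model concrete, the following is a toy simulation of
noisy non-adaptive pooled testing (an editorial sketch with hypothetical
parameter names; it is not the GROTESQUE algorithm): each test pools a random
subset of items, reports whether its pool contains any defective, and its
outcome is flipped independently with some probability.

    import numpy as np

    def noisy_pooled_tests(n_items, defectives, n_tests, pool_prob, flip_prob, seed=0):
        """Random Bernoulli pooling with OR outcomes; each outcome is flipped with prob flip_prob."""
        rng = np.random.default_rng(seed)
        pools = rng.random((n_tests, n_items)) < pool_prob   # pools[t, i]: item i is in test t
        x = np.zeros(n_items, dtype=bool)
        x[list(defectives)] = True
        hits = pools.astype(int) @ x.astype(int)             # number of defectives in each pool
        noiseless = hits > 0                                  # noiseless OR outcome of each test
        flips = rng.random(n_tests) < flip_prob
        return pools, noiseless ^ flips

    # Example:
    # pools, outcomes = noisy_pooled_tests(n_items=1000, defectives=[3, 57, 420],
    #                                      n_tests=200, pool_prob=0.01, flip_prob=0.05)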
Sparse graph codes for compression, sensing, and secrecy
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.

Sparse graph codes were first introduced by Gallager over 40 years ago. Over the last two decades, such codes have been the subject of intense research, and capacity-approaching sparse graph codes with low-complexity encoding and decoding algorithms have been designed for many channels. Motivated by the success of sparse graph codes for channel coding, we explore the use of sparse graph codes for four other problems related to compression, sensing, and security.

First, we construct locally encodable and decodable source codes for a simple class of sources. Local encodability refers to the property that when the original source data changes slightly, the compression produced by the source code can be updated easily. Local decodability refers to the property that a single source symbol can be recovered without having to decode the entire source block.

Second, we analyze a simple message-passing algorithm for compressed sensing recovery, and show that our algorithm provides a nontrivial ℓ1/ℓ1 guarantee. We also show that very sparse matrices, and matrices whose entries must be either 0 or 1, have poor performance with respect to the restricted isometry property for the ℓ2 norm.

Third, we analyze the performance of a special class of sparse graph codes, LDPC codes, for the problem of quantizing a uniformly random bit string under Hamming distortion. We show that LDPC codes can come arbitrarily close to the rate-distortion bound using an optimal quantizer. This is a special case of a general result showing a duality between lossy source coding and channel coding: if we ignore computational complexity, then good channel codes are automatically good lossy source codes. We also prove a lower bound on the average degree of vertices in an LDPC code as a function of the gap to the rate-distortion bound.

Finally, we construct efficient, capacity-achieving codes for the wiretap channel, a model of communication that allows one to provide information-theoretic, rather than computational, security guarantees. Our main results include the introduction of a new security criterion which is an information-theoretic analog of semantic security, the construction of capacity-achieving codes possessing strong security with nearly linear time encoding and decoding algorithms for any degraded wiretap channel, and the construction of capacity-achieving codes possessing semantic security with linear time encoding and decoding algorithms for erasure wiretap channels.

Our analysis relies on a relatively small set of tools. One tool is density evolution, a powerful method for analyzing the behavior of message-passing algorithms on long, random sparse graph codes. Another concept we use extensively is the notion of an expander graph. Expander graphs have powerful properties that allow us to prove adversarial, rather than probabilistic, guarantees for message-passing algorithms. Expander graphs are also useful in the context of the wiretap channel because they provide a method for constructing randomness extractors. Finally, we use several well-known isoperimetric inequalities (Harper's inequality, Azuma's inequality, and the Gaussian isoperimetric inequality) in our analysis of the duality between lossy source coding and channel coding.

by Venkat Bala Chandar.
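For reference, the rate-distortion bound mentioned above for quantizing a uniformly random bit string under Hamming distortion is the standard binary rate-distortion function (a textbook fact, not a contribution of the thesis):

    R(D) \;=\; 1 - h_2(D), \qquad h_2(D) = -D \log_2 D - (1 - D) \log_2 (1 - D), \qquad 0 \le D \le 1/2.

For example, allowing an average Hamming distortion of D = 0.11 already permits compression to roughly R ≈ 0.5 bits per source bit; LDPC-based quantizers approaching this curve are what the third part of the thesis analyzes.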