Information Theoretic Principles of Universal Discrete Denoising
Today, the internet makes tremendous amounts of data widely available. Often,
the same information is behind multiple different available data sets. This
lends growing importance to latent variable models that try to learn the hidden
information from the available imperfect versions. For example, social media
platforms may contain an abundance of pictures of the same person or object,
each taken from a different perspective. In a simplified
scenario, one may consider pictures taken from the same perspective, which are
distorted by noise. This latter application allows for a rigorous mathematical
treatment, which is the content of this contribution. We apply a recently
developed method of dependent component analysis to image denoising when
multiple distorted copies of one and the same image are available, each being
corrupted by a different and unknown noise process. In a simplified scenario,
we assume that the distorted image is corrupted by noise that acts
independently on each pixel. We completely answer the question of how to
perform optimal denoising when at least three distorted copies are available:
first, we define optimality of an algorithm in the presented scenario, and then
we describe an asymptotically optimal universal discrete denoising algorithm
(UDDA). In the case of binary data and binary symmetric noise, we develop a
simplified variant of the algorithm, dubbed BUDDA, which we prove to attain
universal denoising uniformly. Comment: 10 pages, 6 figures
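The binary-symmetric setting with at least three copies admits a simple baseline that conveys the setting, though not the paper's BUDDA algorithm itself: a pixel-wise majority vote. The sketch below is our own illustration; the channel parameters and image size are assumptions.

```python
import numpy as np

# Illustrative sketch only: with three copies of a binary image, each corrupted
# by an independent binary symmetric channel (BSC) with crossover probability
# below 1/2, a pixel-wise majority vote is a natural baseline denoiser.
# BUDDA is more refined; nothing here is taken from the paper.

def bsc(x, p, rng):
    """Pass a binary array through a BSC with crossover probability p."""
    flips = rng.random(x.shape) < p
    return np.where(flips, 1 - x, x)

def majority_vote(copies):
    """Pixel-wise majority over an odd number of binary copies."""
    stacked = np.stack(copies)
    return (stacked.sum(axis=0) > len(copies) // 2).astype(int)

rng = np.random.default_rng(0)
clean = rng.integers(0, 2, size=(32, 32))                  # unknown binary image
copies = [bsc(clean, p, rng) for p in (0.10, 0.15, 0.20)]  # three noisy copies
denoised = majority_vote(copies)
print("bit error rate:", float(np.mean(denoised != clean)))
```

With independent crossover probabilities below 1/2, the vote errs only when at least two of the three channels flip the same pixel, which is why more copies help.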
A Universal Scheme for Wyner–Ziv Coding of Discrete Sources
We consider the Wyner–Ziv (WZ) problem of lossy compression where the decompressor observes a noisy version of the source, whose statistics are unknown. A new family of WZ coding algorithms is proposed and their universal optimality is proven. Compression consists of sliding-window processing followed by Lempel–Ziv (LZ) compression, while the decompressor is based on a modification of the discrete universal denoiser (DUDE) algorithm to take advantage of side information. The new algorithms not only universally attain the fundamental limits, but also suggest a paradigm for practical WZ coding. The effectiveness of our approach is illustrated with experiments on binary images and English text, using a low-complexity algorithm motivated by our class of universally optimal WZ codes.
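As a rough, stand-alone illustration of the second encoder stage only (the LZ compression step, not the paper's full WZ scheme), a serialized symbol stream can be fed to an LZ-family compressor; the toy stream below and the choice of zlib's DEFLATE are assumptions of this sketch.

```python
import zlib

# Toy illustration of the LZ compression stage only; the sliding-window
# preprocessing and DUDE-based decompression of the paper are not shown.
symbols = bytes([0, 1] * 500 + [1] * 100)      # structured toy symbol stream
compressed = zlib.compress(symbols, level=9)   # DEFLATE: LZ77 + Huffman coding
print(f"{len(symbols)} bytes -> {len(compressed)} bytes")
assert zlib.decompress(compressed) == symbols  # lossless round trip
```

The repetitive structure in the stream is exactly what an LZ-family compressor exploits, which is why the preprocessing stage matters: it shapes the stream the compressor sees.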
DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing
We consider the correction of errors from nucleotide sequences produced by
next-generation targeted amplicon sequencing. Next-generation sequencing
(NGS) platforms can provide a great deal of sequencing data thanks to their
high throughput, but the associated error rates tend to be high.
Denoising in high-throughput sequencing has thus become a crucial process for
boosting the reliability of downstream analyses. Our methodology, named
DUDE-Seq, is derived from a general setting of reconstructing finite-valued
source data corrupted by a discrete memoryless channel and effectively corrects
substitution and homopolymer indel errors, the two major types of sequencing
errors in most high-throughput targeted amplicon sequencing platforms. Our
experimental studies with real and simulated datasets suggest that the proposed
DUDE-Seq not only outperforms existing alternatives in terms of
error-correction capability and time efficiency, but also boosts the
reliability of downstream analyses. Further, the flexibility of DUDE-Seq
enables its robust application to different sequencing platforms and analysis
pipelines by simple updates of the noise model. DUDE-Seq is available at
http://data.snu.ac.kr/pub/dude-seq
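DUDE-Seq builds on the discrete universal denoiser (DUDE) for finite-valued data observed through a discrete memoryless channel. The following sketch of the generic two-pass DUDE rule, specialized to a binary alphabet and a known BSC, is our own illustration of that setting: the variable names, context length k, and toy input are assumptions, not taken from DUDE-Seq.

```python
import numpy as np

# Hedged sketch of the generic two-pass DUDE rule for a binary alphabet
# observed through a BSC with known crossover probability delta.
# This illustrates the setting DUDE-Seq generalizes; details are assumptions.

def dude_binary(z, delta, k=2):
    n = len(z)
    Pi = np.array([[1 - delta, delta], [delta, 1 - delta]])  # channel matrix
    Pi_inv = np.linalg.inv(Pi)
    Lam = 1.0 - np.eye(2)                                    # Hamming loss

    # Pass 1: count center symbols for each two-sided context of length 2k.
    counts = {}
    for i in range(k, n - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        counts.setdefault(ctx, np.zeros(2))[z[i]] += 1

    # Pass 2: pick the reconstruction minimizing the estimated expected loss
    #   cost(xh) = sum_x [Pi^{-1} m](x) * Pi(x, z_i) * Lam(x, xh).
    out = np.array(z, copy=True)
    for i in range(k, n - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        q = Pi_inv @ counts[ctx]          # estimated clean-symbol counts
        pi_z = Pi[:, z[i]]                # channel likelihoods of observed z_i
        costs = [np.dot(q * pi_z, Lam[:, xh]) for xh in (0, 1)]
        out[i] = int(np.argmin(costs))
    return out

# Isolated flips in an otherwise constant stream are corrected:
z = np.zeros(200, dtype=int)
z[50] = 1
z[120] = 1
print(int(dude_binary(z, delta=0.1).sum()))   # 0: both flips removed
```

The two-pass structure is what makes the scheme fast: one pass gathers context statistics, a second applies a closed-form per-symbol rule, with no iterative training.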
Algorithms for Discrete Denoising under Channel Uncertainty
The goal of a denoising algorithm is to reconstruct a signal from its noise-corrupted observations. Perfect reconstruction is seldom possible and performance is measured under a given fidelity criterion. In a recent work, the authors addressed the problem of denoising unknown discrete signals corrupted by a discrete memoryless channel when the channel, rather than being completely known, is only known to lie in some uncertainty set of possible channels. A (sequence of) denoiser(s) was derived for this case and shown to be asymptotically optimal with respect to a worst-case criterion argued most relevant to this setting. In the present work we address the implementation and complexity of this denoiser, establishing its practicality. We also present empirical results suggesting the potential of these schemes to do well in practice. A key component of our schemes is an estimator of the subset of channels in the uncertainty set that are feasible in the sense of giving rise to the noisy signal statistics for some noiseless signal distribution. We establish the efficiency of this estimator (theoretically, algorithmically, and experimentally). We also present a modification of the recently developed discrete universal denoiser (DUDE) that assumes a channel based on the said estimator, and show that, in practice, the resulting scheme often performs comparably to our asymptotically minimax schemes. For concreteness, we focus on the binary alphabet case, but also discuss the extensions of the algorithms to general finite alphabets.
Index Terms: Channel uncertainty, convex optimization, denoising algorithms, discrete universal denoising, image denoising, minimax schemes.
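The feasibility notion described above can be illustrated for the binary alphabet: a candidate channel is feasible if inverting it maps the empirical noisy distribution to a valid (nonnegative) input distribution. The sketch below is our own minimal illustration, with assumed function names and numbers; it is not the paper's estimator.

```python
import numpy as np

# Minimal illustration (assumed names/values): a candidate BSC(delta) is
# feasible for an observed noisy distribution p_noisy iff inverting the
# channel yields a nonnegative candidate clean distribution, since for a
# discrete memoryless channel p_noisy = p_clean @ Pi.

def feasible_bsc(p_noisy, delta, tol=1e-9):
    Pi = np.array([[1 - delta, delta], [delta, 1 - delta]])
    p_clean = np.linalg.solve(Pi.T, p_noisy)   # invert p_noisy = p_clean @ Pi
    return bool(np.all(p_clean >= -tol))

p_noisy = np.array([0.3, 0.7])        # empirical distribution of the noisy signal
print(feasible_bsc(p_noisy, 0.10))    # True: BSC(0.1) can explain a 30/70 output
print(feasible_bsc(p_noisy, 0.40))    # False: BSC(0.4) outputs lie in [0.4, 0.6]
```

Intuitively, a heavily noisy BSC pushes every output distribution toward uniform, so an observed distribution far from uniform rules such channels out of the feasible set.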