666 research outputs found
Discrete Denoising with Shifts
We introduce S-DUDE, a new algorithm for denoising DMC-corrupted data. The
algorithm, which generalizes the recently introduced DUDE (Discrete Universal
DEnoiser) of Weissman et al., aims to compete with a genie that has access, in
addition to the noisy data, also to the underlying clean data, and can choose
to switch, up to a prescribed number of times, between sliding-window denoisers in a way that
minimizes the overall loss. When the underlying data form an individual
sequence, we show that the S-DUDE performs essentially as well as this genie,
provided that the number of switches is sub-linear in the size of the data. When the clean data is
emitted by a piecewise stationary process, we show that the S-DUDE achieves the
optimum distribution-dependent performance, provided that the same
sub-linearity condition is imposed on the number of switches. To further
substantiate the universal optimality of the S-DUDE, we show that when the
number of switches is allowed to grow linearly with the size of the data,
any (sequence of) scheme(s) fails to compete in the above senses. Using
dynamic programming, we derive an efficient implementation of the S-DUDE, which
has complexity (time and memory) growing only linearly with the data size and
the number of switches. Preliminary experimental results are presented,
suggesting that S-DUDE has the capacity to significantly improve on the
performance attained by the original DUDE in applications where the nature of
the data abruptly changes in time (or space), as is often the case in practice.
Comment: 30 pages, 3 figures, submitted to IEEE Trans. Inform. Theory
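The genie benchmark described in this abstract can be illustrated with a small dynamic program. The sketch below is not the S-DUDE itself: it assumes a small, hypothetical set of candidate denoisers whose per-position losses against the clean data are already known (which only the genie has), and it computes the smallest total loss achievable with at most `m` switches. The function name `best_loss_with_switches` and the loss-table layout are assumptions made for illustration.

```python
def best_loss_with_switches(losses, m):
    """Minimum total loss when switching between denoisers at most m times.

    losses[i][j] is the (hypothetical, genie-known) loss of candidate
    denoiser j at position i. Runs in O(n * m * k) time for n positions
    and k candidates.
    """
    n = len(losses)
    k = len(losses[0])
    # cost[s][j]: best loss so far using at most s switches and
    # currently applying denoiser j.
    cost = [[losses[0][j] for j in range(k)] for _ in range(m + 1)]
    for i in range(1, n):
        new_cost = []
        for s in range(m + 1):
            # Cheapest way to arrive by spending one more switch.
            switch_in = min(cost[s - 1]) if s > 0 else float("inf")
            new_cost.append(
                [min(cost[s][j], switch_in) + losses[i][j] for j in range(k)]
            )
        cost = new_cost
    return min(cost[m])
```

For example, with `losses = [[0, 1], [0, 1], [1, 0], [1, 0]]` a single switch after the second position lets the genie reach zero loss, whereas no fixed denoiser can do better than a loss of 2.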
Scanning and Sequential Decision Making for Multidimensional Data -- Part II: The Noisy Case
We consider the problem of sequential decision making for random fields corrupted by noise. In this scenario, the decision maker observes a noisy version of the data, yet is judged with respect to the clean data. In particular, we first consider the problem of scanning and sequentially filtering noisy random fields. In this case, the sequential filter is given the freedom to choose the path over which it traverses the random field (e.g., noisy image or video sequence), thus it is natural to ask what is the best achievable performance and how sensitive this performance is to the choice of the scan. We formally define the problem of scanning and filtering, derive a bound on the best achievable performance, and quantify the excess loss occurring when nonoptimal scanners are used, compared to optimal scanning and filtering. We then discuss the problem of scanning and prediction for noisy random fields. This setting is a natural model for applications such as restoration and coding of noisy images. We formally define the problem of scanning and prediction of a noisy multidimensional array and relate the optimal performance to the clean scandictability defined by Merhav and Weissman. Moreover, bounds on the excess loss due to suboptimal scans are derived, and a universal prediction algorithm is suggested. This paper is the second part of a two-part paper. The first paper dealt with scanning and sequential decision making on noiseless data arrays.
Universal Minimax Discrete Denoising under Channel Uncertainty
The goal of a denoising algorithm is to recover a signal from its
noise-corrupted observations. Perfect recovery is seldom possible and
performance is measured under a given single-letter fidelity criterion. For
discrete signals corrupted by a known discrete memoryless channel, the DUDE was
recently shown to perform this task asymptotically optimally, without knowledge
of the statistical properties of the source. In the present work we address the
scenario where, in addition to the lack of knowledge of the source statistics,
there is also uncertainty in the channel characteristics. We propose a family
of discrete denoisers and establish their asymptotic optimality under a minimax
performance criterion which we argue is appropriate for this setting. As we
show elsewhere, the proposed schemes can also be implemented computationally
efficiently.
Comment: Submitted to IEEE Transactions on Information Theory
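For orientation, the known-channel baseline that this abstract builds on can be sketched in a few lines. The following is a minimal two-pass DUDE for the special case of binary data, a binary symmetric channel with known crossover probability `delta`, and Hamming loss. It is not the minimax family proposed in the paper, and the function name `dude_binary`, the context half-length `k`, and the choice to leave boundary symbols untouched are assumptions made for illustration.

```python
from collections import defaultdict


def dude_binary(z, delta, k=1):
    """Sketch of DUDE for a 0/1 sequence z seen through a BSC(delta).

    Assumes delta != 0.5 (the channel matrix must be invertible) and
    Hamming loss. The first and last k symbols are left as observed.
    """
    n = len(z)

    def context(i):
        return (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))

    # Pass 1: count how often each center symbol occurs in each
    # two-sided context.
    counts = defaultdict(lambda: [0, 0])
    for i in range(k, n - k):
        counts[context(i)][z[i]] += 1

    # Channel matrix pi[x][z] = P(z | x) and its inverse.
    pi = [[1 - delta, delta], [delta, 1 - delta]]
    det = 1 - 2 * delta
    pi_inv = [[(1 - delta) / det, -delta / det],
              [-delta / det, (1 - delta) / det]]

    # Pass 2: apply the per-context decision rule.
    out = list(z)
    for i in range(k, n - k):
        m = counts[context(i)]
        # Estimated (unnormalized) clean-symbol weights: m * pi^{-1}.
        q = [m[0] * pi_inv[0][x] + m[1] * pi_inv[1][x] for x in (0, 1)]
        # Under Hamming loss, guessing xhat pays for the other symbol.
        score = [q[1 - xhat] * pi[1 - xhat][z[i]] for xhat in (0, 1)]
        out[i] = 0 if score[0] <= score[1] else 1
    return out
```

On a long run of zeros containing one isolated 1, the context counts make the isolated symbol look like channel noise, so the sketch maps it back to 0.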
Entropy rate calculations of algebraic measures
We use a special class of translation-invariant
measures, called algebraic measures, to study the entropy rate
of hidden Markov processes. Under some irreducibility assumptions on the
Markov transition matrix we derive exact formulas for the entropy rate of a
general state hidden Markov process derived from a Markov source corrupted
by a specific noise model. We obtain upper bounds on the error when using an
approximation to the formulas and numerically compute the entropy rates of two-
and three-state hidden Markov models.
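As a point of comparison for such numerical computations, the entropy rate of a small hidden Markov process can be bounded from above by the standard conditional-entropy bound H(Y_n | Y_1, ..., Y_{n-1}) = H(Y_1..Y_n) - H(Y_1..Y_{n-1}). The sketch below does this by brute-force enumeration and is not the algebraic-measure method of the paper. It assumes a symmetric two-state Markov chain with flip probability `p`, started from its uniform stationary distribution and observed through a binary symmetric channel with crossover `eps`; all function names are hypothetical.

```python
from itertools import product
from math import log2


def seq_prob(y, p, eps):
    """Probability of the observed binary sequence y (forward recursion)."""
    A = [[1 - p, p], [p, 1 - p]]          # Markov transition matrix
    B = [[1 - eps, eps], [eps, 1 - eps]]  # emission (noise) probabilities
    alpha = [0.5 * B[s][y[0]] for s in (0, 1)]
    for sym in y[1:]:
        alpha = [sum(alpha[s] * A[s][t] for s in (0, 1)) * B[t][sym]
                 for t in (0, 1)]
    return sum(alpha)


def block_entropy(n, p, eps):
    """Entropy in bits of the first n observations (enumerates 2**n sequences)."""
    h = 0.0
    for y in product((0, 1), repeat=n):
        pr = seq_prob(y, p, eps)
        if pr > 0:
            h -= pr * log2(pr)
    return h


def entropy_rate_upper_bound(n, p, eps):
    """H(Y_n | Y_1..Y_{n-1}) for n >= 2, an upper bound on the entropy rate."""
    return block_entropy(n, p, eps) - block_entropy(n - 1, p, eps)
```

Two sanity checks follow from the model: with `eps = 0.5` the observations are fair coin flips and the bound is 1 bit, and with `eps = 0` the process is the Markov chain itself, so the bound equals the binary entropy of `p`.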
Information Theoretic Principles of Universal Discrete Denoising
Today, the internet makes tremendous amounts of data widely available. Often,
the same information is behind multiple different available data sets. This
lends growing importance to latent variable models that try to learn the hidden
information from the available imperfect versions. For example, social media
platforms can contain an abundance of pictures of the same person or object,
yet all of which are taken from different perspectives. In a simplified
scenario, one may consider pictures taken from the same perspective, which are
distorted by noise. This latter application allows for a rigorous mathematical
treatment, which is the content of this contribution. We apply a recently
developed method of dependent component analysis to image denoising when
multiple distorted copies of one and the same image are available, each being
corrupted by a different and unknown noise process. In a simplified scenario,
we assume that the distorted image is corrupted by noise that acts
independently on each pixel. We answer completely the question of how to
perform optimal denoising, when at least three distorted copies are available:
First we define optimality of an algorithm in the presented scenario, and then
we describe an asymptotically optimal universal discrete denoising algorithm
(UDDA). In the case of binary data and binary symmetric noise, we develop a
simplified variant of the algorithm, dubbed BUDDA, which we prove to attain
universal denoising uniformly.
Comment: 10 pages, 6 figures
DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing
We consider the correction of errors from nucleotide sequences produced by
next-generation targeted amplicon sequencing. The next-generation sequencing
(NGS) platforms can provide a great deal of sequencing data thanks to their
high throughput, but the associated error rates often tend to be high.
Denoising in high-throughput sequencing has thus become a crucial process for
boosting the reliability of downstream analyses. Our methodology, named
DUDE-Seq, is derived from a general setting of reconstructing finite-valued
source data corrupted by a discrete memoryless channel and effectively corrects
substitution and homopolymer indel errors, the two major types of sequencing
errors in most high-throughput targeted amplicon sequencing platforms. Our
experimental studies with real and simulated datasets suggest that the proposed
DUDE-Seq not only outperforms existing alternatives in terms of
error-correction capability and time efficiency, but also boosts the
reliability of downstream analyses. Further, the flexibility of DUDE-Seq
enables its robust application to different sequencing platforms and analysis
pipelines by simple updates of the noise model. DUDE-Seq is available at
http://data.snu.ac.kr/pub/dude-seq
Confidence Sets in Time-Series Filtering
The problem of filtering of finite-alphabet stationary ergodic time series is
considered. A method for constructing a confidence set for the (unknown) signal
is proposed, such that the resulting set has the following properties: First,
it includes the unknown signal with a prescribed probability, which is a
parameter supplied to the filter. Second, the size of the confidence sets grows
exponentially at a rate that is asymptotically equal to the conditional
entropy of the signal given the data. Moreover, it is shown that this rate is
optimal.
Comment: some of the results were reported at ISIT2011, St. Petersburg,
Russia, pp. 2436-243