A Universal Scheme for Wyner–Ziv Coding of Discrete Sources
We consider the Wyner–Ziv (WZ) problem of lossy compression where the decompressor observes a noisy version of the source, whose statistics are unknown. A new family of WZ coding algorithms is proposed and their universal optimality is proven. Compression consists of sliding-window processing followed by Lempel–Ziv (LZ) compression, while the decompressor is based on a modification of the discrete universal denoiser (DUDE) algorithm that takes advantage of side information. The new algorithms not only universally attain the fundamental limits, but also suggest a paradigm for practical WZ coding. The effectiveness of our approach is illustrated with experiments on binary images and English text, using a low-complexity algorithm motivated by our class of universally optimal WZ codes.
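The compressor's two-stage structure described in the abstract can be illustrated with a toy sketch. Everything here is hypothetical, not the paper's actual scheme: a majority-vote window stands in for the sliding-window map, and zlib stands in for the LZ compressor.

```python
import zlib

def sliding_window_map(x, k=1):
    """Hypothetical sliding-window stage: replace each symbol with the
    majority value of its (2k+1)-window (a placeholder quantizer)."""
    out = []
    for i in range(len(x)):
        w = x[max(0, i - k):i + k + 1]
        out.append(1 if 2 * sum(w) > len(w) else 0)
    return out

# toy binary source with strong repetitive structure
x = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0] * 20
y = sliding_window_map(x)
raw = bytes(y)
lz = zlib.compress(raw)  # LZ stage exploits the residual redundancy
```

The window map is symbol-by-symbol, so the output keeps the source's repetitive structure, which is exactly what the LZ stage then compresses.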
Discrete Denoising with Shifts
We introduce S-DUDE, a new algorithm for denoising DMC-corrupted data. The
algorithm, which generalizes the recently introduced DUDE (Discrete Universal
DEnoiser) of Weissman et al., aims to compete with a genie that has access not
only to the noisy data but also to the underlying clean data, and can switch,
a limited number of times, between sliding-window denoisers in a way that
minimizes the overall loss. When the underlying data form an individual
sequence, we show that the S-DUDE performs essentially as well as this genie,
provided the number of switches is sub-linear in the size of the data. When the clean data is
emitted by a piecewise stationary process, we show that the S-DUDE achieves the
optimum distribution-dependent performance, provided that the same
sub-linearity condition is imposed on the number of switches. To further
substantiate the universal optimality of the S-DUDE, we show that when the
number of switches is allowed to grow linearly with the size of the data,
any (sequence of) scheme(s) fails to compete in the above senses. Using
dynamic programming, we derive an efficient implementation of the S-DUDE, which
has complexity (time and memory) growing only linearly with the data size and
the number of switches. Preliminary experimental results are presented,
suggesting that S-DUDE has the capacity to significantly improve on the
performance attained by the original DUDE in applications where the nature of
the data abruptly changes in time (or space), as is often the case in practice.
Comment: 30 pages, 3 figures, submitted to IEEE Trans. Inform. Theory
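For concreteness, here is a minimal sketch of the underlying non-switching DUDE rule that S-DUDE generalizes: a two-pass, count-based sliding-window denoiser for a known DMC and single-letter loss. The channel, loss, context radius, and example sequence are illustrative choices, not taken from the paper.

```python
from collections import defaultdict
import numpy as np

def dude(z, Pi, Lam, k):
    """Sketch of the basic two-pass DUDE.
    z: noisy sequence over {0,...,A-1}; Pi: A x A channel matrix with
    Pi[x, y] = P(y|x); Lam: A x A loss matrix Lam[x, xhat]; k: context radius."""
    n, A = len(z), Pi.shape[0]
    Pi_inv = np.linalg.inv(Pi)
    # pass 1: count noisy center symbols for each two-sided context
    counts = defaultdict(lambda: np.zeros(A))
    for i in range(k, n - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        counts[ctx][z[i]] += 1
    # pass 2: pick the xhat minimizing the estimated expected loss
    xhat = list(z)
    for i in range(k, n - k):
        ctx = (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))
        m = counts[ctx]
        pi_z = Pi[:, z[i]]            # channel column of the observed z_i
        est_clean = Pi_inv.T @ m      # channel-inverted counts ~ clean stats
        costs = [est_clean @ (Lam[:, a] * pi_z) for a in range(A)]
        xhat[i] = int(np.argmin(costs))
    return xhat

# BSC with crossover 0.1, Hamming loss, context radius 1
Pi = np.array([[0.9, 0.1], [0.1, 0.9]])
Lam = np.array([[0.0, 1.0], [1.0, 0.0]])
z = [0] * 50 + [1] * 50
z[10], z[70] = 1, 0                   # two isolated "noise" flips
restored = dude(z, Pi, Lam, k=1)
```

A switching genie in the S-DUDE sense would additionally be allowed to change `k` (or the whole rule) a bounded number of times along the sequence.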
Universal Minimax Discrete Denoising under Channel Uncertainty
The goal of a denoising algorithm is to recover a signal from its
noise-corrupted observations. Perfect recovery is seldom possible and
performance is measured under a given single-letter fidelity criterion. For
discrete signals corrupted by a known discrete memoryless channel, the DUDE was
recently shown to perform this task asymptotically optimally, without knowledge
of the statistical properties of the source. In the present work we address the
scenario where, in addition to the lack of knowledge of the source statistics,
there is also uncertainty in the channel characteristics. We propose a family
of discrete denoisers and establish their asymptotic optimality under a minimax
performance criterion which we argue is appropriate for this setting. As we
show elsewhere, the proposed schemes can also be implemented computationally
efficiently.
Comment: Submitted to IEEE Transactions on Information Theory
DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing
We consider the correction of errors from nucleotide sequences produced by
next-generation targeted amplicon sequencing. Next-generation sequencing
(NGS) platforms provide vast amounts of sequence data thanks to their
high throughput, but the associated error rates tend to be high.
Denoising in high-throughput sequencing has thus become a crucial process for
boosting the reliability of downstream analyses. Our methodology, named
DUDE-Seq, is derived from a general setting of reconstructing finite-valued
source data corrupted by a discrete memoryless channel and effectively corrects
substitution and homopolymer indel errors, the two major types of sequencing
errors in most high-throughput targeted amplicon sequencing platforms. Our
experimental studies with real and simulated datasets suggest that the proposed
DUDE-Seq not only outperforms existing alternatives in terms of
error-correction capability and time efficiency, but also boosts the
reliability of downstream analyses. Further, the flexibility of DUDE-Seq
enables its robust application to different sequencing platforms and analysis
pipelines by simple updates of the noise model. DUDE-Seq is available at
http://data.snu.ac.kr/pub/dude-seq
Methodology for Analyzing and Characterizing Error Generation in Presence of Autocorrelated Demands in Stochastic Inventory Models
Most techniques that describe and solve stochastic inventory problems rely upon the assumption of independent and identically distributed (IID) demands. Stochastic inventory formulations that fail to capture serially-correlated components in the demand lead to serious errors. This dissertation provides a robust method that approximates solutions to the stochastic inventory problem where the control review system is continuous, the demand contains autocorrelated components, and the lost sales case is considered. A simulation optimization technique based on simulated annealing (SA), pattern search (PS), and ranking and selection (R&S) is developed and used to generate near-optimal solutions. The proposed method accounts for the randomness and dependency of the demand as well as for the inherent constraints of the inventory model.
The impact of serially-correlated demand is investigated for discrete and continuous dependent input models. For the discrete dependent model, the autocorrelated demand is assumed to behave as a discrete Markov-modulated chain (DMC), while a first-order autoregressive AR(1) process is assumed for describing the continuous demand. The effects of these demand patterns combined with structural cost variations on estimating both total costs and control policy parameters were examined.
Results demonstrated that formulations that ignore the serially-correlated component performed worse than those that considered it. In this setting, the effect of holding cost and its interaction with penalty cost become stronger and more significant as the serially-correlated component increases. The growth rate in the error generated in total costs by formulations that ignore dependency components is significant and fits exponential models.
To verify the effectiveness of the proposed simulation optimization method for finding the near-optimal inventory policy, total costs and stockout rates were estimated at different levels of autocorrelation. The results provide additional evidence that serially-correlated components in the demand have a relevant impact on determining inventory control policies and estimating measures of performance.
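A hedged illustration of the setting (all parameters hypothetical, and the policy simulator deliberately minimal): AR(1) demand feeds a continuous-review (s, Q) policy with lost sales, and the simulator returns the average per-period cost that an optimizer such as the dissertation's SA/PS/R&S procedure would evaluate.

```python
import random

def ar1_demand(n, mu=20.0, phi=0.6, sigma=4.0, seed=1):
    # AR(1) demand around mean mu: d_t = mu + phi*(d_{t-1} - mu) + eps_t
    rng = random.Random(seed)
    d, prev = [], mu
    for _ in range(n):
        prev = mu + phi * (prev - mu) + rng.gauss(0.0, sigma)
        d.append(max(0.0, prev))       # demand cannot be negative
    return d

def sim_cost(demand, s=60.0, Q=100.0, lead=2, h=1.0, p=9.0):
    """Average per-period cost of a continuous-review (s, Q) policy
    with lost sales: h = holding cost, p = lost-sales penalty."""
    on_hand, pipeline, cost = s + Q, [], 0.0
    for d in demand:
        pipeline = [(t - 1, q) for t, q in pipeline]   # orders age one period
        on_hand += sum(q for t, q in pipeline if t <= 0)
        pipeline = [(t, q) for t, q in pipeline if t > 0]
        sold = min(on_hand, d)
        cost += p * (d - sold)                         # unmet demand is lost
        on_hand -= sold
        cost += h * on_hand
        if on_hand + sum(q for _, q in pipeline) < s:  # reorder point hit
            pipeline.append((lead, Q))
    return cost / len(demand)

avg_cost = sim_cost(ar1_demand(300))
```

Raising `phi` lengthens runs of high or low demand, which is exactly the serial correlation that IID formulations fail to capture.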
Maximal-Capacity Discrete Memoryless Channel Identification
The problem of identifying the channel with the highest capacity among
several discrete memoryless channels (DMCs) is considered. The problem is cast
as a pure-exploration multi-armed bandit problem, which follows the practical
use of training sequences to sense the communication channel statistics. A
capacity estimator is proposed and tight confidence bounds on the estimator
error are derived. Based on this capacity estimator, a gap-elimination
algorithm termed BestChanID is proposed, which is oblivious to the
capacity-achieving input distribution and is guaranteed to output the DMC with
the largest capacity, with a desired confidence. Furthermore, two additional
algorithms, NaiveChanSel and MedianChanEl, which output with a certain confidence a
DMC with capacity close to the maximal, are introduced. Each of these
algorithms is beneficial in a different regime and can be used as a subroutine
in BestChanID. The sample complexity of all algorithms is analyzed as a
function of the desired confidence parameter, the number of channels, and the
channels' input and output alphabet sizes. The cost of best channel
identification is shown to scale quadratically with the alphabet size, and a
fundamental lower bound for the required number of channel senses to identify
the best channel with a certain confidence is derived.
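The paper's confidence-bounded estimator is not reproduced here, but a plug-in baseline clarifies the quantity being compared: given a (possibly empirically estimated) transition matrix, the standard Blahut–Arimoto iteration computes the capacity of a DMC. The iteration count and example channel are illustrative.

```python
import math

def dmc_capacity(W, iters=100):
    """Blahut-Arimoto: capacity (in bits) of a DMC with W[x][y] = P(y|x)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx                      # start from the uniform input
    for _ in range(iters):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # reweight each input by exp of its divergence D(W(.|x) || q)
        w = [p[x] * math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                                 for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        tot = sum(w)
        p = [v / tot for v in w]
    q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(nx) for y in range(ny) if W[x][y] > 0)

# binary symmetric channel with crossover 0.1: capacity = 1 - H(0.1)
cap = dmc_capacity([[0.9, 0.1], [0.1, 0.9]])
```

Running this on transition matrices estimated from training sequences gives a naive plug-in estimate; the abstract's point is that channel selection needs such estimates together with tight confidence bounds on their error.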
Channels That Die
Given the possibility of communication systems failing catastrophically, we
investigate limits to communicating over channels that fail at random times.
These channels are finite-state semi-Markov channels. We show that
communication with arbitrarily small probability of error is not possible.
Making use of results in finite blocklength channel coding, we determine
sequences of blocklengths that optimize transmission volume communicated at
fixed maximum message error probabilities. We provide a partial ordering of
communication channels. A dynamic programming formulation is used to show the
structural result that channel state feedback does not improve performance.