    A Universal Scheme for Wyner–Ziv Coding of Discrete Sources

    We consider the Wyner–Ziv (WZ) problem of lossy compression where the decompressor observes a noisy version of the source, whose statistics are unknown. A new family of WZ coding algorithms is proposed and their universal optimality is proven. Compression consists of sliding-window processing followed by Lempel–Ziv (LZ) compression, while the decompressor is based on a modification of the discrete universal denoiser (DUDE) algorithm that takes advantage of side information. The new algorithms not only universally attain the fundamental limits, but also suggest a paradigm for practical WZ coding. The effectiveness of our approach is illustrated with experiments on binary images and English text, using a low-complexity algorithm motivated by our class of universally optimal WZ codes.

    Discrete Denoising with Shifts

    We introduce S-DUDE, a new algorithm for denoising DMC-corrupted data. The algorithm, which generalizes the recently introduced DUDE (Discrete Universal DEnoiser) of Weissman et al., aims to compete with a genie that has access to the underlying clean data in addition to the noisy data, and can choose to switch, up to m times, between sliding-window denoisers in a way that minimizes the overall loss. When the underlying data form an individual sequence, we show that the S-DUDE performs essentially as well as this genie, provided that m is sub-linear in the size of the data. When the clean data are emitted by a piecewise stationary process, we show that the S-DUDE achieves the optimum distribution-dependent performance, provided that the same sub-linearity condition is imposed on the number of switches. To further substantiate the universal optimality of the S-DUDE, we show that when the number of switches is allowed to grow linearly with the size of the data, any (sequence of) scheme(s) fails to compete in the above senses. Using dynamic programming, we derive an efficient implementation of the S-DUDE, whose complexity (time and memory) grows only linearly with the data size and the number of switches m. Preliminary experimental results are presented, suggesting that S-DUDE has the capacity to significantly improve on the performance attained by the original DUDE in applications where the nature of the data changes abruptly in time (or space), as is often the case in practice. Comment: 30 pages, 3 figures, submitted to IEEE Trans. Inform. Theory

    Universal Minimax Discrete Denoising under Channel Uncertainty

    The goal of a denoising algorithm is to recover a signal from its noise-corrupted observations. Perfect recovery is seldom possible, and performance is measured under a given single-letter fidelity criterion. For discrete signals corrupted by a known discrete memoryless channel, the DUDE was recently shown to perform this task asymptotically optimally, without knowledge of the statistical properties of the source. In the present work we address the scenario where, in addition to the lack of knowledge of the source statistics, there is also uncertainty in the channel characteristics. We propose a family of discrete denoisers and establish their asymptotic optimality under a minimax performance criterion, which we argue is appropriate for this setting. As we show elsewhere, the proposed schemes can also be implemented in a computationally efficient manner. Comment: Submitted to IEEE Transactions on Information Theory
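
    As a point of reference for the known-channel baseline mentioned above, the following is a minimal sketch of the standard two-pass DUDE rule, not the minimax scheme proposed in this abstract. It assumes integer-coded symbols, a square invertible channel matrix with channel[x, z] = P(z | x), and a loss matrix with loss[x, x_hat]; the function name dude and its parameters are hypothetical.

```python
import numpy as np

def dude(z, k, channel, loss):
    """Two-pass DUDE sketch for a known, invertible DMC (illustrative only).

    z       : noisy sequence of integer symbols in {0, ..., A-1}
    k       : one-sided context length
    channel : A x A matrix, channel[x, z] = P(z | x)   (assumed invertible)
    loss    : A x A matrix, loss[x, x_hat]
    """
    z = np.asarray(z)
    channel = np.asarray(channel, dtype=float)
    loss = np.asarray(loss, dtype=float)
    n = len(z)
    n_symbols = channel.shape[1]
    channel_inv = np.linalg.inv(channel)

    def context(i):
        # k symbols to the left and k symbols to the right of position i
        return (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))

    # Pass 1: for every two-sided context, count the noisy symbols seen at its center.
    counts = {}
    for i in range(k, n - k):
        counts.setdefault(context(i), np.zeros(n_symbols))[z[i]] += 1

    # Pass 2: replace each symbol by the reconstruction with the smallest estimated loss.
    x_hat = z.copy()  # the first and last k symbols are left unchanged
    for i in range(k, n - k):
        m = counts[context(i)]
        # (m @ channel_inv) estimates the clean-symbol counts in this context;
        # weighting by P(z_i | x) gives an unnormalized posterior over the clean symbol.
        weights = (m @ channel_inv) * channel[:, z[i]]
        x_hat[i] = np.argmin(weights @ loss)
    return x_hat
```

    For example, with a binary symmetric channel of crossover probability 0.1, Hamming loss, and k = 1, two isolated ones in an otherwise all-zero sequence of length 50 are flipped back to zero. The estimated clean counts are not clipped to be non-negative here, which keeps the sketch close to the textbook rule.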

    DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

    We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq

    Methodology for Analyzing and Characterizing Error Generation in Presence of Autocorrelated Demands in Stochastic Inventory Models

    Most techniques that describe and solve stochastic inventory problems rely upon the assumption of independent and identically distributed (IID) demands. Stochastic inventory formulations that fail to capture serially-correlated components in the demand lead to serious errors. This dissertation provides a robust method that approximates solutions to the stochastic inventory problem where the control review system is continuous, the demand contains autocorrelated components, and the lost sales case is considered. A simulation optimization technique based on simulated annealing (SA), pattern search (PS), and ranking and selection (R&S) is developed and used to generate near-optimal solutions. The proposed method accounts for the randomness and dependency of the demand as well as for the inherent constraints of the inventory model. The impact of serially-correlated demand is investigated for discrete and continuous dependent input models. For the discrete dependent model, the autocorrelated demand is assumed to behave as a discrete Markov-modulated chain (DMC), while a first-order autoregressive AR(1) process is assumed for describing the continuous demand. The effects of these demand patterns, combined with structural cost variations, on estimating both total costs and control policy parameters were examined. Results demonstrated that formulations that ignore the serially-correlated component performed worse than those that considered it. In this setting, the effect of holding cost and its interaction with penalty cost become stronger and more significant as the serially-correlated component increases. The growth rate of the error in total costs generated by formulations that ignore dependency components is significant and fits exponential models. To verify the effectiveness of the proposed simulation optimization method for finding the near-optimal inventory policy at different levels of autocorrelation, total costs and stockout rates were estimated. The results provide additional evidence that serially-correlated components in the demand have a relevant impact on determining inventory control policies and estimating measures of performance.
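
    To make the continuous demand model concrete, here is a minimal sketch of an AR(1) demand generator. The parameterization (mean mu, autocorrelation phi, Gaussian noise with standard deviation sigma) and the function name ar1_demand are assumptions for illustration; the dissertation's exact model and parameter values are not given in the abstract.

```python
import numpy as np

def ar1_demand(n, mu, phi, sigma, rng, d0=None):
    """Simulate n periods of AR(1) demand (illustrative parameterization).

    D_t = mu + phi * (D_{t-1} - mu) + eps_t,  eps_t ~ Normal(0, sigma^2)

    mu    : long-run mean demand
    phi   : lag-1 autocorrelation, |phi| < 1 for a stationary process
    sigma : standard deviation of the noise term
    rng   : numpy.random.Generator supplying the noise
    d0    : demand in the period before the first simulated one (defaults to mu)
    """
    demand = np.empty(n)
    prev = mu if d0 is None else d0
    noise = rng.normal(0.0, sigma, size=n)
    for t in range(n):
        prev = mu + phi * (prev - mu) + noise[t]
        demand[t] = prev
    return demand
```

    Setting phi = 0 recovers the IID case that most formulations assume, so the same generator can feed both the dependent and the independent scenarios in a simulation study. Demand is not truncated at zero in this sketch.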

    Maximal-Capacity Discrete Memoryless Channel Identification

    The problem of identifying the channel with the highest capacity among several discrete memoryless channels (DMCs) is considered. The problem is cast as a pure-exploration multi-armed bandit problem, which follows the practical use of training sequences to sense the communication channel statistics. A capacity estimator is proposed and tight confidence bounds on the estimator error are derived. Based on this capacity estimator, a gap-elimination algorithm termed BestChanID is proposed, which is oblivious to the capacity-achieving input distribution and is guaranteed to output the DMC with the largest capacity with a desired confidence. Furthermore, two additional algorithms, NaiveChanSel and MedianChanEl, which output with a certain confidence a DMC with capacity close to the maximal, are introduced. Each of those algorithms is beneficial in a different regime and can be used as a subroutine in BestChanID. The sample complexity of all algorithms is analyzed as a function of the desired confidence parameter, the number of channels, and the channels' input and output alphabet sizes. The cost of best channel identification is shown to scale quadratically with the alphabet size, and a fundamental lower bound on the number of channel senses required to identify the best channel with a certain confidence is derived.
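
    The abstract does not spell out the capacity estimator, so the sketch below should be read as one plausible plug-in approach rather than the paper's method: estimate the transition matrix from training pairs, then compute its capacity with the classical Blahut–Arimoto iteration. The function names empirical_channel and blahut_arimoto are hypothetical, and every input symbol is assumed to appear at least once in the training data.

```python
import numpy as np

def empirical_channel(xs, ys, n_inputs, n_outputs):
    """Estimate P(y | x) from training pairs (each input must occur at least once)."""
    counts = np.zeros((n_inputs, n_outputs))
    np.add.at(counts, (np.asarray(xs), np.asarray(ys)), 1)
    return counts / counts.sum(axis=1, keepdims=True)

def blahut_arimoto(P, tol=1e-9, max_iter=10000):
    """Capacity in bits of a DMC with P[x, y] = P(y | x), via Blahut-Arimoto."""
    P = np.asarray(P, dtype=float)
    p = np.full(P.shape[0], 1.0 / P.shape[0])  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        q = p @ P  # output distribution induced by p
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(P > 0, np.log(P / q), 0.0)
        d = np.sum(P * log_ratio, axis=1)  # KL(P[x, :] || q) in nats, per input x
        z = np.sum(p * np.exp(d))
        lower, upper = np.log(z), np.max(d)  # standard lower and upper capacity bounds
        p = p * np.exp(d) / z  # multiplicative update of the input distribution
        if upper - lower < tol:
            break
    return lower / np.log(2)
```

    For a binary symmetric channel with crossover probability 0.1 this returns approximately 0.531 bits, matching 1 - H(0.1). A naive selection rule would then pick the channel with the largest plug-in estimate; attaching confidence guarantees to that choice is what the algorithms in the abstract address.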

    Channels That Die

    Given the possibility of communication systems failing catastrophically, we investigate limits to communicating over channels that fail at random times. These channels are finite-state semi-Markov channels. We show that communication with arbitrarily small probability of error is not possible. Making use of results in finite-blocklength channel coding, we determine sequences of blocklengths that optimize the transmission volume communicated at fixed maximum message error probabilities. We provide a partial ordering of communication channels. A dynamic programming formulation is used to show the structural result that channel state feedback does not improve performance.
