7 research outputs found

DUDE-Seq: Fast, Flexible, and Robust Denoising for Targeted Amplicon Sequencing

Author: Lee Byunghan
Moon Taesup
Weissman Tsachy
Yoon Sungroh
Publication date: 01/01/2017

We consider the correction of errors from nucleotide sequences produced by next-generation targeted amplicon sequencing. The next-generation sequencing (NGS) platforms can provide a great deal of sequencing data thanks to their high throughput, but the associated error rates often tend to be high. Denoising in high-throughput sequencing has thus become a crucial process for boosting the reliability of downstream analyses. Our methodology, named DUDE-Seq, is derived from a general setting of reconstructing finite-valued source data corrupted by a discrete memoryless channel and effectively corrects substitution and homopolymer indel errors, the two major types of sequencing errors in most high-throughput targeted amplicon sequencing platforms. Our experimental studies with real and simulated datasets suggest that the proposed DUDE-Seq not only outperforms existing alternatives in terms of error-correction capability and time efficiency, but also boosts the reliability of downstream analyses. Further, the flexibility of DUDE-Seq enables its robust application to different sequencing platforms and analysis pipelines by simple updates of the noise model. DUDE-Seq is available at http://data.snu.ac.kr/pub/dude-seq

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

Discrete Denoising with Shifts

Author: Taesup Moon
Tsachy Weissman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

We introduce S-DUDE, a new algorithm for denoising DMC-corrupted data. The algorithm, which generalizes the recently introduced DUDE (Discrete Universal DEnoiser) of Weissman et al., aims to compete with a genie that has access, in addition to the noisy data, also to the underlying clean data, and can choose to switch, up to m times, between sliding window denoisers in a way that minimizes the overall loss. When the underlying data form an individual sequence, we show that the S-DUDE performs essentially as well as this genie, provided that m is sub-linear in the size of the data. When the clean data is emitted by a piecewise stationary process, we show that the S-DUDE achieves the optimum distribution-dependent performance, provided that the same sub-linearity condition is imposed on the number of switches. To further substantiate the universal optimality of the S-DUDE, we show that when the number of switches is allowed to grow linearly with the size of the data, any (sequence of) scheme(s) fails to compete in the above senses. Using dynamic programming, we derive an efficient implementation of the S-DUDE, which has complexity (time and memory) growing only linearly with the data size and the number of switches m. Preliminary experimental results are presented, suggesting that S-DUDE has the capacity to significantly improve on the performance attained by the original DUDE in applications where the nature of the data abruptly changes in time (or space), as is often the case in practice. Comment: 30 pages, 3 figures, submitted to IEEE Trans. Inform. Theor

arXiv.org e-Print Archive

CiteSeerX

Crossref

Universal Minimax Discrete Denoising under Channel Uncertainty

Author: Gemelos George
Sigurjonsson Styrmir
Weissman Tsachy
Publication date: 17/08/2005

The goal of a denoising algorithm is to recover a signal from its noise-corrupted observations. Perfect recovery is seldom possible and performance is measured under a given single-letter fidelity criterion. For discrete signals corrupted by a known discrete memoryless channel, the DUDE was recently shown to perform this task asymptotically optimally, without knowledge of the statistical properties of the source. In the present work we address the scenario where, in addition to the lack of knowledge of the source statistics, there is also uncertainty in the channel characteristics. We propose a family of discrete denoisers and establish their asymptotic optimality under a minimax performance criterion which we argue is appropriate for this setting. As we show elsewhere, the proposed schemes can also be implemented computationally efficiently. Comment: Submitted to IEEE Transactions of Information Theor
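For readers unfamiliar with the known-channel DUDE that this abstract takes as its starting point, the following is a minimal sketch of the usual two-pass sliding-window rule, not code from any of the listed papers. It assumes a binary alphabet, a binary symmetric channel with known crossover probability, and Hamming loss. The function name `dude_binary` and the parameters `delta` and `k` (one-sided context length) are hypothetical, chosen only for illustration. The first pass counts how often each symbol appears in each two-sided context; the second pass picks, at each position, the reconstruction that minimizes the estimated loss computed from those counts and the inverted channel matrix.

```python
from collections import defaultdict


def dude_binary(z, delta, k):
    """Denoise a 0/1 sequence z observed through a BSC(delta), Hamming loss.

    Hypothetical helper for illustration. Positions closer than k to either
    end have no full context and are returned unchanged.
    """
    if not 0 <= delta < 0.5:
        raise ValueError("delta must lie in [0, 0.5) so the channel is invertible")
    n = len(z)
    # Channel matrix: PI[x][y] = P(noisy symbol y | clean symbol x).
    PI = [[1 - delta, delta], [delta, 1 - delta]]
    det = 1 - 2 * delta  # determinant of the symmetric 2x2 channel matrix
    PI_INV = [[(1 - delta) / det, -delta / det],
              [-delta / det, (1 - delta) / det]]
    LOSS = [[0, 1], [1, 0]]  # LOSS[x][xhat], Hamming loss

    def context(i):
        # Two-sided context: k symbols to the left and k to the right of i.
        return (tuple(z[i - k:i]), tuple(z[i + 1:i + k + 1]))

    # Pass 1: count occurrences of each noisy symbol within each context.
    counts = defaultdict(lambda: [0, 0])
    for i in range(k, n - k):
        counts[context(i)][z[i]] += 1

    # Pass 2: choose the reconstruction with the smallest estimated loss.
    out = list(z)
    for i in range(k, n - k):
        m = counts[context(i)]
        # q = m^T PI^{-1}: estimate of the clean-symbol counts in this context.
        q = [m[0] * PI_INV[0][x] + m[1] * PI_INV[1][x] for x in (0, 1)]

        def cost(xhat):
            return sum(q[x] * LOSS[x][xhat] * PI[x][z[i]] for x in (0, 1))

        # Listing z[i] first means ties keep the observed symbol.
        out[i] = min((z[i], 1 - z[i]), key=cost)
    return out
```

As a usage illustration, calling `dude_binary(z, 0.1, 1)` on a sequence of 21 zeros with a single 1 in the middle flips the isolated 1 back to 0, because that symbol is rare in its context relative to what the assumed channel would produce. The minimax schemes of the abstract above address the harder case where `delta` is not known exactly, which this sketch does not attempt.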

arXiv.org e-Print Archive

CiteSeerX

Universal Denoising for the Finite-Input-General-Output Channel

Author: Amir Dembo
Tsachy Weissman
Publication date: 01/01/2005

We consider the problem of reconstructing a finite-alphabet signal corrupted by a known memoryless channel with a general output alphabet. The goodness of the reconstruction is measured by a given loss function. We (constructively) establish the existence of a universal (sequence of) denoiser(s) attaining asymptotically the optimum distribution-dependent performance for any stationary source that may be generating the noiseless signal. We show, in fact, that there is a whole family of denoiser sequences with this property. These schemes are shown to be universal also in a semi-stochastic setting, where the only randomness assumed is that associated with the channel noise. The scheme is practical, with complexity O(n^(1+ε)) (for any ε > 0) and working storage size sub-linear in the input data length. This extends recent work that presented a discrete universal denoiser for recovering a discrete source corrupted by a DMC.

CiteSeerX

Universal Denoising for the Finite-Input General-Output Channel

Author: A. Dembo
T. Weissman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref