Search CORE

64,801 research outputs found

Scanning and Sequential Decision Making for Multi-Dimensional Data - Part I: the Noiseless Case

Author: Cohen Asaf
Merhav Neri
Weissman Tsachy
Publication venue
Publication date: 08/05/2007
Field of study

We investigate the problem of scanning and prediction ("scandiction", for short) of multidimensional data arrays. This problem arises in several aspects of image and video processing, such as predictive coding, for example, where an image is compressed by coding the error sequence resulting from scandicting it. Thus, it is natural to ask what is the optimal method to scan and predict a given image, what is the resulting minimum prediction loss, and whether there exist specific scandiction schemes which are universal in some sense. Specifically, we investigate the following problems: First, modeling the data array as a random field, we wish to examine whether there exists a scandiction scheme which is independent of the field's distribution, yet asymptotically achieves the same performance as if this distribution was known. This question is answered in the affirmative for the set of all spatially stationary random fields and under mild conditions on the loss function. We then discuss the scenario where a non-optimal scanning order is used, yet accompanied by an optimal predictor, and derive bounds on the excess loss compared to optimal scanning and prediction. This paper is the first part of a two-part paper on sequential decision making for multi-dimensional data. It deals with clean, noiseless data arrays. The second part deals with noisy data arrays, namely, with the case where the decision maker observes only a noisy version of the data, yet it is judged with respect to the original, clean data.Comment: 46 pages, 2 figures. Revised version: title changed, section 1 revised, section 3.1 added, a few minor/technical corrections mad

arXiv.org e-Print Archive

Caltech Authors

Discrete Denoising with Shifts

Author: Senior Member
Taesup Moon
Tsachy Weissman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

We introduce S-DUDE, a new algorithm for denoising DMC-corrupted data. The algorithm, which generalizes the recently introduced DUDE (Discrete Universal DEnoiser) of Weissman et al., aims to compete with a genie that has access, in addition to the noisy data, also to the underlying clean data, and can choose to switch, up to

m

times, between sliding window denoisers in a way that minimizes the overall loss. When the underlying data form an individual sequence, we show that the S-DUDE performs essentially as well as this genie, provided that

m

is sub-linear in the size of the data. When the clean data is emitted by a piecewise stationary process, we show that the S-DUDE achieves the optimum distribution-dependent performance, provided that the same sub-linearity condition is imposed on the number of switches. To further substantiate the universal optimality of the S-DUDE, we show that when the number of switches is allowed to grow linearly with the size of the data, \emph{any} (sequence of) scheme(s) fails to compete in the above senses. Using dynamic programming, we derive an efficient implementation of the S-DUDE, which has complexity (time and memory) growing only linearly with the data size and the number of switches

m

. Preliminary experimental results are presented, suggesting that S-DUDE has the capacity to significantly improve on the performance attained by the original DUDE in applications where the nature of the data abruptly changes in time (or space), as is often the case in practice.Comment: 30 pages, 3 figures, submitted to IEEE Trans. Inform. Theor

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

Author: Grünwald Peter D.
Mehta Nishant A.
Publication venue
Publication date: 20/10/2017
Field of study

We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as stochastic or PAC-Bayesian,

\mathrm{KL}(\text{posterior} \operatorname{\|} \text{prior})

complexity. For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to

L_2(P)

entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi who did the log-loss case with

L_\infty

. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity.Comment: 38 page

arXiv.org e-Print Archive

CWI's Institutional Repository

Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates

Author: Duchi John C.
Wainwright Martin J.
Zhang Yuchen
Publication venue
Publication date: 29/04/2014
Field of study

We establish optimal convergence rates for a decomposition-based scalable approach to kernel ridge regression. The method is simple to describe: it randomly partitions a dataset of size N into m subsets of equal size, computes an independent kernel ridge regression estimator for each subset, then averages the local solutions into a global predictor. This partitioning leads to a substantial reduction in computation time versus the standard approach of performing kernel ridge regression on all N samples. Our two main theorems establish that despite the computational speed-up, statistical optimality is retained: as long as m is not too large, the partition-based estimator achieves the statistical minimax rate over all estimators using the set of N samples. As concrete examples, our theory guarantees that the number of processors m may grow nearly linearly for finite-rank kernels and Gaussian kernels and polynomially in N for Sobolev spaces, which in turn allows for substantial reductions in computational cost. We conclude with experiments on both simulated data and a music-prediction task that complement our theoretical results, exhibiting the computational and statistical benefits of our approach

arXiv.org e-Print Archive

CiteSeerX