Optimum Self Random Number Generation Rate and Its Application to Rate Distortion Perception Function
The self-random number generation (SRNG) problem is considered in a general
setting. In the literature, the optimum SRNG rate with respect to the
variational distance has been discussed. In this paper, we first
characterize the optimum SRNG rate with respect to a subclass of
$f$-divergences. The subclass of $f$-divergences considered in this paper
includes typical distance measures such as the variational distance, the KL
divergence, the Hellinger distance and so on. Hence our result can be
considered as a generalization of the previous result with respect to the
variational distance. Next, we consider the obtained optimum SRNG rate from
several viewpoints. The $\varepsilon$-fixed-length source coding problem is
closely related to the SRNG problem, and our results reveal how the SRNG
problem with the $f$-divergence relates to the $\varepsilon$-fixed-length
source coding problem. We also apply our results to the rate distortion
perception (RDP) function: using our findings, we establish a lower bound
for the RDP function with respect to $f$-divergences. Finally, we discuss the
representation of the optimum SRNG rate using the smooth Rényi entropy.
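All three of the named measures arise from one template: the $f$-divergence $D_f(P\|Q) = \sum_x Q(x)\, f(P(x)/Q(x))$ for a convex generator $f$ with $f(1) = 0$. As a rough illustration of the kind of subclass being generalized over (the distributions and generators below are illustrative choices, not the paper's exact subclass), a minimal sketch:

```python
import numpy as np

# f-divergence D_f(P||Q) = sum_x Q(x) * f(P(x)/Q(x)) for discrete P, Q
# with full support (zero-mass cases need the usual limiting conventions).
def f_divergence(p, q, f):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

# Generators for the three distance measures named in the abstract.
generators = {
    "variational distance": lambda t: 0.5 * np.abs(t - 1),
    "KL divergence":        lambda t: t * np.log(t),
    "squared Hellinger":    lambda t: (np.sqrt(t) - 1) ** 2,
}

p = [0.5, 0.3, 0.2]   # example distributions, chosen only for illustration
q = [0.4, 0.4, 0.2]
for name, f in generators.items():
    print(f"{name}: {f_divergence(p, q, f):.4f}")
```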
Information-based complexity, feedback and dynamics in convex programming
We study the intrinsic limitations of sequential convex optimization through
the lens of feedback information theory. In the oracle model of optimization,
an algorithm queries an {\em oracle} for noisy information about the unknown
objective function, and the goal is to (approximately) minimize every function
in a given class using as few queries as possible. We show that, in order for a
function to be optimized, the algorithm must be able to accumulate enough
information about the objective. This, in turn, puts limits on the speed of
optimization under specific assumptions on the oracle and the type of feedback.
Our techniques are akin to the ones used in the statistical literature to obtain
minimax lower bounds on the risks of estimation procedures; the notable
difference is that, unlike in the case of i.i.d. data, a sequential
optimization algorithm can gather observations in a {\em controlled} manner, so
that the amount of information at each step is allowed to change in time. In
particular, we show that optimization algorithms often obey the law of
diminishing returns: the signal-to-noise ratio drops as the optimization
algorithm approaches the optimum. To underscore the generality of the tools, we
use our approach to derive fundamental lower bounds for a certain active
learning problem. Overall, the present work connects the intuitive notions of
information in optimization, experimental design, estimation, and active
learning to the quantitative notion of Shannon information.
Comment: final version; to appear in IEEE Transactions on Information Theory.
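The ``law of diminishing returns'' is easy to see in the simplest oracle model: for $f(x) = x^2/2$ the gradient signal $|x|$ shrinks as the iterate approaches the optimum, while the oracle's noise floor stays fixed. The toy simulation below is only a caricature of that effect; the quadratic objective, step size and noise level are assumptions for illustration, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy first-order oracle for f(x) = x^2 / 2: true gradient x plus noise.
def noisy_gradient(x, sigma):
    return x + sigma * rng.normal()

x, sigma, lr = 5.0, 0.1, 0.2
for step in range(1, 31):
    x -= lr * noisy_gradient(x, sigma)
    if step % 5 == 0:
        # Signal-to-noise ratio of the oracle's answer at the current iterate.
        print(f"step {step:2d}   x = {x:+.4f}   SNR = {abs(x) / sigma:8.2f}")
```

Early on, each query is almost noiseless in relative terms; near the optimum each query carries very little usable information, which is the intuition behind the query lower bounds.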
Adaptive sensing performance lower bounds for sparse signal detection and support estimation
This paper gives a precise characterization of the fundamental limits of
adaptive sensing for diverse estimation and testing problems concerning sparse
signals. We consider in particular the setting introduced in (IEEE Trans.
Inform. Theory 57 (2011) 6222-6235) and show necessary conditions on the
minimum signal magnitude for both detection and estimation: if the signal is
a sparse vector with $s$ non-zero components, then it can be reliably detected
in noise provided the magnitude of the non-zero components exceeds a threshold
depending only on the sparsity $s$; furthermore, the signal support can be
exactly identified provided the minimum magnitude exceeds a larger threshold,
likewise depending only on $s$. Notably, there is no dependence on $n$, the
extrinsic signal dimension. These results
show that the adaptive sensing methodologies proposed previously in the
literature are essentially optimal, and cannot be substantially improved. In
addition, these results provide further insights on the limits of adaptive
compressive sensing.
Comment: Published at http://dx.doi.org/10.3150/13-BEJ555 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
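The flavor of the matching adaptive procedures can be conveyed by a sequential-thresholding sketch in the spirit of distilled sensing. This is not the paper's construction; the problem sizes, signal amplitude and final threshold below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

n, s, amplitude, rounds = 100_000, 50, 5.0, 5
x = np.zeros(n)
true_support = rng.choice(n, size=s, replace=False)
x[true_support] = amplitude

# Each round spends one noisy look per still-active coordinate and discards
# the roughly half of the nulls that fall below zero.
active = np.arange(n)
for _ in range(rounds):
    y = x[active] + rng.normal(size=active.size)
    active = active[y > 0]

# Final refinement: threshold one more look at the survivors. The point of
# adaptivity is that the threshold scales with the survivor count, not with n.
y = x[active] + rng.normal(size=active.size)
estimate = set(active[y > np.sqrt(2 * np.log(active.size))].tolist())

print("true positives :", len(estimate & set(true_support.tolist())))
print("false positives:", len(estimate - set(true_support.tolist())))
```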
Replicability in Reinforcement Learning
We initiate the mathematical study of replicability as an algorithmic
property in the context of reinforcement learning (RL). We focus on the
fundamental setting of discounted tabular MDPs with access to a generative
model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is
replicable if, with high probability, it outputs the exact same policy after
two executions on i.i.d. samples drawn from the generator when its internal
randomness is the same. We first provide an efficient $\rho$-replicable
algorithm for $(\varepsilon, \delta)$-optimal policy estimation with sample and
time complexity $\widetilde{O}\big(N^{3}\log(1/\delta)/((1-\gamma)^{5}\varepsilon^{2}\rho^{2})\big)$,
where $N$ is the number of state-action pairs. Next, for the subclass of
deterministic algorithms, we provide a lower bound of order
$\Omega\big(N^{3}/((1-\gamma)^{3}\varepsilon^{2}\rho^{2})\big)$.
Then, we study a relaxed version of replicability proposed by Kalavasis et al.
[2023] called TV indistinguishability. We design a computationally efficient TV
indistinguishable algorithm for policy estimation whose sample complexity is
$\widetilde{O}\big(N^{2}\log(1/\delta)/((1-\gamma)^{5}\varepsilon^{2}\rho^{2})\big)$.
At the cost of $\exp(N)$ running time, we transform these TV indistinguishable
algorithms to $\rho$-replicable ones without increasing their sample
complexity. Finally, we introduce the notion of approximate-replicability where
we only require that two outputted policies are close under an appropriate
statistical divergence (e.g., Rényi) and show an improved sample complexity of
$\widetilde{O}\big(N\log(1/\delta)/((1-\gamma)^{5}\varepsilon^{2}\rho^{2})\big)$.
Comment: to be published in NeurIPS 2023.
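The replicability notion is that of Impagliazzo et al. [2022], where a standard building block is to round an empirical estimate on a grid whose random offset is part of the shared internal randomness: two executions then land on the same grid point unless their estimates straddle a boundary. The sketch below shows this primitive for a scalar mean, not the paper's policy-estimation algorithm; the grid width and sample sizes are illustrative assumptions.

```python
import numpy as np

# Round the estimate on a grid with a shared random offset; two executions
# that share `shared_rng` agree whenever their estimates are closer together
# than their distance to the nearest (randomly placed) grid boundary.
def replicable_mean(samples, grid_width, shared_rng):
    offset = shared_rng.uniform(0.0, grid_width)   # internal randomness
    est = float(np.mean(samples))
    return offset + grid_width * round((est - offset) / grid_width)

# Two executions: independent i.i.d. data, identical internal randomness.
run1 = replicable_mean(np.random.default_rng(10).normal(1.0, 1.0, 100_000),
                       0.1, np.random.default_rng(42))
run2 = replicable_mean(np.random.default_rng(20).normal(1.0, 1.0, 100_000),
                       0.1, np.random.default_rng(42))
print(run1, run2, run1 == run2)   # equal with high probability
```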
Linear stochastic dynamics with nonlinear fractal properties
Stochastic processes with multiplicative noise have been studied
independently in several different contexts over the past decades. We focus on
the regime, found for a generic set of control parameters, in which stochastic
processes with multiplicative noise produce intermittency of a special kind,
characterized by a power-law probability density distribution. We review
applications to population dynamics, epidemics, finance and insurance (in
relation to the ARCH(1) process), immigration, investment portfolios and the
internet. We highlight the common physical mechanism and
summarize the main known results. The distribution and statistical properties
of the duration of intermittent bursts are also characterized in detail.
Comment: 26 pages, Physica A (in press).
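The common mechanism is a linear recursion with multiplicative noise, the Kesten-type map $x_{t+1} = a_t x_t + b_t$ with $\mathbb{E}[\log a_t] < 0$: contracting on average, but rare runs of $a_t > 1$ produce intermittent bursts and a power-law stationary tail (the ARCH(1) squared process has exactly this form). A toy simulation, with all parameters chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

T = 1_000_000
a = np.exp(rng.normal(-0.1, 0.5, T))   # multiplicative noise, E[log a] < 0
b = rng.normal(0.0, 1.0, T)            # additive term keeps x away from 0

x = np.empty(T)
x[0] = 0.0
for t in range(T - 1):
    x[t + 1] = a[t] * x[t] + b[t]      # Kesten-type recursion

# Crude tail check: log-log slope of the survival function of |x|.
z = np.sort(np.abs(x))[-20_000:]
ranks = np.arange(len(z), 0, -1)
slope = np.polyfit(np.log(z), np.log(ranks), 1)[0]
print(f"estimated tail exponent ~ {-slope:.2f}")
```

For these illustrative parameters, the exponent $\mu$ solving $\mathbb{E}[a^{\mu}] = 1$ is $0.8$, so the fitted slope should land in that vicinity.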
A Model for Prejudiced Learning in Noisy Environments
Based on the heuristic that maintaining presumptions can be beneficial in
uncertain environments, we propose a set of basic axioms for learning systems
to incorporate the concept of prejudice. The simplest, memoryless model of a
deterministic learning rule obeying the axioms is constructed, and shown to be
equivalent to the logistic map. The system's performance is analysed in an
environment in which it is subject to external randomness, weighing the
defectiveness of learning against the stability gained. The corresponding random dynamical
system with inhomogeneous, additive noise is studied, and shown to exhibit the
phenomena of noise induced stability and stochastic bifurcations. The overall
results allow for the interpretation that prejudice in uncertain environments
entails a considerable portion of stubbornness as a secondary phenomenon.
Comment: 21 pages, 11 figures; reduced graphics to slash size, full version on
the author's homepage. Minor revisions in text and references, identical to the
version to be published in Applied Mathematics and Computation.
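Since the memoryless rule is stated to be equivalent to the logistic map $x_{t+1} = r\,x_t(1 - x_t)$, the stability-versus-learning trade-off can already be glimpsed from the map's Lyapunov exponent as the control parameter $r$ varies. The sketch below covers only this deterministic skeleton, not the paper's inhomogeneous noise model; the sampled values of $r$ are arbitrary.

```python
import numpy as np

# Estimate the Lyapunov exponent of x -> r * x * (1 - x): negative means
# nearby states are pulled together (stable), positive means chaos.
def lyapunov(r, x0=0.3, burn=500, T=5000):
    x = x0
    for _ in range(burn):                 # discard the transient
        x = r * x * (1 - x)
    acc = 0.0
    for _ in range(T):
        x = r * x * (1 - x)
        acc += np.log(abs(r * (1 - 2 * x)) + 1e-12)
    return acc / T

for r in (2.8, 3.2, 3.5, 3.7, 3.9):
    lam = lyapunov(r)
    print(f"r = {r}: Lyapunov exponent = {lam:+.3f}"
          f"  ({'stable' if lam < 0 else 'chaotic'})")
```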
Is Pessimism Provably Efficient for Offline RL?
We study offline reinforcement learning (RL), which aims to learn an optimal
policy based on a dataset collected a priori. Due to the lack of further
interactions with the environment, offline RL suffers from the insufficient
coverage of the dataset, which eludes most existing theoretical analysis. In
this paper, we propose a pessimistic variant of the value iteration algorithm
(PEVI), which incorporates an uncertainty quantifier as the penalty function.
Such a penalty function simply flips the sign of the bonus function used to
promote exploration in online RL, which makes it easily implementable and
compatible with general function approximators.
Without assuming sufficient coverage of the dataset, we establish a
data-dependent upper bound on the suboptimality of PEVI for general Markov
decision processes (MDPs). When specialized to linear MDPs, it matches the
information-theoretic lower bound up to multiplicative factors of the dimension
and horizon. In other words, pessimism is not only provably efficient but also
minimax optimal. In particular, given the dataset, the learned policy serves as
the ``best effort'' among all policies, as no other policies can do better. Our
theoretical analysis identifies the critical role of pessimism in eliminating a
notion of spurious correlation, which emerges from the ``irrelevant''
trajectories that are less covered by the dataset and not informative for the
optimal policy.
Comment: 53 pages, 3 figures.
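The sign-flip recipe is easy to sketch in the tabular case: build the empirical model from the offline dataset, subtract a count-based uncertainty penalty from each Bellman target, and iterate. The sketch below is a hedged reconstruction of that general recipe, not the paper's exact PEVI specification; the $\beta/\sqrt{n(s,a)}$ penalty form and its constants are assumptions.

```python
import numpy as np

def pessimistic_value_iteration(counts, rewards, gamma=0.9, beta=1.0, iters=200):
    """Tabular value iteration with a count-based pessimism penalty.

    counts[s, a, s2] -- transition counts from the offline dataset
    rewards[s, a]    -- empirical mean rewards (0 for unvisited pairs)
    """
    S, A, _ = counts.shape
    n = np.maximum(counts.sum(axis=2), 1)        # visit counts, floored at 1
    p_hat = counts / n[:, :, None]               # empirical transition model
    penalty = beta / np.sqrt(n)                  # sign-flipped bonus

    v = np.zeros(S)
    for _ in range(iters):
        q = rewards - penalty + gamma * (p_hat @ v)   # penalized Bellman backup
        v = q.max(axis=1)
    return q.argmax(axis=1)                      # greedy pessimistic policy
```

Because the penalty is subtracted rather than added, state-action pairs the dataset covers poorly look unattractive, which is how pessimism suppresses the spurious correlations arising from sparsely covered trajectories.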