Optimum Self Random Number Generation Rate and Its Application to Rate Distortion Perception Function
The self-random number generation (SRNG) problem is considered in a general
setting. In the literature, the optimum SRNG rate with respect to the
variational distance has been discussed. In this paper, we first
characterize the optimum SRNG rate with respect to a subclass of
$f$-divergences. The subclass of $f$-divergences considered in this paper
includes typical distance measures such as the variational distance, the KL
divergence, the Hellinger distance and so on. Hence our result can be
considered as a generalization of the previous result with respect to the
variational distance. Next, we consider the obtained optimum SRNG rate from
several viewpoints. The $\varepsilon$-fixed-length source coding problem is
closely related to the SRNG problem, and our results reveal how the SRNG
problem with the $f$-divergence relates to the $\varepsilon$-fixed-length
source coding problem. We also apply our results to the rate distortion
perception (RDP) function: using our findings, we establish a lower bound
for the RDP function with respect to $f$-divergences. Finally, we discuss the
representation of the optimum SRNG rate using the smooth Rényi entropy.
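All three of the named measures arise from one template: the $f$-divergence $D_f(P\|Q) = \sum_x Q(x)\, f(P(x)/Q(x))$ for a convex generator $f$ with $f(1) = 0$. As a rough illustration of the kind of subclass being generalized over (the distributions and generators below are illustrative choices, not the paper's exact subclass), a minimal sketch:

```python
import numpy as np

# f-divergence D_f(P||Q) = sum_x Q(x) * f(P(x)/Q(x)) for discrete P, Q
# with full support (zero-mass cases need the usual limiting conventions).
def f_divergence(p, q, f):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * f(p / q)))

# Generators for the three distance measures named in the abstract.
generators = {
    "variational distance": lambda t: 0.5 * np.abs(t - 1),
    "KL divergence":        lambda t: t * np.log(t),
    "squared Hellinger":    lambda t: (np.sqrt(t) - 1) ** 2,
}

p = [0.5, 0.3, 0.2]   # example distributions, chosen only for illustration
q = [0.4, 0.4, 0.2]
for name, f in generators.items():
    print(f"{name}: {f_divergence(p, q, f):.4f}")
```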
Information-based complexity, feedback and dynamics in convex programming
We study the intrinsic limitations of sequential convex optimization through
the lens of feedback information theory. In the oracle model of optimization,
an algorithm queries an {\em oracle} for noisy information about the unknown
objective function, and the goal is to (approximately) minimize every function
in a given class using as few queries as possible. We show that, in order for a
function to be optimized, the algorithm must be able to accumulate enough
information about the objective. This, in turn, puts limits on the speed of
optimization under specific assumptions on the oracle and the type of feedback.
Our techniques are akin to the ones used in the statistical literature to obtain
minimax lower bounds on the risks of estimation procedures; the notable
difference is that, unlike in the case of i.i.d. data, a sequential
optimization algorithm can gather observations in a {\em controlled} manner, so
that the amount of information at each step is allowed to change in time. In
particular, we show that optimization algorithms often obey the law of
diminishing returns: the signal-to-noise ratio drops as the optimization
algorithm approaches the optimum. To underscore the generality of the tools, we
use our approach to derive fundamental lower bounds for a certain active
learning problem. Overall, the present work connects the intuitive notions of
information in optimization, experimental design, estimation, and active
learning to the quantitative notion of Shannon information.
Comment: final version; to appear in IEEE Transactions on Information Theory.
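The ``law of diminishing returns'' is easy to see in the simplest oracle model: for $f(x) = x^2/2$ the gradient signal $|x|$ shrinks as the iterate approaches the optimum, while the oracle's noise floor stays fixed. The toy simulation below is only a caricature of that effect; the quadratic objective, step size and noise level are assumptions for illustration, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy first-order oracle for f(x) = x^2 / 2: true gradient x plus noise.
def noisy_gradient(x, sigma):
    return x + sigma * rng.normal()

x, sigma, lr = 5.0, 0.1, 0.2
for step in range(1, 31):
    x -= lr * noisy_gradient(x, sigma)
    if step % 5 == 0:
        # Signal-to-noise ratio of the oracle's answer at the current iterate.
        print(f"step {step:2d}   x = {x:+.4f}   SNR = {abs(x) / sigma:8.2f}")
```

Early on, each query is almost noiseless in relative terms; near the optimum each query carries very little usable information, which is the intuition behind the query lower bounds.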
Adaptive sensing performance lower bounds for sparse signal detection and support estimation
This paper gives a precise characterization of the fundamental limits of
adaptive sensing for diverse estimation and testing problems concerning sparse
signals. We consider in particular the setting introduced in (IEEE Trans.
Inform. Theory 57 (2011) 6222-6235) and show necessary conditions on the
minimum signal magnitude for both detection and estimation: if the signal is
a sparse vector with $s$ non-zero components, then it can be reliably detected
in noise provided the magnitude of the non-zero components exceeds a threshold
depending only on the sparsity $s$; furthermore, the signal support can be
exactly identified provided the minimum magnitude exceeds a larger threshold,
likewise depending only on $s$. Notably, there is no dependence on $n$, the
extrinsic signal dimension. These results
show that the adaptive sensing methodologies proposed previously in the
literature are essentially optimal, and cannot be substantially improved. In
addition, these results provide further insights on the limits of adaptive
compressive sensing.
Comment: Published at http://dx.doi.org/10.3150/13-BEJ555 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
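The flavor of the matching adaptive procedures can be conveyed by a sequential-thresholding sketch in the spirit of distilled sensing. This is not the paper's construction; the problem sizes, signal amplitude and final threshold below are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

n, s, amplitude, rounds = 100_000, 50, 5.0, 5
x = np.zeros(n)
true_support = rng.choice(n, size=s, replace=False)
x[true_support] = amplitude

# Each round spends one noisy look per still-active coordinate and discards
# the roughly half of the nulls that fall below zero.
active = np.arange(n)
for _ in range(rounds):
    y = x[active] + rng.normal(size=active.size)
    active = active[y > 0]

# Final refinement: threshold one more look at the survivors. The point of
# adaptivity is that the threshold scales with the survivor count, not with n.
y = x[active] + rng.normal(size=active.size)
estimate = set(active[y > np.sqrt(2 * np.log(active.size))].tolist())

print("true positives :", len(estimate & set(true_support.tolist())))
print("false positives:", len(estimate - set(true_support.tolist())))
```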
Replicability in Reinforcement Learning
We initiate the mathematical study of replicability as an algorithmic
property in the context of reinforcement learning (RL). We focus on the
fundamental setting of discounted tabular MDPs with access to a generative
model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is
replicable if, with high probability, it outputs the exact same policy after
two executions on i.i.d. samples drawn from the generator when its internal
randomness is the same. We first provide an efficient $\rho$-replicable
algorithm for $(\varepsilon, \delta)$-optimal policy estimation with sample and
time complexity $\widetilde{O}\big(N^{3}\log(1/\delta)/((1-\gamma)^{5}\varepsilon^{2}\rho^{2})\big)$,
where $N$ is the number of state-action pairs. Next, for the subclass of
deterministic algorithms, we provide a lower bound of order
$\Omega\big(N^{3}/((1-\gamma)^{3}\varepsilon^{2}\rho^{2})\big)$.
Then, we study a relaxed version of replicability proposed by Kalavasis et al.
[2023] called TV indistinguishability. We design a computationally efficient TV
indistinguishable algorithm for policy estimation whose sample complexity is
$\widetilde{O}\big(N^{2}\log(1/\delta)/((1-\gamma)^{5}\varepsilon^{2}\rho^{2})\big)$.
At the cost of $\exp(N)$ running time, we transform these TV indistinguishable
algorithms to $\rho$-replicable ones without increasing their sample
complexity. Finally, we introduce the notion of approximate-replicability where
we only require that two outputted policies are close under an appropriate
statistical divergence (e.g., Rényi) and show an improved sample complexity of
$\widetilde{O}\big(N\log(1/\delta)/((1-\gamma)^{5}\varepsilon^{2}\rho^{2})\big)$.
Comment: to be published in NeurIPS 2023.
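The replicability notion is that of Impagliazzo et al. [2022], where a standard building block is to round an empirical estimate on a grid whose random offset is part of the shared internal randomness: two executions then land on the same grid point unless their estimates straddle a boundary. The sketch below shows this primitive for a scalar mean, not the paper's policy-estimation algorithm; the grid width and sample sizes are illustrative assumptions.

```python
import numpy as np

# Round the estimate on a grid with a shared random offset; two executions
# that share `shared_rng` agree whenever their estimates are closer together
# than their distance to the nearest (randomly placed) grid boundary.
def replicable_mean(samples, grid_width, shared_rng):
    offset = shared_rng.uniform(0.0, grid_width)   # internal randomness
    est = float(np.mean(samples))
    return offset + grid_width * round((est - offset) / grid_width)

# Two executions: independent i.i.d. data, identical internal randomness.
run1 = replicable_mean(np.random.default_rng(10).normal(1.0, 1.0, 100_000),
                       0.1, np.random.default_rng(42))
run2 = replicable_mean(np.random.default_rng(20).normal(1.0, 1.0, 100_000),
                       0.1, np.random.default_rng(42))
print(run1, run2, run1 == run2)   # equal with high probability
```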
Linear stochastic dynamics with nonlinear fractal properties
Stochastic processes with multiplicative noise have been studied
independently in several different contexts over the past decades. We focus on
the regime, found for a generic set of control parameters, in which stochastic
processes with multiplicative noise produce intermittency of a special kind,
characterized by a power-law probability density distribution. We review
applications to population dynamics, epidemics, finance and insurance (in
relation to the ARCH(1) process), immigration, investment portfolios and the
internet. We highlight the common physical mechanism and
summarize the main known results. The distribution and statistical properties
of the duration of intermittent bursts are also characterized in detail.
Comment: 26 pages, Physica A (in press).
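The common mechanism is a linear recursion with multiplicative noise, the Kesten-type map $x_{t+1} = a_t x_t + b_t$ with $\mathbb{E}[\log a_t] < 0$: contracting on average, but rare runs of $a_t > 1$ produce intermittent bursts and a power-law stationary tail (the ARCH(1) squared process has exactly this form). A toy simulation, with all parameters chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

T = 1_000_000
a = np.exp(rng.normal(-0.1, 0.5, T))   # multiplicative noise, E[log a] < 0
b = rng.normal(0.0, 1.0, T)            # additive term keeps x away from 0

x = np.empty(T)
x[0] = 0.0
for t in range(T - 1):
    x[t + 1] = a[t] * x[t] + b[t]      # Kesten-type recursion

# Crude tail check: log-log slope of the survival function of |x|.
z = np.sort(np.abs(x))[-20_000:]
ranks = np.arange(len(z), 0, -1)
slope = np.polyfit(np.log(z), np.log(ranks), 1)[0]
print(f"estimated tail exponent ~ {-slope:.2f}")
```

For these illustrative parameters, the exponent $\mu$ solving $\mathbb{E}[a^{\mu}] = 1$ is $0.8$, so the fitted slope should land in that vicinity.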
A Model for Prejudiced Learning in Noisy Environments
Based on the heuristic that maintaining presumptions can be beneficial in
uncertain environments, we propose a set of basic axioms for learning systems
to incorporate the concept of prejudice. The simplest, memoryless model of a
deterministic learning rule obeying the axioms is constructed, and shown to be
equivalent to the logistic map. The system's performance is analysed in an
environment in which it is subject to external randomness, weighing the
defectiveness of learning against the stability gained. The corresponding random dynamical
system with inhomogeneous, additive noise is studied, and shown to exhibit the
phenomena of noise induced stability and stochastic bifurcations. The overall
results allow for the interpretation that prejudice in uncertain environments
entails a considerable portion of stubbornness as a secondary phenomenon.
Comment: 21 pages, 11 figures; reduced graphics to slash size, full version on
the author's homepage. Minor revisions in text and references, identical to the
version to be published in Applied Mathematics and Computation.
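Since the memoryless rule is stated to be equivalent to the logistic map $x_{t+1} = r\,x_t(1 - x_t)$, the stability-versus-learning trade-off can already be glimpsed from the map's Lyapunov exponent as the control parameter $r$ varies. The sketch below covers only this deterministic skeleton, not the paper's inhomogeneous noise model; the sampled values of $r$ are arbitrary.

```python
import numpy as np

# Estimate the Lyapunov exponent of x -> r * x * (1 - x): negative means
# nearby states are pulled together (stable), positive means chaos.
def lyapunov(r, x0=0.3, burn=500, T=5000):
    x = x0
    for _ in range(burn):                 # discard the transient
        x = r * x * (1 - x)
    acc = 0.0
    for _ in range(T):
        x = r * x * (1 - x)
        acc += np.log(abs(r * (1 - 2 * x)) + 1e-12)
    return acc / T

for r in (2.8, 3.2, 3.5, 3.7, 3.9):
    lam = lyapunov(r)
    print(f"r = {r}: Lyapunov exponent = {lam:+.3f}"
          f"  ({'stable' if lam < 0 else 'chaotic'})")
```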
Is Pessimism Provably Efficient for Offline RL?
We study offline reinforcement learning (RL), which aims to learn an optimal
policy based on a dataset collected a priori. Due to the lack of further
interactions with the environment, offline RL suffers from the insufficient
coverage of the dataset, which eludes most existing theoretical analysis. In
this paper, we propose a pessimistic variant of the value iteration algorithm
(PEVI), which incorporates an uncertainty quantifier as the penalty function.
Such a penalty function simply flips the sign of the bonus function used to
promote exploration in online RL, which makes it easily implementable and
compatible with general function approximators.
Without assuming sufficient coverage of the dataset, we establish a
data-dependent upper bound on the suboptimality of PEVI for general Markov
decision processes (MDPs). When specialized to linear MDPs, it matches the
information-theoretic lower bound up to multiplicative factors of the dimension
and horizon. In other words, pessimism is not only provably efficient but also
minimax optimal. In particular, given the dataset, the learned policy serves as
the ``best effort'' among all policies, as no other policies can do better. Our
theoretical analysis identifies the critical role of pessimism in eliminating a
notion of spurious correlation, which emerges from the ``irrelevant''
trajectories that are less covered by the dataset and not informative for the
optimal policy.
Comment: 53 pages, 3 figures.
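The sign-flip recipe is easy to sketch in the tabular case: build the empirical model from the offline dataset, subtract a count-based uncertainty penalty from each Bellman target, and iterate. The sketch below is a hedged reconstruction of that general recipe, not the paper's exact PEVI specification; the $\beta/\sqrt{n(s,a)}$ penalty form and its constants are assumptions.

```python
import numpy as np

def pessimistic_value_iteration(counts, rewards, gamma=0.9, beta=1.0, iters=200):
    """Tabular value iteration with a count-based pessimism penalty.

    counts[s, a, s2] -- transition counts from the offline dataset
    rewards[s, a]    -- empirical mean rewards (0 for unvisited pairs)
    """
    S, A, _ = counts.shape
    n = np.maximum(counts.sum(axis=2), 1)        # visit counts, floored at 1
    p_hat = counts / n[:, :, None]               # empirical transition model
    penalty = beta / np.sqrt(n)                  # sign-flipped bonus

    v = np.zeros(S)
    for _ in range(iters):
        q = rewards - penalty + gamma * (p_hat @ v)   # penalized Bellman backup
        v = q.max(axis=1)
    return q.argmax(axis=1)                      # greedy pessimistic policy
```

Because the penalty is subtracted rather than added, state-action pairs the dataset covers poorly look unattractive, which is how pessimism suppresses the spurious correlations arising from sparsely covered trajectories.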