
    Optimum Self Random Number Generation Rate and Its Application to Rate Distortion Perception Function

    The self-random number generation (SRNG) problem is considered in a general setting. In the literature, the optimum SRNG rate with respect to the variational distance has been discussed. In this paper, we first characterize the optimum SRNG rate with respect to a subclass of $f$-divergences. The subclass of $f$-divergences considered in this paper includes typical distance measures such as the variational distance, the KL divergence, and the Hellinger distance; hence our result can be considered a generalization of the previous result for the variational distance. Next, we consider the obtained optimum SRNG rate from several viewpoints. The $\varepsilon$-fixed-length source coding problem is closely related to the SRNG problem, and our results reveal how the SRNG problem with an $f$-divergence relates to it. We also apply our results to the rate-distortion-perception (RDP) function and establish a lower bound on the RDP function with respect to $f$-divergences. Finally, we discuss a representation of the optimum SRNG rate using the smooth Rényi entropy.
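    As a quick reference (added here, not part of the abstract), the $f$-divergence family referred to above can be written as

    D_f(P \| Q) = \sum_{x} Q(x)\, f\!\left(\frac{P(x)}{Q(x)}\right), \qquad f \text{ convex},\ f(1) = 0.

    Under common normalization conventions, $f(t) = \tfrac{1}{2}\lvert t - 1 \rvert$ gives the variational (total variation) distance, $f(t) = t \log t$ gives the KL divergence, and $f(t) = \tfrac{1}{2}(\sqrt{t} - 1)^2$ gives the squared Hellinger distance.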

    Information-based complexity, feedback and dynamics in convex programming

    We study the intrinsic limitations of sequential convex optimization through the lens of feedback information theory. In the oracle model of optimization, an algorithm queries an oracle for noisy information about the unknown objective function, and the goal is to (approximately) minimize every function in a given class using as few queries as possible. We show that, in order for a function to be optimized, the algorithm must be able to accumulate enough information about the objective. This, in turn, puts limits on the speed of optimization under specific assumptions on the oracle and the type of feedback. Our techniques are akin to the ones used in the statistical literature to obtain minimax lower bounds on the risks of estimation procedures; the notable difference is that, unlike in the case of i.i.d. data, a sequential optimization algorithm can gather observations in a controlled manner, so that the amount of information at each step is allowed to change in time. In particular, we show that optimization algorithms often obey the law of diminishing returns: the signal-to-noise ratio drops as the optimization algorithm approaches the optimum. To underscore the generality of the tools, we use our approach to derive fundamental lower bounds for a certain active learning problem. Overall, the present work connects the intuitive notions of information in optimization, experimental design, estimation, and active learning to the quantitative notion of Shannon information. Comment: final version; to appear in IEEE Transactions on Information Theory
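    For intuition only, here is a minimal sketch of the oracle model of sequential optimization described above: the algorithm repeatedly queries a noisy first-order oracle and may use only the replies gathered so far. The quadratic test objective, the Gaussian noise level, and the 1/t step size are illustrative assumptions, not the paper's setting.

    import numpy as np

    def noisy_gradient_oracle(x, grad_fn, noise_std, rng):
        # One oracle query: the true gradient corrupted by Gaussian noise.
        return grad_fn(x) + noise_std * rng.standard_normal(x.shape)

    def sequential_optimize(grad_fn, x0, n_queries=500, noise_std=0.1, seed=0):
        # Each iterate may depend only on past oracle replies (feedback setting).
        rng = np.random.default_rng(seed)
        x = np.array(x0, dtype=float)
        for t in range(1, n_queries + 1):
            g = noisy_gradient_oracle(x, grad_fn, noise_std, rng)
            x -= g / t  # diminishing step size
            # Near the optimum the true gradient shrinks while the noise level does not,
            # so each reply carries less signal: the "diminishing returns" noted above.
        return x

    # Illustrative use: minimize f(x) = ||x||^2 / 2, whose gradient is x itself.
    x_hat = sequential_optimize(lambda x: x, x0=np.ones(5))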

    Adaptive sensing performance lower bounds for sparse signal detection and support estimation

    This paper gives a precise characterization of the fundamental limits of adaptive sensing for diverse estimation and testing problems concerning sparse signals. We consider in particular the setting introduced in (IEEE Trans. Inform. Theory 57 (2011) 6222-6235) and show necessary conditions on the minimum signal magnitude for both detection and estimation: if $\mathbf{x}\in\mathbb{R}^n$ is a sparse vector with $s$ non-zero components, then it can be reliably detected in noise provided the magnitude of the non-zero components exceeds $\sqrt{2/s}$. Furthermore, the signal support can be exactly identified provided the minimum magnitude exceeds $\sqrt{2\log s}$. Notably, there is no dependence on $n$, the extrinsic signal dimension. These results show that the adaptive sensing methodologies proposed previously in the literature are essentially optimal and cannot be substantially improved. In addition, these results provide further insights on the limits of adaptive compressive sensing. Comment: Published at http://dx.doi.org/10.3150/13-BEJ555 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
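    Restated compactly (this summary is added here; the constants are exactly those stated in the abstract, with lower-order terms omitted): writing $\mu = \min_{i:\, x_i \neq 0} |x_i|$ for the minimum non-zero magnitude, reliable detection requires roughly $\mu > \sqrt{2/s}$, exact support recovery requires roughly $\mu > \sqrt{2\log s}$, and neither threshold depends on the ambient dimension $n$.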

    Replicability in Reinforcement Learning

    We initiate the mathematical study of replicability as an algorithmic property in the context of reinforcement learning (RL). We focus on the fundamental setting of discounted tabular MDPs with access to a generative model. Inspired by Impagliazzo et al. [2022], we say that an RL algorithm is replicable if, with high probability, it outputs the exact same policy after two executions on i.i.d. samples drawn from the generator when its internal randomness is the same. We first provide an efficient $\rho$-replicable algorithm for $(\varepsilon, \delta)$-optimal policy estimation with sample and time complexity $\widetilde O\left(\frac{N^3\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$, where $N$ is the number of state-action pairs. Next, for the subclass of deterministic algorithms, we provide a lower bound of order $\Omega\left(\frac{N^3}{(1-\gamma)^3\cdot\varepsilon^2\cdot\rho^2}\right)$. Then, we study a relaxed version of replicability proposed by Kalavasis et al. [2023] called TV indistinguishability. We design a computationally efficient TV indistinguishable algorithm for policy estimation whose sample complexity is $\widetilde O\left(\frac{N^2\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$. At the cost of $\exp(N)$ running time, we transform these TV indistinguishable algorithms into $\rho$-replicable ones without increasing their sample complexity. Finally, we introduce the notion of approximate-replicability, where we only require that two output policies are close under an appropriate statistical divergence (e.g., Rényi), and show an improved sample complexity of $\widetilde O\left(\frac{N\cdot\log(1/\delta)}{(1-\gamma)^5\cdot\varepsilon^2\cdot\rho^2}\right)$. Comment: to be published in NeurIPS 2023
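    As an operational illustration only (this is not the paper's algorithm), the replicability definition above can be checked empirically: run the same learning routine on two independent sample sets while reusing the same internal random seed, and test whether the two returned policies are identical. The simple plug-in learner below is a placeholder assumption.

    import numpy as np

    def plug_in_policy(samples, n_states, n_actions, seed=0):
        # Placeholder learner: empirical mean reward per (s, a), greedy policy,
        # with the shared internal randomness used only for tie-breaking.
        rng = np.random.default_rng(seed)
        q = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        for (s, a, r, s_next) in samples:  # i.i.d. draws from the generative model
            counts[s, a] += 1
            q[s, a] += (r - q[s, a]) / counts[s, a]  # running mean of observed rewards
        q += 1e-9 * rng.random(q.shape)  # randomized tie-breaking (same across both runs)
        return q.argmax(axis=1)

    def replicates(sampler, n_states, n_actions, seed=0):
        # Two executions on independent datasets, same internal randomness.
        policy_1 = plug_in_policy(sampler(), n_states, n_actions, seed=seed)
        policy_2 = plug_in_policy(sampler(), n_states, n_actions, seed=seed)
        return np.array_equal(policy_1, policy_2)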

    Linear stochastic dynamics with nonlinear fractal properties

    Stochastic processes with multiplicative noise have been studied independently in several different contexts over the past decades. We focus on the regime, found for a generic set of control parameters, in which stochastic processes with multiplicative noise produce intermittency of a special kind, characterized by a power-law probability density distribution. We present a review of applications to population dynamics, epidemics, finance and insurance (in relation to the ARCH(1) process), immigration, investment portfolios, and the internet. We highlight the common physical mechanism and summarize the main known results. The distribution and statistical properties of the duration of intermittent bursts are also characterized in detail. Comment: 26 pages, Physica A (in press)
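    As an illustrative simulation (not taken from the review; parameter values are arbitrary), the kind of process discussed can be sketched as a Kesten-type recursion $x_{t+1} = a_t x_t + b_t$ with random multiplicative factor $a_t$: when $\mathbb{E}[\log a_t] < 0$ but $a_t$ occasionally exceeds 1, the stationary distribution develops a power-law tail.

    import numpy as np

    def simulate_multiplicative_noise(n_steps=100_000, mu=-0.05, sigma=0.3, b_scale=1.0, seed=0):
        # x_{t+1} = a_t * x_t + b_t with log-normal a_t and a positive additive term b_t.
        # E[log a_t] = mu < 0 keeps the process contracting on average, while occasional
        # a_t > 1 produces the intermittent bursts with power-law-distributed sizes.
        rng = np.random.default_rng(seed)
        x = np.empty(n_steps)
        x[0] = 1.0
        for t in range(1, n_steps):
            a_t = np.exp(mu + sigma * rng.standard_normal())  # multiplicative noise
            b_t = b_scale * rng.random()                      # additive term keeps x away from 0
            x[t] = a_t * x[t - 1] + b_t
        return x

    # A log-log histogram of the simulated values (or of burst durations above a threshold)
    # exhibits the approximately power-law tail characteristic of this regime.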

    A Model for Prejudiced Learning in Noisy Environments

    Based on the heuristic that maintaining presumptions can be beneficial in uncertain environments, we propose a set of basic axioms for learning systems to incorporate the concept of prejudice. The simplest, memoryless model of a deterministic learning rule obeying the axioms is constructed and shown to be equivalent to the logistic map. The system's performance is analysed in an environment in which it is subject to external randomness, weighing the defectiveness of learning against the stability gained. The corresponding random dynamical system with inhomogeneous, additive noise is studied and shown to exhibit the phenomena of noise-induced stability and stochastic bifurcations. The overall results allow for the interpretation that prejudice in uncertain environments entails a considerable portion of stubbornness as a secondary phenomenon. Comment: 21 pages, 11 figures; reduced graphics to slash file size, full version on the author's homepage. Minor revisions in text and references; identical to the version to be published in Applied Mathematics and Computation
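    For illustration only (the map parameter and the noise model below are assumptions, not the paper's exact construction): since the deterministic learning rule is stated to be equivalent to the logistic map, the noisy system can be sketched as a logistic map perturbed by state-dependent (inhomogeneous) additive noise.

    import numpy as np

    def noisy_logistic_trajectory(r=3.7, n_steps=1000, noise_amp=0.02, seed=0):
        # Iterate x_{t+1} = r * x_t * (1 - x_t) + xi_t, where the additive noise xi_t
        # has an amplitude that depends on the current state (inhomogeneous noise).
        rng = np.random.default_rng(seed)
        x = np.empty(n_steps)
        x[0] = 0.5
        for t in range(1, n_steps):
            xi = noise_amp * x[t - 1] * (rng.random() - 0.5)  # state-dependent noise
            x[t] = np.clip(r * x[t - 1] * (1 - x[t - 1]) + xi, 0.0, 1.0)
        return x

    # Sweeping r or the noise amplitude and recording the long-run behaviour of x is one way
    # to observe the noise-induced stability and stochastic bifurcations discussed above.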

    Is Pessimism Provably Efficient for Offline RL?

    We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori. Due to the lack of further interactions with the environment, offline RL suffers from insufficient coverage of the dataset, which eludes most existing theoretical analysis. In this paper, we propose a pessimistic variant of the value iteration algorithm (PEVI), which incorporates an uncertainty quantifier as the penalty function. Such a penalty function simply flips the sign of the bonus function used for promoting exploration in online RL, which makes it easily implementable and compatible with general function approximators. Without assuming sufficient coverage of the dataset, we establish a data-dependent upper bound on the suboptimality of PEVI for general Markov decision processes (MDPs). When specialized to linear MDPs, it matches the information-theoretic lower bound up to multiplicative factors of the dimension and horizon. In other words, pessimism is not only provably efficient but also minimax optimal. In particular, given the dataset, the learned policy serves as the "best effort" among all policies, as no other policy can do better. Our theoretical analysis identifies the critical role of pessimism in eliminating a notion of spurious correlation, which emerges from the "irrelevant" trajectories that are less covered by the dataset and not informative for the optimal policy. Comment: 53 pages, 3 figures
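    A minimal tabular sketch of the pessimism idea described above: value iteration on empirical estimates, with an uncertainty quantifier subtracted from (rather than added to, as an exploration bonus would be in online RL) the estimated Q-values. The count-based penalty and the constants are illustrative assumptions, not the paper's PEVI construction for linear MDPs.

    import numpy as np

    def pessimistic_value_iteration(counts, rewards, transitions, gamma=0.99, beta=1.0, n_iters=200):
        # counts[s, a]: visitation counts in the offline dataset.
        # rewards[s, a]: empirical mean rewards (assumed bounded).
        # transitions[s, a, s']: empirical next-state distribution.
        penalty = beta / np.sqrt(np.maximum(counts, 1.0))  # uncertainty quantifier (sign-flipped bonus)
        n_states, n_actions = rewards.shape
        v = np.zeros(n_states)
        q = np.zeros((n_states, n_actions))
        for _ in range(n_iters):
            q = rewards - penalty + gamma * transitions @ v  # pessimistic Bellman backup
            v = q.max(axis=1)
        return q.argmax(axis=1), v  # greedy policy and value under the pessimistic estimates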