Risk and optimal policies in bandit experiments

Author: Adusumilli Karun
Publication date: 12/12/2021

This paper provides a decision theoretic analysis of bandit experiments. The bandit setting corresponds to a dynamic programming problem, but solving this directly is typically infeasible. Working within the framework of diffusion asymptotics, we define a suitable notion of asymptotic Bayes risk for bandit settings. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a nonlinear second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further describes the state variables to which it is asymptotically sufficient to restrict attention, and therefore suggests a practical strategy for dimension reduction. The upshot is that we can approximate the dynamic programming problem defining the bandit setting with a PDE which can be efficiently solved using sparse matrix routines. We derive near-optimal policies from the numerical solutions to these equations. The proposed policies substantially dominate existing methods such as Thompson sampling. The framework also allows for substantial generalizations to the bandit problem such as time discounting and pure exploration motives.
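For orientation, the Thompson sampling baseline that the abstract compares against can be sketched as follows. This is a minimal illustration only and is not the paper's PDE-based policy. It assumes Gaussian rewards with a known noise standard deviation and independent N(0, prior_sd^2) priors on each arm's mean; the function name and all parameters are hypothetical choices made for this example.

```python
import random


def thompson_sampling(true_means, horizon, noise_sd=1.0, prior_sd=1.0, seed=0):
    """Run Gaussian Thompson sampling and return (pull counts, cumulative regret).

    Assumptions (not from the paper): known noise_sd, N(0, prior_sd^2) priors.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # sum of observed rewards per arm
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        draws = []
        for a in range(k):
            # Conjugate normal update: posterior precision and mean of arm a.
            precision = 1.0 / prior_sd ** 2 + counts[a] / noise_sd ** 2
            post_mean = (sums[a] / noise_sd ** 2) / precision
            post_sd = precision ** -0.5
            draws.append(rng.gauss(post_mean, post_sd))
        # Pull the arm whose sampled mean is largest.
        arm = max(range(k), key=draws.__getitem__)
        reward = rng.gauss(true_means[arm], noise_sd)
        counts[arm] += 1
        sums[arm] += reward
        regret += best - true_means[arm]
    return counts, regret
```

Calling `thompson_sampling([0.0, 1.0], horizon=2000)` returns the number of pulls per arm and the cumulative regret against the better arm, which is the kind of quantity a risk comparison between policies would be based on.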

arXiv.org e-Print Archive

Shape-constrained Estimation of Value Functions

Authors: Glynn Peter W.; Mousavi Mohammad
Publication date: 01/01/2013

We present a fully nonparametric method to estimate the value function, via simulation, in the context of expected infinite-horizon discounted rewards for Markov chains. Estimating such value functions plays an important role in approximate dynamic programming and applied probability in general. We incorporate "soft information" into the estimation algorithm, such as knowledge of convexity, monotonicity, or Lipschitz constants. In the presence of such information, a nonparametric estimator for the value function can be computed that is provably consistent as the simulated time horizon tends to infinity. As an application, we implement our method on price tolling agreement contracts in energy markets.
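The idea of combining simulation with shape information can be illustrated with a small sketch. This is not the authors' estimator: it assumes a finite-state chain, estimates each state's discounted value by averaging truncated simulated paths, and then imposes monotonicity with the standard pool-adjacent-violators projection. All function names and parameters are hypothetical and chosen for this example.

```python
import random


def simulate_value(P, r, gamma, start, horizon, n_paths, rng):
    """Average truncated discounted reward over n_paths paths started at `start`."""
    states = range(len(P))
    total = 0.0
    for _ in range(n_paths):
        s, disc, ret = start, 1.0, 0.0
        for _ in range(horizon):
            ret += disc * r[s]
            disc *= gamma
            # Draw the next state from row s of the transition matrix.
            s = rng.choices(states, weights=P[s])[0]
        total += ret
    return total / n_paths


def isotonic_increasing(y):
    """Pool-adjacent-violators: least-squares fit constrained to be nondecreasing."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge neighbouring blocks while their means violate the ordering.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out


def exact_value(P, r, gamma, iters=2000):
    """Reference solution of v = r + gamma * P v by fixed-point iteration."""
    n = len(r)
    v = [0.0] * n
    for _ in range(iters):
        v = [r[i] + gamma * sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return v
```

For a chain whose value function is known to be nondecreasing in the state, one would compute `raw = [simulate_value(P, r, gamma, s, horizon, n_paths, rng) for s in range(len(P))]` and then `isotonic_increasing(raw)`; the projection removes order violations caused by simulation noise.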

arXiv.org e-Print Archive

CiteSeerX

Beyond Biomass: Valuing Genetic Diversity in Natural Resource Management

Authors: Baskett ML; Dedrick A; Faig A; Springborn MR
Publication venue: eScholarship, University of California
Publication date: 01/03/2020

Strategies for increasing production of goods from working and natural systems have raised concerns that the diversity of species on which these services depend may be eroding. This loss of natural capital threatens to homogenize global food supplies and compromise the stability of human welfare. We assess the trade-off between artificial augmentation of biomass and degradation of the biodiversity underlying a population's ability to adapt to shocks. Our application involves the augmentation of wild stocks of salmon. Practices in this system have generated warnings that genetic erosion may lead to a loss of the “portfolio effect”, and the value of this loss is not accounted for in decision making. We construct an integrated bioeconomic model of salmon biomass and genetic diversity. Our results show how practices that homogenize natural systems can still generate positive returns. However, the substitution of more physical capital and labor for natural capital must be maintained for gains to persist, weakens the capacity for adaptation should this investment cease, and can cause substantial loss of population wildness. We apply an emerging optimization method, approximate dynamic programming, to solve the model without the simplifying restrictions imposed previously.

Crossref

eScholarship - University of California