
Myopic policies for budgeted optimization with constrained experiments

Author: Lin Wei
Publication venue: Oregon State University

Motivated by a real-world problem, we study a novel setting for budgeted optimization where the goal is to optimize an unknown function f(x) within a fixed experimental budget. In our setting, it is not practical to request samples of f(x) at precise input values, because setting up an experiment at a precise value is prohibitively expensive. Instead, we may request constrained experiments, which give the experimenter constraints on x under which they must return f(x). Importantly, as the constraints become looser, the experimental cost decreases, but the uncertainty about the location of the next observation increases. Our problem is to manage this trade-off by selecting a sequence of constrained experiments that best optimizes f within the budget. We propose a number of myopic policies for selecting constrained experiments, using both model-free and model-based approaches inspired by policies for unconstrained settings. Experiments on synthetic and real-world functions indicate that our policies outperform random selection and that the model-based policies are superior to the model-free ones; they also give insights into which policies are preferable overall.

Graduation date: 2008
Keywords: Machine Learning, Budgeted Learning

ScholarsArchive@OSU
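To make the setting concrete, here is a minimal Python sketch of a budgeted loop over constrained experiments, driven by the random-selection baseline that the abstract compares against. It is not one of the thesis's myopic policies. The cost model (`experiment_cost`, where a wider interval is cheaper), the assumption that the experimenter returns f at a uniformly random point inside the requested interval, the menu of interval widths, and all function names are hypothetical choices made for illustration only.

```python
import random


def experiment_cost(width, base=1.0):
    # HYPOTHETICAL cost model: tighter constraints (smaller width) cost more.
    return base + 1.0 / width


def run_constrained_experiment(f, lo, hi, rng):
    # ASSUMPTION: the experimenter picks some x in [lo, hi]; modeled as uniform.
    x = rng.uniform(lo, hi)
    return x, f(x)


def random_policy(domain, widths, rng):
    # Baseline: random width from the menu, random placement inside the domain.
    w = rng.choice(widths)
    lo = rng.uniform(domain[0], domain[1] - w)
    return lo, lo + w


def optimize_with_budget(f, budget, domain=(0.0, 1.0),
                         widths=(0.05, 0.2, 0.5), seed=0):
    rng = random.Random(seed)
    spent, history = 0.0, []
    while True:
        lo, hi = random_policy(domain, widths, rng)
        cost = experiment_cost(hi - lo)
        if spent + cost > budget:
            # Stop at the first sampled experiment the remaining budget cannot cover.
            break
        spent += cost
        x, y = run_constrained_experiment(f, lo, hi, rng)
        history.append((lo, hi, x, y))
    # Best observation found (None if no experiment was affordable).
    best = max(history, key=lambda h: h[3]) if history else None
    return best, history, spent
```

For example, `optimize_with_budget(lambda x: -(x - 0.3) ** 2, budget=50.0)` returns the best `(lo, hi, x, f(x))` record, the full history, and the amount spent. A myopic policy in the thesis's sense would replace `random_policy` with a rule that scores candidate intervals against their cost; the surrounding budget loop would stay the same.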

Budgeted Reinforcement Learning in Continuous State Space

Authors: Carrara Nicolas, Laroche Romain, Leurent Edouard, Maillard Odalric-Ambrym, Pietquin Olivier, Urvoy Tanguy
Publication date: 27/05/2019

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved for finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.

Comment: N. Carrara and E. Leurent contributed equally.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot
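The trade-off a budgeted policy faces can be illustrated in its simplest one-step form: choose a probability distribution over actions that maximizes expected reward while keeping expected cost at or below a budget β. The sketch below solves that single-step problem exactly. It enumerates single affordable actions and two-action mixtures that spend the budget exactly; because the problem is a linear program with one budget constraint plus the probability-simplex constraint, some optimal solution never needs more than two actions. This is an illustrative example under those assumptions, not the paper's Budgeted Bellman Optimality operator or its deep RL algorithms, and the function name and inputs are hypothetical.

```python
from itertools import combinations


def budgeted_mixture(rewards, costs, budget):
    """Maximize E[reward] over action distributions subject to E[cost] <= budget.

    Returns (expected_reward, {action: probability}), or None if no action
    is affordable. Hypothetical one-step illustration.
    """
    best = None
    n = len(rewards)
    # Deterministic actions that already respect the budget.
    for a in range(n):
        if costs[a] <= budget:
            if best is None or rewards[a] > best[0]:
                best = (rewards[a], {a: 1.0})
    # Pairs whose costs bracket the budget: mix them so E[cost] == budget.
    for i, j in combinations(range(n), 2):
        lo, hi = (i, j) if costs[i] <= costs[j] else (j, i)
        if costs[lo] < budget < costs[hi]:
            p_lo = (costs[hi] - budget) / (costs[hi] - costs[lo])
            value = p_lo * rewards[lo] + (1 - p_lo) * rewards[hi]
            if best is None or value > best[0]:
                best = (value, {lo: p_lo, hi: 1 - p_lo})
    return best
```

With rewards `[0.0, 1.0, 3.0]`, costs `[0.0, 0.5, 2.0]` and budget `1.0`, the best deterministic action only earns 1.0, while mixing actions 1 and 2 with probabilities 2/3 and 1/3 spends the budget exactly and earns an expected reward of 5/3. Randomizing over actions is what allows a budgeted policy to use its whole cost allowance.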