Myopic policies for budgeted optimization with constrained experiments
Motivated by a real-world problem, we study a novel setting for budgeted optimization where the goal is to optimize an unknown function f(x) given a budget. In our setting, it is not practical to request samples of f(x) at precise input values due to the formidable cost of experimental setup at precise values. Rather, we may request constrained experiments, which give the experimenter constraints on x for which they must return f(x). Importantly, as the constraints become looser, the experimental cost decreases, but the uncertainty about the location of the next observation increases. Our problem is to manage this trade-off by selecting a sequence of constrained experiments to best optimize f within the budget. We propose a number of myopic policies for selecting constrained experiments using both model-free and model-based approaches, inspired by policies for unconstrained settings. Experiments on synthetic and real-world functions indicate that our policies outperform random selection, that the model-based policies are superior to model-free ones, and give insights into which policies are preferable overall. Graduation date: 2008. Keywords: Machine Learning, Budgeted Learnin
Budgeted Reinforcement Learning in Continuous State Space
A Budgeted Markov Decision Process (BMDP) is an extension of a Markov
Decision Process to critical applications requiring safety constraints. It
relies on a notion of risk implemented in the shape of a cost signal
constrained to lie below an - adjustable - threshold. So far, BMDPs could only
be solved in the case of finite state spaces with known dynamics. This work
extends the state of the art to continuous-space environments and unknown
dynamics. We show that the solution to a BMDP is a fixed point of a novel
Budgeted Bellman Optimality operator. This observation allows us to introduce
natural extensions of Deep Reinforcement Learning algorithms to address
large-scale BMDPs. We validate our approach on two simulated applications:
spoken dialogue and autonomous driving. Comment: N. Carrara and E. Leurent have equally contribute
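
The budget constraint described in the abstract above (a cost signal that must stay below an adjustable threshold) can be illustrated with a toy sketch. This is not the paper's algorithm and involves no Bellman operator or learning: it only assumes that a few candidate policies already have estimated expected rewards and costs, and it picks the best one whose cost fits the budget. The names `PolicyEstimate`, `best_under_budget`, and `CANDIDATES`, as well as all numbers, are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class PolicyEstimate(NamedTuple):
    """Hypothetical summary of a policy: its estimated reward and cost."""
    name: str
    expected_reward: float
    expected_cost: float


def best_under_budget(
    candidates: Sequence[PolicyEstimate], budget: float
) -> Optional[PolicyEstimate]:
    """Return the highest-reward candidate whose cost does not exceed the budget.

    Returns None when no candidate satisfies the cost constraint.
    """
    feasible = [c for c in candidates if c.expected_cost <= budget]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.expected_reward)


# Made-up numbers, chosen only to show how the threshold changes the choice.
CANDIDATES = [
    PolicyEstimate("cautious", expected_reward=1.0, expected_cost=0.1),
    PolicyEstimate("balanced", expected_reward=2.0, expected_cost=0.4),
    PolicyEstimate("aggressive", expected_reward=3.0, expected_cost=0.9),
]

if __name__ == "__main__":
    for budget in (0.05, 0.5, 1.0):
        choice = best_under_budget(CANDIDATES, budget)
        print(budget, choice.name if choice else None)
```

Raising the budget admits riskier but more rewarding candidates, which is the sense in which the threshold is adjustable; the method in the paper addresses the much harder case where rewards and costs must be learned from interaction.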