1 research outputs found
Marginal Utility for Planning in Continuous or Large Discrete Action Spaces
Sample-based planning is a powerful family of algorithms for generating
intelligent behavior from a model of the environment. Generating good candidate
actions is critical to the success of sample-based planners, particularly in
continuous or large action spaces. Typically, candidate action generation
exhausts the action space, uses domain knowledge, or more recently, involves
learning a stochastic policy to provide such search guidance. In this paper we
explore explicitly learning a candidate action generator by optimizing a novel
objective, marginal utility. The marginal utility of an action generator
measures the increase in value of an action over previously generated actions.
We validate our approach in both curling, a challenging stochastic domain with
continuous state and action spaces, and a location game with a discrete but
large action space. We show that a generator trained with the marginal utility
objective outperforms hand-coded schemes built on substantial domain knowledge,
trained stochastic policies, and other natural objectives for generating
actions for sampled-based planners