This work investigates Monte-Carlo planning for agents in stochastic
environments with multiple objectives. We propose the Convex Hull Monte-Carlo
Tree-Search (CHMCTS) framework, which builds upon Trial Based Heuristic Tree
Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective
planning in large environments. Moreover, we show how approximating
multi-objective planning solutions can be posed as a contextual multi-armed
bandit problem, which gives a principled motivation for action selection in
terms of contextual regret. This leads us to the use of Contextual Zooming
for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using
the Generalised Deep Sea Treasure environment, demonstrating that Zooming
CHMCTS can achieve sublinear contextual regret and scales better than CHVI on
a given computational budget.

Comment: Camera-ready version of paper accepted to ICAPS 2020, along with
relevant appendices.
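
As a rough illustration of the two ideas named above, the following sketch shows (a) pruning a set of two-objective value vectors down to those that are optimal for some linear weighting of the objectives, which is the convex hull notion that CHVI-style methods rely on, and (b) the regret of a choice for a given weighting (the "context"). This is not the paper's implementation: the function names, the restriction to two objectives, and the example vectors are all assumptions made for illustration.

```python
# Illustrative sketch only; names and data are hypothetical, not from the paper.
# A weight w in [0, 1] is placed on objective 0, and (1 - w) on objective 1.

def optimal_weight_interval(p, others):
    """Return the interval (lo, hi) of weights w for which p is at least as
    good as every vector in `others` under w*v[0] + (1-w)*v[1], or None."""
    lo, hi = 0.0, 1.0
    for q in others:
        a = p[0] - q[0]
        b = p[1] - q[1]
        # Require w*a + (1 - w)*b >= 0, i.e. b + w*(a - b) >= 0.
        slope = a - b
        if slope > 0:
            lo = max(lo, -b / slope)
        elif slope < 0:
            hi = min(hi, -b / slope)
        elif b < 0:
            return None  # q beats p by the same margin for every weight
    return (lo, hi) if lo <= hi else None


def convex_coverage_set(points):
    """Keep only the vectors that are optimal for at least one weight."""
    kept = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        if optimal_weight_interval(p, others) is not None:
            kept.append(p)
    return kept


def contextual_regret(w, chosen, points):
    """Gap between the best scalarised value for weight w and the chosen one."""
    def scalarise(v):
        return w * v[0] + (1 - w) * v[1]
    return max(scalarise(v) for v in points) - scalarise(chosen)


# Hypothetical value vectors (objective 0, objective 1).
points = [(0, 10), (10, 0), (4, 4), (6, 6), (2, 7)]
hull = convex_coverage_set(points)
regret = contextual_regret(0.5, (2, 7), points)
```

In this example, `(4, 4)` is dominated by `(6, 6)`, while `(2, 7)` is Pareto-optimal yet is never the best choice under any linear weighting, so both are pruned. Choosing `(2, 7)` when the weighting is `w = 0.5` therefore incurs positive regret relative to `(6, 6)`.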