Critic Sequential Monte Carlo

Dabiri, Setareh; Lavington, Jonathan Wilder; Lioutas, Vasileios; Liu, Yunpeng; Niedoba, Matthew; Scibior, Adam; Sefas, Justice; Wood, Frank; Zwartsenberg, Berend

Authors: Setareh Dabiri, Jonathan Wilder Lavington, Vasileios Lioutas, Yunpeng Liu, Matthew Niedoba, Adam Scibior, Justice Sefas, Frank Wood, Berend Zwartsenberg
Publication date: 30 May 2022

Abstract

We introduce CriticSMC, a new algorithm for planning as inference built from a novel composition of sequential Monte Carlo with learned soft-Q function heuristic factors. The algorithm is structured to allow the use of large numbers of putative particles, leading to efficient utilization of computational resources and effective discovery of high-reward trajectories, even in environments with difficult reward surfaces such as those arising from hard constraints. Relative to prior art, our approach remains compatible with model-free reinforcement learning in the sense that the implicit policy we produce can be used at test time in the absence of a world model. Our experiments on self-driving car collision avoidance in simulation demonstrate improvements against baselines in terms of infraction minimization relative to computational effort while maintaining diversity and realism of found trajectories.

Comment: 20 pages, 3 figures
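The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea of scoring many cheap putative actions with a critic before stepping a world model. It is not the authors' implementation. The one-dimensional environment, the hand-written `soft_q` stand-in for a learned critic, and all parameter names are assumptions made for illustration, and the importance-weight bookkeeping of a full sequential Monte Carlo sampler is omitted.

```python
import math
import random

ACTIONS = (-1, 0, 1)  # hypothetical discrete action set


def step(x, a):
    """Hypothetical deterministic world model on the integer line."""
    return x + a


def soft_q(x, a):
    """Hand-written stand-in for a learned soft-Q critic (an assumption).

    Returns 0 if the action keeps the state inside [-2, 2] and -inf if it
    would violate that hard constraint.
    """
    return 0.0 if abs(x + a) <= 2 else -math.inf


def critic_guided_particles(n_particles=50, n_putative=8, horizon=20, seed=0):
    """Propagate particles, using the critic to filter putative actions."""
    rng = random.Random(seed)
    states = [0] * n_particles
    for _ in range(horizon):
        # Each particle proposes several putative actions from a uniform prior.
        pool = [(x, rng.choice(ACTIONS)) for x in states for _ in range(n_putative)]
        # Score every (state, action) pair with the critic; exp(-inf) is 0.0.
        weights = [math.exp(soft_q(x, a)) for x, a in pool]
        # Resample back down to n_particles before touching the world model.
        chosen = rng.choices(pool, weights=weights, k=n_particles)
        # Only the surviving pairs are stepped through the model.
        states = [step(x, a) for x, a in chosen]
    return states


final_states = critic_guided_particles()
print(min(final_states), max(final_states))
```

In this sketch the world model is called only `n_particles` times per time step, while the critic is evaluated on `n_particles * n_putative` candidates, which is one way a large number of putative particles can be cheap relative to simulation.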

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2205.15460

Last time updated on 14/08/2022