A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process
Learning rich skills through temporal abstractions without supervision of
external rewards is at the frontier of Reinforcement Learning research.
Existing work mainly falls into two distinct categories: variational and
Laplacian-based skill (a.k.a. option) discovery. The former maximizes the
diversity of the discovered options through a mutual information loss but
overlooks coverage of the state space, while the latter focuses on improving
the coverage of options by increasing connectivity during exploration, but does
not consider diversity. In this paper, we propose a unified framework that
quantifies diversity and coverage through a novel use of the Determinantal
Point Process (DPP) and enables unsupervised option discovery explicitly
optimizing both objectives. Specifically, we define the DPP kernel matrix with
the Laplacian spectrum of the state transition graph and use the expected number
of modes in the trajectories as the objective to capture and enhance both
diversity and coverage of the learned options. The proposed option discovery
algorithm is extensively evaluated on challenging tasks built with MuJoCo
and Atari, demonstrating that it substantially outperforms
SOTA baselines from both the diversity- and coverage-driven categories. The code
is available at https://github.com/LucasCJYSDL/ODPP
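To make the "expected mode number" idea concrete, the sketch below computes the expected size of a DPP sample under the standard L-ensemble definition, where E|Y| = tr(L(L + I)^{-1}) = n - tr((L + I)^{-1}). It is a minimal illustration, not the authors' objective: it assumes the kernel is a Gram matrix of per-mode feature vectors taken as given (in the paper the kernel is built from the Laplacian spectrum of the state transition graph, which is not reproduced here), and the helper names `gram_kernel`, `inverse`, and `expected_dpp_size` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def gram_kernel(features: Matrix) -> Matrix:
    """Hypothetical L-ensemble kernel: L[i][j] = <x_i, x_j> for per-mode feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in features] for xi in features]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (m must be non-singular)."""
    n = len(m)
    # Augment m with the identity matrix: [m | I].
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def expected_dpp_size(kernel: Matrix) -> float:
    """Expected cardinality of an L-ensemble DPP: n - tr((L + I)^{-1})."""
    n = len(kernel)
    # L is positive semi-definite, so L + I is always invertible.
    shifted = [[kernel[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv = inverse(shifted)
    return n - sum(inv[i][i] for i in range(n))


if __name__ == "__main__":
    distinct = gram_kernel([[1.0, 0.0], [0.0, 1.0]])   # two orthogonal "modes"
    redundant = gram_kernel([[1.0, 0.0], [1.0, 0.0]])  # two identical "modes"
    print(expected_dpp_size(distinct))                 # 1.0
    print(round(expected_dpp_size(redundant), 3))      # 0.667
```

The example shows why the quantity rewards diversity: two orthogonal feature vectors yield a larger expected number of modes than two identical ones, even though both kernels have the same size.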
LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving
Self-driving vehicles need to anticipate a diverse set of future traffic
scenarios in order to safely share the road with other traffic participants
that may exhibit rare but dangerous driving behavior. In this paper, we present LookOut,
an approach to jointly perceive the environment and predict a diverse set of
futures from sensor data, estimate their probability, and optimize a
contingency plan over these diverse future realizations. In particular, we
learn a diverse joint distribution over multi-agent future trajectories in a
traffic scene that allows us to cover a wide range of future modes with high
sample efficiency while leveraging the expressive power of generative models.
Unlike previous work in diverse motion forecasting, our diversity objective
explicitly rewards sampling future scenarios that require distinct reactions
from the self-driving vehicle for improved safety. Our contingency planner then
finds comfortable trajectories that ensure safe reactions to a wide range of
future scenarios. Through extensive evaluations, we show that our model
demonstrates significantly more diverse and sample-efficient motion forecasting
in a large-scale self-driving dataset as well as safer and more comfortable
motion plans in long-term closed-loop simulations than current state-of-the-art
models.
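The contingency-planning idea can be illustrated with a small discrete sketch: the vehicle commits to a shared initial action (prefix) and then picks a separate reaction for each sampled future, so the plan is scored by the prefix cost plus the probability-weighted cost of the best reaction in each future. This is a generic textbook form of contingency planning, not LookOut's planner, which works over continuous trajectories with a learned model; the scenario probabilities, costs, action names, and the functions `plan_contingency` and `plan_committed` are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cost model:
#   prefix_cost[p]         cost of the shared prefix p (identical across futures)
#   branch_cost[p][b][k]   cost of reaction b after prefix p if future k materializes
PrefixCost = Dict[str, float]
BranchCost = Dict[str, Dict[str, List[float]]]


def plan_contingency(prefix_cost: PrefixCost, branch_cost: BranchCost,
                     probs: List[float]) -> Tuple[str, float, List[str]]:
    """Pick the prefix minimizing prefix cost + sum_k p_k * min_b cost(b, k)."""
    best: Optional[Tuple[str, float, List[str]]] = None
    for p, pc in prefix_cost.items():
        expected, reactions = pc, []
        for k, pk in enumerate(probs):
            # A different reaction may be chosen for each future.
            b = min(branch_cost[p], key=lambda name: branch_cost[p][name][k])
            reactions.append(b)
            expected += pk * branch_cost[p][b][k]
        if best is None or expected < best[1]:
            best = (p, expected, reactions)
    assert best is not None
    return best


def plan_committed(prefix_cost: PrefixCost, branch_cost: BranchCost,
                   probs: List[float]) -> Tuple[str, str, float]:
    """Baseline: commit to a single reaction regardless of which future occurs."""
    best: Optional[Tuple[str, str, float]] = None
    for p, pc in prefix_cost.items():
        for b, costs in branch_cost[p].items():
            expected = pc + sum(pk * c for pk, c in zip(probs, costs))
            if best is None or expected < best[2]:
                best = (p, b, expected)
    assert best is not None
    return best


# Hypothetical scenario: future 0 = pedestrian stays on the curb, future 1 = pedestrian crosses.
PROBS = [0.9, 0.1]
PREFIX_COST = {"keep_lane": 1.0, "slow_down": 2.0}
BRANCH_COST = {
    "keep_lane": {"continue": [0.0, 100.0], "brake_hard": [5.0, 20.0]},
    "slow_down": {"continue": [1.0, 30.0], "brake_gently": [3.0, 2.0]},
}

if __name__ == "__main__":
    print(plan_contingency(PREFIX_COST, BRANCH_COST, PROBS))  # ('keep_lane', 3.0, ['continue', 'brake_hard'])
    print(plan_committed(PREFIX_COST, BRANCH_COST, PROBS))
```

Because the contingency plan can defer the choice of reaction until the future is known, its expected cost (3.0 here) is never worse than that of the best single committed plan (about 4.9 here); the rarely needed hard brake only has to be paid for in the future that requires it.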
- …