Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards
Intrinsic rewards were introduced to simulate how human intelligence works;
they are usually evaluated by intrinsically-motivated play, i.e., playing games
without extrinsic rewards but evaluated with extrinsic rewards. However, none
of the existing intrinsic reward approaches can achieve human-level performance
under this very challenging setting of intrinsically-motivated play. In this
work, we propose a novel megalomania-driven intrinsic reward (called
mega-reward), which, to our knowledge, is the first approach that achieves
human-level performance in intrinsically-motivated play. Intuitively,
mega-reward comes from the observation that infants' intelligence develops when
they try to gain more control over entities in an environment; therefore,
mega-reward aims to maximize the control capabilities of agents over given
entities in a given environment. To formalize mega-reward, a relational
transition model is proposed to bridge the gaps between direct and latent
control. Experimental studies show that mega-reward (i) can greatly outperform
all state-of-the-art intrinsic reward approaches, (ii) generally achieves the
same level of performance as Ex-PPO and professional human-level scores, and
(iii) also achieves superior performance when it is combined with extrinsic
rewards.
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial
Intelligence (AAAI), 202
ELDEN: Exploration via Local Dependencies
Tasks with large state space and sparse rewards present a longstanding
challenge to reinforcement learning. In these tasks, an agent needs to explore
the state space efficiently until it finds a reward. To deal with this problem,
the community has proposed to augment the reward function with intrinsic
reward, a bonus signal that encourages the agent to visit interesting states.
In this work, we propose a new way of defining interesting states for
environments with factored state spaces and complex chained dependencies, where
an agent's actions may change the value of one entity that, in turn, may
affect the value of another entity. Our insight is that, in these environments,
interesting states for exploration are states where the agent is uncertain
whether (as opposed to how) entities such as the agent or objects have some
influence on each other. We present ELDEN, Exploration via Local DepENdencies,
a novel intrinsic reward that encourages the discovery of new interactions
between entities. ELDEN uses a novel scheme, the partial derivative of
the learned dynamics, to model the local dependencies between entities
accurately and computationally efficiently. The uncertainty of the predicted
dependencies is then used as an intrinsic reward to encourage exploration
toward new interactions. We evaluate the performance of ELDEN on four different
domains with complex dependencies, ranging from 2D grid worlds to 3D robotic
tasks. In all domains, ELDEN correctly identifies local dependencies and learns
successful policies, significantly outperforming previous state-of-the-art
exploration methods.
Comment: Accepted to NeurIPS 202
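
The idea in the ELDEN abstract, using partial derivatives of a learned dynamics model to detect local dependencies and rewarding uncertainty about them, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not the authors' implementation: the two hand-written toy "models" stand in for learned dynamics, the derivatives are taken by finite differences, dependencies are binarized with an arbitrary threshold, and uncertainty is measured as disagreement across a small ensemble.

```python
from statistics import pvariance


def partial_derivatives(model, state, eps=1e-4):
    """Central finite-difference Jacobian: jac[j][i] = d next_state[j] / d state[i]."""
    n_out = len(model(state))
    jac = [[0.0] * len(state) for _ in range(n_out)]
    for i in range(len(state)):
        up, down = list(state), list(state)
        up[i] += eps
        down[i] -= eps
        f_up, f_down = model(up), model(down)
        for j in range(n_out):
            jac[j][i] = (f_up[j] - f_down[j]) / (2 * eps)
    return jac


def dependency_mask(model, state, threshold=0.1):
    """Binary local dependencies: 1.0 where |partial derivative| exceeds the threshold."""
    jac = partial_derivatives(model, state)
    return [[1.0 if abs(d) > threshold else 0.0 for d in row] for row in jac]


def intrinsic_reward(models, state, threshold=0.1):
    """Mean variance of the predicted dependency masks across the ensemble."""
    masks = [dependency_mask(m, state, threshold) for m in models]
    rows, cols = len(masks[0]), len(masks[0][0])
    total = 0.0
    for j in range(rows):
        for i in range(cols):
            total += pvariance([mask[j][i] for mask in masks])
    return total / (rows * cols)


# Hypothetical ensemble members over a state [agent_position, object_position].
def model_contact(state):
    """Believes the agent pushes the object when they are within distance 1."""
    agent, obj = state
    push = 0.5 * max(0.0, 1.0 - abs(agent - obj))
    return [agent, obj + push]


def model_no_contact(state):
    """Believes the agent never affects the object."""
    agent, obj = state
    return [agent, obj]


ensemble = [model_contact, model_no_contact]
reward_far = intrinsic_reward(ensemble, [0.0, 5.0])   # models agree: no interaction
reward_near = intrinsic_reward(ensemble, [0.0, 0.5])  # models disagree on agent -> object
print(reward_far, reward_near)
```

In this toy setup the bonus is zero when the agent is far from the object, because both models predict the same dependencies, and positive when the agent is close, because the models disagree on whether the agent influences the object. This mirrors the abstract's emphasis on uncertainty about whether entities interact, as opposed to how.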