Controllability-Aware Unsupervised Skill Discovery
One of the key capabilities of intelligent agents is the ability to discover
useful skills without external supervision. However, current unsupervised skill
discovery methods are often limited to acquiring simple, easy-to-learn skills
because they lack incentives to discover more complex, challenging behaviors.
We introduce a novel unsupervised skill discovery method,
Controllability-aware Skill Discovery (CSD), which actively seeks complex,
hard-to-control skills without supervision. The key component of CSD is a
controllability-aware distance function, which assigns larger values to state
transitions that are harder to achieve with the current skills. Combined with
distance-maximizing skill discovery, CSD progressively learns more challenging
skills over the course of training as our jointly trained distance function
reduces rewards for easy-to-achieve skills. Our experimental results in six
robotic manipulation and locomotion environments demonstrate that CSD can
discover diverse complex skills including object manipulation and locomotion
skills with no supervision, significantly outperforming prior unsupervised
skill discovery methods. Videos and code are available at
https://seohong.me/projects/csd/
Comment: ICML 2023
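The mechanism the abstract describes, a learned distance that stays large on transitions the current skills cannot yet achieve, can be sketched as an intrinsic reward. The following is a minimal illustration under assumptions, not the authors' implementation: the MLP architecture, the Softplus output, and the reward wrapper are placeholders.

```python
import torch
import torch.nn as nn

class ControllabilityDistance(nn.Module):
    """Sketch of a controllability-aware distance d(s, s').

    Assumption: a plain MLP over concatenated states stands in for
    whatever model CSD actually uses. Training (not shown) should push
    the distance down on transitions the current skills achieve easily,
    so hard-to-achieve transitions keep large values.
    """
    def __init__(self, state_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus(),  # keep distances non-negative
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)

@torch.no_grad()
def intrinsic_reward(dist_fn: ControllabilityDistance,
                     s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
    # Distance-maximizing skill discovery: reward the skill policy for
    # transitions the learned metric currently deems hard to achieve.
    return dist_fn(s, s_next)
```

As the distance network is retrained alongside the policy, easy transitions earn shrinking rewards, which is what drives the progression toward harder skills.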
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Unsupervised pre-training strategies have proven to be highly effective in
natural language processing and computer vision. Likewise, unsupervised
reinforcement learning (RL) holds the promise of discovering a variety of
potentially useful behaviors that can accelerate the learning of a wide array
of downstream tasks. Previous unsupervised RL approaches have mainly focused on
pure exploration and mutual information skill learning. Despite these attempts,
however, making unsupervised RL truly scalable remains a major open challenge:
pure exploration approaches might struggle in complex
environments with large state spaces, where covering every possible transition
is infeasible, and mutual information skill learning approaches might
completely fail to explore the environment due to the lack of incentives. To
make unsupervised RL scalable to complex, high-dimensional environments, we
propose a novel unsupervised RL objective, which we call Metric-Aware
Abstraction (METRA). Our main idea is, instead of directly covering the entire
state space, to only cover a compact latent space that is metrically
connected to the state space by temporal distances. By learning to move in
every direction in the latent space, METRA obtains a tractable set of diverse
behaviors that approximately cover the state space, making the method scalable
to high-dimensional environments. Through our experiments in five locomotion and
manipulation environments, we demonstrate that METRA can discover a variety of
useful behaviors even in complex, pixel-based environments, and is the first
unsupervised RL method to discover diverse locomotion behaviors in
pixel-based Quadruped and Humanoid. Our code and videos are available at
https://seohong.me/projects/metra
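A rough sketch of what "moving in every direction of a latent space metrically connected by temporal distances" can look like as a reward, under assumptions: the abstract does not spell out the objective, so the inner-product reward and the unit-distance penalty on temporally adjacent states below are one plausible realization; a small MLP stands in for the pixel encoder, and the paper's actual constraint handling is more involved than a soft penalty.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetricAwareAbstraction(nn.Module):
    """Sketch of a METRA-style latent abstraction phi with a skill reward."""
    def __init__(self, state_dim: int, latent_dim: int, hidden: int = 256):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def skill_reward(self, s, s_next, z):
        # Reward movement along the skill direction z in latent space:
        # r = (phi(s') - phi(s)) . z, so different skills z cover
        # different directions and together spread over the latent space.
        delta = self.phi(s_next) - self.phi(s)
        return (delta * z).sum(dim=-1)

    def temporal_penalty(self, s, s_next):
        # Tie the latent metric to temporal distance: temporally adjacent
        # states should map at most unit distance apart in latent space.
        delta = self.phi(s_next) - self.phi(s)
        return F.relu(delta.norm(dim=-1) - 1.0)
```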
Object-Oriented Dynamics Learning through Multi-Level Abstraction
Object-based approaches for learning action-conditioned dynamics have
demonstrated promise for generalization and interpretability. However, existing
approaches suffer from structural limitations and optimization difficulties for
common environments with multiple dynamic objects. In this paper, we present a
novel self-supervised learning framework, called Multi-level Abstraction
Object-oriented Predictor (MAOP), which employs a three-level learning
architecture that enables efficient object-based dynamics learning from raw
visual observations. We also design a spatial-temporal relational reasoning
mechanism for MAOP to support instance-level dynamics learning and handle
partial observability. Our results show that MAOP significantly outperforms
previous methods in terms of sample efficiency and generalization over novel
environments for learning environment models. We also demonstrate that learned
dynamics models enable efficient planning in unseen environments, comparable to
true environment models. In addition, MAOP learns semantically and visually
interpretable disentangled representations.
Comment: Accepted to the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI), 2020
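The abstract names a three-level architecture with relational reasoning but not its contents, so the structural skeleton below is purely illustrative: the level names (object extraction, relational reasoning, per-object dynamics), the module choices, and the 64x64 frame size are all assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class ThreeLevelDynamicsSketch(nn.Module):
    """Hypothetical skeleton of a multi-level object-oriented predictor."""
    def __init__(self, in_channels: int = 3, n_objects: int = 4, action_dim: int = 4):
        super().__init__()
        # Level 1 (assumed): extract per-object masks from raw frames.
        self.object_encoder = nn.Conv2d(in_channels, n_objects, kernel_size=5, padding=2)
        self.object_proj = nn.Linear(64 * 64, 32)  # assumes 64x64 frames
        # Level 2 (assumed): spatial-temporal relational reasoning, here
        # approximated by objects attending to one another.
        self.relation = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)
        # Level 3 (assumed): action-conditioned per-object dynamics.
        self.dynamics = nn.Linear(32 + action_dim, 32)

    def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        masks = self.object_encoder(frame)             # (B, K, H, W)
        feats = self.object_proj(masks.flatten(2))     # (B, K, 32)
        feats, _ = self.relation(feats, feats, feats)  # objects attend to each other
        act = action.unsqueeze(1).expand(-1, feats.size(1), -1)
        return self.dynamics(torch.cat([feats, act], dim=-1))  # next-step features
```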
Diverse Offline Imitation via Fenchel Duality
There has been significant recent progress in unsupervised skill discovery,
with various works proposing mutual-information-based objectives as a source of
intrinsic motivation. Prior works predominantly focused on designing algorithms
that require online access to the environment. In contrast, we develop an
offline skill discovery algorithm. Our problem
formulation considers the maximization of a mutual information objective
constrained by a KL-divergence. More precisely, the constraints ensure that the
state occupancy of each skill remains close to the state occupancy of an
expert, within the support of an offline dataset with good state-action
coverage. Our main contribution is to connect Fenchel duality, reinforcement
learning, and unsupervised skill discovery, and to give a simple offline
algorithm for learning diverse skills that are aligned with an expert.
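Read directly off the abstract, the optimization problem has the following shape; the notation here is ours, not the paper's: z indexes skills, d^{pi_z} is the state occupancy induced by skill z, d^E is the expert's state occupancy, and epsilon is the closeness budget.

```latex
\max_{\pi}\; I(S; Z)
\quad \text{subject to} \quad
D_{\mathrm{KL}}\!\left(d^{\pi_z} \,\big\|\, d^{E}\right) \le \varepsilon
\quad \text{for every skill } z .
```

A plausible role for the Fenchel duality the abstract invokes is to trade this constrained problem for an unconstrained dual that can be estimated from the offline dataset, though the abstract does not give the derivation.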
Efficient Candidate Screening Under Multiple Tests and Implications for Fairness
When recruiting job candidates, employers rarely observe their underlying
skill level directly. Instead, they must administer a series of interviews
and/or collate other noisy signals in order to estimate the worker's skill.
Traditional economics papers address screening models where employers assess
worker skill via a single noisy signal. In this paper, we extend this
theoretical analysis to a multi-test setting, considering both Bernoulli and
Gaussian models. We analyze the optimal employer policy both when the employer
sets a fixed number of tests per candidate and when the employer can set a
dynamic policy, assigning further tests adaptively based on results from the
previous tests. To start, we characterize the optimal policy when employees
constitute a single group, demonstrating some interesting trade-offs.
Subsequently, we address the multi-group setting, demonstrating that when the
noise levels vary across groups, a fundamental impossibility emerges whereby we
cannot administer the same number of tests, subject candidates to the same
decision rule, and yet realize the same outcomes in both groups.
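To make the Bernoulli multi-test setting concrete, here is one simple reading of it as a Beta-Bernoulli model with an adaptive stopping rule. The prior, the thresholds, and the test budget are hypothetical illustrations, not the paper's optimal policy.

```python
def posterior_mean(prior_a: float, prior_b: float, passes: int, fails: int) -> float:
    """Beta-Bernoulli update: skill is the per-test pass probability, with a
    Beta(a, b) prior; each pass/fail shifts the posterior mean accordingly."""
    return (prior_a + passes) / (prior_a + prior_b + passes + fails)

def adaptive_screen(test_results, prior_a=1.0, prior_b=1.0,
                    hire_above=0.8, reject_below=0.3, max_tests=10):
    """Illustrative dynamic policy: keep administering tests until the
    posterior mean of skill crosses a hiring or rejection threshold, or the
    test budget runs out."""
    passes = fails = 0
    for i, passed in enumerate(test_results[:max_tests]):
        passes += int(passed)
        fails += 1 - int(passed)
        m = posterior_mean(prior_a, prior_b, passes, fails)
        if m >= hire_above:
            return "hire", i + 1
        if m <= reject_below:
            return "reject", i + 1
    return "undecided", min(len(test_results), max_tests)
```

The fairness tension the abstract points to shows up here as the noise level: for a noisier group, the posterior moves more slowly, so the same thresholds imply either more tests or different outcome rates.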
Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
Mutual information-based reinforcement learning (RL) has been proposed as a
promising framework for acquiring complex skills autonomously, without a
task-oriented reward function, through mutual information (MI) maximization or
variational empowerment. However, learning complex skills is still challenging
because the order in which skills are trained can largely affect sample
efficiency. Inspired by this, we recast variational empowerment as curriculum
learning in goal-conditioned RL with an intrinsic reward function, which we
name Variational Curriculum RL (VCRL). From this perspective, we propose a
novel approach to unsupervised skill discovery based on information theory,
called Value Uncertainty Variational Curriculum (VUVC). We prove that, under
regularity conditions, VUVC accelerates the increase of entropy in the visited
states compared to the uniform curriculum. We validate the effectiveness of our
approach on complex navigation and robotic manipulation tasks in terms of
sample efficiency and state coverage speed. We also demonstrate that the skills
discovered by our method successfully complete a real-world robot navigation
task in a zero-shot setup and that incorporating these skills with a global
planner further improves performance.
Comment: ICML 2023. First two authors contributed equally. Code at https://github.com/seongun-kim/vcr
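The abstract names value uncertainty as the curriculum signal but not how it is estimated, so the sketch below fills that in with one common proxy, disagreement across an ensemble of value functions; the ensemble, the proportional sampling rule, and all names are assumptions.

```python
import numpy as np

def value_uncertainty(value_ensemble, goals: np.ndarray) -> np.ndarray:
    """Disagreement across an ensemble of value functions, used as a proxy
    for epistemic uncertainty about each candidate goal."""
    preds = np.stack([v(goals) for v in value_ensemble])  # (ensemble, n_goals)
    return preds.std(axis=0)

def sample_curriculum_goals(candidate_goals: np.ndarray, value_ensemble,
                            n_goals: int, rng=None) -> np.ndarray:
    """Sketch of an uncertainty-driven curriculum: propose training goals with
    probability proportional to value uncertainty, so the agent spends time on
    goals whose outcomes it is least sure about."""
    if rng is None:
        rng = np.random.default_rng()
    u = value_uncertainty(value_ensemble, candidate_goals)
    p = u / u.sum()
    idx = rng.choice(len(candidate_goals), size=n_goals, replace=False, p=p)
    return candidate_goals[idx]
```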