Provable Representation Learning for Imitation Learning via Bi-level Optimization
A common strategy in modern learning systems is to learn a representation
that is useful for many tasks, a.k.a. representation learning. We study this
strategy in the imitation learning setting for Markov decision processes (MDPs)
where multiple experts' trajectories are available. We formulate representation
learning as a bi-level optimization problem where the "outer" optimization
tries to learn the joint representation and the "inner" optimization encodes
the imitation learning setup and tries to learn task-specific parameters. We
instantiate this framework for the imitation learning settings of behavior
cloning and observation-alone. Theoretically, we show using our framework that
representation learning can provide sample complexity benefits for imitation
learning in both settings. We also provide proof-of-concept experiments to
verify our theory.
Comment: 26 pages
Visual Chunking: A List Prediction Framework for Region-Based Object Detection
We consider detecting objects in an image by iteratively selecting from a set
of arbitrarily shaped candidate regions. Our generic approach, which we term
visual chunking, reasons about the locations of multiple object instances in an
image while expressively describing object boundaries. We design an
optimization criterion for measuring the performance of a list of such
detections as a natural extension to a common per-instance metric. We present
an efficient algorithm with provable performance for building a high-quality
list of detections from any candidate set of region-based proposals. We also
develop a simple class-specific algorithm to generate a candidate region
instance in near-linear time in the number of low-level superpixels that
outperforms other region generating methods. In order to make predictions on
novel images at testing time without access to ground truth, we develop
learning approaches to emulate these algorithms' behaviors. We demonstrate that
our new approach outperforms sophisticated baselines on benchmark datasets.
Comment: to appear at ICRA 201
Making Linear MDPs Practical via Contrastive Representation Learning
It is common to address the curse of dimensionality in Markov decision
processes (MDPs) by exploiting low-rank representations. This motivates much of
the recent theoretical study on linear MDPs. However, most approaches require a
given representation under unrealistic assumptions about the normalization of
the decomposition or introduce unresolved computational challenges in practice.
Instead, we consider an alternative definition of linear MDPs that
automatically ensures normalization while allowing efficient representation
learning via contrastive estimation. The framework also admits
confidence-adjusted index algorithms, enabling an efficient and principled
approach to incorporating optimism or pessimism in the face of uncertainty. To
the best of our knowledge, this provides the first practical representation
learning method for linear MDPs that achieves both strong theoretical
guarantees and empirical performance. Theoretically, we prove that the proposed
algorithm is sample efficient in both the online and offline settings.
Empirically, we demonstrate superior performance over existing state-of-the-art
model-based and model-free algorithms on several benchmarks.
Comment: ICML 2022. The first two authors contribute equally
Behavior Prior Representation learning for Offline Reinforcement Learning
Offline reinforcement learning (RL) struggles in environments with rich and
noisy inputs, where the agent only has access to a fixed dataset without
environment interactions. Past works have proposed common workarounds based on
the pre-training of state representations, followed by policy training. In this
work, we introduce a simple, yet effective approach for learning state
representations. Our method, Behavior Prior Representation (BPR), learns state
representations with an easy-to-integrate objective based on behavior cloning
of the dataset: we first learn a state representation by mimicking actions from
the dataset, and then train a policy on top of the fixed representation, using
any off-the-shelf Offline RL algorithm. Theoretically, we prove that BPR
comes with performance guarantees when integrated into algorithms that have
either policy improvement guarantees (conservative algorithms) or produce lower
bounds of the policy values (pessimistic algorithms). Empirically, we show that
BPR combined with existing state-of-the-art Offline RL algorithms leads to
significant improvements across several offline control benchmarks.
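The two-stage recipe described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the encoder and heads are linear, actions are continuous, and all function names (pretrain_encoder, fit_policy_on_frozen_features, act) are hypothetical. Stage 2 below uses plain ridge regression onto the dataset actions purely as a stand-in; it is not an offline RL algorithm, and in the setting the abstract describes it would be replaced by one.

```python
import numpy as np


def pretrain_encoder(states, actions, latent_dim, lr=0.02, steps=1000, seed=0):
    """Stage 1 (hypothetical helper): behavior cloning with a linear encoder.

    Fits actions ~ (states @ W) @ H by gradient descent on the mean squared
    error and returns only the encoder W; the cloning head H is discarded.
    """
    rng = np.random.default_rng(seed)
    n, state_dim = states.shape
    action_dim = actions.shape[1]
    W = rng.normal(scale=0.1, size=(state_dim, latent_dim))
    H = rng.normal(scale=0.1, size=(latent_dim, action_dim))
    for _ in range(steps):
        z = states @ W                 # latent representation of each state
        err = z @ H - actions          # behavior-cloning residual
        grad_H = z.T @ err / n
        grad_W = states.T @ (err @ H.T) / n
        W -= lr * grad_W
        H -= lr * grad_H
    return W


def fit_policy_on_frozen_features(states, actions, W, ridge=1e-3):
    """Stage 2 stand-in: fit a policy head on top of the frozen encoder W.

    Assumption: ridge regression replaces the offline RL algorithm here.
    W is only read, never updated.
    """
    z = states @ W
    gram = z.T @ z + ridge * np.eye(z.shape[1])
    return np.linalg.solve(gram, z.T @ actions)


def act(states, W, policy_head):
    """Apply the frozen encoder, then the policy head."""
    return (states @ W) @ policy_head
```

The point of the sketch is the separation of concerns: the representation is fixed after stage 1, so any downstream learner that consumes states @ W could be swapped in for the ridge step.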
BiERL: A Meta Evolutionary Reinforcement Learning Framework via Bilevel Optimization
Evolutionary reinforcement learning (ERL) algorithms have recently attracted
attention for tackling complex reinforcement learning (RL) problems due to their
high parallelism, but they are prone to insufficient exploration or model collapse
without careful tuning of hyperparameters (a.k.a. meta-parameters). In this paper,
we propose a general meta ERL framework via bilevel optimization (BiERL) to
jointly update hyperparameters in parallel with training the ERL model within a
single agent, which relieves the need for prior domain knowledge or a costly
optimization procedure before model deployment. We design an elegant meta-level
architecture that embeds the inner-level's evolving experience into an
informative population representation and introduce a simple and feasible
evaluation of the meta-level fitness function to facilitate learning
efficiency. We perform extensive experiments in MuJoCo and Box2D tasks to
verify that as a general framework, BiERL outperforms various baselines and
consistently improves the learning performance for a variety of ERL
algorithms.
Comment: Published as a conference paper at European Conference on Artificial
Intelligence (ECAI) 202
Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including in a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.