Hierarchical Imitation and Reinforcement Learning
We study how to effectively leverage expert feedback to learn sequential
decision-making policies. We focus on problems with sparse rewards and long
time horizons, which typically pose significant challenges in reinforcement
learning. We propose an algorithmic framework, called hierarchical guidance,
that leverages the hierarchical structure of the underlying problem to
integrate different modes of expert interaction. Our framework can incorporate
different combinations of imitation learning (IL) and reinforcement learning
(RL) at different levels, leading to dramatic reductions in both expert effort
and cost of exploration. Using long-horizon benchmarks, including Montezuma's
Revenge, we demonstrate that our approach can learn significantly faster than
hierarchical RL, and be significantly more label-efficient than standard IL. We
also theoretically analyze labeling cost for certain instantiations of our
framework.
Comment: Proceedings of the 35th International Conference on Machine Learning (ICML 2018)
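The abstract above describes combining imitation learning at the high level with reinforcement learning at the low level. A minimal sketch of that interaction loop is below; all names and interfaces (`env.run_subgoal`, the expert and update callbacks) are illustrative assumptions, not the paper's actual algorithm or API:

```python
def hierarchical_guidance_episode(env, hi_policy, expert_subgoal,
                                  update_hi_il, update_lo_rl, horizon=10):
    """One episode of a hierarchical IL/RL loop (illustrative sketch):
    the high level picks subgoals and is trained by imitation on expert
    subgoal labels; each subgoal is pursued by a low-level RL learner
    that updates from environment reward, not expert labels."""
    state = env.reset()
    for _ in range(horizon):
        subgoal = hi_policy(state)
        label = expert_subgoal(state)   # expert feedback at the high level only
        update_hi_il(state, label)      # imitation update (e.g. DAgger-style)
        # the low-level policy executes inside run_subgoal until the
        # subgoal is reached or a step budget is exhausted
        state, reward, done = env.run_subgoal(state, subgoal)
        update_lo_rl(subgoal, reward)   # low-level RL update from env reward
        if done:
            break
    return state
```

The label efficiency comes from the expert only providing high-level subgoal labels, while the cheap low-level interactions are handled by RL.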
Batch Policy Learning under Constraints
When learning policies for real-world domains, two important questions arise:
(i) how to efficiently use pre-collected off-policy, non-optimal behavior data;
and (ii) how to mediate among different competing objectives and constraints.
We thus study the problem of batch policy learning under multiple constraints,
and offer a systematic solution. We first propose a flexible meta-algorithm
that admits any batch reinforcement learning and online learning procedure as
subroutines. We then present a specific algorithmic instantiation and provide
performance guarantees for the main objective and all constraints. To certify
constraint satisfaction, we propose a new and simple method for off-policy
policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves
strong empirical results in different domains, including in a challenging
problem of simulated car driving subject to multiple constraints such as lane
keeping and smooth driving. We also show experimentally that our OPE method
outperforms other popular OPE techniques on a standalone basis, especially in a
high-dimensional setting.
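The paper proposes its own new OPE method; for context, a common baseline it would be compared against is trajectory-wise importance sampling, sketched below. The interface (`pi_e(a, s)` and `pi_b(a, s)` returning action probabilities) is an assumption for illustration:

```python
def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Standard per-trajectory importance-sampling estimate of the value
    of an evaluation policy pi_e, using data collected under a behavior
    policy pi_b (a baseline OPE method, not the paper's proposal).

    trajectories: list of trajectories, each a list of (state, action,
    reward) tuples; pi_e(a, s) and pi_b(a, s) give action probabilities.
    """
    estimates = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for (s, a, r) in traj:
            # reweight by the likelihood ratio of the two policies
            weight *= pi_e(a, s) / pi_b(a, s)
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)
```

Such estimators are unbiased but suffer high variance in long horizons and high dimensions, which motivates alternatives like the one the abstract describes.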
A General Large Neighborhood Search Framework for Solving Integer Programs
This paper studies how to design abstractions of large-scale combinatorial optimization problems that can leverage existing state-of-the-art solvers in general purpose ways, and that are amenable to data-driven design. The goal is to arrive at new approaches that can reliably outperform existing solvers in wall-clock time. We focus on solving integer programs, and ground our approach in the large neighborhood search (LNS) paradigm, which iteratively chooses a subset of variables to optimize while leaving the remainder fixed. The appeal of LNS is that it can easily use any existing solver as a subroutine, and thus can inherit the benefits of carefully engineered heuristic approaches and their software implementations. We also show that one can learn a good neighborhood selector from training data. Through an extensive empirical validation, we demonstrate that our LNS framework can significantly outperform state-of-the-art commercial solvers such as Gurobi in wall-clock time.
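The LNS loop described above can be sketched generically. Here `solve_subproblem` stands in for any existing solver re-optimizing the freed variables with the rest fixed; the uniform-random neighborhood selector is a placeholder for the learned selector the paper proposes (all names are illustrative assumptions):

```python
import random

def large_neighborhood_search(variables, solve_subproblem, initial_solution,
                              neighborhood_size, iterations, seed=0):
    """Generic LNS loop: repeatedly free a subset of variables and
    re-optimize them with an existing solver while all other variables
    stay fixed at their current values."""
    rng = random.Random(seed)
    solution = dict(initial_solution)
    for _ in range(iterations):
        # Neighborhood selection: uniformly random here; the paper's point
        # is that this selector can instead be learned from data.
        free_vars = rng.sample(list(variables), neighborhood_size)
        # solve_subproblem re-optimizes free_vars with the rest fixed and
        # returns a full solution at least as good as the current one.
        solution = solve_subproblem(solution, free_vars)
    return solution
```

Because the subproblem solver is a black box, any MIP solver (e.g. Gurobi, by tightening variable bounds to fix the non-free variables) can be plugged in unchanged.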