Imitation-Projected Policy Gradient for Programmatic Reinforcement Learning
We study the problem of programmatic reinforcement learning, in which policies are represented as short programs in a symbolic language. Programmatic policies can be more interpretable, generalizable, and amenable to formal verification than neural policies; however, designing rigorous learning approaches for such policies remains a challenge. Our approach to this challenge, a meta-algorithm called PROPEL, is based on three insights. First, we view our learning task as optimization in policy space, modulo the constraint that the desired policy has a programmatic representation, and solve this optimization problem using a form of mirror descent that takes a gradient step into the unconstrained policy space and then projects back onto the constrained space. Second, we view the unconstrained policy space as mixing neural and programmatic representations, which enables employing state-of-the-art deep policy gradient approaches. Third, we cast the projection step as program synthesis via imitation learning, and exploit contemporary combinatorial methods for this task. We present theoretical convergence results for PROPEL and empirically evaluate the approach in three continuous control domains. The experiments show that PROPEL can significantly outperform state-of-the-art approaches for learning programmatic policies.
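The lift-and-project loop described in the abstract can be sketched in a few lines. This is a toy illustration, not PROPEL itself: the real projection step runs program synthesis via imitation learning, whereas here a coarse parameter grid stands in for the space of programs and a quadratic objective stands in for the RL return (all names and numbers below are hypothetical).

```python
def propel_sketch(grad, project, theta0, lr=0.3, iters=20):
    """Toy lift-and-project loop: gradient step in the unconstrained
    (neural) space, then projection back onto the 'programmatic' set.
    In PROPEL proper the projection is program synthesis via imitation
    learning; here a coarse grid stands in for the space of programs."""
    theta = project(theta0)
    for _ in range(iters):
        # Lift: unconstrained policy-gradient step
        lifted = [t + lr * g for t, g in zip(theta, grad(theta))]
        # Project: snap back to the nearest representable 'program'
        theta = project(lifted)
    return theta

def grad(theta):
    # Gradient of the toy return -(t - 1)^2 per coordinate (optimum at 1.0)
    return [2.0 * (1.0 - t) for t in theta]

def project(theta):
    # Hypothetical projection: round each parameter to a 0.25-spaced grid
    return [round(t * 4) / 4 for t in theta]

result = propel_sketch(grad, project, [0.0, 0.5])
print(result)  # [1.0, 1.0]: the grid point at the optimum
```

The key design point the sketch preserves is the alternation: the gradient step may leave the constrained set, and the projection restores representability at every iteration rather than once at the end.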
Programmatic Imitation Learning from Unlabeled and Noisy Demonstrations
Imitation Learning (IL) is a promising paradigm for teaching robots to
perform novel tasks using demonstrations. Most existing approaches for IL
utilize neural networks (NNs); however, these methods suffer from several
well-known limitations: they 1) require large amounts of training data, 2) are
hard to interpret, and 3) are hard to repair and adapt. There is an emerging
interest in programmatic imitation learning (PIL), which offers significant
promise in addressing the above limitations. In PIL, the learned policy is
represented in a programming language, making it amenable to interpretation and
repair. However, state-of-the-art PIL algorithms assume access to action labels
and struggle to learn from noisy real-world demonstrations. In this paper, we
propose PLUNDER, a novel PIL algorithm that integrates a probabilistic program
synthesizer in an iterative Expectation-Maximization (EM) framework to address
these shortcomings. Unlike existing PIL approaches, PLUNDER synthesizes
probabilistic programmatic policies that are particularly well-suited for
modeling the uncertainties inherent in real-world demonstrations. Our approach
leverages an EM loop to simultaneously infer the missing action labels and the
most likely probabilistic policy. We benchmark PLUNDER against several
established IL techniques, and demonstrate its superiority across five
challenging imitation learning tasks under noise. PLUNDER policies achieve 95%
accuracy in matching the given demonstrations, outperforming the next best
baseline by 19%. Additionally, policies generated by PLUNDER successfully
complete the tasks 17% more frequently than the nearest baseline.
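The EM loop described above can be caricatured with a one-parameter "program". This is a hypothetical simplification, not the PLUNDER synthesizer: the latent action labels are inferred in the E-step and the most likely policy is re-fit in the M-step, but here the policy is just a threshold rule and the noise model is an assumed sigmoid.

```python
import math

def plunder_style_em(observations, iters=10):
    """Toy EM loop in the spirit of PLUNDER (hypothetical simplification):
    action labels are latent, so the E-step infers soft labels under the
    current policy and the M-step re-synthesizes the most likely policy.
    Here the 'program' is a single threshold: act when obs > threshold."""
    threshold = 0.0
    for _ in range(iters):
        # E-step: posterior probability that the action was taken,
        # under a sigmoid noise model around the current threshold
        labels = [1 / (1 + math.exp(-4 * (o - threshold))) for o in observations]
        # M-step: re-fit the threshold midway between the soft clusters
        w1 = sum(labels)
        w0 = len(labels) - w1
        m1 = sum(l * o for l, o in zip(labels, observations)) / w1
        m0 = sum((1 - l) * o for l, o in zip(labels, observations)) / w0
        threshold = (m0 + m1) / 2
    return threshold

# Noisy, unlabeled demonstrations: two behaviour regimes around 0.2 and 1.8
obs = [0.1, 0.2, 0.3, 1.7, 1.8, 1.9]
t = plunder_style_em(obs)
print(round(t, 2))  # settles near 1.0, between the two regimes
```

The soft labels are what make the loop robust to noise: no demonstration is ever hard-assigned an action, so a few mislabeled-looking points cannot lock the policy into a bad fit.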
Neurosymbolic Reinforcement Learning and Planning: A Survey
The area of Neurosymbolic Artificial Intelligence (Neurosymbolic AI) is
rapidly developing and has become a popular research topic, encompassing
sub-fields such as Neurosymbolic Deep Learning (Neurosymbolic DL) and
Neurosymbolic Reinforcement Learning (Neurosymbolic RL). Compared to
traditional learning methods, Neurosymbolic AI offers significant advantages by
simplifying complexity and providing transparency and explainability.
Reinforcement Learning (RL), a long-standing Artificial Intelligence (AI) concept
that mimics human behavior using rewards and punishment, is a fundamental
component of Neurosymbolic RL, a recent integration of the two fields that has
yielded promising results. The aim of this paper is to contribute to the
emerging field of Neurosymbolic RL by conducting a literature survey. Our
evaluation focuses on the three components that constitute Neurosymbolic RL:
neural, symbolic, and RL. We categorize works based on the role played by the
neural and symbolic parts in RL into three taxonomies: Learning for Reasoning,
Reasoning for Learning and Learning-Reasoning. These categories are further
divided into sub-categories based on their applications. Furthermore, we
analyze the RL components of each research work, including the state space,
action space, policy module, and RL algorithm. Additionally, we identify
research opportunities and challenges in various applications within this
dynamic field. Comment: 16 pages, 9 figures, IEEE Transactions on Artificial Intelligence
From explanation to synthesis: Compositional program induction for learning from demonstration
Hybrid systems are a compact and natural mechanism with which to address
problems in robotics. This work introduces an approach to learning hybrid
systems from demonstrations, with an emphasis on extracting models that are
explicitly verifiable and easily interpreted by robot operators. We fit a
sequence of controllers using sequential importance sampling under a generative
switching proportional controller task model. Here, we parameterise controllers
using a proportional gain and a visually verifiable joint angle goal. Inference
under this model is challenging, but we address this by introducing an
attribution prior extracted from a neural end-to-end visuomotor control model.
Given the sequence of controllers comprising a task, we simplify the trace
using grammar parsing strategies, taking advantage of the sequence
compositionality, before grounding the controllers by training perception
networks to predict goals given images. Using this approach, we are
successfully able to induce a program for a visuomotor reaching task involving
loops and conditionals from a single demonstration and a neural end-to-end
model. In addition, we are able to discover the program used for a tower
building task. We argue that computer program-like control systems are more
interpretable than alternative end-to-end learning approaches, and that hybrid
systems inherently allow for better generalisation across task configurations.
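The switching proportional-controller task model is easy to picture with a single controller. This is a minimal sketch under assumed dynamics; the gain, goal value, and update rule below are illustrative, not taken from the paper.

```python
def p_controller(gain, goal):
    """One controller from a switching proportional-controller model:
    the command is proportional to the error between the joint-angle
    goal (visually verifiable) and the current angle. Toy sketch only."""
    def step(angle):
        return gain * (goal - angle)
    return step

# Simulate a joint converging toward a hypothetical goal of 1.0 rad
reach = p_controller(gain=0.5, goal=1.0)
angle = 0.0
for _ in range(25):
    angle += reach(angle)  # apply the proportional command directly
print(abs(angle - 1.0) < 1e-3)  # True: the error halves every step
```

Because each controller is just a (gain, goal) pair, a whole task reduces to a sequence of such pairs, which is what makes the trace amenable to grammar parsing and perception grounding.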
New Frameworks for Structured Policy Learning
Sequential decision making applications are playing an increasingly important role in everyday life. Research interest in machine learning approaches to sequential decision making has surged thanks to recent empirical successes of reinforcement learning and imitation learning techniques, partly fueled by recent advances in deep learning-based function approximation. However, in many real-world sequential decision making applications, relying purely on black-box policy learning is often insufficient, due to practical requirements of data efficiency, interpretability, safety guarantees, etc. These challenges collectively make it difficult for many existing policy learning methods to find success in realistic applications.
In this dissertation, we present recent advances in structured policy learning: new machine learning frameworks that integrate policy learning with principled notions of domain knowledge, spanning value-based, policy-based, and model-based structures. Our framework takes flexible reduction-style approaches that can integrate structure with reinforcement learning, imitation learning, and robust control techniques. In addition to methodological advances, we demonstrate several successful applications of the new policy learning frameworks.
Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies
This paper introduces Local Learner (2L), an algorithm for providing a set of
reference strategies to guide the search for programmatic strategies in
two-player zero-sum games. Previous learning algorithms, such as Iterated Best
Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be
computationally expensive or miss important information for guiding search
algorithms. 2L actively selects a set of reference strategies to improve the
search signal. We empirically demonstrate the advantages of our approach while
guiding a local search algorithm for synthesizing strategies in three games,
including MicroRTS, a challenging real-time strategy game. Results show that 2L
learns reference strategies that provide a stronger search signal than IBR, FP,
and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L
outperformed the winners of the two latest MicroRTS competitions, which were
programmatic strategies written by human programmers. Comment: International Joint Conference on Artificial Intelligence (IJCAI) 2023
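The core idea of actively choosing reference strategies can be caricatured with a greedy diversity rule. This sketch is not the 2L algorithm (2L's criterion is tied to the search signal it provides a synthesizer); the game, payoff function, and novelty measure below are all hypothetical stand-ins.

```python
def select_references_sketch(candidates, payoff, k=2):
    """Greedy stand-in for 2L-style reference selection: repeatedly add
    the candidate whose payoff profile differs most from the references
    chosen so far, so the set gives a more informative search signal."""
    refs = [candidates[0]]
    while len(refs) < k:
        def novelty(c):
            # Distance from c's payoff row to its nearest chosen reference
            return min(sum(abs(payoff(c, o) - payoff(r, o)) for o in candidates)
                       for r in refs)
        best = max((c for c in candidates if c not in refs), key=novelty)
        refs.append(best)
    return refs

def payoff(a, b):
    # Toy zero-sum game: rock-paper-scissors payoff for a against b
    beats = {"R": "S", "P": "R", "S": "P"}
    return 0 if a == b else (1 if beats[a] == b else -1)

refs = select_references_sketch(["R", "P", "S"], payoff)
print(refs)  # ['R', 'P']: a pair with maximally different payoff rows
```

The contrast with IBR, FP, and DO is that the reference set is chosen for how informative it is to the downstream search, not merely for how strong each member is.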