Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines
Natural and formal languages provide an effective mechanism for humans to
specify instructions and reward functions. We investigate how to generate
policies via RL when reward functions are specified in a symbolic language
captured by Reward Machines, an increasingly popular automaton-inspired
structure. We are interested in the case where the mapping of environment state
to a symbolic (here, Reward Machine) vocabulary -- commonly known as the
labelling function -- is uncertain from the perspective of the agent. We
formulate the problem of policy learning in Reward Machines with noisy symbolic
abstractions as a special class of POMDP optimization problem, and investigate
several methods to address the problem, building on existing and new
techniques, the latter focused on predicting Reward Machine state, rather than
on grounding of individual symbols. We analyze these methods and evaluate them
experimentally under varying degrees of uncertainty in the correct
interpretation of the symbolic vocabulary. We verify the strength of our
approach and the limitation of existing methods via an empirical investigation
on both illustrative, toy domains and partially observable, deep RL domains.
Comment: NeurIPS Deep Reinforcement Learning Workshop 202
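The core objects in this abstract can be made concrete with a minimal sketch: a Reward Machine is a finite automaton whose transitions fire on symbols emitted by a labelling function mapping environment state to a symbolic vocabulary. The class, transition encoding, and `labelling_fn` below are illustrative assumptions, not the paper's code; in the paper's setting this labelling function is noisy from the agent's perspective.

```python
# Minimal Reward Machine sketch (hypothetical names, not the paper's API).
# Transitions: {(rm_state, symbol): (next_rm_state, reward)}; unmatched
# symbols leave the machine in place with zero reward.
class RewardMachine:
    def __init__(self, transitions, initial_state):
        self.transitions = transitions
        self.state = initial_state

    def step(self, symbol):
        """Advance the machine on one symbol; return the transition reward."""
        self.state, reward = self.transitions.get(
            (self.state, symbol), (self.state, 0.0))
        return reward

# Example task: "reach the key (k), then the door (d)".
rm = RewardMachine({("u0", "k"): ("u1", 0.0),
                    ("u1", "d"): ("u2", 1.0)}, "u0")

def labelling_fn(env_state):
    # Hypothetical grounding of raw environment state to symbols; the
    # paper studies the case where this mapping is uncertain.
    return {"key": "k", "door": "d"}.get(env_state, "")

rewards = [rm.step(labelling_fn(s)) for s in ["empty", "key", "door"]]
# rewards == [0.0, 0.0, 1.0]; rm.state == "u2"
```

A noisy labelling function would return a distribution over symbols instead of a single symbol, which is what turns policy learning here into a POMDP.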
Learning Reward Machines in Cooperative Multi-Agent Tasks
This paper presents a novel approach to Multi-Agent Reinforcement Learning
(MARL) that combines cooperative task decomposition with the learning of reward
machines (RMs) encoding the structure of the sub-tasks. The proposed method
helps deal with the non-Markovian nature of the rewards in partially observable
environments and improves the interpretability of the learnt policies required
to complete the cooperative task. The RMs associated with each sub-task are
learnt in a decentralised manner and then used to guide the behaviour of each
agent. By doing so, the complexity of a cooperative multi-agent problem is
reduced, allowing for more effective learning. The results suggest that our
approach is a promising direction for future research in MARL, especially in
complex environments with large state spaces and multiple agents.
Comment: Neuro-symbolic AI for Agent and Multi-Agent Systems Workshop at AAMAS'2
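The decomposition described above can be sketched as each agent following its own local reward machine for its sub-task, with the cooperative task complete once every local machine accepts. The structure and names below are illustrative assumptions, not the paper's implementation (which learns these machines rather than hand-coding them):

```python
# Sketch of cooperative task decomposition via per-agent reward machines
# (hypothetical structure; in the paper the RMs are learnt, decentralised).
class LocalRM:
    def __init__(self, transitions, initial, accepting):
        self.transitions = transitions  # {(state, event): next_state}
        self.state = initial
        self.accepting = accepting

    def step(self, event):
        # Unmatched events leave the machine's state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)

    def done(self):
        return self.state == self.accepting

# Two-agent "press both buttons" task, decomposed into one button each.
agents = {"a1": LocalRM({("u0", "button1"): "uA"}, "u0", "uA"),
          "a2": LocalRM({("u0", "button2"): "uA"}, "u0", "uA")}

agents["a1"].step("button1")
agents["a2"].step("button2")
team_done = all(rm.done() for rm in agents.values())
# team_done == True
```

Each agent's RM state then conditions its own policy, which is how the joint problem's complexity is reduced to per-agent learning.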
Memory Augmented Control Networks
Planning problems in partially observable environments cannot be solved
directly with convolutional networks and require some form of memory. But, even
memory networks with sophisticated addressing schemes are unable to learn
intelligent reasoning satisfactorily due to the complexity of simultaneously
learning to access memory and plan. To mitigate these challenges we introduce
the Memory Augmented Control Network (MACN). The proposed network architecture
consists of three main parts. The first part uses convolutions to extract
features and the second part uses a neural network-based planning module to
pre-plan in the environment. The third part uses a network controller that
learns to store those specific instances of past information that are necessary
for planning. The performance of the network is evaluated in discrete grid
world environments for path planning in the presence of simple and complex
obstacles. We show that our network learns to plan and can generalize to new
environments.
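The three-stage pipeline described above can be sketched end to end: a convolutional feature extractor, a value-iteration-style planner over the feature map, and a controller that stores plan slices for later use. Everything below is an illustrative toy (the paper uses a VIN-style planner and a DNC-style memory controller; these shapes and names are assumptions):

```python
import numpy as np

def conv_features(grid, kernel):
    """Stage 1: a single 'valid' 2-D convolution as the feature extractor."""
    h, w = grid.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(grid[i:i+kh, j:j+kw] * kernel)
    return out

def value_iteration_plan(rewards, gamma=0.9, iters=20):
    """Stage 2: pre-plan by value iteration over the feature map.
    np.roll wraps at the borders, which is acceptable for this toy sketch."""
    v = np.zeros_like(rewards)
    for _ in range(iters):
        # Max over 4-neighbour shifts approximates the Bellman backup.
        shifted = [np.roll(v, s, axis=a) for a in (0, 1) for s in (1, -1)]
        v = rewards + gamma * np.max(shifted, axis=0)
    return v

class MemoryController:
    """Stage 3: store those plan slices needed for later decisions."""
    def __init__(self):
        self.slots = []

    def write(self, item):
        self.slots.append(item)

    def read(self):
        return self.slots[-1] if self.slots else None

grid = np.zeros((5, 5)); grid[4, 4] = 1.0      # goal in the corner
feats = conv_features(grid, np.ones((2, 2)))   # 4x4 feature map
plan = value_iteration_plan(feats)             # values peak near the goal
mem = MemoryController(); mem.write(plan)
```

In the real architecture all three stages are differentiable and trained jointly, which is what makes simultaneously learning to access memory and to plan difficult.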
Predictive-State Decoders: Encoding the Future into Recurrent Networks
Recurrent neural networks (RNNs) are a vital modeling technique that relies on
internal states learned indirectly by optimization of a supervised,
unsupervised, or reinforcement training loss. RNNs are used to model dynamic
processes that are characterized by underlying latent states whose form is
often unknown, precluding their analytic representation inside an RNN. In the
Predictive-State Representation (PSR) literature, latent state processes are
modeled by an internal state representation that directly models the
distribution of future observations, and most recent work in this area has
relied on explicitly representing and targeting sufficient statistics of this
probability distribution. We seek to combine the advantages of RNNs and PSRs by
augmenting existing state-of-the-art recurrent neural networks with
Predictive-State Decoders (PSDs), which add supervision to the network's
internal state representation to target predicting future observations.
Predictive-State Decoders are simple to implement and easily incorporated into
existing training pipelines via additional loss regularization. We demonstrate
the effectiveness of PSDs with experimental results in three different domains:
probabilistic filtering, Imitation Learning, and Reinforcement Learning. In
each, our method improves statistical performance of state-of-the-art recurrent
baselines and does so with fewer iterations and less data.Comment: NIPS 201