AdsorbRL: Deep Multi-Objective Reinforcement Learning for Inverse Catalysts Design
A central challenge of the clean energy transition is the development of
catalysts for low-emissions technologies. Recent advances in Machine Learning
for quantum chemistry drastically accelerate the computation of catalytic
activity descriptors such as adsorption energies. Here we introduce AdsorbRL, a
Deep Reinforcement Learning agent aiming to identify potential catalysts given
a multi-objective binding energy target, trained using offline learning on the
Open Catalyst 2020 and Materials Project data sets. We experiment with Deep
Q-Network agents to traverse the space of all ~160,000 possible unary, binary
and ternary compounds of 55 chemical elements, with very sparse rewards based
on adsorption energy known for only between 2,000 and 3,000 catalysts per
adsorbate. To constrain the action space, we introduce Random Edge Traversal
and train a single-objective DQN agent on the known states subgraph, which we
find strengthens target binding energy by an average of 4.1 eV. We extend this
approach to multi-objective, goal-conditioned learning, and train a DQN agent
to identify materials with the highest (respectively lowest) adsorption
energies for multiple simultaneous target adsorbates. We experiment with
Objective Sub-Sampling, a novel training scheme aimed at encouraging
exploration in the multi-objective setup, and demonstrate simultaneous
adsorption energy improvement across all target adsorbates, by an average of
0.8 eV. Overall, our results suggest strong potential for Deep Reinforcement
Learning applied to the inverse catalyst design problem.
Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023), AI for Accelerated Materials Design Workshop
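The sparse-reward traversal described above can be illustrated with a toy tabular sketch. Everything here (the three-element composition graph, the single rewarded state, and all constants) is a made-up miniature of the paper's setup, which uses a DQN over roughly 160,000 compounds; the point is only to show an agent walking a composition graph under very sparse rewards, with "Random Edge Traversal" reduced to sampling a random outgoing edge.

```python
import random

random.seed(0)

# Hypothetical composition graph: states are compounds, edges swap one element.
states = ["A", "AB", "ABC", "AC", "BC", "B", "C"]
edges = {
    "A": ["AB", "AC"], "B": ["AB", "BC"], "C": ["AC", "BC"],
    "AB": ["A", "B", "ABC"], "AC": ["A", "C", "ABC"],
    "BC": ["B", "C", "ABC"], "ABC": ["AB", "AC", "BC"],
}
known_reward = {"ABC": 1.0}  # very sparse: reward known for one "catalyst" only

Q = {(s, a): 0.0 for s in states for a in edges[s]}
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(500):
    s = random.choice(states)
    for _ in range(5):
        # epsilon-greedy over the known-states subgraph
        if random.random() < eps:
            a = random.choice(edges[s])
        else:
            a = max(edges[s], key=lambda x: Q[(s, x)])
        r = known_reward.get(a, 0.0)
        best_next = max(Q[(a, x)] for x in edges[a])
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = a

# greedy rollout from "A" should reach the rewarded compound
s, path = "A", ["A"]
for _ in range(2):
    s = max(edges[s], key=lambda x: Q[(s, x)])
    path.append(s)
print(path)
```

The real problem replaces the table with a deep Q-network and the toy reward with DFT-derived adsorption energies, but the sparse-reward graph-traversal structure is the same.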
Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel
While multi-agent reinforcement learning has been used as an effective means
to study emergent communication between agents, existing work has focused
almost exclusively on communication with discrete symbols. Human communication
often takes place (and emerged) over a continuous acoustic channel; human
infants acquire language in large part through continuous signalling with their
caregivers. We therefore ask: Are we able to observe emergent language between
agents with a continuous communication channel trained through reinforcement
learning? And if so, what is the impact of channel characteristics on the
emerging language? We propose an environment and training methodology to serve
as a means to carry out an initial exploration of these questions. We use a
simple messaging environment where a "speaker" agent needs to convey a concept
to a "listener". The Speaker is equipped with a vocoder that maps symbols to a
continuous waveform, this is passed over a lossy continuous channel, and the
Listener needs to map the continuous signal to the concept. Using deep
Q-learning, we show that basic compositionality emerges in the learned language
representations. We find that noise is essential in the communication channel
when conveying unseen concept combinations. And we show that we can ground the
emergent communication by introducing a caregiver predisposed to "hearing" or
"speaking" English. Finally, we describe how our platform serves as a starting
point for future work that uses a combination of deep reinforcement learning
and multi-agent systems to study our questions of continuous signalling in
language learning and emergence.
Comment: 12 pages, 6 figures, 3 tables; under review as a conference paper at ICLR 202
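The speaker-channel-listener loop above can be sketched in a few lines. This is not the paper's architecture (which trains both agents with deep Q-learning): the codebook, dimensions, and nearest-neighbour listener here are assumptions chosen to isolate the one component the abstract emphasises, a lossy continuous channel between the agents.

```python
import numpy as np

rng = np.random.default_rng(0)

n_concepts, signal_dim = 4, 16
# stand-in for the speaker's vocoder: a fixed continuous "waveform" per concept
codebook = rng.normal(size=(n_concepts, signal_dim))

def speak(concept: int) -> np.ndarray:
    return codebook[concept]

def channel(signal: np.ndarray, noise_std: float) -> np.ndarray:
    # lossy continuous acoustic channel modelled as additive Gaussian noise
    return signal + rng.normal(scale=noise_std, size=signal.shape)

def listen(signal: np.ndarray) -> int:
    # listener decodes the received waveform to the nearest known one
    dists = np.linalg.norm(codebook - signal, axis=1)
    return int(np.argmin(dists))

# decoding accuracy degrades as channel noise grows
for noise_std in (0.0, 0.5, 2.0):
    correct = sum(listen(channel(speak(c), noise_std)) == c
                  for c in range(n_concepts) for _ in range(50))
    print(noise_std, correct / (n_concepts * 50))
```

In the paper both the mapping to waveforms and the decoder are learned, and the interesting finding is that moderate channel noise helps generalisation to unseen concept combinations rather than merely hurting accuracy.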
Guiding Robot Exploration in Reinforcement Learning via Automated Planning
Reinforcement learning (RL) enables an agent to learn from trial-and-error
experiences toward achieving long-term goals; automated planning aims to
compute plans for accomplishing tasks using action knowledge. Despite their
shared goal of completing complex tasks, the development of RL and automated
planning has been largely isolated due to their different computational
modalities. Focusing on improving RL agents' learning efficiency, we develop
Guided Dyna-Q (GDQ) to enable RL agents to reason with action knowledge to
avoid exploring less-relevant states. The action knowledge is used for
generating artificial experiences from an optimistic simulation. GDQ has been
evaluated in simulation and using a mobile robot conducting navigation tasks in
a multi-room office environment. Compared with competitive baselines, GDQ
significantly reduces the effort in exploration while improving the quality of
learned policies.
Comment: Accepted at the International Conference on Automated Planning and Scheduling (ICAPS-21)
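GDQ builds on Dyna-Q, which interleaves real experience with simulated experience replayed from a learned model; GDQ additionally uses action knowledge from a planner to bias which simulated experiences are generated. The sketch below shows plain Dyna-Q only, on a made-up five-state corridor (the guidance component is omitted).

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4
actions = (-1, 1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in actions}
model = {}  # (s, a) -> (r, s'), filled from real experience
alpha, gamma, eps, n_planning = 0.5, 0.9, 0.1, 10

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return (1.0 if s2 == GOAL else 0.0), s2

for episode in range(50):
    s = 0
    while s != GOAL:
        a = random.choice(actions) if random.random() < eps else \
            max(actions, key=lambda x: Q[(s, x)])
        r, s2 = step(s, a)
        # direct RL update from the real transition
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions)
                              - Q[(s, a)])
        model[(s, a)] = (r, s2)
        # planning: replay simulated transitions drawn from the model
        for _ in range(n_planning):
            (ps, pa), (pr, ps2) = random.choice(list(model.items()))
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, x)] for x in actions)
                                    - Q[(ps, pa)])
        s = s2

# the greedy policy should always move right, toward the goal
print([max(actions, key=lambda a: Q[(s, a)]) for s in range(GOAL)])
```

The planning loop is what lets Dyna-Q-style agents reach good policies with far fewer real interactions; GDQ's contribution is making those simulated rollouts optimistic and plan-informed so exploration avoids less-relevant states.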
Independent Learning Approaches: Overcoming Multi-Agent Learning Pathologies In Team-Games
Deep Neural Networks enable Reinforcement Learning (RL) agents to learn behaviour policies directly from high-dimensional observations. As a result, the field of Deep Reinforcement Learning (DRL) has seen a great number of successes. Recently, the sub-field of Multi-Agent DRL (MADRL) has received an increased amount of attention. However, considerations are required when using RL in multi-agent systems. For instance, Independent Learners (ILs) lack the convergence guarantees of many single-agent RL approaches, even in domains that do not require a MADRL approach. Furthermore, ILs must often overcome a number of learning pathologies to converge upon an optimal joint-policy. Numerous IL approaches have been proposed to facilitate cooperation, including hysteretic Q-learning (Matignon et al., 2007) and leniency (Panait et al., 2006). Recently, LMRL2, a variation of leniency, proved robust towards a number of pathologies in low-dimensional domains, including miscoordination, relative overgeneralization, stochasticity, the alter-exploration problem and the moving target problem (Wei and Luke, 2016). In contrast, the majority of work on ILs in MADRL focuses on an amplified moving target problem, caused by neural networks being trained with potentially obsolete samples drawn from experience replay memories. In this thesis we combine advances from research on ILs with DRL algorithms. First, however, we evaluate the robustness of tabular approaches along each of the above pathology dimensions. Upon identifying a number of weaknesses that prevent LMRL2 from consistently converging upon optimal joint-policies, we propose a new version of leniency, Distributed-Lenient Q-learning (DLQ). We find DLQ delivers state-of-the-art performance in strategic-form and Markov games from the Multi-Agent Reinforcement Learning literature. We subsequently scale leniency to MADRL, introducing the Lenient (Double) Deep Q-Network (LDDQN).
We empirically evaluate LDDQN with extensions of the Cooperative Multi-Agent Object Transportation Problem (Busoniu et al., 2010), finding that LDDQN outperforms hysteretic deep Q-learners in domains with multiple dropzones yielding stochastic rewards. Finally, to evaluate deep ILs along each pathology dimension, we introduce a new MADRL environment: the Apprentice Firemen Game (AFG). We find lenient and hysteretic approaches fail to consistently learn near-optimal joint-policies in the AFG. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a MADRL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in deterministic and stochastic reward settings of the AFG, overcoming the outlined pathologies.
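The hysteretic Q-learning idea the abstract cites fits in one function: an optimistic independent learner applies a large learning rate to positive TD errors and a smaller one to negative errors, so an agent does not unlearn a good joint action merely because its teammates are still exploring. The constants below are illustrative, not from the cited work.

```python
def hysteretic_update(q: float, td_error: float,
                      alpha: float = 0.5, beta: float = 0.05) -> float:
    """Return the updated Q-value; beta < alpha makes the learner optimistic."""
    lr = alpha if td_error >= 0 else beta
    return q + lr * td_error

q = 1.0
q_up = hysteretic_update(q, +1.0)    # positive surprise: learn quickly
q_down = hysteretic_update(q, -1.0)  # negative surprise: learn slowly
print(q_up, q_down)
```

Leniency pursues the same goal differently, by probabilistically ignoring negative updates early in learning; the thesis's DLQ and LDDQN refine that mechanism, and NUI-DDQN instead filters whole episodes by their cumulative reward.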
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. We then examine representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents playing StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
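The count-based-via-hashing category mentioned above reduces to a small mechanism: hash each (possibly continuous) state to a discrete code, count code visits, and add an exploration bonus proportional to 1/sqrt(count) to the reward. The discretising hash and the constant beta below are illustrative stand-ins (the original line of work uses SimHash over learned features).

```python
from collections import defaultdict
from math import sqrt

counts = defaultdict(int)
beta = 1.0  # bonus scale, a tunable hyperparameter

def state_hash(state) -> int:
    # coarse rounding plays the role of the hash function here
    return hash(tuple(round(x, 1) for x in state))

def exploration_bonus(state) -> float:
    code = state_hash(state)
    counts[code] += 1
    return beta / sqrt(counts[code])

# a repeatedly visited state earns a shrinking bonus...
b1 = exploration_bonus((0.0, 0.0))
b2 = exploration_bonus((0.0, 0.0))
# ...while a novel state gets the full bonus again
b3 = exploration_bonus((3.0, -2.0))
print(b1, b2, b3)
```

The agent then maximizes reward plus bonus, which steers it toward rarely hashed regions of the state space without requiring any task-specific prior knowledge.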