42,092 research outputs found

    AdsorbRL: Deep Multi-Objective Reinforcement Learning for Inverse Catalysts Design

    Full text link
    A central challenge of the clean energy transition is the development of catalysts for low-emissions technologies. Recent advances in Machine Learning for quantum chemistry drastically accelerate the computation of catalytic activity descriptors such as adsorption energies. Here we introduce AdsorbRL, a Deep Reinforcement Learning agent aiming to identify potential catalysts given a multi-objective binding energy target, trained using offline learning on the Open Catalyst 2020 and Materials Project data sets. We experiment with Deep Q-Network agents to traverse the space of all ~160,000 possible unary, binary and ternary compounds of 55 chemical elements, with very sparse rewards based on adsorption energy known for only between 2,000 and 3,000 catalysts per adsorbate. To constrain the actions space, we introduce Random Edge Traversal and train a single-objective DQN agent on the known states subgraph, which we find strengthens target binding energy by an average of 4.1 eV. We extend this approach to multi-objective, goal-conditioned learning, and train a DQN agent to identify materials with the highest (respectively lowest) adsorption energies for multiple simultaneous target adsorbates. We experiment with Objective Sub-Sampling, a novel training scheme aimed at encouraging exploration in the multi-objective setup, and demonstrate simultaneous adsorption energy improvement across all target adsorbates, by an average of 0.8 eV. Overall, our results suggest strong potential for Deep Reinforcement Learning applied to the inverse catalysts design problem.Comment: 37th Conference on Neural Information Processing Systems (NeurIPS 2023), AI for Accelerated Materials Design Worksho

    Towards Learning to Speak and Hear Through Multi-Agent Communication over a Continuous Acoustic Channel

    Full text link
    While multi-agent reinforcement learning has been used as an effective means to study emergent communication between agents, existing work has focused almost exclusively on communication with discrete symbols. Human communication often takes place (and emerged) over a continuous acoustic channel; human infants acquire language in large part through continuous signalling with their caregivers. We therefore ask: Are we able to observe emergent language between agents with a continuous communication channel trained through reinforcement learning? And if so, what is the impact of channel characteristics on the emerging language? We propose an environment and training methodology to serve as a means to carry out an initial exploration of these questions. We use a simple messaging environment where a "speaker" agent needs to convey a concept to a "listener". The Speaker is equipped with a vocoder that maps symbols to a continuous waveform, this is passed over a lossy continuous channel, and the Listener needs to map the continuous signal to the concept. Using deep Q-learning, we show that basic compositionality emerges in the learned language representations. We find that noise is essential in the communication channel when conveying unseen concept combinations. And we show that we can ground the emergent communication by introducing a caregiver predisposed to "hearing" or "speaking" English. Finally, we describe how our platform serves as a starting point for future work that uses a combination of deep reinforcement learning and multi-agent systems to study our questions of continuous signalling in language learning and emergence.Comment: 12 pages, 6 figures, 3 tables; under review as a conference paper at ICLR 202

    Guiding Robot Exploration in Reinforcement Learning via Automated Planning

    Full text link
    Reinforcement learning (RL) enables an agent to learn from trial-and-error experiences toward achieving long-term goals; automated planning aims to compute plans for accomplishing tasks using action knowledge. Despite their shared goal of completing complex tasks, the development of RL and automated planning has been largely isolated due to their different computational modalities. Focusing on improving RL agents' learning efficiency, we develop Guided Dyna-Q (GDQ) to enable RL agents to reason with action knowledge to avoid exploring less-relevant states. The action knowledge is used for generating artificial experiences from an optimistic simulation. GDQ has been evaluated in simulation and using a mobile robot conducting navigation tasks in a multi-room office environment. Compared with competitive baselines, GDQ significantly reduces the effort in exploration while improving the quality of learned policies.Comment: Accepted in International Conference of Planning and Scheduling (ICAPS-21

    Independent Learning Approaches: Overcoming Multi-Agent Learning Pathologies In Team-Games

    Get PDF
    Deep Neural Networks enable Reinforcement Learning (RL) agents to learn behaviour policies directly from high-dimensional observations. As a result, the field of Deep Reinforcement Learning (DRL) has seen a great number of successes. Recently the sub-field of Multi-Agent DRL (MADRL) has received an increased amount of attention. However, considerations are required when using RL in Multi-Agent Systems. For instance Independent Learners (ILs) lack the convergence guarantees of many single-agent RL approaches, even in domains that do not require a MADRL approach. Furthermore, ILs must often overcome a number of learning pathologies to converge upon an optimal joint-policy. Numerous IL approaches have been proposed to facilitate cooperation, including hysteretic Q-learning (Matignon et al., 2007) and leniency (Panait et al., 2006). Recently LMRL2, a variation of leniency, proved robust towards a number of pathologies in low-dimensional domains, including miscoordination, relative overgeneralization, stochasticity, the alter-exploration problem and the moving target problem (Wei and Luke, 2016). In contrast, the majority of work on ILs in MADRL focuses on an amplified moving target problem, caused by neural networks being trained with potentially obsolete samples drawn from experience replay memories. In this thesis we combine advances from research on ILs with DRL algorithms. However, first we evaluate the robustness of tabular approaches along each of the above pathology dimensions. Upon identifying a number of weaknesses that prevent LMRL2 from consistently converging upon optimal joint-policies we propose a new version of leniency, Distributed-Lenient Q-learning (DLQ). We find DLQ delivers state of the art performances in strategic-form and Markov games from Multi-Agent Reinforcement Learning literature. We subsequently scale leniency to MADRL, introducing Lenient (Double) Deep Q-Network (LDDQN). We empirically evaluate LDDQN with extensions of the Cooperative Multi-Agent Object Transportation Problem (Bucsoniu et al., 2010), finding that LDDQN outperforms hysteretic deep Q-learners in domains with multiple dropzones yielding stochastic rewards. Finally, to evaluate deep ILs along each pathology dimension we introduce a new MADRL environment: the Apprentice Firemen Game (AFG). We find lenient and hysteretic approaches fail to consistently learn near optimal joint-policies in the AFG. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a MADRL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in deterministic and stochastic reward settings of the AFG, overcoming the outlined pathologies
    • …
    corecore