
    GriddlyJS: A Web IDE for Reinforcement Learning

    Progress in reinforcement learning (RL) research is often driven by the design of new, challenging environments, a costly undertaking requiring skills orthogonal to those of a typical machine learning researcher. The complexity of environment development has only increased with the rise of procedural content generation (PCG) as the prevailing paradigm for producing varied environments capable of testing the robustness and generalization of RL agents. Moreover, existing environments often require complex build processes, making reproducing results difficult. To address these issues, we introduce GriddlyJS, a web-based Integrated Development Environment (IDE) based on the Griddly engine. GriddlyJS allows researchers to visually design and debug arbitrary, complex PCG grid-world environments using a convenient graphical interface, as well as visualize, evaluate, and record the performance of trained agent models. By connecting the RL workflow to the advanced functionality enabled by modern web standards, GriddlyJS allows publishing interactive agent-environment demos that reproduce experimental results directly to the web. To demonstrate the versatility of GriddlyJS, we use it to quickly develop a complex compositional puzzle-solving environment, alongside arbitrary human-designed environment configurations and their solutions, for use in automatic curriculum learning and offline RL. The GriddlyJS IDE is open source and freely available at https://griddly.ai.
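
    As a rough sketch of how an environment designed in GriddlyJS might be consumed in a Python training loop, the snippet below registers a GDY description through Griddly's Gym wrapper. The file name my_puzzle.yaml and the environment name are placeholders, and the wrapper API follows the Griddly documentation; details may vary between versions.

        import gym
        from griddly import GymWrapperFactory, gd

        # Register a Gym environment from a GDY description exported from
        # the GriddlyJS IDE (the file path below is a placeholder).
        wrapper = GymWrapperFactory()
        wrapper.build_gym_from_yaml(
            "MyPuzzle",
            "my_puzzle.yaml",
            player_observer_type=gd.ObserverType.VECTOR,
        )

        env = gym.make("GDY-MyPuzzle-v0")
        obs = env.reset()
        obs, reward, done, info = env.step(env.action_space.sample())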

    Surface Activity and Mechanism of Action of Antiarrhythmic Drugs

    Surface-active substances are widely used in medicine, among them drugs capable of adsorbing in effective concentrations at various interfaces. The relationship between the pharmacological action of antiarrhythmic drugs, their surface activity, and their influence on lipid-containing interfaces (bimolecular layers of phosphatidylcholine) has been studied. It has been revealed that diphisopronyle (diethylaminopropyl ether of α-isopropyloxydiphenylacetic acid, hydrochloride), fubromegane (1-methyl-3-diethylaminopropyl ether of 5-bromofurane-2-carboxylic acid, iodomethylate), methamicile (β-diethylaminopropyl ether of benzilic acid, hydrochloride), propranolole (1-isopropylamino-3-(oxynaphthyl-1)-propanol-2, hydrochloride), chinidine (quinidine sulphate), novocainamide (β-diethylaminoethylamide of p-aminobenzoic acid, hydrochloride), novocaine (β-diethylaminoethyl ether of p-aminobenzoic acid, hydrochloride), xylocaine (N,N-diethylamino-2,6-dimethylacetanilide, hydrochloride), and trimecaine (N,N-diethylamino-2,4,6-trimethylacetanilide, hydrochloride) all possess surface activity. A parallelism between the physiological action and the interfacial activity of antiarrhythmic drugs has been established. Antiarrhythmics increase the electrical conductance of lecithin bilayers, and there is a symbate (parallel) dependence between a drug's effect on the permeability of a bimolecular lecithin membrane and its pharmacological activity. These results are essential (a) for understanding the mode of action of antiarrhythmic agents and (b) for discovering new drugs that possess the required properties.

    Hierarchical Kickstarting for Skill Transfer in Reinforcement Learning

    Practising and honing skills forms a fundamental component of how humans learn, yet artificial agents are rarely explicitly trained to perform them. Instead, they are usually trained end-to-end, in the hope that useful skills will be implicitly learned in order to maximise the discounted return of some extrinsic reward function. In this paper, we investigate how skills can be incorporated into the training of reinforcement learning (RL) agents in complex environments with large state-action spaces and sparse rewards. To this end, we created SkillHack, a benchmark of tasks and associated skills based on the game of NetHack. We evaluate a number of baselines on this benchmark, as well as our own novel skill-based method, Hierarchical Kickstarting (HKS), which is shown to outperform all other evaluated methods. Our experiments show that learning with prior knowledge of useful skills can significantly improve the performance of agents on complex problems. We ultimately argue that utilising predefined skills provides a useful inductive bias for RL problems, especially those with large state-action spaces and sparse rewards.
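
    As a hedged illustration of the kickstarting idea underlying HKS (not the paper's code), the PyTorch fragment below distils a pretrained skill (teacher) policy into the student via an auxiliary KL term whose coefficient would typically be annealed towards zero over training; all names here are illustrative.

        import torch.nn.functional as F

        def kickstarting_loss(student_logits, teacher_logits, coef):
            # KL(teacher || student): pulls the student towards the skill
            # (teacher) policy; annealing coef towards zero lets the student
            # eventually deviate from, and surpass, its teachers.
            teacher_probs = F.softmax(teacher_logits, dim=-1)
            student_log_probs = F.log_softmax(student_logits, dim=-1)
            return coef * F.kl_div(student_log_probs, teacher_probs,
                                   reduction="batchmean")

        # Total objective (sketch): standard RL loss plus the distillation term.
        # total_loss = rl_loss + kickstarting_loss(student_logits,
        #                                          teacher_logits, coef)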

    Evolving Curricula with Regret-Based Environment Design

    Training generally capable agents with reinforcement learning (RL) remains a significant challenge. A promising avenue for improving the robustness of RL agents is through the use of curricula. One such class of methods frames environment design as a game between a student and a teacher, using regret-based objectives to produce environment instantiations (or levels) at the frontier of the student agent's capabilities. These methods benefit from theoretical robustness guarantees at equilibrium, yet they often struggle to find effective levels in challenging design spaces in practice. By contrast, evolutionary approaches incrementally alter environment complexity, resulting in potentially open-ended learning, but often rely on domain-specific heuristics and vast amounts of computational resources. This work proposes harnessing the power of evolution in a principled, regret-based curriculum. Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of this paper is available at https://accelagent.github.io.
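
    One iteration of the loop described above can be sketched as follows. This is a schematic paraphrase of ACCEL, not its reference implementation: estimate_regret stands in for the paper's regret proxy (e.g. positive value loss, as in PLR), and edit for a small random mutation of a level.

        import random
        from dataclasses import dataclass

        @dataclass
        class Level:
            layout: list          # e.g. grid tiles
            regret: float = 0.0   # last regret estimate for this level

        def accel_step(buffer, make_simple_level, edit, estimate_regret,
                       train_on, replay_prob=0.8):
            """One schematic ACCEL iteration."""
            if buffer and random.random() < replay_prob:
                # Replay: train on a high-regret level, then append a small
                # mutation of it, compounding complexity at the frontier.
                level = max(random.sample(buffer, min(8, len(buffer))),
                            key=lambda l: l.regret)
                train_on(level)
                level.regret = estimate_regret(level)
                child = edit(level)
                child.regret = estimate_regret(child)
                buffer.append(child)
            else:
                # Explore: evaluate (without training on) a fresh simple level.
                level = make_simple_level()
                level.regret = estimate_regret(level)
                buffer.append(level)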

    MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

    Progress in deep reinforcement learning (RL) is heavily driven by the availability of challenging benchmarks used for training agents. However, benchmarks that are widely adopted by the community are not explicitly designed for evaluating specific capabilities of RL methods. While there exist environments for assessing particular open problems in RL (such as exploration, transfer learning, unsupervised environment design, or even language-assisted RL), it is generally difficult to extend these to richer, more complex environments once research goes beyond proof-of-concept results. We present MiniHack, a powerful sandbox framework for easily designing novel RL environments. MiniHack is a one-stop shop for RL experiments, with environments ranging from small rooms to complex, procedurally generated worlds. By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use. With this sandbox framework, novel environments can be designed easily, either using a human-readable description language or a simple Python interface. In addition to a variety of RL tasks and baselines, MiniHack can wrap existing RL benchmarks and seamlessly add additional complexity to them.
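
    Following MiniHack's documented Python interface, creating environments in either style looks roughly like the sketch below; exact method names and keyword arguments may differ across versions.

        import gym
        import minihack  # registers the MiniHack-* Gym environment ids
        from minihack import LevelGenerator

        # A prebuilt task:
        env = gym.make("MiniHack-Room-5x5-v0")
        obs = env.reset()

        # A custom environment built from a des-file description:
        lvl = LevelGenerator(w=10, h=10)
        lvl.add_monster(name="goblin")
        lvl.add_goal_pos()
        env = gym.make("MiniHack-Navigation-Custom-v0",
                       des_file=lvl.get_des())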

    Insights from the NeurIPS 2021 NetHack Challenge

    In this report, we summarize the takeaways from the first NetHack Challenge, held at NeurIPS 2021. Participants were tasked with developing a program or agent that can win (i.e., ‘ascend’ in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI, with many diverse approaches significantly beating the previous best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL), symbolic, and hybrid systems, demonstrating that, on NetHack, symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack’s suitability as a long-term benchmark for AI research.
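
    For context, interacting with NLE follows the standard Gym loop; the sketch below assumes the challenge's registered environment id from recent NLE releases, which may differ in older versions.

        import gym
        import nle  # registers the NetHack Gym environments

        env = gym.make("NetHackChallenge-v0")  # the competition setting
        obs = env.reset()
        obs, reward, done, info = env.step(env.action_space.sample())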

    MAVEN: Multi-Agent Variational Exploration

    Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning, due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments [43]. We specifically focus on QMIX [40], the current state of the art in this domain. We show that the representational constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. Furthermore, we propose a novel approach called MAVEN that hybridises value-based and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy. This allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks. Our experimental results show that MAVEN achieves significant performance improvements on the challenging SMAC domain [43].
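
    The core mechanism, per-agent utilities conditioned on a shared latent variable that a hierarchical policy fixes for the whole episode, can be sketched in PyTorch as below. This is an illustrative fragment only: the class name and dimensions are invented for the example, and MAVEN's hierarchical policy over the latent and its mutual-information objective are omitted.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class LatentConditionedQ(nn.Module):
            """Per-agent utility network conditioned on a shared latent z."""
            def __init__(self, obs_dim, n_actions, z_dim):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(obs_dim + z_dim, 64), nn.ReLU(),
                    nn.Linear(64, n_actions),
                )

            def forward(self, obs, z):
                # All agents see the same z for the whole episode, so each
                # value of z indexes a distinct, committed joint behaviour.
                return self.net(torch.cat([obs, z], dim=-1))

        # The hierarchical policy samples z once from the initial state and
        # holds it fixed for the episode (uniform sampling shown for brevity).
        z_dim = 8
        z = F.one_hot(torch.randint(z_dim, (1,)), num_classes=z_dim).float()
        q = LatentConditionedQ(obs_dim=16, n_actions=5, z_dim=z_dim)
        values = q(torch.zeros(1, 16), z)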