The MADP Toolbox: An Open-Source Library for Planning and Learning in (Multi-)Agent Systems
This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Some of its key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent systems; provides a large number of models for decision-theoretic decision making, including one-shot decision making (e.g., Bayesian games) and sequential decision making under various assumptions of observability and cooperation, such as Dec-POMDPs and POSGs; provides tools and parsers to quickly prototype new problems; provides an extensive range of planning and learning algorithms for single- and multiagent systems; and is written in C++ and designed to be extensible via the object-oriented paradigm.
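To make the kind of model such a library must represent concrete, the following is a minimal Python sketch of the Dec-POMDP tuple (agents, states, joint actions and observations, transition, observation, and reward functions). It is purely illustrative, deliberately not the MADP toolbox's own C++ class hierarchy, and all names in it are hypothetical.

    # Hypothetical sketch of the Dec-POMDP tuple <n, S, {A_i}, T, R, {O_i}, O>;
    # illustrative only -- the actual MADP toolbox exposes a C++ class hierarchy.
    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class DecPOMDP:
        n_agents: int
        states: List[str]
        joint_actions: List[Tuple[str, ...]]         # one action per agent
        joint_observations: List[Tuple[str, ...]]    # one observation per agent
        transition: Dict[Tuple[str, Tuple[str, ...]], Dict[str, float]]              # P(s' | s, a)
        observation: Dict[Tuple[Tuple[str, ...], str], Dict[Tuple[str, ...], float]] # P(o | a, s')
        reward: Callable[[str, Tuple[str, ...]], float]                              # shared team reward R(s, a)
        horizon: int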
Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning
Swarm systems constitute a challenging problem for reinforcement learning
(RL) as the algorithm needs to learn decentralized control policies that can
cope with limited local sensing and communication abilities of the agents.
While it is often difficult to directly define the behavior of the agents,
simple communication protocols can be defined more easily using prior knowledge
about the given task. In this paper, we propose a number of simple
communication protocols that can be exploited by deep reinforcement learning to
find decentralized control policies in a multi-robot swarm environment. The
protocols are based on histograms that encode the local neighborhood relations
of the agents and can also transmit task-specific information, such as the
shortest distance and direction to a desired target. In our framework, we use
an adaptation of Trust Region Policy Optimization to learn complex
collaborative tasks, such as formation building and building a communication
link. We evaluate our findings in a simulated 2D-physics environment, and
compare the implications of different communication protocols.
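The histogram-based protocols lend themselves to a compact observation function. Below is a minimal sketch, assuming 2D agent positions, a fixed communication radius, and a fixed number of distance bins; the function name and parameters are hypothetical, and this is not the authors' implementation.

    import numpy as np

    def neighborhood_histogram(positions, agent_idx, comm_radius=1.0, n_bins=8):
        """Histogram of distances to neighbors within the communication radius.

        A crude stand-in for the histogram-based protocols described above:
        each agent only sees an aggregate (binned) view of its neighborhood,
        so the observation size is independent of the number of agents.
        """
        me = positions[agent_idx]
        others = np.delete(positions, agent_idx, axis=0)
        dists = np.linalg.norm(others - me, axis=1)
        dists = dists[dists <= comm_radius]           # limited local sensing
        hist, _ = np.histogram(dists, bins=n_bins, range=(0.0, comm_radius))
        return hist / max(len(dists), 1)              # normalize for scale invariance

    # Example: 10 agents at random 2D positions, observation for agent 0.
    obs = neighborhood_histogram(np.random.rand(10, 2), agent_idx=0)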
LiftUpp: Support to Develop Learner Performance
Various motivations exist to move away from the simple assessment of
knowledge towards the more complex assessment and development of competence.
However, to accommodate such a change, high demands are put on the supporting
e-infrastructure in terms of intelligently collecting and analysing data. In
this paper, we discuss these challenges and how they are being addressed by
LiftUpp, a system that is now used in 70% of UK dental schools, and is finding
wider applications in physiotherapy, medicine and veterinary science. We
describe how data is collected for workplace-based development in dentistry
using a dedicated iPad app, which enables an integrated approach to linking and
assessing work flows, skills and learning outcomes. Furthermore, we detail how
the various forms of collected data can be fused, visualized and integrated
with conventional forms of assessment. This enables curriculum integration,
improved real-time student feedback, support for administration, and informed
instructional planning. Together these facets contribute to better support for
the development of learners' competence in situated learning settings, as well
as an improved experience. Finally, we discuss several directions for future
research on intelligent teaching systems that are afforded by using the design
present within LiftUpp.
PAC greedy maximization with efficient bounds on information gain for sensor selection
Submodular function maximization finds application in a variety of real-world decision-making problems. However, most existing methods, based on greedy maximization, assume it is computationally feasible to evaluate F, the function being maximized. Unfortunately, in many realistic settings F is too expensive to evaluate exactly even once. We present probably approximately correct greedy maximization, which requires access only to cheap anytime confidence bounds on F and uses them to prune elements. We show that, with high probability, our method returns an approximately optimal set. We also propose novel, cheap confidence bounds for conditional entropy, which appears in many common choices of F and for which it is difficult to find unbiased or bounded estimates. Finally, results on a real-world dataset from a multicamera tracking system in a shopping mall demonstrate that our approach performs comparably to existing methods, but at a fraction of the computational cost.
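The pruning idea can be sketched in a few lines: keep only candidates whose optimistic (upper) bound could still beat the best pessimistic (lower) bound, tightening bounds as needed. This is an illustrative simplification under assumed callables lower_bound, upper_bound, and refine, not the paper's exact algorithm or its PAC guarantee.

    def pac_greedy_step(candidates, selected, lower_bound, upper_bound, refine, max_rounds=50):
        """One greedy step using confidence bounds on marginal gain instead of exact F.

        lower_bound(e, selected) / upper_bound(e, selected) are cheap anytime bounds
        on the gain of adding element e; refine(e, selected) tightens them.
        Hypothetical sketch only.
        """
        active = set(candidates)
        for _ in range(max_rounds):
            if len(active) <= 1:
                break
            best_lb = max(lower_bound(e, selected) for e in active)
            # Prune elements whose optimistic value cannot beat the best pessimistic one.
            active = {e for e in active if upper_bound(e, selected) >= best_lb}
            for e in active:
                refine(e, selected)   # spend a little more compute tightening bounds
        # Fall back to the best lower bound if several candidates remain.
        return max(active, key=lambda e: lower_bound(e, selected))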
MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning
Inferring reward functions from demonstrations and pairwise preferences are auspicious approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, thus rendering it difficult to trade off different reward functions from multiple experts. We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms into a Pareto-optimal policy. Through maintaining a distribution over scalarization weights, our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies. We empirically demonstrate the effectiveness of MORAL in two scenarios, which model a delivery and an emergency task that require an agent to act in the presence of normative conflicts. Overall, we consider our research a step towards multi-objective RL with learned rewards, bridging the gap between current reward learning and machine ethics literature.
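The core mechanism, trading off several learned reward functions through a scalarization weight vector, can be illustrated with a small sketch; the reward models and weights below are toy stand-ins, not MORAL's learned components.

    import numpy as np

    def scalarized_reward(reward_models, weights, state, action):
        """Combine several reward functions with a scalarization weight vector.

        reward_models: list of callables r_i(state, action); weights: simplex vector.
        A minimal stand-in for trading off multiple experts' reward functions,
        not the authors' implementation.
        """
        rewards = np.array([r(state, action) for r in reward_models])
        return float(np.dot(weights, rewards))

    # Example with two toy reward functions and equal weights.
    r_delivery = lambda s, a: float(a == "deliver")
    r_norm = lambda s, a: -1.0 if a == "trespass" else 0.0
    print(scalarized_reward([r_delivery, r_norm], np.array([0.5, 0.5]), None, "deliver"))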
Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning
Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which they fail. In this work, we empirically investigate the learning power of various network architectures on a series of one-shot games. Despite their simplicity, these games capture many of the crucial problems that arise in the multi-agent setting, such as an exponential number of joint actions or the lack of an explicit coordination mechanism. Our results extend those in Castellini et al. (Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS’19, International Foundation for Autonomous Agents and Multiagent Systems, pp 1862–1864, 2019) and quantify how well various approaches can represent the requisite value functions, and help us identify the reasons that can impede good performance, such as sparsity of the values or overly tight coordination requirements.
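One of the simplest factorizations examined in this line of work approximates the joint action value by a sum of per-agent utilities. The sketch below fits such an additive factorization to a random one-shot payoff matrix and reports its representation error; it is a toy illustration of the representational limits discussed above, not the paper's experimental setup.

    import numpy as np

    # One-shot 2-agent game: tabular "factored" Q as a sum of per-agent utilities,
    # Q(a1, a2) ~= u1[a1] + u2[a2]. Such an additive factorization cannot represent
    # arbitrary payoff matrices (e.g. tight coordination games). Illustrative only.
    n_actions = 3
    payoff = np.random.randn(n_actions, n_actions)   # true joint action values

    u1 = np.zeros(n_actions)
    u2 = np.zeros(n_actions)
    lr = 0.1
    for _ in range(2000):                            # fit by simple stochastic gradient descent
        a1, a2 = np.random.randint(n_actions, size=2)
        err = (u1[a1] + u2[a2]) - payoff[a1, a2]
        u1[a1] -= lr * err
        u2[a2] -= lr * err

    approx = u1[:, None] + u2[None, :]
    print("max abs error of additive factorization:", np.abs(approx - payoff).max())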
Exploiting submodular value functions for scaling up active perception
In active perception tasks, an agent aims to select sensory actions that reduce its uncertainty about one or more hidden variables. For example, a mobile robot takes sensory actions to efficiently navigate in a new environment. While partially observable Markov decision processes (POMDPs) provide a natural model for such problems, reward functions that directly penalize uncertainty in the agent’s belief can remove the piecewise-linear and convex (PWLC) property of the value function required by most POMDP planners. Furthermore, as the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially with it, making POMDP planning infeasible with traditional methods. In this article, we address the twofold challenge of modeling and planning for active perception tasks. We analyze ρPOMDP and POMDP-IR, two frameworks for modeling active perception tasks that restore the PWLC property of the value function. We show the mathematical equivalence of these two frameworks by showing that, given a ρPOMDP along with a policy, they can be reduced to a POMDP-IR and an equivalent policy (and vice versa). We prove that the value function for the given ρPOMDP (and the given policy) and the reduced POMDP-IR (and the reduced policy) is the same. To efficiently plan for active perception tasks, we identify and exploit the independence properties of POMDP-IR to reduce the computational cost of solving POMDP-IR (and ρPOMDP). We propose greedy point-based value iteration (PBVI), a new POMDP planning method that uses greedy maximization to greatly improve scalability in the action space of an active perception POMDP. Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function. We establish the conditions under which the value function of an active perception POMDP is guaranteed to be submodular. Finally, we present a detailed empirical analysis on a dataset collected from a multicamera tracking system employed in a shopping mall. Our method achieves similar performance to existing methods but at a fraction of the computational cost, leading to better scalability for solving active perception tasks.
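The greedy maximization underlying greedy PBVI can be illustrated independently of the POMDP machinery: sensors are selected one at a time by largest marginal gain of a set function. The sketch below assumes a hypothetical gain callable and is not the article's planner; it only shows the greedy step to which the submodularity-based error bound applies.

    def greedy_select(sensors, gain, budget):
        """Greedy maximization of a set function (e.g. expected information gain).

        gain(subset) returns the value of activating that set of sensors; if gain is
        monotone and submodular, the greedy result is within (1 - 1/e) of optimal,
        which is the kind of guarantee greedy PBVI builds on. Sketch only.
        """
        selected = []
        for _ in range(budget):
            remaining = [s for s in sensors if s not in selected]
            if not remaining:
                break
            best = max(remaining, key=lambda s: gain(selected + [s]) - gain(selected))
            selected.append(best)
        return selected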
Maximizing information gain in partially observable environments via prediction rewards
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of the state-action pairs and not as a function of the belief of the agent; this hinders the direct application of deep RL methods for such tasks. This paper tackles the challenge of using belief-based rewards for a deep RL agent, by offering a simple insight that maximizing any convex function of the belief of the agent can be approximated by instead maximizing a prediction reward: a reward based on prediction accuracy. In particular, we derive the exact error between negative entropy and the expected prediction reward. This insight provides theoretical motivation for several fields using prediction rewards, namely visual attention, question answering systems, and intrinsic motivation, and highlights their connection to the usually distinct fields of active perception, active sensing, and sensor placement. Based on this insight we present deep anticipatory networks (DANs), which enable an agent to take actions to reduce its uncertainty without performing explicit belief inference. We present two applications of DANs: building a sensor selection system for tracking people in a shopping mall and learning discrete models of attention on fashion MNIST and MNIST digit classification.
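The central insight, that a belief-based objective such as negative entropy can be replaced by an expected prediction reward, can be checked on toy beliefs: predicting the most likely value of the hidden variable and receiving reward 1 when correct gives an expected reward of max_x b(x). The sketch below compares the two quantities on a uniform and a peaked belief; it is illustrative only and not the DAN architecture.

    import numpy as np

    def negative_entropy(belief):
        # Negative Shannon entropy of a discrete belief: sum_x b(x) log b(x).
        b = np.clip(belief, 1e-12, 1.0)
        return float(np.sum(b * np.log(b)))

    def expected_prediction_reward(belief):
        # Predicting the most likely hidden value and getting reward 1 when correct
        # yields expected reward max_x b(x), a belief-based stand-in for negative entropy.
        return float(np.max(belief))

    # Both objectives prefer the peaked belief over the uniform one.
    for b in (np.array([0.25, 0.25, 0.25, 0.25]), np.array([0.85, 0.05, 0.05, 0.05])):
        print(b, negative_entropy(b), expected_prediction_reward(b))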
An interactive, web-based tool for genealogical entity resolution
We demonstrate an interactive, web-based tool which helps historians to do Genealogical Entity Resolution. This work has two main goals. First, it uses Machine Learning (ML) algorithms to assist humanities researchers in performing Genealogical Entity Resolution. Second, it facilitates the generation of benchmark data for computer scientists to improve available ML-based Entity Resolution techniques.