
    Interactive Learning and Decision Making: Foundations, Insights & Challenges

    Designing "teams of intelligent agents that successfully coordinate and learn about their complex environments inhabited by other agents (such as humans)" is one of the major goals of AI, and it is the challenge that I aim to address in my research. In this paper I give an overview of some of the foundations, insights and challenges in this field of Interactive Learning and Decision Making.

    Tree-Based Solution Methods for Multiagent POMDPs with Delayed Communication (extended abstract)

    Multiagent Partially Observable Markov Decision Processes (MPOMDPs) provide a powerful framework for optimal decision making under the assumption of instantaneous communication. We focus on a delayed communication setting (MPOMDP-DC), in which broadcasted information is delayed by at most one time step. In this paper, we show that computation of the MPOMDP-DC backup can be structured as a tree and we introduce two novel tree-based pruning techniques that exploit this structure in an effective way.
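
    The flavor of such a tree-structured backup can be illustrated on a flat POMDP: the action sits at the root, each observation opens a branch, and each leaf selects a maximizing value vector; a branch whose optimistic completion can no longer beat the best fully expanded action is pruned. The sketch below is a minimal branch-and-bound illustration of this idea, not the paper's MPOMDP-DC algorithm (which additionally factors the tree over the agents' individual observations); all names, shapes, and the bound are illustrative assumptions.

```python
import numpy as np

GAMMA = 0.95  # illustrative discount factor

def backup(belief, R, T, Z, alphas):
    """Point-based POMDP backup at `belief`, organized as a shallow tree:
    the action at the root, one branch per observation, and a choice of
    alpha-vector at each leaf.

    R: (S, A) rewards; T: (A, S, S) transitions; Z: (A, S', O) observation
    probabilities; alphas: list of length-S value vectors. A branch is
    pruned as soon as an optimistic bound on its completion falls below
    the best fully expanded action.
    """
    S, A = R.shape
    O = Z.shape[2]
    # Loose per-branch upper bound on the discounted continuation value.
    alpha_ub = max(0.0, max(float(np.max(a)) for a in alphas))
    best = -np.inf
    for a in range(A):
        value = float(belief @ R[:, a])
        pruned = False
        for o in range(O):
            # Unnormalized successor belief for action a, observation o.
            b_ao = (belief @ T[a]) * Z[a, :, o]
            value += GAMMA * max(float(b_ao @ alpha) for alpha in alphas)
            # Optimistic completion of the remaining, unexpanded branches.
            if value + GAMMA * (O - o - 1) * alpha_ub < best:
                pruned = True
                break
        if not pruned:
            best = max(best, value)
    return best
```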

    A Cross-Field Review of State Abstraction for Markov Decision Processes

    Complex real-world systems pose a significant challenge to decision making: an agent needs to explore a large environment, deal with incomplete or noisy information, generalize its experience, and learn from feedback to act optimally. These processes demand vast representation capacity, putting a burden on the agent's limited computational and storage resources. State abstraction enables effective solutions by forming concise representations of the agent's world. As such, it has been widely investigated by several research communities, which have produced a variety of different approaches. Nonetheless, the relations among these approaches remain unexamined or only loosely defined. This hampers potential applications of solution methods, whose scope remains limited to the specific abstraction context for which they were designed. To this end, the goal of this paper is to organize the developed approaches and identify connections between abstraction schemes as a fundamental step towards generalizing these methods. As a second contribution, we discuss general abstraction properties with the aim of supporting a unified perspective on state abstraction.
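
    As a concrete point of reference, most schemes in this line of work can be phrased as a map phi from ground states to abstract states together with a weighting that aggregates the ground model. The sketch below shows this standard weighted-aggregation construction from the abstraction literature; it is a textbook construction rather than this paper's contribution, and the array shapes are assumptions.

```python
import numpy as np

def abstract_mdp(T, R, phi, w):
    """Weighted aggregation of a ground MDP into an abstract MDP.

    T: (A, S, S) ground transitions; R: (S, A) ground rewards;
    phi: length-S integer array mapping ground states to abstract states;
    w: length-S weights, assumed normalized within each abstract class.
    Returns abstract transitions (A, S_bar, S_bar) and rewards (S_bar, A).
    """
    A, S, _ = T.shape
    S_bar = int(phi.max()) + 1
    T_bar = np.zeros((A, S_bar, S_bar))
    R_bar = np.zeros((S_bar, A))
    for s in range(S):
        for a in range(A):
            R_bar[phi[s], a] += w[s] * R[s, a]
            for s2 in range(S):
                T_bar[a, phi[s], phi[s2]] += w[s] * T[a, s, s2]
    return T_bar, R_bar
```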

    Alternating Maximization with Behavioral Cloning

    The key difficulty of cooperative, decentralized planning lies in making accurate predictions about the behavior of one's teammates. In this paper we introduce Alternating Maximization with Behavioral Cloning (ABC), a trainable online decentralized planning algorithm based on Monte Carlo Tree Search (MCTS), combined with models of teammates learned from previous episodic runs. Our algorithm relies on the idea of alternating maximization, where agents adapt their models one at a time in a round-robin manner. Under the assumption of perfect policy cloning, and with a sufficient number of Monte Carlo samples, successive iterations of our method are guaranteed to improve joint policies and eventually converge.
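
    The alternating-maximization loop itself is simple to state. Below is a minimal sketch of the round-robin control flow, with the MCTS best-response computation and the behavioral cloning of teammates abstracted behind the hypothetical callables best_response and joint_value; these names are assumptions, not the paper's API.

```python
def alternating_maximization(policies, best_response, joint_value, max_rounds=10):
    """Round-robin alternating maximization over per-agent policies.

    best_response(i, policies) returns a new policy for agent i that
    (approximately) maximizes the joint value while all other policies
    are held fixed; joint_value(policies) evaluates a joint policy.
    Each accepted step can only increase the joint value, so the value
    sequence is monotone and, on a finite policy space, converges.
    """
    value = joint_value(policies)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(policies)):
            candidate = list(policies)
            candidate[i] = best_response(i, policies)
            new_value = joint_value(candidate)
            if new_value > value:
                policies, value = candidate, new_value
                improved = True
        if not improved:
            break   # no agent can improve unilaterally: a local optimum
    return policies, value
```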

    Analog Circuit Design with Dyna-Style Reinforcement Learning


    Environment Shift Games: Are Multiple Agents the Solution, and not the Problem, to Non-Stationarity?

    Machine learning and artificial intelligence models that interact with and in an environment will unavoidably have an impact on that environment and change it. This is often a problem, as many methods do not anticipate such a change in the environment and thus may start acting sub-optimally. Although efforts have been made to deal with this problem, we believe that much potential remains untapped. Driven by the recent success of predictive machine learning, we believe that in many scenarios one can predict when and how a change in the environment will occur. In this paper we introduce a blueprint that intimately connects this idea to the multiagent setting, showing that the multiagent community has a pivotal role to play in addressing the challenging problem of changing environments.

    Safe Multi-agent Learning via Trapping Regions

    One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms: in general, a collection of individual, self-serving agents learning concurrently is not guaranteed to converge to a stable joint policy. This is in stark contrast to most single-agent settings, and it sets a prohibitive barrier for deployment in practical applications, as it induces uncertainty in the long-term behavior of the system. In this work, we apply the concept of trapping regions, known from the qualitative theory of dynamical systems, to create safety sets in the joint strategy space for decentralized learning. We propose a binary partitioning algorithm for verifying that candidate sets form trapping regions in systems with known learning dynamics, and a heuristic sampling algorithm for scenarios where the learning dynamics are not known. We demonstrate applications to a regularized version of the Dirac Generative Adversarial Network, a four-intersection traffic-control scenario run in the state-of-the-art open-source microscopic traffic simulator SUMO, and a mathematical model of economic competition.
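
    For intuition, the heuristic sampling variant reduces to checking that the learning-dynamics vector field points inward everywhere on the boundary of a candidate set. The sketch below does this for an axis-aligned box candidate; the box shape, sample count, and strict-inequality test are simplifying assumptions, and the paper's verification algorithm uses binary partitioning with guaranteed bounds instead of random samples.

```python
import numpy as np

def is_trapping_box(dynamics, lo, hi, n_samples=10_000, seed=0):
    """Heuristically test whether the axis-aligned box [lo, hi] is a
    trapping region for learning dynamics x' = f(x): sample points on the
    boundary faces and require the vector field to point strictly inward.
    A single outward-pointing sample refutes the candidate; passing the
    test is only evidence, not a proof.
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    d = len(lo)
    for _ in range(n_samples):
        x = rng.uniform(lo, hi)
        k = int(rng.integers(d))          # which face (dimension)
        on_lower = bool(rng.integers(2))  # lower or upper face
        x[k] = lo[k] if on_lower else hi[k]
        fx = dynamics(x)
        inward = fx[k] > 0 if on_lower else fx[k] < 0
        if not inward:
            return False
    return True
```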

    Decentralized MCTS via Learned Teammate Models

    Decentralized online planning can be an attractive paradigm for cooperative multi-agent systems, due to improved scalability and robustness. A key difficulty of such an approach lies in making accurate predictions about the decisions of other agents. In this paper, we present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search, combined with models of teammates learned from previous episodic runs. By only allowing one agent to adapt its models at a time, under the assumption of ideal policy approximation, successive iterations of our method are guaranteed to improve joint policies and eventually converge to a Nash equilibrium. We test the efficiency of the algorithm by performing experiments in several scenarios of the spatial task allocation environment introduced in [Claes et al., 2015]. We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators which exploit the spatial features of the problem, and that the proposed algorithm improves over the baseline planning performance for particularly challenging domain configurations.
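
    The teammate models come from behavioral cloning on logged episodes. As a dependency-light stand-in for the convolutional policy approximators used in the paper, the sketch below fits a linear softmax policy to (state, action) pairs by cross-entropy gradient descent; the function name and hyperparameters are illustrative assumptions.

```python
import numpy as np

def clone_policy(states, actions, n_actions, lr=0.1, epochs=200):
    """Behavioral cloning of a teammate from logged (state, action) pairs:
    fit a softmax policy pi(a|s) = softmax(W s + b) by minimizing the
    cross-entropy of the observed actions. A linear model stands in for
    a convolutional policy network.
    """
    X = np.asarray(states, dtype=float)      # (N, d) state features
    y = np.asarray(actions, dtype=int)       # (N,) action indices
    N, d = X.shape
    W = np.zeros((n_actions, d))
    b = np.zeros(n_actions)
    onehot = np.eye(n_actions)[y]
    for _ in range(epochs):
        logits = X @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / N                   # dLoss/dlogits
        W -= lr * (grad.T @ X)
        b -= lr * grad.sum(axis=0)
    return lambda s: int(np.argmax(W @ s + b))        # greedy cloned policy
```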

    General-Sum Multi-Agent Continuous Inverse Optimal Control

    Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling that behavior with Markov Decision Processes (MDPs). However, learning from data the rewards that determine the observed actions is complicated by the interactions between agents. We present a novel inverse reinforcement learning (IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may be boundedly rational (i.e., act sub-optimally), a characteristic that is typical of human decision making. Additionally, every agent optimizes its own reward function, which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning during inference of the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground-truth reward function in two-agent interactive experiments.
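
    A minimal sketch of the core inference step, under simplifying assumptions: each agent's reward is linear in trajectory features, and the demonstrated trajectory is modeled as Boltzmann-rational (boundedly rational) against a fixed set of sampled alternatives, so the likelihood gradient reduces to a feature-matching term and no reinforcement-learning inner loop is required. This is a generic maximum-entropy-style stand-in, not the paper's exact estimator.

```python
import numpy as np

def fit_reward(demo_feats, alt_feats, lr=0.05, steps=500):
    """Infer one agent's reward parameters without an inner RL loop.

    The reward is linear in trajectory features, r(tau) = theta @ f(tau),
    and the demonstration (demo_feats, shape (d,)) is modeled as
    Boltzmann-rational against K sampled alternative trajectories
    (alt_feats, shape (K, d)). The log-likelihood gradient is then a
    feature-matching term: f(demo) - E_p[f].
    """
    demo_feats = np.asarray(demo_feats, float)
    feats = np.vstack([demo_feats, np.asarray(alt_feats, float)])  # row 0 = demo
    theta = np.zeros(feats.shape[1])
    for _ in range(steps):
        logits = feats @ theta
        logits -= logits.max()                 # numerical stability
        p = np.exp(logits)
        p /= p.sum()
        theta += lr * (feats[0] - p @ feats)   # ascend the log-likelihood
    return theta
```

    In the general-sum, multi-agent setting, each agent's parameters would be fitted this way with the other agents' observed trajectories held fixed, since every agent optimizes its own reward.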

    Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

    Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that, when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We discuss ways in which the machine can learn to improve upon its own limitations as well, with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure.
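
    The Bayesian best-response structure can be illustrated with a discrete set of hypothesized human types. The sketch below is a minimal, myopic stand-in: the type models, their likelihood and q_value interfaces, and the one-step lookahead are all assumptions, whereas the paper plans over the full Bayes-adaptive POMDP rather than greedily.

```python
import numpy as np

def update_posterior(posterior, types, human_action, state):
    """Bayes update of the AI's belief over hypothesized human types.
    Each type exposes likelihood(action, state) = P(action | state, type);
    this posterior is the extra state that the Bayes-adaptive POMDP
    formulation folds into planning.
    """
    lik = np.array([t.likelihood(human_action, state) for t in types])
    posterior = np.asarray(posterior) * lik
    return posterior / posterior.sum()

def best_response_action(posterior, types, state, ai_actions):
    """Myopic best response: pick the AI action with the highest expected
    value under the type posterior, each type supplying q_value(state, a).
    """
    scores = [sum(p * t.q_value(state, a) for p, t in zip(posterior, types))
              for a in ai_actions]
    return ai_actions[int(np.argmax(scores))]
```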