78 research outputs found

Point-Based Planning for Multi-Objective POMDPs

Author: Oliehoek Frans A
Roijers Diederik M
Whiteson Shimon
Publication date: 01/01/2015

University of Liverpool Repository

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Severity-sensitive norm-governed multi-agent planning

Author: Gasparini Luca
Kollingbaum Martin J.
Norman Timothy J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

This research was funded by Selex ES. The software developed during this research, including the norm analysis and planning algorithms, the simulator and the harbour protection scenario used during evaluation, is freely available from doi:10.5258/SOTON/D0139. Peer reviewed. Publisher PD

Aberdeen University Research

Southampton (e-Prints Soton)

Decision-Making under Uncertainty: Be Aware of your Priorities

Author: Bencomo Nelly
Samin Huma
Sawyer Pete
Publication venue: Springer
Publication date: 25/01/2022

Self-adaptive systems (SASs) are increasingly leveraging autonomy in their decision-making to manage uncertainty in their operating environments. A key problem with SASs is ensuring their requirements remain satisfied as they adapt. The trade-off analysis of the non-functional requirements (NFRs) is key to establishing a balance among them. Further, when performing the trade-offs it is necessary to know the importance of each NFR to be able to resolve conflicts among them. Such trade-off analyses are often built upon optimisation methods, including decision analysis and utility theory. A problem with these techniques is that they use a single scalar utility value to represent the overall combined priority for all the NFRs. However, this combined scalar priority value may hide information about the impacts of the environmental contexts on the individual NFRs’ priorities, which may change over time. Hence, there is a need for support for runtime, autonomous reasoning about the separate priority values for each NFR, while using the knowledge acquired based on evidence collected. In this paper, we propose Pri-AwaRE, a self-adaptive architecture that makes use of a Multi-Reward Partially Observable Markov Decision Process (MR-POMDP) to perform decision-making for SASs while offering awareness of NFRs’ priorities. MR-POMDP is used as a priority-aware runtime specification model to support runtime reasoning and autonomous tuning of the distinct priority values of NFRs using a vector-valued reward function. We also evaluate the usefulness of our Pri-AwaRE approach by applying it to two substantial example applications from the networking and IoT domains.
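To make the idea of a vector-valued reward with separate per-NFR priorities concrete, here is a minimal Python sketch. It is not the Pri-AwaRE implementation and does not solve an MR-POMDP: it only shows a one-step, greedy choice in which the reward keeps one component per NFR and the priorities are applied at decision time. The NFR names, states, actions, and reward numbers are all hypothetical values chosen for illustration.

```python
# Hypothetical NFRs, states, actions and rewards (illustrative only).
NFRS = ("energy_efficiency", "reliability")
ACTIONS = ("low_power", "redundant")

# Vector-valued reward: one component per NFR for each (state, action) pair.
REWARD = {
    ("stable", "low_power"): (0.9, 0.6),
    ("stable", "redundant"): (0.3, 0.9),
    ("degraded", "low_power"): (0.8, 0.1),
    ("degraded", "redundant"): (0.2, 0.8),
}


def expected_vector_reward(belief, action):
    """Expected reward per NFR under a belief (a state -> probability dict)."""
    return tuple(
        sum(p * REWARD[(state, action)][i] for state, p in belief.items())
        for i in range(len(NFRS))
    )


def choose_action(belief, priorities):
    """Greedy one-step choice: weight each NFR component by its priority."""
    def score(action):
        vector = expected_vector_reward(belief, action)
        return sum(w * r for w, r in zip(priorities, vector))
    return max(ACTIONS, key=score)


belief = {"stable": 0.5, "degraded": 0.5}
print(choose_action(belief, (0.8, 0.2)))  # priorities favour energy efficiency
print(choose_action(belief, (0.2, 0.8)))  # priorities favour reliability
```

Because the reward stays a vector until the last step, changing the priorities changes the chosen action without touching the reward model, which is the kind of separation the abstract argues a single scalar utility would hide.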

Durham Research Online

Aston Publications Explorer

Quality Assessment of MORL Algorithms: A Utility-Based Approach

Author: Beau Philipp
Kanters Timon V
Oliehoek Frans
Roijers Diederik M
Zintgraf Luisa M
Publication date: 19/06/2015

Sequential decision-making problems with multiple objectives occur often in practice. In such settings, the utility of a policy depends on how the user values different trade-offs between the objectives. Such valuations can be expressed by a so-called scalarisation function. However, the exact scalarisation function can be unknown when the agents should learn or plan. Therefore, instead of a single solution, the agents aim to produce a solution set that contains an optimal solution for all possible scalarisations. Because it is often not possible to produce an exact solution set, many algorithms have been proposed that produce approximate solution sets instead. We argue that when comparing these algorithms we should do so on the basis of user utility, and on a wide range of problems. In practice, however, comparisons of the quality of these algorithms have typically been done with only a few limited benchmarks and metrics that do not directly express the utility for the user. In this paper, we propose two metrics that express either the expected utility, or the maximal utility loss with respect to the optimal solution set. Furthermore, we propose a generalised benchmark in order to compare algorithms more reliably.
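As a rough illustration of the two kinds of metric named in this abstract, the following Python sketch estimates an expected utility and a maximal utility loss for an approximate solution set. It makes assumptions the abstract does not state: scalarisation is linear, weights are sampled uniformly from the simplex, and a reference (optimal) solution set is available. The function names and the toy value vectors are hypothetical, and the paper's exact definitions may differ.

```python
import random


def scalarise(value, weights):
    """Linear scalarisation: weighted sum of a value vector (an assumption)."""
    return sum(w * v for w, v in zip(weights, value))


def best_utility(solution_set, weights):
    """Utility of the best solution in the set for one scalarisation."""
    return max(scalarise(value, weights) for value in solution_set)


def sample_weights(n_objectives, rng):
    """Sample weights uniformly from the simplex via normalised exponentials."""
    raw = [rng.expovariate(1.0) for _ in range(n_objectives)]
    total = sum(raw)
    return [r / total for r in raw]


def utility_metrics(approx_set, reference_set, n_samples=1000, seed=0):
    """Monte Carlo estimates of (expected utility, maximal utility loss)."""
    rng = random.Random(seed)
    n_objectives = len(reference_set[0])
    total_utility = 0.0
    max_loss = 0.0
    for _ in range(n_samples):
        weights = sample_weights(n_objectives, rng)
        utility = best_utility(approx_set, weights)
        total_utility += utility
        loss = best_utility(reference_set, weights) - utility
        max_loss = max(max_loss, loss)
    return total_utility / n_samples, max_loss


# Toy example: the approximate set misses the balanced solution (0.6, 0.6).
reference = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
approx = [(1.0, 0.0), (0.0, 1.0)]
print(utility_metrics(approx, reference))
```

Both numbers are sampling estimates, so the maximal loss reported here is a lower bound on the true worst case over all weight vectors.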

University of Liverpool Repository

Abstractions in Reasoning for Long-Term Autonomy

Author: Wray Kyle Hollins
Publication venue: ScholarWorks@UMass Amherst
Publication date: 02/07/2019

The path to building adaptive, robust, intelligent agents has led researchers to develop a suite of powerful models and algorithms for agents with a single objective. However, in recent years, attempts to use this monolithic approach to solve an ever-expanding set of complex real-world problems, which increasingly include long-term autonomous deployments, have illuminated challenges in its ability to scale. Consequently, a fragmented collection of hierarchical and multi-objective models was developed. This trend continues into the algorithms as well, as each approximates an optimal solution in a different manner for scalability. These models and algorithms represent an attempt to solve pieces of an overarching problem: how can an agent explicitly model and integrate the necessary aspects of reasoning required to achieve long-term autonomy? This thesis presents a general hierarchical and multi-objective model called a policy network that unifies prior fragmented solutions into a single graphical decision-making structure. Policy networks are broadly useful to solve numerous real-world problems. This thesis focuses on autonomous vehicle (AV) problems: (1) route-planning with multiple objectives; (2) semi-autonomy with proactive transfer of control; and (3) intersection decision-making for reasoning online about any number of other vehicles and pedestrians. Formal models are presented for each of the distinct problems. Solutions are evaluated using real-world map data in simulation and demonstrated on a fully operational AV prototype driving on real public roads. Policy networks serve as a shared underlying framework for all three, enabling their seamless integration as parts of an overall solution for rich, real-world, scalable decision-making in agents with long-term autonomy.

ScholarWorks@UMass Amherst

A practical guide to multi-objective reinforcement learning and planning

Author: Bargiacchi Eugenio
Dazeley Richard
Hayes Conor
Heintz Frederick
Howley Enda
Irissappane Athirai
Källström Johan
Macfarlane Matthew
Mannion Patrick
Nowé Ann
Ramos Gabriel
Restelli Marcello
Reymond Mathieu
Roijers Diederik
Rădulescu Roxana
Vamplew Peter
Verstraeten Timothy
Zintgraf Luisa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.

Federation ResearchOnline

Techniques for the allocation of resources under uncertainty

Author: Plamondon Pierrick
Publication venue: Bibliothèque de l'Université Laval
Publication date: 01/01/2007

Resource allocation is a ubiquitous problem that arises whenever limited resources have to be distributed among multiple autonomous entities (e.g., people, companies, robots). The standard approaches to determining the optimal resource allocation are computationally prohibitive. The goal of this thesis is to propose computationally efficient algorithms for allocating consumable and non-consumable resources among autonomous agents whose preferences for these resources are induced by a stochastic process. To this end, we have developed new models of planning problems, based on the framework of Markov Decision Processes (MDPs), where the action sets are explicitly parameterized by the available resources. Given these models, we have designed algorithms based on dynamic programming and real-time heuristic search to formulate allocations of resources for agents acting in stochastic environments. In particular, we have used the acyclic property of task creation to decompose the problem of resource allocation. We have also proposed an approximate decomposition strategy, where the agents consider positive and negative interactions as well as simultaneous actions among the agents managing the resources. However, the main contribution of this thesis is the adoption of stochastic real-time heuristic search for resource allocation. To this end, we have developed an approach based on distributed Q-values (Q-decomposition) with tight bounds to drastically reduce the planning time needed to formulate the optimal policy. These tight bounds make it possible to prune the action space for the agents.
We show analytically and empirically that our proposed approaches lead to drastic (in many cases, exponential) improvements in computational efficiency over standard planning methods. Finally, we have tested real-time heuristic search in the SADM simulator, a simulator for resource allocation on a frigate.

CorpusUL

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Author: Bargiacchi Eugenio
Dazeley Richard
Hayes Conor F.
Heintz Fredrik
Howley Enda
Irissappane Athirai A.
Källström Johan
Macfarlane Matthew
Mannion Patrick
Nowé Ann
Ramos Gabriel
Restelli Marcello
Reymond Mathieu
Roijers Diederik M.
Rădulescu Roxana
Vamplew Peter
Verstraeten Timothy
Zintgraf Luisa M.
Publication date: 17/03/2021

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.

arXiv.org e-Print Archive

Publikationer från Linköpings universitet

Deakin Research Online

Federation ResearchOnline

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Making and Keeping Probabilistic Commitments for Trustworthy Multiagent Coordination

Author: Zhang Qi
Publication date: 01/01/2020

In a large number of real world domains, such as the control of autonomous vehicles, team sports, medical diagnosis and treatment, and many others, multiple autonomous agents need to take actions based on local observations, and are interdependent in the sense that they rely on each other to accomplish tasks. Thus, achieving desired outcomes in these domains requires interagent coordination. The form of coordination this thesis focuses on is commitments, where an agent, referred to as the commitment provider, specifies guarantees about its behavior to another, referred to as the commitment recipient, so that the recipient can plan and execute accordingly without taking into account the details of the provider's behavior. This thesis grounds the concept of commitments into decision-theoretic settings where the provider's guarantees might have to be probabilistic when its actions have stochastic outcomes and it expects to reduce its uncertainty about the environment during execution. More concretely, this thesis presents a set of contributions that address three core issues for commitment-based coordination: probabilistic commitment adherence, interpretation, and formulation. The first contribution is a principled semantics for the provider to exercise maximal autonomy that responds to evolving knowledge about the environment without violating its probabilistic commitment, along with a family of algorithms for the provider to construct policies that provably respect the semantics and make explicit tradeoffs between computation cost and plan quality. 
The second contribution consists of theoretical analyses and empirical studies that improve our understanding of the recipient's interpretation of the partial information specified in a probabilistic commitment; the thesis shows that it is inherently easier for the recipient to robustly model a probabilistic commitment where the provider promises to enable preconditions that the recipient requires than where the provider instead promises to avoid changing already-enabled preconditions. The third contribution focuses on the problem of formulating probabilistic commitments for the fully cooperative provider and recipient; the thesis proves structural properties of the agents' values as functions of the parameters of the commitment specification that can be exploited to achieve orders of magnitude less computation for 1) formulating optimal commitments in a centralized manner, and 2) formulating (approximately) optimal queries that induce (approximately) optimal commitments for the decentralized setting in which information relevant to optimization is distributed among the agents.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/162948/1/qizhg_1.pd

Deep Blue Documents at the University of Michigan