9 research outputs found
Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making
In multi-objective decision planning and learning, much attention is paid to
producing optimal solution sets that contain an optimal policy for every
possible user preference profile. We argue that the step that follows, i.e.,
determining which policy to execute by maximising the user's intrinsic utility
function over this (possibly infinite) set, is under-studied. This paper aims
to fill this gap. We build on previous work on Gaussian processes and pairwise
comparisons for preference modelling, extend it to the multi-objective decision
support scenario, and propose new ordered preference elicitation strategies
based on ranking and clustering. Our main contribution is an in-depth
evaluation of these strategies using computer and human-based experiments. We
show that our proposed elicitation strategies outperform the currently used
pairwise methods, and find that users prefer ranking most. Our experiments
further show that utilising monotonicity information in GPs by using a linear
prior mean at the start and virtual comparisons to the nadir and ideal points,
increases performance. We demonstrate our decision support framework in a
real-world study on traffic regulation, conducted with the city of Amsterdam.
Comment: AAMAS 2018, Source code at
https://github.com/lmzintgraf/gp_pref_elici
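The abstract mentions virtual comparisons to the nadir and ideal points as a way of injecting monotonicity information. A minimal sketch of what generating such comparisons could look like follows, assuming all objectives are maximised and that the ideal and nadir points are taken component-wise over the candidate set. The function names and the `(preferred, less_preferred)` encoding are illustrative assumptions, not taken from the paper's source code.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]
Comparison = Tuple[Vector, Vector]  # (preferred, less preferred); hypothetical encoding


def ideal_and_nadir(values: List[Vector]) -> Tuple[Vector, Vector]:
    """Component-wise best and worst values over the candidates (maximisation assumed)."""
    ideal = tuple(max(column) for column in zip(*values))
    nadir = tuple(min(column) for column in zip(*values))
    return ideal, nadir


def virtual_comparisons(values: List[Vector]) -> List[Comparison]:
    """Build comparisons that any monotonically increasing utility must satisfy.

    Every candidate is at most as good as the ideal point and at least as good
    as the nadir point, so these pairs can be added without asking the user.
    """
    ideal, nadir = ideal_and_nadir(values)
    comparisons: List[Comparison] = []
    for v in values:
        if v != ideal:
            comparisons.append((ideal, v))
        if v != nadir:
            comparisons.append((v, nadir))
    return comparisons


candidates = [(1.0, 4.0), (2.0, 3.0), (3.0, 1.0)]
extra_data = virtual_comparisons(candidates)
```

In a pairwise preference model these pairs would simply be appended to the comparisons obtained from the user before fitting.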
Learning Inconsistent Preferences with Kernel Methods
We propose a probabilistic kernel approach for preferential learning from
pairwise duelling data using Gaussian Processes. Different from previous
methods, we do not impose a total order on the item space, hence can capture
more expressive latent preferential structures such as inconsistent preferences
and clusters of comparable items. Furthermore, we prove the universality of the
proposed kernels, i.e. that the corresponding reproducing kernel Hilbert Space
(RKHS) is dense in the space of skew-symmetric preference functions. To
conclude the paper, we provide an extensive set of numerical experiments on
simulated and real-world datasets showcasing the competitiveness of our
proposed method with state-of-the-art
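To illustrate what a skew-symmetric preference function is, the sketch below antisymmetrises a product of base kernels over pairs of items. This is one standard construction; whether it matches the kernels proposed in the paper exactly is an assumption, and the names `rbf`, `pref_kernel` and `preference_function` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Item = Tuple[float, ...]
Pair = Tuple[Item, Item]


def rbf(x: Item, y: Item, lengthscale: float = 1.0) -> float:
    """Gaussian (RBF) base kernel on single items."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def pref_kernel(pair_a: Pair, pair_b: Pair) -> float:
    """Kernel on pairs that changes sign when the items of one pair are swapped."""
    (u, u2), (v, v2) = pair_a, pair_b
    return rbf(u, v) * rbf(u2, v2) - rbf(u, v2) * rbf(u2, v)


def preference_function(
    train_pairs: Sequence[Pair], alphas: Sequence[float]
) -> Callable[[Item, Item], float]:
    """Kernel expansion g(u, u') = sum_i alpha_i * k((u, u'), pair_i)."""
    def g(u: Item, u2: Item) -> float:
        return sum(a * pref_kernel((u, u2), p) for a, p in zip(alphas, train_pairs))
    return g


# Illustrative coefficients; in practice they would come from fitting duel data.
train: List[Pair] = [((0.0,), (1.0,)), ((1.0,), (2.0,))]
g = preference_function(train, [1.0, 0.5])
```

Because g(u, u') = -g(u', u) holds by construction, no latent per-item score (and hence no total order) is required, which is what leaves room for cyclic or otherwise inconsistent preferences.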
Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games
Many real-world multi-agent interactions consider multiple distinct criteria,
i.e. the payoffs are multi-objective in nature. However, the same
multi-objective payoff vector may lead to different utilities for each
participant. Therefore, it is essential for an agent to learn about the
behaviour of other agents in the system. In this work, we present the first
study of the effects of such opponent modelling on multi-objective multi-agent
interactions with non-linear utilities. Specifically, we consider two-player
multi-objective normal form games with non-linear utility functions under the
scalarised expected returns optimisation criterion. We contribute novel
actor-critic and policy gradient formulations to allow reinforcement learning
of mixed strategies in this setting, along with extensions that incorporate
opponent policy reconstruction and learning with opponent learning awareness
(i.e., learning while considering the impact of one's policy when anticipating
the opponent's learning step). Empirical results in five different MONFGs
demonstrate that opponent learning awareness and modelling can drastically
alter the learning dynamics in this setting. When equilibria are present,
opponent modelling can confer significant benefits on agents that implement it.
When there are no Nash equilibria, opponent learning awareness and modelling
allow agents to still converge to meaningful solutions that approximate
equilibria.
Comment: Under review since 14 November 202
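The scalarised expected returns (SER) criterion applies the utility function to the expected payoff vector. A minimal sketch for a two-player game with mixed strategies follows. The payoff table and the product utility are made-up illustrations, and the expected scalarised returns (ESR) function is included only as a contrast, not because the abstract discusses it.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]
PayoffTable = Sequence[Sequence[Vector]]  # payoffs[i][j] is one agent's vector payoff


def expected_payoff(payoffs: PayoffTable, pi_row: Sequence[float],
                    pi_col: Sequence[float]) -> Vector:
    """Expected vector payoff under independent mixed strategies."""
    n_objectives = len(payoffs[0][0])
    total = [0.0] * n_objectives
    for i, p_i in enumerate(pi_row):
        for j, p_j in enumerate(pi_col):
            for k in range(n_objectives):
                total[k] += p_i * p_j * payoffs[i][j][k]
    return tuple(total)


def ser(utility: Callable[[Vector], float], payoffs: PayoffTable,
        pi_row: Sequence[float], pi_col: Sequence[float]) -> float:
    """SER: utility of the expected payoff vector."""
    return utility(expected_payoff(payoffs, pi_row, pi_col))


def esr(utility: Callable[[Vector], float], payoffs: PayoffTable,
        pi_row: Sequence[float], pi_col: Sequence[float]) -> float:
    """ESR (for contrast): expectation of the utility of each joint outcome."""
    return sum(p_i * p_j * utility(payoffs[i][j])
               for i, p_i in enumerate(pi_row)
               for j, p_j in enumerate(pi_col))


# Hypothetical 2x2 game and non-linear utility u(p) = p1 * p2.
payoffs = [[(4.0, 0.0), (2.0, 2.0)],
           [(2.0, 2.0), (0.0, 4.0)]]
product_utility = lambda p: p[0] * p[1]
uniform = [0.5, 0.5]

ser_value = ser(product_utility, payoffs, uniform, uniform)  # u((2, 2)) = 4.0
esr_value = esr(product_utility, payoffs, uniform, uniform)  # 0.25 * (0 + 4 + 4 + 0) = 2.0
```

With a linear utility the two values coincide; with a non-linear one they generally differ, which is why the choice of optimisation criterion matters in this setting.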
Small Approximate Pareto Sets with Quality Bounds
We present and empirically characterize a general, parallel, heuristic algorithm for computing small ε-Pareto sets. The algorithm can be used as part of a decision support tool for settings in which computing points in objective space is computationally expensive. We use the graph clearing problem, a formalization of indirect organ exchange markets, as a prototypical example setting. We characterize the performance of the algorithm through ε-Pareto set size, ε value provided, and parallel speedup achieved. Our results show that the algorithm's combination of parallel speedup and small ε-Pareto sets is sufficient to be appealing in settings requiring manual review (i.e., those that have a human in the loop) and real-time solutions
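For readers unfamiliar with ε-Pareto sets, the sketch below checks the usual multiplicative covering condition and builds a small cover with a simple sequential greedy filter. It assumes maximisation, non-negative objective values, and a finite list of already-computed points; it is not the parallel heuristic described in the abstract.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def eps_dominates(s: Point, p: Point, eps: float) -> bool:
    """True if s is within a factor (1 + eps) of p in every objective."""
    return all((1.0 + eps) * s_k >= p_k for s_k, p_k in zip(s, p))


def is_eps_pareto_set(subset: Sequence[Point], points: Sequence[Point],
                      eps: float) -> bool:
    """Every point must be eps-dominated by at least one member of subset."""
    return all(any(eps_dominates(s, p, eps) for s in subset) for p in points)


def greedy_eps_pareto_set(points: Sequence[Point], eps: float) -> List[Point]:
    """Keep a point only if no previously kept point already eps-dominates it."""
    kept: List[Point] = []
    # Visiting high-sum points first tends to give smaller covers (a heuristic).
    for p in sorted(points, key=sum, reverse=True):
        if not any(eps_dominates(s, p, eps) for s in kept):
            kept.append(p)
    return kept


points = [(10.0, 1.0), (9.5, 1.0), (1.0, 10.0), (5.0, 5.0), (5.2, 5.1)]
small_set = greedy_eps_pareto_set(points, eps=0.1)
```

Every input point is either kept or ε-dominated by a kept point, so the result always satisfies the covering condition; a larger ε trades approximation quality for a smaller set to review.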
A practical guide to multi-objective reinforcement learning and planning
Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s)
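The claim that a simple linear combination may oversimplify the problem can be made concrete with a small example. The three candidate value vectors below are made up for illustration: the balanced option is Pareto optimal but lies in a concave region of the front, so no linear weighting selects it, whereas a non-linear utility (here, the worst objective value) does.

```python
from typing import Callable, List, Tuple

Vector = Tuple[float, float]


def best_under(utility: Callable[[Vector], float], candidates: List[Vector]) -> Vector:
    """Candidate that maximises the given utility function."""
    return max(candidates, key=utility)


def linear(w: float) -> Callable[[Vector], float]:
    """Linear scalarisation with weight w on the first objective."""
    return lambda v: w * v[0] + (1.0 - w) * v[1]


# Hypothetical policy values; all three are Pareto optimal.
candidates = [(1.0, 0.0), (0.0, 1.0), (0.45, 0.45)]
balanced = (0.45, 0.45)

# Sweep the weight over [0, 1]: the balanced option scores 0.45, while one of
# the extreme options always scores at least 0.5.
linear_choices = {best_under(linear(w / 100), candidates) for w in range(101)}

# A user who cares about the worst objective prefers the balanced option.
nonlinear_choice = best_under(min, candidates)
```

Here `linear_choices` contains only the two extreme options, while `nonlinear_choice` is the balanced one, which is the kind of outcome a purely linear treatment cannot recover.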