9 research outputs found

    Ordered Preference Elicitation Strategies for Supporting Multi-Objective Decision Making

    In multi-objective decision planning and learning, much attention is paid to producing optimal solution sets that contain an optimal policy for every possible user preference profile. We argue that the step that follows, i.e., determining which policy to execute by maximising the user's intrinsic utility function over this (possibly infinite) set, is under-studied. This paper aims to fill this gap. We build on previous work on Gaussian processes and pairwise comparisons for preference modelling, extend it to the multi-objective decision support scenario, and propose new ordered preference elicitation strategies based on ranking and clustering. Our main contribution is an in-depth evaluation of these strategies using computer and human-based experiments. We show that our proposed elicitation strategies outperform the currently used pairwise methods, and find that users prefer ranking most. Our experiments further show that utilising monotonicity information in GPs, by using a linear prior mean at the start and virtual comparisons to the nadir and ideal points, increases performance. We demonstrate our decision support framework in a real-world study on traffic regulation, conducted with the city of Amsterdam. Comment: AAMAS 2018, Source code at https://github.com/lmzintgraf/gp_pref_elici
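    The idea of virtual comparisons to the nadir and ideal points can be illustrated with a minimal sketch. It assumes all objectives are maximised, that the candidate value vectors are mutually non-dominated (so the componentwise minimum serves as the nadir), and that monotonicity means the ideal point is preferred to every candidate while every candidate is preferred to the nadir. The function names are hypothetical and are not taken from the paper's source code.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Comparison = Tuple[Vector, Vector]  # (preferred, less preferred)


def ideal_and_nadir(values: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Componentwise best and worst values over the candidate set (maximisation)."""
    ideal = tuple(max(column) for column in zip(*values))
    nadir = tuple(min(column) for column in zip(*values))
    return ideal, nadir


def virtual_comparisons(values: Sequence[Vector]) -> List[Comparison]:
    """Build comparisons that encode monotonicity without asking the user.

    Assumption: the ideal point beats every candidate and every candidate
    beats the nadir point.
    """
    ideal, nadir = ideal_and_nadir(values)
    comparisons: List[Comparison] = []
    for v in values:
        if v != ideal:
            comparisons.append((ideal, v))
        if v != nadir:
            comparisons.append((v, nadir))
    return comparisons


if __name__ == "__main__":
    candidates = [(1.0, 4.0), (3.0, 2.0), (2.0, 3.0)]
    for preferred, other in virtual_comparisons(candidates):
        print(preferred, "is preferred to", other)
```

    Such pairs could be added to the user's real comparisons before fitting a preference model, so that the model starts out consistent with "more of every objective is better".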

    Learning Inconsistent Preferences with Kernel Methods

    We propose a probabilistic kernel approach for preferential learning from pairwise duelling data using Gaussian Processes. Unlike previous methods, we do not impose a total order on the item space, and hence can capture more expressive latent preferential structures such as inconsistent preferences and clusters of comparable items. Furthermore, we prove the universality of the proposed kernels, i.e. that the corresponding reproducing kernel Hilbert space (RKHS) is dense in the space of skew-symmetric preference functions. To conclude the paper, we provide an extensive set of numerical experiments on simulated and real-world datasets showcasing the competitiveness of our proposed method with state-of-the-art
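    To make "skew-symmetric preference functions" concrete, the sketch below builds a kernel on item pairs by antisymmetrising a product of base kernels. This is one standard construction and is an assumption here; the abstract does not state the exact form of the proposed kernels. Any function in the span of such a kernel satisfies f(u, u') = -f(u', u) without being forced into the form g(u) - g(u'), which is what would impose a total order. The names `rbf` and `skew_kernel` are illustrative.

```python
import math
from typing import Callable, Sequence, Tuple

Item = Sequence[float]
Pair = Tuple[Item, Item]


def rbf(x: Item, y: Item, lengthscale: float = 1.0) -> float:
    """Gaussian (RBF) base kernel on individual items."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))


def skew_kernel(pair_a: Pair, pair_b: Pair,
                base: Callable[[Item, Item], float] = rbf) -> float:
    """Antisymmetrised product kernel on ordered pairs of items.

    Swapping the two items inside either pair flips the sign of the value.
    """
    (u, u2), (v, v2) = pair_a, pair_b
    return base(u, v) * base(u2, v2) - base(u, v2) * base(u2, v)


if __name__ == "__main__":
    a, b = (0.0, 1.0), (1.0, 0.5)
    c, d = (0.2, 0.8), (0.9, 0.1)
    print(skew_kernel((a, b), (c, d)))  # some value k
    print(skew_kernel((b, a), (c, d)))  # the negated value -k
```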

    Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games

    Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When there are no Nash equilibria, opponent learning awareness and modelling allow agents to still converge to meaningful solutions that approximate equilibria. Comment: Under review since 14 November 202
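    The scalarised expected returns (SER) criterion applies the utility function to the expected payoff vector, whereas the alternative, expected scalarised returns (ESR), takes the expectation of the utility of each outcome. With a non-linear utility the two can differ. The sketch below illustrates this on a made-up 2x2 game with two objectives and the utility u(x, y) = x * y; the payoff table, the utility and the function names are all assumptions chosen for illustration and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

Payoff = Tuple[float, float]


def expected_payoff(payoffs: List[List[Payoff]],
                    row_strategy: Sequence[float],
                    col_strategy: Sequence[float]) -> Payoff:
    """Expected payoff vector under two independent mixed strategies."""
    ex, ey = 0.0, 0.0
    for i, p_row in enumerate(row_strategy):
        for j, p_col in enumerate(col_strategy):
            x, y = payoffs[i][j]
            ex += p_row * p_col * x
            ey += p_row * p_col * y
    return ex, ey


def ser(payoffs, row_strategy, col_strategy,
        utility: Callable[[Payoff], float]) -> float:
    """Scalarised expected returns: utility of the expected vector."""
    return utility(expected_payoff(payoffs, row_strategy, col_strategy))


def esr(payoffs, row_strategy, col_strategy,
        utility: Callable[[Payoff], float]) -> float:
    """Expected scalarised returns: expectation of per-outcome utility."""
    total = 0.0
    for i, p_row in enumerate(row_strategy):
        for j, p_col in enumerate(col_strategy):
            total += p_row * p_col * utility(payoffs[i][j])
    return total


if __name__ == "__main__":
    game = [[(4.0, 0.0), (2.0, 2.0)],
            [(2.0, 2.0), (0.0, 4.0)]]
    product_utility = lambda v: v[0] * v[1]
    uniform = [0.5, 0.5]
    print("SER:", ser(game, uniform, uniform, product_utility))
    print("ESR:", esr(game, uniform, uniform, product_utility))
```

    Under uniform mixing the expected vector is (2, 2), so SER gives 4, while ESR averages the per-outcome utilities 0, 4, 4, 0 and gives 2.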

    Ordered preference elicitation strategies for supporting multi-objective decision making

    Small Approximate Pareto Sets with Quality Bounds

    We present and empirically characterize a general, parallel, heuristic algorithm for computing small ε-Pareto sets. The algorithm can be used as part of a decision support tool for settings in which computing points in objective space is computationally expensive. We use the graph clearing problem, a formalization of indirect organ exchange markets, as a prototypical example setting. We characterize the performance of the algorithm through ε-Pareto set size, ε value provided, and parallel speedup achieved. Our results show that the algorithm's combination of parallel speedup and small ε-Pareto sets is sufficient to be appealing in settings requiring manual review (i.e., those that have a human in the loop) and real-time solutions
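    As a rough illustration of what an ε-Pareto set is, the sketch below filters a finite list of objective vectors so that every input point is within a factor (1 + ε) of some retained point in every objective. It assumes positive, maximised objectives and the multiplicative definition of ε-dominance, and it uses a simple sequential greedy cover. It is not the parallel heuristic algorithm described in the abstract, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def eps_dominates(a: Sequence[float], b: Sequence[float], eps: float) -> bool:
    """True if a is within a factor (1 + eps) of b in every objective.

    Assumes all objectives are positive and maximised.
    """
    return all((1.0 + eps) * ai >= bi for ai, bi in zip(a, b))


def greedy_eps_pareto(points: List[Point], eps: float) -> List[Point]:
    """Keep a point only if no already retained point eps-dominates it.

    Every input point ends up eps-dominated by some retained point, so the
    result is an eps-Pareto set of the input (not necessarily a smallest one).
    """
    kept: List[Point] = []
    # Visit points in descending lexicographic order for a deterministic result.
    for p in sorted(points, reverse=True):
        if not any(eps_dominates(k, p, eps) for k in kept):
            kept.append(p)
    return kept


if __name__ == "__main__":
    pts = [(10.0, 1.0), (9.0, 2.0), (1.0, 10.0), (5.0, 5.0), (4.9, 5.0)]
    for eps in (0.1, 1.0):
        print(eps, greedy_eps_pareto(pts, eps))
```

    Larger values of ε allow more points to be covered by a single representative, which is the trade-off between set size and approximation quality that the abstract refers to.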

    A practical guide to multi-objective reinforcement learning and planning

    Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems. © 2022, The Author(s)