10 research outputs found

    Solving Multi-agent MDPs Optimally with Conditional Return Graphs

    Get PDF
    In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate in order find an optimal joint policy that maximises joint value. Typical solution al- gorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for so-called TI-MMDPs, where agents can only af- fect their local state, while their value may depend on the state of others. We decompose the returns into local returns per agent that we represent compactly in a conditional re- turn graph (CRG). Using CRGs the value of a joint policy as well as bounds on the value of partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the avail- able alternatives and is able to find solutions to problems previously considered unsolvable

    Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

    Get PDF
    In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to problems previously unsolvable

    Bounded Approximations for Linear Multi-Objective Planning Under Uncertainty

    Get PDF
    Planning under uncertainty poses a complex problem in which multiple objectives often need to be balanced. When dealing with multiple objectives, it is often assumed that the relative importance of the objectives is known a priori. How-ever, in practice human decision makers often find it hard to specify such preferences, and would prefer a decision sup-port system that presents a range of possible alternatives. We propose two algorithms for computing these alternatives for the case of linearly weighted objectives. First, we pro-pose an anytime method, approximate optimistic linear sup-port (AOLS), that incrementally builds up a complete set of -optimal plans, exploiting the piecewise-linear and convex shape of the value function. Second, we propose an approx-imate anytime method, scalarised sample incremental im-provement (SSII), that employs weight sampling to focus on the most interesting regions in weight space, as suggested by a prior over preferences. We show empirically that our meth-ods are able to produce (near-)optimal alternative sets orders of magnitude faster than existing techniques.

    Bounded approximations for linear multi-objective planning under uncertainty.

    Get PDF
    Abstract Planning under uncertainty poses a complex problem in which multiple objectives often need to be balanced. When dealing with multiple objectives, it is often assumed that the relative importance of the objectives is known a priori. However, in practice human decision makers often find it hard to specify such preferences exactly, and would prefer a decision support system that presents a range of possible alternatives. We propose two algorithms for computing these alternatives for the case of linearly weighted objectives. First, we propose an anytime method, approximate optimistic linear support (AOLS), that incrementally builds up a complete set of -optimal plans, exploiting the piecewise-linear and convex shape of the value function. Second, we propose an approximate anytime method, scalarised sample incremental improvement (SSII), that employs weight sampling to focus on the most interesting regions in weight space, as suggested by a prior over preferences. We show empirically that our methods are able to produce (near-)optimal alternative sets orders of magnitude faster than existing techniques, thereby demonstrating that our methods provide sensible approximations in stochastic multi-objective domains

    Solving Transition-Independent Multi-agent MDPs with Sparse Interactions

    Get PDF
    In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP (MMDP) setting such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these de- pendencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to previously unsolvable problems.Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.Algorithmic
    corecore