
    Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

    Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them into sub-problems and, for each sub-problem, making optimistic assumptions about the influence that the rest of the system will exert on it. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning. Comment: Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015).
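    A minimal sketch of the decomposition idea described above, using entirely hypothetical names (SubProblem, candidate_influences, local_value): each sub-problem is evaluated under the most favorable influence the rest of the system could exert on it, and the optimistic local values are summed into a global upper bound. This illustrates the general scheme only; it is not the authors' implementation.

    from dataclasses import dataclass
    from typing import Callable, List, Sequence

    @dataclass
    class SubProblem:
        name: str
        candidate_influences: Sequence[object]   # influences the rest of the system might exert
        local_value: Callable[[object], float]   # optimal local value given a fixed influence

    def influence_optimistic_upper_bound(sub_problems: List[SubProblem]) -> float:
        # Each term assumes the best-case influence for its sub-problem, so the sum
        # can only overestimate the true optimal value of the full problem.
        return sum(
            max(sp.local_value(i) for i in sp.candidate_influences)
            for sp in sub_problems
        )

    # Toy usage with two sub-problems and a handful of abstract influence points.
    toy = [
        SubProblem("left", [0, 1], lambda i: 5.0 + i),
        SubProblem("right", [0, 1, 2], lambda i: 3.0 + 2 * i),
    ]
    print(influence_optimistic_upper_bound(toy))  # 6.0 + 7.0 = 13.0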

    How are practices made to vary? Managing practice adaptation in a multinational corporation

    Research has shown that management practices are adapted and 'made to fit' the specific context into which they are adopted. Less attention has been paid to how organizations anticipate and purposefully influence the adaptation process. How do organizations manage the tension between allowing local adaptation of a management practice and retaining control over the practice? By studying the adaptation of a specialized quality management practice – ACE (Achieving Competitive Excellence) – in a multinational corporation in the aerospace industry, we examine how the organization manages the adaptation process at the corporate and subsidiary levels. We identify three strategies through which an organization balances the tension between standardization and variation – preserving the 'core' practice while allowing local adaptation at the subsidiary level: creating and certifying progressive achievement levels; setting discretionary and mandatory adaptation parameters; and differentially adapting to context-specific and systemic misfits. While previous studies have shown how and why practices vary as they diffuse, we show how practices may diffuse because they are engineered to vary, allowing a better fit with diverse contextual specificities.

    An Implementation Research Approach to Evaluating Health Insurance Programs: Insights from India


    Influence-Optimistic Local Values for Multiagent Planning

    Over the last decade, methods for multiagent planning under uncertainty have increased in scalability. However, many methods assume value factorization or are not able to provide quality guarantees. We propose a novel family of influence-optimistic upper bounds on the optimal value for problems with 100s of agents that do not exhibit value factorization.
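    One schematic way to write the idea behind these bounds (the notation below is introduced here for illustration and is not taken from the paper): if the problem is partitioned into sub-problems indexed by e, I_e ranges over the influences the rest of the system could exert on sub-problem e, and V_e(I_e) denotes the optimal local value of sub-problem e under a fixed influence I_e, then the optimal value of the full problem satisfies

        V^* \le \sum_{e} \max_{I_e} V_e(I_e),

    because every local term can only overestimate the contribution of its sub-problem under the influence it actually receives.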

    Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

    Reinforcement learning agents may sometimes develop habits that are effective only when specific policies are followed. After an initial exploration phase in which agents try out different actions, they eventually converge toward a particular policy. When this occurs, the distribution of state-action trajectories becomes narrower, and agents start experiencing the same transitions again and again. At this point, spurious correlations may arise. Agents may then pick up on these correlations and learn state representations that do not generalize beyond the agent's trajectory distribution. In this paper, we provide a mathematical characterization of this phenomenon, which we refer to as policy confounding, and show, through a series of examples, when and how it occurs in practice.
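    A small self-contained sketch of the effect described in the abstract, using a made-up one-step environment (the feature names "causal" and "spurious" are introduced here for illustration): during exploration the two features vary independently, but once the policy is fixed the trajectory distribution narrows and the spurious feature becomes perfectly correlated with the chosen action, so a representation learned from on-policy data could rely on it and fail off-trajectory.

    import random

    random.seed(0)

    def step(policy_fixed: bool):
        # "causal" actually determines which action is rewarded; "spurious" does not.
        causal = random.choice([0, 1])
        if policy_fixed:
            spurious = causal                  # narrowed trajectories: features always co-occur
            action = causal                    # converged policy acts on the causal feature
        else:
            spurious = random.choice([0, 1])   # exploration: features vary independently
            action = random.choice([0, 1])
        reward = 1.0 if action == causal else 0.0
        return spurious, action, reward

    def spurious_feature_action_agreement(policy_fixed: bool, n: int = 10_000) -> float:
        hits = 0
        for _ in range(n):
            spurious, action, _ = step(policy_fixed)
            hits += int(spurious == action)
        return hits / n

    print("exploration:", spurious_feature_action_agreement(False))   # ~0.5
    print("converged:  ", spurious_feature_action_agreement(True))    # 1.0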

    Optimal and Approximate Q-value Functions for Decentralized POMDPs

    Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
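    A minimal sketch of the single-agent case the abstract refers to: Q-value iteration by dynamic programming on a tiny, made-up 2-state MDP, followed by greedy policy extraction from the resulting Q*. The transition table and discount factor below are invented purely for illustration.

    # transitions[s][a] = list of (probability, next_state, reward)
    transitions = {
        0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
        1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
    }
    gamma = 0.9

    # Dynamic programming: repeatedly apply the Bellman optimality backup
    #   Q(s, a) <- sum_{s'} P(s' | s, a) * (r + gamma * max_{a'} Q(s', a'))
    Q = {s: {a: 0.0 for a in acts} for s, acts in transitions.items()}
    for _ in range(500):
        Q = {
            s: {
                a: sum(p * (r + gamma * max(Q[s2].values())) for p, s2, r in outcomes)
                for a, outcomes in acts.items()
            }
            for s, acts in transitions.items()
        }

    # Extract a policy by acting greedily with respect to Q*.
    policy = {s: max(acts, key=acts.get) for s, acts in Q.items()}
    print(policy)   # expected: {0: 1, 1: 1}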