10 research outputs found

    Memory-Bounded Dynamic Programming with a Distribution over Beliefs for Dec-POMDPs

    We propose a heuristic approach for computing an approximate policy for a Dec-POMDP. It is a point-based dynamic programming approach in the line of the PBDP \citep{szer2006a}, MBDP \citep{seuken2007a} and IMBDP \citep{seuken2007b} algorithms: it formulates the choice of the policies retained at each step of the construction as an optimization problem. The criterion of this problem relies on an estimate of the a priori probability distribution over the beliefs reachable at a given horizon: the objective is to maximize the expected cumulative reward for the considered horizon under this distribution. This expectation can be estimated by sampling beliefs obtained by simulating a heuristic policy.
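    The following is a minimal sketch of the idea described above, assuming a generic Dec-POMDP simulator interface; the function and parameter names (sample_beliefs, select_policies, evaluate, max_trees) are illustrative placeholders, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation): memory-bounded,
# point-based policy selection for a Dec-POMDP, driven by a sampled
# distribution over reachable beliefs. All interfaces are assumptions.

def sample_beliefs(simulator, heuristic_policy, horizon, n_samples):
    """Estimate the prior distribution over beliefs reachable at `horizon`
    by simulating a heuristic policy n_samples times."""
    beliefs = []
    for _ in range(n_samples):
        b = simulator.initial_belief()
        for t in range(horizon):
            joint_action = heuristic_policy(b, t)
            b = simulator.update_belief(b, joint_action)
        beliefs.append(b)
    return beliefs

def select_policies(candidate_policies, beliefs, evaluate, max_trees):
    """Keep at most `max_trees` candidate joint policies, chosen to maximize
    the expected cumulative reward over the sampled belief distribution."""
    scored = [(sum(evaluate(pi, b) for b in beliefs) / len(beliefs), pi)
              for pi in candidate_policies]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pi for _, pi in scored[:max_trees]]
```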

    Dynamic Programming Approximations for Partially Observable Stochastic Games

    Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes at a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies.
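    As a point of reference for where the computational cost comes from, here is a hedged sketch of the exhaustive dynamic-programming backup over policy trees that bounded approximations subsequently prune; the PolicyTree representation and function names are assumptions, not the paper's implementation.

```python
# Minimal sketch of an exhaustive policy-tree backup for one agent.
# Bounded approximations keep only a small subset of the resulting trees,
# which is what makes the approach scale.
from itertools import product
from collections import namedtuple

PolicyTree = namedtuple("PolicyTree", ["action", "children"])  # children: obs -> subtree

def exhaustive_backup(actions, observations, trees):
    """Grow all depth-(t+1) policy trees from the current set of depth-t trees.
    The number of new trees is |actions| * |trees|**|observations|, which
    explodes after a few decision cycles."""
    new_trees = []
    for a in actions:
        for assignment in product(trees, repeat=len(observations)):
            children = dict(zip(observations, assignment))
            new_trees.append(PolicyTree(a, children))
    return new_trees
```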

    Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

    Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods that allow a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents’ actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that it may not obtain optimal team solutions in cooperative settings when it is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.
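    A hedged illustration of the key idea of a learning level-0 model follows: the other agent's level-0 policy is obtained by simple tabular Q-learning and then ascribed to it within the higher-level planner. The environment interface (env.reset, env.step, env.actions) and the hyperparameters are assumptions; this is not the I-DID implementation from the paper.

```python
# Sketch of a reinforcement-learning level-0 model: learn the other agent's
# policy by tabular Q-learning, then ascribe that policy to it inside the
# level-1 planner. All environment methods are placeholder assumptions.
import random
from collections import defaultdict

def q_learn_level0(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn a level-0 policy for the other agent by epsilon-greedy Q-learning."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    # The returned greedy policy is what the level-1 agent would use as its
    # prediction of the other agent's behaviour during planning.
    return lambda s: max(env.actions, key=lambda act: Q[(s, act)])
```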

    Optimally Solving Dec-POMDPs as Continuous-State MDPs

    Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
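    A minimal sketch of the occupancy-state update underlying this transformation is given below, assuming an explicit tabular model; the data structures and argument names are illustrative, not the paper's FB-HSVI code.

```python
# Illustrative occupancy-state update: the occupancy state is a distribution
# over (hidden state, joint history) pairs, and given a decentralized decision
# rule it evolves deterministically, which is what yields a deterministic
# continuous-state MDP. Model layout (T, O) is an assumption.
from collections import defaultdict

def update_occupancy(occupancy, decision_rule, T, O):
    """occupancy: dict {(state, joint_history): prob}
    decision_rule: maps a joint_history to a joint_action
    T[s][a]: dict {s2: prob},  O[a][s2]: dict {joint_obs: prob}."""
    next_occ = defaultdict(float)
    for (s, h), p in occupancy.items():
        a = decision_rule(h)
        for s2, p_t in T[s][a].items():
            for z, p_o in O[a][s2].items():
                next_occ[(s2, h + ((a, z),))] += p * p_t * p_o
    return dict(next_occ)
```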

    Communication Efficiency in Information Gathering through Dynamic Information Flow

    This thesis addresses the problem of how to improve the performance of multi-robot information gathering tasks by actively controlling the rate of communication between robots. Examples of such tasks include cooperative tracking and cooperative environmental monitoring. Communication is essential in such systems for both decentralised data fusion and decision making, but wireless networks impose capacity constraints that are frequently overlooked. While existing research has focussed on improving available communication throughput, the aim in this thesis is to develop algorithms that make more efficient use of the available communication capacity. Since information may be shared at various levels of abstraction, another challenge is deciding where information should be processed, given the limits of the available computational resources. Therefore, the flow of information needs to be controlled based on the trade-off between communication limits, computation limits and information value. In this thesis, we approach this trade-off by introducing the dynamic information flow (DIF) problem. We suggest variants of DIF that either consider data fusion communication independently or consider both data fusion and decision making communication simultaneously. For the data fusion case, we propose efficient decentralised solutions that dynamically adjust the flow of information. For the decision making case, we present an algorithm for communication efficiency based on local LQ approximations of information gathering problems. The algorithm is then integrated with our solution for the data fusion case to produce a complete communication efficiency solution for information gathering. We analyse our suggested algorithms and present important performance guarantees. The algorithms are validated in a custom-designed decentralised simulation framework and through field-robotic experimental demonstrations.
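    As a toy illustration of the communication trade-off described above (not the thesis' DIF algorithm), the sketch below transmits an estimate only when a value-of-information measure outweighs the bandwidth cost; the Gaussian entropy measure and the linear cost model are assumptions.

```python
# Toy value-of-information vs. communication-cost rule: a robot sends a fused
# estimate only if the expected uncertainty reduction at the receiver
# justifies the bandwidth it consumes. Parameters are illustrative.
import math

def gaussian_entropy(cov_det, dim):
    """Differential entropy of a dim-dimensional Gaussian with covariance
    determinant cov_det (in nats)."""
    return 0.5 * math.log(((2 * math.pi * math.e) ** dim) * cov_det)

def should_transmit(prior_cov_det, posterior_cov_det, dim, bytes_needed,
                    cost_per_byte, value_per_nat):
    """Send the update iff the information value exceeds the communication cost."""
    info_gain = gaussian_entropy(prior_cov_det, dim) - gaussian_entropy(posterior_cov_det, dim)
    return value_per_nat * info_gain > cost_per_byte * bytes_needed
```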

    Self Organized Multi Agent Swarms (SOMAS) for Network Security Control

    Computer network security is a very serious concern in many commercial, industrial, and military environments. This paper proposes a new computer network security approach defined by self-organized agent swarms (SOMAS), which provides a novel computer network security management framework based upon desired overall system behaviors. The SOMAS structure evolves based upon the partially observable Markov decision process (POMDP) formal model and the more complex Interactive-POMDP and Decentralized-POMDP models, which are augmented with a new F(*-POMDP) model. Example swarm-specific and network-based behaviors are formalized and simulated. This paper illustrates, through various statistical testing techniques, the significance of the proposed SOMAS architecture and the effectiveness of self-organization and entangled hierarchies.

    Modeling Supervisory Control in Multi Robot Applications

    We consider multi-robot applications in which a human operator monitors and supervises the team to pursue complex objectives in complex environments. Robots, especially at field sites, are often subject to unexpected events that cannot be managed without the intervention of the operator(s). For example, in an environmental monitoring application, robots might face extreme environmental events (e.g. water currents) or moving obstacles (e.g. an animal approaching the robots). In such scenarios, the operator often needs to interrupt the activities of individual team members to deal with particular situations. This work focuses on human-multi-robot interaction in these cases. A widely used approach to monitoring and supervising robotic teams is team plans, which allow an operator to interact via high-level objectives and use automation to work out the details.

    The first problem we address in this context is how human interrupts (i.e. changes of action due to unexpected events) can be handled within a robotic team. Typically, after such interrupts, the operator would need to restart the team plan to ensure its success, which causes delays and imposes extra load on the operator. We address this problem by presenting an approach to encoding how interrupts can be handled within a team plan. Building on a team plan formalism that uses Colored Petri Nets, we describe a mechanism that allows a range of interrupts to be handled smoothly, allowing the team to effectively continue with its task after the operator intervention. We validate the approach with an application of robotic water monitoring. Our experiments show that the use of our interrupt mechanism decreases the time to complete the plan (up to 48% reduction) and decreases the operator load (up to 80% reduction in the number of user actions). Moreover, we performed experiments with real robotic platforms to validate the applicability of our mechanism in the actual deployment of robotic watercraft.

    The second problem we address is how to handle intervention requests from robots to the operator. In this case, we consider autonomous robotic platforms that are able to identify their situation and ask for the intervention of the operator by sending a request. However, large teams can easily overwhelm the operator with several requests, hence hindering the team performance. As a consequence, team members have to wait for the operator's attention, and the operator becomes a bottleneck for the system. Our contribution in this context is to make the robots learn cooperative strategies to best utilize the operator's time and decrease the idle time of the robotic system. In particular, we consider a queuing model (a.k.a. a balking queue) in which robots decide whether or not to join the queue. Such decisions are computed by considering dynamic features of the system (e.g. the severity of the request, the number of requests, etc.). We examine several decision-making solutions for computing these cooperative strategies, where our goal is to find a trade-off between lower idle time by joining the queue and fewer failures due to the risk of not joining the queue. We validate the proposed approaches in a simulated robotic water monitoring application. The obtained results show the effectiveness of our proposed models in comparison to the queue without balking, when considering team reward and total idle time.
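    A minimal sketch of such a balking decision rule is given below, assuming a simple linear cost model; the parameter names and the naive waiting-time estimate are illustrative, not the thesis' implementation.

```python
# Toy balking-queue joining rule: a robot that needs operator attention joins
# the queue only when the expected waiting cost is outweighed by the expected
# cost of proceeding without assistance. All parameters are assumptions.

def should_join_queue(queue_length, mean_service_time, idle_cost_per_sec,
                      failure_prob_if_balking, failure_cost):
    """Return True if the robot should request operator intervention."""
    expected_wait = (queue_length + 1) * mean_service_time   # naive estimate
    cost_of_joining = expected_wait * idle_cost_per_sec
    cost_of_balking = failure_prob_if_balking * failure_cost
    return cost_of_joining < cost_of_balking
```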