10 research outputs found

    Memory-Bounded Dynamic Programming with a Distribution over Beliefs for Dec-POMDPs

    We propose a heuristic approach for computing an approximate policy for a Dec-POMDP. It is a point-based dynamic programming approach in the line of the PBDP \citep{szer2006a}, MBDP \citep{seuken2007a} and IMBDP \citep{seuken2007b} algorithms: it formulates the choice of the policies retained at each step of the construction as an optimization problem. The criterion of this problem relies on an estimate of the a priori probability distribution over the beliefs reachable at a given horizon: the objective is to maximize the expected cumulative reward for the considered horizon under this distribution. This expectation can be estimated by sampling beliefs obtained by simulating a heuristic policy.
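    The following is a minimal sketch of the idea described above, assuming a generic Dec-POMDP simulator interface; the function and parameter names (sample_beliefs, select_policies, evaluate, max_trees) are illustrative placeholders, not the authors' code.

```python
# Illustrative sketch (not the authors' implementation): memory-bounded,
# point-based policy selection for a Dec-POMDP, driven by a sampled
# distribution over reachable beliefs. All interfaces are assumptions.

def sample_beliefs(simulator, heuristic_policy, horizon, n_samples):
    """Estimate the prior distribution over beliefs reachable at `horizon`
    by simulating a heuristic policy n_samples times."""
    beliefs = []
    for _ in range(n_samples):
        b = simulator.initial_belief()
        for t in range(horizon):
            joint_action = heuristic_policy(b, t)
            b = simulator.update_belief(b, joint_action)
        beliefs.append(b)
    return beliefs

def select_policies(candidate_policies, beliefs, evaluate, max_trees):
    """Keep at most `max_trees` candidate joint policies, chosen to maximize
    the expected cumulative reward over the sampled belief distribution."""
    scored = [(sum(evaluate(pi, b) for b in beliefs) / len(beliefs), pi)
              for pi in candidate_policies]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pi for _, pi in scored[:max_trees]]
```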

    Dynamic Programming Approximations for Partially Observable Stochastic Games

    Partially observable stochastic games (POSGs) provide a rich mathematical framework for planning under uncertainty by a group of agents. However, this modeling advantage comes at a price, namely a high computational cost. Solving POSGs optimally quickly becomes intractable after a few decision cycles. Our main contribution is to provide bounded approximation techniques, which enable us to scale POSG algorithms by several orders of magnitude. We study both the POSG model and its cooperative counterpart, DEC-POMDP. Experiments on a number of problems confirm the scalability of our approach while still providing useful policies.
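    As a point of reference for where the computational cost comes from, here is a hedged sketch of the exhaustive dynamic-programming backup over policy trees that bounded approximations subsequently prune; the PolicyTree representation and function names are assumptions, not the paper's implementation.

```python
# Minimal sketch of an exhaustive policy-tree backup for one agent.
# Bounded approximations keep only a small subset of the resulting trees,
# which is what makes the approach scale.
from itertools import product
from collections import namedtuple

PolicyTree = namedtuple("PolicyTree", ["action", "children"])  # children: obs -> subtree

def exhaustive_backup(actions, observations, trees):
    """Grow all depth-(t+1) policy trees from the current set of depth-t trees.
    The number of new trees is |actions| * |trees|**|observations|, which
    explodes after a few decision cycles."""
    new_trees = []
    for a in actions:
        for assignment in product(trees, repeat=len(observations)):
            children = dict(zip(observations, assignment))
            new_trees.append(PolicyTree(a, children))
    return new_trees
```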

    Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams

    Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods that allow a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings face the task of having to reason about other agents’ actions, which may in turn involve reasoning about others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that an implication of bounded, finitely-nested reasoning by a self-interested agent is that it may not obtain optimal team solutions in cooperative settings when it is part of a team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.
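    A hedged illustration of the key idea of a learning level-0 model follows: the other agent's level-0 policy is obtained by simple tabular Q-learning and then ascribed to it within the higher-level planner. The environment interface (env.reset, env.step, env.actions) and the hyperparameters are assumptions; this is not the I-DID implementation from the paper.

```python
# Sketch of a reinforcement-learning level-0 model: learn the other agent's
# policy by tabular Q-learning, then ascribe that policy to it inside the
# level-1 planner. All environment methods are placeholder assumptions.
import random
from collections import defaultdict

def q_learn_level0(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn a level-0 policy for the other agent by epsilon-greedy Q-learning."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            if random.random() < epsilon:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            s2, r, done = env.step(a)
            best_next = max(Q[(s2, act)] for act in env.actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    # The returned greedy policy is what the level-1 agent would use as its
    # prediction of the other agent's behaviour during planning.
    return lambda s: max(env.actions, key=lambda act: Q[(s, act)])
```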

    Optimally Solving Dec-POMDPs as Continuous-State MDPs

    Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
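    A minimal sketch of the occupancy-state update underlying this transformation is given below, assuming an explicit tabular model; the data structures and argument names are illustrative, not the paper's FB-HSVI code.

```python
# Illustrative occupancy-state update: the occupancy state is a distribution
# over (hidden state, joint history) pairs, and given a decentralized decision
# rule it evolves deterministically, which is what yields a deterministic
# continuous-state MDP. Model layout (T, O) is an assumption.
from collections import defaultdict

def update_occupancy(occupancy, decision_rule, T, O):
    """occupancy: dict {(state, joint_history): prob}
    decision_rule: maps a joint_history to a joint_action
    T[s][a]: dict {s2: prob},  O[a][s2]: dict {joint_obs: prob}."""
    next_occ = defaultdict(float)
    for (s, h), p in occupancy.items():
        a = decision_rule(h)
        for s2, p_t in T[s][a].items():
            for z, p_o in O[a][s2].items():
                next_occ[(s2, h + ((a, z),))] += p * p_t * p_o
    return dict(next_occ)
```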

    Communication Efficiency in Information Gathering through Dynamic Information Flow

    This thesis addresses the problem of how to improve the performance of multi-robot information gathering tasks by actively controlling the rate of communication between robots. Examples of such tasks include cooperative tracking and cooperative environmental monitoring. Communication is essential in such systems for both decentralised data fusion and decision making, but wireless networks impose capacity constraints that are frequently overlooked. While existing research has focussed on improving available communication throughput, the aim in this thesis is to develop algorithms that make more efficient use of the available communication capacity. Since information may be shared at various levels of abstraction, another challenge is deciding where information should be processed, given the limits of the available computational resources. Therefore, the flow of information needs to be controlled based on the trade-off between communication limits, computation limits and information value. In this thesis, we approach this trade-off by introducing the dynamic information flow (DIF) problem. We suggest variants of DIF that either consider data fusion communication independently or consider both data fusion and decision making communication simultaneously. For the data fusion case, we propose efficient decentralised solutions that dynamically adjust the flow of information. For the decision making case, we present an algorithm for communication efficiency based on local LQ approximations of information gathering problems. The algorithm is then integrated with our solution for the data fusion case to produce a complete communication efficiency solution for information gathering. We analyse our suggested algorithms and present important performance guarantees. The algorithms are validated in a custom-designed decentralised simulation framework and through field-robotic experimental demonstrations.
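    As a toy illustration of the communication trade-off described above (not the thesis' DIF algorithm), the sketch below transmits an estimate only when a value-of-information measure outweighs the bandwidth cost; the Gaussian entropy measure and the linear cost model are assumptions.

```python
# Toy value-of-information vs. communication-cost rule: a robot sends a fused
# estimate only if the expected uncertainty reduction at the receiver
# justifies the bandwidth it consumes. Parameters are illustrative.
import math

def gaussian_entropy(cov_det, dim):
    """Differential entropy of a dim-dimensional Gaussian with covariance
    determinant cov_det (in nats)."""
    return 0.5 * math.log(((2 * math.pi * math.e) ** dim) * cov_det)

def should_transmit(prior_cov_det, posterior_cov_det, dim, bytes_needed,
                    cost_per_byte, value_per_nat):
    """Send the update iff the information value exceeds the communication cost."""
    info_gain = gaussian_entropy(prior_cov_det, dim) - gaussian_entropy(posterior_cov_det, dim)
    return value_per_nat * info_gain > cost_per_byte * bytes_needed
```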

    Self Organized Multi Agent Swarms (SOMAS) for Network Security Control

    Computer network security is a very serious concern in many commercial, industrial, and military environments. This paper proposes a new computer network security approach defined by self-organized agent swarms (SOMAS), which provides a novel computer network security management framework based upon desired overall system behaviors. The SOMAS structure evolves based upon the partially observable Markov decision process (POMDP) formal model and the more complex Interactive-POMDP and Decentralized-POMDP models, which are augmented with a new F(*-POMDP) model. Example swarm-specific and network-based behaviors are formalized and simulated. This paper illustrates, through various statistical testing techniques, the significance of the proposed SOMAS architecture and the effectiveness of self-organization and entangled hierarchies.

    Modeling Supervisory Control in Multi Robot Applications

    We consider multi-robot applications in which a human operator monitors and supervises the team to pursue complex objectives in complex environments. Robots, especially at field sites, are often subject to unexpected events that cannot be managed without the intervention of the operator(s). For example, in an environmental monitoring application, robots might face extreme environmental events (e.g. water currents) or moving obstacles (e.g. an animal approaching the robots). In such scenarios, the operator often needs to interrupt the activities of individual team members to deal with particular situations. This work focuses on human-multi-robot interaction in these cases. A widely used approach to monitoring and supervising robotic teams is team plans, which allow an operator to interact via high-level objectives and use automation to work out the details.

    The first problem we address in this context is how human interrupts (i.e. changes of action due to unexpected events) can be handled within a robotic team. Typically, after such interrupts, the operator would need to restart the team plan to ensure its success, which causes delays and imposes extra load on the operator. We address this problem by presenting an approach to encoding how interrupts can be handled within a team plan. Building on a team plan formalism that uses Colored Petri Nets, we describe a mechanism that allows a range of interrupts to be handled smoothly, allowing the team to effectively continue with its task after the operator intervention. We validate the approach with an application of robotic water monitoring. Our experiments show that the use of our interrupt mechanism decreases the time to complete the plan (up to 48% reduction) and decreases the operator load (up to 80% reduction in the number of user actions). Moreover, we performed experiments with real robotic platforms to validate the applicability of our mechanism in the actual deployment of robotic watercraft.

    The second problem we address is how to handle intervention requests from robots to the operator. In this case, we consider autonomous robotic platforms that are able to identify their situation and ask for the intervention of the operator by sending a request. However, large teams can easily overwhelm the operator with several requests, hence hindering the team performance. As a consequence, team members have to wait for the operator's attention, and the operator becomes a bottleneck for the system. Our contribution in this context is to make the robots learn cooperative strategies to best utilize the operator's time and decrease the idle time of the robotic system. In particular, we consider a queuing model (a.k.a. a balking queue) in which robots decide whether or not to join the queue. Such decisions are computed by considering dynamic features of the system (e.g. the severity of the request, the number of requests, etc.). We examine several decision-making solutions for computing these cooperative strategies, where our goal is to find a trade-off between lower idle time by joining the queue and fewer failures due to the risk of not joining the queue. We validate the proposed approaches in a simulated robotic water monitoring application. The obtained results show the effectiveness of our proposed models in comparison to the queue without balking, when considering team reward and total idle time.
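    A minimal sketch of such a balking decision rule is given below, assuming a simple linear cost model; the parameter names and the naive waiting-time estimate are illustrative, not the thesis' implementation.

```python
# Toy balking-queue joining rule: a robot that needs operator attention joins
# the queue only when the expected waiting cost is outweighed by the expected
# cost of proceeding without assistance. All parameters are assumptions.

def should_join_queue(queue_length, mean_service_time, idle_cost_per_sec,
                      failure_prob_if_balking, failure_cost):
    """Return True if the robot should request operator intervention."""
    expected_wait = (queue_length + 1) * mean_service_time   # naive estimate
    cost_of_joining = expected_wait * idle_cost_per_sec
    cost_of_balking = failure_prob_if_balking * failure_cost
    return cost_of_joining < cost_of_balking
```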