Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
is to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps till the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, developing on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy, allowing us to compute succinct,
human-readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.
Comment: Technical report accompanying a paper published in the proceedings of AAMAS 201
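As a minimal illustration of the energy-constrained objective (not the paper's POMDP solver; transitions here are assumed deterministic and fully observable for brevity, and all states, costs, and energy deltas are invented), energy can be folded into the state so that any step that drives it to zero becomes infeasible:

```python
# Minimal sketch: folding a hard energy constraint into the state space
# of a small deterministic MDP via value iteration over (state, energy).
INF = float("inf")

def solve_energy_mdp(n_states, target, max_energy, actions, iters=200):
    """`actions` maps state -> list of (next_state, cost, energy_delta).

    Energy dropping to 0 or below before the target makes a run
    infeasible (value = infinity), encoding the hard constraint.
    """
    V = {(s, e): (0.0 if s == target else INF)
         for s in range(n_states) for e in range(max_energy + 1)}
    for _ in range(iters):
        for s in range(n_states):
            if s == target:
                continue
            for e in range(1, max_energy + 1):
                best = INF
                for (s2, cost, de) in actions[s]:
                    e2 = min(max_energy, e + de)
                    if e2 <= 0:  # would violate the energy constraint
                        continue
                    if V[(s2, e2)] < INF:
                        best = min(best, cost + V[(s2, e2)])
                V[(s, e)] = best
    return V

# Tiny 3-state chain: 0 -> 1 -> 2 (target); each hop drains 1 energy,
# and state 1 has a costly "recharge" self-loop that restores energy.
acts = {
    0: [(1, 1, -1)],
    1: [(2, 1, -1), (1, 2, +2)],  # move on, or recharge at extra cost
}
V = solve_energy_mdp(3, target=2, max_energy=3, actions=acts)
print(V[(0, 3)])  # ample energy: plain shortest path, total cost 2.0
print(V[(0, 1)])  # too little energy to ever reach the target: inf
```

Treating energy as part of the state is what makes the hard constraint expressible in a standard dynamic-programming solver; the paper's contribution is doing this in the partially observable setting.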
Decentralized Task and Path Planning for Multi-Robot Systems
We consider a multi-robot system with a team of collaborative robots and multiple tasks that emerge over time. We propose a fully decentralized task and path planning (DTPP) framework consisting of a task allocation module and a localized path planning module. Each task is modeled as a Markov Decision Process (MDP) or a Mixed Observability Markov Decision Process (MOMDP), depending on whether full states or only partial states are observable. The task allocation module then aims at maximizing the expected pure reward (reward minus cost) of the robotic team. We fuse the Markov model into a factor graph formulation so that the task allocation can be solved in a decentralized manner using the max-sum algorithm. Each robot agent follows the optimal policy synthesized for the Markov model, and we propose a localized forward dynamic programming scheme that resolves conflicts between agents and avoids collisions. The proposed framework is demonstrated with high-fidelity ROS simulations and experiments with multiple ground robots.
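The max-sum step at the heart of the task allocation idea can be illustrated on a toy instance; the two-robot setup and payoff numbers below are invented for the example and are not the DTPP implementation:

```python
# Sketch of max-sum on a one-factor graph: a single task shared by two
# robots, each deciding locally whether to take it (1) or not (0).
def team_reward(x1, x2):
    # Task pays 10 if at least one robot takes it; travel costs 3 for
    # robot 1 and 7 for robot 2; a second taker adds cost but no reward.
    reward = 10 if (x1 or x2) else 0
    return reward - 3 * x1 - 7 * x2

# Factor-to-variable max-sum messages: for each value of my variable,
# the best total reward achievable over the other robot's choices.
msg_to_r1 = {x1: max(team_reward(x1, x2) for x2 in (0, 1)) for x1 in (0, 1)}
msg_to_r2 = {x2: max(team_reward(x1, x2) for x1 in (0, 1)) for x2 in (0, 1)}

# Each robot decides locally from its incoming message alone.
choice_r1 = max((0, 1), key=lambda x: msg_to_r1[x])
choice_r2 = max((0, 1), key=lambda x: msg_to_r2[x])
print(choice_r1, choice_r2)  # the cheaper robot takes the task: 1 0
```

On a tree-structured factor graph (as here, with one factor) max-sum is exact, so the local argmax decisions recover the team-optimal assignment without any central coordinator.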
Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information.
Comment: Appeared in AAAI 201
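The unconstrained building block that CFR iterates, regret matching, can be sketched on a one-shot game; rock-paper-scissors and the self-play loop below are illustrative assumptions, not the paper's constrained algorithm:

```python
# Sketch of regret matching, the per-node update that CFR iterates.
# The paper's contribution is a generalization of CFR that handles
# convex strategy constraints; this is the plain unconstrained version.
import random

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player vs column

def strategy(regrets):
    # Play in proportion to positive cumulative regret.
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1 / 3] * 3

def train(iters=20000, seed=0):
    rng = random.Random(seed)
    regrets = [0.0, 0.0, 0.0]
    strat_sum = [0.0, 0.0, 0.0]
    for _ in range(iters):
        sigma = strategy(regrets)
        for a in range(3):
            strat_sum[a] += sigma[a]
        # Symmetric self-play: both actions sampled from sigma.
        my = rng.choices(range(3), weights=sigma)[0]
        opp = rng.choices(range(3), weights=sigma)[0]
        for a in range(3):
            regrets[a] += PAYOFF[a][opp] - PAYOFF[my][opp]
    total = sum(strat_sum)
    return [x / total for x in strat_sum]

avg = train()
print(avg)  # average strategy approaches the uniform Nash equilibrium
```

It is the time-averaged strategy, not the last iterate, that converges to equilibrium; the same averaging argument underlies CFR's guarantees in extensive-form games.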
Prediction-Loss Optimization in Perturbation-Based Probabilistic Models Using Multidimensional Parametric Min-Cut
Master's thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2015 (advisor: 정교민).
We consider the problem of learning perturbation-based probabilistic models
by computing and differentiating expected losses. This is a challenging computational
problem that has traditionally been tackled using Monte Carlo-based
methods. In this work, we show how a generalization of parametric min-cuts
can be used to address the same problem, achieving high accuracy faster than
a sampling-based baseline. Using our proposed Skeleton Method, we show
that we can learn the perturbation model so as to directly minimize expected
losses. Experimental results show that this approach offers promise as a new
way of training structured prediction models under complex loss functions.
Table of contents:
Abstract
Chapter 1 Introduction
Chapter 2 Background: Perturbations, Expected Losses
Chapter 3 Algorithm: Skeleton Method
  3.1 Initialization
  3.2 Finding a New Facet
  3.3 Updating the Skeleton GY
  3.4 Calculating Expected Loss R
  3.5 Example: Two Parameters
Chapter 4 Learning
  4.1 Computing Gradients: Slicing
  4.2 Training
  4.3 Exploiting the Skeleton Method
Chapter 5 Experiments and Discussion
  5.1 Data and Setup
  5.2 Calculating Expected Losses
  5.3 Calculating Gradients
  5.4 Model Learning
    5.4.1 Learning
    5.4.2 Other Loss Functions
  5.5 Expected Segmentations
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)
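The sampling-based baseline that the thesis abstract compares against can be sketched as a perturb-and-MAP Monte Carlo estimate of an expected loss; the independent-pixel model, scores, and Hamming loss below are illustrative assumptions, not the Skeleton Method itself:

```python
# Sketch of the Monte Carlo baseline for expected loss under Gumbel
# perturbations: sample noise, take the MAP labeling, average the loss.
import math
import random

def gumbel(rng):
    # Standard Gumbel sample via inverse transform.
    return -math.log(-math.log(rng.random()))

def mc_expected_loss(scores, truth, n_samples=5000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Perturb each pixel's on/off scores and take the argmax (MAP).
        pred = [1 if s + gumbel(rng) > 0 + gumbel(rng) else 0
                for s in scores]
        # Normalized Hamming loss against the ground-truth labeling.
        total += sum(p != t for p, t in zip(pred, truth)) / len(truth)
    return total / n_samples

scores = [2.0, -2.0, 0.0]   # strongly on, strongly off, uncertain
truth = [1, 0, 1]
loss = mc_expected_loss(scores, truth)
print(loss)
```

The estimate's accuracy improves only as the square root of the sample count, which is the cost the thesis's parametric min-cut approach aims to avoid by computing the expectation exactly over regions of the perturbation space.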