
    Stochastic Shortest Path with Energy Constraints in POMDPs

    We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution concerns policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts the important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
    Comment: Technical report accompanying a paper published in proceedings of AAMAS 201
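    A common way to make such a hard energy constraint tractable is to fold the energy level into the state, so an off-the-shelf solver never sees a constraint-violating move. A minimal sketch on a toy, fully observable model (the dictionary encoding and all names here are illustrative, not from the paper):

```python
def augment_with_energy(transitions, energy_delta, max_energy):
    """transitions: {(state, action): [(next_state, prob), ...]}
    energy_delta: {(state, action): int} change in energy for that move.
    Returns transitions over (state, energy) pairs; moves that would drop
    the energy to 0 or below are pruned, enforcing the hard constraint."""
    augmented = {}
    for (s, a), succs in transitions.items():
        for e in range(1, max_energy + 1):
            e2 = min(e + energy_delta[(s, a)], max_energy)
            if e2 <= 0:  # this move would exhaust the energy: forbidden
                continue
            augmented[((s, e), a)] = [((s2, e2), p) for s2, p in succs]
    return augmented

transitions = {("s0", "go"): [("goal", 1.0)]}
energy_delta = {("s0", "go"): -1}
aug = augment_with_energy(transitions, energy_delta, max_energy=3)
# at energy 1 the move "go" is pruned, since it would reach energy 0
```

A standard solver run on the augmented model then only ever produces energy-safe policies.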

    Decentralized Task and Path Planning for Multi-Robot Systems

    We consider a multi-robot system with a team of collaborative robots and multiple tasks that emerge over time. We propose a fully decentralized task and path planning (DTPP) framework consisting of a task allocation module and a localized path planning module. Each task is modeled as a Markov Decision Process (MDP) or a Mixed Observability Markov Decision Process (MOMDP), depending on whether full states or only partial states are observable. The task allocation module aims at maximizing the expected pure reward (reward minus cost) of the robotic team. We fuse the Markov models into a factor graph formulation so that the task allocation can be solved in a decentralized manner using the max-sum algorithm. Each robot agent follows the optimal policy synthesized for its Markov model, and we propose a localized forward dynamic programming scheme that resolves conflicts between agents and avoids collisions. The proposed framework is demonstrated with high-fidelity ROS simulations and experiments with multiple ground robots.
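    The allocation objective above (expected reward minus cost over a joint robot-task assignment) can be illustrated with a tiny brute-force reference; the paper optimizes this same kind of objective decentrally via max-sum on a factor graph. All names and numbers below are made up:

```python
from itertools import product

def best_assignment(robots, tasks, reward, cost):
    """Maximize the sum of task rewards (each completed task counted once)
    minus the cost each robot pays for its assigned task. Robots may also
    stay idle (assignment None)."""
    best, best_val = None, float("-inf")
    for assign in product(tasks + [None], repeat=len(robots)):
        done = {a for a in assign if a is not None}
        val = sum(reward[t] for t in done)
        val -= sum(cost[(r, a)] for r, a in zip(robots, assign) if a is not None)
        if val > best_val:
            best, best_val = assign, val
    return dict(zip(robots, best)), best_val

assignment, value = best_assignment(
    robots=["r1", "r2"],
    tasks=["t1", "t2"],
    reward={"t1": 10, "t2": 6},
    cost={("r1", "t1"): 2, ("r1", "t2"): 5,
          ("r2", "t1"): 4, ("r2", "t2"): 1},
)
# r1 takes t1 and r2 takes t2: (10 - 2) + (6 - 1) = 13
```

The brute force is exponential in the number of robots; max-sum trades that for local message passing between robot and task nodes.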

    Solving Large Extensive-Form Games with Strategy Constraints

    Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.
    Comment: Appeared in AAAI 201
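    The building block that CFR generalizes is regret matching at each information set; the paper's contribution is making the updates respect convex strategy constraints. A minimal unconstrained sketch in self-play on rock-paper-scissors (not the constrained algorithm itself):

```python
def regret_matching(regrets):
    """Play actions in proportion to their positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / len(regrets)] * len(regrets)

# Row player's payoffs for rock-paper-scissors (zero-sum, symmetric).
payoff = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
regrets = [1.0, 0.0, 0.0]   # start away from equilibrium
avg = [0.0, 0.0, 0.0]
T = 10000
for _ in range(T):
    strat = regret_matching(regrets)
    # utility of each pure action against the current (self-play) strategy
    u = [sum(payoff[i][j] * strat[j] for j in range(3)) for i in range(3)]
    ev = sum(strat[i] * u[i] for i in range(3))
    for i in range(3):
        regrets[i] += u[i] - ev
        avg[i] += strat[i] / T
# the average strategy approaches the uniform Nash equilibrium (1/3, 1/3, 1/3)
```

The no-regret guarantee makes the *average* strategy converge; the current strategy may keep cycling.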

    Optimizing Expected Loss in Perturbation-Based Probabilistic Models via Multidimensional Parametric Min-cut

    Master's thesis -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2015. Advisor: Kyomin Jung.
    We consider the problem of learning perturbation-based probabilistic models by computing and differentiating expected losses. This is a challenging computational problem that has traditionally been tackled using Monte Carlo-based methods. In this work, we show how a generalization of parametric min-cuts can be used to address the same problem, achieving high accuracy faster than a sampling-based baseline. Utilizing our proposed Skeleton Method, we show that we can learn the perturbation model so as to directly minimize expected losses. Experimental results show that this approach offers promise as a new way of training structured prediction models under complex loss functions.
    Contents: Abstract; 1 Introduction; 2 Background: Perturbations, Expected Losses; 3 Algorithm: Skeleton Method (3.1 Initialization; 3.2 Finding a New Facet; 3.3 Updating the Skeleton GY; 3.4 Calculating Expected Loss R; 3.5 Example: Two Parameters); 4 Learning (4.1 Computing Gradients: Slicing; 4.2 Training; 4.3 Exploiting the Skeleton Method); 5 Experiments and Discussion (5.1 Data and Setup; 5.2 Calculating Expected Losses; 5.3 Calculating Gradients; 5.4 Model Learning (5.4.1 Learning; 5.4.2 Other Loss Functions); 5.5 Expected Segmentations); 6 Conclusion; Bibliography; Korean Abstract (초록).
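    For context, the sampling-based baseline the thesis compares against can be sketched as plain Monte Carlo: draw Gumbel noise, take the maximizing ("MAP") assignment of the perturbed scores, and average the loss. The independent-site model and Hamming loss below are toy stand-ins, not the thesis's segmentation setup:

```python
import math
import random

def sample_gumbel(rng):
    # standard Gumbel sample via inverse transform
    return -math.log(-math.log(rng.random()))

def expected_loss(scores, truth, loss, rng, n_samples=5000):
    """scores[i]: score of label 1 at site i (label 0 scores 0).
    Perturb both labels with Gumbel noise, take the argmax per site,
    and average the loss over samples."""
    total = 0.0
    for _ in range(n_samples):
        pred = [1 if s + sample_gumbel(rng) > sample_gumbel(rng) else 0
                for s in scores]
        total += loss(pred, truth)
    return total / n_samples

hamming = lambda p, t: sum(a != b for a, b in zip(p, t))
rng = random.Random(0)
est = expected_loss([2.0, -2.0], [1, 0], hamming, rng)
# Gumbel-max gives P(label 1) = sigmoid(score) per site, so the exact
# expected Hamming loss here is 2 * sigmoid(-2) ~= 0.238
```

The thesis's point is that for structured models with a polyhedral parameter space, the parametric min-cut skeleton computes such expectations exactly rather than by sampling.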