Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
is to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps till the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, developing on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy, allowing us to compute succinct,
human-readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.
Comment: Technical report accompanying a paper published in the proceedings of AAMAS 201
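As a minimal illustration of the energy-constrained objective (not the paper's POMDP solver; transitions here are assumed deterministic and fully observable for brevity, and all states, costs, and energy deltas are invented), energy can be folded into the state so that any step that drives it to zero becomes infeasible:

```python
# Minimal sketch: folding a hard energy constraint into the state space
# of a small deterministic MDP via value iteration over (state, energy).
INF = float("inf")

def solve_energy_mdp(n_states, target, max_energy, actions, iters=200):
    """`actions` maps state -> list of (next_state, cost, energy_delta).

    Energy dropping to 0 or below before the target makes a run
    infeasible (value = infinity), encoding the hard constraint.
    """
    V = {(s, e): (0.0 if s == target else INF)
         for s in range(n_states) for e in range(max_energy + 1)}
    for _ in range(iters):
        for s in range(n_states):
            if s == target:
                continue
            for e in range(1, max_energy + 1):
                best = INF
                for (s2, cost, de) in actions[s]:
                    e2 = min(max_energy, e + de)
                    if e2 <= 0:  # would violate the energy constraint
                        continue
                    if V[(s2, e2)] < INF:
                        best = min(best, cost + V[(s2, e2)])
                V[(s, e)] = best
    return V

# Tiny 3-state chain: 0 -> 1 -> 2 (target); each hop drains 1 energy,
# and state 1 has a costly "recharge" self-loop that restores energy.
acts = {
    0: [(1, 1, -1)],
    1: [(2, 1, -1), (1, 2, +2)],  # move on, or recharge at extra cost
}
V = solve_energy_mdp(3, target=2, max_energy=3, actions=acts)
print(V[(0, 3)])  # ample energy: plain shortest path, total cost 2.0
print(V[(0, 1)])  # too little energy to ever reach the target: inf
```

Treating energy as part of the state is what makes the hard constraint expressible in a standard dynamic-programming solver; the paper's contribution is doing this in the partially observable setting.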
Decentralized Task and Path Planning for Multi-Robot Systems
We consider a multi-robot system with a team of collaborative robots and multiple tasks that emerge over time. We propose a fully decentralized task and path planning (DTPP) framework consisting of a task allocation module and a localized path planning module. Each task is modeled as a Markov Decision Process (MDP) or a Mixed Observability Markov Decision Process (MOMDP), depending on whether full states or only partial states are observable. The task allocation module then aims at maximizing the expected pure reward (reward minus cost) of the robotic team. We fuse the Markov model into a factor graph formulation so that the task allocation can be solved in a decentralized manner using the max-sum algorithm. Each robot agent follows the optimal policy synthesized for the Markov model, and we propose a localized forward dynamic programming scheme that resolves conflicts between agents and avoids collisions. The proposed framework is demonstrated with high-fidelity ROS simulations and experiments with multiple ground robots.
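The max-sum step at the heart of the task allocation idea can be illustrated on a toy instance; the two-robot setup and payoff numbers below are invented for the example and are not the DTPP implementation:

```python
# Sketch of max-sum on a one-factor graph: a single task shared by two
# robots, each deciding locally whether to take it (1) or not (0).
def team_reward(x1, x2):
    # Task pays 10 if at least one robot takes it; travel costs 3 for
    # robot 1 and 7 for robot 2; a second taker adds cost but no reward.
    reward = 10 if (x1 or x2) else 0
    return reward - 3 * x1 - 7 * x2

# Factor-to-variable max-sum messages: for each value of my variable,
# the best total reward achievable over the other robot's choices.
msg_to_r1 = {x1: max(team_reward(x1, x2) for x2 in (0, 1)) for x1 in (0, 1)}
msg_to_r2 = {x2: max(team_reward(x1, x2) for x1 in (0, 1)) for x2 in (0, 1)}

# Each robot decides locally from its incoming message alone.
choice_r1 = max((0, 1), key=lambda x: msg_to_r1[x])
choice_r2 = max((0, 1), key=lambda x: msg_to_r2[x])
print(choice_r1, choice_r2)  # the cheaper robot takes the task: 1 0
```

On a tree-structured factor graph (as here, with one factor) max-sum is exact, so the local argmax decisions recover the team-optimal assignment without any central coordinator.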
Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information.
Comment: Appeared in AAAI 201
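The unconstrained building block that CFR iterates, regret matching, can be sketched on a one-shot game; rock-paper-scissors and the self-play loop below are illustrative assumptions, not the paper's constrained algorithm:

```python
# Sketch of regret matching, the per-node update that CFR iterates.
# The paper's contribution is a generalization of CFR that handles
# convex strategy constraints; this is the plain unconstrained version.
import random

PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player vs column

def strategy(regrets):
    # Play in proportion to positive cumulative regret.
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1 / 3] * 3

def train(iters=20000, seed=0):
    rng = random.Random(seed)
    regrets = [0.0, 0.0, 0.0]
    strat_sum = [0.0, 0.0, 0.0]
    for _ in range(iters):
        sigma = strategy(regrets)
        for a in range(3):
            strat_sum[a] += sigma[a]
        # Symmetric self-play: both actions sampled from sigma.
        my = rng.choices(range(3), weights=sigma)[0]
        opp = rng.choices(range(3), weights=sigma)[0]
        for a in range(3):
            regrets[a] += PAYOFF[a][opp] - PAYOFF[my][opp]
    total = sum(strat_sum)
    return [x / total for x in strat_sum]

avg = train()
print(avg)  # average strategy approaches the uniform Nash equilibrium
```

It is the time-averaged strategy, not the last iterate, that converges to equilibrium; the same averaging argument underlies CFR's guarantees in extensive-form games.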
Prediction-Loss Optimization in Perturbation-Based Probabilistic Models Using Multidimensional Parametric Min-Cut
Master's thesis, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2015 (advisor: 정교민).
We consider the problem of learning perturbation-based probabilistic models
by computing and differentiating expected losses. This is a challenging computational
problem that has traditionally been tackled using Monte Carlo-based
methods. In this work, we show how a generalization of parametric min-cuts
can be used to address the same problem, achieving high accuracy faster than
a sampling-based baseline. Using our proposed Skeleton Method, we show
that we can learn the perturbation model so as to directly minimize expected
losses. Experimental results show that this approach offers promise as a new
way of training structured prediction models under complex loss functions.
Table of contents:
Abstract
Chapter 1 Introduction
Chapter 2 Background: Perturbations, Expected Losses
Chapter 3 Algorithm: Skeleton Method
  3.1 Initialization
  3.2 Finding a New Facet
  3.3 Updating the Skeleton GY
  3.4 Calculating Expected Loss R
  3.5 Example: Two Parameters
Chapter 4 Learning
  4.1 Computing Gradients: Slicing
  4.2 Training
  4.3 Exploiting the Skeleton Method
Chapter 5 Experiments and Discussion
  5.1 Data and Setup
  5.2 Calculating Expected Losses
  5.3 Calculating Gradients
  5.4 Model Learning
    5.4.1 Learning
    5.4.2 Other Loss Functions
  5.5 Expected Segmentations
Chapter 6 Conclusion
Bibliography
Abstract (in Korean)
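The sampling-based baseline that the thesis abstract compares against can be sketched as a perturb-and-MAP Monte Carlo estimate of an expected loss; the independent-pixel model, scores, and Hamming loss below are illustrative assumptions, not the Skeleton Method itself:

```python
# Sketch of the Monte Carlo baseline for expected loss under Gumbel
# perturbations: sample noise, take the MAP labeling, average the loss.
import math
import random

def gumbel(rng):
    # Standard Gumbel sample via inverse transform.
    return -math.log(-math.log(rng.random()))

def mc_expected_loss(scores, truth, n_samples=5000, seed=0):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Perturb each pixel's on/off scores and take the argmax (MAP).
        pred = [1 if s + gumbel(rng) > 0 + gumbel(rng) else 0
                for s in scores]
        # Normalized Hamming loss against the ground-truth labeling.
        total += sum(p != t for p, t in zip(pred, truth)) / len(truth)
    return total / n_samples

scores = [2.0, -2.0, 0.0]   # strongly on, strongly off, uncertain
truth = [1, 0, 1]
loss = mc_expected_loss(scores, truth)
print(loss)
```

The estimate's accuracy improves only as the square root of the sample count, which is the cost the thesis's parametric min-cut approach aims to avoid by computing the expectation exactly over regions of the perturbation space.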