Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information. Comment: Appeared in AAAI 201
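The abstract does not spell out the constrained algorithm. For background only, here is a minimal sketch of regret matching, the per-information-set update used by standard (unconstrained) Counterfactual Regret Minimization. In a one-shot matrix game each player has a single information set, so CFR reduces to this update. The skewed rock-paper-scissors payoffs, the function names and the iteration count are hypothetical choices for illustration, and the strategy constraints that the paper adds are not modelled here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def regret_matching(cum_regret: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret yet: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def self_play(A: Matrix, iterations: int) -> Tuple[List[float], List[float]]:
    """Regret-matching self-play; the row player maximises A, the column player minimises it."""
    n, m = len(A), len(A[0])
    reg_x, reg_y = [0.0] * n, [0.0] * m
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(iterations):
        x, y = regret_matching(reg_x), regret_matching(reg_y)
        # Expected payoff of each pure action against the opponent's current strategy.
        u_x = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
        u_y = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
        v_x = sum(p * u for p, u in zip(x, u_x))
        v_y = sum(p * u for p, u in zip(y, u_y))
        # Accumulate instantaneous regrets and the running average strategies.
        for i in range(n):
            reg_x[i] += u_x[i] - v_x
            avg_x[i] += x[i] / iterations
        for j in range(m):
            reg_y[j] += u_y[j] - v_y
            avg_y[j] += y[j] / iterations
    return avg_x, avg_y


def exploitability(A: Matrix, x: List[float], y: List[float]) -> float:
    """Duality gap max_x' u(x', y) - min_y' u(x, y'); it is zero exactly at a Nash equilibrium."""
    best_vs_y = max(sum(A[i][j] * y[j] for j in range(len(A[0]))) for i in range(len(A)))
    worst_for_x = min(sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0])))
    return best_vs_y - worst_for_x


# Hypothetical skewed rock-paper-scissors (antisymmetric, so the game value is 0).
A = [[0.0, -1.0, 2.0], [1.0, 0.0, -1.0], [-2.0, 1.0, 0.0]]
x_bar, y_bar = self_play(A, 30_000)
print([round(p, 3) for p in x_bar], exploitability(A, x_bar, y_bar))
```

For this payoff matrix the unique equilibrium is (1/4, 1/2, 1/4) for both players, so the printed averages should approach it. The current strategies themselves may cycle; it is the averages that converge, because when both players' average regret is at most ε the average profile has a duality gap of at most 2ε.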
Fast Damage Recovery in Robotics with the T-Resilience Algorithm
Damage recovery is critical for autonomous robots that need to operate for a
long time without assistance. Most current methods are complex and costly
because they require anticipating every possible form of damage in order to have a
contingency plan ready. As an alternative, we introduce the T-Resilience
algorithm, which allows robots to quickly and autonomously
discover compensatory behaviors in unanticipated situations. This algorithm
equips the robot with a self-model and discovers new behaviors by learning to
avoid those that perform differently in the self-model and in reality. Our
algorithm thus does not identify the damaged parts; instead, it implicitly searches
for efficient behaviors that do not use them. We evaluate the T-Resilience
algorithm on a hexapod robot that needs to adapt to leg removal, broken legs
and motor failures; we compare it to stochastic local search, policy gradient
and the self-modeling algorithm proposed by Bongard et al. The behavior of the
robot is assessed on-board using an RGB-D sensor and a SLAM algorithm. Using
only 25 tests on the robot and an overall running time of 20 minutes,
T-Resilience consistently leads to substantially better results than the other
approaches.
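The abstract describes the core idea only at a high level: prefer behaviors that look good in the self-model but avoid those whose simulated and real performance are likely to disagree. The toy sketch below illustrates that idea under explicit assumptions and is not the authors' implementation. Behaviors are hypothetical integer parameters, a 1-nearest-neighbour predictor stands in for whatever discrepancy model the paper uses, and the simulator, the damage model, the test budget and the threshold are all made up for illustration.

```python
from typing import Callable, Dict, List, Tuple


def predict_discrepancy(b: int, tested: Dict[int, float]) -> float:
    """Hypothetical 1-NN guess of |sim - real| for behavior b, from its closest tested behavior."""
    if not tested:
        return 0.0  # optimistic prior: trust the self-model until reality contradicts it
    nearest = min(tested, key=lambda t: (abs(b - t), t))  # ties go to the smaller parameter
    return tested[nearest]


def t_resilience_sketch(
    candidates: List[int],
    sim_perf: Callable[[int], float],
    real_perf: Callable[[int], float],
    budget: int,
    threshold: float,
) -> Tuple[int, float, List[int]]:
    tested: Dict[int, float] = {}  # behavior -> observed self-model/reality discrepancy
    real: Dict[int, float] = {}    # behavior -> performance measured on the (damaged) robot
    for _ in range(min(budget, len(candidates))):
        untested = [b for b in candidates if b not in tested]
        trusted = [b for b in untested if predict_discrepancy(b, tested) <= threshold]
        if trusted:
            # Best simulated behavior among those predicted to transfer to reality.
            choice = max(trusted, key=lambda b: (sim_perf(b), -b))
        else:
            # Nothing looks trustworthy: explore the behavior farthest from all tests so far.
            choice = max(untested, key=lambda b: (min((abs(b - t) for t in tested), default=0), -b))
        measured = real_perf(choice)  # one (expensive) test on the real robot
        real[choice] = measured
        tested[choice] = abs(sim_perf(choice) - measured)
    best = max(real, key=lambda b: (real[b], -b))
    return best, real[best], list(real)


def sim_perf(b: int) -> float:
    return float(b)  # hypothetical self-model: larger parameter means a faster gait


def real_perf(b: int) -> float:
    return float(b) if b <= 10 else 0.1 * b  # hypothetical damage: gaits above 10 rely on a broken leg


best, perf, history = t_resilience_sketch(list(range(21)), sim_perf, real_perf, budget=8, threshold=1.0)
print(best, perf, history)
```

In this toy setup the first real test is the behavior that is best in the self-model, and it performs poorly on the damaged robot. That single discrepancy makes its neighbours look untrustworthy, so later tests are steered toward behaviors that transfer, and the best real performance comes from a behavior that avoids the hypothetical broken leg.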