An Optimal Online Method of Selecting Source Policies for Reinforcement Learning
Transfer learning significantly accelerates the reinforcement learning
process by exploiting relevant knowledge from previous experiences. The problem
of optimally selecting source policies during the learning process is of great
importance yet challenging. There has been little theoretical analysis of this
problem. In this paper, we develop an optimal online method to select source
policies for reinforcement learning. This method formulates online source
policy selection as a multi-armed bandit problem and augments Q-learning with
policy reuse. We provide theoretical guarantees of the optimal selection
process and convergence to the optimal policy. In addition, we conduct
experiments on a grid-based robot navigation domain to demonstrate its
efficiency and robustness by comparing it with the state-of-the-art transfer
learning method.
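
The bandit framing mentioned in the abstract above can be illustrated with a small sketch. The abstract does not say which bandit algorithm the authors use, so this example substitutes the standard UCB1 rule as an assumption. Each arm stands for one candidate source policy, and the reward is a hypothetical stand-in for the return of an episode run with that policy. The Q-learning and policy-reuse parts of the method are not shown. The names `ucb1_select`, `update`, and `run_bandit` are illustrative and do not come from the paper.

```python
import math
import random


def ucb1_select(counts, mean_returns, c=2.0):
    """Return the index of the arm (source policy) to try next under UCB1."""
    # Try every arm once before relying on confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Score = empirical mean + exploration bonus that shrinks with more pulls.
    scores = [
        mean + math.sqrt(c * math.log(total) / n)
        for mean, n in zip(mean_returns, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts, mean_returns, arm, episode_return):
    """Incrementally update the running mean return of the chosen arm."""
    counts[arm] += 1
    mean_returns[arm] += (episode_return - mean_returns[arm]) / counts[arm]


def run_bandit(true_means, episodes=2000, seed=0):
    """Simulate selection among source policies with assumed success rates."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    mean_returns = [0.0] * len(true_means)
    for _ in range(episodes):
        arm = ucb1_select(counts, mean_returns)
        # Hypothetical episode return: a Bernoulli reward in {0, 1}.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        update(counts, mean_returns, arm, reward)
    return counts


if __name__ == "__main__":
    # Three hypothetical source policies; the third is the most useful.
    print(run_bandit([0.2, 0.5, 0.8]))
```

In this toy setting the selector concentrates its pulls on the arm with the highest average return while still sampling the others occasionally, which is the exploration and exploitation trade-off that the bandit formulation is meant to capture.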
Solving Large Extensive-Form Games with Strategy Constraints
Extensive-form games are a common model for multiagent interactions with
imperfect information. In two-player zero-sum games, the typical solution
concept is a Nash equilibrium over the unconstrained strategy set for each
player. In many situations, however, we would like to constrain the set of
possible strategies. For example, constraints are a natural way to model
limited resources, risk mitigation, safety, consistency with past observations
of behavior, or other secondary objectives for an agent. In small games,
optimal strategies under linear constraints can be found by solving a linear
program; however, state-of-the-art algorithms for solving large games cannot
handle general constraints. In this work we introduce a generalized form of
Counterfactual Regret Minimization that provably finds optimal strategies under
any feasible set of convex constraints. We demonstrate the effectiveness of our
algorithm for finding strategies that mitigate risk in security games, and for
opponent modeling in poker games when given only partial observations of
private information.
Comment: Appeared in AAAI 201