Learning a Behavioral Repertoire from Demonstrations
Imitation Learning (IL) is a machine learning approach to learn a policy from a set of demonstrations. IL can be useful to kick-start learning before applying reinforcement learning (RL), but it can also be useful on its own, e.g. to learn to imitate human players in video games. Despite the success of systems that use IL and RL, how such systems can adapt between game rounds is a neglected area of study but an important aspect of many strategy games. In this paper, we present a new approach called Behavioral Repertoire Imitation Learning (BRIL) that learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human demonstrations for the build-order planning task in StarCraft II. Dimensionality reduction is applied to construct a low-dimensional behavioral space from a high-dimensional description of the army unit composition of each human replay. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, the policy can adapt its behavior between games to reach a performance beyond that of the traditional IL baseline approach.
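The between-game adaptation mentioned in the abstract can be illustrated with a minimal UCB1 sketch. This is not the authors' implementation. It assumes the behavioral space has been reduced to a small set of candidate behaviors ("arms") and that the reward is 1 for a win and 0 for a loss. The names `ucb1_select`, `simulate`, and `win_probs` are hypothetical, and the simulated win probabilities stand in for real games.

```python
import math
import random


def ucb1_select(counts, totals):
    """Pick the next behavior index under UCB1.

    counts[i] is how many games behavior i has been used in,
    totals[i] is the summed reward (wins) obtained with it.
    """
    # Try every behavior once before using the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    scores = [
        totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def simulate(win_probs, n_games, seed=0):
    """Hypothetical stand-in for playing games with a conditioned policy.

    win_probs[i] is an assumed, hidden win rate for behavior i.
    Returns how often each behavior was chosen.
    """
    rng = random.Random(seed)
    counts = [0] * len(win_probs)
    totals = [0.0] * len(win_probs)
    for _ in range(n_games):
        arm = ucb1_select(counts, totals)
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    print(simulate([0.2, 0.5, 0.8], n_games=500))
```

In a setting like the one described, the chosen index would map to a point in the low-dimensional behavioral space, and that point would be fed to the policy as its behavior description for the next game.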
The Effect of Bidirectional and Unidirectional Naming on Learning in New Ways and the Relation Between Bidirectional Naming and Basic Relational Concepts for Preschool Students
Bidirectional Naming (BiN) is the reliable demonstration of incidentally learned word-object relations as both a listener and speaker. In Experiment I, a pilot study, I tested the effects of the establishment of BiN on the rate of learning new math and reading operants under baseline Standard Learn Unit (SLU) and Instructional Demonstration Learn Unit (IDLU) conditions. I conducted a combined multiple probe and counterbalanced ABAB/BABA reversal design across participant dyads, for which each participant’s rate of acquisition was compared under the IDLU and SLU conditions before and after the acquisition of BiN. Four participants diagnosed with developmental delays were selected for the study due to the assessed absence of both the listener and speaker components of the BiN capability. Intensive Tact Instruction (ITI) and Multiple Exemplar Instruction (MEI) were used to establish BiN. After the acquisition of BiN, all four participants demonstrated accelerated rates of learning reading and math objectives when provided the opportunity to observe a model (via IDLU instruction) prior to an instructional session, indicating a functional relation between the acquisition of BiN and the acceleration of learning via teacher-modeled instruction. In Experiment II, a demonstration study, 5 preschool students with a disability were selected following BiN probe trials and were grouped according to their BiN repertoires. A combined ABAB/BABA reversal design across learning objectives and BiN level was used to compare the rate of learning new speaker (i.e., tact) and listener (i.e., point-to) tasks across SLU and IDLU conditions. 
Results replicated previous findings wherein students with BiN in repertoire learned at an accelerated rate when provided IDLU instruction as compared to SLU instruction; further, participants with only the listener component of Naming (Unidirectional Naming; UniN) displayed accelerated learning under IDLU conditions for listener tasks, but not for speaker tasks. Results across both Experiments I and II indicate that students’ acquisition of the BiN capability (joint stimulus control across speaking and listening) is an essential verbal developmental capability for learning through the observation of a model in a standard classroom instructional setting. In Experiment III, a group correlational design was used to analyze the relation between students’ BiN scores and performance on the Boehm Test of Basic Concepts 3rd Edition – Preschool Version (BTBC3-P) (Boehm, 2001). Results demonstrated that a significant positive correlation exists between BiN and BTBC3-P assessment scores (p (42) = .341, p = .027). These data indicate that a student’s degree of BiN is a potential predictor of success on measures of basic concept knowledge, adding to findings from Experiments I and II that BiN is functionally related to learning at an accelerated rate and via observation.