17 research outputs found
Learning to Reason in Round-based Games: Multi-task Sequence Generation for Purchasing Decision Making in First-person Shooters
Sequential reasoning is a complex human ability, with extensive previous
research focusing on gaming AI in a single continuous game, round-based
decision makings extending to a sequence of games remain less explored.
Counter-Strike: Global Offensive (CS:GO), as a round-based game with abundant
expert demonstrations, provides an excellent environment for multi-player
round-based sequential reasoning. In this work, we propose a Sequence Reasoner
with Round Attribute Encoder and Multi-Task Decoder to interpret the strategies
behind the round-based purchasing decisions. We adopt few-shot learning to
sample multiple rounds in a match, and modified model agnostic meta-learning
algorithm Reptile for the meta-learning loop. We formulate each round as a
multi-task sequence generation problem. Our state representations combine
action encoder, team encoder, player features, round attribute encoder, and
economy encoders to help our agent learn to reason under this specific
multi-player round-based scenario. A complete ablation study and comparison
with the greedy approach certify the effectiveness of our model. Our research
will open doors for interpretable AI for understanding episodic and long-term
purchasing strategies beyond the gaming community.Comment: 16th AAAI Conference on Artificial Intelligence and Interactive
Digital Entertainment (AIIDE-20
Parallelized Interactive Machine Learning on Autonomous Vehicles
Deep reinforcement learning (deep RL) has achieved superior performance in
complex sequential tasks by learning directly from image input. A deep neural
network is used as a function approximator and requires no specific state
information. However, one drawback of using only images as input is that this
approach requires a prohibitively large amount of training time and data for
the model to learn the state feature representation and approach reasonable
performance. This is not feasible in real-world applications, especially when
the data are expansive and training phase could introduce disasters that affect
human safety. In this work, we use a human demonstration approach to speed up
training for learning features and use the resulting pre-trained model to
replace the neural network in the deep RL Deep Q-Network (DQN), followed by
human interaction to further refine the model. We empirically evaluate our
approach by using only a human demonstration model and modified DQN with human
demonstration model included in the Microsoft AirSim car simulator. Our results
show that (1) pre-training with human demonstration in a supervised learning
approach is better and much faster at discovering features than DQN alone, (2)
initializing the DQN with a pre-trained model provides a significant
improvement in training time and performance even with limited human
demonstration, and (3) providing the ability for humans to supply suggestions
during DQN training can speed up the network's convergence on an optimal
policy, as well as allow it to learn more complex policies that are harder to
discover by random exploration.Comment: 6 pages, NAECON 2018 - IEEE National Aerospace and Electronics
Conferenc