2 research outputs found

Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation

Author: Alt, Jonathan K.
Publication venue: Monterey, California. Naval Postgraduate School
Publication date: 01/09/2012

Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of AI learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning speed in both benchmark environments and in challenging applied settings. Applications of reinforcement learning in the verification of the reward structure of a training simulation, the improvement in the performance of a discrete event simulation scheduling tool, and in enabling adaptive decision-making in combat simulation are presented. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate that the potential for the use of reinforcement learning within modeling and simulation is great.

http://archive.org/details/learningfromnois1094517313
Lieutenant Colonel, United States Army
Approved for public release; distribution is unlimited

Calhoun, Institutional Archive of the Naval Postgraduate School
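
The abstract above mentions an action-value estimator based on the exponentially weighted average reward. As a rough illustration only, the sketch below shows the standard constant-step-size update, Q(a) <- Q(a) + alpha * (r - Q(a)), on a noisy multi-armed bandit. It is not the thesis's algorithm: the class name `EWAEstimator`, the helper `run_bandit`, the epsilon-greedy selection rule, and all parameter values are assumptions made for this example.

```python
import random


class EWAEstimator:
    """Action values kept as exponentially weighted averages of rewards.

    Hypothetical helper for illustration; not taken from the thesis.
    """

    def __init__(self, n_actions, alpha=0.1, initial=0.0):
        self.alpha = alpha              # constant step size = recency weight
        self.q = [initial] * n_actions  # one estimate per action

    def update(self, action, reward):
        # Move the estimate a fraction alpha toward the new reward, so a
        # reward seen k updates ago carries weight alpha * (1 - alpha)**k.
        self.q[action] += self.alpha * (reward - self.q[action])

    def greedy(self):
        # Index of the highest estimate (first index wins ties).
        return max(range(len(self.q)), key=self.q.__getitem__)


def run_bandit(means, steps=2000, alpha=0.1, epsilon=0.1, noise=1.0, seed=0):
    """Epsilon-greedy agent on a Gaussian bandit (assumed test environment)."""
    rng = random.Random(seed)
    est = EWAEstimator(len(means), alpha)
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(means))  # explore
        else:
            action = est.greedy()               # exploit
        reward = rng.gauss(means[action], noise)  # noisy feedback
        est.update(action, reward)
    return est
```

Calling `run_bandit([0.0, 5.0], noise=0.1)` returns an estimator whose per-action values can be read from `est.q`. A constant `alpha` keeps the estimate responsive to recent rewards, at the cost of never fully averaging out the noise.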

Balancing exploration and exploitation in agent learning

Author: Ozcan, Ozkan
Publication venue: Monterey, California. Naval Postgraduate School
Publication date: 01/09/2011

Controlling the ratio of exploration to exploitation in agent learning in dynamic environments is a continuing challenge in applying agent-learning techniques. Methods to control this ratio in a manner that mimics human behavior are required for use in the representation of human behavior in simulations, where the goal is to constrain agent-learning mechanisms in a manner similar to that observed in human cognition. The Cultural Geography (CG) model, under development at TRAC Monterey, is an agent-based social simulation. It simulates a wide variety of situations and scenarios, so a dynamic ratio between exploration and exploitation makes the agents' decisions more sensible. As part of an attempt to improve the model, this thesis investigates enhancements to the exploration-exploitation balance by using different techniques. The work includes design of experiments with a range of factors in multiple environments and statistical analysis of these experiments. The main finding is that, for small environments and short runs, techniques based on subjective utility give better results, while for long runs, techniques based on time obtain higher utilities than other techniques. In larger and more complex environments, a combined technique performed better in long runs.

http://archive.org/details/balancingexplora109455468
Approved for public release; distribution is unlimited

Calhoun, Institutional Archive of the Naval Postgraduate School
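
The second abstract contrasts utility-based, time-based, and combined ways of setting the exploration rate, without giving formulas. The sketch below is only a guess at what such schedules might look like for an epsilon-greedy agent. Every function name, formula, and default value here is hypothetical and is not drawn from the thesis or from the Cultural Geography model.

```python
import math
import random


def epsilon_time(step, eps0=1.0, decay=0.01, eps_min=0.01):
    """Time-based schedule (hypothetical): explore less as steps accumulate."""
    return max(eps_min, eps0 * math.exp(-decay * step))


def epsilon_utility(best_value, target, eps_min=0.01):
    """Utility-based schedule (hypothetical): explore more while the best
    estimated utility falls short of a target level."""
    if target <= 0:
        return eps_min
    shortfall = max(0.0, 1.0 - best_value / target)
    return max(eps_min, min(1.0, shortfall))


def epsilon_combined(step, best_value, target):
    """Combined schedule (hypothetical): the smaller of the two rates, so
    exploration stops early once either criterion is satisfied."""
    return min(epsilon_time(step), epsilon_utility(best_value, target))


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice: random action with probability epsilon,
    otherwise the action with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

In use, an agent would recompute epsilon at each step, for example `choose_action(q, epsilon_combined(t, max(q), target))`, where `q`, `t`, and `target` are the caller's own value estimates, step counter, and utility goal.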