7 research outputs found

    Training Intelligent Red Team Agents Via Reinforcement Deep Learning

    NPS NRP Technical Report. Wargames are an essential tool for education, training, and the formulation of strategy. They are especially important in evaluating threats from, and strategies against, trained adversaries who present significant risk to friendly forces. We propose to develop a wargame adversary trained to defeat the current strategy of friendly forces, thereby allowing the evaluation of alternate strategies against an intelligent, simulated opponent. We will investigate the use of deep neural network (DNN) algorithms to solve a constrained stochastic reward-collecting path problem. Agents from a friendly (blue) team and an adversarial (red) team will be placed within a discrete environment. The blue team will be challenged to obtain a reward by achieving a fixed goal using a pre-determined strategy. Then, reinforcement learning will be used to train the red team to overcome the blue team's current strategy. Having thus trained a competent red team, the blue team's strategy can be altered to evaluate the efficacy of new strategies. This research will evaluate the ability of different DNN algorithms to train the red team against various blue team strategies, in terms of both efficacy and efficiency, as well as the resiliency of the trained red team to subsequent changes in blue team strategy. We anticipate the results of this research to be summarized in a research poster and executive summary, in addition to a presentation and full technical report deliverable to the Topic Sponsor. Sponsor: Marine Corps Systems Command (MARCORSYSCOM). This research is supported by funding from the Naval Postgraduate School, Naval Research Program (PE 0605853N/2098). https://nps.edu/nrp Chief of Naval Operations (CNO). Approved for public release. Distribution is unlimited.
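    For a rough sense of the proposed setup, the sketch below places a scripted blue agent and a learning red agent in a small grid. It is an illustrative assumption, not the report's actual model: tabular Q-learning stands in for the DNN-based methods the report proposes, and the grid size, rewards, and blue strategy are all invented for the example.

    # Hypothetical sketch: a tiny grid "wargame" where a blue agent follows a fixed
    # scripted strategy toward a goal and a red agent is trained to intercept it.
    # Tabular Q-learning stands in for the proposed DNN algorithms; all parameters
    # are illustrative.
    import random
    from collections import defaultdict

    SIZE = 5                      # 5x5 discrete environment
    GOAL = (4, 4)                 # blue's fixed objective
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

    def step(pos, action):
        """Move within the grid, clamping at the borders."""
        x, y = pos
        dx, dy = action
        return (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))

    def blue_policy(pos):
        """Pre-determined blue strategy: greedy move toward the goal."""
        x, y = pos
        if x < GOAL[0]:
            return (1, 0)
        if y < GOAL[1]:
            return (0, 1)
        return (0, 0)

    Q = defaultdict(float)        # red's action values, keyed by (state, action)
    alpha, gamma, eps = 0.1, 0.95, 0.1

    for episode in range(5000):
        blue, red = (0, 0), (4, 0)
        for t in range(30):
            state = (blue, red)
            # epsilon-greedy red action
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(state, act)])
            blue = step(blue, blue_policy(blue))
            red = step(red, a)
            # red is rewarded for interception, penalised if blue reaches the goal
            if red == blue:
                r, done = 1.0, True
            elif blue == GOAL:
                r, done = -1.0, True
            else:
                r, done = -0.01, False
            next_state = (blue, red)
            best_next = max(Q[(next_state, act)] for act in ACTIONS)
            Q[(state, a)] += alpha * (r + gamma * (0.0 if done else best_next) - Q[(state, a)])
            if done:
                break

    Once the red agent is trained against this fixed blue strategy, blue_policy can be swapped for an alternate strategy and the red agent's performance re-measured, which mirrors the evaluation loop the report describes.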

    Accelerating Deep Reinforcement Learning via Action Advising

    Deep Reinforcement Learning (RL) algorithms can successfully solve complex sequential decision-making tasks. However, they suffer from the major drawbacks of poor sample efficiency and long training times, which can often be tackled by knowledge reuse. Action advising is a promising knowledge exchange mechanism that adopts the teacher-student paradigm to leverage legacy knowledge through a budget-limited number of interactions, in the form of action advice, between peers. In this thesis, we studied action advising techniques, particularly in the Deep RL domain, in both single-agent and multi-agent scenarios. We proposed a heuristic-based, jointly-initiated action advising method suitable for the multi-agent Deep RL setting, for the first time in the literature. By adopting Random Network Distillation (RND), we devised a measurement for agents to assess their confidence in any given state and initiate the teacher-student dynamics with no prior role assumptions. We also used RND as an advice novelty metric to construct more robust student-initiated advice query strategies in single-agent Deep RL. Moreover, we addressed the absence of advice utilisation mechanisms beyond collection by employing a behavioural cloning module to imitate the teacher's advice. We also proposed a method to automatically tune the relevant hyperparameters of these components on the fly, making our action advising algorithms capable of adapting to any domain with minimal human intervention. Finally, we extended our advice reuse via imitation technique to construct a unified student-initiated approach that addresses both the advice collection and advice utilisation problems. The experiments we conducted in a range of Deep RL domains showed that our proposals provide significant contributions. Our Deep RL-compatible action advising techniques achieved a state-of-the-art level of performance. Furthermore, we demonstrated that their practical attributes make domain adaptation and implementation straightforward, which is an important step towards applying action advising to real-world problems.
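    As a rough illustration of the RND-based confidence measure and budget-limited, student-initiated advice querying described above, the sketch below is a minimal, assumption-laden example rather than the thesis's implementation: the network sizes, novelty threshold, and advice budget are all hypothetical choices.

    # Hypothetical sketch of student-initiated action advising with an RND-based
    # uncertainty signal. All sizes and thresholds are illustrative.
    import torch
    import torch.nn as nn

    class RND(nn.Module):
        """Random Network Distillation: the predictor's error against a fixed
        random target network serves as a state-novelty measure."""
        def __init__(self, obs_dim, feat_dim=64):
            super().__init__()
            self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                        nn.Linear(128, feat_dim))
            self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                           nn.Linear(128, feat_dim))
            for p in self.target.parameters():      # target stays fixed
                p.requires_grad_(False)
            self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

        def novelty(self, obs):
            with torch.no_grad():
                return (self.predictor(obs) - self.target(obs)).pow(2).mean().item()

        def update(self, obs):
            loss = (self.predictor(obs) - self.target(obs)).pow(2).mean()
            self.opt.zero_grad()
            loss.backward()
            self.opt.step()

    def act(student_policy, teacher_policy, rnd, obs, budget, threshold=0.1):
        """Query the teacher only in unfamiliar states and while budget remains."""
        if budget > 0 and rnd.novelty(obs) > threshold:
            return teacher_policy(obs), budget - 1   # advice consumed
        return student_policy(obs), budget           # act on own policy

    Collected advice could additionally be stored and imitated by a behavioural cloning module, in the spirit of the advice reuse component the thesis describes, rather than being used only once at collection time.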

    Analysis of Statistical Forward Planning Methods in Pommerman

    Pommerman is a complex multi-player and partially observable game in which agents try to be the last one standing in order to win. The game poses very interesting challenges to AI, such as collaboration, learning, and planning. In this paper, we compare two Statistical Forward Planning algorithms, Monte Carlo Tree Search (MCTS) and the Rolling Horizon Evolutionary Algorithm (RHEA), in Pommerman. We provide insights into how the agents actually play the game, inspecting their behaviours to explain their performance. Results show that MCTS outperforms RHEA in several game settings, while leaving room for multiple avenues of future work: tuning these methods, improving opponent modelling, identifying trap moves, and introducing assumptions for partially observable settings.
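    To make the comparison concrete, the following is a minimal sketch of the rolling-horizon loop behind RHEA in its generic form, not the paper's Pommerman implementation: the forward_model and evaluate callables, and all parameters (horizon, population size, mutation rate), are hypothetical stand-ins.

    # Illustrative sketch of a Rolling Horizon Evolutionary Algorithm (RHEA).
    # forward_model(state, action) simulates one step; evaluate(state) scores a
    # reached state heuristically. Both are assumed, game-specific callables.
    import random

    def rhea_action(state, actions, forward_model, evaluate,
                    horizon=10, population=20, generations=5, mutation=0.3):
        """Evolve fixed-length action sequences; return the first action of the best."""
        pop = [[random.choice(actions) for _ in range(horizon)]
               for _ in range(population)]

        def fitness(seq):
            s = state
            for a in seq:
                s = forward_model(s, a)      # roll the sequence forward
            return evaluate(s)               # heuristic value of the final state

        for _ in range(generations):
            scored = sorted(pop, key=fitness, reverse=True)
            elite = scored[: population // 2]
            children = [[a if random.random() > mutation else random.choice(actions)
                         for a in parent]
                        for parent in elite]
            pop = elite + children

        best = max(pop, key=fitness)
        return best[0]                       # execute one action, then re-plan

    MCTS, by contrast, grows a search tree over the same forward model and balances exploration and exploitation per node; the paper's behavioural analysis compares how these two planning styles play out in practice.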