EFFECTIVENESS OF PROXIMAL POLICY OPTIMIZATION METHODS FOR NEURAL PROGRAM INDUCTION
The Neural Virtual Machine (NVM) is a novel neurocomputational architecture designed to emulate the functionality of a traditional computer. A version of the
NVM called NVM-RL supports reinforcement learning based on standard policy
gradient methods as a mechanism for performing neural program induction. In
this thesis, I modified NVM-RL to use one of the most popular reinforcement
learning algorithms, proximal policy optimization (PPO). Surprisingly, using PPO
with the existing all-or-nothing reward function did not improve NVM-RL's effectiveness.
However, I found that PPO did improve the performance of the existing NVM-RL
if one instead used a reward function that grants partial credit for incorrect outputs
based on how much those incorrect outputs differ from the correct targets. I
conclude that, in some situations, PPO can improve the performance of
reinforcement learning during program induction, but that this improvement is
dependent on the quality of the reward function that is used.
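The contrast between the two reward schemes can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names and the Hamming-style distance metric are assumptions chosen for clarity, and the real partial-credit function may measure output difference differently.

```python
# Hypothetical sketch of the two reward schemes contrasted in the abstract.
# Names and the distance metric are illustrative assumptions.

def all_or_nothing_reward(output, target):
    """Reward 1 only for an exact match with the target, else 0."""
    return 1.0 if output == target else 0.0

def partial_credit_reward(output, target):
    """Reward shrinks with how much the output differs from the target.

    Difference is measured here as the fraction of mismatched positions
    (a simple Hamming-style distance), one of many possible choices.
    """
    mismatches = sum(o != t for o, t in zip(output, target))
    mismatches += abs(len(output) - len(target))  # penalize length mismatch
    max_len = max(len(output), len(target)) or 1
    return 1.0 - mismatches / max_len

out, tgt = [1, 2, 4], [1, 2, 3]
print(all_or_nothing_reward(out, tgt))  # 0.0 — any error forfeits all reward
print(partial_credit_reward(out, tgt))  # ~0.667 — two of three symbols correct
```

Under the all-or-nothing scheme, a nearly correct program and a completely wrong one receive the same zero reward, giving the policy gradient no direction; partial credit restores that gradient signal, which is consistent with the abstract's finding that PPO only helped under the partial-credit reward.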