EFFECTIVENESS OF PROXIMAL POLICY OPTIMIZATION METHODS FOR NEURAL PROGRAM INDUCTION
The Neural Virtual Machine (NVM) is a novel neurocomputational architecture designed to emulate the functionality of a traditional computer. A version of the
NVM called NVM-RL supports reinforcement learning based on standard policy
gradient methods as a mechanism for performing neural program induction. In
this thesis, I modified NVM-RL to use one of the most popular reinforcement
learning algorithms, proximal policy optimization (PPO). Surprisingly, using PPO
with the existing all-or-nothing reward function did not improve NVM-RL's effectiveness.
However, I found that PPO did improve the performance of the existing NVM-RL
if one instead used a reward function that grants partial credit for incorrect outputs
based on how much those incorrect outputs differ from the correct targets. I
conclude that, in some situations, PPO can improve the performance of
reinforcement learning during program induction, but that this improvement is
dependent on the quality of the reward function that is used.
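The contrast between the two reward schemes can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the function names and the Hamming-style distance metric are assumptions chosen for clarity, and the real partial-credit function may measure output difference differently.

```python
# Hypothetical sketch of the two reward schemes contrasted in the abstract.
# Names and the distance metric are illustrative assumptions.

def all_or_nothing_reward(output, target):
    """Reward 1 only for an exact match with the target, else 0."""
    return 1.0 if output == target else 0.0

def partial_credit_reward(output, target):
    """Reward shrinks with how much the output differs from the target.

    Difference is measured here as the fraction of mismatched positions
    (a simple Hamming-style distance), one of many possible choices.
    """
    mismatches = sum(o != t for o, t in zip(output, target))
    mismatches += abs(len(output) - len(target))  # penalize length mismatch
    max_len = max(len(output), len(target)) or 1
    return 1.0 - mismatches / max_len

out, tgt = [1, 2, 4], [1, 2, 3]
print(all_or_nothing_reward(out, tgt))  # 0.0 — any error forfeits all reward
print(partial_credit_reward(out, tgt))  # ~0.667 — two of three symbols correct
```

Under the all-or-nothing scheme, a nearly correct program and a completely wrong one receive the same zero reward, giving the policy gradient no direction; partial credit restores that gradient signal, which is consistent with the abstract's finding that PPO only helped under the partial-credit reward.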