Open Vocabulary Learning on Source Code with a Graph-Structured Cache

Authors: Anima Anandkumar, Milan Cvitkovic, Badal Singh
Publication date: 19 May 2019

Abstract

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters, with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task, with over 100% relative improvement on the latter, at the cost of a moderate increase in computation time.

Comment: Published in the International Conference on Machine Learning (ICML 2019), 13 pages
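
As a rough illustration of the data structure the abstract describes (one node per word, with edges from each word to its occurrences in the code), the sketch below builds such a cache for a short snippet. It is not the authors' implementation: the function names `build_cache` and `cache_edges` are hypothetical, the regex tokenizer is a simplifying assumption that also matches words inside strings and comments, and the Graph-Neural-Network model that would consume the graph is omitted.

```python
import keyword
import re
from collections import defaultdict

# Assumption: a crude identifier pattern stands in for a real lexer or parser.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_cache(source):
    """Map each non-keyword word to the (line, column) positions where it occurs."""
    cache = defaultdict(list)
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in IDENTIFIER.finditer(line):
            word = match.group()
            if keyword.iskeyword(word):
                continue  # language keywords belong to a closed vocabulary
            cache[word].append((line_no, match.start()))
    return dict(cache)


def cache_edges(cache):
    """List one edge from each word node to each of its occurrence nodes."""
    edges = []
    for word, occurrences in cache.items():
        for position in occurrences:
            edges.append((("word", word), ("occurrence", position)))
    return edges


snippet = "total = 0\nfor item in items:\n    total = total + item\n"
cache = build_cache(snippet)
edges = cache_edges(cache)
```

Because every occurrence of `total` is linked to the same word node, a model passing messages over this graph could in principle relate those occurrences even if `total` never appeared in its training vocabulary; that is the intuition the sketch is meant to convey, under the assumptions stated above.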
