Open Vocabulary Learning on Source Code with a Graph-Structured Cache

Author: Anandkumar Anima
Cvitkovic Milan
Singh Badal
Publication date: 19/05/2019

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task (with over 100% relative improvement on the latter) at the cost of a moderate increase in computation time.

Comment: Published in the International Conference on Machine Learning (ICML 2019), 13 pages

arXiv.org e-Print Archive

Caltech Authors

Open Vocabulary Learning on Source Code with a Graph-Structured Cache

Author: Anandkumar Anima
Cvitkovic Milan
Singh Badal
Publication venue: PMLR
Publication date: 01/06/2019

Machine learning models that take computer program source code as input typically use Natural Language Processing (NLP) techniques. However, a major challenge is that code is written using an open, rapidly changing vocabulary due to, e.g., the coinage of new variable and method names. Reasoning over such a vocabulary is not something for which most NLP methods are designed. We introduce a Graph-Structured Cache to address this problem; this cache contains a node for each new word the model encounters with edges connecting each word to its occurrences in the code. We find that combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models’ performance on a code completion task and a variable naming task (with over 100% relative improvement on the latter) at the cost of a moderate increase in computation time.

Deep Learning in Unconventional Domains

Author: Cvitkovic Michael William (Milan)
Publication date: 01/01/2020

Machine learning methods have dramatically improved in recent years thanks to advances in deep learning (LeCun et al., 2015), a set of methods for training high-dimensional, highly-parameterized, nonlinear functions. Yet deep learning progress has been concentrated in the domains of computer vision, vision-based reinforcement learning, and natural language processing. This dissertation is an attempt to extend deep learning into domains where it has thus far had little impact or has never been applied. It presents new deep learning algorithms and state-of-the-art results on tasks in the domains of source-code analysis, relational databases, and tabular data.

Caltech Theses and Dissertations