Open Vocabulary Learning on Source Code with a Graph-Structured Cache
Machine learning models that take computer program source code as input
typically use Natural Language Processing (NLP) techniques. However, a major
challenge is that code is written using an open, rapidly changing vocabulary
due to, e.g., the coinage of new variable and method names. Reasoning over such
a vocabulary is not something for which most NLP methods are designed. We
introduce a Graph-Structured Cache to address this problem; this cache contains
a node for each new word the model encounters with edges connecting each word
to its occurrences in the code. We find that combining this graph-structured
cache strategy with recent Graph-Neural-Network-based models for supervised
learning on code improves the models' performance on a code completion task and
a variable naming task --- with over relative improvement on the latter
--- at the cost of a moderate increase in computation time.
Comment: Published in the International Conference on Machine Learning (ICML 2019), 13 pages
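The cache structure described above can be illustrated with a small sketch. It assumes that each token position stands in for an occurrence node in the program graph, and that only identifier-like tokens (as judged by Python's `str.isidentifier`) are cached. `build_graph_cache` is a hypothetical helper, not the authors' code, and it omits the graph neural network that would then pass messages along these edges.

```python
from typing import Dict, List, Tuple


def build_graph_cache(tokens: List[str]) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Hypothetical sketch of a graph-structured cache for one code snippet.

    Assumptions (not from the paper): token positions 0..len(tokens)-1 act as
    the occurrence nodes, and only identifier-like tokens get a cache node.
    Cache node ids start at len(tokens) so they never collide with positions.
    """
    cache_nodes: Dict[str, int] = {}   # word -> cache node id
    edges: List[Tuple[int, int]] = []  # (cache node id, occurrence node id)
    for position, word in enumerate(tokens):
        if not word.isidentifier():
            continue  # skip operators and punctuation in this toy version
        if word not in cache_nodes:
            # First time this word is seen: add one node for it to the cache.
            cache_nodes[word] = len(tokens) + len(cache_nodes)
        # Connect the word's cache node to this occurrence in the code.
        edges.append((cache_nodes[word], position))
    return cache_nodes, edges


tokens = ["total", "=", "total", "+", "itemCount"]
nodes, edges = build_graph_cache(tokens)
print(nodes)  # {'total': 5, 'itemCount': 6}
print(edges)  # [(5, 0), (5, 2), (6, 4)]
```

Because both occurrences of `total` link to the same cache node, a message-passing model could share information between them even if `total` never appeared in its training vocabulary.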
Code Completion with Neural Attention and Pointer Networks
Intelligent code completion has become an essential research task to
accelerate modern software development. To facilitate effective code completion
for dynamically-typed programming languages, we apply neural language models by
learning from large codebases, and develop a tailored attention mechanism for
code completion. However, standard neural language models, even with an attention
mechanism, cannot correctly predict out-of-vocabulary (OoV) words, which limits
code completion performance. In this paper, inspired by the
prevalence of locally repeated terms in program source code, and the recently
proposed pointer copy mechanism, we propose a pointer mixture network for
better predicting OoV words in code completion. Based on the context, the
pointer mixture network learns to either generate a within-vocabulary word
through an RNN component, or regenerate an OoV word from local context through
a pointer component. Experiments on two benchmark datasets demonstrate the
effectiveness of our attention mechanism and pointer mixture network on the
code completion task.
Comment: Accepted at IJCAI 201
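A minimal sketch of the mixture step, assuming the RNN's vocabulary distribution, the pointer's attention weights over recent context tokens, and the switch probability have already been computed by the network. Summing the copy mass per word, as done here, is a pointer-generator-style simplification; the paper's network may instead score vocabulary words and context positions jointly. `pointer_mixture` and the toy numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def pointer_mixture(
    vocab_probs: Dict[str, float],
    attention: List[float],
    context: List[str],
    switch: float,
) -> Dict[str, float]:
    """Blend a generator distribution with a pointer (copy) distribution.

    vocab_probs: assumed softmax output of the RNN over the fixed vocabulary.
    attention:   assumed pointer weights over positions of `context` (sum to 1).
    switch:      assumed probability of generating rather than copying.
    """
    if len(attention) != len(context):
        raise ValueError("attention and context must have the same length")
    mixed: Dict[str, float] = defaultdict(float)
    for word, p in vocab_probs.items():
        # Generator path: only in-vocabulary words receive mass here.
        mixed[word] += switch * p
    for word, a in zip(context, attention):
        # Pointer path: any context word, including OoV names, can be copied.
        mixed[word] += (1.0 - switch) * a
    return dict(mixed)


vocab = {"return": 0.6, "self": 0.3, "<unk>": 0.1}
context = ["user_id", "self", "user_id"]
probs = pointer_mixture(vocab, [0.5, 0.2, 0.3], context, switch=0.4)
print(max(probs, key=probs.get))  # user_id
```

Here the OoV identifier `user_id` wins (about 0.48 of the mass) purely through the pointer path, which is the situation the pointer component is meant to handle.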
Learning when to skim and when to read
Many recent advances in deep learning for natural language processing have
come at increasing computational cost, but the power of these state-of-the-art
models is not needed for every example in a dataset. We demonstrate two
approaches to reducing unnecessary computation in cases where a fast but weak
baseline classifier and a stronger, slower model are both available. Applying an
AUC-based metric to the task of sentiment classification, we find significant
efficiency gains with both a probability-threshold method for reducing
computational cost and one that uses a secondary decision network.
Comment: 8 pages (4 article, 1 references, 3 appendix), 11 figures, 3 tables,
published at ACL 2017 workshop Repl4NL
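The probability-threshold idea can be sketched as a two-model cascade for binary sentiment: accept the fast classifier's answer when its confidence clears a threshold, otherwise pay for the slow model. The confidence rule `max(p, 1 - p)`, the helper `skim_or_read`, and both toy classifiers are assumptions for illustration; the secondary decision network and the AUC-based evaluation are not shown.

```python
from typing import Callable, List, Tuple

# Assumed interface: a classifier maps text to P(positive sentiment).
Classifier = Callable[[str], float]


def skim_or_read(
    texts: List[str], fast: Classifier, slow: Classifier, threshold: float
) -> Tuple[List[int], int]:
    """Run the cheap model first; defer to the expensive one only when unsure."""
    labels: List[int] = []
    slow_calls = 0
    for text in texts:
        p = fast(text)               # "skim" with the weak baseline
        confidence = max(p, 1.0 - p)
        if confidence < threshold:
            p = slow(text)           # "read" with the strong model
            slow_calls += 1
        labels.append(int(p >= 0.5))
    return labels, slow_calls


def fast_bow(text: str) -> float:
    # Hypothetical keyword baseline standing in for a fast classifier.
    if "great" in text:
        return 0.95
    if "awful" in text:
        return 0.05
    return 0.5


def slow_model(text: str) -> float:
    # Hypothetical stand-in for a slower, stronger model.
    return 0.8 if "not bad" in text else 0.2


texts = ["a great film", "awful pacing", "not bad at all"]
print(skim_or_read(texts, fast_bow, slow_model, threshold=0.9))  # ([1, 0, 1], 1)
```

Sweeping `threshold` from 0.5 to 1.0 trades the number of slow-model calls against accuracy, which is the kind of curve an AUC-style metric could summarize.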