Predicting Type Annotations for Python using Embeddings from Graph Neural Networks
An intelligent tool for type annotations in Python would increase the productivity of developers. Python is
a dynamic programming language, and predicting types using static analysis is difficult. Existing techniques
for type prediction use deep learning models that originated in Natural Language Processing. These
models depend on the quality of embeddings for source code tokens. We compared approaches for pre-
training embeddings for source code. Specifically, we compared FastText embeddings to embeddings trained
with Graph Neural Networks (GNN). Our experiments showed that GNN embeddings outperformed FastText
embeddings on the task of type prediction. Moreover, they seem to encode complementary information since
the prediction quality increases when both types of embeddings are used.
Representing Programs with Dependency and Function Call Graphs for Learning Hierarchical Embeddings
Any source code can be represented as a graph. This kind of representation allows capturing the interaction
between the elements of a program, such as functions, variables, etc. Modeling these interactions can enable
us to infer the purpose of a code snippet, a function, or even an entire program. Recently, a growing body of
work has represented source code in the form of a graph. One of the difficulties in evaluating the
usefulness of such a representation is the lack of a proper dataset and an evaluation metric. Our contribution is
a dataset that represents programs written in Python and Java in the form of dependency
and function call graphs. In this dataset, multiple projects are analyzed and united into a single graph. The
nodes of the graph represent the functions, variables, classes, methods, interfaces, etc. Nodes for functions
carry information about how these functions are constructed internally, and where they are called from. Such
graphs enable training hierarchical vector representations for source code. Moreover, some functions come
with textual descriptions (docstrings), which makes it possible to learn useful tasks such as API search and generation
of documentation.
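
As a rough illustration of the function call graph idea described in this abstract, the sketch below extracts a tiny call graph and the associated docstrings from Python source using the standard `ast` module. It is not the authors' dataset pipeline: the sample program in `SOURCE` and the helper name `build_call_graph` are hypothetical, and only direct calls to functions defined in the same snippet are tracked.

```python
import ast
from collections import defaultdict

# Hypothetical sample program, used only for illustration.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    """Read a file and tokenize it."""
    return parse(load(path))
'''


def build_call_graph(source):
    """Return (graph, docstrings) for the functions defined in `source`.

    graph maps each function name to the sorted names of the locally
    defined functions it calls directly; docstrings maps each function
    name to its docstring, or None if it has none.
    """
    tree = ast.parse(source)
    functions = {
        node.name: node
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
    edges = defaultdict(set)
    docstrings = {}
    for name, node in functions.items():
        docstrings[name] = ast.get_docstring(node)
        for child in ast.walk(node):
            # Only plain-name calls such as f(x); method calls are skipped.
            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                if child.func.id in functions:
                    edges[name].add(child.func.id)
    graph = {name: sorted(edges[name]) for name in functions}
    return graph, docstrings


graph, docs = build_call_graph(SOURCE)
print(graph)
print(docs["main"])
```

In this sketch, nodes are function names and edges are "calls" relations; a dataset of the kind the abstract describes would additionally cover variables, classes, methods, and cross-project dependencies.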