

Predicting Type Annotations for Python using Embeddings from Graph Neural Networks

Authors: Ivanov V, Romanov V, Succi G
Publication venue: SciTePress
Publication date: 01/01/2021

An intelligent tool for type annotations in Python would increase developer productivity. Python is a dynamically typed programming language, so predicting types with static analysis is difficult. Existing techniques for type prediction use deep learning models that originated in Natural Language Processing. These models depend on the quality of the embeddings for source code tokens. We compared approaches for pre-training embeddings for source code; specifically, we compared FastText embeddings to embeddings trained with Graph Neural Networks (GNN). Our experiments showed that GNN embeddings outperformed FastText embeddings on the task of type prediction. Moreover, the two seem to encode complementary information, since prediction quality increases when both types of embeddings are used.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
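The abstract reports that using FastText and GNN embeddings together improves type prediction. The sketch below illustrates one possible way to use both: concatenating a token's two embedding vectors into a single feature vector. Everything in it is an assumption for illustration only. The tiny embedding tables, the names `FASTTEXT`, `GNN`, `combined`, `train_centroids`, and `predict_type` are hypothetical. The nearest-centroid classifier is a deliberately simple stand-in, not the deep learning model used in the paper.

```python
from math import dist
from statistics import fmean

# Hypothetical toy embeddings (2-d each). Real ones would be pre-trained
# with FastText and with a graph neural network over the code graph.
FASTTEXT = {
    "count": [0.9, 0.1],
    "n_items": [0.8, 0.2],
    "name": [0.1, 0.9],
    "title": [0.2, 0.8],
}
GNN = {
    "count": [1.0, 0.0],
    "n_items": [0.9, 0.1],
    "name": [0.0, 1.0],
    "title": [0.1, 0.9],
}


def combined(token: str) -> list[float]:
    """Concatenate both embeddings of a token (an assumed combination scheme)."""
    return FASTTEXT[token] + GNN[token]


def train_centroids(labeled: dict[str, str]) -> dict[str, list[float]]:
    """Average the combined embeddings of all tokens annotated with each type."""
    by_type: dict[str, list[list[float]]] = {}
    for token, type_name in labeled.items():
        by_type.setdefault(type_name, []).append(combined(token))
    return {t: [fmean(col) for col in zip(*vecs)] for t, vecs in by_type.items()}


def predict_type(token: str, centroids: dict[str, list[float]]) -> str:
    """Return the type whose centroid is nearest (Euclidean) to the token's vector."""
    vec = combined(token)
    return min(centroids, key=lambda t: dist(vec, centroids[t]))


centroids = train_centroids({"count": "int", "name": "str"})
print(predict_type("n_items", centroids))  # → int
print(predict_type("title", centroids))    # → str
```

Concatenation keeps the two sources separate in the feature vector, so a downstream model can weight whichever one is more informative. This is one plausible reading of how complementary embeddings might be combined, not necessarily the method used in the paper.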

Representing Programs with Dependency and Function Call Graphs for Learning Hierarchical Embeddings

Authors: Ivanov V, Romanov V, Succi G
Publication venue: SciTePress
Publication date: 01/01/2020

Any source code can be represented as a graph. Such a representation captures the interactions between program elements, such as functions and variables. Modeling these interactions can help infer the purpose of a code snippet, a function, or even an entire program. A growing body of recent work represents source code as a graph. One difficulty in evaluating the usefulness of such representations is the lack of a suitable dataset and evaluation metric. Our contribution is a dataset that represents programs written in Python and Java as dependency and function call graphs. In this dataset, multiple projects are analyzed and merged into a single graph. The nodes of the graph represent functions, variables, classes, methods, interfaces, etc. Function nodes carry information about how each function is constructed internally and where it is called from. Such graphs enable training hierarchical vector representations for source code. Moreover, some functions come with textual descriptions (docstrings), which enables learning useful tasks such as API search and documentation generation.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna
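To make the idea of a function call graph with docstrings concrete, here is a minimal sketch that extracts one from a single Python file using the standard-library `ast` module. It is an illustration under simplifying assumptions, not the tooling used to build the dataset. It handles only call edges, ignoring variables, classes, and cross-project resolution. It keys nodes by bare function name, so same-named methods overwrite each other. It also records the attribute name for calls like `obj.read()`. The names `FunctionNode` and `build_call_graph` are hypothetical.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FunctionNode:
    """A function node: its name, optional docstring, and outgoing call edges."""
    name: str
    docstring: Optional[str]
    calls: set[str] = field(default_factory=set)


def build_call_graph(source: str) -> dict[str, FunctionNode]:
    """Map each function (including methods) to the names it calls."""
    tree = ast.parse(source)
    nodes: dict[str, FunctionNode] = {}
    for fn in ast.walk(tree):
        if isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node = FunctionNode(fn.name, ast.get_docstring(fn))
            for sub in ast.walk(fn):
                if isinstance(sub, ast.Call):
                    if isinstance(sub.func, ast.Name):         # e.g. load(path)
                        node.calls.add(sub.func.id)
                    elif isinstance(sub.func, ast.Attribute):  # e.g. f.read()
                        node.calls.add(sub.func.attr)
            # Simplification: a later function with the same name overwrites this one.
            nodes[fn.name] = node
    return nodes


SOURCE = '''
def load(path):
    """Read a file and return its lines."""
    return open(path).read().splitlines()

def count_lines(path):
    return len(load(path))
'''

graph = build_call_graph(SOURCE)
print(sorted(graph["count_lines"].calls))  # → ['len', 'load']
print(graph["load"].docstring)             # → Read a file and return its lines.
```

The docstring attached to `load` is the kind of textual description that, per the abstract, makes tasks like API search and documentation generation learnable from such graphs.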