Increasing Code Completion Accuracy in Pythia Models for Non-Standard Python Libraries
Contemporary software development with modern programming languages leverages Integrated Development Environments, smart text editors, and similar tooling with code completion capabilities to increase the efficiency of software developers. Recent code completion research has shown that combining natural language processing with recurrent neural networks configured with long short-term memory can improve the accuracy of code completion predictions over prior models. It is well known that the accuracy of predictive systems trained on data is correlated with the quality and quantity of the training data. This dissertation demonstrates that expanding the training data set to include more references to specific Python third-party modules improves the quality of the predictions for those modules without degrading the quality of predictions for the originally represented modules.
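The dissertation's data-expansion idea can be illustrated with a minimal sketch: selecting source files that reference specific third-party modules, so they can be added to the training corpus. The function name and module set are illustrative, not from the dissertation.

```python
import ast

def imports_any(source, modules):
    """Return True if the Python source imports any of the given top-level modules."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import numpy.linalg` counts as a reference to `numpy`
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return bool(found & set(modules))

# A corpus builder could keep only files referencing the target libraries:
print(imports_any("import numpy as np\nx = np.ones(3)", {"numpy", "pandas"}))  # True
print(imports_any("print(1)", {"numpy", "pandas"}))                            # False
```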
Pythia: AI-assisted Code Completion System
In this paper, we propose a novel end-to-end approach for AI-assisted code
completion called Pythia. It generates ranked lists of method and API
recommendations which can be used by software developers at edit time. The
system is currently deployed as part of the IntelliCode extension in the Visual
Studio Code IDE. Pythia exploits state-of-the-art large-scale deep learning models
trained on code contexts extracted from abstract syntax trees. It is designed
to work at high throughput, predicting the best matching code completions on
the order of 100 ms.
We describe the architecture of the system, perform comparisons to a
frequency-based approach and an invocation-based Markov Chain language model, and
discuss challenges serving Pythia models on lightweight client devices.
The offline evaluation results obtained on 2700 Python open-source GitHub
repositories show a top-5 accuracy of 92\%, surpassing the baseline models by
20\% averaged over classes, for both intra- and cross-project settings.
Comment: Published in Proceedings of the 25th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining (KDD '19).
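The top-5 accuracy reported above has a simple operational meaning: a prediction counts as correct when the true completion appears among the first five ranked candidates. A minimal sketch, with illustrative candidate lists:

```python
def top_k_accuracy(ranked_predictions, targets, k=5):
    """Fraction of cases where the true token is among the top-k ranked candidates.

    ranked_predictions: list of candidate lists, best-first.
    targets: the ground-truth completion for each case.
    """
    hits = sum(1 for ranked, true in zip(ranked_predictions, targets)
               if true in ranked[:k])
    return hits / len(targets)

# Two toy completion sites: a list method call and a dict method call.
preds = [["append", "extend", "insert", "pop", "sort"],
         ["items", "keys", "values", "get", "update"]]
gold = ["append", "get"]
print(top_k_accuracy(preds, gold, k=5))  # 1.0 — both true tokens are in the top 5
```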
Code Prediction by Feeding Trees to Transformers
We advance the state-of-the-art in the accuracy of code prediction (next
token prediction) used in autocomplete systems. First, we report that using the
recently proposed Transformer architecture even out-of-the-box outperforms
previous neural and non-neural systems for code prediction. We then show that
by making the Transformer architecture aware of the syntactic structure of
code, we further increase the margin by which a Transformer-based system
outperforms previous systems. With this, it outperforms the accuracy of an
RNN-based system (similar to Hellendoorn et al., 2018) by 18.3\%, the Deep3
system (Raychev et al., 2016) by 14.1\%, and an adaptation of Code2Seq (Alon et
al., 2018) for code prediction by 14.4\%.
We present in the paper several ways of communicating the code structure to
the Transformer, which is fundamentally built for processing sequence data. We
provide a comprehensive experimental evaluation of our proposal, along with
alternative design choices, on a standard Python dataset, as well as on a
Facebook internal Python corpus. Our code and data preparation pipeline will be
made available as open source.
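One simple way to communicate tree structure to a sequence model, in the spirit of the abstract above, is to linearize the AST into a depth-first token sequence whose brackets encode parent-child structure. This is a minimal illustration of the general idea, not the paper's exact encoding:

```python
import ast

def linearize(node):
    """Depth-first traversal of a Python AST, emitting node-type tokens
    with '(' / ')' markers so the tree shape is recoverable from the sequence."""
    name = type(node).__name__
    children = list(ast.iter_child_nodes(node))
    if not children:
        return [name]
    tokens = [name, "("]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens

# The resulting token stream can be fed to any sequence model, e.g. a Transformer.
print(linearize(ast.parse("x = f(1)")))
```

A sequence model trained on such streams sees the syntax explicitly, rather than having to infer it from raw source tokens.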
Introduction of an Assistance System to Support Domain Experts in Programming Low-code to Leverage Industry 5.0
The rapid technological leaps of Industry 4.0 increase the pressure and
demands on humans working in automation, which is one of the main motivators of
Industry 5.0. In particular, automation software development for mechatronic
systems becomes increasingly challenging, as both domain knowledge and
programming skills are required for high-quality, maintainable software.
Especially for small companies in automation and robotics without dedicated
software engineering departments, domain-specific low-code platforms become
indispensable: they enable domain experts to develop code intuitively using
visual programming languages, e.g., for tasks such as retrofitting mobile
machines. However, for extensive functionalities, visual programs may become
overwhelming due to the scaling-up problem. In addition, the ever-shortening
time-to-market increases the time pressure on programmers. Thus, an assistance
system concept is introduced that can be implemented by low-code platform
suppliers based on combining data mining and static code analysis. Domain
experts are supported in developing low-code by targeted recommendations,
metric-based complexity measurement, and reducing complexity by encapsulating
functionalities. The concept is implemented for the industrial low-code
platform HAWE eDesign to program hydraulic components in mobile machines, and
its benefits are confirmed in a user study and an industrial expert workshop.
Comment: 8 pages, https://ieeexplore.ieee.org/abstract/document/983945
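The "metric-based complexity measurement" mentioned above can be illustrated with a classic example: McCabe-style cyclomatic complexity computed over a program's block graph. The graph format and block names are illustrative, not the paper's actual metric or platform model:

```python
def cyclomatic_complexity(edges, nodes):
    """McCabe's metric M = E - N + 2 for a single connected program graph.
    Higher values indicate more independent paths, i.e. more complex programs."""
    return len(edges) - len(nodes) + 2

# A toy visual program: one branch deciding whether to open or close a valve.
nodes = ["start", "check_pressure", "open_valve", "close_valve", "end"]
edges = [("start", "check_pressure"),
         ("check_pressure", "open_valve"),
         ("check_pressure", "close_valve"),
         ("open_valve", "end"),
         ("close_valve", "end")]
print(cyclomatic_complexity(edges, nodes))  # 2 — one decision point
```

An assistance system could surface such a score and suggest encapsulating a sub-graph once it exceeds a threshold.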
Code Completion by Modeling Flattened Abstract Syntax Trees as Graphs
Code completion has become an essential component of integrated development
environments. Contemporary code completion methods rely on the abstract syntax
tree (AST) to generate syntactically correct code. However, they cannot fully
capture the sequential and repetitive patterns of writing code and the
structural information of the AST. To alleviate these problems, we propose a
new code completion approach named CCAG, which models the flattened sequence of
a partial AST as an AST graph. CCAG uses our proposed AST Graph Attention Block
to capture different dependencies in the AST graph for representation learning
in code completion. The sub-tasks of code completion are optimized via
multi-task learning in CCAG, and the task balance is automatically achieved
using uncertainty without the need to tune task weights. The experimental
results show that CCAG outperforms state-of-the-art approaches and is able to
provide intelligent code completion.
Comment: Accepted at AAAI 2021. This version contains the appendix for the
derivation of Eq. 1.
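The abstract's "task balance ... using uncertainty" suggests homoscedastic-uncertainty weighting in the style of Kendall et al. (2018), where each task loss is scaled by a learned log-variance instead of a hand-tuned weight. A minimal sketch under that assumption; whether CCAG uses exactly this form is not stated in the abstract:

```python
import math

def weighted_total_loss(task_losses, log_vars):
    """Combine per-task losses L_i with learned log-variances s_i:
    total = sum(exp(-s_i) * L_i + s_i).
    Large s_i down-weights a noisy task; the +s_i term penalizes
    inflating s_i arbitrarily."""
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

# e.g. two code-completion sub-tasks: node-type and node-value prediction.
# With s_i = 0 the losses are simply summed.
print(weighted_total_loss([2.0, 0.5], [0.0, 0.0]))  # 2.5
```

In training, the `log_vars` would be optimized jointly with the model parameters, so the balance between sub-tasks emerges from the data.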