4 research outputs found

Program Merge Conflict Resolution via Neural Transformers

Author: Bird Christian
Dinella Elizabeth
Fakhoury Sarah
Ghorbani Negar
Jang Jinu
Lahiri Shuvendu
Mytkowicz Todd
Sundaresan Neel
Svyatkovskiy Alexey
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/09/2022

Collaborative software development is an integral part of the modern software development life cycle, essential to the success of large-scale software projects. When multiple developers make concurrent changes around the same lines of code, a merge conflict may occur. Such conflicts stall pull requests and continuous integration pipelines from hours to several days, seriously hurting developer productivity. To address this problem, we introduce MergeBERT, a novel neural program merge framework based on token-level three-way differencing and a transformer encoder model. By exploiting the restricted nature of merge conflict resolutions, we reformulate the task of generating the resolution sequence as a classification task over a set of primitive merge patterns extracted from real-world merge commit data. Our model achieves 63-68% accuracy for merge resolution synthesis, yielding nearly a 3x performance improvement over existing semi-structured tools and a 2x improvement over neural program merge tools. Finally, we demonstrate that MergeBERT is sufficiently flexible to work with source code files in the Java, JavaScript, TypeScript, and C# programming languages. To measure the practical use of MergeBERT, we conduct a user study to evaluate MergeBERT suggestions with 25 developers from large OSS projects on 122 real-world conflicts they encountered. Results suggest that in practice, MergeBERT resolutions would be accepted at a higher rate than estimated by automatic metrics for precision and accuracy. Additionally, we use participant feedback to identify future avenues for improvement of MergeBERT.
Comment: ESEC/FSE '22 camera-ready version. 12 pages, 4 figures, online appendix
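
The reformulation described in the abstract, choosing among a fixed set of primitive merge patterns rather than generating tokens, can be illustrated with a minimal sketch. The pattern names below (`take_A`, `concat_AB`, and so on) and both helper functions are hypothetical; the abstract does not list the actual pattern set, and the trained classifier is stood in for by a label passed in by the caller.

```python
from typing import Dict, List


def candidate_resolutions(
    a: List[str], base: List[str], b: List[str]
) -> Dict[str, List[str]]:
    """Enumerate resolutions of one token-level conflict.

    `a` and `b` are the conflicting token runs from the two branches and
    `base` is the common ancestor. The pattern set is an assumption made
    for illustration, not the set used by MergeBERT.
    """
    return {
        "take_A": list(a),        # keep only branch A's tokens
        "take_B": list(b),        # keep only branch B's tokens
        "take_base": list(base),  # revert to the common ancestor
        "concat_AB": list(a) + list(b),  # A's tokens followed by B's
        "concat_BA": list(b) + list(a),  # B's tokens followed by A's
    }


def resolve(
    a: List[str], base: List[str], b: List[str], predicted_label: str
) -> List[str]:
    """Apply the pattern a classifier would have predicted."""
    return candidate_resolutions(a, base, b)[predicted_label]


if __name__ == "__main__":
    # Two branches renamed and extended the same call differently.
    print(resolve(["x", "+", "1"], ["x"], ["x", "*", "2"], "take_A"))
```

Because the output space is a handful of labels, the model only has to pick one of them per conflict; the resolution text itself is assembled deterministically from the inputs.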

arXiv.org e-Print Archive

Towards Demystifying Dimensions of Source Code Embeddings

Author: Allamanis Miltiadis
Alon Uri
Bard JF
Dinella Elizabeth
Fernandes Patrick
Islam Rabin Md Rafiqul
Jiang L.
Joachims T.
Pedregosa F.
van der Maaten Laurens
Wang Ke
Publication date: 28/09/2020

Source code representations are key in applying machine learning techniques for processing and analyzing programs. A popular approach in representing source code is neural source code embeddings that represent programs with high-dimensional vectors computed by training deep neural networks on a large volume of programs. Although successful, little is known about the contents of these vectors and their characteristics. In this paper, we present our preliminary results towards better understanding the contents of code2vec neural source code embeddings. In particular, in a small case study, we use the code2vec embeddings to create binary SVM classifiers and compare their performance with that of handcrafted features. Our results suggest that the handcrafted features can perform very close to the high-dimensional code2vec embeddings, and the information gains are more evenly distributed in the code2vec embeddings compared to the handcrafted features. We also find that the code2vec embeddings are more resilient to the removal of dimensions with low information gains than the handcrafted features. We hope our results serve as a stepping stone toward principled analysis and evaluation of these code representations.
Comment: 1st ACM SIGSOFT International Workshop on Representation Learning for Software Engineering and Program Languages, Co-located with ESEC/FSE (RL+SE&PL'20)
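
The per-dimension information gain mentioned in the abstract can be sketched as follows. The abstract does not say how the gain was computed, so this sketch assumes a single threshold split of one feature dimension against binary labels; the function names and the threshold parameter are illustrative only.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(
    values: Sequence[float], labels: Sequence[int], threshold: float
) -> float:
    """Entropy reduction from splitting one dimension at `threshold`.

    Assumes `labels` is non-empty and aligned with `values`.
    """
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


if __name__ == "__main__":
    # A dimension that separates the two classes perfectly at 0.5.
    print(information_gain([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1], 0.5))
```

Ranking dimensions by such a score is one way to decide which to drop first when testing how a classifier degrades as low-gain dimensions are removed.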

arXiv.org e-Print Archive

Crossref

Learning semantic program embeddings with graph interval neural network

Author: Allamanis Miltiadis
Alon Uri
Bahdanau Dzmitry
Berdine Josh
Cousot P.
Dinella Elizabeth
Fernandes Patrick
Gilmer Justin
Gupta Rahul
Hellendoorn Vincent J.
Jiang L.
Li Yujia
Maddison Chris
Nguyen Tung Thanh
Pawlak Renaud
Saha Ripon
Vasic Marko
Wang Ke
Wei Jiayi
Weiser Mark
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref