Search CORE

7 research outputs found

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

Author: Allamanis Miltiadis
Alon Uri
Banerjee Satanjeev
Caruana A.
Di He Wei Chen Yuanzhi Li
He Di
Lu Meili
Moreno Laura
Movshovitz-Attias Dana
Papineni Kishore
Su Shang-Yu
Voorhees M.
Wang Yijun
Xia Yingce
Yao Ziyu
Ye Hai
Publication venue: Association for Computing Machinery (ACM)
Publication date: 25/02/2020

Code summarization generates a brief natural language description of a given source code snippet, while code retrieval fetches relevant source code for a given natural language query. Since both tasks aim to model the association between natural language and programming language, recent studies have combined them to improve their performance. However, researchers have not yet been able to effectively leverage the intrinsic connection between the two tasks, because they train the tasks separately or in a pipeline, so their performance cannot be well balanced. In this paper, we propose a novel end-to-end model for the two tasks that introduces an additional code generation task. More specifically, we explicitly exploit the probabilistic correlation between code summarization and code generation with dual learning, and use the two encoders for code summarization and code generation to train the code retrieval task via multi-task learning. We have carried out extensive experiments on an existing dataset of SQL and Python, and the results show that our model can significantly improve the results of the code retrieval task over state-of-the-art models, as well as achieve competitive performance in terms of BLEU score on the code summarization task.
Comment: Published at The Web Conference (WWW) 2020, full paper

arXiv.org e-Print Archive

Crossref
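The dual-learning idea in the abstract above rests on the fact that a joint probability factorizes two ways: for a code snippet c and its summary s, log P(c) + log P(s | c) = log P(s) + log P(c | s). The following is a minimal, hypothetical sketch of that constraint used as a squared-gap penalty added to the summarization and generation losses, following the general probabilistic-duality constraint of dual supervised learning; it is not the paper's implementation. The function names, the weight `lam`, and the assumption that the marginals P(c) and P(s) would come from separately trained language models are illustrative assumptions.

```python
import math


def duality_gap_penalty(log_p_code: float,
                        log_p_summary_given_code: float,
                        log_p_summary: float,
                        log_p_code_given_summary: float) -> float:
    """Squared violation of log P(c) + log P(s|c) = log P(s) + log P(c|s).

    Both sides equal log P(c, s), so a consistent pair of models
    (summarizer P(s|c), generator P(c|s)) yields a penalty of zero.
    """
    gap = (log_p_code + log_p_summary_given_code) - (
        log_p_summary + log_p_code_given_summary)
    return gap ** 2


def joint_training_loss(nll_summarization: float,
                        nll_generation: float,
                        penalty: float,
                        lam: float = 0.01) -> float:
    """Hypothetical combined objective: both task losses plus a weighted
    duality penalty. `lam` is an illustrative hyperparameter."""
    return nll_summarization + nll_generation + lam * penalty


# Consistent probabilities: P(c)=0.2, P(s|c)=0.5, P(s)=0.25, P(c|s)=0.4,
# so both factorizations give P(c, s) = 0.1.
consistent = duality_gap_penalty(math.log(0.2), math.log(0.5),
                                 math.log(0.25), math.log(0.4))

# Inconsistent generator: P(c|s)=0.2 gives P(c, s) = 0.05 on one side,
# so the gap is log(0.1) - log(0.05) = log 2.
inconsistent = duality_gap_penalty(math.log(0.2), math.log(0.5),
                                   math.log(0.25), math.log(0.2))
```

In a real training loop these quantities would be per-example tensors rather than floats; the scalar version only shows that the penalty is zero exactly when the two directional models agree on the joint probability.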

Neural-machine-translation-based commit message generation: how far are we?

Author: HASSAN Ahmed E.
LIU Zhongxin
LO David
WANG Xinyu
XIA Xin
XING Zhenchang
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/09/2018

Crossref

Institutional Knowledge at Singapore Management University

Code Structure Guided Transformer for Source Code Summarization

Author: Gao Cuiyun
Gao Shuzheng
He Yulan
Lyu Michael R.
Nie Lun Yiu
Xia Xin
Zeng Jichuan
Publication date: 22/07/2022

Code summaries help developers comprehend programs and reduce the time they spend inferring program functionality during software maintenance. Recent efforts resort to deep learning techniques such as sequence-to-sequence models for generating accurate code summaries, among which Transformer-based approaches have achieved promising performance. However, effectively integrating code structure information into the Transformer is under-explored in this task domain. In this paper, we propose a novel approach named SG-Trans to incorporate code structural properties into the Transformer. Specifically, we inject local symbolic information (e.g., code tokens and statements) and global syntactic structure (e.g., data flow graph) into the self-attention module of the Transformer as an inductive bias. To further capture the hierarchical characteristics of code, the local information and global structure are distributed across the attention heads of the lower and higher layers of the Transformer, respectively. Extensive evaluation shows the superior performance of SG-Trans over state-of-the-art approaches. Compared with the best-performing baseline, SG-Trans still improves the METEOR score, a metric widely used for measuring generation quality, by 1.4% and 2.0% on the two benchmark datasets, respectively.

arXiv.org e-Print Archive
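SG-Trans injects code structure into self-attention as an inductive bias, but the abstract does not spell out the mechanism. The following is therefore only a hypothetical, minimal sketch of one common way to do this: masking attention so that each token attends only to itself and to tokens it is structurally related to. The helpers `structure_mask` and `masked_attention` and the toy edge list are illustrative assumptions, not SG-Trans's code. In the design the abstract describes, heads in lower layers would take their relations from local information (e.g., tokens in the same statement) and heads in higher layers from global structure (e.g., data-flow edges).

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def structure_mask(n: int, edges: Sequence[Tuple[int, int]]) -> List[List[bool]]:
    """Boolean n x n mask: True where attention is allowed.

    Every token may attend to itself; each (i, j) edge, e.g. a hypothetical
    data-flow link between two tokens, is allowed in both directions.
    """
    mask = [[i == j for j in range(n)] for i in range(n)]
    for i, j in edges:
        mask[i][j] = True
        mask[j][i] = True
    return mask


def masked_attention(q: Matrix, k: Matrix, v: Matrix,
                     mask: List[List[bool]]) -> Matrix:
    """Single-head scaled dot-product attention restricted by `mask`."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Scores for allowed positions; disallowed positions get -inf.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j]
            else float("-inf")
            for j, kj in enumerate(k)
        ]
        m = max(scores)  # finite because mask[i][i] is always True
        exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of value vectors, one output dimension at a time.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


# Three tokens; only tokens 0 and 1 share a (hypothetical) structural edge.
mask = structure_mask(3, [(0, 1)])
q = k = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # equal scores -> uniform weights
v = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
result = masked_attention(q, k, v, mask)
```

With equal scores, tokens 0 and 1 average each other's values, while token 2, which has no structural edge, sees only its own value vector.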

Branch coverage prediction in automated testing

Author: Gall Harald C.
Grano Giovanni
Panichella Sebastiano
Titov Timofey V.
Publication venue: Wiley
Publication date: 01/01/2019

This is the peer-reviewed version of the article, which has been published in final form at [DOI]. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

Software testing is crucial in continuous integration (CI). Ideally, at every commit, all test cases should be executed, and new test cases should be generated for the new source code. This is especially true in a Continuous Test Generation (CTG) environment, where the automatic generation of test cases is integrated into the CI pipeline. In this context, developers want to achieve a certain minimum level of coverage for every software build. However, executing all test cases and, moreover, generating new ones for all classes at every commit is not feasible. As a consequence, developers have to select which subset of classes to test and/or target with test-case generation. We argue that knowing a priori the branch coverage that test-data generation tools can achieve can help developers make informed decisions about these issues. In this paper, we investigate the possibility of using source-code metrics to predict the coverage achieved by test-data generation tools. We use four different categories of source-code features and assess the prediction on a large data set of more than 3,000 Java classes. We compare different machine learning algorithms and conduct a fine-grained feature analysis to investigate the factors that most affect prediction accuracy. Moreover, we extend our investigation to four different search budgets. Our evaluation shows that the best model achieves an average MAE of 0.15 and 0.21 in nested cross-validation over the different budgets on EVOSUITE and RANDOOP, respectively. Finally, the discussion of the results demonstrates the relevance of coupling-related features for prediction accuracy.

ZHAW digitalcollection

ZORA
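The coverage-prediction abstract reports mean absolute error (MAE) under nested cross-validation. The sketch below shows, under stated assumptions, how those two pieces fit together: an inner loop chooses a model setting using only the outer training folds, and the outer loop reports MAE on held-out folds. The paper compares several machine learning algorithms on four categories of source-code features; here a single hypothetical numeric metric per class and a k-nearest-neighbour regressor are stand-ins, and all function names are illustrative.

```python
from statistics import mean
from typing import Iterator, List, Sequence, Tuple


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between true and predicted coverage values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def k_fold(indices: List[int], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) splits; fold f holds every k-th index from f."""
    folds = [indices[f::k] for f in range(k)]
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test


def knn_predict(xs: Sequence[float], ys: Sequence[float], x: float, n: int) -> float:
    """Stand-in regressor: mean target of the n nearest training points."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:n]
    return mean(ys[i] for i in nearest)


def fold_mae(xs, ys, train, test, n) -> float:
    """Fit on `train` indices, report MAE on `test` indices."""
    tx, ty = [xs[i] for i in train], [ys[i] for i in train]
    preds = [knn_predict(tx, ty, xs[j], n) for j in test]
    return mae([ys[j] for j in test], preds)


def nested_cv_mae(xs, ys, candidates=(1, 3, 5), outer_k=5, inner_k=3) -> float:
    """Outer folds estimate error; inner folds pick the neighbour count."""
    outer_errors = []
    for train, test in k_fold(list(range(len(xs))), outer_k):
        # Model selection only ever sees the outer training indices.
        best = min(candidates, key=lambda n: mean(
            fold_mae(xs, ys, itr, ite, n) for itr, ite in k_fold(train, inner_k)))
        outer_errors.append(fold_mae(xs, ys, train, test, best))
    return mean(outer_errors)
```

Keeping model selection inside the outer training folds is what prevents the reported MAE from being optimistically biased by the same data used to tune the model.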

A search for the Ten Commentments: An exploratory study on automated quality assessment of comments in Java source code

Author: Lung C.
Publication date: 16/05/2022

Open University of the Netherlands Research Portal

Automatic sentence annotation for more useful bug report summarization

Author: Galappaththi Akalanka
University of Lethbridge. Faculty of Arts and Science
Publication venue: University of Central Missouri, Department of Mathematics and Computer Science
Publication date: 01/01/2020

Bug reports are a useful software artifact that developers consult for various information needs. As bug reports can become long, their users may need to spend a lot of time reading them. Previous studies developed summarizers and judged the quality of their summaries against human-created gold-standard summaries. We believe creating such summaries for evaluating summarizers is not good practice. First, we have observed a high level of disagreement between the annotated summaries. Second, the number of annotators involved is lower than the established minimum for creating a stable annotated summary. Finally, the traditional fixed threshold of 25% of the bug report word count does not adequately serve different information needs. Consequently, we developed an automatic sentence annotation method to identify content in bug report comments, which allows bug report users to customize a view for their task-dependent information needs.

OPUS: Open Uleth Scholarship - University of Lethbridge Research Repository
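The bug-report abstract criticizes the traditional fixed threshold of 25% of the report's word count. As a hypothetical illustration of what that convention means for an extractive summarizer, the sketch below selects the highest-scoring sentences until the 25% word budget is spent. The sentence scores, the whitespace-based word count, and the skip-and-continue policy for oversized sentences are assumptions; this shows the baseline convention only, not the paper's annotation method.

```python
from typing import List, Sequence


def fixed_threshold_summary(sentences: Sequence[str],
                            scores: Sequence[float],
                            ratio: float = 0.25) -> List[str]:
    """Extractive summary capped at `ratio` of the report's word count.

    Sentences are taken greedily by descending score; a sentence that would
    exceed the word budget is skipped, and the chosen sentences are returned
    in their original order.
    """
    def words(s: str) -> int:
        return len(s.split())

    budget = ratio * sum(words(s) for s in sentences)
    chosen, used = [], 0
    for i in sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True):
        if used + words(sentences[i]) <= budget:
            chosen.append(i)
            used += words(sentences[i])
    return [sentences[i] for i in sorted(chosen)]


report = [
    "Crash on startup.",
    "Steps: open the app and tap login.",
    "Stack trace shows a null pointer in AuthService.",
    "Thanks for the report.",
]
scores = [0.9, 0.5, 0.8, 0.1]  # hypothetical relevance scores
```

With this toy report (22 words), the default 25% budget keeps only "Crash on startup.", while `ratio=0.5` also admits the stack-trace sentence: one concrete way a single fixed threshold can omit content a particular reader needs.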