20 research outputs found
Towards Automatic Generation of Short Summaries of Commits
Committing to a version control system means submitting a software change to
the system. Each commit can have a message to describe the submission. Several
approaches have been proposed to automatically generate the content of such
messages. However, the quality of the automatically generated messages falls
far short of what humans write. In studying the differences between
auto-generated and human-written messages, we found that 82% of the
human-written messages have only one sentence, while the automatically
generated messages often have multiple lines. Furthermore, we found that the
commit messages often begin with a verb followed by a direct object. This
finding inspired us to use a "verb+object" format in this paper to generate
short commit summaries. We split the approach into two parts: verb generation
and object generation. As a first step, we trained a classifier that maps a
diff to a verb. We are seeking feedback from the community before we continue
to work on generating direct objects for the commits.
Comment: 4 pages, accepted in ICPC 2017 ERA Track
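The verb-generation step described above can be illustrated with a toy heuristic baseline. This is an illustrative sketch, not the paper's trained classifier; the rules below are assumptions about what simple diff features might signal:

```python
# Toy baseline for the verb-generation half of a "verb + object" commit
# summary: guess a leading verb from simple features of a unified diff.
# These heuristics are illustrative assumptions, not the paper's classifier.
def predict_verb(diff):
    """Guess the summary verb for a unified diff string."""
    lines = diff.splitlines()
    # Count added/removed lines, skipping the +++/--- file headers.
    added = sum(1 for l in lines if l.startswith("+") and not l.startswith("+++"))
    removed = sum(1 for l in lines if l.startswith("-") and not l.startswith("---"))
    if added and not removed:
        return "add"        # purely additive change
    if removed and not added:
        return "remove"     # purely deletive change
    if "fix" in diff.lower() or "bug" in diff.lower():
        return "fix"        # mixed change mentioning a bug
    return "update"         # fallback for mixed changes

predict_verb("+++ b/util.py\n+def helper():\n+    pass")  # -> "add"
```

A learned classifier would replace these hand-written rules with features extracted from the diff, but the input/output contract (diff in, verb out) is the same.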
ARENA: An Approach for the Automated Generation of Release Notes
Release notes document corrections, enhancements, and, in general, changes that were implemented in a new release of a software project. They are usually created manually and may include hundreds of different items, such as descriptions of new features, bug fixes, structural changes, new or deprecated APIs, and changes to software licenses. Thus, producing them can be a time-consuming and daunting task. This paper describes ARENA (Automatic RElease Notes generAtor), an approach for the automatic generation of release notes. ARENA extracts changes from the source code, summarizes them, and integrates them with information from versioning systems and issue trackers. ARENA was designed based on the manual analysis of 990 existing release notes. To evaluate the quality of the release notes automatically generated by ARENA, we performed four empirical studies involving a total of 56 participants (48 professional developers and eight students). The obtained results indicate that the generated release notes are very good approximations of the ones manually produced by developers and often include important information that is missing in the manually created release notes.
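The three stages the abstract names (extract changes, summarize them, integrate issue-tracker information) can be sketched schematically. This is an assumed toy pipeline, not ARENA's actual implementation; the data shapes and the trivial summarization step are placeholders:

```python
# Schematic release-note pipeline (an assumption, not ARENA's code):
# each extracted code change is summarized and merged with issue-tracker data.
def generate_release_notes(code_changes, issue_entries):
    """Toy generator: code_changes is a list of dicts describing extracted
    changes; issue_entries maps issue ids to tracker descriptions."""
    notes = []
    for change in code_changes:
        # Summarization step: a real tool would produce a natural-language
        # summary; here we just format the change kind and affected element.
        summary = f"{change['kind']}: {change['element']}"
        # Integration step: attach linked issue-tracker information, if any.
        issue = issue_entries.get(change.get("issue_id"))
        if issue:
            summary += f" (fixes {issue})"
        notes.append(summary)
    return notes

generate_release_notes(
    [{"kind": "New feature", "element": "ExportDialog", "issue_id": 42}],
    {42: "issue #42: add CSV export"},
)
# -> ["New feature: ExportDialog (fixes issue #42: add CSV export)"]
```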
Eye of the Mind: Image Processing for Social Coding
Developers are increasingly sharing images in social coding environments
alongside the growth in visual interactions within social networks. The
analysis of the ratio of textual to visual content in Mozilla's change
requests and in StackOverflow programming Q/As revealed a steady increase in
image sharing over the past five years. Developers' shared images are
meaningful and provide information complementary to their associated text.
Often, the shared images are essential to understanding the
change requests, questions, or the responses submitted. Relying on these
observations, we delve into the potential of automatic completion of textual
software artifacts with visual content.Comment: This is the author's version of ICSE 2020 pape
Exploring and Evaluating Personalized Models for Code Generation
Large Transformer models achieved the state-of-the-art status for Natural
Language Understanding tasks and are increasingly becoming the baseline model
architecture for modeling source code. Transformers are usually pre-trained on
large unsupervised corpora, learning token representations and transformations
relevant to modeling generally available text, and are then fine-tuned on a
particular downstream task of interest. While fine-tuning is a tried-and-true
method for adapting a model to a new domain -- for example, question-answering
on a given topic -- generalization remains an ongoing challenge. In this
paper, we explore and evaluate transformer model fine-tuning for
personalization. In the context of generating unit tests for Java methods, we
evaluate learning to personalize to a specific software project using several
personalization techniques. We consider three key approaches: (i) custom
fine-tuning, which allows all the model parameters to be tuned; (ii)
lightweight fine-tuning, which freezes most of the model's parameters, allowing
tuning of the token embeddings and softmax layer only or the final layer alone;
(iii) prefix tuning, which keeps model parameters frozen, but optimizes a small
project-specific prefix vector. Each of these techniques offers a trade-off in
total compute cost and predictive performance, which we evaluate by code and
task-specific metrics, training time, and total computational operations. We
compare these fine-tuning strategies for code generation and discuss the
potential generalization and cost benefits of each in various deployment
scenarios.
Comment: Accepted to the ACM Joint European Software Engineering Conference
and Symposium on the Foundations of Software Engineering (ESEC/FSE 2022),
Industry Track, Singapore, November 14-18, 2022, to appear; 9 pages
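The three personalization strategies differ in which parameters stay trainable. The sketch below is a schematic illustration of that distinction, not the authors' implementation; the parameter names are hypothetical:

```python
# Schematic view of the three personalization strategies: which named model
# parameters each one updates. Parameter names are hypothetical placeholders,
# not real Transformer checkpoint keys.
def trainable_parameters(param_names, strategy):
    """Return the subset of parameter names a strategy would fine-tune."""
    if strategy == "custom":
        # (i) Custom fine-tuning: every parameter is tuned.
        return set(param_names)
    if strategy == "lightweight":
        # (ii) Lightweight fine-tuning: freeze most parameters, tune only
        # the token embeddings and the softmax/output layer.
        return {n for n in param_names if "embedding" in n or "softmax" in n}
    if strategy == "prefix":
        # (iii) Prefix tuning: the model stays frozen; only a small
        # project-specific prefix vector is optimized.
        return {n for n in param_names if n == "prefix_vector"}
    raise ValueError(f"unknown strategy: {strategy}")

params = ["token_embedding", "layer_0.attention", "layer_0.ffn",
          "softmax_out", "prefix_vector"]
trainable_parameters(params, "lightweight")  # -> {"token_embedding", "softmax_out"}
```

The trade-off the paper evaluates follows directly from these set sizes: fewer trainable parameters mean lower compute and storage cost per project, at a possible cost in predictive performance.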
Label Smoothing Improves Neural Source Code Summarization
Label smoothing is a regularization technique for neural networks. Normally
neural models are trained to an output distribution that is a vector with a
single 1 for the correct prediction, and 0 for all other elements. Label
smoothing converts the correct prediction location to something slightly less
than 1, then distributes the remainder to the other elements such that they are
slightly greater than 0. A conceptual explanation behind label smoothing is
that it helps prevent a neural model from becoming "overconfident" by forcing
it to consider alternatives, even if only slightly. Label smoothing has been
shown to help several areas of language generation, yet typically requires
considerable tuning and testing to achieve the optimal results. This tuning and
testing has not been reported for neural source code summarization - a growing
research area in software engineering that seeks to generate natural language
descriptions of source code behavior. In this paper, we demonstrate the effect
of label smoothing on several baselines in neural code summarization, and
conduct an experiment to find good parameters for label smoothing and make
recommendations for its use.
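The mechanism described above can be written out directly. This is a minimal sketch of standard label smoothing, assuming a uniform redistribution of the smoothing mass over the incorrect classes:

```python
# Minimal label smoothing: the correct class gets 1 - epsilon, and epsilon
# is spread evenly over the remaining classes, so every element of the
# target distribution is slightly greater than 0 and it still sums to 1.
def smooth_labels(correct_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution instead of a one-hot vector."""
    off_value = epsilon / (num_classes - 1)   # small mass per wrong class
    target = [off_value] * num_classes
    target[correct_index] = 1.0 - epsilon     # slightly less than 1
    return target

smooth_labels(2, 4)  # correct class 2 of 4 -> [0.0333..., 0.0333..., 0.9, 0.0333...]
```

The epsilon value here is the parameter the paper's experiment tunes; in a training loop this smoothed vector replaces the one-hot target in the cross-entropy loss.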
Data-Driven Decisions and Actions in Today’s Software Development
Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data and stakeholders, for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.
Image-based Communication on Social Coding Platforms
Visual content in the form of images and videos has taken over
general-purpose social networks in a variety of ways, streamlining and
enriching online communications. We are interested to understand if and to what
extent the use of images is popular and helpful in social coding platforms. We
mined nine years of data from two popular software developers' platforms: the
Mozilla issue tracking system, i.e., Bugzilla, and the most well-known platform
for developers' Q/A, i.e., Stack Overflow. We further triangulated and extended
our mining results by performing a survey with 168 software developers. We
observed that, between 2013 and 2022, the number of posts containing image data
on Bugzilla and Stack Overflow doubled. Furthermore, we found that sharing
images makes other developers engage more and faster with the content. In the
majority of cases in which an image is included in a developer's post, the
information in that image is complementary to the text provided. Finally, our
results showed that when an image is shared, understanding the content without
the information in the image is unlikely for 86.9% of the cases. Based on
these observations, we discuss the importance of considering visual content
when analyzing developers and designing automation tools.