
11 research outputs found

Towards Automatic Generation of Short Summaries of Commits

Author: Jiang Siyuan
McMillan Collin
Publication date: 28/03/2017

Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studying the differences between auto-generated and human-written messages, we found that 82% of the human-written messages have only one sentence, while the automatically generated messages often have multiple lines. Furthermore, we found that the commit messages often begin with a verb followed by a direct object. This finding inspired us to use a "verb+object" format in this paper to generate short commit summaries. We split the approach into two parts: verb generation and object generation. As our first try, we trained a classifier to classify a diff to a verb. We are seeking feedback from the community before we continue to work on generating direct objects for the commits.
Comment: 4 pages, accepted in ICPC 2017 ERA Track
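The verb-generation step described in this abstract (classifying a diff to a verb) can be illustrated with a minimal sketch. The abstract does not say which classifier or features the authors used, so everything here is an assumption for illustration: the multinomial Naive Bayes model, the sign-prefixed token features, the hypothetical `VerbClassifier` name, and the toy training diffs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(diff):
    """Turn a diff into tokens prefixed with the line's sign ('+', '-', or ' ')."""
    tokens = []
    for line in diff.splitlines():
        sign = line[:1] if line[:1] in "+-" else " "
        tokens.extend(sign + word for word in re.findall(r"[a-z_]+", line.lower()))
    return tokens


class VerbClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed model choice)."""

    def __init__(self):
        self.verb_counts = Counter()              # number of training diffs per verb
        self.token_counts = defaultdict(Counter)  # token frequencies per verb
        self.vocab = set()

    def fit(self, diffs, verbs):
        for diff, verb in zip(diffs, verbs):
            self.verb_counts[verb] += 1
            for tok in tokenize(diff):
                self.token_counts[verb][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, diff):
        total = sum(self.verb_counts.values())
        tokens = tokenize(diff)

        def log_score(verb):
            counts = self.token_counts[verb]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.verb_counts[verb] / total)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            return score

        return max(self.verb_counts, key=log_score)


# Toy, made-up training data: each diff is labelled with the leading verb
# of its (hypothetical) human-written commit message.
train_diffs = [
    "+ def parse_config(path):\n+    return load(path)",
    "+ class Logger:\n+    pass",
    "- def old_helper():\n-    pass",
    "- import unused_module",
]
train_verbs = ["add", "add", "remove", "remove"]

clf = VerbClassifier().fit(train_diffs, train_verbs)
predicted = clf.predict("+ def new_feature():\n+    pass")
```

With a realistic corpus, the predicted verb would then be paired with a generated direct object to form the "verb+object" summary; the abstract leaves object generation to future work.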

arXiv.org e-Print Archive

Crossref

End-to-End Rationale Reconstruction

Author: Dhaouadi Mouna
Famelis Michalis
Oakes Bentley James
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 31/08/2022

The logic behind design decisions, called design rationale, is very valuable. In the past, researchers have tried to automatically extract and exploit this information, but prior techniques are only applicable to specific contexts and there is insufficient progress on an end-to-end rationale information extraction pipeline. Here we outline a path towards such a pipeline that leverages several Machine Learning (ML) and Natural Language Processing (NLP) techniques. Our proposed context-independent approach, called Kantara, produces a knowledge graph representation of decisions and of their rationales, which considers their historical evolution and traceability. We also propose validation mechanisms to ensure the correctness of the extracted information and the coherence of the development process. We conducted a preliminary evaluation of our proposed approach on a small example sourced from the Linux Kernel, which shows promising results.

arXiv.org e-Print Archive

PolyPublie

COMET: Generating Commit Messages using Delta Graph Context Representation

Author: Mandli Abhinav Reddy
Rajput Saurabhsingh
Sharma Tushar
Publication date: 02/02/2024

Commit messages explain code changes in a commit and facilitate collaboration among developers. Several commit message generation approaches have been proposed; however, they exhibit limited success in capturing the context of code changes. We propose Comet (Context-Aware Commit Message Generation), a novel approach that captures the context of code changes using a graph-based representation and leverages a transformer-based model to generate high-quality commit messages. Our proposed method utilizes a delta graph that we developed to effectively represent code differences. We also introduce a customizable quality assurance module to identify optimal messages, mitigating subjectivity in commit messages. Experiments show that Comet outperforms state-of-the-art techniques in terms of BLEU-norm and METEOR metrics while being comparable in terms of ROUGE-L. Additionally, we compare the proposed approach with the popular gpt-3.5-turbo model, along with gpt-4-turbo, the most capable GPT model, over zero-shot, one-shot, and multi-shot settings. We found that Comet outperforms the GPT models on five and four metrics, respectively, and provides competitive results on the two other metrics. The study has implications for researchers, tool developers, and software developers. Software developers may utilize Comet to generate context-aware commit messages. Researchers and tool developers can apply the proposed delta graph technique in similar contexts, like code review summarization.
Comment: 22 pages, 7 figures

arXiv.org e-Print Archive

Neural-machine-translation-based commit message generation: how far are we?

Author: HASSAN Ahmed E.
LIU Zhongxin
LO David
WANG Xinyu
XIA Xin
XING Zhenchang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2018

Crossref

Institutional Knowledge at Singapore Management University

Continuous Rationale Management

Author: Kleebaum Anja
Publication date: 01/01/2023

Continuous Software Engineering (CSE) is a software life cycle model open to frequent changes in requirements or technology. During CSE, software developers continuously make decisions on the requirements and design of the software or the development process. They establish essential decision knowledge, which they need to document and share so that it supports the evolution and changes of the software. The management of decision knowledge is called rationale management. Rationale management provides an opportunity to support the change process during CSE. However, rationale management is not well integrated into CSE. The overall goal of this dissertation is to provide workflows and tool support for continuous rationale management. The dissertation contributes an interview study with practitioners from the industry, which investigates rationale management problems, current practices, and features to support continuous rationale management beneficial for practitioners. Problems of rationale management in practice are threefold: First, documenting decision knowledge is intrusive in the development process and an additional effort. Second, the high amount of distributed decision knowledge documentation is difficult to access and use. Third, the documented knowledge can be of low quality, e.g., outdated, which impedes its use. The dissertation contributes a systematic mapping study on recommendation and classification approaches to treat the rationale management problems. The major contribution of this dissertation is a validated approach for continuous rationale management consisting of the ConRat life cycle model extension and the comprehensive ConDec tool support. To reduce intrusiveness and additional effort, ConRat integrates rationale management activities into existing workflows, such as requirements elicitation, development, and meetings. ConDec integrates into standard development tools instead of providing a separate tool. 
ConDec enables lightweight capturing and use of decision knowledge from various artifacts and reduces the developers' effort through automatic text classification, recommendation, and nudging mechanisms for rationale management. To enable access and use of distributed decision knowledge documentation, ConRat defines a knowledge model of decision knowledge and other artifacts. ConDec instantiates the model as a knowledge graph and offers interactive knowledge views with useful tailoring, e.g., transitive linking. To operationalize high quality, ConRat introduces the rationale backlog, the definition of done for knowledge documentation, and metrics for intra-rationale completeness and decision coverage of requirements and code. ConDec implements these agile concepts for rationale management and a knowledge dashboard. ConDec also supports consistent changes through change impact analysis. The dissertation shows the feasibility, effectiveness, and user acceptance of ConRat and ConDec in six case study projects in an industrial setting. In addition, it comprehensively analyses the rationale documentation created in the projects. The validation indicates that ConRat and ConDec benefit CSE projects. Based on the dissertation, continuous rationale management should become a standard part of CSE, like automated testing or continuous integration.

Heidelberger Dokumentenserver

Prototype of a tool for automatic generation of commit messages for Java applications

Author: Cortés Coy Luis Fernando
Publication date: 01/01/2014

Although version control systems allow developers to describe and explain the rationale behind code changes in commit messages, the state of practice indicates that most of the time such commit messages are either very short or even empty. In fact, a recent study of 23K+ Java projects found that only 10% of the messages are descriptive and that over 66% of those messages contain fewer words than a typical English sentence. However, accurate and complete commit messages summarizing software changes are important to support a number of development and maintenance tasks. This thesis presents an approach, called ChangeScribe, which is designed to generate commit messages automatically from change sets. ChangeScribe generates natural language commit messages by taking into account the commit stereotype, the type of changes (e.g., file renames, changes made only to property files), and the impact set of the underlying changes. This work presents the evaluation of ChangeScribe in a survey involving 23 developers, in which the participants analyzed automatically generated commit messages from real changes and compared them with commit messages written by the original developers of six open source systems. The results demonstrate that messages automatically generated by ChangeScribe are preferred in about 62% of the cases for large commits, and in about 54% of the cases for small commits.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Nacional De Colombia - Repositorio Institucional UN

ARENA: An Approach for the Automated Generation of Release Notes

Author: Andrian Marcus
Gabriele Bavota
Gerardo Canfora
Laura Moreno
Massimiliano Di Penta
Rocco Oliveto
Publication date: 01/02/2017

Release notes document corrections, enhancements, and, in general, changes that were implemented in a new release of a software project. They are usually created manually and may include hundreds of different items, such as descriptions of new features, bug fixes, structural changes, new or deprecated APIs, and changes to software licenses. Thus, producing them can be a time-consuming and daunting task. This paper describes ARENA (Automatic RElease Notes generAtor), an approach for the automatic generation of release notes. ARENA extracts changes from the source code, summarizes them, and integrates them with information from versioning systems and issue trackers. ARENA was designed based on the manual analysis of 990 existing release notes. In order to evaluate the quality of the release notes automatically generated by ARENA, we performed four empirical studies involving a total of 56 participants (48 professional developers and eight students). The obtained results indicate that the generated release notes are very good approximations of the ones manually produced by developers and often include important information that is missing in the manually created release notes.

Crossref

Open Access Repository