Towards Automatic Generation of Short Summaries of Commits
Committing to a version control system means submitting a software change to
the system. Each commit can have a message to describe the submission. Several
approaches have been proposed to automatically generate the content of such
messages. However, the quality of the automatically generated messages falls
far short of what humans write. In studying the differences between
auto-generated and human-written messages, we found that 82% of the
human-written messages have only one sentence, while the automatically
generated messages often have multiple lines. Furthermore, we found that the
commit messages often begin with a verb followed by a direct object. This
finding inspired us to use a "verb+object" format in this paper to generate
short commit summaries. We split the approach into two parts: verb generation
and object generation. As our first try, we trained a classifier to classify a
diff to a verb. We are seeking feedback from the community before we continue
to work on generating direct objects for the commits.
Comment: 4 pages, accepted in ICPC 2017 ERA Track
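The diff-to-verb classification step described above can be sketched in a few lines. The verb vocabulary, the keyword features, and the rule-based scoring below are purely illustrative stand-ins; the paper trains a learned classifier whose features and model are not reproduced here.

```python
# Toy stand-in for the paper's diff-to-verb classifier: score each candidate
# verb by counting hand-picked keywords in the diff text. The keywords and
# verbs are invented for illustration, not taken from the paper.

VERB_KEYWORDS = {
    "fix":    ["bug", "error", "null", "exception"],
    "add":    ["new file", "+def ", "+class "],
    "remove": ["-def ", "-class ", "delete"],
    "update": ["version", "bump", "readme"],
}

def classify_diff_to_verb(diff: str) -> str:
    """Pick the verb whose keywords occur most often in the diff."""
    diff_lower = diff.lower()
    scores = {
        verb: sum(diff_lower.count(kw) for kw in kws)
        for verb, kws in VERB_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "update"   # fallback verb

diff = """--- a/app.py
+++ b/app.py
@@ -1,3 +1,4 @@
-    return items[0]
+    if items:                # guard against empty-list bug
+        return items[0]
"""
print(classify_diff_to_verb(diff))  # -> fix
```

A learned model would replace the keyword table with features extracted from the diff, but the interface, diff in, verb out, is the same as in the "verb+object" format the paper proposes.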
Towards automatic context-aware summarization of code entities
Software developers work with many different methods and classes, and in order to understand those that perplex them and/or that are part of their tasks, they need to deal with a huge amount of information. Therefore, providing developers with high-quality summaries of code entities can help them during their maintenance and evolution tasks.
To provide useful information about the purpose of code entities, informal documentation (Stack Overflow) has been shown to be an important source of information that can be leveraged.
In this study, we investigate bug reports as a type of informal documentation and apply machine learning to produce summaries of code entities (methods and classes) in bug reports. In the proposed approach, code entities are extracted using an island parser that we implemented to identify code in bug reports. We then apply machine learning, using logistic regression, to rank sentences by their importance and select the set of useful sentences that form the code entities' summaries. To this aim, a corpus of sentences is built based on the occurrence of code entities in the sentences of the bug reports containing those entities. In the last step, the summaries are evaluated using surveys to estimate their quality.
The results show that the automatically produced summaries can reduce time and effort to understand the usage of code entities. Specifically, the majority of participants found summaries extremely helpful to decrease the understanding time (43.5%) and the effort to understand the code entities (39.1%).
In the future, summaries could be produced from other informal documentation such as mailing lists or Stack Overflow. Additionally, the approach can be applied in practical settings; for example, it could be used within an IDE such as Eclipse to assist developers during their software maintenance and evolution tasks.
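The logistic-regression ranking step above can be sketched concretely. The features and weights below are invented for illustration (the paper does not publish its learned coefficients); the point is the shape of the computation: score each candidate sentence with a sigmoid over weighted features, then keep the top-ranked ones.

```python
import math

# Hedged sketch of logistic-regression sentence ranking for code-entity
# summaries. Feature names and weights are our assumptions, not the paper's.

WEIGHTS = {"mentions_entity": 2.0, "length_penalty": -0.05, "has_code": 1.0}
BIAS = -1.0

def sentence_features(sentence: str, entity: str) -> dict:
    return {
        "mentions_entity": 1.0 if entity in sentence else 0.0,
        "length_penalty": float(len(sentence.split())),
        "has_code": 1.0 if "()" in sentence else 0.0,
    }

def score(sentence: str, entity: str) -> float:
    """Logistic-regression style score: sigmoid(w . x + b)."""
    x = sentence_features(sentence, entity)
    z = BIAS + sum(WEIGHTS[k] * x[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def summarize(sentences, entity, k=2):
    """Rank sentences by score and keep the top k as the summary."""
    return sorted(sentences, key=lambda s: score(s, entity), reverse=True)[:k]

report = [
    "The crash happens when parse() receives an empty string.",
    "parse() should validate its input before tokenizing.",
    "I am using Ubuntu 20.04.",
]
print(summarize(report, "parse", k=2))
```

In the study, the weights would be fit on the labeled sentence corpus rather than set by hand, but ranking and truncating to the top sentences works the same way.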
Explainable Software Bot Contributions: Case Study of Automated Bug Fixes
In a software project, esp. in open-source, a contribution is a valuable
piece of work made to the project: writing code, reporting bugs, translating,
improving documentation, creating graphics, etc. We are now at the beginning of
an exciting era where software bots will make contributions that are similar in
nature to those made by humans. Dry contributions, with no explanation, are often
ignored or rejected, because the contribution is not understandable per se,
because it is not put into a larger context, and because it is not grounded
in idioms shared by the core community of developers. We have been operating a
program repair bot called Repairnator for 2 years and noticed the problem of
"dry patches": a patch that does not say which bug it fixes, or that does not
explain the effects of the patch on the system. We envision program repair
systems that produce an "explainable bug fix": an integrated package of at
least 1) a patch, 2) its explanation in natural or controlled language, and 3)
a highlight of the behavioral difference with examples. In this paper, we
generalize and suggest that software bot contributions must be explainable, and
that they must be put into the context of the global software development
conversation.
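The three-part "explainable bug fix" package the authors envision, a patch, its explanation, and a highlight of the behavioral difference with examples, can be sketched as a simple data structure. The schema and field names below are our own; the paper specifies the three ingredients, not a concrete representation.

```python
from dataclasses import dataclass, field

# Hedged sketch of an "explainable bug fix" bundle as envisioned in the
# abstract. Field names and layout are illustrative assumptions.

@dataclass
class BehavioralExample:
    call: str        # invocation that exposes the bug
    before: str      # observed behavior on the buggy program
    after: str       # observed behavior with the patch applied

@dataclass
class ExplainableBugFix:
    patch: str                                    # 1) the patch itself
    explanation: str                              # 2) natural/controlled language
    examples: list = field(default_factory=list)  # 3) behavioral differences

fix = ExplainableBugFix(
    patch="- return a / b\n+ return a / b if b else 0",
    explanation="Guards the division against b == 0, which previously "
                "raised ZeroDivisionError.",
    examples=[BehavioralExample("div(4, 0)",
                                before="raises ZeroDivisionError",
                                after="returns 0")],
)
print(fix.explanation)
```

A repair bot like Repairnator could attach such a bundle to each proposed patch so that reviewers see the why and the behavioral effect, not just the diff.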
Summarizing and measuring development activity
Software developers pursue a wide range of activities as part of their work, and making sense of what they did in a given time frame is far from trivial, as evidenced by the large number of awareness and coordination tools that have been developed in recent years. To inform the design of tools for making sense of the information available about a developer's activity, we conducted an empirical study with 156 GitHub users to investigate what information they would expect in a summary of development activity, how they would measure development activity, and what factors influence how such activity can be condensed into textual summaries or numbers. We found that unexpected events are as important as expected events in summaries of what a developer did, and that many developers do not believe in measuring development activity. Among the factors that influence summarization and measurement of development activity, we identified development experience and programming languages.
Christoph Treude, Fernando Figueira Filho, Uirá Kulesza
Data-Driven Decisions and Actions in Today’s Software Development
Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet, using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data, and its stakeholders for decades, and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.
Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models
Commit message generation (CMG) is a challenging task in automated software
engineering that aims to generate natural language descriptions of code changes
for commits. Previous methods all start from the modified code snippets,
outputting commit messages through template-based, retrieval-based, or
learning-based models. While these methods can summarize what is modified from
the perspective of code, they struggle to provide reasons for the commit. The
correlation between commits and issues, which could be a critical factor for
generating rational commit messages, is still unexplored.
In this work, we delve into the correlation between commits and issues from
the perspective of dataset and methodology. We construct the first dataset
anchored on combining correlated commits and issues. The dataset consists of an
unlabeled commit-issue parallel part and a labeled part in which each example
is provided with human-annotated rational information in the issue.
Furthermore, we propose ExGroFi (Extraction, Grounding, Fine-tuning), a novel
paradigm that can introduce the correlation
between commits and issues into the training phase of models. To evaluate
whether it is effective, we perform comprehensive experiments with various
state-of-the-art CMG models. The results show that, compared with the original
models, the performance of the ExGroFi-enhanced models is significantly improved.
Comment: ASE2023 accepted paper
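The idea of grounding a commit message in its linked issue can be sketched as a data-preparation step. The "#<number>" linking heuristic and the `[ISSUE]` input format below are our assumptions for illustration; the paper's dataset construction and training paradigm are more elaborate.

```python
import re

# Hedged sketch: pair a commit with its referenced issue and build a
# training example whose input carries the issue text, so a CMG model can
# learn the "why" behind the change. Field layout is illustrative only.

def link_issue(commit_msg: str, issues: dict):
    """Find an issue referenced as '#<number>' in the commit message."""
    m = re.search(r"#(\d+)", commit_msg)
    return issues.get(int(m.group(1))) if m else None

def build_training_example(diff: str, commit_msg: str, issues: dict) -> dict:
    issue = link_issue(commit_msg, issues)
    return {
        "input": diff + ("\n[ISSUE] " + issue if issue else ""),
        "target": commit_msg,
    }

issues = {42: "Login fails when the username contains a space."}
example = build_training_example(
    diff="- name = raw\n+ name = raw.strip()",
    commit_msg="Fix login failure for padded usernames (#42)",
    issues=issues,
)
print(example["input"])
```

Examples without a resolvable issue link would fall back to the plain diff, mirroring the split between the unlabeled commit-issue parallel part and the labeled part of the dataset.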