4,309 research outputs found

    Towards Automatic Generation of Short Summaries of Commits

    Committing to a version control system means submitting a software change to the system. Each commit can have a message to describe the submission. Several approaches have been proposed to automatically generate the content of such messages. However, the quality of the automatically generated messages falls far short of what humans write. In studying the differences between auto-generated and human-written messages, we found that 82% of the human-written messages have only one sentence, while the automatically generated messages often have multiple lines. Furthermore, we found that the commit messages often begin with a verb followed by a direct object. This finding inspired us to use a "verb+object" format in this paper to generate short commit summaries. We split the approach into two parts: verb generation and object generation. As a first step, we trained a classifier to map a diff to a verb. We are seeking feedback from the community before we continue to work on generating direct objects for the commits. Comment: 4 pages, accepted in ICPC 2017 ERA Track
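The verb-generation step described above can be illustrated with a deliberately tiny sketch. The abstract does not say which features or model the authors used, so everything here is an assumption: the coarse "added/removed lines" feature, the majority-vote lookup that stands in for a trained classifier, the toy training pairs, and the names `diff_signature`, `train`, and `predict_verb`.

```python
from collections import Counter, defaultdict


def diff_signature(diff: str) -> tuple[bool, bool]:
    """Coarse, hypothetical feature: does the diff add and/or remove lines?"""
    lines = diff.splitlines()
    # Skip the "+++" / "---" file headers of a unified diff.
    has_added = any(
        line.startswith("+") and not line.startswith("+++") for line in lines
    )
    has_removed = any(
        line.startswith("-") and not line.startswith("---") for line in lines
    )
    return has_added, has_removed


def train(examples):
    """Learn the most frequent verb for each signature (majority vote)."""
    votes = defaultdict(Counter)
    for diff, verb in examples:
        votes[diff_signature(diff)][verb] += 1
    return {sig: counts.most_common(1)[0][0] for sig, counts in votes.items()}


def predict_verb(model, diff, default="update"):
    """Return the learned verb, or a fallback for unseen signatures."""
    return model.get(diff_signature(diff), default)


# Toy (diff, verb) pairs invented for illustration only.
TOY_EXAMPLES = [
    ("+ def parse_config(path):\n+     return load(path)", "add"),
    ("+ class Cache:\n+     pass", "add"),
    ("- def old_helper():\n-     pass", "remove"),
    ("- import legacy_module", "remove"),
    ("- return a - b\n+ return a + b", "fix"),
    ("- if x > 0:\n+ if x >= 0:", "fix"),
]

MODEL = train(TOY_EXAMPLES)
```

A real system would likely need far richer features than this signature; the sketch only shows the shape of the task, namely a function from a diff to a single verb that can later be paired with a generated object.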

    Towards automatic context-aware summarization of code entities

    Software developers work with many methods and classes, and to understand those that perplex them and/or that are part of their tasks, they need to deal with a huge amount of information. Therefore, providing developers with high-quality summaries of code entities can help them during their maintenance and evolution tasks. To provide useful information about the purpose of code entities, informal documentation (Stack Overflow) has been shown to be an important source of information that can be leveraged. In this study, we investigate bug reports as a type of informal documentation and we apply machine learning to produce summaries of code entities (methods and classes) in bug reports. In the proposed approach, code entities are extracted using an island parser that we implemented to identify code in bug reports. Additionally, we applied machine learning to select a set of useful sentences that will be part of the code entities’ summaries. We used logistic regression as our machine learning technique to rank sentences based on their importance. To this end, a corpus of sentences is built based on the occurrence of code entities in the sentences belonging to bug reports containing the code entities in question. In the last step, the summaries were evaluated using surveys to estimate their quality. The results show that the automatically produced summaries can reduce the time and effort needed to understand the usage of code entities. Specifically, the majority of participants found summaries extremely helpful to decrease the understanding time (43.5%) and the effort to understand the code entities (39.1%). In the future, summaries can be produced by using other informal documentation such as mailing lists or Stack Overflow. Additionally, the approach can be applied in practical settings.
Consequently, it can be used within an IDE such as Eclipse to assist developers during their software maintenance and evolution tasks.
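The sentence-ranking step can be sketched as logistic scoring followed by a sort. This is a minimal illustration, not the study's model: the three features, the hand-set `WEIGHTS` and `BIAS` (a real logistic regression would learn them from labeled sentences), the keyword list, and the function names are all assumptions.

```python
import math

# Hypothetical, hand-set parameters; a trained model would estimate these.
WEIGHTS = {"mentions_entity": 2.0, "has_code_term": 1.0, "is_question": -1.5}
BIAS = -1.0
CODE_TERMS = ("returns", "throws", "null", "exception")


def features(sentence: str, entity: str) -> dict[str, float]:
    """Extract simple binary features for one bug-report sentence."""
    lowered = sentence.lower()
    return {
        "mentions_entity": 1.0 if entity.lower() in lowered else 0.0,
        "has_code_term": 1.0 if any(t in lowered for t in CODE_TERMS) else 0.0,
        "is_question": 1.0 if sentence.strip().endswith("?") else 0.0,
    }


def score(sentence: str, entity: str) -> float:
    """Logistic (sigmoid) score in (0, 1): higher means more summary-worthy."""
    z = BIAS + sum(WEIGHTS[name] * value
                   for name, value in features(sentence, entity).items())
    return 1.0 / (1.0 + math.exp(-z))


def summarize(sentences: list[str], entity: str, top_k: int = 2) -> list[str]:
    """Rank sentences by score and keep the top_k as the entity's summary."""
    ranked = sorted(sentences, key=lambda s: score(s, entity), reverse=True)
    return ranked[:top_k]


# Invented sentences about a hypothetical method named parseDate.
SENTENCES = [
    "Thanks for the report.",
    "parseDate throws an exception when the input is null.",
    "Has anyone else seen this?",
    "I think parseDate is involved.",
]
SUMMARY = summarize(SENTENCES, "parseDate")
```

With these assumed weights, sentences that mention the entity and use code-related vocabulary rank above greetings and questions, which is the kind of ordering a learned model would be expected to produce from its training corpus.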

    Explainable Software Bot Contributions: Case Study of Automated Bug Fixes

    In a software project, especially in open source, a contribution is a valuable piece of work made to the project: writing code, reporting bugs, translating, improving documentation, creating graphics, etc. We are now at the beginning of an exciting era where software bots will make contributions that are of a similar nature to those made by humans. Dry contributions, with no explanation, are often ignored or rejected, because the contribution is not understandable per se, because it is not put into a larger context, or because it is not grounded in idioms shared by the core community of developers. We have been operating a program repair bot called Repairnator for 2 years and noticed the problem of "dry patches": a patch that does not say which bug it fixes, or that does not explain the effects of the patch on the system. We envision program repair systems that produce an "explainable bug fix": an integrated package of at least 1) a patch, 2) its explanation in natural or controlled language, and 3) a highlight of the behavioral difference with examples. In this paper, we generalize and suggest that software bot contributions must be explainable, and that they must be put into the context of the global software development conversation.

    Data-Driven Decisions and Actions in Today’s Software Development

    Today’s software development is all about data: data about the software product itself, about the process and its different stages, about the customers and markets, about the development, the testing, the integration, the deployment, or the runtime aspects in the cloud. We use static and dynamic data of various kinds and quantities to analyze market feedback, feature impact, code quality, architectural design alternatives, or effects of performance optimizations. Development environments are no longer limited to IDEs in a desktop application or the like but span the Internet using live programming environments such as Cloud9 or large-volume repositories such as BitBucket, GitHub, GitLab, or StackOverflow. Software development has become “live” in the cloud, be it the coding, the testing, or the experimentation with different product options on the Internet. The inherent complexity puts a further burden on developers, since they need to stay alert when constantly switching between tasks in different phases. Research has been analyzing the development process, its data and stakeholders, for decades and is working on various tools that can help developers in their daily tasks to improve the quality of their work and their productivity. In this chapter, we critically reflect on the challenges faced by developers in a typical release cycle, identify inherent problems of the individual phases, and present the current state of the research that can help overcome these issues.

    Summarizing and measuring development activity

    Software developers pursue a wide range of activities as part of their work, and making sense of what they did in a given time frame is far from trivial as evidenced by the large number of awareness and coordination tools that have been developed in recent years. To inform tool design for making sense of the information available about a developer's activity, we conducted an empirical study with 156 GitHub users to investigate what information they would expect in a summary of development activity, how they would measure development activity, and what factors influence how such activity can be condensed into textual summaries or numbers. We found that unexpected events are as important as expected events in summaries of what a developer did, and that many developers do not believe in measuring development activity. Among the factors that influence summarization and measurement of development activity, we identified development experience and programming languages. Christoph Treude, Fernando Figueira Filho, Uirá Kulesz

    Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models

    Commit message generation (CMG) is a challenging task in automated software engineering that aims to generate natural language descriptions of code changes for commits. Previous methods all start from the modified code snippets, outputting commit messages through template-based, retrieval-based, or learning-based models. While these methods can summarize what is modified from the perspective of code, they struggle to provide reasons for the commit. The correlation between commits and issues that could be a critical factor for generating rational commit messages is still unexplored. In this work, we delve into the correlation between commits and issues from the perspective of dataset and methodology. We construct the first dataset anchored on combining correlated commits and issues. The dataset consists of an unlabeled commit-issue parallel part and a labeled part in which each example is provided with human-annotated rational information in the issue. Furthermore, we propose ExGroFi (Extraction, Grounding, Fine-tuning), a novel paradigm that can introduce the correlation between commits and issues into the training phase of models. To evaluate whether it is effective, we perform comprehensive experiments with various state-of-the-art CMG models. The results show that compared with the original models, the performance of ExGroFi-enhanced models is significantly improved. Comment: ASE2023 accepted paper