How do you propose your code changes? Empirical analysis of affect metrics of pull requests on GitHub

Author: Destefanis Giuseppe
Graziotin Daniel
Marchesi Michele
Ortu Marco
Tonelli Roberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2020

Software engineering methodologies rely on version control systems such as git to store source code artifacts and manage changes to the codebase. Pull requests include chunks of source code, history of changes, log messages around a proposed change of the mainstream codebase, and much discussion on whether to integrate such changes or not. A better understanding of what contributes to a pull request's fate and latency will allow us to build predictive models of what is going to happen and when. Several factors can influence the acceptance of pull requests, many of which are related to the individual aspects of software developers. In this study, we aim to understand how the affect (e.g., sentiment, discrete emotions, and valence-arousal-dominance dimensions) expressed in the discussion of pull request issues influences the acceptance of pull requests. We conducted a mining study of large git software repositories and analyzed more than 150,000 issues with more than 1,000,000 comments in them. We built a model to understand whether affect and politeness have an impact on the chance of issues and pull requests being merged, i.e., on whether the code that fixes the issue is integrated into the codebase. We built two logistic classifiers, one without affect metrics and one with them. By comparing the two classifiers, we show that the affect metrics improve the prediction performance. Our results show that valence (expressed in comments received and posted by a reporter) and joy expressed in the comments written by a reporter are linked to a higher likelihood of issues being merged. In contrast, sadness, anger, and arousal expressed in the comments written by a reporter, and anger, arousal, and dominance expressed in the comments received by a reporter, are linked to a lower likelihood of a pull request being merged.

Archivio istituzionale della ricerca - Università di Cagliari

Analysis of Textual and Non-Textual Sources of Sentiment in Github

Author: De Zoysa Nalin
Publication venue: 'University of Waterloo'
Publication date: 26/05/2020

Github is a collaborative platform that is used primarily for the development of software. To gain more insight into how teams work on Github, we analyze the sentiment content available via communication on the platform. To do so, we first use existing sentiment analysis classifiers and compare the Github data to data from two other social networks, Twitter and Reddit. Because users are able to react to other users' posts on Github, we use these reactions as an indicator, or label, of sentiment information. Using this label, we first investigate whether repeated user interaction has an impact on sentiment and find that sentiment is positively correlated with the amount of prior interaction as well as the directness of interaction. We also investigate whether metrics corresponding to a user's status or power in a project correlate with positive sentiment received and find that they do. We then build sentiment classifiers using both textual and non-textual information, both of which outperform the generic sentiment scorer systems. In addition, we show that a sentiment classifier built using only non-textual information can perform at a level comparable to that of a text-based classifier, indicating that there is significant sentiment information contained in non-textual information in the Github network.

University of Waterloo's Institutional Repository

Effects of Online Self-Disclosure on Social Feedback During the COVID-19 Pandemic

Author: Lee Jooyoung
Rajtmajer Sarah
Srivatsavaya Eesha
Wilson Shomir
Publication date: 21/09/2022

We investigate relationships between online self-disclosure and received social feedback during the COVID-19 crisis. We crawl a total of 2,399 posts and 29,851 associated comments from the r/COVID19_support subreddit and manually extract fine-grained personal information categories and types of social support sought from each post. We develop a BERT-based ensemble classifier to automatically identify types of support offered in users' comments. We then analyze the effect of personal information sharing and posts' topical, lexical, and sentiment markers on the acquisition of support and five interaction measures (submission scores, the number of comments, the number of unique commenters, the length and sentiments of comments). Our findings show that: 1) users were more likely to share their age, education, and location information when seeking both informational and emotional support, as opposed to pursuing either one; 2) while personal information sharing was positively correlated with receiving informational support when requested, it did not correlate with emotional support; 3) as the degree of self-disclosure increased, informational support seekers obtained higher submission scores and longer comments, whereas emotional support seekers' self-disclosure resulted in lower submission scores, fewer comments, and fewer unique commenters; 4) post characteristics affecting social feedback differed significantly based on types of support sought by post authors. These results provide empirical evidence for the varying effects of self-disclosure on acquiring desired support and user involvement online during the COVID-19 pandemic. Furthermore, this work can assist support seekers hoping to enhance and prioritize specific types of social feedback.

arXiv.org e-Print Archive

How Are Communication Channels on GitHub Presented to Their Intended Audience? -- A Thematic Analysis

Author: Ebert Verena
Graziotin Daniel
Wagner Stefan
Publication date: 03/05/2022

Communication is essential in software development, and even more in distributed settings. Communication activities need to be organized and coordinated to defend against the threat of productivity losses, increases in cognitive load, and stress among team members. With a plethora of communication channels that were identified by previous research in open-source projects, there is a need to explore organizational issues in how these communication channels are introduced, explained, and motivated for use among all project members. In this study, we wanted to understand which communication channels are used in GitHub projects and how they are presented to the GitHub project audience. We employed thematic analysis to analyze 151 artifacts in 90 GitHub projects. Our results revealed 32 unique communication channels that can be divided into nine different types. Projects mostly provide channels of different types, but for some types (e.g., chat) it is common to provide several channels. Maintainers are aware that channels have different properties and help the developers to decide which channel should be used in which case. However, this is not true for all projects, and often we have not found any explicit reasons why maintainers chose to provide one channel over another. Different channels can be used for different purposes and have different affordances, so maintainers have to decide wisely which channels they want to provide and make clear which channel should be used in which case. Otherwise, developers might feel overwhelmed by too many channels and information can get fragmented over multiple channels. Comment: 10 pages, 5 figures. Accepted for presentation at the International Conference on Evaluation and Assessment in Software Engineering (EASE) 202

arXiv.org e-Print Archive

Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models

Author: He Yichen
Li Zhoujun
Ren Changyu
Shi Shuhua
Tang Xunzhu
Wang Liran
Yan Chaoran
Publication date: 28/09/2023

Commit message generation (CMG) is a challenging task in automated software engineering that aims to generate natural language descriptions of code changes for commits. Previous methods all start from the modified code snippets, outputting commit messages through template-based, retrieval-based, or learning-based models. While these methods can summarize what is modified from the perspective of code, they struggle to provide reasons for the commit. The correlation between commits and issues that could be a critical factor for generating rational commit messages is still unexplored. In this work, we delve into the correlation between commits and issues from the perspective of dataset and methodology. We construct the first dataset anchored on combining correlated commits and issues. The dataset consists of an unlabeled commit-issue parallel part and a labeled part in which each example is provided with human-annotated rational information in the issue. Furthermore, we propose ExGroFi (Extraction, Grounding, Fine-tuning), a novel paradigm that can introduce the correlation between commits and issues into the training phase of models. To evaluate whether it is effective, we perform comprehensive experiments with various state-of-the-art CMG models. The results show that compared with the original models, the performance of ExGroFi-enhanced models is significantly improved. Comment: ASE2023 accepted paper

arXiv.org e-Print Archive

Effects of Personality Traits and Emotional Factors in Pull Request Acceptance.

Author: Iyer Rahul
Publication venue: 'University of Waterloo'
Publication date: 15/08/2019

Social interactions in the form of discussion are an indispensable part of collaborative software development. The discussions are essential for developers to share their views and to form a strong relationship with other teammates. These discussions invoke both positive and negative emotions such as joy, love, aggression, and disgust. Additionally, developers also exhibit hidden behaviours that dictate their personality. Some developers can be supportive and open to new ideas, whereas others can be conservative. Past research has shown that the personality of the developers has a significant role in determining the success of the task they collaboratively perform. Additionally, previous research has also shown that in online collaborative environments, the developers use signals from comments such as rudeness to determine if they are compatible to work together. Most of these studies use traditional small-scale surveys for their experiments. The transparent nature of online collaborative environments makes it easier to conduct empirical experiments by mining pull request comments. In this thesis, first, we investigate the effect of different personality traits on pull request acceptance. The results of this experiment will provide us with a valuable understanding of the personality traits of developers and help us develop tools to assist developers. We follow it with a second experiment to understand the influence of different emotional factors on pull request decisions. The emotion expressed by a developer toward their teammates can be influenced by social statuses, such as the number of followers. Moreover, the teammate's team status, such as team member or outside contributor, can also influence the emotional effect. To understand moderation, we investigate different interaction effects.
We start the experiment by replicating Tsay et al.'s work that examined the influence of social factors (e.g., 'social distance') and technical factors (e.g., test file inclusion) on evaluating contributions. We extend their work by augmenting it with personality traits of developers and examining their influence on the pull request evaluation process in GitHub. In particular, we extract the 'Big Five' personality traits (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) of developers from their online digital footprints, such as pull request comments. We analyze the personality traits of 16,935 active developers from 1,860 projects and compare their relative importance to other non-personality factors from past research in the pull request evaluation process. We find that pull requests from authors (requesters) who are more open and conscientious, but less extroverted, have a higher chance of approval. Furthermore, pull requests that are closed by developers (closers) who are more conscientious, extroverted, and neurotic have a higher likelihood of acceptance. The larger the difference in personality traits between the requester and the closer, the more positive effect it has on pull request acceptance. Although the effect of personality traits is significant and comparable to technical factors, we find that social factors are still more influential when it comes to the effect on the likelihood of pull request acceptance. We perform a second experiment to analyze the effect of emotions on pull request decisions. To predict emotions in the comments, we develop a generalised, software-engineering-specific language model that outperforms previous machine learning algorithms on four different standard datasets. We find that the percentage of positive comments from both requester and closer has a positive association with pull request acceptance, whereas the percentage of negative comments has a negative association.
Also, the polarity of the emotion associated with the first comment of both requester and closer had a positive association with pull request acceptance, i.e., the more positive the emotion, the higher the likelihood of acceptance. Finally, we find that social factors moderate the effects of emotions.

University of Waterloo's Institutional Repository

Coding vs presenting: a multicultural study on emotions

Author: Casado Lumbreras Cristina
Colomo Palacios Ricardo
Yilmaz Murat
Álvarez Rodríguez José María
Publication venue: 'Emerald'
Publication date: 13/08/2020

Purpose: The purpose of this paper is to explore and compare emotions perceived while coding and presenting for software students, comparing three different countries and performing also a gender analysis. Design/methodology/approach: Empirical data are gathered by means of the discrete emotions questionnaire, which was distributed to a group of students (n = 174) in three different countries: Norway, Spain and Turkey. All emotions are self-assessed by means of a Likert scale. Findings: The results show that both tasks are emotionally different for the subjects of all countries: presentation is described as a task that produces mainly fear and anxiety, whereas coding tasks produce anger and rage, but also happiness and satisfaction. With regard to gender differences, men feel less scared in presentation tasks, whereas women report more desire in coding activities. It is concluded that it is important to be aware of and take into account the different emotions perceived by students in their activities. Moreover, it is also important to note the different intensities in these emotions present in different cultures and genders. Originality/value: This study is among the few to study emotions perceived in software work by means of a multicultural approach using quantitative research methods. The research results enrich computing literacy theory in human factors.

Universidad Carlos III de Madrid e-Archivo

Mining software repositories: measuring effectiveness and affectiveness in software systems.

Author: Ortu Marco
Publication venue: Università degli Studi di Cagliari
Publication date: 27/04/2015

The field of software engineering has many goals, among them monitoring and controlling the development process so that the released software artifact meets its business requirements. Software engineers need empirical evidence that the development process and the overall quality of software artifacts are converging on the required features. Improving the development process's Effectiveness leads to higher productivity, meaning shorter time to market, but understanding or even measuring the software development process is a hard challenge. Modern software is the result of a complex process involving many stakeholders such as product owners, quality assurance teams, project managers and, above all, developers. All these stakeholders use complex software systems for managing the development process, issue tracking, code versioning, release scheduling and many other aspects of software development. Tools for project management and issue/bug tracking are becoming useful for governing the development process of Open Source software. Such tools simplify the communication process among developers and ensure the scalability of a project. The more information developers are able to exchange, the clearer the goals are, and the higher the number of developers keen on joining and actively collaborating on a project. By analyzing data stored in such systems, researchers are able to study and address questions such as: Which factors affect software productivity? Is it possible to improve software productivity by shortening the time to market? The present work addresses two major aspects of the software development process: Effectiveness and Affectiveness. By analyzing data stored in the project management and issue tracking systems of Open Source communities, we measured Effectiveness as the time required to resolve an issue and analyzed the factors able to impact it.

Archivio istituzionale della ricerca - Università di Cagliari

UniCA Eprints