SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering
Sentiment analysis has various application scenarios in software engineering
(SE), such as detecting developers' emotions in commit messages and identifying
their opinions on Q&A forums. However, commonly used out-of-the-box sentiment
analysis tools cannot obtain reliable results on SE tasks and the
misunderstanding of technical jargon is demonstrated to be the main reason.
Then, researchers have to utilize labeled SE-related texts to customize
sentiment analysis for SE tasks via a variety of algorithms. However, the
scarce labeled data can cover only very limited expressions and thus cannot
guarantee the analysis quality. To address such a problem, we turn to the
easily available emoji usage data for help. More specifically, we employ
emotional emojis as noisy labels of sentiments and propose a representation
learning approach that uses both Tweets and GitHub posts containing emojis to
learn sentiment-aware representations for SE-related texts. These emoji-labeled
posts can not only supply the technical jargon, but also incorporate more
general sentiment patterns shared across domains. Together with the labeled
data, they are used to learn the final sentiment classifier. Compared to the existing
sentiment analysis methods used in SE, the proposed approach can achieve
significant improvement on representative benchmark datasets. By further
contrast experiments, we find that the Tweets make a key contribution to the
power of our approach. This finding suggests that future research should not
pursue domain-specific resources alone, but should also try to transfer knowledge from the
open domain through ubiquitous signals such as emojis.
Comment: Accepted by the 2019 ACM Joint European Software Engineering
Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE
2019). Please include ESEC/FSE in any citation.
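The idea of using emojis as noisy sentiment labels, described in the abstract above, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the emoji set `EMOJI_LABELS` and the helper `make_noisy_examples` are hypothetical names introduced here, and the actual approach learns representations from Tweets and GitHub posts rather than merely building pairs.

```python
# Hypothetical, tiny set of "emotional" emojis used as noisy labels.
# The abstract does not list the emojis the approach actually uses.
EMOJI_LABELS = ("😄", "😢", "😡", "👍")


def make_noisy_examples(posts):
    """Turn emoji-bearing posts into (text, emoji) training pairs.

    Posts without any known emoji are skipped. The emoji is removed
    from the text so a model has to predict it from the words alone.
    """
    examples = []
    for post in posts:
        found = [e for e in EMOJI_LABELS if e in post]
        if not found:
            continue
        text = post
        for e in EMOJI_LABELS:
            text = text.replace(e, "")
        text = " ".join(text.split())  # normalize leftover whitespace
        for e in found:
            examples.append((text, e))
    return examples
```

Pairs built this way could serve as pre-training data for an emoji-prediction model, whose learned representation is then reused when training on the smaller labeled SE dataset.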
Individual differences limit predicting well-being and productivity using software repositories : a longitudinal industrial study
Poor work well-being and fluctuating productivity in software engineering have been reported in both academic and popular sources. Understanding and predicting these issues through repository analysis might help manage software developers' well-being. Our objective is to link data from software repositories, that is, commit activity, communication, expressed sentiments, and job events, with measures of well-being obtained with a daily experience sampling questionnaire. To achieve our objective, we studied a single software project team for eight months in the software industry. Additionally, we performed semi-structured interviews to explain our results. The acquired quantitative data are analyzed with generalized linear mixed-effects models with autocorrelation structure. We find that individual variance accounts for most of the R² values in models predicting developers' experienced well-being and productivity. In other words, using software repository variables to predict developers' well-being or productivity is challenging due to individual differences. Prediction models developed for each developer individually work better, with a fixed-effects R² value of up to 0.24. The semi-structured interviews give insights into the well-being of software developers and the benefits of chat interaction. Our study suggests that individualized prediction models are needed for well-being and productivity prediction in software development.
Opinion Mining for Software Development: A Systematic Literature Review
Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies.
SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in
code comments and extracting users’ criticisms of mobile apps. Given the large number of relevant studies available, it can take
considerable time for researchers and developers to figure out which approaches they can adopt in their own studies and what perils
these approaches entail.
We conducted a systematic literature review involving 185 papers. More specifically, we present 1) well-defined categories of opinion
mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in
other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4)
concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques.
The results of our study serve as references to choose suitable opinion mining tools for software development activities, and provide
critical insights for the further development of opinion mining techniques in the SE domain.
On the use of emoticons in open source software development
Abstract
Background: Using sentiment analysis to study software developers’ behavior comes with challenges such as the presence of a large amount of technical discussion unlikely to express any positive or negative sentiment. However, emoticons provide information about developer sentiments that can easily be extracted from software repositories.
Aim: We investigate how software developers use emoticons differently in issue trackers in order to better understand the differences between developers and determine to what extent emoticons can be used in place of sentiment analysis.
Method: We extract emoticons from 1.3M comments from Apache’s issue tracker and 4.5M from Mozilla’s issue tracker using regular expressions built from a list of emoticons used by SentiStrength and Wikipedia. We check for statistical differences using Mann-Whitney U tests and determine the effect size with Cliff’s δ.
Results: Overall, Mozilla developers rely more on emoticons than Apache developers. While the overall rate of comments with emoticons is 1% and 3% for Apache and Mozilla, respectively, some individual developers have a rate of up to 21%. Looking specifically at Mozilla developers, we find that western developers use significantly more emoticons (with a medium effect size) than eastern developers. While the majority of emoticons are used to express joy, we find that Mozilla developers use emoticons more frequently to express sadness and surprise than Apache developers. Finally, we find that Apache developers use more emoticons overall during weekends than during weekdays, with the share of sad and surprised emoticons increasing during weekends.
Conclusions: While emoticons are primarily used to express joy, the more occasional use of sad and surprised emoticons can potentially be utilized to detect frustration in place of sentiment analysis among developers who use emoticons frequently enough.
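The Method paragraph above mentions regular expressions built from an emoticon list and Cliff's δ as the effect size. A minimal sketch of both steps follows, using only the Python standard library. The emoticon list here is a small hypothetical sample (the study drew its list from SentiStrength and Wikipedia), the function names are illustrative, and the Mann-Whitney U significance test is left out.

```python
import re

# Hypothetical, tiny emoticon sample; not the list used in the study.
EMOTICONS = [":-)", ":)", ":-(", ":(", ";-)", ";)", ":-D", ":D", ":-O", ":O"]

# Escape regex metacharacters and try longer emoticons first,
# so ":-)" is matched before the shorter ":)".
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)


def extract_emoticons(comment):
    """Return every known emoticon in a comment, in order of appearance."""
    return EMOTICON_RE.findall(comment)


def cliffs_delta(xs, ys):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (|xs| * |ys|).

    Ranges from -1 to 1; 0 means the two groups overlap completely.
    """
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

For example, `extract_emoticons("Thanks :) but it broke :-(")` returns `[":)", ":-("]`, and per-developer emoticon rates from two communities could be passed to `cliffs_delta` as two lists of numbers.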