A Benchmark Study on Sentiment Analysis for Software Engineering Research
A recent research trend has emerged to identify developers' emotions, by
applying sentiment analysis to the content of communication traces left in
collaborative development environments. Trying to overcome the limitations
posed by using off-the-shelf sentiment analysis tools, researchers recently
started to develop their own tools for the software engineering domain. In this
paper, we report a benchmark study to assess the performance and reliability of
three sentiment analysis tools specifically customized for software
engineering. Furthermore, we offer a reflection on the open challenges, as they
emerge from a qualitative analysis of misclassified texts.
Comment: Proceedings of the 15th International Conference on Mining Software
Repositories (MSR 2018)
Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting?
In this paper, we address the problem of using sentiment analysis tools
'off-the-shelf,' that is when a gold standard is not available for retraining.
We evaluate the performance of four SE-specific tools in a cross-platform
setting, i.e., on a test set collected from data sources different from the one
used for training. We find that (i) the lexicon-based tools outperform the
supervised approaches retrained in a cross-platform setting and (ii) retraining
can be beneficial in within-platform settings in the presence of robust gold
standard datasets, even using a minimal training set. Based on our empirical
findings, we derive guidelines for reliable use of sentiment analysis tools in
software engineering.
Comment: 12 pages
SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering
Sentiment analysis has various application scenarios in software engineering
(SE), such as detecting developers' emotions in commit messages and identifying
their opinions on Q&A forums. However, commonly used out-of-the-box sentiment
analysis tools cannot obtain reliable results on SE tasks and the
misunderstanding of technical jargon has been shown to be the main reason.
Researchers therefore have to use labeled SE-related texts to customize
sentiment analysis for SE tasks via a variety of algorithms. However, the
scarce labeled data can cover only very limited expressions and thus cannot
guarantee the analysis quality. To address such a problem, we turn to the
easily available emoji usage data for help. More specifically, we employ
emotional emojis as noisy labels of sentiments and propose a representation
learning approach that uses both Tweets and GitHub posts containing emojis to
learn sentiment-aware representations for SE-related texts. These emoji-labeled
posts can not only supply the technical jargon, but also incorporate more
general sentiment patterns shared across domains. These posts, together with labeled data,
are used to train the final sentiment classifier. Compared to the existing
sentiment analysis methods used in SE, the proposed approach can achieve
significant improvement on representative benchmark datasets. Through further
comparative experiments, we find that the Tweets make a key contribution to the
power of our approach. This finding suggests that future research should not
pursue only domain-specific resources, but should also try to transfer knowledge from the
open domain through ubiquitous signals such as emojis.
Comment: Accepted by the 2019 ACM Joint European Software Engineering
Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE
2019). Please include ESEC/FSE in any citation.
Use and misuse of the term "Experiment" in mining software repositories research
The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) have fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to situate the empirical methods they use within the existing empirical SE body of knowledge. This is especially the case for MSR experiments. To provide evidence on the special characteristics of MSR experiments and their differences from the experiments traditionally acknowledged in SE, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term experiment in MSR. We found that 19% of the papers claiming to be an experiment are in fact not experiments at all but observational studies, so they use the term in a misleading way. Of the remaining 81% of the papers, only one refers to a genuine controlled experiment, while the others are experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments.
This work has been partially supported by the Spanish project: MCI PID2020-117191RB-I00.
Peer Reviewed
Postprint (author's final draft)
Opinion Mining for Software Development: A Systematic Literature Review
Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies.
SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in
code comments and extracting users’ criticisms of mobile apps. Given the large number of relevant studies available, it can take
considerable time for researchers and developers to figure out which approaches they can adopt in their own studies and what perils
these approaches entail.
We conducted a systematic literature review involving 185 papers. More specifically, we present 1) well-defined categories of opinion
mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in
other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4)
concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques.
The results of our study serve as references to choose suitable opinion mining tools for software development activities, and provide
critical insights for the further development of opinion mining techniques in the SE domain.