276 research outputs found

    A Gold Standard for Emotion Annotation in Stack Overflow

    Software developers experience and share a wide range of emotions throughout a rich ecosystem of communication channels. A recent trend that has emerged in empirical software engineering studies is leveraging sentiment analysis of developers' communication traces. We release a dataset of 4,800 questions, answers, and comments from Stack Overflow, manually annotated for emotions. Our dataset contributes to the building of a shared corpus of annotated resources to support research on emotion awareness in software development. Comment: To appear in Proceedings of the 15th International Conference on Mining Software Repositories (MSR '18) Data Showcase Track, 28-29 May, Gothenburg, Sweden

    A Benchmark Study on Sentiment Analysis for Software Engineering Research

    A recent research trend has emerged to identify developers' emotions by applying sentiment analysis to the content of communication traces left in collaborative development environments. Trying to overcome the limitations posed by using off-the-shelf sentiment analysis tools, researchers recently started to develop their own tools for the software engineering domain. In this paper, we report a benchmark study to assess the performance and reliability of three sentiment analysis tools specifically customized for software engineering. Furthermore, we offer a reflection on the open challenges, as they emerge from a qualitative analysis of misclassified texts. Comment: Proceedings of the 15th International Conference on Mining Software Repositories (MSR 2018)

    How to Ask for Technical Help? Evidence-based Guidelines for Writing Questions on Stack Overflow

    Context: The success of Stack Overflow and other community-based question-and-answer (Q&A) sites depends mainly on the willingness of their members to answer others' questions. In fact, when formulating requests on Q&A sites, we are not simply seeking information. Instead, we are also asking for other people's help and feedback. Understanding the dynamics of participation in Q&A communities is essential to improve the value of crowdsourced knowledge. Objective: In this paper, we investigate how information seekers can increase the chance of eliciting a successful answer to their questions on Stack Overflow by focusing on the following actionable factors: affect, presentation quality, and time. Method: We develop a conceptual framework of factors potentially influencing the success of questions in Stack Overflow. We quantitatively analyze a set of over 87K questions from the official Stack Overflow dump to assess the impact of actionable factors on the success of technical requests. The information seeker's reputation is included as a control factor. Furthermore, to understand the role played by affective states in the success of questions, we qualitatively analyze questions containing positive and negative emotions. Finally, a survey is conducted to understand how Stack Overflow users perceive the guideline suggestions for writing questions. Results: We found that, regardless of user reputation, successful questions are short, contain code snippets, and do not overuse uppercase characters. As regards affect, successful questions adopt a neutral emotional style. Conclusion: We provide evidence-based guidelines for writing effective questions on Stack Overflow that software engineers can follow to increase the chance of getting technical help. As for the role of affect, we empirically confirmed community guidelines that suggest avoiding rudeness in question writing. Comment: Preprint, to appear in Information and Software Technology

    Intensional Learning to Efficiently Build up Automatically Annotated Emotion Corpora

    Textual emotion detection has a high impact on business, society, politics or education, with applications such as detecting depression or personality traits, suicide prevention or identifying cases of cyber-bullying. Given this context, the objective of our research is to contribute to improving the emotion recognition task through an automatic technique focused on reducing both the time and cost needed to develop emotion corpora. Our proposal is to exploit a bootstrapping approach based on intensional learning for automatic annotation, with two main steps: 1) an initial similarity-based categorization, where a set of seed sentences is created and extended by distributional semantic similarity (word vectors or word embeddings); 2) training a supervised classifier on the initially categorized set. According to the evaluation results, the proposed technique allows efficient annotation of a large amount of emotion data with standards of reliability. This research has been supported by the FPI grant (BES-2013-065950) and the research stay grants (EEBB-I-15-10108 and EEBB-I-16-11174) from the Spanish Ministry of Science and Innovation. It has also been funded by the Spanish Government (DIGITY ref. TIN2015-65136-C02-2-R and RESCATA ref. TIN2015-65100-R), the Valencian Government (grant no. PROMETEOII/2014/001), the University of Alicante (ref. GRE16-01) and BBVA Foundation (Análisis de Sentimientos Aplicado a la Prevención del Suicidio en las Redes Sociales (ASAP) project)
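
The two steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-d word vectors, the seed sentences, the similarity threshold and the nearest-centroid classifier (standing in for "a supervised classifier") are all assumptions made for illustration.

```python
import math

# Hypothetical 2-d word vectors, invented for illustration only.
WORD_VECTORS = {
    "happy": (0.9, 0.1), "glad": (0.8, 0.2), "joyful": (0.85, 0.15),
    "sad": (0.1, 0.9), "unhappy": (0.2, 0.8), "miserable": (0.15, 0.85),
}

def embed(sentence):
    """Average the vectors of known words; return None if no word is known."""
    vecs = [WORD_VECTORS[w] for w in sentence.lower().split() if w in WORD_VECTORS]
    if not vecs:
        return None
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def bootstrap_labels(seeds, unlabeled, threshold=0.95):
    """Step 1: extend the seed set with sentences similar enough to a seed."""
    labeled = list(seeds.items())  # (sentence, label) pairs
    for sentence in unlabeled:
        vec = embed(sentence)
        if vec is None:
            continue  # no known words, so the sentence stays unlabeled
        best_label, best_sim = None, threshold
        for seed, label in seeds.items():
            sim = cosine(vec, embed(seed))
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is not None:
            labeled.append((sentence, best_label))
    return labeled

def train_centroids(labeled):
    """Step 2: fit a nearest-centroid classifier on the extended set."""
    sums, counts = {}, {}
    for sentence, label in labeled:
        vec = embed(sentence)
        prev = sums.get(label, (0.0,) * len(vec))
        sums[label] = tuple(p + v for p, v in zip(prev, vec))
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}

def predict(centroids, sentence):
    """Return the label whose centroid is most similar to the sentence.

    Assumes the sentence contains at least one known word.
    """
    vec = embed(sentence)
    return max(centroids, key=lambda lab: cosine(vec, centroids[lab]))

seeds = {"i am happy": "joy", "i am sad": "sadness"}
unlabeled = ["so glad today", "feeling miserable", "the build finished"]
labeled = bootstrap_labels(seeds, unlabeled)
centroids = train_centroids(labeled)
```

In this toy setup, sentences without any known emotion word are simply left out of the automatically labeled set, which mirrors the idea of annotating only what is close enough to a seed.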

    20-MAD -- 20 Years of Issues and Commits of Mozilla and Apache Development

    Data from long-lived, high-profile projects is valuable for research on successful software engineering in the wild. A dataset that links different software repositories of such projects enables deeper investigations. This paper presents 20-MAD, a dataset linking the commit and issue data of Mozilla and Apache projects. It includes over 20 years of information about 765 projects, 3.4M commits, 2.3M issues, and 17.3M issue comments, and its compressed size is over 6 GB. The data contains all the typical information about source code commits (e.g., lines added and removed, message and commit time) and issues (status, severity, votes, and summary). The issue comments have been pre-processed for natural language processing and sentiment analysis. This includes emoticons and valence and arousal scores. Linking code repository and issue tracker information allows studying individuals across two types of repositories and also provides more accurate time zone information for issue trackers. To our knowledge, this is the largest linked dataset in size and in project lifetime that is not based on GitHub. Comment: 17th International Conference on Mining Software Repositories, 2020

    Exploring embedding vectors for emotion detection

    Textual data is now being generated in vast volumes. With the proliferation of social media and the prevalence of smartphones, short texts such as news headlines, tweets and text advertisements have become a prevalent form of information. Given the huge volume of short texts available, effective and efficient models to detect emotions in short texts have become highly desirable, and in some cases fundamental, to a range of applications that require emotion understanding of textual content, such as human-computer interaction, marketing, e-learning and health. Emotion detection from text has been an important task in Natural Language Processing (NLP) for many years. Many approaches have been based on emotional words or lexicons in order to detect emotions. While word embedding vectors like Word2Vec have been successfully employed in many NLP approaches, the word mover's distance (WMD) is a recently introduced method to calculate the distance between two documents based on the embedded words. This thesis investigates the ability to detect or classify emotions in sentences using word vectorization and distance measures. Our results confirm the novelty of using Word2Vec and WMD in predicting the emotions in short text. We propose a new methodology based on identifying "idealised" vectors that capture the essence of an emotion; we define these vectors as having the minimal distance (using some metric function) between a vector and the embeddings of the text that contains the relevant emotion (e.g. a tweet, a sentence). We look for these vectors by searching the space of word embeddings using the covariance matrix adaptation evolution strategy (CMA-ES). Our method produces state-of-the-art results, surpassing classic supervised learning methods.
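
The "idealised" vector idea in this abstract can be sketched as a small search problem. The code below is only an illustration under stated assumptions: a simple (1+1) evolution strategy is swapped in for CMA-ES, Euclidean distance is used as the metric function, and the 2-d text embeddings and emotion labels are invented.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance, used here as the assumed metric function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_distance(candidate, embeddings):
    """Objective: average distance from a candidate to the text embeddings."""
    return sum(euclidean(candidate, e) for e in embeddings) / len(embeddings)

def search_ideal_vector(embeddings, steps=2000, sigma=0.1, seed=0):
    """(1+1) evolution strategy, a simple stand-in for CMA-ES.

    Mutates the current best vector with Gaussian noise and keeps the
    mutation only when it lowers the objective.
    """
    rng = random.Random(seed)
    best = [0.0] * len(embeddings[0])
    best_cost = mean_distance(best, embeddings)
    for _ in range(steps):
        trial = [x + rng.gauss(0.0, sigma) for x in best]
        cost = mean_distance(trial, embeddings)
        if cost < best_cost:
            best, best_cost = trial, cost
    return best, best_cost

# Hypothetical embeddings of short texts, grouped by emotion.
texts_by_emotion = {
    "joy": [(0.9, 0.1), (0.8, 0.2), (0.85, 0.05)],
    "anger": [(0.1, 0.9), (0.2, 0.8), (0.05, 0.95)],
}

# One idealised vector per emotion.
ideal = {label: search_ideal_vector(vecs)[0]
         for label, vecs in texts_by_emotion.items()}

def classify(text_embedding):
    """Assign the emotion whose idealised vector is closest."""
    return min(ideal, key=lambda label: euclidean(text_embedding, ideal[label]))
```

Because a mutation is accepted only when it improves the objective, the returned cost can never exceed the cost of the starting point. CMA-ES additionally adapts the shape and scale of the sampling distribution, which this sketch does not attempt.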

    Emotions ontology for collaborative modelling and learning of emotional responses

    Emotion-aware applications are getting a lot of attention as a way to improve the user experience, helped by increasingly affordable Brain Computer Interfaces (BCI). Thus, projects collecting emotion-related data are proliferating, such as social network sentiment analysis or tracking students' engagement to reduce dropout rates in Massive Online Open Courses (MOOCs). All of them require a common way to represent emotions so that the data can be more easily integrated, shared and reused by applications improving user experience. Due to the complexity of this data, our proposal is to use rich semantic models based on ontology. EmotionsOnto is a generic ontology for describing emotions and their detection and expression systems, taking contextual and multimodal elements into account. The ontology has been applied in the context of EmoCS, a project that collaboratively collects emotion common sense and models it using EmotionsOnto and other ontologies. Currently, emotion input is provided manually by users. However, experiments are being conducted to automatically measure users' emotional states using Brain Computer Interfaces.