
    Rationale in Development Chat Messages: An Exploratory Study

    Chat messages of development teams play an increasingly significant role in software development, having replaced emails in some cases. Chat messages contain information about discussed issues, considered alternatives, and the argumentation leading to the decisions made during software development. These elements, defined as rationale, are invaluable during software evolution for documenting and reusing development knowledge. Rationale is also essential for coping with changes and for effective maintenance of the software system. However, exploiting the rationale hidden in chat messages is challenging due to the high volume of unstructured messages covering a wide range of topics. This work presents the results of an exploratory study examining the frequency of rationale in chat messages, the completeness of the available rationale, and the potential of automatic techniques for rationale extraction. For this purpose, we apply content analysis and machine learning techniques to more than 8,700 chat messages from three software development projects. Our results show that chat messages are a rich source of rationale and that machine learning is a promising technique for detecting rationale and identifying different rationale elements. Comment: 11 pages, 6 figures. The 14th International Conference on Mining Software Repositories (MSR'17).
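    As a rough illustration of what such a rationale detector could look like (not the authors' actual pipeline), the following minimal scikit-learn sketch trains a binary text classifier; the TF-IDF features, the logistic regression model, and the toy messages are all assumptions for illustration only.

```python
# Hypothetical sketch of a rationale-detection classifier for chat messages.
# The study does not specify this pipeline; features, model, and data are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: chat messages labeled 1 if they contain rationale, else 0.
messages = [
    "We should use PostgreSQL because it handles concurrent writes better.",
    "Anyone up for lunch?",
    "I rejected the caching approach since it complicates invalidation.",
    "Build is green again.",
]
labels = [1, 0, 1, 0]

# TF-IDF over unigrams and bigrams feeds a simple linear classifier.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(messages, labels)

# Predict whether a new chat message contains rationale.
print(pipeline.predict(["Let's switch to Redis because lookups are faster."]))
```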

    On Using Active Learning and Self-Training when Mining Performance Discussions on Stack Overflow

    Abundant data is the key to successful machine learning. However, supervised learning requires annotated data that are often hard to obtain. In a classification task with limited resources, Active Learning (AL) promises to guide annotators to the examples that bring the most value for a classifier. AL can be successfully combined with self-training, i.e., extending a training set with the unlabelled examples for which a classifier is the most certain. We report our experiences of using AL in a systematic manner to train an SVM classifier for Stack Overflow posts discussing the performance of software components. We show that the training examples deemed most valuable to the classifier are also the most difficult for humans to annotate. Despite carefully evolved annotation criteria, we report low inter-rater agreement, but we also propose mitigation strategies. Finally, based on one annotator's work, we show that self-training can improve the classification accuracy. We conclude the paper by discussing implications for future text miners aspiring to use AL and self-training. Comment: Preprint of paper accepted for the Proc. of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017.
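    The two techniques combined above can be pictured with a short scikit-learn sketch: pool-based active learning that queries the examples an SVM is least certain about, plus self-training that adds the examples it is most certain about. The query strategy, the confidence threshold, and the toy data are assumptions, not the paper's exact setup.

```python
# Illustrative pool-based active learning with a linear SVM, plus self-training.
# Query selection, the margin threshold, and the toy data are assumptions.
import numpy as np
from sklearn.svm import SVC

def select_for_annotation(clf, X_labeled, y_labeled, X_pool, n_queries=10):
    """Return pool indices the SVM is least certain about (closest to the hyperplane)."""
    clf.fit(X_labeled, y_labeled)
    margins = np.abs(clf.decision_function(X_pool))
    return np.argsort(margins)[:n_queries]

def self_train(clf, X_labeled, y_labeled, X_pool, min_margin=2.0):
    """Extend the training set with pool examples the SVM is most certain about."""
    clf.fit(X_labeled, y_labeled)
    scores = clf.decision_function(X_pool)
    confident = np.abs(scores) >= min_margin
    X_new = np.vstack([X_labeled, X_pool[confident]])
    y_new = np.concatenate([y_labeled, (scores[confident] > 0).astype(int)])
    return X_new, y_new

# Toy usage; in practice X would hold feature vectors of Stack Overflow posts.
rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(20, 5))
y_labeled = np.array([0, 1] * 10)
X_pool = rng.normal(size=(200, 5))

clf = SVC(kernel="linear")
to_annotate = select_for_annotation(clf, X_labeled, y_labeled, X_pool)
X_aug, y_aug = self_train(clf, X_labeled, y_labeled, X_pool)
print("Indices to annotate next:", to_annotate)
```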

    On the Helpfulness of Answering Developer Questions on Discord with Similar Conversations and Posts from the Past

    A big part of software developers’ time is spent finding answers to their coding-task-related questions. To answer them, developers usually perform web searches, ask questions on Q&A websites, or, more recently, ask in chat communities. Yet, many of these questions have already been answered in previous chat conversations or other online communities. Automatically identifying and then suggesting these previous answers to the askers could thus save time and effort. In an empirical analysis, we first explored the frequency of repeating questions on the Discord chat platform and assessed our approach to identify them automatically. The approach was then evaluated with real-world developers in a field experiment, through which we received 142 ratings on the helpfulness of the suggestions we provided to help answer 277 questions that developers posted in four Discord communities. We further collected qualitative feedback through 53 surveys and 10 follow-up interviews. We found that the suggestions were considered helpful in 40% of the cases, that suggesting Stack Overflow posts is more often considered helpful than suggesting past Discord conversations, and that developers have difficulties describing their problems as search queries and thus prefer describing them as natural language questions in online communities.
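    A simple baseline for this kind of suggestion system is TF-IDF retrieval over past questions ranked by cosine similarity. The sketch below shows such a baseline, not the study's actual matching approach; the example corpus and query are made up.

```python
# Illustrative baseline for suggesting similar past questions: TF-IDF retrieval
# with cosine similarity. The study's actual approach may differ; data is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

past_questions = [
    "How do I parse JSON in Python without external libraries?",
    "Why does my Discord bot disconnect after a few minutes?",
    "What is the difference between asyncio.gather and asyncio.wait?",
]
new_question = "My bot keeps dropping its Discord connection, any ideas?"

vectorizer = TfidfVectorizer(stop_words="english")
corpus_vectors = vectorizer.fit_transform(past_questions)
query_vector = vectorizer.transform([new_question])

# Rank previous questions by similarity and suggest the best match.
scores = cosine_similarity(query_vector, corpus_vectors).ravel()
best = scores.argmax()
print(f"Suggested thread: {past_questions[best]!r} (score={scores[best]:.2f})")
```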

    Using multiclass classification algorithms to improve text categorization tool: NLoN

    Natural language processing (NLP) and machine learning techniques have been widely utilized in the mining software repositories (MSR) field in recent years. Separating natural language from source code is a pre-processing step that is needed in both NLP and the MSR domain for better data quality. This paper presents the design and implementation of a multi-class classification approach that is based on the existing open-source R package Natural Language or Not (NLoN). This article also reviews the existing literature on MSR and NLP. The review classifies the information sources and approaches of MSR in detail and focuses on the text representation and classification tasks of NLP. In addition, the design and implementation methods of the original paper are briefly introduced. Since the research goal is technology-oriented, i.e., to improve the design and implementation of existing technologies, this article adopts the design science research methodology and describes how the methodology was applied. This research implements an open-source Python library, NLoN-PY, which is hosted on GitHub and also published on PyPI. Since NLoN achieved comparable performance on two-class classification tasks with a Lasso regression model, this study evaluates additional multi-class classification algorithms, i.e., Naive Bayes, k-Nearest Neighbours, and Support Vector Machine. Using 10-fold cross-validation, the expanded classifier achieved an AUC of 0.901 for the 5-class classification task and an AUC of 0.92 for the 2-class task. Although the design of this study did not show a significant performance improvement over the original design, the impact of unbalanced data distribution on performance was detected and the categories of the classification problem were refined in the process. These findings on multi-class classification design can provide a foundation and direction for future research.
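    The evaluation setup described above can be sketched with scikit-learn: the three classifiers named in the abstract compared via 10-fold cross-validated one-vs-rest AUC. The synthetic data below is only a placeholder for NLoN-PY's actual feature set, and the numbers it prints are not the study's results.

```python
# Sketch of comparing multi-class classifiers with 10-fold cross-validated AUC.
# Synthetic data stands in for NLoN-PY's real features; results are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder 5-class data set (e.g., classes of natural-language vs. code lines).
X, y = make_classification(n_samples=500, n_features=20, n_informative=8,
                           n_classes=5, random_state=0)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "k-Nearest Neighbours": KNeighborsClassifier(n_neighbors=5),
    "Support Vector Machine": SVC(probability=True),  # probabilities needed for AUC
}

for name, clf in classifiers.items():
    # One-vs-rest ROC AUC averaged over 10 folds.
    scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc_ovr")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```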

    Analyzing the Predictability of Source Code and its Application in Creating Parallel Corpora for English-to-Code Statistical Machine Translation

    Analyzing source code using computational linguistics and exploiting the linguistic properties of source code have recently become popular topics in the domain of software engineering. In the first part of the thesis, we study the predictability of source code and determine how well source code can be represented using language models developed for natural language processing. In the second part, we study how well English discussions of source code can be aligned with code elements to create parallel corpora for English-to-code statistical machine translation. This work is organized as a “manuscript” thesis whereby each core chapter constitutes a submitted paper. The first part replicates recent works that have concluded that software is more repetitive and predictable, i.e., more natural, than English texts. We find that much of the apparent “naturalness” is artificial and is the result of language-specific tokens. For example, the syntax of a language, especially separators such as semi-colons and brackets, makes up 59% of all uses of Java tokens in our corpus. Furthermore, 40% of all 2-grams end in a separator, implying that a model for autocompleting the next token would have a trivial separator as the top suggestion 40% of the time. By using the standard NLP practice of eliminating punctuation (e.g., separators) and stopwords (e.g., keywords), we find that code is less repetitive and predictable than was suggested by previous work. We replicate this result across 7 programming languages. Continuing this work, we find that, unlike code written for a particular project, API code usage is similar across projects. For example, a file is opened and closed in the same manner irrespective of domain. When we restrict our n-grams to those contained in the Java API, we find that the entropy of 2-grams is significantly lower than that of the English corpus. This repetition perhaps explains the successful literature on API usage suggestion and autocompletion. We then study the impact of the representation of code on repetition. The n-gram model assumes that the current token can be predicted from the sequence of n previous tokens. When we extract program graphs of size 2, 3, and 4 nodes, we see that the abstract graph representation is much more concise and repetitive than the n-gram representations of the same code. This suggests that future work should focus on graphs that include control and data flow dependencies rather than linear sequences of tokens. The second part of this thesis focuses on cleaning English and code corpora to aid machine translation. Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. We clean StackOverflow, one of the most popular online discussion forums for programmers, to generate parallel English-code corpora. We contrast three data cleaning approaches: standard NLP, title only, and software task. We evaluate the quality of each corpus for MT, measuring the corpus size, the percentage of unique tokens, and the per-word maximum likelihood alignment entropy. While many works have shown that code is repetitive and predictable, we find that English discussions of code are also repetitive. Creating a maximum likelihood MT model, we find that English words map to a small number of specific code elements, which partially explains the success of using StackOverflow for search and other tasks in the software engineering literature and paves the way for MT. Our scripts and corpora are publicly available.
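    The separator effect described above can be pictured with a toy bigram analysis: count what share of 2-grams ends in a separator, then filter separators and keywords as in the "standard NLP practice" the thesis mentions. The tokenizer, token sets, and snippet below are deliberately simplified assumptions, not the thesis's actual corpus processing.

```python
# Simplified illustration of the 2-gram/separator analysis described above.
# Tokenization here is deliberately naive; token sets are illustrative only.
import re
from collections import Counter

SEPARATORS = {";", "{", "}", "(", ")", ",", "."}
KEYWORDS = {"public", "private", "static", "void", "int", "return", "if", "else"}

java_snippet = "public static int add(int a, int b) { return a + b; }"
tokens = re.findall(r"\w+|[^\s\w]", java_snippet)

bigrams = list(zip(tokens, tokens[1:]))
ending_in_sep = sum(1 for _, second in bigrams if second in SEPARATORS)
print(f"{ending_in_sep / len(bigrams):.0%} of 2-grams end in a separator")

# Drop separators (punctuation) and keywords (stopwords) before measuring repetition.
filtered = [t for t in tokens if t not in SEPARATORS and t not in KEYWORDS]
print(Counter(zip(filtered, filtered[1:])).most_common(3))
```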

    Using Crowd-Based Software Repositories to Better Understand Developer-User Interactions

    Software development is a complex process. To serve the final software product to the end user, developers need to rely on a variety of software artifacts throughout the development process. The term software repository used to denote only containers of source code, such as version control systems; more recent usage has generalized the concept to include a plethora of software development artifact kinds and their related meta-data. Broadly speaking, software repositories include version control systems, technical documentation, issue trackers, question and answer sites, distribution information, etc. Software repositories can be based on a specific project (e.g., the bug tracker for Firefox) or be crowd-sourced (e.g., questions and answers on technical Q&A websites). Crowd-based software artifacts are created as by-products of developer-user interactions, which are sometimes referred to as communication channels. In this thesis, we investigate three distinct crowd-based software repositories that follow different models of developer-user interaction. We believe that, through a better understanding of crowd-based software repositories, we can identify challenges in software development and provide insights to improve the software development process. In our first study, we investigate Stack Overflow, the largest collection of programming-related questions and answers. On Stack Overflow, developers interact with other developers to create crowd-sourced knowledge in the form of questions and answers. The results of these interactions (i.e., the question threads) become valuable information for the entire developer community. Prior research on Stack Overflow tacitly assumes that questions receive answers directly on the platform and that no further interaction is required during the process. Meanwhile, the platform allows attaching comments to questions, which form discussions of the question. Our study found that question discussions occur for 59.2% of questions on Stack Overflow. For discussed and solved questions, the discussion begins before the accepted answer is submitted in 80.6% of cases. The results of our study show the importance and nuances of interactions in technical Q&A. We then study dotfiles, a set of publicly shared, user-specific configuration files for software tools. There is a culture of sharing dotfiles within the developer community, where the idea is to learn from other developers’ dotfiles and share your own variants. The interaction of dotfiles sharing can be viewed as developers sourcing information from other developers, adapting it to their own needs, and sharing their adaptations back to the community. Our study of dotfiles suggests that sharing them is a common practice among developers: 25.8% of the most-starred users on GitHub have a dotfiles repository. We provide a taxonomy of the commonly tracked dotfiles and a qualitative study of the commits in dotfiles repositories. We also leveraged the state-of-the-art time-series clustering technique k-Shape to identify code churn patterns for dotfile edits. This study is the first step towards understanding the practices of maintaining and sharing dotfiles. Finally, we study app stores, the platforms that distribute software products and contain many non-technical attributes (e.g., ratings and reviews) of software products. Three major stakeholders interact with each other in app stores: the app store owner, who governs the operation of the app store; developers, who publish applications on the app store; and users, who browse and download applications in the app store. App stores often provide means of interaction between all three actors (e.g., app reviews, store policies) and sometimes interaction within the same actor group (e.g., developer forums). We surveyed existing app stores to extract key features of app store operation. We then labeled a representative set of app stores collected via web queries. K-means clustering is applied to the labeled app stores to detect natural groupings. We observed a diverse set of app stores through this process. Rather than a single model that describes all app stores, our observations show that, fundamentally, app stores operate differently. This study provides insights into how app stores can affect software development. In summary, we investigated software repositories containing software artifacts created from different developer-user interactions. These software repositories are essential for software development in providing reference information (i.e., Stack Overflow), improving development productivity (i.e., dotfiles), and helping distribute software products to end users (i.e., app stores).
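    The clustering step mentioned above can be pictured with a small scikit-learn sketch: hypothetical operational features per app store, standardized and grouped with K-means. The feature encoding, values, and number of clusters are all assumptions, not the thesis's labeled data.

```python
# Illustrative sketch of clustering app stores by operational features with K-means.
# The feature encoding, example values, and number of clusters are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical features per app store:
# [has_reviews, has_developer_forum, curated_submission, commission_rate]
app_store_features = np.array([
    [1, 1, 1, 0.30],
    [1, 0, 1, 0.15],
    [0, 0, 0, 0.00],
    [1, 1, 0, 0.30],
    [0, 1, 0, 0.05],
])

scaled = StandardScaler().fit_transform(app_store_features)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
print("Cluster assignment per app store:", kmeans.labels_)
```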

    Social aspects of collaboration in online software communities


    Automatic sentence annotation for more useful bug report summarization

    Bug reports are a useful software artifact, and software developers refer to them for various information needs. As bug reports can become long, users of bug reports may need to spend a lot of time reading them. Previous studies developed summarizers and determined the quality of the summaries based on human-created gold-standard summaries. We believe creating such summaries for evaluating summarizers is not a good practice. First, we have observed a high level of disagreement between the annotated summaries. Second, the number of annotators involved is lower than the established minimum for the creation of a stable annotated summary. Finally, the traditional fixed threshold of 25% of the bug report word count does not adequately serve different information needs. Consequently, we developed an automatic sentence annotation method to identify content in bug report comments, which allows bug report users to customize a view for their task-dependent information needs.
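    The fixed-threshold baseline criticized above can be made concrete with a small sketch: keep top-scored sentences until roughly 25% of the bug report's word count is used. The sentence scorer and the example report are placeholders, and this is not the annotation method proposed in the work.

```python
# Sketch of the fixed 25%-of-word-count extractive baseline discussed above.
# The sentence scorer and the example bug report are placeholders.
def summarize_fixed_threshold(sentences, score, ratio=0.25):
    budget = ratio * sum(len(s.split()) for s in sentences)
    chosen, used = set(), 0
    for s in sorted(sentences, key=score, reverse=True):
        words = len(s.split())
        if used + words > budget:
            continue  # skip sentences that would exceed the word budget
        chosen.add(s)
        used += words
    # Preserve the original sentence order in the summary.
    return [s for s in sentences if s in chosen]

bug_report = [
    "The app crashes when opening a large attachment.",
    "Steps to reproduce are attached below.",
    "Stack trace points to a null pointer in AttachmentLoader.",
    "This also happens on the beta channel.",
]
# Placeholder scorer: prefer longer sentences.
print(summarize_fixed_threshold(bug_report, score=lambda s: len(s.split())))
```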