
    Classifying software issue reports through association mining

    Software issue report classification is a significant task in software maintenance and evolution. Despite the research effort made over the years, existing issue report classification techniques remain inadequate. In this paper, we propose a new approach inspired by the Classification Association Rule Mining (CARM) methodology in data mining, and report the testing of our method on 500 software issue reports extracted from an open-source issue tracking system. Our experiments show that our method can achieve a high degree of accuracy in classifying software issue reports.
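    A CARM-style classifier of the kind the abstract describes can be sketched as follows: mine frequent token sets from labeled reports, keep those that predict a class with enough support and confidence, and classify new reports by the best matching rule. The toy reports, thresholds, and labels below are illustrative assumptions, not the paper's actual data or parameters.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus: each issue report is a set of tokens plus a class label.
REPORTS = [
    ({"crash", "null", "pointer"}, "bug"),
    ({"crash", "segfault"}, "bug"),
    ({"add", "dark", "mode"}, "feature"),
    ({"add", "export", "option"}, "feature"),
    ({"null", "crash", "startup"}, "bug"),
]

def mine_rules(reports, min_support=2, min_conf=0.6, max_len=2):
    """Mine rules of the form {tokens} -> class with enough support/confidence."""
    counts = defaultdict(lambda: defaultdict(int))  # itemset -> class -> count
    for tokens, label in reports:
        for k in range(1, max_len + 1):
            for itemset in combinations(sorted(tokens), k):
                counts[itemset][label] += 1
    rules = {}
    for itemset, by_class in counts.items():
        total = sum(by_class.values())
        label, hits = max(by_class.items(), key=lambda kv: kv[1])
        if hits >= min_support and hits / total >= min_conf:
            rules[itemset] = (label, hits / total)
    return rules

def classify(tokens, rules, default="unknown"):
    """Pick the class of the highest-confidence rule whose antecedent matches."""
    best = max(
        ((conf, label) for itemset, (label, conf) in rules.items()
         if set(itemset) <= tokens),
        default=None,
    )
    return best[1] if best else default

rules = mine_rules(REPORTS)
print(classify({"crash", "null"}, rules))   # -> bug
print(classify({"add", "button"}, rules))   # -> feature
```

    Real CARM systems prune the rule set more aggressively (e.g. by rule ranking and database coverage); this sketch keeps only the core mine-then-match loop.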

    What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration Testing

    Driven by new software development processes and testing in clouds, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms place an almost unbearable burden on software testing engineers, who have to manually analyze their causes. The causes are critical because they determine which stakeholders are responsible for fixing the bugs detected during testing. In this paper, we present a novel approach that aims to relieve this burden by automating the procedure. Our approach, called Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes from test logs. We have developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves an accuracy of 58.3% and 65.8%, respectively, outperforming the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1s per cause analysis. Owing to these attractive experimental results, our industrial partner, a leading information and communication technology company, has deployed the tool; it achieves an average accuracy of 72% after two months of running, nearly three times more accurate than a previous strategy based on regular expressions. (Comment: 12 pages)
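    The core idea of inferring a cause from test logs via information retrieval can be sketched as a nearest-neighbour lookup: vectorize a new log, compare it to historical logs whose causes engineers already labeled, and return the cause of the most similar one. The log strings, cause labels, and plain term-count cosine similarity below are simplifying assumptions; the paper's actual model is more elaborate.

```python
import math
from collections import Counter

# Toy history: (test log, engineer-labeled cause) pairs.
HISTORY = [
    ("connection timeout while contacting build server", "environment"),
    ("assertion failed expected 3 got 4", "product bug"),
    ("disk quota exceeded on test node", "environment"),
    ("assertion failed expected true got false", "product bug"),
]

def vectorize(text):
    """Bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def infer_cause(log, history=HISTORY):
    """Return the cause of the most similar historical test log."""
    vec = vectorize(log)
    _, cause = max((cosine(vec, vectorize(h)), c) for h, c in history)
    return cause

print(infer_cause("assertion failed expected 7 got 9"))  # -> product bug
```

    In production one would add TF-IDF weighting and an index over the history so each lookup stays near the reported 0.1s budget.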

    Analysis and Detection of Information Types of Open Source Software Issue Discussions

    Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types present in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as sentence length and position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders. (Comment: 41st ACM/IEEE International Conference on Software Engineering, ICSE 2019)
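    The "conversational features" mentioned above (sentence length, position in the thread) can be made concrete with a small sketch. The thread, labels, and 1-nearest-neighbour classifier below are illustrative stand-ins, assuming only the two features named in the abstract; the paper itself uses Random Forest over a richer feature set.

```python
# Conversational features per sentence: (token count, relative position in thread).
def features(sentence, index, total):
    return (len(sentence.split()), index / max(total - 1, 1))

# Toy labeled thread: tuples of (sentence, information type).
THREAD = [
    ("Steps to reproduce: run the build twice", "Bug Reproduction"),
    ("I think this is caused by a stale cache", "Potential Solution"),
    ("Thanks, that fixed it for me", "Social Conversation"),
]

def train(thread):
    total = len(thread)
    return [(features(s, i, total), label) for i, (s, label) in enumerate(thread)]

def predict(sentence, index, total, model):
    """1-nearest-neighbour over the feature space (a stand-in for Random Forest)."""
    f = features(sentence, index, total)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda m: dist(m[0], f))[1]

model = train(THREAD)
print(predict("Thanks a lot, works now", 2, 3, model))  # -> Social Conversation
```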

    Text categorization and similarity analysis: similarity measure, literature review

    Document classification and provenance have become important areas of computer science as the amount of digital information grows significantly. Organisations are storing documents on computers rather than in paper form. Software is now required that will show the similarities between documents (i.e. document classification) and point out duplicates and, possibly, the history of each document (i.e. provenance). Poor organisation is common and leads to exactly these problems. A number of software solutions exist in this area, designed to make document organisation as simple as possible. I am doing my project with Pingar, an Auckland-based company that aims to help organise the growing amount of unstructured digital data. This report analyses the existing literature in the area with the aim of determining what already exists and how my project will differ from existing solutions.
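    A similarity measure of the kind surveyed here can be as simple as Jaccard similarity over token sets, which already separates near-duplicates from unrelated documents. The documents below are made-up examples; real systems typically use weighted vectors (e.g. TF-IDF with cosine similarity) rather than raw token sets.

```python
def jaccard(doc_a, doc_b):
    """Jaccard similarity of token sets: |A & B| / |A | B|."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

d1 = "quarterly sales report for the auckland office"
d2 = "quarterly sales report for the wellington office"
d3 = "minutes of the weekly engineering meeting"
print(round(jaccard(d1, d2), 2))  # -> 0.75 (likely related documents)
print(round(jaccard(d1, d3), 2))  # -> 0.08 (unrelated)
```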

    Classification of information systems research revisited: A keyword analysis approach

    A number of studies have previously been conducted on keyword analysis in order to provide a comprehensive scheme to classify information systems (IS) research. However, these studies appeared prior to 1994, and IS research has clearly developed substantially since then with the emergence of areas such as electronic commerce, electronic government, electronic health and numerous others. Furthermore, the majority of European IS outlets - such as the European Journal of Information Systems and Information Systems Journal - were founded in the early 1990s, and keywords from these journals were not included in any previous work. Given that a number of studies have raised the issue of differences in European and North American IS research topics and approaches, it is arguable that any such analysis must consider sources from both locations to provide a representative and balanced view of IS classification. Moreover, it has also been argued that there is a need for further work in order to create a comprehensive keyword classification scheme reflecting the current state of the art. Consequently, the aim of this paper is to present the results of a keyword analysis utilizing keywords appearing in major peer-reviewed IS publications from after 1990 through to 2007. This aim is realized by means of the following two objectives: (1) collect all keywords appearing in 24 peer-reviewed IS journals after 1990; and (2) identify keywords not included in the previous IS keyword classification scheme. This paper also describes further research required in order to place new keywords in appropriate IS research categories. The paper makes an incremental contribution toward a contemporary means of classifying IS research. This work is important and useful for researchers in understanding the area and evolution of the IS field, and also has implications for improving information search and retrieval activities.
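    The paper's two objectives (collect keywords, then find those missing from the older scheme) amount to frequency counting plus a set difference. The keyword lists and legacy scheme below are hypothetical placeholders, not the paper's data.

```python
from collections import Counter

# Hypothetical per-article keyword lists from post-1990 IS journals.
ARTICLES = [
    ["electronic commerce", "trust", "adoption"],
    ["electronic government", "adoption"],
    ["decision support systems", "expert systems"],
]
# Keywords already present in the pre-1994 classification scheme.
LEGACY_SCHEME = {"decision support systems", "expert systems"}

# Objective (1): collect all keywords, with frequencies.
counts = Counter(kw for article in ARTICLES for kw in article)
# Objective (2): identify keywords absent from the previous scheme.
new_keywords = {kw for kw in counts if kw not in LEGACY_SCHEME}

print(sorted(new_keywords))
# -> ['adoption', 'electronic commerce', 'electronic government', 'trust']
```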

    How to Ask for Technical Help? Evidence-based Guidelines for Writing Questions on Stack Overflow

    Context: The success of Stack Overflow and other community-based question-and-answer (Q&A) sites depends mainly on the will of their members to answer others' questions. In fact, when formulating requests on Q&A sites, we are not simply seeking information. Instead, we are also asking for other people's help and feedback. Understanding the dynamics of participation in Q&A communities is essential to improve the value of crowdsourced knowledge. Objective: In this paper, we investigate how information seekers can increase the chance of eliciting a successful answer to their questions on Stack Overflow by focusing on the following actionable factors: affect, presentation quality, and time. Method: We develop a conceptual framework of factors potentially influencing the success of questions on Stack Overflow. We quantitatively analyze a set of over 87K questions from the official Stack Overflow dump to assess the impact of actionable factors on the success of technical requests. The information seeker's reputation is included as a control factor. Furthermore, to understand the role played by affective states in the success of questions, we qualitatively analyze questions containing positive and negative emotions. Finally, a survey is conducted to understand how Stack Overflow users perceive the guideline suggestions for writing questions. Results: We found that regardless of user reputation, successful questions are short, contain code snippets, and do not abuse uppercase characters. As regards affect, successful questions adopt a neutral emotional style. Conclusion: We provide evidence-based guidelines for writing effective questions on Stack Overflow that software engineers can follow to increase the chance of getting technical help. As for the role of affect, we empirically confirmed community guidelines that suggest avoiding rudeness in question writing. (Comment: Preprint, to appear in Information and Software Technology)
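    The presentation-quality findings (short body, code snippet present, no uppercase abuse) translate naturally into a feature-extraction sketch. The `<code>` marker, word limit, and uppercase threshold below are illustrative assumptions, not values taken from the paper.

```python
def question_features(body):
    """Extract the actionable presentation features studied:
    length, presence of a code snippet, and uppercase abuse."""
    words = len(body.split())
    has_code = "<code>" in body  # assumed snippet marker in the dump's HTML
    letters = [c for c in body if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / max(len(letters), 1)
    return {"length": words, "has_code": has_code, "upper_ratio": upper_ratio}

def follows_guidelines(feats, max_words=250, max_upper=0.3):
    """Heuristic mirror of the reported findings: short, with code, no shouting."""
    return (feats["length"] <= max_words
            and feats["has_code"]
            and feats["upper_ratio"] <= max_upper)

f = question_features(
    "I tried json.loads but got an error. <code>import json</code> Any hints?")
print(follows_guidelines(f))  # -> True
```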