15 research outputs found

    Bug or Not? Bug Report Classification Using N-Gram IDF

    Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts; these key terms can be used as features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. On a publicly available dataset, our N-gram IDF-based models outperform the topic-based models in all evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks. Comment: 5 pages, ICSME 201
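
As a rough illustration of treating phrases of any length as candidate key terms, the sketch below computes ordinary IDF over word n-grams. It is not the N-gram IDF weighting scheme the abstract refers to, only plain IDF applied to n-grams. The function names, the whitespace tokenizer, and the `max_n` cutoff are assumptions made for this example.

```python
import math
from collections import Counter


def word_ngrams(text, max_n=3):
    """Return the set of word n-grams (n = 1..max_n) in one document."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def plain_ngram_idf(documents, max_n=3):
    """Plain IDF, log(N / df), for every n-gram seen in the corpus.

    This is standard IDF applied to n-grams, not the N-gram IDF
    extension described in the abstract.
    """
    doc_freq = Counter()
    for doc in documents:
        # Each document contributes at most once per n-gram.
        doc_freq.update(word_ngrams(doc, max_n))
    total = len(documents)
    return {gram: math.log(total / df) for gram, df in doc_freq.items()}
```

Terms with a higher score appear in fewer reports; in a classifier such scores would typically weight the feature vector passed to a model such as logistic regression.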

    Analyzing social media data and performance comparison with traditional database, data warehouse, and MapReduce approaches

    Data warehouse, OLAP technology and distributed analysis show great potential for improving business analysis, trend prediction and decision making. With the assistance of data mining techniques, databases can also be a useful tool for analyzing societal trends by gathering data from social media networks. As these networks can contain huge amounts of text data, they can serve as a perfect platform for testing text mining technologies and for discovering which trends or topics concern people the most during a certain time period. This project utilizes a data set of tweets generated from May to June 2019, which contains more than 2 million tweets with content and location data. After applying some data cleaning techniques, we were able to establish a data cube and provide various analyses based on location. Our results show that Twitter users' preferences and use frequency vary significantly based on their locations. Ultimately, this project provides a case study about utilizing database, data warehouse and distributed analysis technology to analyze social media, and provides some insight regarding trending topics of interest. This work could be applied by those interested in gaining a better understanding of social media users
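
To make the idea of a location-based data cube concrete, here is a minimal in-memory sketch that counts tweets per (location, topic) cell and keeps roll-up totals under a `"*"` wildcard. The two dimensions, the `"*"` convention, and the function name are assumptions for illustration; the abstract does not describe the project's actual schema or tooling.

```python
from collections import Counter


def build_location_cube(tweets):
    """Count tweets per (location, topic), plus roll-ups along each dimension.

    `tweets` is an iterable of (location, topic) pairs; "*" marks a
    dimension that has been aggregated away (hypothetical convention).
    """
    cube = Counter()
    for location, topic in tweets:
        cube[(location, topic)] += 1  # base cell
        cube[(location, "*")] += 1    # roll-up over topics
        cube[("*", topic)] += 1       # roll-up over locations
        cube[("*", "*")] += 1         # grand total
    return cube
```

A query such as `cube[("Ohio", "*")]` then returns the number of tweets from one location regardless of topic.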

    On the feasibility of automated prediction of bug and non-bug issues

    Context: Issue tracking systems are used to track and describe tasks in the development process, e.g., requested feature improvements or reported bugs. However, past research has shown that the reported issue types often do not match the description of the issue. Objective: We want to understand the overall maturity of the state of the art of issue type prediction, with the goal of predicting whether issues are bugs, and to evaluate whether we can improve existing models by incorporating manually specified knowledge about issues. Method: We train different models for the title and description of the issue to account for the difference in structure between these fields, e.g., the length. Moreover, we manually detect issues whose description contains a null pointer exception, as these are strong indicators that issues are bugs. Results: Our approach performs best overall, but not significantly differently from an approach from the literature based on the fastText classifier from Facebook AI Research. The small improvements in prediction performance are due to the structural information about the issues that we used. We found that using information about the content of issues in the form of null pointer exceptions is not useful. We demonstrate the usefulness of issue type prediction through the example of labelling bugfixing commits. Conclusions: Issue type prediction can be a useful tool if the use case tolerates either a certain number of missed bug reports or too many issues being predicted as bugs
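
The abstract says null pointer exceptions in issue descriptions were detected manually. As an automated stand-in, such a check could be sketched with a regular expression; the pattern and function name below are assumptions for this example, not the authors' procedure.

```python
import re

# Matches "NullPointerException" as well as spaced variants such as
# "null pointer exception", case-insensitively (hypothetical pattern).
NPE_PATTERN = re.compile(r"null\s*pointer\s*exception", re.IGNORECASE)


def mentions_null_pointer(description: str) -> bool:
    """Return True if the issue description mentions a null pointer exception."""
    return NPE_PATTERN.search(description) is not None
```

The resulting flag would be one extra feature alongside the title and description models; the abstract reports that this kind of content information did not improve prediction.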