4 research outputs found
Machine Learning And Deep Learning Based Approaches For Detecting Duplicate Bug Reports With Stack Traces
Many large software systems rely on bug tracking systems to record the submitted bug reports and to track and manage bugs. Handling bug reports is known to be a challenging task, especially in software organizations with a large client base, which tend to receive a considerable large number of bug reports a day. Fortunately, not all reported bugs are new; many are similar or identical to previously reported bugs, also called duplicate bug reports.
Automatic detection of duplicate bug reports is an important research topic to help reduce the time and effort spent by triaging and development teams on sorting and fixing bugs. This explains the recent increase in attention to this topic as evidenced by the number of tools and algorithms that have been proposed in academia and industry. The objective is to automatically detect duplicate bug reports as soon as they arrive into the system. To do so, existing techniques rely heavily on the nature of bug report data they operate on. This includes both structural information such as OS, product version, time and date of the crash, and stack traces, as well as unstructured information such as bug report summaries and descriptions written in natural language by end users and developers
Recommended from our members
Leveraging the Power of Crowds: Automated Test Report Processing for The Maintenance of Mobile Applications
Crowdsourcing is an emerging distributed problem-solving model combining human and machine computation. It collects intelligence and knowledge from a large and diverse workforce to complete complex tasks. In the software engineering domain, crowdsourced techniques have been adopted to facilitate various tasks, such as design, testing, debugging, development, and so on. Specifically, in crowdsourced testing, crowdsourced workers are given testing tasks to perform and submit their feedback in the form of test reports. One of the key advantages of crowdsourced testing is that it is capable of providing engineers software engineers with domain knowledge and feedback from a large number of real users. Based on diverse software and hardware settings of these users, engineers can bugs that are not caught by traditional quality assurance techniques. Such benefits are particularly ideal for mobile application testing, which needs rapid development-and-deployment iterations and support diverse execution environments. However, crowdsourced testing naturally generates an overwhelming number of crowdsourced test reports, and inspecting such a large number of reports becomes a time-consuming yet inevitable task. This dissertation presents a series of techniques, tools and experiments to assist in crowdsourced report processing. These techniques are designed for improving this task in multiple aspects: 1. prioritizing crowdsourced report to assist engineers in finding as many unique bugs as possible, and as quickly as possible; 2. grouping crowdsourced report to assist engineers in identifying the representative ones in a short time; 3. summarizing the duplicate reports to provide engineers with a concise and accurate understanding of a group of reports; In the first step, I present a text-analysis-based technique to prioritize test reports for manual inspection. This technique leverages two key strategies: (1) a diversity strategy to help developers inspect a wide variety of test reports and to avoid duplicates and wasted effort on falsely classified faulty behavior, and (2) a risk-assessment strategy to help developers identify test reports that may be more likely to be fault-revealing based on past observations.Together, these two strategies form our technique to prioritize test reports in crowdsourced testing. Moreover, in the mobile testing domain, test reports often consist of more screenshots and shorter descriptive text, and thus text-analysis-based techniques may be ineffective or inapplicable. The shortage and ambiguity of natural-language text information and the well-defined screenshots of activity views within mobile applications motivate me to propose a novel technique based on using image understanding for multi-objective test-report prioritization. This technique employs the Spatial Pyramid Matching (SPM) technique to measure the similarity of the screenshots, and apply the natural-language processing technique to measure the distance between the text of test reports. Next, I design and implement CTRAS: a novel approach to leveraging duplicates to enrich the content of bug descriptions and improve the efficiency of inspecting these reports. CTRAS is capable of automatically aggregating duplicates based on both textual information and screenshots, and further summarizes the duplicate test reports into a comprehensive and comprehensible report.I validate all of these techniques on industrial data by collaborating with several companies. The results show my techniques can improve both the efficiency and effectiveness of crowdsourced test report processing. Also, I suggest settings for different usage scenarios and discuss future research directions
Software Engineering in the Age of App Stores: Feature-Based Analyses to Guide Mobile Software Engineers
Mobile app stores are becoming the dominating distribution platform of mobile applications. Due to their rapid growth, their impact on software engineering practices is not yet well understood. There has been no comprehensive study that explores the mobile app store ecosystem's effect on software engineering practices. Therefore, this thesis, as its first contribution, empirically studies the app store as a phenomenon from the developers' perspective to investigate the extent to which app stores affect software engineering tasks. The study highlights the importance of a mobile application's features as a deliverable unit from developers to users. The study uncovers the involvement of app stores in eliciting requirements, perfective maintenance and domain analysis in the form of discoverable features written in text form in descriptions and user reviews. Developers discover possible features to include by searching the app store. Developers, through interviews, revealed the cost of such tasks given a highly prolific user base, which major app stores exhibit. Therefore, the thesis, in its second contribution, uses techniques to extract features from unstructured natural language artefacts. This is motivated by the indication that developers monitor similar applications, in terms of provided features, to understand user expectations in a certain application domain. This thesis then devises a semantic-aware technique of mobile application representation using textual functionality descriptions. This representation is then shown to successfully cluster mobile applications to uncover a finer-grained and functionality-based grouping of mobile apps. The thesis, furthermore, provides a comparison of baseline techniques of feature extraction from textual artefacts based on three main criteria: silhouette width measure, human judgement and execution time. Finally, this thesis, in its final contribution shows that features do indeed migrate in the app store beyond category boundaries and discovers a set of migratory characteristics and their relationship to price, rating and popularity in the app stores studied