165 research outputs found

    A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks

    Software testing activities scrutinize the artifacts and the behavior of a software product to find possible defects and ensure that the product meets its expected requirements. Recently, Deep Reinforcement Learning (DRL) has been successfully employed in complex testing tasks such as game testing, regression testing, and test case prioritization to automate the process and provide continuous adaptation. Practitioners can employ DRL by implementing a DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer well-maintained implementations of state-of-the-art DRL algorithms to facilitate and speed up the development of DRL applications. Developers have widely used these frameworks to solve problems in various domains, including software testing. However, to the best of our knowledge, no study empirically evaluates the effectiveness and performance of the algorithms implemented in DRL frameworks. Moreover, the literature lacks guidelines that would help practitioners choose one DRL framework over another. In this paper, we empirically investigate the application of carefully selected DRL algorithms to two important software testing tasks: test case prioritization in the context of Continuous Integration (CI) and game testing. For the game testing task, we conduct experiments on a simple game and use DRL algorithms to explore the game and detect bugs. Results show that some of the selected DRL frameworks, such as Tensorforce, outperform recent approaches in the literature. To prioritize test cases, we run experiments in a CI environment where DRL algorithms from different frameworks are used to rank the test cases. Our results show that the performance difference between the implemented algorithms is in some cases considerable, motivating further investigation.
    Comment: Accepted for publication at EMSE (Empirical Software Engineering journal) 202
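
    As a concrete illustration of the framework-based route the paper evaluates, here is a minimal sketch, assuming Stable-Baselines3 and Gymnasium rather than any specific framework from the study: a toy environment in which an agent assigns a priority to one test case per step and is rewarded for prioritizing tests that are about to fail. The per-test features and the synthetic failure model are illustrative assumptions, not the paper's benchmark.

    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from stable_baselines3 import PPO

    class TestPrioritizationEnv(gym.Env):
        """Toy CI cycle: one episode walks over a fixed batch of test cases."""

        def __init__(self, n_tests=50):
            super().__init__()
            self.n_tests = n_tests
            # Assumed features per test: [recent failure rate, normalized duration]
            self.observation_space = spaces.Box(0.0, 1.0, shape=(2,), dtype=np.float32)
            self.action_space = spaces.Discrete(2)  # 0 = low priority, 1 = high priority

        def reset(self, *, seed=None, options=None):
            super().reset(seed=seed)
            self.features = self.np_random.random((self.n_tests, 2)).astype(np.float32)
            # Toy ground truth: tests with a high recent failure rate tend to fail again.
            self.will_fail = self.np_random.random(self.n_tests) < self.features[:, 0]
            self.idx = 0
            return self.features[self.idx], {}

        def step(self, action):
            # Reward prioritizing a failing test or deprioritizing a passing one.
            reward = 1.0 if int(action) == int(self.will_fail[self.idx]) else -1.0
            self.idx += 1
            terminated = self.idx >= self.n_tests
            obs = self.features[min(self.idx, self.n_tests - 1)]
            return obs, reward, terminated, False, {}

    model = PPO("MlpPolicy", TestPrioritizationEnv(), verbose=0)
    model.learn(total_timesteps=5_000)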

    Improving Developer Profiling and Ranking to Enhance Bug Report Assignment

    Bug assignment plays a critical role in the bug fixing process. However, bug assignment can be a burden for projects receiving a large number of bug reports. If a bug is assigned to a developer who lacks sufficient expertise to appropriately address it, the software project can be adversely impacted in terms of quality, developer hours, and aggregate cost. An automated strategy that provides a list of developers ranked by suitability, based on their development history and the development history of the project, can help teams identify the appropriate developer for a bug report more quickly and more accurately, potentially resulting in an increase in productivity. To automate the process of assigning bug reports to the appropriate developer, several studies have employed an approach that combines natural language processing and information retrieval techniques to extract two categories of features: one targeting developers who have fixed similar bugs before and one targeting developers who have worked on source files similar to the description of the bug. Because developers document their changes through commit messages, these messages represent another rich resource for profiling developer expertise, as the language used in commit messages typically matches the language used in bug reports more closely. In this study, we have replicated the approach presented in [32] that applies a learning-to-rank technique to rank appropriate developers for each bug report. Additionally, we have extended the study by proposing an additional set of features to better profile a developer through their commit logs and through the API project descriptions referenced in their code changes. Furthermore, we explore the appropriateness of a joint recommendation approach employing a learning-to-rank technique and an ordinal regression technique. To evaluate our model, we have considered more than 10,000 bug reports with their appropriate assignees. The experimental results demonstrate the effectiveness of our model in comparison with state-of-the-art methods in recommending developers for open bug reports.
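
    As a hedged sketch of the learning-to-rank step (not the study's exact pipeline or feature set), the following uses LightGBM's LambdaRank objective: each (bug report, developer) pair gets a feature vector, pairs are grouped by bug report, and the model learns to rank the true fixer highest. The feature names and the toy labels are illustrative assumptions.

    import numpy as np
    import lightgbm as lgb

    rng = np.random.default_rng(0)
    n_bugs, devs_per_bug = 200, 10
    # Assumed features per (bug, developer) pair: [similar bugs fixed before,
    # similar source files touched, commit-message similarity, API-description similarity]
    X = rng.random((n_bugs * devs_per_bug, 4))
    y = np.zeros(n_bugs * devs_per_bug, dtype=int)
    y[::devs_per_bug] = 1            # toy relevance: first candidate is the true fixer
    group = [devs_per_bug] * n_bugs  # one group of candidate developers per bug report

    ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=100)
    ranker.fit(X, y, group=group)

    # Rank candidate developers for one new bug report.
    scores = ranker.predict(rng.random((devs_per_bug, 4)))
    print("candidates ranked by suitability:", np.argsort(-scores))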

    A Tale of Two Cities: Data and Configuration Variances in Robust Deep Learning

    Deep neural networks (DNNs) are widely used in many domains such as image recognition, supply chain management, medical diagnosis, and autonomous driving. However, prior work has shown that high accuracy of a DNN model does not imply high robustness (i.e., consistent performance on new and future datasets), because the input data and external environment (e.g., software and model configurations) for a deployed model are constantly changing. Hence, ensuring the robustness of deep learning is not an option but a priority to enhance business and consumer confidence. Previous studies mostly focus on the data aspect of model variance. In this article, we systematically summarize DNN robustness issues and formulate them in a holistic view through two important aspects, i.e., data and software configuration variances in DNNs. We also provide a predictive framework to generate representative variances (counterexamples) by considering both data and configurations for robust learning, through the lens of search-based optimization.
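
    To make the search-based idea concrete, here is a minimal sketch, not the article's framework: a random search over a joint space of a data variance (input noise) and a configuration variance (a preprocessing scale factor), looking for the combination that most degrades a trained model, i.e., a representative counterexample. All parameter ranges are illustrative assumptions.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X, y)

    rng = np.random.default_rng(0)
    worst = (None, 1.0)  # (variance combination, accuracy under it)
    for _ in range(200):
        noise_std = rng.uniform(0.0, 1.0)  # data variance: Gaussian input noise
        scale = rng.uniform(0.5, 2.0)      # configuration variance: feature scaling
        X_var = (X + rng.normal(0.0, noise_std, X.shape)) * scale
        acc = model.score(X_var, y)
        if acc < worst[1]:
            worst = ((noise_std, scale), acc)

    print(f"most damaging (noise_std, scale): {worst[0]}, accuracy: {worst[1]:.2f}")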

    Improving visualization on code repository issues for tasks understanding

    Abstract. Understanding tasks and locating bugs are extremely challenging and time-consuming activities. Achieving a new state of the art in understanding tasks or issues, and providing a high-level visualization to users, would be an incredible asset to both developers and research communities. Data from the open GitHub archive is gathered and programmatically labelled. A fastText embedding model was trained to map semantically related words close together. Then, both CNN and RNN deep learning architectures were trained to classify whether each tokenized instance is a source file attribute or not. The word embedding and LSTM models worked well and generalized to real-world usage to an extent, achieving F1 scores of around 0.80 on the test set. Along with the models, the generated usage graphs, the final output of the thesis work, are presented. Some types of issues were suitable for this workflow and produced reasonable graphs that may help users see the big picture of an issue.
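
    A minimal sketch of the pipeline's two stages, assuming gensim and PyTorch rather than the thesis code: train a fastText embedding on issue text, then feed embedded token sequences to an LSTM that labels each token as a source file attribute or not. The corpus and labels below are toy illustrative assumptions.

    import numpy as np
    import torch
    import torch.nn as nn
    from gensim.models import FastText

    corpus = [["fix", "crash", "in", "parser.py", "line", "42"],
              ["update", "readme", "and", "Main.java"]]
    labels = [[0, 0, 0, 1, 0, 0], [0, 0, 0, 1]]  # 1 = source file attribute

    # Stage 1: fastText maps semantically related tokens close together.
    ft = FastText(corpus, vector_size=32, window=3, min_count=1, epochs=20)

    # Stage 2: an LSTM tags each embedded token.
    class TokenTagger(nn.Module):
        def __init__(self, dim=32, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):                  # x: (batch, seq, dim)
            out, _ = self.lstm(x)
            return self.head(out).squeeze(-1)  # one logit per token

    model = TokenTagger()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(50):
        for toks, tags in zip(corpus, labels):
            x = torch.from_numpy(np.stack([ft.wv[t] for t in toks])).unsqueeze(0)
            y = torch.tensor(tags, dtype=torch.float32).unsqueeze(0)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()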

    Automated Testing and Bug Reproduction of Android Apps

    The large demand for mobile devices creates significant concerns about the quality of mobile applications (apps). The corresponding increase in app complexity has made app testing and maintenance activities more challenging. During the app development phase, developers need to test the app in order to guarantee its quality before releasing it to the market. During the deployment phase, developers heavily rely on bug reports to reproduce failures reported by users. Because of the rapid release cycle of apps and limited human resources, it is difficult for developers to manually construct test cases for testing the apps or to diagnose failures from a large number of bug reports. However, existing automated test case generation techniques are ineffective at exploring the events that can most quickly improve code coverage and fault detection capability. In addition, none of the existing techniques can reproduce failures directly from bug reports. This dissertation provides a framework that employs artificial intelligence (AI) techniques to improve the testing and debugging of mobile apps. Specifically, the testing approach employs a Q-network that learns a behavior model from a set of existing apps; the learned model can then be used to explore and generate tests for new apps. The framework is able to capture the fine-grained details of GUI events (e.g., visiting times of events, text on the widgets) and use them as features that are fed into a deep neural network, which acts as the agent to guide the app exploration. The debugging approach focuses on automatically reproducing crashes from bug reports for mobile apps. The approach uses a combination of natural language processing (NLP), deep learning, and dynamic GUI exploration to synthesize event sequences with the goal of reproducing the reported crash.
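
    As a hedged sketch of the exploration idea only (not the dissertation's tool), the following shows a small Q-network that scores candidate GUI events from hand-crafted features, with an epsilon-greedy policy picking the next event to fire; training, app instrumentation, and the real feature set are omitted, and the features shown are illustrative assumptions.

    import random
    import torch
    import torch.nn as nn

    class EventQNetwork(nn.Module):
        """Scores each visible GUI event; higher = more promising to explore."""

        def __init__(self, n_features=3, hidden=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, event_features):                # (n_events, n_features)
            return self.net(event_features).squeeze(-1)   # one Q-value per event

    def pick_event(qnet, event_features, epsilon=0.1):
        """Epsilon-greedy selection over the currently visible GUI events."""
        if random.random() < epsilon:
            return random.randrange(event_features.shape[0])
        with torch.no_grad():
            return int(torch.argmax(qnet(event_features)))

    qnet = EventQNetwork()
    # Assumed features per event: [visit count, widget has text, depth in GUI tree]
    events = torch.tensor([[0.0, 1.0, 0.2], [3.0, 0.0, 0.5], [1.0, 1.0, 0.1]])
    print("next event to fire:", pick_event(qnet, events))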

    Feature Set Selection for Improved Classification of Static Analysis Alerts

    With the extreme growth in third party cloud applications, increased exposure of applications to the internet, and the impact of successful breaches, improving the security of the software being produced is imperative. Static analysis tools can alert developers to quality and security vulnerabilities in an application; however, they present developers and analysts with a high rate of false positives and unactionable alerts. This problem may lead to a loss of confidence in the scanning tools, possibly resulting in the tools not being used. The discontinued use of these tools may increase the likelihood of insecure software being released into production. Insecure software can be successfully attacked, resulting in the compromise of one or several information security principles such as confidentiality, availability, and integrity. Feature selection methods have the potential to improve the classification of static analysis alerts and thereby reduce false positive rates. Thus, the goal of this research effort was to improve the classification of static analysis alerts by proposing and testing a novel method leveraging feature selection. The proposed model was developed and subsequently tested on three open source PHP applications spanning several years. The results were compared to a classification model utilizing all features to gauge the improvement of the feature selection model. The model presented improved classification accuracy and reduced the false positive rate while using a reduced feature set. This work contributes a real-world static analysis dataset based upon three open source PHP applications. It also enhances an existing dataset generation framework to include additional predictive software features. However, the main contribution is a feature selection methodology that may be used to discover optimal feature sets that increase the classification accuracy of static analysis alerts.
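
    As a minimal sketch of the evaluation idea, not the study's exact method or dataset: train an alert classifier (actionable vs. false positive) on all features and on a selected subset, then compare cross-validated accuracy. The synthetic data and the chosen selector (mutual information via SelectKBest) are illustrative assumptions.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import cross_val_score

    # Stand-in for alert features (code metrics, alert type, file history, ...).
    X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                               random_state=0)

    clf = RandomForestClassifier(random_state=0)
    acc_all = cross_val_score(clf, X, y, cv=5).mean()

    X_sel = SelectKBest(mutual_info_classif, k=8).fit_transform(X, y)
    acc_sel = cross_val_score(clf, X_sel, y, cv=5).mean()

    print(f"all features: {acc_all:.3f}, selected subset: {acc_sel:.3f}")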