1,554 research outputs found

    Deep Reinforcement Learning Driven Applications Testing

    Get PDF
    Applications have become indispensable in our lives, and ensuring their correctness is now a critical issue. Automatic system test case generation can significantly improve the testing process for these applications, which has recently motivated researchers to work on this problem, defining various approaches. However, most state-of-the-art approaches automatically generate test cases leveraging symbolic execution or random exploration techniques. This led to techniques that lose efficiency when dealing with an increasing number of program constraints and become inapplicable when conditions are too challenging to solve or even to formulate. This Ph.D. thesis proposes addressing current techniques' limitations by exploiting Deep Reinforcement Learning. Deep Reinforcement Learning (Deep RL) is a machine learning technique that does not require a labeled training set as input since the learning process is guided by the positive or negative reward experienced during the tentative execution of a task. Hence, it can be used to dynamically learn how to build a test suite based on the feedback obtained during past successful or unsuccessful attempts. This dissertation presents three novel techniques that exploit this intuition: ARES, RONIN, and IFRIT. Since functional testing and security testing are complementary, this Ph.D. thesis explores both testing techniques using the same approach for test cases generation. ARES is a Deep RL approach for functional testing of Android apps. RONIN addresses the issue of generating exploits for a subset of Android ICC vulnerabilities. Subsequently, to better expose the bugs discovered by previous techniques, this thesis presents IFRIT, a focused testing approach capable of increasing the number of test cases that can reach a specific target (i.e., a precise section or statement of an application) and their diversity. IFRIT has the ultimate goal of exposing faults affecting the given program point

    Deep Reinforcement Learning for Black-Box Testing of Android Apps

    Full text link
    The state space of Android apps is huge and its thorough exploration during testing remains a major challenge. In fact, the best exploration strategy is highly dependent on the features of the app under test. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task by trial and error, guided by positive or negative reward, rather than by explicit supervision. Deep RL is a recent extension of RL that takes advantage of the learning capabilities of neural networks. Such capabilities make Deep RL suitable for complex exploration spaces such as the one of Android apps. However, state of the art, publicly available tools only support basic, tabular RL. We have developed ARES, a Deep RL approach for black-box testing of Android apps. Experimental results show that it achieves higher coverage and fault revelation than the baselines, which include state of the art RL based tools, such as TimeMachine and Q-Testing. We also investigated qualitatively the reasons behind such performance and we have identified the key features of Android apps that make Deep RL particularly effective on them to be the presence of chained and blocking activities

    Deep Reinforcement Learning for Black-box Testing of Android Apps

    Get PDF
    The state space of Android apps is huge, and its thorough exploration during testing remains a significant challenge. The best exploration strategy is highly dependent on the features of the app under test. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task by trial and error, guided by positive or negative reward, rather than explicit supervision. Deep RL is a recent extension of RL that takes advantage of the learning capabilities of neural networks. Such capabilities make Deep RL suitable for complex exploration spaces such as one of Android apps. However, state-of-the-art, publicly available tools only support basic, Tabular RL. We have developed ARES, a Deep RL approach for black-box testing of Android apps. Experimental results show that it achieves higher coverage and fault revelation than the baselines, including state-of-the-art tools, such as TimeMachine and Q-Testing. We also investigated the reasons behind such performance qualitatively, and we have identified the key features of Android apps that make Deep RL particularly effective on them to be the presence of chained and blocking activities. Moreover, we have developed FATE to fine-tune the hyperparameters of Deep RL algorithms on simulated apps, since it is computationally expensive to carry it out on real apps

    Guiding Random Graphical and Natural User Interface Testing Through Domain Knowledge

    Get PDF
    Users have access to a diverse set of interfaces that can be used to interact with software. Tools exist for automatically generating test data for an application, but the data required by each user interface is complex. Generating realistic data similar to that of a user is difficult. The environment which an application is running inside may also limit the data available, or updates to an operating system can break support for tools that generate test data. Consequently, applications exist for which there are no automated methods of generating test data similar to that which a user would provide through real usage of a user interface. With no automated method of generating data, the cost of testing increases and there is an increased chance of bugs being released into production code. In this thesis, we investigate techniques which aim to mimic users, observing how stored user interactions can be split to generate data targeted at specific states of an application, or to generate different subareas of the data structure provided by a user interface. To reduce the cost of gathering and labelling graphical user interface data, we look at generating randomised screen shots of applications, which can be automatically labelled and used in the training stage of a machine learning model. These trained models could guide a randomised approach at generating tests, achieving a significantly higher branch coverage than an unguided random approach. However, for natural user interfaces, which allow interaction through body tracking, we could not learn such a model through generated data. We find that models derived from real user data can generate tests with a significantly higher branch coverage than a purely random tester for both natural and graphical user interfaces. Our approaches use no feedback from an application during test generation. Consequently, the models are “generating data in the dark”. Despite this, these models can still generate tests with a higher coverage than random testing, but there may be a benefit to inferring the current state of an application and using this to guide data generation

    Event trace reduction for effective bug replay of Android apps via differential GUI state analysis

    Full text link
    © 2019 ACM. Existing Android testing tools, such as Monkey, generate a large quantity and a wide variety of user events to expose latent GUI bugs in Android apps. However, even if a bug is found, a majority of the events thus generated are often redundant and bug-irrelevant. In addition, it is also time-consuming for developers to localize and replay the bug given a long and tedious event sequence (trace). This paper presents ECHO, an event trace reduction tool for effective bug replay by using a new differential GUI state analysis. Given a sequence of events (trace), ECHO aims at removing bug-irrelevant events by exploiting the differential behavior between the GUI states collected when their corresponding events are triggered. During dynamic testing, ECHO injects at most one lightweight inspection event after every event to collect its corresponding GUI state. A new adaptive model is proposed to selectively inject inspection events based on sliding windows to differentiate the GUI states on-the-fly in a single testing process. The experimental results show that ECHO improves the effectiveness of bug replay by removing 85.11% redundant events on average while also revealing the same bugs as those detected when full event sequences are used

    The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study

    Get PDF
    Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing or improves the performance of existing generation methods. ML is also used to generate test verdicts, property-based, and expected output oracles. Supervised learning - often based on neural networks - and reinforcement learning - often based on Q-learning - are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/Un-)Supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work-to-date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, ML algorithms employed - and how they are applied - benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.Comment: Under submission to Software Testing, Verification, and Reliability journal. (arXiv admin note: text overlap with arXiv:2107.00906 - This is an earlier study that this study extends

    Automating Software Development for Mobile Computing Platforms

    Get PDF
    Mobile devices such as smartphones and tablets have become ubiquitous in today\u27s computing landscape. These devices have ushered in entirely new populations of users, and mobile operating systems are now outpacing more traditional desktop systems in terms of market share. The applications that run on these mobile devices (often referred to as apps ) have become a primary means of computing for millions of users and, as such, have garnered immense developer interest. These apps allow for unique, personal software experiences through touch-based UIs and a complex assortment of sensors. However, designing and implementing high quality mobile apps can be a difficult process. This is primarily due to challenges unique to mobile development including change-prone APIs and platform fragmentation, just to name a few. in this dissertation we develop techniques that aid developers in overcoming these challenges by automating and improving current software design and testing practices for mobile apps. More specifically, we first introduce a technique, called Gvt, that improves the quality of graphical user interfaces (GUIs) for mobile apps by automatically detecting instances where a GUI was not implemented to its intended specifications. Gvt does this by constructing hierarchal models of mobile GUIs from metadata associated with both graphical mock-ups (i.e., created by designers using photo-editing software) and running instances of the GUI from the corresponding implementation. Second, we develop an approach that completely automates prototyping of GUIs for mobile apps. This approach, called ReDraw, is able to transform an image of a mobile app GUI into runnable code by detecting discrete GUI-components using computer vision techniques, classifying these components into proper functional categories (e.g., button, dropdown menu) using a Convolutional Neural Network (CNN), and assembling these components into realistic code. Finally, we design a novel approach for automated testing of mobile apps, called CrashScope, that explores a given android app using systematic input generation with the intrinsic goal of triggering crashes. The GUI-based input generation engine is driven by a combination of static and dynamic analyses that create a model of an app\u27s GUI and targets common, empirically derived root causes of crashes in android apps. We illustrate that the techniques presented in this dissertation represent significant advancements in mobile development processes through a series of empirical investigations, user studies, and industrial case studies that demonstrate the effectiveness of these approaches and the benefit they provide developers
    corecore