StoryDroid: Automated Generation of Storyboard for Android Apps
Mobile apps are now ubiquitous. Before developing a new app, the development team usually expends painstaking effort reviewing many existing apps with similar purposes. This review process is crucial because it reduces market risk and provides inspiration for app development. However, manual exploration of hundreds of existing apps by the different roles in a development team (e.g., product manager, UI/UX designer, developer) can be ineffective; for example, it is difficult to explore all the functionalities of an app in a short period of time. Inspired by the concept of storyboards in movie production, we propose StoryDroid, a system that automatically generates storyboards for Android apps and assists different roles in reviewing apps efficiently. Specifically, StoryDroid extracts the activity transition graph and leverages static analysis techniques to render UI pages, which are then used to visualize the storyboard. The mapping relations between UI pages and the corresponding implementation code (e.g., layout code, activity code, and method hierarchy) are also provided to users. Our comprehensive experiments show that StoryDroid is effective and genuinely useful in assisting app development. Its outputs enable several potential applications, such as the recommendation of UI designs and layout code.
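As an illustration of the kind of structure StoryDroid recovers, the following sketch models an activity transition graph and enumerates short activity paths that could back a storyboard. The class, method names, and hard-coded transitions are illustrative assumptions, not StoryDroid's API; a real tool would populate the graph via static analysis of startActivity() calls and the manifest.

```python
# Minimal sketch of an activity transition graph (ATG); names and transitions
# are hypothetical, not StoryDroid's actual data model.
from collections import defaultdict

class ActivityTransitionGraph:
    def __init__(self):
        self.edges = defaultdict(set)   # activity -> set of reachable activities

    def add_transition(self, src: str, dst: str) -> None:
        self.edges[src].add(dst)

    def storyboard_paths(self, start: str, max_depth: int = 3):
        """Enumerate cycle-free activity sequences up to max_depth activities long."""
        stack = [[start]]
        while stack:
            path = stack.pop()
            yield path
            if len(path) < max_depth:
                for nxt in sorted(self.edges[path[-1]]):
                    if nxt not in path:            # avoid cycles
                        stack.append(path + [nxt])

# Hypothetical transitions recovered by static analysis:
atg = ActivityTransitionGraph()
atg.add_transition("MainActivity", "SearchActivity")
atg.add_transition("MainActivity", "SettingsActivity")
atg.add_transition("SearchActivity", "DetailActivity")
for path in atg.storyboard_paths("MainActivity"):
    print(" -> ".join(path))
```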
Event trace reduction for effective bug replay of Android apps via differential GUI state analysis
Existing Android testing tools, such as Monkey, generate a large quantity and a wide variety of user events to expose latent GUI bugs in Android apps. However, even when a bug is found, a majority of the generated events are redundant and bug-irrelevant. It is also time-consuming for developers to localize and replay the bug given a long and tedious event sequence (trace). This paper presents ECHO, an event trace reduction tool for effective bug replay based on a new differential GUI state analysis. Given a sequence of events (a trace), ECHO removes bug-irrelevant events by exploiting the differential behavior between the GUI states collected when the corresponding events are triggered. During dynamic testing, ECHO injects at most one lightweight inspection event after every event to collect its corresponding GUI state. A new adaptive model selectively injects inspection events based on sliding windows to differentiate GUI states on the fly within a single testing process. The experimental results show that ECHO improves the effectiveness of bug replay by removing 85.11% of redundant events on average while revealing the same bugs as those detected when the full event sequences are used.
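The core reduction idea can be illustrated with a minimal sketch (this is not ECHO's implementation, which uses a sliding-window adaptive model): drop any event whose observed GUI state is identical to the state observed after the previous event.

```python
# Simplified sketch of differential GUI-state trace reduction. Each trace entry
# pairs an event with a hash of the GUI state observed after a lightweight
# inspection; event names and hashes are hypothetical.
from typing import List, Tuple

def reduce_trace(trace: List[Tuple[str, str]]) -> List[str]:
    """Keep only events that changed the observed GUI state.

    trace: list of (event, gui_state_hash) in execution order.
    """
    reduced, prev_state = [], None
    for event, state in trace:
        if state != prev_state:        # event had an observable effect
            reduced.append(event)
        prev_state = state
    return reduced

# Hypothetical trace: scrolling twice leaves the GUI state unchanged.
trace = [("tap:login", "s1"), ("scroll:list", "s2"),
         ("scroll:list", "s2"), ("tap:submit", "s3")]
print(reduce_trace(trace))   # ['tap:login', 'scroll:list', 'tap:submit']
```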
Automated Testing and Bug Reproduction of Android Apps
The large demand for mobile devices creates significant concerns about the quality of mobile applications (apps). The corresponding increase in app complexity has made app testing and maintenance activities more challenging. During the development phase, developers need to test an app to guarantee its quality before releasing it to the market. During the deployment phase, developers rely heavily on bug reports to reproduce failures reported by users. Because of the rapid release cycle of apps and limited human resources, it is difficult for developers to manually construct test cases or diagnose failures from a large number of bug reports. However, existing automated test case generation techniques are ineffective at exploring the events that most quickly improve code coverage and fault detection capability. In addition, none of the existing techniques can reproduce failures directly from bug reports. This dissertation provides a framework that employs artificial intelligence (AI) techniques to improve the testing and debugging of mobile apps. Specifically, the testing approach employs a Q-network that learns a behavior model from a set of existing apps, and the learned model is used to explore and generate tests for new apps. The framework captures fine-grained details of GUI events (e.g., how often events have been visited, the text on the widgets) and uses them as features that are fed into a deep neural network, which acts as the agent guiding app exploration. The debugging approach focuses on automatically reproducing crashes from bug reports for mobile apps. It uses a combination of natural language processing (NLP), deep learning, and dynamic GUI exploration to synthesize event sequences with the goal of reproducing the reported crash.
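A minimal sketch of Q-value-guided event selection in this style follows; the feature set, network shape, and epsilon-greedy policy are assumptions for illustration, not the dissertation's exact model.

```python
# Sketch of Q-network-guided GUI event selection (illustrative assumptions only).
import random
import torch
import torch.nn as nn

# Features per candidate event: [times_visited, has_text_widget, is_clickable]
q_net = nn.Sequential(nn.Linear(3, 16), nn.ReLU(), nn.Linear(16, 1))

def choose_event(candidate_features: list, epsilon: float = 0.1) -> int:
    """Return the index of the next GUI event to execute (epsilon-greedy)."""
    if random.random() < epsilon:                       # explore
        return random.randrange(len(candidate_features))
    with torch.no_grad():                               # exploit: highest predicted Q-value
        q_values = q_net(torch.tensor(candidate_features, dtype=torch.float32))
    return int(q_values.squeeze(-1).argmax())

# Three hypothetical events currently visible on the screen:
events = [[5.0, 0.0, 1.0],   # often-visited button
          [0.0, 1.0, 1.0],   # unvisited text field
          [2.0, 0.0, 1.0]]   # occasionally visited menu item
print("next event index:", choose_event(events))
```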
Deep Reinforcement Learning for Black-box Testing of Android Apps
The state space of Android apps is huge, and its thorough exploration during testing remains a major challenge. In fact, the best exploration strategy is highly dependent on the features of the app under test. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task by trial and error, guided by positive or negative rewards rather than by explicit supervision. Deep RL is a recent extension of RL that takes advantage of the learning capabilities of neural networks, which makes it suitable for complex exploration spaces such as that of Android apps. However, state-of-the-art, publicly available tools only support basic, tabular RL. We have developed ARES, a Deep RL approach for black-box testing of Android apps. Experimental results show that it achieves higher coverage and fault revelation than the baselines, which include state-of-the-art RL-based tools such as TimeMachine and Q-Testing. We also investigated the reasons behind this performance qualitatively and identified the presence of chained and blocking activities as the key features of Android apps that make Deep RL particularly effective on them. Moreover, we have developed FATE to fine-tune the hyperparameters of Deep RL algorithms on simulated apps, since doing so on real apps is computationally expensive.
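One plausible reward shaping for such RL-driven exploration is sketched below; it is an assumption in the spirit of the abstract (rewarding newly reached activities and penalizing leaving the app), not ARES's published reward function.

```python
# Sketch of a reward signal for RL-driven app exploration (assumed, not ARES's).
from typing import Optional, Set

def reward(seen_activities: Set[str], new_activity: Optional[str], app_crashed: bool) -> float:
    """Positive reward for reaching an unseen activity, negative for leaving or crashing the app."""
    if app_crashed or new_activity is None:      # app closed or crashed: strong penalty
        return -1.0
    if new_activity not in seen_activities:      # progress: a never-before-seen activity
        return 1.0
    return -0.1                                  # small penalty for revisiting known screens

seen = {"MainActivity"}
print(reward(seen, "SettingsActivity", False))   # 1.0  (new activity reached)
print(reward(seen, "MainActivity", False))       # -0.1 (already-known screen)
print(reward(seen, None, False))                 # -1.0 (app was left)
```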
Efficiency Matters: Speeding Up Automated Testing with GUI Rendering Inference
Due to the importance of Android app quality assurance, many automated GUI testing tools have been developed. Although the test algorithms have improved, the impact of GUI rendering has been overlooked. On the one hand, setting a long waiting time to execute events on fully rendered GUIs slows down the testing process. On the other hand, setting a short waiting time causes events to execute on partially rendered GUIs, which negatively affects testing effectiveness. An optimal waiting time should strike a balance between effectiveness and efficiency. We propose AdaT, a lightweight image-based approach that dynamically adjusts the inter-event time based on the GUI rendering state. Given the real-time GUI stream, AdaT uses a deep learning model to infer the rendering state and synchronizes with the testing tool to schedule the next event once the GUI is fully rendered. The evaluations demonstrate the accuracy, efficiency, and effectiveness of our approach. We also integrate our approach with an existing automated testing tool to demonstrate the usefulness of AdaT in covering more activities and executing more events on fully rendered GUIs.
Comment: Proceedings of the 45th International Conference on Software Engineering
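A sketch of the synchronization idea follows: poll lightweight screenshots, classify the rendering state, and dispatch the next event only once the GUI appears fully rendered. The capture_screenshot, rendering_classifier, and send_event callables are placeholders, not AdaT's actual interfaces.

```python
# Sketch of rendering-aware event scheduling; all device/model hooks are
# placeholders supplied by the caller, not AdaT's implementation.
import time

def wait_until_rendered(capture_screenshot, rendering_classifier,
                        poll_interval: float = 0.05, timeout: float = 5.0) -> bool:
    """Poll lightweight screenshots until the classifier reports a fully rendered GUI."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        frame = capture_screenshot()
        if rendering_classifier(frame) == "fully_rendered":
            return True
        time.sleep(poll_interval)          # GUI still loading; wait briefly and re-check
    return False                           # fall back after the timeout

def run_test(events, capture_screenshot, rendering_classifier, send_event):
    for event in events:
        wait_until_rendered(capture_screenshot, rendering_classifier)
        send_event(event)                  # execute only on (likely) fully rendered GUIs
```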
Hawkeye: Change-targeted Testing for Android Apps based on Deep Reinforcement Learning
Android apps are frequently updated to keep up with changing user, hardware, and business demands. Ensuring the correctness of app updates through extensive testing is crucial to prevent potential bugs from reaching the end user. Existing Android testing tools generate GUI events focused on improving test coverage of the entire app rather than prioritising the updates and their impacted elements. Recent research has proposed change-focused testing, but it relies on random exploration to exercise the updates and impacted GUI elements, which is ineffective and slow for large, complex apps with a huge input exploration space. We propose Hawkeye, a directed testing approach for app updates that is able to prioritise GUI actions associated with code changes based on deep reinforcement learning from historical exploration data. Our empirical evaluation compares Hawkeye with the state-of-the-art model-based and reinforcement-learning-based testing tools FastBot2 and ARES on 10 popular open-source apps and 1 commercial app. We find that Hawkeye generates GUI event sequences targeting changed functions more reliably than FastBot2 and ARES for the open-source apps and the large commercial app, and it achieves comparable performance on smaller open-source apps with a more tractable exploration space. The industrial deployment of Hawkeye in the development pipeline also shows that Hawkeye is well suited to smoke testing merge requests of a complicated commercial app.
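The prioritisation idea can be sketched as biasing action selection toward GUI actions mapped to changed code; the fixed weight boost and the action-to-method mapping below are illustrative stand-ins for the policy Hawkeye learns from historical exploration data.

```python
# Sketch of change-targeted action prioritisation (fixed boost as a stand-in
# for a learned policy; mapping and names are hypothetical).
import random
from typing import Dict, List, Set

def pick_action(actions: List[str],
                action_to_methods: Dict[str, Set[str]],
                changed_methods: Set[str],
                boost: float = 5.0) -> str:
    """Sample the next GUI action, weighting actions that exercise changed code."""
    weights = []
    for a in actions:
        hits_change = bool(action_to_methods.get(a, set()) & changed_methods)
        weights.append(boost if hits_change else 1.0)
    return random.choices(actions, weights=weights, k=1)[0]

# Hypothetical mapping from GUI actions to the methods they reach:
mapping = {"tap:checkout": {"CartManager.checkout"},
           "tap:about":    {"AboutActivity.onCreate"}}
changed = {"CartManager.checkout"}       # methods touched by the update under test
print(pick_action(["tap:checkout", "tap:about"], mapping, changed))
```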
Learning the language of apps
To explore the functionality of an app, automated test generators systematically identify and interact with its user interface (UI) elements. A key challenge is to synthesize inputs that effectively and efficiently cover app behavior. To do so, a test generator has to choose not only which elements to interact with, but also which interactions to perform on each element and which input values to type. In short, to test apps better, a test generator should know the app's language, that is, the language of its graphical interactions and the language of its textual inputs. In this work, we show how a test generator can learn the language of apps and how this knowledge is modeled to create tests. We demonstrate how to learn the language of graphical input prior to testing by combining machine learning and static analysis, and how to refine this knowledge during testing using reinforcement learning. In our experiments, statically learned models resulted in 50% fewer ineffective actions and an average increase in test (code) coverage of 19%, while refining these models through reinforcement learning resulted in additional test (code) coverage of up to 20%. We learn the language of textual inputs by identifying the semantics of input fields in the UI and querying the web for real-world values. In our experiments, real-world values increase test (code) coverage by about 10%. Finally, we show how to use context-free grammars to integrate both languages into a single representation (a UI grammar), giving control back to the user. This representation can then be mined from existing tests, associated with the app source code, and used to produce new tests. 82% of the test cases produced by fuzzing our UI grammar can reach a UI element within the app, and 70% of them can reach a specific code location.
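A tiny example of representing GUI interactions as a context-free UI grammar and fuzzing it is given below; the grammar, event encoding, and values are made up for illustration and are far simpler than the mined UI grammars described above.

```python
# Toy context-free UI grammar and a naive random expansion into a GUI event
# sequence; the grammar format and event names are hypothetical.
import random

UI_GRAMMAR = {
    "<test>":   [["<login>", "<action>"]],
    "<login>":  [["type:user:<name>", "type:pass:<secret>", "tap:login"]],
    "<action>": [["tap:search", "type:query:<name>"], ["tap:settings"]],
    "<name>":   [["alice"], ["bob"]],
    "<secret>": [["hunter2"]],
}

def expand(symbol: str) -> list:
    """Recursively expand a grammar symbol into a flat sequence of GUI events."""
    if symbol not in UI_GRAMMAR:                 # terminal event, possibly embedding a nonterminal
        for nt in UI_GRAMMAR:
            if nt in symbol:
                return [symbol.replace(nt, expand(nt)[0])]
        return [symbol]
    production = random.choice(UI_GRAMMAR[symbol])
    return [event for part in production for event in expand(part)]

print(expand("<test>"))   # e.g. ['type:user:alice', 'type:pass:hunter2', 'tap:login', 'tap:settings']
```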