DroidBot-GPT: GPT-powered UI Automation for Android
This paper introduces DroidBot-GPT, a tool that uses GPT-like large
language models (LLMs) to automate interactions with Android mobile
applications. Given a natural language description of a desired task,
DroidBot-GPT can automatically generate and execute actions that navigate the
app to complete the task. It works by translating the app's GUI state and the
available actions on the smartphone screen into natural language prompts and
asking the LLM to choose an action. Since the LLM is typically trained on a
large amount of data, including the how-to manuals of diverse software
applications, it can make reasonable choices of actions
based on the provided information. We evaluate DroidBot-GPT with a self-created
dataset that contains 33 tasks collected from 17 Android applications spanning
10 categories. It can successfully complete 39.39% of the tasks, and the
average partial completion progress is about 66.76%. Given that our method is
fully unsupervised (no modifications required to either the app or the LLM),
we believe there is great potential to enhance automation performance with
better app development paradigms and/or custom model training.
Comment: 8 pages, 5 figures
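The core loop described above can be sketched in a few lines: serialize the current screen's widgets and candidate actions into a natural-language prompt, ask the LLM to pick one, and parse the reply back into an executable action. This is a minimal illustrative sketch, not DroidBot-GPT's actual code; all names (`GuiElement`, `build_prompt`, `choose_action`) are hypothetical.

```python
# Hypothetical sketch of an LLM-driven UI automation step: GUI state -> prompt
# -> LLM choice -> executable action. Names are illustrative, not the tool's API.
from dataclasses import dataclass


@dataclass
class GuiElement:
    index: int
    widget: str   # e.g. "Button", "EditText"
    text: str     # visible label or hint


def build_prompt(task: str, elements: list[GuiElement]) -> str:
    """Translate the GUI state and available actions into a natural language prompt."""
    lines = [f"Task: {task}", "Current screen contains:"]
    for e in elements:
        lines.append(f'  {e.index}. a {e.widget} labeled "{e.text}"')
    lines.append("Which element index should be interacted with next? Answer with a number.")
    return "\n".join(lines)


def choose_action(llm_reply: str, elements: list[GuiElement]) -> GuiElement:
    """Parse the model's free-text reply back into a concrete UI element."""
    digits = "".join(ch for ch in llm_reply if ch.isdigit())
    index = int(digits) if digits else 0
    return next(e for e in elements if e.index == index)


screen = [GuiElement(0, "EditText", "Search"), GuiElement(1, "Button", "Settings")]
prompt = build_prompt("open the settings page", screen)
chosen = choose_action("I would tap element 1.", screen)
```

In the real tool, `chosen` would be dispatched to the device (e.g. as a tap event) and the loop would repeat with the new screen state.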
Deep Reinforcement Learning for Black-box Testing of Android Apps
The state space of Android apps is huge, and its thorough exploration during testing remains a significant challenge. The best exploration strategy is highly dependent on the features of the app under test. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task by trial and error, guided by positive or negative reward rather than by explicit supervision. Deep RL is a recent extension of RL that takes advantage of the learning capabilities of neural networks. Such capabilities make Deep RL suitable for complex exploration spaces such as that of Android apps. However, state-of-the-art, publicly available tools only support basic, tabular RL. We have developed ARES, a Deep RL approach for black-box testing of Android apps. Experimental results show that it achieves higher coverage and fault revelation than the baselines, including state-of-the-art tools such as TimeMachine and Q-Testing. We also investigated the reasons behind this performance qualitatively and identified the key features of Android apps that make Deep RL particularly effective on them: the presence of chained and blocking activities. Moreover, we have developed FATE to fine-tune the hyperparameters of Deep RL algorithms on simulated apps, since doing so on real apps is computationally expensive.
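The tabular RL that existing tools support can be sketched as a Q-learning loop: states are app screens, actions are UI events, and the reward signals exploration progress (e.g. reaching a previously unseen activity). ARES replaces the Q-table with a neural network; this sketch only illustrates the basic trial-and-error update, and all state/action names and reward values are assumptions.

```python
# Minimal tabular Q-learning sketch of trial-and-error app exploration.
# States are screens, actions are UI events; reward rewards new activities.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
ACTIONS = ["tap", "scroll", "back"]
q_table = defaultdict(float)  # (state, action) -> estimated value


def select_action(state: str) -> str:
    """Epsilon-greedy: mostly exploit the best known action, sometimes explore."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


def update(state: str, action: str, reward: float, next_state: str) -> None:
    """Standard Q-learning update toward reward + discounted best next value."""
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next - q_table[(state, action)])


# One simulated step: tapping on the main screen revealed a new activity (+1).
update("MainActivity", "tap", 1.0, "SettingsActivity")
```

A Deep RL agent like ARES keeps the same loop but estimates `q_table` with a neural network, which generalizes across the many similar screens a table cannot enumerate.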
Guiding app testing with mined interaction models
Test generators for graphical user interfaces must constantly choose which UI element to interact with, and how. We guide this choice by mining associations between UI elements and their interactions from the most common applications. Once mined, the resulting UI interaction model can easily be applied to new apps and new test generators. In our experiments, the mined interaction models lead to average code coverage improvements of 19.41% and 43.03% on two state-of-the-art tools (DROIDMATE and DROIDBOT) when executing the same number of actions.
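The mining step described above can be illustrated as simple frequency counting: tally how often each interaction is performed on each widget class across recorded traces, then normalize into probabilities that bias a test generator's choice. This is a hypothetical sketch; the trace format and names are assumptions, not the paper's actual data or code.

```python
# Hypothetical sketch of mining a UI interaction model from observed
# (widget_class, interaction) pairs collected across common applications.
from collections import Counter, defaultdict


def mine_model(traces):
    """Return P(interaction | widget_class) estimated from frequency counts."""
    counts = defaultdict(Counter)
    for widget, interaction in traces:
        counts[widget][interaction] += 1
    # Normalize each widget class's counts into a probability distribution.
    return {w: {i: n / sum(c.values()) for i, n in c.items()} for w, c in counts.items()}


# Toy traces: buttons are mostly clicked, text fields receive text input.
observed = [
    ("Button", "click"), ("Button", "click"), ("Button", "click"),
    ("Button", "long_click"),
    ("EditText", "type_text"),
]
model = mine_model(observed)
```

A test generator can then sample its next action from `model[widget_class]` instead of choosing uniformly, concentrating the action budget on interactions that are common for that kind of element.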