53 research outputs found
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
Automated Graphical User Interface (GUI) testing plays a crucial role in
ensuring app quality, especially as mobile applications have become an integral
part of our daily lives. Despite the growing popularity of learning-based
techniques in automated GUI testing due to their ability to generate human-like
interactions, they still suffer from several limitations, such as low testing
coverage, inadequate generalization capabilities, and heavy reliance on
training data. Inspired by the success of Large Language Models (LLMs) like
ChatGPT in natural language understanding and question answering, we formulate
the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks the LLM
to chat with the mobile app: it passes the GUI page information to the LLM to
elicit testing scripts, executes them, and passes the app's feedback back to the
LLM, iterating the whole process. Within this framework, we have also
introduced a functionality-aware memory prompting mechanism that equips the LLM
with the ability to retain testing knowledge of the whole process and conduct
long-term, functionality-based reasoning to guide exploration. We evaluate it
on 93 apps from Google Play and demonstrate that it outperforms the best
baseline by 32% in activity coverage, and detects 31% more bugs at a faster
rate. Moreover, GPTDroid identifies 53 new bugs on Google Play, of which 35 have
been confirmed and fixed.
Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2024 (ICSE 2024). arXiv admin note: substantial text overlap with
arXiv:2305.0943
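The Q&A-style testing loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GPTDroid's implementation: `ToyApp`, `Memory`, `build_prompt`, `run_session`, and `stub_llm` are hypothetical names, the app is a toy page-transition table rather than a real device, and the memory only records visited pages and tried widgets, a much simpler stand-in for the functionality-aware memory prompting mentioned in the abstract. A real setup would replace `stub_llm` with a call to an actual LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ToyApp:
    """Hypothetical app model: page name -> {widget label -> next page}."""
    transitions: Dict[str, Dict[str, str]]
    page: str = "Home"

    def widgets(self) -> List[str]:
        return sorted(self.transitions.get(self.page, {}))

    def click(self, widget: str) -> str:
        # An unknown widget label leaves the app on the same page.
        self.page = self.transitions.get(self.page, {}).get(widget, self.page)
        return self.page


@dataclass
class Memory:
    """Simplified testing memory: pages reached and (page, widget) pairs already tried."""
    visited: List[str] = field(default_factory=list)
    tried: Set[Tuple[str, str]] = field(default_factory=set)


def build_prompt(app: ToyApp, mem: Memory) -> str:
    # Pass the current GUI page plus what has been tested so far to the LLM.
    untried = [w for w in app.widgets() if (app.page, w) not in mem.tried]
    return (
        f"Page: {app.page}\n"
        f"Widgets: {', '.join(app.widgets())}\n"
        f"Untried: {', '.join(untried)}\n"
        f"Visited: {', '.join(mem.visited)}\n"
        "Which widget should be clicked next? Reply with its label."
    )


def run_session(app: ToyApp, ask_llm: Callable[[str], str], steps: int) -> Memory:
    mem = Memory(visited=[app.page])
    for _ in range(steps):
        if not app.widgets():
            break  # dead end: nothing left to interact with
        answer = ask_llm(build_prompt(app, mem)).strip()
        mem.tried.add((app.page, answer))
        page = app.click(answer)  # execute the answer; the new state feeds the next prompt
        if page not in mem.visited:
            mem.visited.append(page)
    return mem


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: picks the first untried widget."""
    fields = dict(line.split(": ", 1) for line in prompt.splitlines() if ": " in line)
    untried = [w for w in fields["Untried"].split(", ") if w]
    return untried[0] if untried else fields["Widgets"].split(", ")[0]
```

Keeping the testing history inside the prompt, rather than inside the stub, mirrors the abstract's point that the LLM itself is given the accumulated testing knowledge to reason over.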
Automated Testing and Bug Reproduction of Android Apps
The large demand for mobile devices creates significant concerns about the quality of mobile applications (apps). The corresponding increase in app complexity has made app testing and maintenance activities more challenging. During the app development phase, developers need to test the app to guarantee its quality before releasing it to the market. During the deployment phase, developers rely heavily on bug reports to reproduce failures reported by users. Because of the rapid release cycle of apps and limited human resources, it is difficult for developers to manually construct test cases for the apps or to diagnose failures from a large number of bug reports. However, existing automated test case generation techniques are ineffective at exploring the events that most quickly improve code coverage and fault detection capability. In addition, none of the existing techniques can reproduce failures directly from bug reports. This dissertation provides a framework that employs artificial intelligence (AI) techniques to improve the testing and debugging of mobile apps. Specifically, the testing approach employs a Q-network that learns a behavior model from a set of existing apps, and the learned model can then be used to explore and generate tests for new apps. The framework captures fine-grained details of GUI events (e.g., the number of times each event has been visited, the text on the widgets) and uses them as features fed into a deep neural network, which acts as the agent guiding app exploration. The debugging approach focuses on automatically reproducing crashes from bug reports for mobile apps. It combines natural language processing (NLP), deep learning, and dynamic GUI exploration to synthesize event sequences with the goal of reproducing the reported crash.
Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model
Mobile applications have become a ubiquitous part of our daily life,
providing users with access to various services and utilities. Text input, as
an important interaction channel between users and applications, plays an
important role in core functionality such as search queries, authentication,
messaging, etc. However, certain special text (e.g., -18 for Font Size) can
cause the app to crash, so generating diverse unusual inputs to fully test
the app is highly desirable. It is also challenging, however, due
to the combinatorial explosion dilemma, high context sensitivity, and complex
constraint relations. This paper proposes InputBlaster, which leverages an LLM
to automatically generate unusual text inputs for mobile app crash detection.
It formulates the unusual inputs generation problem as a task of producing a
set of test generators, each of which can yield a batch of unusual text inputs
under the same mutation rule. In detail, InputBlaster leverages the LLM to produce
the test generators together with the mutation rules, which serve as the reasoning
chain, and uses in-context learning to provide the LLM with
examples that boost its performance. InputBlaster is evaluated on 36 text
input widgets with crash bugs across 31 popular Android apps, and the results
show that it achieves a 78% bug detection rate, 136% higher than the best
baseline. Besides, we integrate it with an automated GUI testing tool and
detect 37 unseen crashes in real-world apps from Google Play.
Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2024 (ICSE 2024)
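The notion of a test generator that yields a batch of unusual inputs under one mutation rule can be illustrated with plain Python generators. This is a hedged sketch: in InputBlaster the LLM produces the generators and mutation rules, whereas here the rules in `RULES` are hand-written assumptions, and `unusual_inputs`, `find_crashes`, and the toy `set_font_size` handler are hypothetical names standing in for a real text input widget.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical mutation rules: each maps (valid input, batch index) -> unusual input.
MutationRule = Callable[[str, int], str]

RULES: Dict[str, MutationRule] = {
    # Negative numbers, as in the "-18 for Font Size" example.
    "negate_number": lambda valid, i: f"-{int(valid) + i}",
    # Very long strings that may break length assumptions.
    "overlong_string": lambda valid, i: valid * (1000 * (i + 1)),
    # Progressively append special characters (%, $, #, quotes, backslash).
    "append_special_chars": lambda valid, i: valid + "%$#'\"\\"[: i + 1],
}


def unusual_inputs(rule_name: str, valid_input: str, batch_size: int) -> Iterator[str]:
    """A 'test generator': yields a batch of unusual inputs under one mutation rule."""
    rule = RULES[rule_name]
    for i in range(batch_size):
        yield rule(valid_input, i)


def find_crashes(handler: Callable[[str], None], inputs: List[str]) -> List[str]:
    """Feed each input to a widget handler and keep the ones that raise."""
    crashes = []
    for text in inputs:
        try:
            handler(text)
        except Exception:
            crashes.append(text)
    return crashes


def set_font_size(text: str) -> None:
    """Toy handler for a Font Size field: rejects non-numeric and non-positive values."""
    size = int(text)
    if size <= 0:
        raise ValueError("font size must be positive")
```

For example, `list(unusual_inputs("negate_number", "18", 3))` produces `["-18", "-19", "-20"]`, and every one of them makes the toy `set_font_size` handler raise.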
Automated, Cost-effective, and Update-driven App Testing
Apps' pervasive role in our society led to the definition of test automation
approaches to ensure their dependability. However, state-of-the-art approaches
tend to generate large numbers of test inputs and are unlikely to achieve more
than 50% method coverage. In this paper, we propose a strategy to achieve
significantly higher coverage of the code affected by updates with a much
smaller number of test inputs, thus alleviating the test oracle problem. More
specifically, we present ATUA, a model-based approach that synthesizes App
models with static analysis, integrates a dynamically-refined state abstraction
function and combines complementary testing strategies, including (1) coverage
of the model structure, (2) coverage of the App code, (3) random exploration,
and (4) coverage of dependencies identified through information retrieval. Its
model-based strategy enables ATUA to generate a small set of inputs that
exercise only the code affected by the updates. In turn, this makes common test
oracle solutions more cost-effective as they tend to involve human effort. A
large empirical evaluation, conducted with 72 App versions belonging to nine
popular Android Apps, has shown that ATUA is more effective and less
effort-intensive than state-of-the-art approaches when testing App updates.
Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing
Automated GUI testing is widely used to help ensure the quality of mobile
apps. However, many GUIs require appropriate text inputs to proceed to the next
page, which remains a prominent obstacle to testing coverage. Considering the
diversity and semantic requirements of valid inputs (e.g., flight departure,
movie name), it is challenging to automate text input generation. Inspired
by the fact that the pre-trained Large Language Model (LLM) has made
outstanding progress in text generation, we propose an approach named QTypist
based on LLM for intelligently generating semantic input text according to the
GUI context. To boost the performance of LLM in the mobile testing scenario, we
develop a prompt-based data construction and tuning method which automatically
extracts the prompts and answers for model tuning. We evaluate QTypist on 106
apps from Google Play and the result shows that the passing rate of QTypist is
87%, which is 93% higher than the best baseline. We also integrate QTypist with
the automated GUI testing tools and it can cover 42% more app activities, 52%
more pages, and subsequently help reveal 122% more bugs compared with the raw
tool.
Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2023 (ICSE 2023)
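A prompt that turns GUI context into a fill-in-the-blank query for an LLM could look roughly like the following. This is a sketch under assumptions: the `InputWidgetContext` fields and the template wording in `build_fill_in_prompt` are hypothetical and are not QTypist's actual prompt format or its automatically constructed tuning data. The LLM's completion of the trailing blank would then be typed into the field.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputWidgetContext:
    """Hypothetical GUI context extracted for one text field."""
    app_name: str
    activity: str            # page (e.g. Android activity) hosting the field
    label: str               # visible label next to the field
    hint: str                # placeholder/hint text; may be empty
    nearby_text: List[str]   # other texts shown on the same page


def build_fill_in_prompt(ctx: InputWidgetContext) -> str:
    """Compose a cloze-style prompt; the trailing blank is what the LLM completes."""
    parts = [f"In the {ctx.app_name} app, on the {ctx.activity} page"]
    if ctx.nearby_text:
        parts.append(f" (which also shows: {', '.join(ctx.nearby_text)})")
    parts.append(f", the input field labelled '{ctx.label}'")
    if ctx.hint:
        parts.append(f" with hint '{ctx.hint}'")
    parts.append(" should be filled with: ")
    return "".join(parts)
```

Optional pieces of context (hint, neighbouring text) are only emitted when present, so fields with sparse metadata still yield a grammatical prompt.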
Hawkeye: Change-targeted Testing for Android Apps based on Deep Reinforcement Learning
Android Apps are frequently updated to keep up with changing user, hardware,
and business demands. Ensuring the correctness of App updates through extensive
testing is crucial to avoid potential bugs reaching the end user. Existing
Android testing tools generate GUI events focussing on improving the test
coverage of the entire App rather than prioritising updates and their impacted
elements. Recent research has proposed change-focused testing but relies on
random exploration to exercise the updates and impacted GUI elements, which is
ineffective and slow for large complex Apps with a huge input exploration
space. We propose Hawkeye, a directed testing approach for App updates that
prioritises executing GUI actions associated with code changes, based on deep
reinforcement learning from historical exploration data. Our empirical
evaluation compares Hawkeye with state-of-the-art model-based and reinforcement
learning-based testing tools FastBot2 and ARES using 10 popular open-source Apps and
1 commercial App. We find that Hawkeye is able to generate GUI event sequences
targeting changed functions more reliably than FastBot2 and ARES for the open
source Apps and the large commercial App. Hawkeye achieves comparable
performance on smaller open source Apps with a more tractable exploration
space. The industrial deployment of Hawkeye in the development pipeline also
shows that Hawkeye is ideal for smoke testing merge requests of a
complicated commercial App.
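The idea of steering exploration toward changed code with reinforcement learning can be sketched with tabular Q-learning on a toy app. This deliberately swaps Hawkeye's deep reinforcement learning over historical exploration data for a much simpler technique: the app model `APP`, the set `CHANGED_ACTIONS`, the reward of 1 for executing an action tied to changed code, and the hyperparameters of the hypothetical `train` function are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, Set, Tuple

# Hypothetical toy app: state -> {GUI action -> next state}.
APP: Dict[str, Dict[str, str]] = {
    "Home": {"open_settings": "Settings", "open_profile": "Profile"},
    "Settings": {"toggle_dark_mode": "Settings", "back": "Home"},
    "Profile": {"edit_name": "Profile", "back": "Home"},
}
# Actions whose handlers touch changed code (in practice derived from the diff).
CHANGED_ACTIONS: Set[str] = {"toggle_dark_mode"}


def train(episodes: int = 300, steps: int = 6, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> DefaultDict[Tuple[str, str], float]:
    rng = random.Random(seed)
    q: DefaultDict[Tuple[str, str], float] = defaultdict(float)
    for _ in range(episodes):
        state = "Home"
        for _ in range(steps):
            actions = list(APP[state])
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            nxt = APP[state][action]
            # Reward only actions that exercise changed code.
            reward = 1.0 if action in CHANGED_ACTIONS else 0.0
            best_next = max(q[(nxt, a)] for a in APP[nxt])
            # Standard Q-learning update.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q
```

After training, the greedy policy from `Home` heads to `Settings` and repeatedly exercises `toggle_dark_mode`, so the action linked to the change is prioritised over unrelated exploration such as the `Profile` branch.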
FraudDroid: Automated Ad Fraud Detection for Android Apps
Although mobile ad frauds have been widespread, state-of-the-art approaches
in the literature have mainly focused on detecting the so-called static
placement frauds, where only a single UI state is involved and can be
identified based on static information such as the size or location of ad
views. Other types of fraud exist that involve multiple UI states and are
performed dynamically while users interact with the app. Such dynamic
interaction frauds, although now widespread in apps, have not yet been
explored or addressed in the literature. In this work, we investigate a wide
range of mobile ad frauds to provide a comprehensive taxonomy to the research
community. We then propose FraudDroid, a novel hybrid approach to detect ad
frauds in mobile Android apps. FraudDroid analyses apps dynamically to build UI
state transition graphs and collects their associated runtime network traffic,
which are then leveraged to check against a set of heuristic-based rules for
identifying ad fraudulent behaviours. We show empirically that FraudDroid
detects ad frauds with a high precision (93%) and recall (92%). Experimental
results further show that FraudDroid is capable of detecting ad frauds across
the spectrum of fraud types. By analysing 12,000 ad-supported Android apps,
FraudDroid identified 335 cases of fraud associated with 20 ad networks that
are further confirmed to be true positive results and are shared with our
fellow researchers to promote advanced ad fraud detection.
Comment: 12 pages, 10 figures
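Checking a UI state transition graph and its associated network traffic against heuristic rules can be sketched as below. The two rules, the `AD_HOSTS` list, the `Transition` record, and `check_rules` are hypothetical examples chosen for illustration; they are not FraudDroid's actual rule set, which the abstract does not spell out.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Transition:
    """One edge of a UI state transition graph plus the traffic observed during it."""
    src: str
    dst: str
    trigger: str                 # user event such as "click:item", "back", or "none"
    dst_shows_ad: bool = False   # whether the destination state displays an ad view
    hosts: List[str] = field(default_factory=list)  # hosts contacted during the transition


# Hypothetical ad-network hosts, for illustration only.
AD_HOSTS: Set[str] = {"ads.example.net", "clicks.example.net"}


def check_rules(graph: List[Transition]) -> List[str]:
    findings: List[str] = []
    for t in graph:
        ad_traffic = any(h in AD_HOSTS for h in t.hosts)
        # Hypothetical rule 1: ad traffic while no user event drove the transition.
        if t.trigger == "none" and ad_traffic:
            findings.append(f"ad-traffic-without-user-event: {t.src} -> {t.dst}")
        # Hypothetical rule 2: an ad appears right when the user tries to leave.
        if t.trigger == "back" and t.dst_shows_ad:
            findings.append(f"ad-on-exit: {t.src} -> {t.dst}")
    return findings
```

Because each rule inspects both the UI transition and the traffic recorded during it, this style of check can flag behaviour that spans multiple UI states, which is what distinguishes dynamic interaction frauds from static placement frauds.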
Learning the language of apps
To explore the functionality of an app, automated test generators systematically identify and interact with its user interface (UI) elements. A key challenge is to synthesize inputs that effectively and efficiently cover app behavior. To do so, a test generator has to choose not only which elements to interact with, but also which interactions to perform on each element and which input values to type. In short, to better test apps, a test generator should know the app's language, that is, the language of its graphical interactions and the language of its textual inputs. In this work, we show how a test generator can learn the language of apps and how this knowledge is modeled to create tests. We demonstrate how to learn the language of the graphical input prior to testing by combining machine learning and static analysis, and how to refine this knowledge during testing using reinforcement learning. In our experiments, statically learned models resulted in 50% fewer ineffective actions and an average increase in test (code) coverage of 19%, while refining these models through reinforcement learning yielded up to 20% additional test (code) coverage. We learn the language of textual inputs by identifying the semantics of input fields in the UI and querying the web for real-world values. In our experiments, real-world values increase test (code) coverage by ~10%. Finally, we show how to use context-free grammars to integrate both languages into a single representation (UI grammar), giving control back to the user. This representation can be mined from existing tests, associated with the app source code, and used to produce new tests. 82% of the test cases produced by fuzzing our UI grammar can reach a UI element within the app, and 70% of them can reach a specific code location.
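A UI grammar in the sense described above can be sketched as a context-free grammar whose terminals are concrete UI events, with fuzzing done by random expansion of nonterminals. The grammar `UI_GRAMMAR`, the event notation such as `type:email=...`, and the `expand` function are hypothetical; mining the grammar from existing tests and associating it with source code are not shown.

```python
import random
from typing import Dict, List

# Hypothetical UI grammar for a login flow. Nonterminals are written <like_this>;
# every other symbol is a concrete UI event (terminal).
UI_GRAMMAR: Dict[str, List[List[str]]] = {
    "<start>": [["<login>"], ["<signup>", "<login>"]],
    "<login>": [["<email>", "<password>", "click:login"]],
    "<signup>": [["click:register", "<email>", "click:submit"]],
    "<email>": [["type:email=alice@example.com"], ["type:email=bob@example.org"]],
    "<password>": [["type:password=secret1"], ["type:password=hunter2"]],
}


def is_nonterminal(symbol: str) -> bool:
    return symbol.startswith("<") and symbol.endswith(">")


def expand(grammar: Dict[str, List[List[str]]], symbol: str, rng: random.Random,
           depth: int = 0, max_depth: int = 20) -> List[str]:
    """Randomly derive a sequence of UI events from `symbol` (grammar-based fuzzing)."""
    if not is_nonterminal(symbol):
        return [symbol]
    alternatives = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the alternative with the fewest nonterminals
        # so that recursive grammars still tend to terminate.
        alternatives = [min(alternatives, key=lambda alt: sum(map(is_nonterminal, alt)))]
    events: List[str] = []
    for sym in rng.choice(alternatives):
        events.extend(expand(grammar, sym, rng, depth + 1, max_depth))
    return events
```

Every derivation from `<start>` is a complete event sequence ending in `click:login`, so each fuzzed test is a valid walk through the flow the grammar encodes, while the random choices vary the path and the typed values.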
Reverse Engineering and Testing of Rich Internet Applications
The World Wide Web is in continuous evolution: new initiatives, standards, approaches and technologies are constantly being proposed for developing more effective and higher-quality Web applications.
To satisfy the growing market demand for Web applications, new technologies, frameworks, tools and environments that allow Web and mobile applications to be developed with minimal effort and in a very short time have been introduced in recent years.
These new technologies have made possible the dawn of a new generation of Web applications, named Rich Internet Applications (RIAs), that offer greater usability and interactivity than traditional ones. This evolution has been accompanied by some drawbacks, mostly due to the failure to apply well-known software engineering practices and approaches. As a consequence, new research questions and challenges have emerged in the field of web and mobile application maintenance and testing.
The research activity described in this thesis has addressed some of these topics with the specific aim of proposing new and effective solutions to the problems of modelling, reverse engineering, comprehending, re-documenting and testing existing RIAs.
Due to the growing relevance of mobile applications in the renewed Web scenario, the problem of testing mobile applications developed for the Android operating system has been addressed too, in an attempt to explore and propose new test automation techniques for this type of application.
- …