Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
Automated Graphical User Interface (GUI) testing plays a crucial role in
ensuring app quality, especially as mobile applications have become an integral
part of our daily lives. Despite the growing popularity of learning-based
techniques in automated GUI testing due to their ability to generate human-like
interactions, they still suffer from several limitations, such as low testing
coverage, inadequate generalization capabilities, and heavy reliance on
training data. Inspired by the success of Large Language Models (LLMs) like
ChatGPT in natural language understanding and question answering, we formulate
the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks
the LLM to chat with mobile apps: it passes the GUI page information to the LLM
to elicit testing scripts, executes them, and feeds the app's feedback back to
the LLM, iterating the whole process. Within this framework, we have also
introduced a functionality-aware memory prompting mechanism that equips the LLM
with the ability to retain testing knowledge of the whole process and conduct
long-term, functionality-based reasoning to guide exploration. We evaluate it
on 93 apps from Google Play and demonstrate that it outperforms the best
baseline by 32% in activity coverage, and detects 31% more bugs at a faster
rate. Moreover, GPTDroid identifies 53 new bugs on Google Play, of which 35 have
been confirmed and fixed.
Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2024 (ICSE 2024). arXiv admin note: substantial text overlap with
arXiv:2305.0943
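The iterate-and-feed-back loop described in this abstract can be illustrated with a minimal Python sketch. It is not GPTDroid's implementation: `TestingMemory`, `build_prompt`, `run_testing_loop`, the `STOP` sentinel, and the prompt wording are all hypothetical names chosen for illustration, and the LLM and the app executor are passed in as plain callables so the loop can be exercised without a model or a device.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TestingMemory:
    """Hypothetical stand-in for a testing memory: a running log of past steps."""
    steps: List[str] = field(default_factory=list)

    def summary(self, last_n: int = 5) -> str:
        # Only the most recent steps are included, to keep the prompt short.
        return "\n".join(self.steps[-last_n:]) or "(no actions yet)"


def build_prompt(gui_description: str, memory: TestingMemory) -> str:
    """Phrase the current GUI page and the testing history as a question."""
    return (
        f"Testing history:\n{memory.summary()}\n\n"
        f"Current page:\n{gui_description}\n\n"
        "Which operation should be performed next?"
    )


def run_testing_loop(
    ask_llm: Callable[[str], str],   # assumed: prompt in, next action out
    execute: Callable[[str], str],   # assumed: action in, new page description out
    initial_page: str,
    max_steps: int = 10,
) -> TestingMemory:
    """Ask for an action, execute it, record the feedback, and repeat."""
    memory = TestingMemory()
    page = initial_page
    for _ in range(max_steps):
        action = ask_llm(build_prompt(page, memory))
        if action == "STOP":  # hypothetical sentinel for "nothing left to try"
            break
        page = execute(action)
        memory.steps.append(f"{action} -> {page}")
    return memory
```

In a real setup, `ask_llm` would wrap a model call and `execute` would drive the app under test; both are left abstract here because the abstract does not specify them.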
LLM for Test Script Generation and Migration: Challenges, Capabilities, and Opportunities
This paper investigates the application of large language models (LLM) in the
domain of mobile application test script generation. Test script generation is
a vital component of software testing, enabling efficient and reliable
automation of repetitive test tasks. However, existing generation approaches
often encounter limitations, such as difficulties in accurately capturing and
reproducing test scripts across diverse devices, platforms, and applications.
These challenges arise due to differences in screen sizes, input modalities,
platform behaviors, API inconsistencies, and application architectures.
Overcoming these limitations is crucial for achieving robust and comprehensive
test automation.
By leveraging the capabilities of LLMs, we aim to address these challenges
and explore their potential as a versatile tool for test automation. We
investigate how well LLMs can adapt to diverse devices and systems while
accurately capturing and generating test scripts. Additionally, we evaluate their
cross-platform generation capabilities by assessing their ability to handle
operating system variations and platform-specific behaviors. Furthermore, we
explore the application of LLMs in cross-app migration, where they generate test
scripts across different applications and software environments based on
existing scripts.
Throughout the investigation, we analyze their adaptability to various user
interfaces, app architectures, and interaction patterns, ensuring accurate
script generation and compatibility. The findings of this research contribute
to the understanding of LLMs' capabilities in test automation. Ultimately, this
research aims to enhance software testing practices, empowering app developers
to achieve higher levels of software quality and development efficiency.
Comment: Accepted by the 23rd IEEE International Conference on Software
Quality, Reliability, and Security (QRS 2023)
Translation from Visual to Layout-based Android Test Cases: a Proof of Concept
Layout-based and Visual GUI testing are two approaches for testing mobile GUIs, both with individual benefits and drawbacks. Previous research has presented approaches to translate Layout-based scripts to Visual scripts, but not vice versa.
The objective of this work is to provide a Proof of Concept of the effectiveness of automatic translation from existing Visual test scripts to Layout-based test scripts.
A tool architecture is presented and implemented in a tool capable of translating most Visual interactions with the GUI of an Android app into Layout-based instructions and oracles for the Espresso testing tool.
We validate our approach on two test suites of our own creation, consisting of 30 test cases each. The measured success rate of the translation is 96.7% (58 working test cases out of 60 applications of the translator).
The study provides support for the feasibility of a translation-based approach from Visual to Layout-based test cases. However, additional work is needed to make the approach applicable in real-world scenarios or to larger open-source test suites.
Software Testing with Large Language Model: Survey, Landscape, and Vision
Pre-trained large language models (LLMs) have recently emerged as a
breakthrough technology in natural language processing and artificial
intelligence, with the ability to handle large-scale datasets and exhibit
remarkable performance across a wide range of tasks. Meanwhile, software
testing is a crucial undertaking that serves as a cornerstone for ensuring the
quality and reliability of software products. As the scope and complexity of
software systems continue to grow, the need for more effective software testing
techniques becomes increasingly urgent, making it an area ripe for
innovative approaches such as the use of LLMs. This paper provides a
comprehensive review of the utilization of LLMs in software testing. It
analyzes 52 relevant studies that have used LLMs for software testing, from
both the software testing and LLMs perspectives. The paper presents a detailed
discussion of the software testing tasks for which LLMs are commonly used,
among which test case preparation and program repair are the most
representative ones. It also analyzes the commonly used LLMs, the types of
prompt engineering that are employed, and the techniques that accompany
these LLMs. It also summarizes the key challenges and potential
opportunities in this direction. This work can serve as a roadmap for future
research in this area, highlighting potential avenues for exploration, and
identifying gaps in our current understanding of the use of LLMs in software
testing.
Comment: 20 pages, 11 figures
Automated, Cost-effective, and Update-driven App Testing
Apps' pervasive role in our society led to the definition of test automation
approaches to ensure their dependability. However, state-of-the-art approaches
tend to generate large numbers of test inputs and are unlikely to achieve more
than 50% method coverage. In this paper, we propose a strategy to achieve
significantly higher coverage of the code affected by updates with a much
smaller number of test inputs, thus alleviating the test oracle problem. More
specifically, we present ATUA, a model-based approach that synthesizes App
models with static analysis, integrates a dynamically-refined state abstraction
function and combines complementary testing strategies, including (1) coverage
of the model structure, (2) coverage of the App code, (3) random exploration,
and (4) coverage of dependencies identified through information retrieval. Its
model-based strategy enables ATUA to generate a small set of inputs that
exercise only the code affected by the updates. In turn, this makes common test
oracle solutions more cost-effective as they tend to involve human effort. A
large empirical evaluation, conducted with 72 App versions belonging to nine
popular Android Apps, has shown that ATUA is more effective and less effort
intensive than state-of-the-art approaches when testing App updates.
Techniques for Efficient and Effective Mobile Testing
The booming mobile app market attracts a large number of developers. As a result, the competition is extremely tough. This fierce competition leads to high standards required for mobile apps, which mandates efficient and effective testing. Efficient testing requires little effort to use, while effective testing checks that the app under test behaves as expected. Manual testing is highly effective, but it is costly. Automatic testing should come to the rescue, but current automatic methods are either ineffective or inefficient. Methods using implicit specifications – for instance, “an app should not crash” for catching fail-stop errors – are ineffective because they cannot find semantic problems. Methods using explicit specifications such as test scripts are inefficient because they require huge developer effort to create and maintain specifications. In this thesis, we present our two approaches for solving these challenges. We first built the AppDoctor system which efficiently tests mobile apps. It quickly explores an app then slowly but accurately verifies the potential problems to identify bugs without introducing false positives. It uses dependencies discovered between actions to simplify its reports. Our second approach, implemented in the AppFlow system, leverages the ample opportunity of reusing test cases between apps to gain efficiency without losing effectiveness. It allows common UI elements to be used in test scripts then recognizes these UI elements in real apps using a machine learning approach. The system also allows tests to be specified in reusable pieces, and provides a system to synthesize complete test cases from these reusable pieces. It enables robust tests to be created and reused across apps in the same category. 
The combination of these two approaches enables a developer to quickly test an app on a great number of combinations of actions for fail-stop problems, and effortlessly and efficiently test the app on most common scenarios for semantic problems. This combination covers most of her test requirements and greatly reduces her burden in testing the app.
Test Cases Evolution of Mobile Applications: Model Driven Approach
Mobile application developers, given great freedom, focus on satisfying market requirements and pleasing consumers' desires. They are forced to be creative and productive in a short period of time. As a result, billions of powerful mobile applications are displayed every day. Every mobile application therefore needs to change continually and evolve incrementally in order to survive and preserve its ranking among the top applications in the market. Mobile app testers carry a heavy responsibility: the intrinsically agile, swift pace of change in mobile apps pushes them to be meticulous, to be aware that things can be different at any time, and to be prepared for unpredicted crashes. Consequently, generating or creating test cases from scratch and selecting the overridden or overloaded test cases each time is a tedious operation. In software testing, the time allocated to testing and correcting defects is a significant part of every software development effort (regularly half the time). This time can be reduced by introducing tools and adopting new testing methods. In the field of mobile development, new concerns should be taken into account; among the most important are the heterogeneity of execution environments and the fragmentation of terminals, which have different impacts on functionality, performance, and connectivity. This project studies the evolution of mobile applications and its impact on the evolution of test cases, from their creation until their expiration stage. A detailed case study of a native open-source Android application is provided, describing many aspects of design, development, and testing, in addition to an analysis of the process of mobile app evolution. This project is based on a model-driven engineering approach in which the models are serialized using the standard XMI.
It presents a protocol for the adaptation of test cases under certain restrictions.
Mining Android Crash Fixes in the Absence of Issue- and Change-Tracking Systems
Android apps are prone to crash. This often arises from the misuse of Android framework APIs, making it harder to debug since the official Android documentation does not thoroughly discuss potential exceptions. Recently, the program repair community has also started to investigate the possibility of fixing crashes automatically. Current results, however, apply to limited example cases. In both scenarios of repair, the main issue is the need for more example data to drive the fix processes, due to the high cost in time and effort needed to collect and identify fix examples. We propose in this work a scalable approach, CraftDroid, to mine crash fixes by leveraging a set of 28 thousand carefully reconstructed app lineages from app markets, without the need for the app source code or issue reports. We developed a replicative testing approach that locates fixes among app versions which output different runtime logs with the exact same test inputs. Overall, we have mined 104 relevant crash fixes and further abstracted 17 fine-grained fix templates that are demonstrated to be effective for patching crashed apks. Finally, we release ReCBench, a benchmark consisting of 200 crashed apks and the crash replication scripts, which the community can explore for evaluating generated crash-inducing bug patches.
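The replicative testing idea in this abstract (same test inputs, different runtime logs across versions) can be sketched in Python as follows. This is an illustrative simplification, not CraftDroid's algorithm: `find_fix_candidates`, `crashed`, and the `run_test` callable are hypothetical names, and a crash is assumed to be detectable by the `FATAL EXCEPTION` line that Android's runtime writes to the log for uncaught exceptions.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a crash shows up in the runtime log as a "FATAL EXCEPTION" entry.
CRASH_MARKER = "FATAL EXCEPTION"


def crashed(log: str) -> bool:
    """Return True if the runtime log contains the assumed crash marker."""
    return CRASH_MARKER in log


def find_fix_candidates(
    lineage: Sequence[str],              # app versions, ordered oldest to newest
    run_test: Callable[[str], str],      # assumed: replays the SAME inputs, returns the log
) -> List[Tuple[str, str]]:
    """Return adjacent (older, newer) pairs where the crash disappears."""
    logs = [run_test(version) for version in lineage]
    candidates = []
    for i in range(len(lineage) - 1):
        # The older version crashes on the inputs, the next one does not:
        # the change between them is a candidate crash fix.
        if crashed(logs[i]) and not crashed(logs[i + 1]):
            candidates.append((lineage[i], lineage[i + 1]))
    return candidates
```

A pair returned by this sketch only marks where a fix may lie; confirming it would still require inspecting the difference between the two versions.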