46 research outputs found

    Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing

    GUI testing checks if a software system behaves as expected when users interact with its graphical interface, e.g., testing specific functionality or validating relevant use case scenarios. Currently, deciding what to test at this high level is a manual task since automated GUI testing tools target lower level adequacy metrics such as structural code coverage or activity coverage. We propose DroidAgent, an autonomous GUI testing agent for Android, for semantic, intent-driven automation of GUI testing. It is based on Large Language Models and support mechanisms such as long- and short-term memory. Given an Android app, DroidAgent sets relevant task goals and subsequently tries to achieve them by interacting with the app. Our empirical evaluation of DroidAgent using 15 apps from the Themis benchmark shows that it can set up and perform realistic tasks, with a higher level of autonomy. For example, when testing a messaging app, DroidAgent created a second account and added a first account as a friend, testing a realistic use case, without human intervention. On average, DroidAgent achieved 61% activity coverage, compared to 51% for current state-of-the-art GUI testing techniques. Further, manual analysis shows that 317 out of the 374 autonomously created tasks are realistic and relevant to app functionalities, and also that DroidAgent interacts deeply with the apps and covers more features. Comment: 10 pages
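
    The activity coverage figures quoted above (61% versus 51%) are commonly computed as the share of an app's declared activities that were reached at least once during testing. The sketch below illustrates that ratio only; the function name and the example activity names are hypothetical and are not taken from DroidAgent.

```python
from typing import Iterable


def activity_coverage(visited: Iterable[str], declared: Iterable[str]) -> float:
    """Return the fraction of declared activities that were visited.

    Assumption: coverage = |visited AND declared| / |declared|.
    Visited names that are not declared by the app are ignored.
    """
    declared_set = set(declared)
    if not declared_set:
        raise ValueError("the app declares no activities")
    covered = declared_set & set(visited)
    return len(covered) / len(declared_set)


# Hypothetical activity names, for illustration only.
declared = ["Login", "Home", "Settings", "Chat"]
visited = ["Login", "Home", "SplashFromLibrary"]
print(activity_coverage(visited, declared))
```

    Undeclared names such as the hypothetical "SplashFromLibrary" do not raise the score, so the ratio stays within 0 and 1.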

    Mobile GUI Testing Fragility: A Study on Open-Source Android Applications

    Android applications do not seem to be tested as thoroughly as desktop ones. In particular, GUI testing appears generally limited. Like web-based applications, mobile apps suffer from GUI test fragility, i.e. GUI test classes failing or needing updates due to even minor modifications in the GUI or in the Application Under Test. The objective of our study is to estimate the adoption of GUI testing frameworks among Android open-source applications, the quantity of modifications needed to keep test classes up to date, and the amount of them due to GUI test fragility. We introduce a set of 21 metrics to measure the adoption of testing tools, the evolution of test classes and test methods, and to estimate the fragility of test suites. We computed our metrics for six GUI testing frameworks, none of which achieved a significant adoption among Android projects hosted on GitHub. When present, GUI test methods associated with the considered tools are modified often and a relevant portion (70% on average) of those modifications is induced by GUI-related fragilities. On average for the projects considered, more than 7% of the total modified lines of code between consecutive releases belong to test classes developed with the analysed testing frameworks. The measured percentage was higher on average than the one required by other generic test code, based on the JUnit testing framework. Fragility of GUI tests constitutes a relevant concern, probably an obstacle for developers to adopt test automation. This first evaluation of the fragility of Android scripted GUI testing can constitute a benchmark for developers and testers leveraging the analysed test tools, and the basis for the definition of a taxonomy of fragility causes and guidelines to mitigate the issue.

    A Metric Framework for the Gamification of Web and Mobile GUI Testing

    System testing through the Graphical User Interface (GUI) is a valuable form of Verification & Validation for modern applications, especially in graphically-intensive domains like web and mobile applications. However, the practice is often overlooked by developers, mostly because of its costly nature and the absence of immediate feedback about the quality of test sequences. This paper describes a proposal for the Gamification of exploratory GUI testing. We define, in a tool- and domain-agnostic way, the basic concepts, a set of metrics, a scoring scheme and visual feedback to enable a gamified approach to the practice; we finally discuss the potential implications and envision a roadmap for the evaluation of the approach.

    Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing

    Automated GUI testing is widely used to help ensure the quality of mobile apps. However, many GUIs require appropriate text inputs to proceed to the next page, which remains a prominent obstacle for testing coverage. Considering the diversity and semantic requirement of valid inputs (e.g., flight departure, movie name), it is challenging to automate the text input generation. Inspired by the fact that the pre-trained Large Language Model (LLM) has made outstanding progress in text generation, we propose an approach named QTypist based on LLM for intelligently generating semantic input text according to the GUI context. To boost the performance of LLM in the mobile testing scenario, we develop a prompt-based data construction and tuning method which automatically extracts the prompts and answers for model tuning. We evaluate QTypist on 106 apps from Google Play and the result shows that the passing rate of QTypist is 87%, which is 93% higher than the best baseline. We also integrate QTypist with the automated GUI testing tools and it can cover 42% more app activities, 52% more pages, and subsequently help reveal 122% more bugs compared with the raw tool. Comment: Accepted by IEEE/ACM International Conference on Software Engineering 2023 (ICSE 2023)

    Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions

    Automated Graphical User Interface (GUI) testing plays a crucial role in ensuring app quality, especially as mobile applications have become an integral part of our daily lives. Despite the growing popularity of learning-based techniques in automated GUI testing due to their ability to generate human-like interactions, they still suffer from several limitations, such as low testing coverage, inadequate generalization capabilities, and heavy reliance on training data. Inspired by the success of Large Language Models (LLMs) like ChatGPT in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, asking the LLM to chat with the mobile apps by passing the GUI page information to the LLM to elicit testing scripts, and executing them to keep passing the app feedback to the LLM, iterating the whole process. Within this framework, we have also introduced a functionality-aware memory prompting mechanism that equips the LLM with the ability to retain testing knowledge of the whole process and conduct long-term, functionality-based reasoning to guide exploration. We evaluate it on 93 apps from Google Play and demonstrate that it outperforms the best baseline by 32% in activity coverage, and detects 31% more bugs at a faster rate. Moreover, GPTDroid identifies 53 new bugs on Google Play, of which 35 have been confirmed and fixed. Comment: Accepted by IEEE/ACM International Conference on Software Engineering 2024 (ICSE 2024). arXiv admin note: substantial text overlap with arXiv:2305.0943
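
    The loop described in this abstract (pass the GUI page information to the LLM, obtain an action, execute it, feed the app's response back, repeat) can be sketched generically as follows. This is a minimal illustration under stated assumptions, not GPTDroid's implementation: `run_test_loop`, `FakeApp`, `scripted_llm`, the `"STOP"` sentinel and the action strings are all hypothetical, and the LLM is replaced by a scripted stand-in so the sketch runs offline.

```python
from typing import Callable, List, Tuple

# One recorded step: (page description, chosen action, app feedback).
Step = Tuple[str, str, str]


def run_test_loop(
    describe_page: Callable[[], str],
    ask_llm: Callable[[str, List[Step]], str],
    execute: Callable[[str], str],
    max_steps: int = 20,
) -> List[Step]:
    """Iterate: describe page -> ask model for an action -> execute -> record feedback."""
    history: List[Step] = []
    for _ in range(max_steps):
        page = describe_page()
        # The full history is passed along so the model can use earlier steps.
        action = ask_llm(page, history)
        if action == "STOP":  # hypothetical sentinel meaning "nothing left to try"
            break
        feedback = execute(action)
        history.append((page, action, feedback))
    return history


class FakeApp:
    """Hypothetical two-page app standing in for a real device under test."""

    def __init__(self) -> None:
        self.page = "Login"

    def describe(self) -> str:
        return f"page={self.page}"

    def execute(self, action: str) -> str:
        if self.page == "Login" and action == "tap:login":
            self.page = "Home"
            return "navigated to Home"
        return "no effect"


def scripted_llm(page: str, history: List[Step]) -> str:
    """Stand-in for a real model call: a fixed policy keyed on the page text."""
    return "tap:login" if page == "page=Login" else "STOP"


app = FakeApp()
steps = run_test_loop(app.describe, scripted_llm, app.execute)
print(steps)
```

    Passing the three callables in, rather than hard-coding a device driver or model client, keeps the loop testable without an emulator or network access.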

    Software Testing with Large Language Model: Survey, Landscape, and Vision

    Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, with the ability to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, making it an area ripe for innovative approaches such as the use of LLMs. This paper provides a comprehensive review of the utilization of LLMs in software testing. It analyzes 52 relevant studies that have used LLMs for software testing, from both the software testing and LLMs perspectives. The paper presents a detailed discussion of the software testing tasks for which LLMs are commonly used, among which test case preparation and program repair are the most representative ones. It also analyzes the commonly used LLMs, the types of prompt engineering that are employed, as well as the techniques that accompany these LLMs. It also summarizes the key challenges and potential opportunities in this direction. This work can serve as a roadmap for future research in this area, highlighting potential avenues for exploration, and identifying gaps in our current understanding of the use of LLMs in software testing. Comment: 20 pages, 11 figures

    Gamified Exploratory GUI Testing of Web Applications: a Preliminary Evaluation

    In the context of Software Engineering, testing is a well-known phase that plays a critical role, as it is needed to ensure that the designed and produced code provides the expected results, avoiding faults and crashes. Exploratory GUI testing allows the tester to manually define test cases by directly interacting with the user interface of the finite system. However, testers often loosely perform exploratory GUI testing, as they perceive it as a time-consuming, repetitive and unappealing activity. We defined a gamified framework for GUI testing to address this issue, which we developed and integrated into the Augmented testing tool, Scout. Gamification is perceived as a means to enhance the performance of human testers by stimulating competition and encouraging them to achieve better results in terms of both efficiency and effectiveness. We performed a preliminary evaluation of the gamification layer with a small sample of testers to assess the benefits of the technique compared with the standard version of the same tool. Test sequences defined with the gamified tool achieved higher coverage (i.e., higher efficiency) and a slightly higher percentage of bugs found. The users' opinion was almost unanimously in favor of the gamified version of the tool.

    LLM4TDD: Best Practices for Test Driven Development Using Large Language Models

    In today's society, we are becoming increasingly dependent on software systems. However, we also constantly witness the negative impacts of buggy software. Program synthesis aims to improve software correctness by automatically generating the program given an outline of the expected behavior. For decades, program synthesis has been an active research field, with recent approaches looking to incorporate Large Language Models to help generate code. This paper explores the concept of LLM4TDD, where we guide Large Language Models to generate code iteratively using a test-driven development methodology. We conduct an empirical evaluation using ChatGPT and coding problems from LeetCode to investigate the impact of different test, prompt and problem attributes on the efficacy of LLM4TDD.