    Improving random GUI testing with image-based widget detection

    Get PDF
    Graphical User Interfaces (GUIs) are amongst the most common user interfaces, enabling interactions with applications through mouse movements and key presses. Tools for automated testing of programs through their GUI exist; however, they usually rely on operating-system- or framework-specific knowledge to interact with an application. Due to frequent operating system updates, which can remove required information, and the large variety of GUI frameworks built on unique underlying data structures, such tools rapidly become obsolete. Consequently, it is impractical for an automated GUI test generation tool to support many frameworks and operating systems. We propose a technique for improving GUI testing by automatically identifying GUI widgets in screen shots using machine learning techniques. As training data, we generate randomized GUIs from which widget information is extracted automatically. The resulting model guides GUI testing tools in environments they do not otherwise support, by deriving GUI widget information from screen shots alone. In our experiments, identifying GUI widgets in screen shots and using this information to guide random testing achieved significantly higher branch coverage in 18 of 20 applications, with an average increase of 42.5% compared to conventional random testing.
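
    A minimal sketch of the guidance idea described above, assuming a trained widget-detection model and a GUI automation backend (both stubbed out here as hypothetical placeholders; this is not the authors' implementation):

        # Guide a random tester by clicking detected widgets instead of arbitrary points.
        import random
        from typing import List, Tuple

        Box = Tuple[int, int, int, int]  # x, y, width, height of a detected widget

        def detect_widgets(screenshot_path: str) -> List[Box]:
            """Placeholder: run the trained widget-detection model on a screen shot."""
            return []

        def click(x: int, y: int) -> None:
            """Placeholder: dispatch a mouse click to the application under test."""
            print(f"click at ({x}, {y})")

        def random_widget_action(screenshot_path: str, screen_w: int, screen_h: int) -> None:
            boxes = detect_widgets(screenshot_path)
            if boxes:
                # Prefer the centre of a randomly chosen detected widget.
                x, y, w, h = random.choice(boxes)
                click(x + w // 2, y + h // 2)
            else:
                # Fall back to conventional random testing when nothing is detected.
                click(random.randrange(screen_w), random.randrange(screen_h))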

    Keeping Context In Mind: Automating Mobile App Access Control with User Interface Inspection

    Full text link
    Recent studies observe that the app foreground is the most striking component influencing access control decisions on mobile platforms, as users tend to deny permission requests that lack visible evidence. However, none of the existing permission models provides a systematic approach that can automatically answer the question: Is the resource access indicated by the app foreground? In this work, we present the design, implementation, and evaluation of COSMOS, a context-aware mediation system that bridges the semantic gap between foreground interaction and background access in order to protect system integrity and user privacy. Specifically, COSMOS learns from a large set of apps with similar functionalities and user interfaces to construct generic models that detect outliers at runtime. It can be further customized to satisfy specific user privacy preferences by continuously evolving with user decisions. Experiments show that COSMOS achieves both high precision and high recall in detecting malicious requests. We also demonstrate the effectiveness of COSMOS in capturing specific user preferences using decisions collected from 24 users and illustrate that COSMOS can be easily deployed on smartphones as a real-time guard with very low performance overhead. Comment: Accepted for publication in IEEE INFOCOM'201
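
    A rough sketch of the outlier-detection idea, assuming feature vectors have already been extracted from the foreground UI and the requested resource (the feature encoding and the specific model below are illustrative assumptions, not the COSMOS implementation):

        import numpy as np
        from sklearn.ensemble import IsolationForest

        # Each row: numeric features describing a foreground UI state plus the
        # requested resource (e.g., encoded widget texts and a permission id).
        benign_requests = np.random.rand(500, 16)  # placeholder training data

        model = IsolationForest(contamination=0.05, random_state=0).fit(benign_requests)

        def is_suspicious(request_features: np.ndarray) -> bool:
            """Flag a runtime request whose foreground context looks anomalous."""
            return model.predict(request_features.reshape(1, -1))[0] == -1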

    Guiding Random Graphical and Natural User Interface Testing Through Domain Knowledge

    Get PDF
    Users have access to a diverse set of interfaces that can be used to interact with software. Tools exist for automatically generating test data for an application, but the data required by each user interface is complex, and generating realistic data similar to that of a user is difficult. The environment in which an application runs may also limit the data available, or updates to an operating system can break support for tools that generate test data. Consequently, applications exist for which there is no automated method of generating test data similar to that which a user would provide through real usage of a user interface. With no automated method of generating data, the cost of testing increases and there is an increased chance of bugs being released into production code. In this thesis, we investigate techniques which aim to mimic users, observing how stored user interactions can be split to generate data targeted at specific states of an application, or to generate different subareas of the data structure provided by a user interface. To reduce the cost of gathering and labelling graphical user interface data, we look at generating randomised screen shots of applications, which can be automatically labelled and used in the training stage of a machine learning model. These trained models could guide a randomised approach to generating tests, achieving a significantly higher branch coverage than an unguided random approach. However, for natural user interfaces, which allow interaction through body tracking, we could not learn such a model from generated data. We find that models derived from real user data can generate tests with a significantly higher branch coverage than a purely random tester for both natural and graphical user interfaces. Our approaches use no feedback from an application during test generation; consequently, the models are “generating data in the dark”. Despite this, these models can still generate tests with a higher coverage than random testing, but there may be a benefit to inferring the current state of an application and using this to guide data generation.
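
    As a small illustration of mimicking users by splitting stored interactions per application state, the following sketch samples recorded events for the current state (the trace format and event names are made-up examples, not the thesis tooling):

        import random
        from collections import defaultdict

        # Recorded traces: lists of (app_state, event) pairs captured from real users.
        traces = [
            [("menu", "click_start"), ("game", "swipe_left"), ("game", "swipe_right")],
            [("menu", "click_options"), ("options", "toggle_sound"), ("menu", "click_start")],
        ]

        # Split the stored interactions by the state in which they were observed.
        events_per_state = defaultdict(list)
        for trace in traces:
            for state, event in trace:
                events_per_state[state].append(event)

        def generate_event(current_state: str) -> str:
            """Sample an event that a real user performed in this state, if any exist."""
            candidates = events_per_state.get(current_state)
            return random.choice(candidates) if candidates else "random_event"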

    Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing

    Full text link
    Automated GUI testing is widely used to help ensure the quality of mobile apps. However, many GUIs require appropriate text inputs to proceed to the next page, which remains a prominent obstacle to testing coverage. Considering the diversity and semantic requirements of valid inputs (e.g., flight departure, movie name), it is challenging to automate text input generation. Inspired by the fact that pre-trained Large Language Models (LLMs) have made outstanding progress in text generation, we propose an approach named QTypist, based on an LLM, for intelligently generating semantic input text according to the GUI context. To boost the performance of the LLM in the mobile testing scenario, we develop a prompt-based data construction and tuning method which automatically extracts the prompts and answers for model tuning. We evaluate QTypist on 106 apps from Google Play and the results show that the passing rate of QTypist is 87%, which is 93% higher than the best baseline. We also integrate QTypist with automated GUI testing tools, and it can cover 42% more app activities, 52% more pages, and subsequently help reveal 122% more bugs compared with the raw tool. Comment: Accepted by IEEE/ACM International Conference on Software Engineering 2023 (ICSE 2023).
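
    The prompt-construction step might look roughly like the sketch below; the template and the query_llm placeholder are assumptions for illustration and differ from QTypist's actual prompts and tuned model:

        def build_prompt(activity: str, nearby_labels: list, field_hint: str) -> str:
            context = ", ".join(nearby_labels)
            return (
                f"On the '{activity}' page, next to the labels [{context}], "
                f"the input field with hint '{field_hint}' should be filled with: "
            )

        def query_llm(prompt: str) -> str:
            """Placeholder: call a (possibly fine-tuned) language model here."""
            return "Shanghai Hongqiao"  # dummy completion so the example runs

        prompt = build_prompt("BookFlight", ["From", "To", "Date"], "departure city")
        print(query_llm(prompt))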

    Towards Efficient Record and Replay: A Case Study in WeChat

    Full text link
    WeChat, a widely used messenger app boasting over 1 billion monthly active users, requires effective app quality assurance for its complex features. Record-and-replay tools are crucial in achieving this goal. Despite the extensive development of these tools, the impact of the waiting time between replay events has been largely overlooked. On one hand, a long waiting time for executing replay events on fully rendered GUIs slows down the process. On the other hand, a short waiting time can lead to events executing on partially rendered GUIs, negatively affecting replay effectiveness. An optimal waiting time should strike a balance between effectiveness and efficiency. We introduce WeReplay, a lightweight image-based approach that dynamically adjusts the inter-event time based on the GUI rendering state. Given a real-time stream of GUI frames, WeReplay employs a deep learning model to infer the rendering state and synchronizes with the replaying tool, scheduling the next event once the GUI is fully rendered. Our evaluation shows that our model achieves 92.1% precision and 93.3% recall in discerning GUI rendering states in the WeChat app. Through assessing its performance in replaying 23 common WeChat usage scenarios, WeReplay successfully replays all scenarios on the same and different devices more efficiently than the state-of-the-practice baselines.
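
    A minimal sketch of the scheduling loop, with the rendering-state classifier and screenshot capture stubbed out (the polling interval and timeout are illustrative values, not WeReplay's):

        import time

        def take_screenshot() -> bytes:
            """Placeholder: grab the current GUI frame from the device."""
            return b""

        def is_fully_rendered(screenshot: bytes) -> bool:
            """Placeholder for the deep learning model that infers the rendering state."""
            return True

        def replay_event(event, poll_interval: float = 0.05, timeout: float = 5.0) -> None:
            deadline = time.monotonic() + timeout
            # Wait only as long as needed: fire as soon as the GUI looks fully rendered.
            while time.monotonic() < deadline and not is_fully_rendered(take_screenshot()):
                time.sleep(poll_interval)
            event()  # dispatch the recorded event (e.g., a tap) on the rendered GUI

        replay_event(lambda: print("tap on 'Moments'"))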

    Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions

    Full text link
    Automated Graphical User Interface (GUI) testing plays a crucial role in ensuring app quality, especially as mobile applications have become an integral part of our daily lives. Despite the growing popularity of learning-based techniques in automated GUI testing, owing to their ability to generate human-like interactions, they still suffer from several limitations, such as low testing coverage, inadequate generalization capabilities, and heavy reliance on training data. Inspired by the success of Large Language Models (LLMs) like ChatGPT in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks the LLM to chat with the mobile app: GUI page information is passed to the LLM to elicit testing scripts, the scripts are executed, and the app's feedback is passed back to the LLM, iterating the whole process. Within this framework, we also introduce a functionality-aware memory prompting mechanism that equips the LLM with the ability to retain testing knowledge of the whole process and conduct long-term, functionality-based reasoning to guide exploration. We evaluate it on 93 apps from Google Play and demonstrate that it outperforms the best baseline by 32% in activity coverage and detects 31% more bugs at a faster rate. Moreover, GPTDroid identifies 53 new bugs on Google Play, of which 35 have been confirmed and fixed. Comment: Accepted by IEEE/ACM International Conference on Software Engineering 2024 (ICSE 2024). arXiv admin note: substantial text overlap with arXiv:2305.0943
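
    A schematic version of the Q&A loop, with every helper stubbed out; this only shows the shape of the iteration described above (pass GUI context and memory to the LLM, execute the suggested action, feed the result back), not GPTDroid's code:

        def get_gui_state() -> str: return "<widgets on the current page>"  # placeholder
        def ask_llm(prompt: str) -> str: return "click(button='Settings')"  # placeholder
        def execute(action: str) -> str: return "opened Settings page"      # placeholder

        memory = []  # functionality-aware memory: what has already been exercised

        for step in range(50):
            prompt = (
                "Tested so far: " + "; ".join(memory) + "\n"
                "Current GUI: " + get_gui_state() + "\n"
                "Question: what should be tested next? Answer with one action."
            )
            action = ask_llm(prompt)
            feedback = execute(action)
            memory.append(f"{action} -> {feedback}")  # app feedback flows into the next prompt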

    Learning the language of apps

    Get PDF
    To explore the functionality of an app, automated test generators systematically identify and interact with its user interface (UI) elements. A key challenge is to synthesize inputs which effectively and efficiently cover app behavior. To do so, a test generator has to choose which elements to interact with, which interactions to perform on each element, and which input values to type. In summary, to better test apps, a test generator should know the app's language, that is, the language of its graphical interactions and the language of its textual inputs. In this work, we show how a test generator can learn the language of apps and how this knowledge is modeled to create tests. We demonstrate how to learn the language of graphical input prior to testing by combining machine learning and static analysis, and how to refine this knowledge during testing using reinforcement learning. In our experiments, statically learned models resulted in 50% fewer ineffective actions and an average increase in test (code) coverage of 19%, while refining them through reinforcement learning resulted in an additional test (code) coverage of up to 20%. We learn the language of textual inputs by identifying the semantics of input fields in the UI and querying the web for real-world values. In our experiments, real-world values increase test (code) coverage by ~10%. Finally, we show how to use context-free grammars to integrate both languages into a single representation (UI grammar), giving control back to the user. This representation can then be mined from existing tests, associated with the app source code, and used to produce new tests. 82% of the test cases produced by fuzzing our UI grammar can reach a UI element within the app, and 70% of them can reach a specific code location.
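
    A toy UI grammar in the spirit of the single representation described above; the rules and action names are invented for illustration, whereas the thesis mines such grammars from existing tests:

        import random

        GRAMMAR = {
            "<test>":   [["<action>"], ["<action>", "<test>"]],
            "<action>": [["click('Search')"], ["<type>"], ["swipe('left')"]],
            "<type>":   [["type('Berlin')"], ["type('Madrid')"]],
        }

        def derive(symbol: str) -> list:
            """Randomly expand a symbol into a flat list of concrete UI actions."""
            if symbol not in GRAMMAR:
                return [symbol]  # terminal: an executable UI action
            expansion = random.choice(GRAMMAR[symbol])
            return [token for part in expansion for token in derive(part)]

        print(derive("<test>"))  # e.g. ["click('Search')", "type('Berlin')"]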

    Supporting Evolution and Maintenance of Android Apps

    Get PDF
    Mobile developers and testers face a number of emerging challenges. These include rapid platform evolution and API instability; issues in bug reporting and reproduction involving complex multitouch gestures; platform fragmentation; the impact of reviews and ratings on the success of their apps; management of crowd-sourced requirements; continuous pressure from the market for frequent releases; a lack of effective and usable testing tools; and limited computational resources for handheld devices. Traditional and contemporary methods in software evolution and maintenance were not designed for these types of challenges; therefore, a set of studies and a new toolbox of techniques for mobile development are required to analyze current challenges and propose new solutions. This dissertation presents a set of empirical studies, as well as solutions for some of the key challenges in evolving and maintaining Android apps. In particular, we analyzed key challenges experienced by practitioners and open issues in the mobile development community, such as (i) Android API instability, (ii) performance optimizations, (iii) automatic GUI testing, and (iv) energy consumption. When carrying out the studies, we relied on qualitative and quantitative analyses to understand the phenomena on a large scale, considering evidence extracted from software repositories and the opinions of open-source mobile developers. From the empirical studies, we identified that dynamic analysis is a relevant method for several evolution and maintenance tasks, in particular because practitioners need to execute and validate apps on a diverse set of platforms (i.e., devices and OS versions) while under pressure for continuous delivery. Therefore, we designed and implemented an extensible infrastructure that enables large-scale automatic execution of Android apps to support different evolution and maintenance tasks (e.g., testing and energy optimization). In addition to the infrastructure, we present a taxonomy of issues, individual solutions to these issues, and guidelines to enable large-scale execution of Android apps. Finally, we devised novel approaches aimed at supporting testing and energy optimization of mobile apps (two key challenges in the evolution and maintenance of Android apps). First, we propose a novel hybrid approach for automatic GUI-based testing of apps that is able to generate (un)natural test sequences by mining real application usages and learning statistical models that represent the GUI interactions. In addition, we propose a multi-objective approach for optimizing the energy consumption of GUIs in Android apps that is able to generate visually appealing color compositions while reducing energy consumption and keeping the design concept close to the original.
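
    A much-simplified sketch of the two competing objectives in the energy-optimization idea (an OLED-style brightness proxy versus closeness to the original design); the scoring and Pareto filter below are illustrative assumptions, not the dissertation's multi-objective search:

        def energy_proxy(palette):
            # On OLED screens brighter colours draw more power; sum channel intensities.
            return sum(r + g + b for r, g, b in palette)

        def design_distance(palette, original):
            # Squared RGB distance to the original colour scheme (smaller = more similar).
            return sum((a - b) ** 2
                       for c1, c2 in zip(palette, original) for a, b in zip(c1, c2))

        def pareto_front(candidates, original):
            scored = [(energy_proxy(p), design_distance(p, original), p) for p in candidates]
            return [p for e, d, p in scored
                    if not any(e2 <= e and d2 <= d and (e2, d2) != (e, d)
                               for e2, d2, _ in scored)]

        original = [(255, 255, 255), (33, 150, 243)]  # white background, blue accent
        candidates = [
            original,
            [(0, 0, 0), (33, 150, 243)],       # dark background, same accent
            [(20, 20, 20), (100, 181, 246)],   # dark background, lighter accent
            [(255, 255, 255), (255, 255, 0)],  # dominated: brighter and farther from the design
        ]
        print(pareto_front(candidates, original))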

    Automatically Detecting Visual Bugs in HTML5 <canvas> Games

    Full text link
    The HTML5 <canvas> is used to display high-quality graphics in web applications such as web games (i.e., <canvas> games). However, automatically testing <canvas> games is not possible with existing web testing techniques and tools, and manual testing is laborious. Many widely used web testing tools rely on the Document Object Model (DOM) to drive web test automation, but the contents of the <canvas> are not represented in the DOM. The main alternative approach, snapshot testing, involves comparing oracle snapshot images with test-time snapshot images using an image similarity metric to catch visual bugs, i.e., bugs in the graphics of the web application. However, creating and maintaining oracle snapshot images for <canvas> games is onerous, defeating the purpose of test automation. In this paper, we present a novel approach to automatically detect visual bugs in <canvas> games. By leveraging an internal representation of objects on the <canvas>, we decompose snapshot images into a set of object images, each of which is compared with a respective oracle asset (e.g., a sprite) using four similarity metrics: percentage overlap, mean squared error, structural similarity, and embedding similarity. We evaluate our approach by injecting 24 visual bugs into a custom <canvas> game, and find that our approach achieves an accuracy of 100%, compared to an accuracy of 44.6% with traditional snapshot testing. Comment: Accepted at ASE 2022 conference
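
    A hedged sketch of the per-object comparison step for three of the four metrics named above (mean squared error, percentage overlap, and structural similarity); the array shapes are assumptions and the embedding-based similarity is omitted:

        import numpy as np
        from skimage.metrics import structural_similarity

        def compare(object_img: np.ndarray, oracle_asset: np.ndarray) -> dict:
            """Compare one object cropped from a <canvas> snapshot to its oracle asset."""
            obj = object_img.astype(np.float64)
            ref = oracle_asset.astype(np.float64)
            return {
                "mse": float(np.mean((obj - ref) ** 2)),
                # Percentage overlap: fraction of pixels that match (almost) exactly.
                "overlap": float(np.mean(np.abs(obj - ref) < 1e-6)),
                "ssim": structural_similarity(object_img, oracle_asset, data_range=255),
            }

        sprite = (np.random.rand(32, 32) * 255).astype(np.uint8)  # stand-in oracle asset
        print(compare(sprite, sprite))  # identical images: mse 0.0, overlap 1.0, ssim 1.0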