18 research outputs found
Context2Name: A Deep Learning-Based Approach to Infer Natural Variable Names from Usage Contexts
Most of the JavaScript code deployed in the wild has been minified, a process
in which identifier names are replaced with short, arbitrary and meaningless
names. Minified code occupies less space, but also makes the code extremely
difficult to manually inspect and understand. This paper presents Context2Name,
a deep learningbased technique that partially reverses the effect of
minification by predicting natural identifier names for minified names. The
core idea is to predict from the usage context of a variable a name that
captures the meaning of the variable. The approach combines a lightweight,
token-based static analysis with an auto-encoder neural network that summarizes
usage contexts and a recurrent neural network that predict natural names for a
given usage context. We evaluate Context2Name with a large corpus of real-world
JavaScript code and show that it successfully predicts 47.5% of all minified
identifiers while taking only 2.9 milliseconds on average to predict a name. A
comparison with the state-of-the-art tools JSNice and JSNaughty shows that our
approach performs comparably in terms of accuracy while improving in terms of
efficiency. Moreover, Context2Name complements the state-of-the-art by
predicting 5.3% additional identifiers that are missed by both existing tools
Autonomous Large Language Model Agents Enabling Intent-Driven Mobile GUI Testing
GUI testing checks if a software system behaves as expected when users
interact with its graphical interface, e.g., testing specific functionality or
validating relevant use case scenarios. Currently, deciding what to test at
this high level is a manual task since automated GUI testing tools target lower
level adequacy metrics such as structural code coverage or activity coverage.
We propose DroidAgent, an autonomous GUI testing agent for Android, for
semantic, intent-driven automation of GUI testing. It is based on Large
Language Models and support mechanisms such as long- and short-term memory.
Given an Android app, DroidAgent sets relevant task goals and subsequently
tries to achieve them by interacting with the app. Our empirical evaluation of
DroidAgent using 15 apps from the Themis benchmark shows that it can set up and
perform realistic tasks, with a higher level of autonomy. For example, when
testing a messaging app, DroidAgent created a second account and added a first
account as a friend, testing a realistic use case, without human intervention.
On average, DroidAgent achieved 61% activity coverage, compared to 51% for
current state-of-the-art GUI testing techniques. Further, manual analysis shows
that 317 out of the 374 autonomously created tasks are realistic and relevant
to app functionalities, and also that DroidAgent interacts deeply with the apps
and covers more features.Comment: 10 page
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
Automated Graphical User Interface (GUI) testing plays a crucial role in
ensuring app quality, especially as mobile applications have become an integral
part of our daily lives. Despite the growing popularity of learning-based
techniques in automated GUI testing due to their ability to generate human-like
interactions, they still suffer from several limitations, such as low testing
coverage, inadequate generalization capabilities, and heavy reliance on
training data. Inspired by the success of Large Language Models (LLMs) like
ChatGPT in natural language understanding and question answering, we formulate
the mobile GUI testing problem as a Q&A task. We propose GPTDroid, asking LLM
to chat with the mobile apps by passing the GUI page information to LLM to
elicit testing scripts, and executing them to keep passing the app feedback to
LLM, iterating the whole process. Within this framework, we have also
introduced a functionality-aware memory prompting mechanism that equips the LLM
with the ability to retain testing knowledge of the whole process and conduct
long-term, functionality-based reasoning to guide exploration. We evaluate it
on 93 apps from Google Play and demonstrate that it outperforms the best
baseline by 32% in activity coverage, and detects 31% more bugs at a faster
rate. Moreover, GPTDroid identify 53 new bugs on Google Play, of which 35 have
been confirmed and fixed.Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2024 (ICSE 2024). arXiv admin note: substantial text overlap with
arXiv:2305.0943
Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model
Mobile applications have become a ubiquitous part of our daily life,
providing users with access to various services and utilities. Text input, as
an important interaction channel between users and applications, plays an
important role in core functionality such as search queries, authentication,
messaging, etc. However, certain special text (e.g., -18 for Font Size) can
cause the app to crash, and generating diversified unusual inputs for fully
testing the app is highly demanded. Nevertheless, this is also challenging due
to the combination of explosion dilemma, high context sensitivity, and complex
constraint relations. This paper proposes InputBlaster which leverages the LLM
to automatically generate unusual text inputs for mobile app crash detection.
It formulates the unusual inputs generation problem as a task of producing a
set of test generators, each of which can yield a batch of unusual text inputs
under the same mutation rule. In detail, InputBlaster leverages LLM to produce
the test generators together with the mutation rules serving as the reasoning
chain, and utilizes the in-context learning schema to demonstrate the LLM with
examples for boosting the performance. InputBlaster is evaluated on 36 text
input widgets with cash bugs involving 31 popular Android apps, and results
show that it achieves 78% bug detection rate, with 136% higher than the best
baseline. Besides, we integrate it with the automated GUI testing tool and
detect 37 unseen crashes in real-world apps from Google Play.Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2024 (ICSE 2024
Fill in the Blank: Context-aware Automated Text Input Generation for Mobile GUI Testing
Automated GUI testing is widely used to help ensure the quality of mobile
apps. However, many GUIs require appropriate text inputs to proceed to the next
page which remains a prominent obstacle for testing coverage. Considering the
diversity and semantic requirement of valid inputs (e.g., flight departure,
movie name), it is challenging to automate the text input generation. Inspired
by the fact that the pre-trained Large Language Model (LLM) has made
outstanding progress in text generation, we propose an approach named QTypist
based on LLM for intelligently generating semantic input text according to the
GUI context. To boost the performance of LLM in the mobile testing scenario, we
develop a prompt-based data construction and tuning method which automatically
extracts the prompts and answers for model tuning. We evaluate QTypist on 106
apps from Google Play and the result shows that the passing rate of QTypist is
87%, which is 93% higher than the best baseline. We also integrate QTypist with
the automated GUI testing tools and it can cover 42% more app activities, 52%
more pages, and subsequently help reveal 122% more bugs compared with the raw
tool.Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2023 (ICSE 2023