Using Hover to Compromise the Confidentiality of User Input on Android
We show that the new hover (floating touch) technology, available in a number
of today's smartphone models, can be abused by any Android application running
with a common SYSTEM_ALERT_WINDOW permission to record all touchscreen input
into other applications. Leveraging this attack, a malicious application
running on the system can profile the user's behavior, capture sensitive
input such as passwords and PINs, and record all of the user's social
interactions. To evaluate the attack, we implemented Hoover, a proof-of-concept
malicious application that runs in the system background and records all input
to foreground applications. We evaluated Hoover with 40 users across two
different Android devices and two input methods: stylus and finger. In the case
of touchscreen input by finger, Hoover estimated the positions of users' clicks
within an error of 100 pixels and keyboard input with an accuracy of 79%.
Hoover captured users' input by stylus even more accurately, estimating users'
clicks within 2 pixels and keyboard input with an accuracy of 98%. We discuss
ways of mitigating this attack and show that mitigation cannot be achieved
simply by restricting access to permissions or by imposing additional cognitive
load on users, since either measure would significantly constrain the intended
use of the hover technology.
Comment: 11 pages
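To make the mechanism concrete, the sketch below shows how an Android overlay window can listen for hover (floating-touch) events. This is an illustration under stated assumptions, not the authors' Hoover implementation: it targets API 26+ (TYPE_APPLICATION_OVERLAY), and it omits the timing logic a real attack would need so that the user's taps still reach the victim application.

```kotlin
import android.content.Context
import android.graphics.PixelFormat
import android.util.Log
import android.view.MotionEvent
import android.view.View
import android.view.WindowManager

// Minimal sketch: a transparent full-screen overlay that logs hover events.
// Requires the SYSTEM_ALERT_WINDOW permission mentioned in the abstract.
fun installHoverOverlay(context: Context) {
    val wm = context.getSystemService(Context.WINDOW_SERVICE) as WindowManager
    val params = WindowManager.LayoutParams(
        WindowManager.LayoutParams.MATCH_PARENT,
        WindowManager.LayoutParams.MATCH_PARENT,
        WindowManager.LayoutParams.TYPE_APPLICATION_OVERLAY, // API 26+; TYPE_SYSTEM_ALERT before that
        WindowManager.LayoutParams.FLAG_NOT_FOCUSABLE,
        PixelFormat.TRANSLUCENT // fully transparent, invisible to the user
    )
    val overlay = View(context)
    overlay.setOnHoverListener { _, event ->
        if (event.actionMasked == MotionEvent.ACTION_HOVER_MOVE) {
            // The hover trail just before and after a tap approximates the tap position.
            Log.d("HoverSketch", "hover at (${event.rawX}, ${event.rawY})")
        }
        false // do not consume the event
    }
    wm.addView(overlay, params)
    // A real attack would remove the overlay the moment a touch begins, so the
    // tap lands in the app below, and re-add it immediately afterwards.
}
```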
DroidBot-GPT: GPT-powered UI Automation for Android
This paper introduces DroidBot-GPT, a tool that utilizes GPT-like large
language models (LLMs) to automate interactions with Android mobile
applications. Given a natural language description of a desired task,
DroidBot-GPT can automatically generate and execute actions that navigate the
app to complete the task. It works by translating the app's GUI state and the
actions available on the smartphone screen into natural-language prompts and
asking the LLM to choose an action. Because the LLM is typically trained on a
large amount of data, including how-to manuals for diverse software
applications, it can make reasonable action choices based on the provided
information. We evaluate DroidBot-GPT with a self-created
dataset that contains 33 tasks collected from 17 Android applications spanning
10 categories. It can successfully complete 39.39% of the tasks, and the
average partial completion progress is about 66.76%. Given that our method is
fully unsupervised (requiring no modification to either the app or the LLM),
we believe there is great potential to improve automation performance through
better app development paradigms and/or custom model training.
Comment: 8 pages, 5 figures
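The prompt-and-choose loop described above can be sketched as follows. All helper functions and types here (dumpGuiState, listActions, queryLlm, execute) are hypothetical stand-ins for the device and model interfaces, not DroidBot-GPT's actual API.

```kotlin
// Hypothetical stand-ins (illustration only).
data class UiAction(val index: Int, val description: String)
fun dumpGuiState(): String = TODO("serialize the current GUI hierarchy as text")
fun listActions(): List<UiAction> = TODO("enumerate tappable/typable elements")
fun queryLlm(prompt: String): String = TODO("call the language model")
fun execute(action: UiAction): Unit = TODO("perform the action on the device")

// Describe the screen and candidate actions in natural language,
// then let the LLM pick the next action until it declares the task done.
fun automateTask(task: String, maxSteps: Int = 20) {
    repeat(maxSteps) {
        val actions = listActions()
        val prompt = buildString {
            appendLine("Task: $task")
            appendLine("Current screen: ${dumpGuiState()}")
            appendLine("Available actions:")
            actions.forEach { appendLine("${it.index}. ${it.description}") }
            append("Reply with the number of the next action, or -1 if the task is done.")
        }
        val choice = queryLlm(prompt).trim().toIntOrNull() ?: return
        if (choice == -1) return // the model judges the task complete
        execute(actions.first { it.index == choice })
    }
}
```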
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
Graphical User Interface (GUI) automation holds significant promise for
assisting users with complex tasks, thereby boosting human productivity.
Existing works leveraging Large Language Model (LLM) or LLM-based AI agents
have shown capabilities in automating tasks on Android and Web platforms.
However, these tasks are primarily aimed at simple device usage and
entertainment operations. This paper presents a novel benchmark, AssistGUI, to
evaluate whether models are capable of manipulating the mouse and keyboard on
the Windows platform in response to user-requested tasks. We carefully
collected a set of 100 tasks from nine widely used software applications, such
as After Effects and MS Word, each accompanied by the necessary project files
for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied
Agent framework, which incorporates a sophisticated GUI parser driven by an
LLM-agent and an enhanced reasoning mechanism adept at handling lengthy
procedural tasks. Our experimental results reveal that our GUI Parser and
Reasoning mechanism outperform existing methods. Nevertheless, substantial room
for improvement remains, with the best model attaining only a 46% success
rate on our benchmark. We conclude with a thorough analysis of the current
methods' limitations, setting the stage for future breakthroughs in this
domain.
Comment: Project Page: https://showlab.github.io/assistgui
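An actor-critic loop of the kind outlined above might look like the following sketch. The stage names mirror the abstract's description (GUI parser, actor, critic), but every type and function is an illustrative assumption rather than AssistGUI's real interface.

```kotlin
// Illustrative types and stubs; none of these are AssistGUI's actual API.
data class Screen(val parsed: String)                   // output of the LLM-driven GUI parser
data class Action(val kind: String, val target: String) // e.g. click, type, drag
data class Verdict(val done: Boolean, val revised: Action?)

fun captureAndParseGui(): Screen = TODO("screenshot -> LLM-based GUI parse")
fun actor(task: String, screen: Screen): Action = TODO("propose the next step")
fun critic(task: String, screen: Screen, a: Action): Verdict = TODO("check and repair the step")
fun perform(a: Action): Unit = TODO("drive the mouse and keyboard")

// The actor proposes an action and the critic verifies or corrects it;
// the extra critique pass is what keeps lengthy procedural tasks on track.
fun runDesktopTask(task: String, maxSteps: Int = 50) {
    repeat(maxSteps) {
        val screen = captureAndParseGui()
        val proposal = actor(task, screen)
        val verdict = critic(task, screen, proposal)
        if (verdict.done) return
        perform(verdict.revised ?: proposal)
    }
}
```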
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions
Automated Graphical User Interface (GUI) testing plays a crucial role in
ensuring app quality, especially as mobile applications have become an integral
part of our daily lives. Despite the growing popularity of learning-based
techniques in automated GUI testing due to their ability to generate human-like
interactions, they still suffer from several limitations, such as low testing
coverage, inadequate generalization capabilities, and heavy reliance on
training data. Inspired by the success of Large Language Models (LLMs) like
ChatGPT in natural language understanding and question answering, we formulate
the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks
the LLM to chat with the mobile app: it passes GUI page information to the LLM
to elicit testing scripts, executes them, and feeds the app's responses back to
the LLM, iterating the whole process. Within this framework, we have also
introduced a functionality-aware memory prompting mechanism that equips the LLM
with the ability to retain testing knowledge of the whole process and conduct
long-term, functionality-based reasoning to guide exploration. We evaluate it
on 93 apps from Google Play and demonstrate that it outperforms the best
baseline by 32% in activity coverage, and detects 31% more bugs at a faster
rate. Moreover, GPTDroid identified 53 new bugs on Google Play, of which 35 have
been confirmed and fixed.
Comment: Accepted by IEEE/ACM International Conference on Software Engineering
2024 (ICSE 2024). arXiv admin note: substantial text overlap with
arXiv:2305.0943
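The functionality-aware memory prompting described above can be pictured as keeping a running summary of what has already been exercised and injecting it into every prompt. The sketch below uses hypothetical helpers and does not reproduce GPTDroid's actual prompt format.

```kotlin
// Hypothetical helpers (illustration only, not GPTDroid's real interfaces).
fun queryLlm(prompt: String): String = TODO("call the language model")
fun summarizeFunctionality(script: String): String = TODO("one-line memory entry")

// Running memory of functionalities already exercised during testing.
val memory = mutableListOf<String>()

// Build each testing prompt from the current GUI page plus the accumulated
// memory, steering the LLM toward functionality that is still unexplored.
fun nextTestStep(guiPageInfo: String): String {
    val prompt = buildString {
        appendLine("You are testing an Android app.")
        appendLine("Functionality already exercised:")
        memory.forEach { appendLine("- $it") }
        appendLine("Current GUI page: $guiPageInfo")
        append("Answer with a testing script that exercises an unexplored function.")
    }
    val script = queryLlm(prompt)
    memory += summarizeFunctionality(script) // retain long-term testing knowledge
    return script
}
```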