3,365 research outputs found

    Using Hover to Compromise the Confidentiality of User Input on Android

    We show that the new hover (floating touch) technology, available in a number of today's smartphone models, can be abused by any Android application running with the common SYSTEM_ALERT_WINDOW permission to record all touchscreen input into other applications. Leveraging this attack, a malicious application running on the system is able to profile the user's behavior, capture sensitive input such as passwords and PINs, and record all of the user's social interactions. To evaluate our attack we implemented Hoover, a proof-of-concept malicious application that runs in the system background and records all input to foreground applications. We evaluated Hoover with 40 users, across two different Android devices and two input methods, stylus and finger. For finger input, Hoover estimated the positions of users' clicks within an error of 100 pixels and recovered keyboard input with an accuracy of 79%. It captured stylus input even more accurately, estimating users' clicks within 2 pixels and recovering keyboard input with an accuracy of 98%. We discuss ways of mitigating this attack and show that mitigation cannot be achieved by simply restricting access to permissions or imposing additional cognitive load on users, since this would significantly constrain the intended use of the hover technology.
    Comment: 11 pages
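    The keyboard-recovery accuracies above boil down to mapping recovered hover/touch coordinates onto a known soft-keyboard layout. The sketch below illustrates only that post-processing step under assumed key geometry; the key sizes, row offsets, and the sample coordinates are hypothetical, and the Android-side event capture used by Hoover is not shown.

```python
# Illustrative post-processing: map estimated touch coordinates to the nearest
# key of an assumed portrait QWERTY layout. All geometry here is hypothetical.

KEY_WIDTH, KEY_HEIGHT = 108, 160          # assumed key size in pixels
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_Y = [1500, 1660, 1820]                # assumed top y of each keyboard row
ROW_X = [0, 54, 162]                      # assumed left offset of each row

def key_centers():
    """Yield (char, center_x, center_y) for every key in the assumed layout."""
    for row, y, x0 in zip(ROWS, ROW_Y, ROW_X):
        for i, ch in enumerate(row):
            yield ch, x0 + i * KEY_WIDTH + KEY_WIDTH / 2, y + KEY_HEIGHT / 2

def guess_key(x, y):
    """Return the key whose center is closest to the estimated touch point."""
    return min(key_centers(), key=lambda k: (k[1] - x) ** 2 + (k[2] - y) ** 2)[0]

# Hypothetical coordinates recovered from hover events around a few taps.
estimated_taps = [(110, 1580), (650, 1740), (980, 1580)]
print("".join(guess_key(x, y) for x, y in estimated_taps))
```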

    DroidBot-GPT: GPT-powered UI Automation for Android

    This paper introduces DroidBot-GPT, a tool that utilizes GPT-like large language models (LLMs) to automate interactions with Android mobile applications. Given a natural language description of a desired task, DroidBot-GPT can automatically generate and execute actions that navigate the app to complete the task. It works by translating the app GUI state and the actions available on the smartphone screen into natural language prompts and asking the LLM to choose among the actions. Since the LLM is typically trained on a large amount of data, including how-to manuals for diverse software applications, it can make reasonable action choices based on the provided information. We evaluate DroidBot-GPT with a self-created dataset that contains 33 tasks collected from 17 Android applications spanning 10 categories. It successfully completes 39.39% of the tasks, and the average partial completion progress is about 66.76%. Given that our method is fully unsupervised (no modification is required to either the app or the LLM), we believe there is great potential to improve automation performance with better app development paradigms and/or custom model training.
    Comment: 8 pages, 5 figures
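    A minimal sketch of the prompt-then-choose loop described above is shown below. The GUI-state representation, the action list, and the `query_llm` callable are placeholders, not DroidBot-GPT's actual API.

```python
# Build a natural-language prompt from the current GUI state and candidate
# actions, ask the LLM to pick one, and parse its numeric answer.

def build_prompt(task, gui_elements, actions):
    lines = [f"Task: {task}", "Current screen contains:"]
    lines += [f"- {e}" for e in gui_elements]
    lines.append("Available actions:")
    lines += [f"{i}. {a}" for i, a in enumerate(actions)]
    lines.append("Which numbered action best advances the task? Answer with the number only.")
    return "\n".join(lines)

def choose_action(task, gui_elements, actions, query_llm):
    reply = query_llm(build_prompt(task, gui_elements, actions))
    digits = "".join(ch for ch in reply if ch.isdigit())
    idx = int(digits) if digits else 0            # fall back to the first action
    return actions[min(idx, len(actions) - 1)]

# Hypothetical usage with any chat-completion backend supplied as query_llm:
# action = choose_action("create a new note",
#                        ["button 'New'", "list of existing notes"],
#                        ["tap button 'New'", "scroll down"], query_llm=my_llm)
```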

    ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

    Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Models (LLMs) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We carefully collected a set of 100 tasks from nine widely used software applications, such as After Effects and MS Word, each accompanied by the necessary project files for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied Agent framework, which incorporates a sophisticated GUI parser driven by an LLM agent and an enhanced reasoning mechanism adept at handling lengthy procedural tasks. Our experimental results reveal that our GUI parser and reasoning mechanism outperform existing methods. Nevertheless, substantial room for improvement remains, with the best model attaining only a 46% success rate on our benchmark. We conclude with a thorough analysis of the current methods' limitations, setting the stage for future breakthroughs in this domain.
    Comment: Project page: https://showlab.github.io/assistgui
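    The following is a schematic control loop in the spirit of the actor-critic framework described above, intended only to make the division of labor concrete. The `parse_gui`, `actor`, `critic`, and `execute` callables are stand-ins, not AssistGUI's implementation.

```python
# Actor proposes the next mouse/keyboard step from the parsed screen; critic
# inspects the resulting screen and accepts, requests a retry, or aborts.

def run_task(task, parse_gui, actor, critic, execute, max_steps=50):
    history = []                                   # sub-steps completed so far
    for _ in range(max_steps):
        screen = parse_gui()                       # LLM-driven parse of the current GUI
        step = actor(task, screen, history)        # propose the next action
        if step is None:                           # actor signals the task is finished
            return True, history
        execute(step)                              # drive mouse/keyboard
        verdict = critic(task, step, parse_gui())  # judge the new screen state
        if verdict == "ok":
            history.append(step)
        elif verdict == "retry":
            continue                               # keep history, let the actor re-plan
        else:                                      # "abort": unrecoverable failure
            return False, history
    return False, history
```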

    Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions

    Automated Graphical User Interface (GUI) testing plays a crucial role in ensuring app quality, especially as mobile applications have become an integral part of our daily lives. Despite the growing popularity of learning-based techniques in automated GUI testing, owing to their ability to generate human-like interactions, they still suffer from several limitations, such as low testing coverage, inadequate generalization capabilities, and heavy reliance on training data. Inspired by the success of Large Language Models (LLMs) like ChatGPT in natural language understanding and question answering, we formulate the mobile GUI testing problem as a Q&A task. We propose GPTDroid, which asks the LLM to chat with the mobile app: GUI page information is passed to the LLM to elicit testing scripts, the scripts are executed, and the resulting app feedback is passed back to the LLM, iterating the whole process. Within this framework, we also introduce a functionality-aware memory prompting mechanism that equips the LLM with the ability to retain testing knowledge across the whole process and to conduct long-term, functionality-based reasoning that guides exploration. We evaluate GPTDroid on 93 apps from Google Play and demonstrate that it outperforms the best baseline by 32% in activity coverage and detects 31% more bugs at a faster rate. Moreover, GPTDroid identified 53 new bugs on Google Play, of which 35 have been confirmed and fixed.
    Comment: Accepted by IEEE/ACM International Conference on Software Engineering 2024 (ICSE 2024). arXiv admin note: substantial text overlap with arXiv:2305.0943
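    Below is a minimal sketch of the iterative Q&A testing loop with a long-term functionality memory, as described above. The prompt wording and the `describe_page`, `query_llm`, and `run_script` callables are assumptions for illustration, not GPTDroid's actual interface.

```python
# Iteratively: describe the current GUI page, ask the LLM for a test script for
# an untested functionality, run it, and feed the outcome back via a memory list.

def gui_test_loop(describe_page, query_llm, run_script, max_iters=100):
    memory = []                                # functionality notes kept across pages
    for _ in range(max_iters):
        page = describe_page()                 # current GUI page rendered as text
        notes = "\n".join(memory) or "(none)"
        prompt = (
            "You are testing an Android app.\n"
            f"Functionalities tested so far:\n{notes}\n"
            f"Current page:\n{page}\n"
            "Reply with a test script for an untested functionality, then a line "
            "starting with 'MEMORY:' summarizing what it exercises."
        )
        reply = query_llm(prompt)
        script, _, note = reply.partition("MEMORY:")
        feedback = run_script(script.strip())  # execute on the device, collect feedback
        memory.append(note.strip() or feedback)
```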
    • …