Never-ending Learning of User Interfaces
Machine learning models have been trained to predict semantic information
about user interfaces (UIs) to make apps more accessible, easier to test, and
to automate. Currently, most models rely on datasets that are collected and
labeled by human crowd-workers, a process that is costly and surprisingly
error-prone for certain tasks. For example, it is possible to guess if a UI
element is "tappable" from a screenshot (i.e., based on visual signifiers) or
from potentially unreliable metadata (e.g., a view hierarchy), but one way to
know for certain is to programmatically tap the UI element and observe the
effects. We built the Never-ending UI Learner, an app crawler that
automatically installs real apps from a mobile app store and crawls them to
discover new and challenging training examples to learn from. The Never-ending
UI Learner has crawled for more than 5,000 device-hours, performing over half a
million actions on 6,000 apps to train three computer vision models for i)
tappability prediction, ii) draggability prediction, and iii) screen
similarity.
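The labeling idea in this abstract, deriving a tappability label from the observed effect of an actual tap rather than from human annotation, can be pictured with a minimal Python sketch. It assumes a hypothetical device driver exposing screenshot(), tap(), and go_back(); the pixel-difference check is a crude stand-in for the paper's screen-similarity model, not its implementation:

    # Hypothetical sketch of interaction-based tappability labeling.
    from dataclasses import dataclass

    @dataclass
    class TapExample:
        screenshot: bytes      # pixels captured before the tap
        bbox: tuple            # (x, y, w, h) of the candidate element
        tappable: bool         # label derived from the observed effect

    def pixel_diff_ratio(img_a: bytes, img_b: bytes) -> float:
        # Crude stand-in for a screen-similarity model: fraction of
        # bytes that differ between the two screenshots.
        changed = sum(a != b for a, b in zip(img_a, img_b))
        return changed / max(len(img_a), 1)

    def label_by_tapping(device, candidate_boxes, change_threshold=0.02):
        examples = []
        for (x, y, w, h) in candidate_boxes:
            before = device.screenshot()
            device.tap(x + w // 2, y + h // 2)   # programmatically tap the element
            after = device.screenshot()
            changed = pixel_diff_ratio(before, after) > change_threshold
            examples.append(TapExample(before, (x, y, w, h), tappable=changed))
            device.go_back()                     # try to return to the start screen
        return examples

Screens that change after the tap yield positive labels; unchanged screens yield negatives, the kind of automatically verified, often challenging examples the crawler is meant to collect.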
Intelligent Exploration for User Interface Modules of Mobile App with Collective Learning
A mobile app interface usually consists of a set of user interface modules.
How to properly design these user interface modules is vital to achieving user
satisfaction for a mobile app. However, there are few methods to determine
design variables for user interface modules except for relying on the judgment
of designers. Usually, a laborious post-processing step is necessary to verify
the key change of each design variable. Therefore, only a very limited number
of design solutions can be tested. It is time-consuming and almost
impossible to figure out the best design solutions as there are many modules.
To this end, we introduce FEELER, a framework to quickly and intelligently explore
design solutions of user interface modules with a collective machine learning
approach. FEELER can help designers quantitatively measure the preference score
of different design solutions, so that they can conveniently and quickly
adjust user interface modules. We conducted extensive
experimental evaluations on two real-life datasets to demonstrate its
applicability in real-life cases of user interface module design in the Baidu
App, which is one of the most popular mobile apps in China.
Comment: 10 pages, accepted as a full paper in KDD 202
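As a rough illustration of how a learned preference function could rank alternative settings of a module's design variables, here is a minimal sketch in Python. The variable names, the regressor choice, and the exhaustive enumeration are assumptions for illustration only, not FEELER's collective-learning formulation:

    # Illustrative sketch: rank candidate UI-module designs by a learned
    # preference score. Design variables and model are assumed examples.
    from itertools import product
    from sklearn.ensemble import GradientBoostingRegressor

    # Candidate values for a few hypothetical design variables of one module.
    design_space = {
        "font_size":   [12, 14, 16],
        "image_ratio": [0.3, 0.5, 0.7],
        "padding_px":  [4, 8, 12],
    }

    def enumerate_candidates(space):
        keys = list(space)
        for values in product(*(space[k] for k in keys)):
            yield dict(zip(keys, values))

    def rank_designs(rated_designs, ratings, space, top_k=5):
        # rated_designs: dicts of design-variable values; ratings: preference scores.
        keys = list(space)
        X = [[d[k] for k in keys] for d in rated_designs]
        model = GradientBoostingRegressor().fit(X, ratings)
        candidates = list(enumerate_candidates(space))
        scores = model.predict([[c[k] for k in keys] for c in candidates])
        return sorted(zip(candidates, scores), key=lambda p: -p[1])[:top_k]

In this toy setup a designer would inspect only the top-ranked combinations instead of testing every variant by hand, which is the kind of exploration speed-up the abstract describes.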
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
Mobile UI understanding is important for enabling various interaction tasks
such as UI automation and accessibility. Previous mobile UI modeling often
depends on the view hierarchy information of a screen, which directly provides
the structural data of the UI, with the hope to bypass challenging tasks of
visual modeling from screen pixels. However, view hierarchies are not always
available, and are often corrupted with missing object descriptions or
misaligned structure information. As a result, although using view
hierarchies can offer short-term gains, it may ultimately hinder the
applicability and performance of the model. In this paper, we propose
Spotlight, a vision-only approach for mobile UI understanding.
Specifically, we enhance a vision-language model that only takes the screenshot
of the UI and a region of interest on the screen -- the focus -- as the input.
This general architecture is easily scalable and capable of performing a range
of UI modeling tasks. Our experiments show that our model establishes SoTA
results on several representative UI tasks and outperforms previous methods
that use both screenshots and view hierarchies as inputs. Furthermore, we
explore multi-task learning and few-shot prompting capacities of the proposed
models, demonstrating promising results in the multi-task learning direction.
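The input interface described in the abstract, a screenshot plus a focus region and no view hierarchy, can be sketched as follows. Every method on the model object (encode_image, encode_region, generate) is a hypothetical placeholder, since Spotlight's architecture is not exposed as a public API:

    # Sketch of a vision-only, focus-conditioned UI model interface.
    from dataclasses import dataclass

    @dataclass
    class FocusQuery:
        screenshot: object   # full-screen pixels (e.g., a decoded image)
        focus: tuple         # (x0, y0, x1, y1) region of interest on the screen
        task: str            # e.g., "widget captioning" or "tappability"

    def run_ui_task(model, query: FocusQuery) -> str:
        # The focus region is encoded alongside the screenshot so one
        # vision-language architecture can serve several UI modeling tasks.
        image_tokens = model.encode_image(query.screenshot)
        region_tokens = model.encode_region(query.focus)
        return model.generate(task=query.task, inputs=image_tokens + region_tokens)

The point of the sketch is the signature: only pixels and a region enter the model, which is what lets the same architecture scale across tasks without depending on view hierarchies.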
EvIcon: Designing High-Usability Icon with Human-in-the-loop Exploration and IconCLIP
Interface icons are prevalent in various digital applications. Due to limited
time and budgets, many designers rely on informal evaluation, which often
results in icons with poor usability. In this paper, we propose a unique
human-in-the-loop framework that allows our target users, i.e., novice and
professional UI designers, to improve the usability of interface icons
efficiently. We formulate several usability criteria into a perceptual
usability function and enable users to iteratively revise an icon set with an
interactive design tool, EvIcon. We take a large-scale pre-trained joint
image-text embedding (CLIP) and fine-tune it to embed icon visuals with icon
tags in the same embedding space (IconCLIP). During the revision process, our
design tool provides two types of instant perceptual usability feedback. First,
we provide perceptual usability feedback modeled by deep learning models
trained on IconCLIP embeddings and crowdsourced perceptual ratings. Second, we
use the embedding space of IconCLIP to assist users in improving the visual
distinguishability of icons within the user-prepared icon set. To provide
the perceptual prediction, we compiled IconCEPT10K, the first large-scale
dataset of perceptual usability ratings over interface icons, by
conducting a crowdsourcing study. We demonstrated that our framework could
benefit the interface icon revision process of UI designers with a wide range
of professional experience. Moreover, the interface icons designed using our
framework achieved better semantic distance and familiarity, verified by an
additional online user study.
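One of the two feedback signals, visual distinguishability within the user-prepared icon set, can be illustrated by comparing embeddings in a joint image-text space. The embed_icon function below is an assumed stand-in for an IconCLIP image encoder; only the distance computation is shown:

    # Sketch: flag icon pairs whose embeddings are too similar to tell apart.
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def confusable_pairs(icons, embed_icon, threshold=0.9):
        # embed_icon is a placeholder for an IconCLIP-style image encoder.
        vectors = [embed_icon(icon) for icon in icons]
        pairs = []
        for i in range(len(icons)):
            for j in range(i + 1, len(icons)):
                sim = cosine(vectors[i], vectors[j])
                if sim > threshold:
                    pairs.append((icons[i], icons[j], sim))
        return sorted(pairs, key=lambda p: -p[2])

Pairs returned by such a check could be surfaced to the designer as candidates for revision, which is the spirit of the distinguishability feedback the abstract describes.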
Artificial Intelligence (AI) and User Experience (UX) design: A systematic literature review and future research agenda
Purpose: The aim of this article is to map the use of AI in the user experience (UX) design process. Disrupting the UX process by introducing novel digital tools such as Artificial Intelligence (AI) has the potential to improve efficiency and accuracy, while creating more innovative and creative solutions. Thus, understanding how AI can be leveraged for UX has important research and practical implications.
Design/Methodology/Approach: This article builds on a systematic literature review approach and aims to understand how AI is used in UX design today, as well as uncover some prominent themes for future research. Through a process of selection and filtering, 46 research articles are analysed, with findings synthesized based on a user-centred design and development process.
Findings: Our analysis shows how AI is leveraged in several key areas of the UX design process. Namely, these include understanding the context of use, uncovering user requirements, aiding solution design, evaluating designs, and assisting the development of solutions. We also highlight the ways in which AI is changing the UX design process through illustrative examples.
Originality/Value: While there is increased interest in the use of AI in organizations, there is still limited work on how AI can be introduced into processes that depend heavily on human creativity and input. Thus, we show the ways in which AI can enhance such activities and assume tasks that have typically been performed by humans.
StateLens: A Reverse Engineering Solution for Making Existing Dynamic Touchscreens Accessible
Blind people frequently encounter inaccessible dynamic touchscreens in their
everyday lives that are difficult, frustrating, and often impossible to use
independently. Touchscreens are often the only way to control everything from
coffee machines and payment terminals, to subway ticket machines and in-flight
entertainment systems. Interacting with dynamic touchscreens is difficult
non-visually because the visual user interfaces change, interactions often
occur over multiple different screens, and it is easy to accidentally trigger
interface actions while exploring the screen. To solve these problems, we
introduce StateLens - a three-part reverse engineering solution that makes
existing dynamic touchscreens accessible. First, StateLens reverse engineers
the underlying state diagrams of existing interfaces using point-of-view videos
found online or taken by users using a hybrid crowd-computer vision pipeline.
Second, using the state diagrams, StateLens automatically generates
conversational agents to guide blind users through specifying the tasks that
the interface can perform, allowing the StateLens iOS application to provide
interactive guidance and feedback so that blind users can access the interface.
Finally, a set of 3D-printed accessories enable blind people to explore
capacitive touchscreens without the risk of triggering accidental touches on
the interface. Our technical evaluation shows that StateLens can accurately
reconstruct interfaces from stationary, hand-held, and web videos; and, a user
study of the complete system demonstrates that StateLens successfully enables
blind users to access otherwise inaccessible dynamic touchscreens.
Comment: ACM UIST 201
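The guidance step can be pictured as path-finding over the recovered state diagram: given the current screen and a target state, suggest the next touch action. The states, actions, and graph encoding below are illustrative assumptions, not StateLens internals:

    # Sketch: guide a user through a touchscreen modeled as a state diagram.
    from collections import deque

    def next_action(edges, current, target):
        # edges maps (screen state, action) -> next state.
        # Breadth-first search returns the first action on a shortest path
        # from `current` to `target`, or None if the target is unreachable
        # (or already reached).
        queue = deque([(current, [])])
        seen = {current}
        while queue:
            state, actions = queue.popleft()
            if state == target:
                return actions[0] if actions else None
            for (src, action), dst in edges.items():
                if src == state and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, actions + [action]))
        return None

    edges = {
        ("home", "tap 'Buy ticket'"): "fare_select",
        ("fare_select", "tap 'Single ride'"): "payment",
        ("payment", "tap 'Pay'"): "confirmation",
    }
    print(next_action(edges, "home", "payment"))  # -> "tap 'Buy ticket'"

A conversational agent built on such a graph could announce the suggested action, confirm that the screen changed as expected, and repeat until the task completes, mirroring the interactive guidance loop described in the abstract.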