Reflow: Automatically Improving Touch Interactions in Mobile Applications through Pixel-based Refinements
Touch is the primary way that users interact with smartphones. However,
building mobile user interfaces where touch interactions work well for all
users is a difficult problem, because users have different abilities and
preferences. We propose a system, Reflow, which automatically applies small,
personalized UI adaptations -- called refinements -- to mobile app screens to
improve touch efficiency. Reflow uses a pixel-based strategy to work with
existing applications, and improves touch efficiency while minimally disrupting
the design intent of the original application. Our system optimizes a UI by (i)
extracting its layout from its screenshot, (ii) refining its layout, and (iii)
re-rendering the UI to reflect these modifications. We conducted a user study
with 10 participants and a heuristic evaluation with 6 experts and found that
applications optimized by Reflow led to, on average, 9% faster selection time
with minimal layout disruption. The results demonstrate that Reflow's
refinements are useful UI adaptations for improving touch interactions.
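The three-stage pipeline above (extract layout, refine, re-render) rests on a model of touch efficiency. The sketch below scores a layout with Fitts's law and applies one illustrative refinement, enlarging a frequently tapped target; the constants and element data are assumptions for illustration, not Reflow's actual model.

```python
import math

# Fitts's-law movement time: T = a + b * log2(distance / width + 1).
# The constants a, b and the element data below are illustrative, not from the paper.
def fitts_time(distance, width, a=0.2, b=0.1):
    return a + b * math.log2(distance / width + 1)

def expected_selection_time(elements):
    """Frequency-weighted mean selection time across UI elements."""
    total_freq = sum(e["freq"] for e in elements)
    return sum(e["freq"] * fitts_time(e["dist"], e["width"])
               for e in elements) / total_freq

# Hypothetical extracted layout: distance from the thumb's resting position,
# target width, and observed tap frequency.
layout = [
    {"name": "search", "dist": 300.0, "width": 40.0, "freq": 50},
    {"name": "menu",   "dist": 500.0, "width": 40.0, "freq": 5},
]

before = expected_selection_time(layout)
# One toy "refinement": slightly enlarge the most frequently tapped target.
layout[0]["width"] *= 1.5
after = expected_selection_time(layout)
assert after < before  # the refinement reduced the modeled selection time
```

A real optimizer would trade such gains off against layout disruption, which is the constraint the abstract emphasizes.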
Screen Correspondence: Mapping Interchangeable Elements between UIs
Understanding user interface (UI) functionality is a useful yet challenging
task for both machines and people. In this paper, we investigate a machine
learning approach for screen correspondence, which allows reasoning about UIs
by mapping their elements onto previously encountered examples with known
functionality and properties. We describe and implement a model that
incorporates element semantics, appearance, and text to support correspondence
computation without requiring any labeled examples. Through a comprehensive
performance evaluation, we show that our approach improves upon baselines by
incorporating multi-modal properties of UIs. Finally, we show three example
applications where screen correspondence facilitates better UI understanding
for humans and machines: (i) instructional overlay generation, (ii) semantic UI
element search, and (iii) automated interface testing.
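As a toy illustration of correspondence computation, the sketch below embeds each element as a small hand-built multi-modal feature vector (type cues plus a text feature) and finds the element mapping that maximizes total cosine similarity. The feature encoding and the brute-force matcher are assumptions for illustration, not the paper's learned model.

```python
import itertools
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_correspondence(screen_a, screen_b):
    """Map each element of screen_a to one of screen_b, maximizing total
    similarity (brute force over permutations; fine for toy-sized screens)."""
    best, best_score = None, -math.inf
    for perm in itertools.permutations(list(screen_b)):
        score = sum(cosine(screen_a[a], screen_b[b])
                    for a, b in zip(screen_a, perm))
        if score > best_score:
            best, best_score = dict(zip(screen_a, perm)), score
    return best

# Two hypothetical screens; features: [is_button, is_textfield, has_icon, text_feature]
screen_a = {"login_btn": [1, 0, 0, 0.9], "email_box": [0, 1, 0, 0.2]}
screen_b = {"username_box": [0, 1, 0, 0.3], "signin_btn": [1, 0, 0, 0.8]}

mapping = best_correspondence(screen_a, screen_b)
assert mapping == {"login_btn": "signin_btn", "email_box": "username_box"}
```

The matched elements differ in text but agree in role, which is the kind of functional correspondence the paper targets.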
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
Mobile UI understanding is important for enabling various interaction tasks
such as UI automation and accessibility. Previous mobile UI modeling often
depends on the view hierarchy information of a screen, which directly provides
the structural data of the UI, with the hope to bypass challenging tasks of
visual modeling from screen pixels. However, view hierarchies are not always
available, and are often corrupted with missing object descriptions or
misaligned structure information. As a result, although the use of view
hierarchies could offer short-term gains, it may ultimately hinder the
applicability and performance of the model. In this paper, we propose
Spotlight, a vision-only approach for mobile UI understanding.
Specifically, we enhance a vision-language model that only takes the screenshot
of the UI and a region of interest on the screen -- the focus -- as the input.
This general architecture is easily scalable and capable of performing a range
of UI modeling tasks. Our experiments show that our model establishes SoTA
results on several representative UI tasks and outperforms previous methods
that use both screenshots and view hierarchies as inputs. Furthermore, we
explore multi-task learning and few-shot prompting capacities of the proposed
models, demonstrating promising results in the multi-task learning direction.
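A minimal sketch of a focus-conditioned, vision-only model input is shown below: the full screenshot plus a region of interest normalized to [0, 1]. The field names and coordinate encoding are assumptions, since the abstract does not specify how the focus region is represented.

```python
# Build a focus-conditioned input for a vision-only UI model.
# Field names and the normalized-box encoding are illustrative assumptions.
def make_focus_input(pixels, screen_w, screen_h, region):
    x1, y1, x2, y2 = region
    return {
        "pixels": pixels,  # full-screen pixels; no view hierarchy is used
        "focus": (x1 / screen_w, y1 / screen_h, x2 / screen_w, y2 / screen_h),
    }

# Toy 3x4 "screenshot" and a region of interest on a 1080x1920 screen.
inp = make_focus_input([[0] * 4] * 3, 1080, 1920, (108, 192, 540, 960))
assert inp["focus"] == (0.1, 0.1, 0.5, 0.5)
```

Normalizing the focus box decouples it from screen resolution, which is one plausible reason such an architecture scales across tasks and devices.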
Deep reinforcement learning for multi-modal embodied navigation
This work focuses on an Outdoor Micro-Navigation (OMN) task in which the goal is to
navigate to a specified street address using multiple modalities including images, scene-text,
and GPS. This task is a significant challenge to many Blind and Visually Impaired (BVI)
people, which we demonstrate through interviews and market research. To investigate the
feasibility of solving this task with Deep Reinforcement Learning (DRL), we first introduce
two partially observable grid-worlds, Grid-Street and Grid City, containing houses, street
numbers, and navigable regions. In these environments, we train an agent to find specific
houses using local observations under a variety of training procedures. We parameterize
our agent with a neural network and train using reinforcement learning methods. Next, we
introduce the Sidewalk Environment for Visual Navigation (SEVN), which contains panoramic
images with labels for house numbers, doors, and street name signs, and formulations for
several navigation tasks. In SEVN, we train another neural network model using Proximal
Policy Optimization (PPO) to fuse multi-modal observations in the form of variable resolution
images, visible text, and simulated GPS data, and to use this representation to navigate to
goal doors. Our best model used all available modalities and was able to navigate to over 100
goals with an 85% success rate. We found that models with access to only a subset of these
modalities performed significantly worse, supporting the need for a multi-modal approach to
the OMN task. We hope that this thesis provides a foundation for further research into the
creation of agents to assist members of the BVI community to navigate safely.
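The PPO algorithm used above relies on a clipped surrogate objective that keeps each policy update close to the current policy. The sketch below shows the standard clipped loss with toy numbers; SEVN's actual training optimizes neural-network policies over multi-modal observations.

```python
# PPO's clipped surrogate objective (standard formulation; the numbers in the
# asserts are toy values, not results from the thesis).
def ppo_clip_loss(ratio, advantage, eps=0.2):
    """L = -min(r * A, clip(r, 1 - eps, 1 + eps) * A) for one sample, where
    r is the new/old policy probability ratio and A the advantage estimate."""
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return -min(ratio * advantage, clipped * advantage)

# A large policy update (ratio far from 1) earns no extra credit once clipped:
assert ppo_clip_loss(1.5, advantage=1.0) == ppo_clip_loss(1.2, advantage=1.0)
# Within the trust region, the loss tracks the ratio directly:
assert ppo_clip_loss(1.1, advantage=1.0) == -1.1
```

The clipping is what makes PPO stable enough to train a fused multi-modal policy end to end without careful step-size tuning.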
ScreenQA: Large-Scale Question-Answer Pairs over Mobile App Screenshots
We present a new task and dataset, ScreenQA, for screen content understanding
via question answering. The existing screen datasets are focused either on
structure and component-level understanding, or on a much higher-level
composite task such as navigation and task completion. We attempt to bridge the
gap between these two by annotating 80,000+ question-answer pairs over the RICO
dataset, in the hope of benchmarking screen reading comprehension capability.
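A screen-QA example pairs a question with answers grounded in a screenshot. The record below is hypothetical: every field name and value is an assumption for illustration, not the dataset's actual schema.

```python
# A hypothetical ScreenQA-style record; field names and values are made up.
record = {
    "screen_id": "rico_00042",                # a RICO screen identifier (made up)
    "question": "What is the departure city?",
    "answers": ["San Francisco"],
    "answer_bboxes": [[120, 340, 410, 380]],  # where the answer text appears
}

def evaluate(prediction, rec):
    """Toy exact-match scoring for screen question answering."""
    return 1.0 if prediction in rec["answers"] else 0.0

assert evaluate("San Francisco", record) == 1.0
assert evaluate("Boston", record) == 0.0
```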
GreaseVision: Rewriting the Rules of the Interface
Digital harms can manifest across any interface. Key problems in addressing
these harms include the high individuality of harms and the fast-changing
nature of digital systems. As a result, we still lack a systematic approach to
study harms and produce interventions for end-users. We put forward
GreaseVision, a new framework that enables end-users to collaboratively develop
interventions against harms in software using a no-code approach and recent
advances in few-shot machine learning. The framework and tool allow individual
end-users to study their usage history and create personalized interventions.
Our contribution also enables researchers to study the distribution of harms
and interventions at scale.
A competencies framework of visual impairments for enabling shared understanding in design
Existing work in Human Computer Interaction and accessibility research has long sought to investigate the experiences of people with visual impairments in order to address their needs through technology design and integrate their participation into different stages of the design process. Yet challenges remain regarding how disabilities are framed in technology design and the extent of involvement of disabled people within it. Furthermore, accessibility is often considered a specialised job and misunderstandings or assumptions about visually impaired people’s experiences and needs occur outside dedicated fields. This thesis presents an ethnomethodology-informed design critique for supporting awareness and shared understanding of visual impairments and accessibility that centres on their experiences, abilities, and participation in early-stage design. This work is rooted in an in-depth empirical investigation of the interactional competencies that people with visual impairments exhibit through their use of technology, which informs and shapes the concept of a Competencies Framework of Visual Impairments. Although past research has established stances for considering the individual abilities of disabled people and other social and relational factors in technology design, by drawing on ethnomethodology and its interest in situated competence this thesis employs an interactional perspective to investigate the practical accomplishments of visually impaired people. Thus, this thesis frames visual impairments in terms of competencies to be considered in the design process, rather than a deficiency or problem to be fixed through technology. Accordingly, this work favours supporting awareness and reflection rather than the design of particular solutions, which are also strongly needed for advancing accessible design at large.
This PhD thesis comprises two main empirical studies branched into three different investigations. The first and second investigations are based on a four-month ethnographic study with visually impaired participants examining their everyday technology practices. The third investigation comprises the design and implementation of a workshop study developed to include people with and without visual impairments in collaborative reflections about technology and accessibility. As such, each investigation informed the ones that followed, revisiting and refining concepts and design materials throughout the thesis. Although ethnomethodology is the overarching approach running through this PhD project, each investigation has a different focus of enquiry:
• The first is focused on analysing participants’ technology practices and unearthing the interactional competencies enabling them.
• The second is focused on analysing technology demonstrations, which were a pervasive phenomenon recorded during fieldwork, and the work of demonstrating as exhibited by visually impaired participants.
• Lastly, the third investigation defines a workshop approach employing video demonstrations and a deck of reflective design cards as building blocks for enabling shared understanding among people with and without visual impairments from different technology backgrounds; that is, users, technologists, designers, and researchers.
Overall, this thesis makes several contributions to audiences within and outside academia, such as the detailed accounts of some of the main technology practices of people with visual impairments and the methodological analysis of demonstrations in empirical Human Computer Interaction and accessibility research. Moreover, the main contribution lies in the conceptualisation of a Competencies Framework of Visual Impairments from the empirical analysis of interactional competencies and their practical exhibition through demonstrations, as well as the creation and use of a deck of cards that encapsulates the competencies and external elements involved in the everyday interactional accomplishments of people with visual impairments. All these contributions are lastly brought together in the implementation of the workshop approach that enabled participants to interact with and learn from each other. Thus, this thesis builds upon and advances contemporary strands of work in Human Computer Interaction that call for re-orienting how visual impairments and, overall, disabilities are framed in technology design, and ultimately for re-shaping the design practice itself.
Demonstrating Interaction: The Case of Assistive Technology
Technology 'demos' have become a staple in technology design practice, especially for showcasing prototypes or systems. However, demonstrations are also commonplace and multifaceted phenomena in everyday life, and thus have found their way into empirical research of technology use. In spite of their presence in HCI, their methodical character as a research tool has so far received little attention in our community. We analysed 102 video-recorded demonstrations performed by visually impaired people, captured in the context of a larger ethnographic study investigating their technology use. In doing so, we exhibit core features of demonstrational work and discuss the relevance of the meta-activities occurring around and within demonstrations. We reflect on their value as an approach to doing HCI research on assistive technologies, for enabling shared understanding and letting us identify opportunities for design. Lastly, we discuss their implications as a research instrument for accessibility and HCI research more broadly.
Automating Software Development for Mobile Computing Platforms
Mobile devices such as smartphones and tablets have become ubiquitous in today's computing landscape. These devices have ushered in entirely new populations of users, and mobile operating systems are now outpacing more traditional desktop systems in terms of market share. The applications that run on these mobile devices (often referred to as "apps") have become a primary means of computing for millions of users and, as such, have garnered immense developer interest. These apps allow for unique, personal software experiences through touch-based UIs and a complex assortment of sensors. However, designing and implementing high quality mobile apps can be a difficult process. This is primarily due to challenges unique to mobile development including change-prone APIs and platform fragmentation, just to name a few. In this dissertation we develop techniques that aid developers in overcoming these challenges by automating and improving current software design and testing practices for mobile apps. More specifically, we first introduce a technique, called Gvt, that improves the quality of graphical user interfaces (GUIs) for mobile apps by automatically detecting instances where a GUI was not implemented to its intended specifications. Gvt does this by constructing hierarchical models of mobile GUIs from metadata associated with both graphical mock-ups (i.e., created by designers using photo-editing software) and running instances of the GUI from the corresponding implementation. Second, we develop an approach that completely automates prototyping of GUIs for mobile apps. This approach, called ReDraw, is able to transform an image of a mobile app GUI into runnable code by detecting discrete GUI-components using computer vision techniques, classifying these components into proper functional categories (e.g., button, dropdown menu) using a Convolutional Neural Network (CNN), and assembling these components into realistic code.
Finally, we design a novel approach for automated testing of mobile apps, called CrashScope, that explores a given Android app using systematic input generation with the intrinsic goal of triggering crashes. The GUI-based input generation engine is driven by a combination of static and dynamic analyses that create a model of an app's GUI and targets common, empirically derived root causes of crashes in Android apps. We illustrate that the techniques presented in this dissertation represent significant advancements in mobile development processes through a series of empirical investigations, user studies, and industrial case studies that demonstrate the effectiveness of these approaches and the benefit they provide developers.
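The ReDraw pipeline described above (detect components, classify them, assemble code) can be sketched as follows. The rule-based classifier is only a stand-in for CNN inference, and the emitted XML is a toy skeleton, not ReDraw's actual output format.

```python
# Toy sketch of a ReDraw-style pipeline: classify detected GUI components and
# emit skeleton code. The aspect-ratio rule stands in for a trained CNN.
def classify(component):
    # Stand-in for CNN inference over the component's pixels (toy heuristic).
    if component["aspect"] > 4.0:
        return "EditText"
    return "Button"

def assemble(components):
    """Emit an Android-style XML skeleton for the classified components."""
    tags = [f'<{classify(c)} android:text="{c["text"]}"/>' for c in components]
    return "<LinearLayout>" + "".join(tags) + "</LinearLayout>"

# Hypothetical detections from a mock-up image: label text plus width/height ratio.
detected = [
    {"text": "Sign in", "aspect": 2.5},  # compact box -> Button (toy rule)
    {"text": "Email",   "aspect": 6.0},  # very wide box -> EditText (toy rule)
]
xml = assemble(detected)
assert "<Button" in xml and "<EditText" in xml
```

The real pipeline's value comes from the learned classifier and layout inference; the structure, though, is exactly this detect-classify-assemble chain.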