Using Variable Dwell Time to Accelerate Gaze-Based Web Browsing with Two-Step Selection
In order to avoid the "Midas Touch" problem, gaze-based interfaces for
selection often introduce a dwell time: a fixed amount of time the user must
fixate upon an object before it is selected. Past interfaces have used a
uniform dwell time across all objects. Here, we propose a gaze-based browser
using a two-step selection policy with variable dwell time. In the first step,
a command, e.g. "back" or "select", is chosen from a menu using a dwell time
that is constant across the different commands. In the second step, if the
"select" command is chosen, the user selects a hyperlink using a dwell time
that varies between different hyperlinks. We assign shorter dwell times to more
likely hyperlinks and longer dwell times to less likely hyperlinks. In order to
infer the likelihood each hyperlink will be selected, we have developed a
probabilistic model of natural gaze behavior while surfing the web. We have
evaluated a number of heuristic and probabilistic methods for varying the dwell
times using both simulation and experiment. Our results demonstrate that
varying dwell time improves the user experience in comparison with fixed dwell
time, resulting in fewer errors and increased speed. While all of the methods
for varying dwell time resulted in improved performance, the probabilistic
models yielded much greater gains than the simple heuristics. The best
performing model reduces error rate by 50% compared to 100 ms uniform dwell time
while maintaining a similar response time. It reduces response time by 60%
compared to 300 ms uniform dwell time while maintaining a similar error rate.
Comment: This is an Accepted Manuscript of an article published by Taylor &
Francis in the International Journal of Human-Computer Interaction on 30
March, 2018, available online:
http://www.tandfonline.com/10.1080/10447318.2018.1452351 . For an eprint of
the final published article, please access:
https://www.tandfonline.com/eprint/T9d4cNwwRUqXPPiZYm8Z/ful
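As an illustration only, the idea of giving likelier hyperlinks shorter dwell times can be sketched as a linear mapping from selection probability to dwell time. The function name `assign_dwell_times`, the linear interpolation, and the example probabilities are assumptions made for this sketch. The 100 ms and 300 ms bounds simply reuse the uniform dwell times quoted in the abstract, and the paper's probabilistic gaze model is not reproduced here.

```python
def assign_dwell_times(link_probs, min_dwell_ms=100.0, max_dwell_ms=300.0):
    """Map each hyperlink's selection probability to a dwell time in ms.

    Hypothetical linear scheme: the most likely link gets min_dwell_ms,
    the least likely link gets max_dwell_ms, and the rest are interpolated.
    """
    if not link_probs:
        return {}
    highest = max(link_probs.values())
    lowest = min(link_probs.values())
    spread = highest - lowest
    dwell = {}
    for link, p in link_probs.items():
        # Relative likelihood in [0, 1]; if all links are equally likely,
        # fall back to the midpoint of the dwell range.
        rel = (p - lowest) / spread if spread > 0 else 0.5
        dwell[link] = max_dwell_ms - rel * (max_dwell_ms - min_dwell_ms)
    return dwell


# Example with made-up probabilities for three links on a page.
example = assign_dwell_times({"home": 0.6, "about": 0.3, "legal": 0.1})
```

In this sketch the probabilities would come from whatever model estimates link likelihood; any monotonically decreasing mapping could replace the linear one.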
Co-adaptive control strategies in assistive Brain-Machine Interfaces
A large number of people with severe motor disabilities cannot access any of the
available control inputs of current assistive products, which typically rely on residual
motor functions. These patients are therefore unable to fully benefit from existing
assistive technologies, including communication interfaces and assistive robotics. In
this context, electroencephalography-based Brain-Machine Interfaces (BMIs) offer a
potential non-invasive solution to exploit a non-muscular channel for communication
and control of assistive robotic devices, such as a wheelchair, a telepresence
robot, or a neuroprosthesis. Still, non-invasive BMIs currently suffer from limitations,
such as lack of precision, robustness and comfort, which prevent their practical
implementation in assistive technologies.
The goal of this PhD research is to produce scientific and technical developments
to advance the state of the art of assistive interfaces and service robotics based on
BMI paradigms. Two main research paths to the design of effective control strategies
were considered in this project. The first one is the design of hybrid systems, based on
the combination of the BMI together with gaze control, which is a long-lasting motor
function in many paralyzed patients. Such an approach increases the degrees
of freedom available for control. The second approach consists of including
adaptive techniques in the BMI design. This makes it possible to transform robotic tools and
devices into active assistants able to co-evolve with the user, and learn new rules of
behavior to solve tasks, rather than passively executing external commands.
Following these strategies, the contributions of this work can be categorized
based on the type of mental signal exploited for control. These include:
1) the use of active signals for the development and implementation of hybrid eye-tracking
and BMI control policies, for both communication and control of robotic
systems; 2) the exploitation of passive mental processes to increase the adaptability
of an autonomous controller to the user's intention and psychophysiological state,
in a reinforcement learning framework; 3) the integration of active and passive brain
control signals, to achieve adaptation within the BMI architecture at the level of
feature extraction and classification.
Dwell-free input methods for people with motor impairments
Millions of individuals affected by disorders or injuries that cause severe motor impairments have difficulty performing compound manipulations using traditional input devices. This thesis first explores how effective various assistive technologies are for people with motor impairments. The following questions are studied: (1) What activities are performed? (2) What tools are used to support these activities? (3) What are the advantages and limitations of these tools? (4) How do users learn about and choose assistive technologies? (5) Why do users adopt or abandon certain tools? A qualitative study of fifteen people with motor impairments indicates that users have strong needs for efficient text entry and communication tools that are not met by existing technologies.
To address these needs, this thesis proposes three dwell-free input methods, designed to improve the efficacy of target selection and text entry based on eye-tracking and head-tracking systems. These are: (1) the Target Reverse Crossing selection mechanism, (2) the EyeSwipe eye-typing interface, and (3) the HGaze Typing interface. With Target Reverse Crossing, a user moves the cursor into a target and reverses over a goal to select it. This mechanism is significantly more efficient than dwell-time selection. Target Reverse Crossing is then adapted in EyeSwipe to delineate the start and end of a word that is eye-typed with a gaze path connecting the intermediate characters (as with traditional gesture typing). When compared with a dwell-based virtual keyboard, EyeSwipe affords higher text entry rates and a more comfortable interaction. Finally, HGaze Typing adds head gestures to gaze-path-based text entry to enable simple and explicit command activations. Results from a user study demonstrate that HGaze Typing has better performance and user satisfaction than a dwell-time method.
GaVe: A webcam-based gaze vending interface using one-point calibration
Gaze input, i.e., information input via the user's eyes, represents a promising method for contact-free interaction in human-machine systems. In this paper, we present the GazeVending interface (GaVe), which lets users control actions on a display with their eyes. The interface works with a regular webcam, available on most of today's laptops, and only requires a short one-point calibration before use. GaVe is designed in a hierarchical structure, presenting broad item clusters to users first and subsequently guiding them through another selection round, which allows the presentation of a large number of items. Cluster/item selection in GaVe is based on dwell time, i.e., the duration for which users look at a given cluster/item. A user study (N=22) was conducted to test optimal dwell time thresholds and comfortable human-to-display distances. Users' perception of the system, as well as error rates and task completion times, were recorded. We found that all participants were able to quickly understand how to interact with the interface and showed good performance, selecting a target item within a group of 12 items in 6.76 seconds on average. We provide design guidelines for GaVe and discuss the potential of the system.
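Dwell-based selection as described in this abstract can be sketched, under assumptions, as a scan over timestamped gaze samples that fires once the gaze has stayed on the same item for a threshold duration. The function name `detect_dwell_selection`, the `(time_s, target)` sample format, and the 0.5 s threshold in the example are hypothetical; the GaVe abstract does not state its implementation or its chosen threshold.

```python
def detect_dwell_selection(samples, threshold_s):
    """Return (target, time_s) for the first item fixated continuously
    for at least threshold_s seconds, or None if no item qualifies.

    samples: iterable of (time_s, target) pairs in time order, where
    target is an item identifier, or None when the gaze is on no item.
    """
    current = None   # item currently under the gaze
    start = None     # time at which the gaze entered that item
    for time_s, target in samples:
        if target != current:
            # Gaze moved to a different item (or off all items): restart timer.
            current, start = target, time_s
        if current is not None and time_s - start >= threshold_s:
            return current, time_s
    return None


# Made-up gaze trace: a brief glance at "A", then a longer look at "B".
trace = [(0.0, "A"), (0.2, "A"), (0.4, "B"), (0.6, "B"), (0.8, "B"), (1.0, "B")]
selection = detect_dwell_selection(trace, threshold_s=0.5)
```

In a hierarchical interface of the kind described, the same routine could be run once over the clusters and again over the items of the chosen cluster.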
An end-to-end review of gaze estimation and its interactive applications on handheld mobile devices
In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially in deep learning. As the literature is fast progressing, there is a pressing need to review the state of the art, delineate the boundary, and identify the key research challenges and opportunities in gaze estimation and interaction. This paper aims to serve this purpose by presenting an end-to-end holistic view in this area, from gaze capturing sensors, to gaze estimation workflows, to deep learning techniques, and to gaze interactive applications.
Understanding Adoption Barriers to Dwell-Free Eye-Typing: Design Implications from a Qualitative Deployment Study and Computational Simulations
Eye-typing is a slow and cumbersome text entry method typically used by individuals with no other practical means of communication. As an alternative, prior HCI research has proposed dwell-free eye-typing as a potential improvement that eliminates time-consuming and distracting dwell-timeouts. However, it is rare that such research ideas are translated into working products. This paper reports on a qualitative deployment study of a product that was developed to allow users access to a dwell-free eye-typing research solution. This allowed us to understand how such a research solution would work in practice, as part of users' current communication solutions in their own homes. Based on interviews and observations, we discuss a number of design issues that currently act as barriers preventing widespread adoption of dwell-free eye-typing. The study findings are complemented with computational simulations in a range of conditions that were inspired by the findings in the deployment study. These simulations serve to both contextualize the qualitative findings and to explore quantitative implications of possible interface redesigns. The combined analysis gives rise to a set of design implications for enabling wider adoption of dwell-free eye-typing in practice.
Intelligent Techniques to Accelerate Everyday Text Communication
People with some form of speech or motor impairment often use a high-tech augmentative and alternative communication (AAC) device to communicate with other people in writing or in face-to-face conversations. Their text entry rate on these devices is slow due to their motor abilities. Making good letter or word predictions can help accelerate the communication of such users. In this dissertation, we investigated several approaches to accelerate input for AAC users. First, considering that an AAC user is participating in a face-to-face conversation, we investigated whether performing speech recognition on the speaking side can improve next-word predictions. We compared the accuracy of three plausible microphone deployment options and the accuracy of two commercial speech recognition engines. We found that despite recognition word error rates of 7-16%, our ensemble of n-gram and recurrent neural network language models made predictions nearly as good as when they used the reference transcripts. In a user study with 160 participants, we also found that increasing the number of prediction slots in a keyboard interface does not necessarily correlate with improved performance. Second, typing every character in a text message may require more time or effort from an AAC user than strictly necessary. Skipping spaces or other characters may speed up input and reduce an AAC user's physical input effort. We designed a recognizer optimized for expanding noisy abbreviated input where users often omitted spaces and mid-word vowels. We showed that using neural language models for selecting conversational-style training text and for rescoring the recognizer's n-best sentences improved accuracy. We found accurate abbreviated input was possible even if a third of the characters were omitted. In a study where users had to dwell for a second on each key, we found sentence abbreviated input was competitive with a conventional keyboard with word predictions.
Finally, AAC keyboards rely on language modeling to auto-correct noisy typing and to offer word predictions. While today language models can be trained on huge amounts of text, pre-trained models may fail to capture the unique writing style and vocabulary of individual users. We demonstrated improved performance compared to a unigram cache by adapting to a user's text via language models based on prediction by partial match (PPM) and recurrent neural networks. Our best model ensemble increased keystroke savings by 9.6%.
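For readers unfamiliar with the metric, keystroke savings is commonly computed as the percentage of keystrokes avoided relative to typing every character of the final text. The sketch below uses that common definition as an assumption; the dissertation's exact formula is not given in the abstract, and the function name `keystroke_savings` and the example numbers are hypothetical.

```python
def keystroke_savings(keystrokes_used, text_length):
    """Percentage of keystrokes saved relative to typing every character.

    keystrokes_used: key presses actually made (letters plus prediction picks).
    text_length: number of characters in the final text.
    """
    if text_length <= 0:
        raise ValueError("text_length must be positive")
    return (1 - keystrokes_used / text_length) * 100


# Made-up example: a 20-character message entered with 5 key presses.
savings = keystroke_savings(5, 20)
```

Under this definition, a higher value means the predictions let the user enter the same text with fewer key presses.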
CodeGazer: Making Code Navigation Easy and Natural with Gaze Input
Navigating source code, an activity common in software development, is time consuming and in need of improvement. We present CodeGazer, a prototype for source code navigation using eye gaze for common navigation functions. These functions include “Go to Definition” and “Find All Usages” of an identifier, navigating to files and methods, moving back and forth between visited points in code, and scrolling. We present user study results showing that many users liked and even preferred the gaze-based navigation, in particular the “Go to Definition” function. Gaze-based navigation also holds up well in completion time when compared to traditional methods. We discuss how eye gaze can be integrated into traditional mouse & keyboard applications in order to make “look up” tasks more natural.
Thinking eyes: visual thinking strategies and the social brain
The foundation of art processes in the social brain can guide the scientific study of how human beings perceive and interact with their environment. Here, we applied the theoretical frameworks of the social and artistic brain connectomes to an eye-tracking paradigm with the aim to elucidate how different viewing conditions and social cues influence gaze patterns and personal resonance with artworks and complex imagery in healthy adults. We compared two viewing conditions that encourage personal or social perspective taking, modeled on the well-known Visual Thinking Strategies (VTS) method, to a viewing condition during which only contextual information about the image was provided. Our findings showed that the viewing conditions that used VTS techniques directed the gaze more toward highly salient social cues (Animate elements) in artworks and complex imagery, compared to when only contextual information was provided. We furthermore found that audio cues also directed visual attention, whereby listening to a personal reflection by another person (VTS) had a stronger effect than contextual information. However, we found no effect of viewing condition on the personal resonance with the artworks and complex images when taking the random effects of the image selection into account. Our study provides a neurobiological grounding of the VTS method in the social brain, revealing that this pedagogical method of engaging viewers with artworks measurably shapes people's visual exploration patterns. This is not only of relevance to (art) education but also has implications for art-based diagnostic and therapeutic applications.