98 research outputs found

    MIDAS: Deep learning human action intention prediction from natural eye movement patterns

    Get PDF
    Eye movements have long been studied as a window into the attentional mechanisms of the human brain and made accessible as novelty style human-machine interfaces. However, not everything that we gaze upon, is something we want to interact with; this is known as the Midas Touch problem for gaze interfaces. To overcome the Midas Touch problem, present interfaces tend not to rely on natural gaze cues, but rather use dwell time or gaze gestures. Here we present an entirely data-driven approach to decode human intention for object manipulation tasks based solely on natural gaze cues. We run data collection experiments where 16 participants are given manipulation and inspection tasks to be performed on various objects on a table in front of them. The subjects' eye movements are recorded using wearable eye-trackers allowing the participants to freely move their head and gaze upon the scene. We use our Semantic Fovea, a convolutional neural network model to obtain the objects in the scene and their relation to gaze traces at every frame. We then evaluate the data and examine several ways to model the classification task for intention prediction. Our evaluation shows that intention prediction is not a naive result of the data, but rather relies on non-linear temporal processing of gaze cues. We model the task as a time series classification problem and design a bidirectional Long-Short-Term-Memory (LSTM) network architecture to decode intentions. Our results show that we can decode human intention of motion purely from natural gaze cues and object relative position, with 91.9%91.9\% accuracy. Our work demonstrates the feasibility of natural gaze as a Zero-UI interface for human-machine interaction, i.e., users will only need to act naturally, and do not need to interact with the interface itself or deviate from their natural eye movement patterns

    A Review and Analysis of Eye-Gaze Estimation Systems, Algorithms and Performance Evaluation Methods in Consumer Platforms

    Full text link
    In this paper a review is presented of the research on eye gaze estimation techniques and applications, that has progressed in diverse ways over the past two decades. Several generic eye gaze use-cases are identified: desktop, TV, head-mounted, automotive and handheld devices. Analysis of the literature leads to the identification of several platform specific factors that influence gaze tracking accuracy. A key outcome from this review is the realization of a need to develop standardized methodologies for performance evaluation of gaze tracking systems and achieve consistency in their specification and comparative evaluation. To address this need, the concept of a methodological framework for practical evaluation of different gaze tracking systems is proposed.Comment: 25 pages, 13 figures, Accepted for publication in IEEE Access in July 201

    Scenes, saliency maps and scanpaths

    Get PDF
    The aim of this chapter is to review some of the key research investigating how people look at pictures. In particular, my goal is to provide theoretical background for those that are new to the field, while also explaining some of the relevant methods and analyses. I begin by introducing eye movements in the context of natural scene perception. As in other complex tasks, eye movements provide a measure of attention and information processing over time, and they tell us about how the foveated visual system determines what to prioritise. I then describe some of the many measures which have been derived to summarize where people look in complex images. These include global measures, analyses based on regions of interest and comparisons based on heat maps. A particularly popular approach for trying to explain fixation locations is the saliency map approach, and the first half of the chapter is mostly devoted to this topic. A large number of papers and models are built on this approach, but it is also worth spending time on this topic because the methods involved have been used across a wide range of applications. The saliency map approach is based on the fact that the visual system has topographic maps of visual features, that contrast within these features seems to be represented and prioritized, and that a central representation can be used to control attention and eye movements. This approach, and the underlying principles, has led to an increase in the number of researchers using complex natural scenes as stimuli. It is therefore important that those new to the field are familiar with saliency maps, their usage, and their pitfalls. I describe the original implementation of this approach (Itti & Koch, 2000), which uses spatial filtering at different levels of coarseness and combines them in an attempt to identify the regions which stand out from their background. Evaluating this model requires comparing fixation locations to model predictions. Several different experimental and comparison methods have been used, but most recent research shows that bottom-up guidance is rather limited in terms of predicting real eye movements. The second part of the chapter is largely concerned with measuring eye movement scanpaths. Scanpaths are the sequential patterns of fixations and saccades made when looking at something for a period of time. They show regularities which may reflect top-down attention, and some have attempted to link these to memory and an individual’s mental model of what they are looking at. While not all researchers will be testing hypotheses about scanpaths, an understanding of the underlying methods and theory will be of benefit to all. I describe the theories behind analyzing eye movements in this way, and various methods which have been used to represent and compare them. These methods allow one to quantify the similarity between two viewing patterns, and this similarity is linked to both the image and the observer. The last part of the chapter describes some applications of eye movements in image viewing. The methods discussed can be applied to complex images, and therefore these experiments can tell us about perception in art and marketing, as well as about machine vision

    Perception-driven approaches to real-time remote immersive visualization

    Get PDF
    In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user’s sense of presence in a remote scene through 3D reconstruction rendered in a remote immersive visualization system. Particularly, in situations when there is a need to visualize, explore and perform tasks in inaccessible environments, too hazardous or distant. However, a remote visualization system requires the entire pipeline from 3D data acquisition to VR rendering satisfies the speed, throughput, and high visual realism. Mainly when using point-cloud, there is a fundamental quality difference between the acquired data of the physical world and the displayed data because of network latency and throughput limitations that negatively impact the sense of presence and provoke cybersickness. This thesis presents state-of-the-art research to address these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering. The human visual system does not have a uniform vision across the field of view; It has the sharpest visual acuity at the center of the field of view. The acuity falls off towards the periphery. The peripheral vision provides lower resolution to guide the eye movements so that the central vision visits all the interesting crucial parts. As a first contribution, the thesis developed remote visualization strategies that utilize the acuity fall-off to facilitate the processing, transmission, buffering, and rendering in VR of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis looked into attentional mechanisms to select and draw user engagement to specific information from the dynamic spatio-temporal environment. It proposed a strategy to analyze the remote scene concerning the 3D structure of the scene, its layout, and the spatial, functional, and semantic relationships between objects in the scene. The strategy primarily focuses on analyzing the scene with models the human visual perception uses. It sets a more significant proportion of computational resources on objects of interest and creates a more realistic visualization. As a supplementary contribution, A new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, comparative examination of the proposed point cloud metric, user studies, and experiments demonstrated that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput

    Human Machine Interfaces for Teleoperators and Virtual Environments

    Get PDF
    In Mar. 1990, a meeting organized around the general theme of teleoperation research into virtual environment display technology was conducted. This is a collection of conference-related fragments that will give a glimpse of the potential of the following fields and how they interplay: sensorimotor performance; human-machine interfaces; teleoperation; virtual environments; performance measurement and evaluation methods; and design principles and predictive models

    Social cognition and robotics

    Get PDF
    In our social world we continuously display nonverbal behavior during interaction. Particularly, when meeting for the first time we use these implicit signals to form judgments about each other, which is a cornerstone of cooperation and societal cohesion. The aim of the studies presented here was to examine which gaze patterns as well as other types of nonverbal signals, such as facial expressions, gestures and kinesics are presented during interaction, which signals are preferred, and which signals we base our social judgment on. Furthermore, it was investigated whether cultural context, of German or Japanese culture, influences these interaction and decision making patterns. One part of the following dissertation concerned itself mainly with gaze behavior as it is one of the most important tools humans use to function in the natural world. It allows monitoring the environment as well as signalling towards others. Thus, measuring whether attentional resources are captured by examining potential gaze following in reaction to pointing gestures and gaze shifts of an interaction partner was one of the goals of this dissertation. However, also intercultural differences in gaze reaction towards direct gaze during various types of interaction were examined. For that purpose, a real-world dyadic interaction scenario in combination with a mobile eyetracker was used. Evidence of gaze patterns suggested that independent of culture interactants seem to mostly ignore irrelevant directional cues and instead remain focused on the face of a conversation partner, at least while listening to said partner. This was a pattern also repeated when no displays of directional signals were performed. While speaking, on the other hand, interactants from Japan seem to change their behaviour, in contrast to interactants from Germany, as they avert their gaze away from the face, which may be attributed to cultural norms. As correct assessment of another person is a critical skill for humans to possess the second part of the presented dissertation investigated on which basis humans make these social decisions. Specifically, nonverbal signals of trustworthiness and potential cooperativeness in Germany and in Japan were of interest. Thus, in one study a mobile eyetracker was used to investigate intercultural differences in gaze patterns during the social judgment process of a small number of sequentially presented potential cooperation partner. In another study participants viewed video stimuli of faces, bodies and faces + bodies of potential cooperation partner to examine the basis of social decision making in more detail and also to explore a wider variety of nonverbal behaviours in a more controlled manner. Results indicated that while judging presenters on trustworthiness based on displayed nonverbal cues German participants seem to partly look away from the face and examine the body. This is behavior in contrast to Japanese participants who seem to remain fixated mostly on the face. Furthermore, it was shown that body motion may be of particular importance for social judgment and that body motion of one’s own culture as opposed to a different culture seems to be preferred. Lastly, nonverbal signals as a basis of decision making were explored in more detail by examining the preferred interaction partner’s behaviour presented as video stimuli. In recent years and presumably also in the future, the human social environment has been growing to include new types of interactants, such as robots. To therefore ensure a smooth interaction, robots need to be adjusted according to human social expectation, including their nonverbal behavior. That is one of the reasons why all results presented here were not only put in the context of human interaction and judgment, but also viewed in the context of human-robot interaction
    • …
    corecore