
    Unobtrusive and pervasive video-based eye-gaze tracking

    Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community.

    Webcam Eye Tracking for Desktop and Mobile Devices: A Systematic Review

    Building the Internet of Behaviors (IoB) obviously requires capturing human behavior. Sensor input from eye tracking has been widely used for profiling in market research, adaptive user interfaces, and other smart systems, but requires dedicated hardware. The widespread availability of webcams in consumer devices like phones, tablets, notebooks, and smart TVs has fostered eye tracking with commodity cameras. In this paper, we present a systematic review across the IEEE and ACM databases -- complemented by snowballing and input from eye tracking experts at CHI 2021 -- to list and characterize publicly available webcam eye trackers that estimate the point-of-regard on devices with no additional hardware. Information from the articles was supplemented by searching author websites and code repositories, and by contacting authors. Sixteen usable eye trackers were found. Their restrictions regarding license terms and technical performance are presented, enabling developers to choose appropriate software for their IoB application.

    A Review and Analysis of Eye-Gaze Estimation Systems, Algorithms and Performance Evaluation Methods in Consumer Platforms

    This paper presents a review of research on eye-gaze estimation techniques and applications, which have progressed in diverse ways over the past two decades. Several generic eye-gaze use cases are identified: desktop, TV, head-mounted, automotive and handheld devices. Analysis of the literature leads to the identification of several platform-specific factors that influence gaze tracking accuracy. A key outcome of this review is the realization of a need to develop standardized methodologies for the performance evaluation of gaze tracking systems, and to achieve consistency in their specification and comparative evaluation. To address this need, the concept of a methodological framework for the practical evaluation of different gaze tracking systems is proposed.

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing a visual representation of locations for use in VEs is usually a tedious process that requires either manual modelling of the environment or the use of specific hardware. Capturing environment dynamics is not straightforward either, and is usually performed with dedicated tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding-environment representation of 3D VEs. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type affects reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution for spatio-temporal exploration of remote locations. Our approach presents a richer visual representation of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more complex, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.
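
    One building block behind "videos embedded in panoramic imagery" is locating a camera's viewing direction inside a panorama. The sketch below shows the standard equirectangular mapping from a 3D direction to panorama pixel coordinates; it is an illustrative assumption about how a localized video frame could be anchored in the surrounding panorama, not the thesis's actual acquisition and fusion pipeline.

```python
import numpy as np

def direction_to_equirect(d, width, height):
    """Map a 3D viewing direction to (u, v) pixel coords in an equirectangular image."""
    d = d / np.linalg.norm(d)
    lon = np.arctan2(d[0], d[2])          # longitude in [-pi, pi]
    lat = np.arcsin(d[1])                 # latitude in [-pi/2, pi/2], y is up
    u = (lon / (2 * np.pi) + 0.5) * width
    v = (0.5 - lat / np.pi) * height
    return u, v

# A camera looking slightly up and to the right of the panorama centre,
# embedded in a hypothetical 4096 x 2048 panorama.
print(direction_to_equirect(np.array([0.3, 0.2, 1.0]), 4096, 2048))
```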

    Remote Data Collection During a Pandemic: A New Approach for Assessing and Coding Multisensory Attention Skills in Infants and Young Children

    In early 2020, in-person data collection dramatically slowed or was completely halted across the world as many labs were forced to close due to the COVID-19 pandemic. Developmental researchers who assess looking time (especially those who rely heavily on in-lab eye-tracking or live coding techniques) were forced to re-think their methods of data collection. While a variety of remote or online platforms are available for gathering behavioral data outside of the typical lab setting, few are specifically designed for collecting and processing looking time data in infants and young children. To address these challenges, our lab developed several novel approaches for continuing data collection and coding for a remotely administered audiovisual looking time protocol. First, we detail a comprehensive approach for successfully administering the Multisensory Attention Assessment Protocol (MAAP), developed by our lab to assess multisensory attention skills (MASks; duration of looking, speed of shifting/disengaging, accuracy of audiovisual matching). The MAAP is administered remotely using Zoom, Gorilla Experiment Builder, an internet connection, and a home computer. This new data collection approach has the advantage that participants can be tested in their homes. We discuss challenges and successes in implementing our approach for remote testing and data collection during an ongoing longitudinal project. Second, we detail an approach for estimating gaze direction and duration from remotely collected webcam recordings using a post-processing toolkit (OpenFace), and demonstrate its effectiveness and precision. However, because OpenFace derives gaze estimates without translating them to an external frame of reference (i.e., the participant's screen), we developed a machine learning (ML) approach to overcome this limitation. Thus, third, we trained an ML algorithm (an artificial neural network, ANN) to classify gaze estimates from OpenFace with respect to areas of interest (AOIs) on the participant's screen (i.e., left, right, and center). We then demonstrate reliability between this approach and traditional coding approaches (e.g., coding gaze live). The combination of OpenFace and ML provides a method to automate the coding of looking time for data collected remotely. Finally, we outline a series of best practices for developmental researchers conducting remote data collection for looking time studies.
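
    A minimal sketch of the third step described above: training a small ANN to classify OpenFace gaze estimates into on-screen AOIs. The gaze_angle_x/gaze_angle_y column names follow OpenFace's standard CSV output; the file names, label coding, and network size are hypothetical placeholders, not the authors' exact pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# OpenFace writes one row per video frame; gaze_angle_x/_y give the gaze
# direction in radians, with no reference to the participant's screen.
# (OpenFace pads its CSV headers with spaces, hence skipinitialspace.)
frames = pd.read_csv("openface_output.csv", skipinitialspace=True)
X = frames[["gaze_angle_x", "gaze_angle_y"]].values

# Hypothetical human-coded AOI labels, one per frame
# (0 = left, 1 = center, 2 = right), e.g. from live coding of the same video.
y = pd.read_csv("aoi_labels.csv")["aoi"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# A small feed-forward ANN is sufficient for a 3-way AOI decision.
clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print(f"held-out AOI accuracy: {clf.score(X_test, y_test):.2f}")
```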

    Cursor control by point-of-regard estimation for a computer with integrated webcam

    This work forms part of the project Eye-Communicate, funded by the Malta Council for Science and Technology through the National Research & Innovation Programme (2012) under Research Grant No. R&I-2012-057. The problem of eye-gaze tracking by video-oculography has received extensive interest throughout the years, owing to the wide range of applications associated with this technology. Nonetheless, the emergence of a new paradigm, referred to as pervasive eye-gaze tracking, introduces new challenges that go beyond the typical conditions for which classical video-based eye-gaze tracking methods have been developed. In this paper, we address the problem of point-of-regard estimation from low-quality images acquired by an integrated camera inside a notebook computer. The proposed method detects the iris region in low-resolution eye region images by its intensity values rather than its shape, ensuring that this region can also be detected at different angles of rotation and under partial occlusion by the eyelids. Following the calculation of the point-of-regard from the estimated iris center coordinates, a number of Kalman filters improve upon the noisy point-of-regard estimates to smooth the trajectory of the mouse cursor on the monitor screen. Quantitative results obtained from a validation procedure reveal a low mean error that is within the footprint of the average on-screen icon.
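
    Below is a minimal sketch of the smoothing idea described above: a Kalman filter applied to noisy point-of-regard estimates before they drive the cursor. The constant-velocity state model, the independent x/y filters, and the noise parameters are illustrative assumptions; the abstract does not specify the paper's exact filter design.

```python
import numpy as np

class Kalman1D:
    """Constant-velocity Kalman filter for one screen coordinate."""
    def __init__(self, q=1.0, r=50.0):
        self.x = np.zeros(2)                 # state: [position, velocity]
        self.P = np.eye(2) * 1e3             # state covariance, high initial uncertainty
        self.F = np.array([[1.0, 1.0],       # state transition, dt = 1 frame
                           [0.0, 1.0]])
        self.H = np.array([[1.0, 0.0]])      # we observe position only
        self.Q = np.eye(2) * q               # process noise
        self.R = np.array([[r]])             # measurement noise

    def step(self, z):
        # Predict the next state from the constant-velocity model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the new noisy point-of-regard measurement z.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # Kalman gain, shape (2, 1)
        self.x = self.x + K.ravel() * (z - (self.H @ self.x)[0])
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]                     # smoothed position

# One filter per screen axis; feed in noisy gaze estimates frame by frame.
kx, ky = Kalman1D(), Kalman1D()
rng = np.random.default_rng(0)
noisy_gaze = [(512 + rng.normal(0, 40), 384 + rng.normal(0, 40)) for _ in range(100)]
smoothed = [(kx.step(gx), ky.step(gy)) for gx, gy in noisy_gaze]
```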

    Direct interaction with large displays through monocular computer vision

    Large displays are everywhere, and have been shown to provide higher productivity gains and user satisfaction than traditional desktop monitors. The computer mouse remains the most common input tool for users to interact with these larger displays. Much effort has been made to render this interaction more natural and intuitive for the user. The use of computer vision for this purpose has been well researched, as it provides freedom and mobility to the user and allows interaction at a distance. Interaction that relies on monocular computer vision, however, has not been well researched, particularly when used for depth information recovery. This thesis investigates the feasibility of using monocular computer vision to allow bare-hand interaction with large display systems from a distance. By taking into account the location of the user and the interaction area available, a dynamic virtual touchscreen can be estimated between the display and the user. In the process, theories and techniques that make interaction with a computer display as easy as pointing at real-world objects are explored. Studies were conducted to investigate the way humans naturally point at objects with their hands and to examine the inadequacies of existing pointing systems. Models that underpin the pointing strategies used in many previous interactive systems were formalized. A proof-of-concept prototype was built and evaluated through various user studies. Results from this thesis suggest that it is possible to enable natural user interaction with large displays using low-cost monocular computer vision. Furthermore, the models developed and lessons learnt in this research can assist designers in developing more accurate and natural interactive systems that make use of humans' natural pointing behaviours.
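
    The "dynamic virtual touchscreen" idea can be illustrated with a standard ray-plane intersection: cast a pointing ray (here, from the user's eye through the fingertip, both assumed to be recovered by the vision system) and intersect it with a virtual plane placed between the user and the display. The specific ray model, coordinates, and plane placement below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def intersect_ray_plane(origin, direction, plane_point, plane_normal):
    """Return the point where the ray hits the plane, or None if it misses."""
    denom = np.dot(plane_normal, direction)
    if abs(denom) < 1e-9:
        return None                          # ray is parallel to the plane
    t = np.dot(plane_normal, plane_point - origin) / denom
    return origin + t * direction if t >= 0 else None

# Hypothetical positions in metres, in a common camera/world frame:
# the pointing ray runs from the user's eye through the fingertip.
eye = np.array([0.0, 1.6, 2.0])
fingertip = np.array([0.2, 1.4, 1.5])

# Virtual touchscreen: a plane facing the user, 1 m in front of the display.
plane_point = np.array([0.0, 0.0, 1.0])
plane_normal = np.array([0.0, 0.0, 1.0])

hit = intersect_ray_plane(eye, fingertip - eye, plane_point, plane_normal)
print("virtual-touchscreen contact point:", hit)   # -> [0.4 1.2 1. ]
```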

    I2DNet - Design and real-time evaluation of an appearance-based gaze estimation system

    The gaze estimation problem can be addressed using either model-based or appearance-based approaches. Model-based approaches rely on features extracted from eye images to fit a 3D eyeball model and obtain a gaze point estimate, while appearance-based methods attempt to directly map captured eye images to the gaze point without any handcrafted features. Recently, the availability of large datasets and novel deep learning techniques has enabled appearance-based methods to achieve higher accuracy than model-based approaches. However, many appearance-based gaze estimation systems perform well in within-dataset validation but fail to provide the same degree of accuracy in cross-dataset evaluation. Hence, it is still unclear how well the current state-of-the-art approaches perform in real time, in an interactive setting, on unseen users. This paper proposes I2DNet, a novel architecture aimed at improving subject-independent gaze estimation accuracy, which achieves state-of-the-art mean angle errors of 4.3 and 8.4 degrees on the MPIIGaze and RT-Gene datasets, respectively. We evaluated the proposed system as a real-time gaze-controlled interface for a 9-block pointing and selection task and compared it with Webgazer.js and OpenFace 2.0. In a user study with 16 participants, our proposed system yielded statistically significant reductions in selection time and the number of missed selections compared to the other two systems.
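
    The 4.3- and 8.4-degree figures above are mean angle errors, the standard gaze-estimation metric on MPIIGaze and RT-Gene: the average angle between predicted and ground-truth 3D gaze direction vectors. A minimal sketch of that computation, on toy data rather than the authors' evaluation code:

```python
import numpy as np

def mean_angular_error(pred, gt):
    """Mean angle in degrees between rows of pred and gt, both (N, 3) gaze vectors."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * gt, axis=1), -1.0, 1.0)  # clip guards arccos domain
    return np.degrees(np.arccos(cos)).mean()

# Toy example with hypothetical predictions (camera looks along -z).
gt = np.array([[0.0, 0.0, -1.0], [0.1, 0.0, -1.0]])
pred = np.array([[0.05, 0.0, -1.0], [0.1, 0.05, -1.0]])
print(f"mean angle error: {mean_angular_error(pred, gt):.2f} deg")
```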