47 research outputs found

    Analysis and extension of hierarchical temporal memory for multivariable time series

    Full text link
    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, June 201

    Eye center localization and gaze gesture recognition for human-computer interaction

    Get PDF
    © 2016 Optical Society of America. This paper introduces an unsupervised modular approach for accurate and real-time eye center localization in images and videos, thus allowing a coarse-to-fine, global-to-regional scheme. The trajectories of eye centers in consecutive frames, i.e., gaze gestures, are further analyzed, recognized, and employed to boost the human-computer interaction (HCI) experience. This modular approach makes use of isophote and gradient features to estimate the eye center locations. A selective oriented gradient filter has been specifically designed to remove strong gradients from eyebrows, eye corners, and shadows, which sabotage most eye center localization methods. A real-world implementation utilizing these algorithms has been designed in the form of an interactive advertising billboard to demonstrate the effectiveness of our method for HCI. The eye center localization algorithm has been compared with ten other algorithms on the BioID database and six other algorithms on the GI4E database, and it outperforms all of the compared algorithms in terms of localization accuracy. Further tests on the Extended Yale Face Database B and self-collected data have proved this algorithm to be robust against moderate head poses and poor illumination conditions. The interactive advertising billboard has demonstrated outstanding usability and effectiveness in our tests and shows great potential for benefiting a wide range of real-world HCI applications.
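    The abstract does not include implementation details; as a rough illustration of the gradient-based voting idea it builds on, the sketch below scores each candidate centre of a small grayscale eye patch by how well the surrounding image gradients point towards it, favouring dark (pupil-like) centres. This is not the authors' exact algorithm: the function name, threshold and darkness weighting are assumptions.

```python
# Minimal sketch of gradient-voting eye-centre localization (illustrative only,
# not the paper's method). Expects a small grayscale eye patch as a NumPy array.
import numpy as np

def eye_center_by_gradients(patch: np.ndarray) -> tuple:
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch)                      # image gradients
    mag = np.hypot(gx, gy)
    mask = mag > 0.3 * mag.max()                     # keep only strong gradients (assumed threshold)
    gx, gy = gx[mask] / mag[mask], gy[mask] / mag[mask]
    ys, xs = np.nonzero(mask)

    h, w = patch.shape
    score = np.zeros((h, w))
    weight = 1.0 - patch / patch.max()               # pupils are dark: favour dark centres
    for cy in range(h):
        for cx in range(w):
            dx, dy = xs - cx, ys - cy
            norm = np.hypot(dx, dy)
            norm[norm == 0] = 1.0
            dots = (dx / norm) * gx + (dy / norm) * gy   # agreement of displacement and gradient
            score[cy, cx] = weight[cy, cx] * np.mean(np.maximum(dots, 0.0) ** 2)
    cy, cx = np.unravel_index(np.argmax(score), score.shape)
    return int(cx), int(cy)
```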

    An end-to-end review of gaze estimation and its interactive applications on handheld mobile devices

    Get PDF
    In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, the higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially deep learning. As the literature is progressing rapidly, there is a pressing need to review the state of the art, delineate the boundary, and identify the key research challenges and opportunities in gaze estimation and interaction. This paper aims to serve this purpose by presenting an end-to-end, holistic view of this area, from gaze-capturing sensors, to gaze estimation workflows, to deep learning techniques, and to gaze interactive applications.
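    For context on the deep-learning-based workflows such reviews cover, the following is a minimal, hypothetical appearance-based gaze estimator: a small CNN maps an eye or face crop to a normalised on-screen gaze point. The architecture, layer sizes and output convention are illustrative assumptions, not a model taken from the review.

```python
# Minimal sketch of an appearance-based gaze estimator (illustrative assumptions only).
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)   # (x, y) gaze location, normalised to screen coordinates

    def forward(self, x):              # x: (N, 3, H, W) face or eye crops
        return self.head(self.features(x).flatten(1))

# usage: pred = GazeNet()(torch.rand(8, 3, 64, 64))   # -> tensor of shape (8, 2)
```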

    Covert Attention Tracking: Towards Two-Dimensional Real-Time Recordings

    Get PDF
    Achieving attention tracking as easily as recording eye movements is still beyond reach. However, by exploiting Steady-State Visual Evoked Potentials (SSVEPs) we have recently been able to record, satisfactorily and in single trials, the horizontal trajectory of covert visuospatial attention, both when attending to target motion and during mental motion extrapolation. Here we show that, despite the different cortical functional architecture for horizontal and vertical motion processing, the same result is obtained for vertical attention tracking. Thus, it seems that trustworthy real-time two-dimensional attention tracking, with both physical and imagined target motion, is not too distant a goal.
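    As a rough illustration of the frequency-tagging principle behind SSVEP-based attention tracking (not the authors' analysis pipeline; the tag frequencies, windowing and function names are assumptions), one can compare EEG spectral power at the flicker frequencies assigned to candidate locations and pick the strongest response:

```python
# Minimal sketch of SSVEP frequency tagging: each location flickers at its own
# frequency, and the relative EEG power at those frequencies indicates where
# attention is deployed. Illustrative assumptions throughout.
import numpy as np

def ssvep_power(eeg: np.ndarray, fs: float, freq: float, harmonics: int = 2) -> float:
    """Summed spectral power of a 1-D occipital EEG channel at `freq` and its harmonics."""
    spectrum = np.abs(np.fft.rfft(eeg * np.hanning(eeg.size))) ** 2
    bins = np.fft.rfftfreq(eeg.size, d=1.0 / fs)
    return float(sum(spectrum[np.argmin(np.abs(bins - h * freq))]
                     for h in range(1, harmonics + 1)))

def attended_target(eeg: np.ndarray, fs: float, tag_freqs=(8.0, 12.0)) -> int:
    """Index of the flicker frequency with the largest SSVEP response."""
    return int(np.argmax([ssvep_power(eeg, fs, f) for f in tag_freqs]))
```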

    2D and 3D computer vision analysis of gaze, gender and age

    Get PDF
    Human-Computer Interaction (HCI) has been an active research area for over four decades. Research studies and commercial designs in this area have been largely facilitated by the visual modality, which brings diversified functionality and improved usability to HCI interfaces by employing various computer vision techniques. This thesis explores a number of facial cues, such as gender, age and gaze, by performing 2D and 3D computer vision analysis. The ultimate aim is to create a natural HCI strategy that can fulfil user expectations, augment user satisfaction and enrich user experience by understanding user characteristics and behaviours. To this end, salient features have been extracted and analysed from 2D and 3D face representations; 3D reconstruction algorithms and their compatible real-world imaging systems have been investigated; and case study HCI systems have been designed to demonstrate the reliability, robustness and applicability of the proposed methods.

    More specifically, an unsupervised approach has been proposed to localise eye centres in images and videos accurately and efficiently. This is achieved by utilising two types of geometric features and eye models, complemented by an iris radius constraint and a selective oriented gradient filter specifically tailored to this modular scheme. The approach resolves challenges such as interfering facial edges, undesirable illumination conditions, head poses, and the presence of facial accessories and makeup. Tested on three publicly available databases (the BioID database, the GI4E database and the Extended Yale Face Database B) and a self-collected database, this method outperforms all compared methods and thus proves to be highly accurate and robust. Based on this approach, a gaze gesture recognition algorithm has been designed to increase the interactivity of HCI systems by encoding eye saccades into a communication channel similar in role to hand gestures.

    As well as analysing eye/gaze data that represent user behaviours and reveal user intentions, this thesis also investigates the automatic recognition of user demographics such as gender and age. The Fisher Vector encoding algorithm is employed to construct visual vocabularies as salient features for gender and age classification. Algorithm evaluations on three publicly available databases (the FERET database, the LFW database and the FRGCv2 database) demonstrate the superior performance of the proposed method in both laboratory and unconstrained environments. To achieve enhanced robustness, a two-source photometric stereo method has been introduced to recover surface normals, so that more invariant 3D facial features become available that can further boost classification accuracy and robustness. A 2D+3D imaging system has been designed for the construction of a self-collected dataset including 2D and 3D facial data. Experiments show that utilisation of 3D facial features can increase the gender classification rate by up to 6% (on the self-collected dataset) and the age classification rate by up to 12% (on the Photoface database).

    Finally, two case study HCI systems, a gaze gesture based map browser and a directed advertising billboard, have been designed by adopting all the proposed algorithms as well as the fully compatible imaging system. The proposed algorithms ensure that the case study systems are highly robust to head pose and illumination variation and achieve excellent real-time performance. Overall, the proposed HCI strategy, enabled by reliably recognised facial cues, can serve to spawn a wide array of innovative systems and to bring HCI to a more natural and intelligent state.
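    As an illustration of one building block mentioned above, here is a simplified Fisher Vector encoder over local descriptors, keeping only the gradients with respect to the GMM means; the thesis' full pipeline, descriptor choice and normalisation details differ, and the function below is a generic sketch rather than the thesis code.

```python
# Simplified Fisher Vector encoding (mean gradients only), assuming a GMM with
# diagonal covariances fitted on training descriptors. Illustrative sketch only.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descs: np.ndarray, gmm: GaussianMixture) -> np.ndarray:
    """descs: (N, D) local descriptors; returns a K*D Fisher Vector."""
    q = gmm.predict_proba(descs)                          # (N, K) soft assignments
    n = descs.shape[0]
    parts = []
    for k in range(gmm.n_components):
        diff = (descs - gmm.means_[k]) / np.sqrt(gmm.covariances_[k])     # (N, D)
        parts.append((q[:, k, None] * diff).sum(axis=0) / (n * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)              # L2 normalisation

# usage: gmm = GaussianMixture(64, covariance_type="diag").fit(train_descs)
#        code = fisher_vector(image_descs, gmm)
```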

    Using Eye-Tracking to Assess the Application of Divisibility Rules when Dividing a Multi-Digit Dividend by a Single Digit Divisor

    Get PDF
    Conference Proceedings
    The Department of Basic Education in South Africa has identified certain problem areas in Mathematics, of which the factorisation of numbers was specifically identified as a problem for Grade 9 learners. The building blocks for factorisation should already have been established in Grades 4, 5 and 6. Knowing the divisibility rules will assist learners to simplify mathematical calculations such as factorising numbers, manipulating fractions and determining whether a given number is prime. When a learner has to indicate, by only giving the answer, whether a dividend is divisible by a certain single-digit divisor, the teacher has no insight into the learner's reasoning. If the answer is correct, the teacher does not know whether the learner guessed the answer or applied the divisibility rule correctly or incorrectly. A pre-post experimental design was used to investigate the effect of revision of the divisibility rules on the difference in learners' gaze behaviour before and after the revision. Gaze behaviour was analysed before learners responded to a question on divisibility. It is suggested that if teachers have access to learners' answers, motivations and gaze behaviour, they can identify whether learners (i) guessed the answers, (ii) applied the divisibility rules correctly, (iii) applied the divisibility rules correctly but made mental calculation errors, or (iv) applied the divisibility rules incorrectly.
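    For readers unfamiliar with the rules in question, a small illustrative example (not part of the study) of the digit-sum divisibility rules for 3 and 9:

```python
# Digit-sum divisibility rules: a number is divisible by 3 (or 9) exactly when
# the sum of its digits is divisible by 3 (or 9). Illustrative example only.
def divisible_by_3(n: int) -> bool:
    return sum(int(d) for d in str(abs(n))) % 3 == 0

def divisible_by_9(n: int) -> bool:
    return sum(int(d) for d in str(abs(n))) % 9 == 0

assert divisible_by_3(123) and not divisible_by_9(123)   # 1 + 2 + 3 = 6
```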

    Perception-driven approaches to real-time remote immersive visualization

    Get PDF
    In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user's sense of presence in a remote scene by rendering a live 3D reconstruction. This is particularly valuable when there is a need to visualize, explore and perform tasks in environments that are inaccessible, too hazardous or too distant. However, a remote visualization system requires that the entire pipeline, from 3D data acquisition to VR rendering, satisfies demanding requirements on speed, throughput and visual realism. Especially when point clouds are used, network latency and throughput limitations create a fundamental quality gap between the acquired data of the physical world and the displayed data, which negatively impacts the sense of presence and can provoke cybersickness. This thesis presents research that addresses these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering.

    The human visual system does not have uniform vision across the field of view: visual acuity is sharpest at the center of the field of view and falls off towards the periphery, where lower-resolution peripheral vision guides eye movements so that central vision visits all the interesting, crucial parts of the scene. As a first contribution, the thesis developed remote visualization strategies that exploit this acuity fall-off to facilitate the processing, transmission, buffering, and VR rendering of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis investigated attentional mechanisms to select and draw user engagement to specific information in the dynamic spatio-temporal environment. It proposed a strategy that analyses the remote scene in terms of its 3D structure and layout, and of the spatial, functional, and semantic relationships between objects in the scene. The strategy relies primarily on models of human visual perception, allocates a greater proportion of computational resources to objects of interest, and creates a more realistic visualization.

    As a supplementary contribution, a new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, a comparative examination of the proposed point-cloud metric, user studies, and experiments demonstrate that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput.
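    A minimal sketch of the acuity-fall-off idea described above: keep full point-cloud density near the gaze direction and progressively thin points out with eccentricity. The fall-off curve, constants and function name are assumptions, not the thesis' actual model.

```python
# Foveated point-cloud subsampling sketch (illustrative assumptions only).
import numpy as np

def foveated_subsample(points: np.ndarray, gaze_dir: np.ndarray, rng=None) -> np.ndarray:
    """points: (N, 3) positions in the viewer's frame; gaze_dir: unit viewing vector."""
    rng = rng or np.random.default_rng(0)
    dirs = points / (np.linalg.norm(points, axis=1, keepdims=True) + 1e-9)
    ecc = np.degrees(np.arccos(np.clip(dirs @ gaze_dir, -1.0, 1.0)))    # eccentricity in degrees
    keep_prob = np.clip(1.0 / (1.0 + 0.2 * ecc), 0.05, 1.0)             # acuity-like fall-off (assumed)
    return points[rng.random(points.shape[0]) < keep_prob]
```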

    Augmentative and alternative communication (AAC) advances: A review of configurations for individuals with a speech disability

    Get PDF
    High-tech augmentative and alternative communication (AAC) methods are constantly on the rise; however, the interaction between the user and the assistive technology still falls short of an optimal user experience centred on the desired activity. This review presents a range of signal sensing and acquisition methods utilized in conjunction with existing high-tech AAC platforms for individuals with a speech disability, including imaging methods, touch-enabled systems, mechanical and electro-mechanical access, breath-activated methods, and brain–computer interfaces (BCI). The listed AAC sensing modalities are compared in terms of ease of access, affordability, complexity, portability, and typical conversational speeds. An overview of the associated AAC signal processing, encoding, and retrieval highlights the roles of machine learning (ML) and deep learning (DL) in the development of intelligent AAC solutions. The demands and cost of most systems hinder widespread usage of high-tech AAC. Further research is needed to develop intelligent AAC applications that reduce the associated costs and enhance the portability of the solutions for a real user's environment. Consolidating natural language processing with current solutions also needs to be explored further to improve conversational speeds. Recommendations for prospective advances in upcoming high-tech AAC are addressed in terms of developments to support mobile health communication applications.

    Human-machine interfaces using eye tracking and speech recognition techniques

    Full text link
    This project deals with the design and validation of a system (in this case, a neural network) to characterize the activity performed by a person in front of a computer by means of eye tracking. The goal of this prototype is to discriminate between four types of activity: reading a text, viewing a 35-second calm video showing a beach, viewing a 35-second action video, and browsing the Universidad Autónoma de Madrid website. To achieve this, eye tracking technology will be used, as provided by the Tobii X2-30 Eye Tracker. By means of eye tracking software, on-screen gaze positions (x and y coordinates) will be obtained over time. Once these coordinates are obtained, the user's activity will be determined through the recognition algorithm. The first step in this work will be to access the pupil tracking data from the Tobii device in order to apply the eye tracking technology. Once the operation of the camera is understood, it will be used with several users to create files containing the coordinate time series of the different user tasks, thus building a small database that will serve as the training set for the neural network. Finally, we will proceed to validate the recognition of the time series obtained during the experiments for the different user tasks.
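    As a rough sketch of how such a classifier could be set up (purely illustrative: the features, library and model below are assumptions, not the project's implementation), a gaze (x, y) time series can be reduced to a few summary statistics and fed to a small neural network classifier:

```python
# Illustrative sketch: summary features from a gaze trace, classified with a small MLP.
import numpy as np
from sklearn.neural_network import MLPClassifier

def gaze_features(xy: np.ndarray) -> np.ndarray:
    """xy: (T, 2) on-screen gaze coordinates over time."""
    steps = np.diff(xy, axis=0)
    step_len = np.linalg.norm(steps, axis=1)
    return np.array([
        xy[:, 0].std(), xy[:, 1].std(),        # spatial dispersion
        step_len.mean(), step_len.max(),       # saccade-like statistics
        np.abs(steps[:, 0]).mean(),            # horizontal movement (reading-like pattern)
        np.abs(steps[:, 1]).mean(),            # vertical movement (scrolling-like pattern)
    ])

# usage with recorded sessions and activity labels (reading / calm video / action video / web):
#   X = np.stack([gaze_features(s) for s in sessions])
#   clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, labels)
```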