17 research outputs found

    A Review and Analysis of Eye-Gaze Estimation Systems, Algorithms and Performance Evaluation Methods in Consumer Platforms

    In this paper, a review is presented of research on eye-gaze estimation techniques and applications, which has progressed in diverse ways over the past two decades. Several generic eye-gaze use cases are identified: desktop, TV, head-mounted, automotive, and handheld devices. Analysis of the literature leads to the identification of several platform-specific factors that influence gaze-tracking accuracy. A key outcome of this review is the realization of a need to develop standardized methodologies for performance evaluation of gaze-tracking systems and to achieve consistency in their specification and comparative evaluation. To address this need, the concept of a methodological framework for practical evaluation of different gaze-tracking systems is proposed.
    Comment: 25 pages, 13 figures. Accepted for publication in IEEE Access in July 201

    Rethinking Eye-blink: Assessing Task Difficulty through Physiological Representation of Spontaneous Blinking

    Continuous assessment of task difficulty and mental workload is essential to improving the usability and accessibility of interactive systems. Eye-tracking data have often been investigated to achieve this ability, with reports on the limited role of standard blink metrics. Here, we propose a new approach to the analysis of eye-blink responses for automated estimation of task difficulty. The core module is a time-frequency representation of eye-blink, which aims to capture the richness of information reflected in blinking. In our first study, we show that this method significantly improves the sensitivity to task difficulty. We then demonstrate how to form a framework in which the represented patterns are analyzed with multi-dimensional Long Short-Term Memory recurrent neural networks for their non-linear mapping onto difficulty-related parameters. This framework outperformed other methods that used hand-engineered features. The approach works with any built-in camera, without requiring specialized devices. We conclude by discussing how Rethinking Eye-blink can benefit real-world applications.
    Comment: [Accepted version] In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '21), May 8-13, 2021, Yokohama, Japan. ACM, New York, NY, USA. 19 pages. https://doi.org/10.1145/3411764.344557
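The time-frequency representation at the core of this approach can be illustrated with a minimal sketch (not the authors' implementation): a 1-D blink trace is sliced into overlapping Hann-windowed frames whose FFT magnitudes form a frame-by-frequency matrix. The window size, hop size, and synthetic blink trace below are illustrative assumptions.

```python
import numpy as np

def blink_spectrogram(blink_signal, win=64, hop=32):
    """Naive time-frequency representation of a 1-D blink trace:
    magnitude FFT over overlapping Hann windows (illustrative only)."""
    sig = np.asarray(blink_signal, dtype=float)
    window = np.hanning(win)
    frames = []
    for start in range(0, len(sig) - win + 1, hop):
        seg = sig[start:start + win] * window      # windowed segment
        frames.append(np.abs(np.fft.rfft(seg)))    # one spectral frame
    return np.array(frames)  # shape: (n_frames, win // 2 + 1)

# Synthetic eyelid-aperture trace with brief periodic dips ("blinks")
t = np.linspace(0, 10, 1000)
trace = 1.0 - 0.8 * (np.sin(2 * np.pi * 0.5 * t) > 0.99)
tf = blink_spectrogram(trace)
print(tf.shape)
```

A representation like this (rather than scalar blink counts or rates) is what a recurrent network could then consume frame by frame.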

    Eye and mouth openness estimation in sign language and news broadcast videos

    Currently, there is an increasing need for automatic video-analysis tools to support sign language studies and the evaluation of facial activity in sign language and other videos. Hence, research focusing on automatic estimation and annotation of videos and facial gestures is continuously developing. In this work, techniques for the estimation of eye and mouth openness and eyebrow position are studied. Such estimation could prove beneficial for automatic annotation and quantitative evaluation of sign language videos, as well as for more prolific production of sign language material. The method proposed for estimating eyebrow position, eye openness, and mouth state is based on a set of facial landmarks obtained with detection techniques designed for each facial element. Furthermore, we compare the presented landmark-detection algorithm with a recently published third-party face-alignment algorithm. The landmarks are used to compute features that describe the geometric information of the elements of the face. These features constitute the input for classifiers that produce quantized openness estimates for the studied facial elements. Finally, the performance of the estimations is evaluated in quantitative and qualitative experiments with sign language and news broadcast videos.
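As an illustration of landmark-derived geometric openness features (the paper defines its own feature set; the eye-aspect-ratio below is a common stand-in, and the landmark coordinates are hypothetical):

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Eye-openness proxy from 6 eye landmarks ordered as in the common
    68-point layout: [outer, upper-1, upper-2, inner, lower-2, lower-1]."""
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])   # first vertical lid distance
    v2 = np.linalg.norm(eye[2] - eye[4])   # second vertical lid distance
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal eye width
    return (v1 + v2) / (2.0 * h)

# Hypothetical landmark sets: a wide-open eye and a nearly closed one
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(eye_aspect_ratio(open_eye), eye_aspect_ratio(closed_eye))
```

A classifier can then quantize such a scalar into the discrete openness states the paper estimates.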

    Smart HMI for an autonomous vehicle

    This work presents the architecture designed to implement the HMI (Human Machine Interface) of an autonomous vehicle developed at the University of Alcalá. The system uses the ROS (Robot Operating System) ecosystem for communication between the different modules developed on the vehicle. In addition, a tool for collecting driver gaze-focalization data from a camera is presented, based on OpenFace, an open-source tool for face analysis. Two methods are proposed: one linear and the other based on the NARMAX algorithm. Different tests have been carried out to demonstrate the accuracy of both methods, and both have been evaluated on the challenging traffic-accident dataset DADA2000.
    Máster Universitario en Ingeniería Industrial (M141
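The linear method can be sketched roughly as a least-squares map from OpenFace-style gaze angles to image coordinates. The calibration samples and the affine relation below are synthetic assumptions, not the thesis's actual model:

```python
import numpy as np

def fit_linear_gaze_map(angles, points):
    """Least-squares affine map from (yaw, pitch) gaze angles to 2-D
    focalization points, in the spirit of a linear calibration."""
    A = np.hstack([np.asarray(angles, float), np.ones((len(angles), 1))])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(points, float), rcond=None)
    return coeffs  # shape (3, 2): rows for yaw, pitch, bias

def apply_gaze_map(coeffs, angle):
    """Predict an image point for one (yaw, pitch) pair."""
    return np.append(np.asarray(angle, float), 1.0) @ coeffs

# Synthetic calibration: points generated from a known affine relation
angles = np.array([[-0.2, 0.1], [0.0, 0.0], [0.3, -0.1], [0.1, 0.2]])
points = angles @ np.array([[100.0, 0.0], [0.0, 80.0]]) + [320.0, 240.0]
coeffs = fit_linear_gaze_map(angles, points)
print(apply_gaze_map(coeffs, [0.0, 0.0]))
```

A NARMAX-style model would replace this affine form with a nonlinear autoregressive one; the fitting workflow stays analogous.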

    Face tracking with active models for a driver monitoring application

    Driver inattention is one of the main causes of traffic accidents. Monitoring the driver to detect inattention is a complex problem that involves physiological and behavioural elements. A computer-vision system for inattention detection comprises several processing stages, and this thesis focuses on tracking the driver's face. The thesis proposes a new set of driver videos, recorded in a real vehicle and in two realistic simulators, which contains most of the behaviours present in driving, including gestures, head turns, interaction with the sound system and other distractions, and drowsiness. This database, RS-DMV, is used to evaluate the performance of the methods proposed in the thesis and of others from the state of the art. The thesis analyses the performance of Active Shape Models (ASM) and Constrained Local Models (CLM), considered a priori to be of interest. In particular, the Stacked Trimmed ASM (STASM) method, which integrates a series of improvements over the original ASM, was evaluated: it shows high accuracy in all tests when the face is frontal to the camera, but it does not cope with turned faces and its execution speed is very low. CLM runs faster but is much less accurate in all cases. The third method evaluated is Simultaneous Modelling and Tracking (SMAT), which characterises shape and texture incrementally from previously found samples. The texture around each point of the shape that defines the face is modelled by a set of clusters of past samples. The thesis proposes three clustering methods for the texture as alternatives to the original one, and a shape model trained off-line with a robust fitting function. The proposed alternatives substantially improve both tracking accuracy and robustness to head turns, occlusions, gestures, and illumination changes. The proposed methods also have a low computational load and can run at around 100 frames per second on a desktop computer.
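A toy sketch of the incremental clustering idea behind SMAT-style texture models (running-mean clusters over past samples; the distance threshold and 2-D "patches" are illustrative assumptions, not the thesis's methods):

```python
import numpy as np

def update_clusters(clusters, patch, threshold=0.5):
    """Incremental clustering of texture samples: assign the new patch to
    the nearest cluster mean, or open a new cluster if it is too far.
    clusters is a list of (mean, count) pairs; returns the updated list."""
    patch = np.asarray(patch, float)
    if clusters:
        dists = [np.linalg.norm(patch - m) for m, _ in clusters]
        i = int(np.argmin(dists))
        if dists[i] <= threshold:
            mean, n = clusters[i]
            clusters[i] = ((mean * n + patch) / (n + 1), n + 1)  # running mean
            return clusters
    clusters.append((patch, 1))  # no cluster close enough: start a new one
    return clusters

# Feed four toy "patches"; two natural groups should emerge
clusters = []
for p in [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.95, 1.05]]:
    clusters = update_clusters(clusters, p)
print(len(clusters))
```

Tracking then matches the current image region against the stored cluster means rather than a single fixed template.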

    Methods and techniques for analyzing human factors facets on drivers

    Mención Internacional en el título de doctor
    With millions of cars moving daily, driving is the most performed activity worldwide. Unfortunately, according to the World Health Organization (WHO), around 1.35 million people worldwide die from road traffic accidents every year and between 20 and 50 million more are injured, making road traffic accidents the second leading cause of death among people between the ages of 5 and 29. According to WHO, human errors, such as speeding, driving under the influence of drugs, fatigue, or distractions at the wheel, are the underlying cause of most road accidents. Global reports on road safety, such as "Road safety in the European Union. Trends, statistics, and main challenges" prepared by the European Commission in 2018, presented a statistical analysis relating road-accident mortality rates to periods segmented by hours and days of the week. This report revealed that the highest incidence of mortality regularly occurs in the afternoons of working days, coinciding with the period when the volume of traffic increases and any human error is much more likely to cause a traffic accident. Accordingly, mitigating human errors in driving is a challenge, and there is currently a growing trend of proposing technological solutions that integrate driver information into advanced driving systems to improve driver performance and ergonomics. The study of human factors in the field of driving is a multidisciplinary field in which several areas of knowledge converge, notably psychology, physiology, instrumentation, signal processing, machine learning, the integration of information and communication technologies (ICTs), and the design of human-machine communication interfaces. The main objective of this thesis is to exploit knowledge related to the different facets of human factors in the field of driving.
    Specific objectives include identifying tasks related to driving, detecting unfavorable cognitive states in the driver, such as stress, and, transversally, proposing an architecture for the integration and coordination of driver-monitoring systems with other active safety systems. The specific objectives address the critical aspects of each of these issues. Identifying driving-related tasks is one of the primary aspects of the conceptual framework of driver modeling. Identifying the maneuvers a driver performs requires first training a model with examples of each maneuver to be identified. To this end, a methodology was established to build a data set that relates the handling of the driving controls (steering wheel, pedals, gear lever, and turn indicators) to a series of properly identified maneuvers. This methodology consisted of designing, in a realistic driving simulator, different driving scenarios for each type of maneuver, including stops, overtaking, turns, and specific maneuvers such as the U-turn and the three-point turn. Regarding the detection of unfavorable cognitive states in the driver, stress can impair cognitive faculties, causing failures in the decision-making process. Physiological signals, such as measurements derived from heart rhythm or from changes in the electrical properties of the skin, are reliable indicators when assessing whether a person is going through an episode of acute stress. However, the detection of stress patterns is still an open problem: despite advances in sensor design for the non-invasive collection of physiological signals, certain factors prevent models from detecting stress patterns in any subject.
    This thesis addresses two aspects of stress detection: the collection of physiological values during stress elicitation, through laboratory techniques such as the Stroop effect and through driving tests; and the detection of stress by designing a process flow based on unsupervised learning techniques, delving into the problems associated with the intra- and inter-individual variability of physiological measures that prevents the achievement of generalist models. Finally, in addition to developing models that address the different aspects of monitoring, the orchestration of monitoring systems and active safety systems is a transversal and essential aspect of improving safety, ergonomics, and the driving experience. Both for integration into test platforms and for integration into final systems, the problem of deploying multiple active safety systems lies in the adoption of monolithic models in which the system-specific functionality runs in isolation, without considering aspects such as cooperation and interoperability with other safety systems. This thesis addresses the development of more complex systems in which monitoring systems condition the operability of multiple active safety systems. To this end, a mediation architecture is proposed to coordinate the reception and delivery of the data flows generated by the various systems involved, including external sensors (lasers, external cameras), cabin sensors (cameras, smartwatches), detection models, deliberative models, delivery systems, and human-machine communication interfaces.
    Ontology-based data modeling plays a crucial role in structuring all this information and consolidating the semantic representation of the driving scene, thus allowing the development of models based on data fusion.
    I would like to thank the Ministry of Economy and Competitiveness for granting me the predoctoral fellowship BES-2016-078143, corresponding to the project TRA2015-63708-R, which provided me the opportunity to conduct all my Ph.D. activities, including completing an international internship.
    Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de Madrid
    President: José María Armingol Moreno. Secretary: Felipe Jiménez Alonso. Member: Luis Mart
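The unsupervised stress-detection direction described above can be sketched, under toy assumptions, as per-subject normalization (one way to soften intra-/inter-individual variability) followed by a minimal 2-means grouping of synthetic heart-rate/electrodermal features. This is a hedged illustration, not the thesis's pipeline:

```python
import numpy as np

def zscore_per_subject(X, subject_ids):
    """Standardize features within each subject to reduce the individual
    baselines that hinder generalist stress models."""
    X = np.asarray(X, float)
    out = np.empty_like(X)
    for s in np.unique(subject_ids):
        m = subject_ids == s
        out[m] = (X[m] - X[m].mean(axis=0)) / (X[m].std(axis=0) + 1e-9)
    return out

def two_means(X, iters=20):
    """Minimal 2-cluster k-means (deterministic init on the first two rows),
    standing in for the unsupervised stress/no-stress grouping."""
    centers = X[:2].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
    return labels

# Toy heart-rate / electrodermal features for two subjects, calm vs. stressed
subjects = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X = np.array([[60, 1.0], [62, 1.1], [95, 3.0], [98, 3.2],
              [70, 2.0], [72, 2.1], [110, 5.0], [112, 5.2]], float)
labels = two_means(zscore_per_subject(X, subjects))
print(labels)
```

Without the per-subject step, subject 1's calm readings overlap subject 0's stressed ones; after normalization the calm/stressed structure is shared across subjects.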

    3D Gaze Estimation from Remote RGB-D Sensors

    The development of systems able to retrieve and characterise the state of humans is important for many applications and fields of study. In particular, as a display of attention and interest, gaze is a fundamental cue for understanding people's activities, behaviors, intentions, state of mind, and personality. Moreover, gaze plays a major role in the communication process, for example for showing attention to the speaker, indicating who is addressed, or averting gaze to keep the floor. Therefore, many applications within the fields of human-human, human-robot, and human-computer interaction could benefit from gaze sensing. However, despite significant advances during more than three decades of research, current gaze estimation technologies cannot address the conditions often required within these fields, such as remote sensing, unconstrained user movements, and minimum user calibration. Furthermore, to reduce cost it is preferable to rely on consumer sensors, but this usually leads to low-resolution and low-contrast images that current techniques can hardly cope with. In this thesis we investigate the problem of automatic gaze estimation under head pose variations, low-resolution sensing, and different levels of user calibration, including the uncalibrated case. We propose to build a non-intrusive gaze estimation system based on remote consumer RGB-D sensors. In this context, we propose algorithmic solutions which overcome many of the limitations of previous systems. We thus address the main aspects of this problem: 3D head pose tracking, 3D gaze estimation, and gaze-based application modeling. First, we develop an accurate model-based 3D head pose tracking system which adapts to the participant without requiring explicit actions. Second, to achieve head pose invariant gaze estimation, we propose a method to correct the eye image appearance variations due to head pose. We then investigate two different methodologies to infer the 3D gaze direction. The first one builds upon machine learning regression techniques; in this context, we propose strategies to improve their generalization, in particular to handle different people. The second methodology is a new paradigm we propose and call geometric generative gaze estimation. This novel approach combines the benefits of geometric eye modeling (normally restricted to high-resolution images due to the difficulty of feature extraction) with a stochastic segmentation process (adapted to low resolution) within a Bayesian model, allowing the decoupling of user-specific geometry and session-specific appearance parameters, along with the introduction of priors, which are appropriate for adaptation relying on small amounts of data. The aforementioned gaze estimation methods are validated through extensive experiments on a comprehensive database which we collected and made publicly available. Finally, we study the problem of automatic gaze coding in natural dyadic and group human interactions. The system builds upon the thesis contributions to handle unconstrained head movements and the lack of user calibration. It further exploits the 3D tracking of participants and their gaze to conduct a 3D geometric analysis within a multi-camera setup. Experiments on real and natural interactions demonstrate that the system is highly accurate. Overall, the methods developed in this dissertation are suitable for many applications involving large diversity in terms of setup configuration, user calibration, and mobility.
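The geometric side of head-pose-invariant gaze estimation can be sketched as follows: a gaze direction expressed in head coordinates is mapped to world coordinates with the tracked head rotation. This simplified yaw/pitch model is an illustration, not the thesis's appearance-correction method:

```python
import numpy as np

def head_rotation(yaw, pitch):
    """Rotation matrix for head yaw (about the y axis) composed with
    pitch (about the x axis); angles in radians."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return Ry @ Rx

def gaze_to_world(gaze_head, yaw, pitch):
    """Map a gaze direction from head coordinates to world coordinates
    using the tracked head pose."""
    return head_rotation(yaw, pitch) @ np.asarray(gaze_head, float)

# A straight-ahead gaze in head coordinates, with the head turned 30 degrees
g = gaze_to_world([0.0, 0.0, 1.0], np.deg2rad(30), 0.0)
print(np.round(g, 3))
```

The inverse rotation is what lets eye images (and gaze labels) from different head poses be compared in a common frame.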

    Multimodal analysis of verbal and nonverbal behaviour on the example of clinical depression

    No full text
    Clinical depression is a common mood disorder that may last for long periods, vary in severity, and impair an individual's ability to cope with daily life. Depression affects 350 million people worldwide and is therefore considered a burden not only on a personal and social level, but also on an economic one. Depression is the fourth most significant cause of suffering and disability worldwide, and it is predicted to be the leading cause in 2020. Although treatment of depression disorders has proven to be effective in most cases, misdiagnosing depressed patients is a common barrier, not only because depression manifests itself in different ways, but also because clinical interviews and self-reported history are currently the only means of diagnosis, which risks a range of subjective biases from either the patient report or the clinical judgment. While automatic affective state recognition has become an active research area in the past decade, methods for mood disorder detection, such as depression, are still in their infancy. Using the advancements of affective sensing techniques, the long-term goal is to develop an objective multimodal system that supports clinicians during the diagnosis and monitoring of clinical depression. This dissertation aims to investigate the most promising characteristics of depression that can be "heard" and "seen" by a computer system for the task of detecting depression objectively. Using audio-video recordings of a clinically validated Australian depression dataset, several experiments are conducted to characterise depression-related patterns from verbal and nonverbal cues. Of particular interest in this dissertation is the exploration of the speech style, speech prosody, eye activity, and head pose modalities. Statistical analysis and automatic classification of the extracted cues are investigated.
In addition, multimodal fusion methods for these modalities are examined to increase the accuracy and confidence level of detecting depression. These investigations result in a proposed system that detects depression in a binary manner (e.g. depressed vs. non-depressed) using temporal depression behavioural cues. The proposed system: (1) uses audio-video recordings to investigate verbal and nonverbal modalities, (2) extracts functional features from the verbal and nonverbal modalities over each subject's entire segments, (3) pre- and post-normalises the extracted features, (4) selects features using the T-test, (5) classifies depression in a binary manner (i.e. severely depressed vs. healthy controls), and finally (6) fuses the individual modalities. The proposed system was validated for scalability and usability using generalisation experiments. Close studies were made of American and German depression datasets individually, and then also in combination with the Australian one. Applying the proposed system to the three datasets showed remarkably high classification results: up to a 95% average recall for the individual sets and 86% for the three combined. Strong implications are that the proposed system has the ability to generalise to different datasets recorded under quite different conditions, such as collection procedure and task, depression diagnosis testing and scale, as well as cultural and language background. High performance was found consistently in speech prosody and eye activity in both the individual and combined datasets, with head pose features a little less remarkable. Strong indications are that the extracted features are robust to large variations in recording conditions. Furthermore, once the modalities were combined, the classification results improved substantially. Therefore, the modalities are shown both to correlate with and complement each other, working in tandem as an innovative system for the diagnosis of depression across large variations in population and procedure.
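Step (4) of the pipeline, T-test feature selection, can be sketched with the equal-variance two-sample t statistic; the toy feature matrices and threshold below are assumptions, not the dissertation's data:

```python
import numpy as np

def ttest_select(X_dep, X_ctl, t_min=2.0):
    """Keep feature columns whose pooled-variance two-sample t statistic
    between the depressed and control groups exceeds t_min in magnitude."""
    X_dep, X_ctl = np.asarray(X_dep, float), np.asarray(X_ctl, float)
    n1, n2 = len(X_dep), len(X_ctl)
    m1, m2 = X_dep.mean(axis=0), X_ctl.mean(axis=0)
    v1, v2 = X_dep.var(axis=0, ddof=1), X_ctl.var(axis=0, ddof=1)
    sp = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))  # pooled std
    t = (m1 - m2) / (sp * np.sqrt(1.0 / n1 + 1.0 / n2))
    return np.where(np.abs(t) >= t_min)[0]  # indices of retained features

# Toy features: column 0 separates the groups, column 1 is pure noise
dep = np.array([[2.0, 0.1], [2.2, -0.2], [2.1, 0.0], [1.9, 0.2]])
ctl = np.array([[1.0, 0.0], [0.9, 0.1], [1.1, -0.1], [1.0, 0.05]])
print(ttest_select(dep, ctl))
```

Selecting by a significance test before classification keeps only the cues that discriminate the groups, which is what makes the later per-modality classifiers and their fusion tractable.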