45 research outputs found

    Eye Blink Detection

    Nowadays, people spend more time in front of electronic screens such as computers, laptops, TVs, mobile phones, and tablets, which causes eye blink frequency to decrease. Each blink spreads tears over the cornea to moisturize and disinfect the eye. A reduced blink rate causes eye redness and dryness, also known as Dry Eye, one of the major symptoms of Computer Vision Syndrome. The goal of this work is to design an eye blink detector that can be used in a dry eye prevention system. We have analyzed available techniques for blink detection and designed our own solutions based on histogram backprojection and optical flow methods. We have tested our algorithms on different datasets under various lighting conditions. The inner movement detection method based on optical flow performs better than the histogram-based ones. We achieve a higher recognition rate and a much lower false positive rate than the state-of-the-art technique presented by Divjak and Bischof.
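    As a rough illustration of the optical-flow idea described in this abstract (not the authors' implementation), the sketch below measures vertical motion inside a fixed eye region with OpenCV's Farnebäck dense flow; the detect_blinks function, region coordinates, and threshold are hypothetical assumptions.

```python
# Hypothetical sketch of optical-flow-based blink detection over an eye region.
# The threshold, eye ROI, and video source are illustrative assumptions.
import cv2
import numpy as np

def detect_blinks(video_path, eye_roi, flow_threshold=1.5):
    """Count frames with strong vertical motion inside the eye region."""
    x, y, w, h = eye_roi
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return 0
    prev_gray = cv2.cvtColor(prev[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    blink_frames = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
        # Dense optical flow between consecutive eye crops.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        # Eyelid closure produces predominantly vertical motion.
        vertical_motion = np.abs(flow[..., 1]).mean()
        if vertical_motion > flow_threshold:
            blink_frames += 1
        prev_gray = gray
    cap.release()
    return blink_frames
```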

    Affect-driven Engagement Measurement from Videos

    In education and intervention programs, a person's engagement has been identified as a major factor in successful program completion. Automatic measurement of engagement provides useful information for instructors to meet program objectives and individualize program delivery. In this paper, we present a novel approach for video-based engagement measurement in virtual learning programs. We propose to use affect states, continuous values of valence and arousal extracted from consecutive video frames, along with a new latent affective feature vector and behavioral features for engagement measurement. Deep-learning-based temporal models and traditional machine-learning-based non-temporal models are trained and validated on frame-level and video-level features, respectively. In addition to conventional centralized learning, we also implement the proposed method in a decentralized federated learning setting and study the effect of model personalization in engagement measurement. We evaluated the performance of the proposed method on the only two publicly available video engagement measurement datasets, DAiSEE and EmotiW, containing videos of students in online learning programs. Our experiments show a state-of-the-art engagement level classification accuracy of 63.3%, correct classification of disengagement videos in the DAiSEE dataset, and a regression mean squared error of 0.0673 on the EmotiW dataset. Our ablation study shows the effectiveness of incorporating affect states in engagement measurement. We interpret the findings from the experimental results based on psychology concepts in the field of engagement. Comment: 13 pages, 8 figures, 7 tables.
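    To make the video-level feature idea concrete, here is a hedged sketch (not the paper's pipeline) that aggregates per-frame valence/arousal values into simple statistics and trains a non-temporal classifier; the video_level_features helper, the chosen statistics, and the random forest are illustrative assumptions.

```python
# Illustrative sketch: aggregating per-frame valence/arousal predictions into
# video-level statistics and training a non-temporal classifier.
# Feature choices and the RandomForest are assumptions, not the paper's setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def video_level_features(frame_affect):
    """frame_affect: array of shape (n_frames, 2) with [valence, arousal] per frame."""
    stats = [frame_affect.mean(axis=0),
             frame_affect.std(axis=0),
             np.abs(np.diff(frame_affect, axis=0)).mean(axis=0)]  # temporal variation
    return np.concatenate(stats)

# Toy usage with random data standing in for real affect predictions.
rng = np.random.default_rng(0)
X = np.stack([video_level_features(rng.normal(size=(300, 2))) for _ in range(40)])
y = rng.integers(0, 4, size=40)  # four engagement levels, as in DAiSEE
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
```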

    A framework for context-aware driver status assessment systems

    The automotive industry is actively supporting research and innovation to meet manufacturers' requirements related to safety, performance, and the environment. The Green ITS project is among the efforts in that regard. Safety is a major customer and manufacturer concern, so much effort has been directed to developing cutting-edge technologies able to assess driver status in terms of alertness and suitability. In that regard, this thesis aims to create a framework for a context-aware driver status assessment system. Context-aware means that the machine uses background information about the driver and environmental conditions to better ascertain and understand driver status. The system also relies on multiple sensors, mainly video and audio. Using context and multi-sensor data, we need to perform multi-modal analysis and data fusion in order to infer as much knowledge as possible about the driver. Finally, the project is to be continued by other students, so the system should be modular and well documented. With this in mind, a driving simulator integrating multiple sensors was built. This simulator is a starting point for experimentation related to driver status assessment, and a prototype of software for real-time driver status assessment is integrated into the platform. To make the system context-aware, we designed a driver identification module based on audio-visual data fusion. Thus, at the beginning of a driving session, the user is identified and background knowledge about them is loaded to better understand and analyze their behavior. A driver status assessment system was then constructed from two modules. The first is for driver fatigue detection, based on an infrared camera: fatigue is inferred via the percentage of eye closure, which is the best indicator of fatigue for vision systems. The second is a driver distraction recognition system, based on a Kinect sensor: using body, head, and facial expressions, a fusion strategy is employed to deduce the type of distraction a driver is subject to. Of course, fatigue and distraction are only a fraction of all possible driver states, but these two aspects have been studied here primarily because of their dramatic impact on traffic safety. Through experimental results, we show that our system is efficient for driver identification and driver inattention detection tasks. Nevertheless, it is also very modular and could be further complemented by additional driver status analysis, context, or sensor acquisition.
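    As a hedged sketch of the percentage-of-eye-closure idea mentioned above (not the thesis implementation), the class below flags fatigue when the fraction of closed-eye frames in a sliding window exceeds a threshold; the window length, thresholds, and the upstream eye-openness signal are assumptions.

```python
# Sketch of fatigue estimation from percentage of eye closure over a sliding window.
# The eye-openness source, window length, and thresholds are illustrative assumptions.
from collections import deque

class EyeClosureMonitor:
    def __init__(self, window_frames=1800, closed_threshold=0.2, fatigue_ratio=0.15):
        self.window = deque(maxlen=window_frames)   # e.g. ~60 s of history at 30 fps
        self.closed_threshold = closed_threshold    # openness below this counts as closed
        self.fatigue_ratio = fatigue_ratio          # fraction of closed frames flagged as fatigue

    def update(self, eye_openness):
        """eye_openness: per-frame value in [0, 1] from an upstream eye tracker."""
        self.window.append(eye_openness < self.closed_threshold)
        if len(self.window) < self.window.maxlen:
            return False  # not enough history yet
        closure_ratio = sum(self.window) / len(self.window)
        return closure_ratio > self.fatigue_ratio
```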

    Individual and Inter-related Action Unit Detection in Videos for Affect Recognition

    The human face has evolved to become the most important source of non-verbal information, conveying our affective, cognitive, and mental state to others. Apart from human-to-human communication, facial expressions have also become an indispensable component of human-machine interaction (HMI). Systems capable of understanding how users feel allow for a wide variety of applications in medical, learning, entertainment, and marketing technologies, in addition to advancing research in neuroscience, psychology, and many other fields. The Facial Action Coding System (FACS) was built to objectively define and quantify every possible facial movement through what are called Action Units (AUs), each representing an individual facial action. In this thesis we focus on the automatic detection and exploitation of these AUs using novel appearance representation techniques as well as the incorporation of prior co-occurrence information between them. Our contributions can be grouped into three parts. In the first part, we propose to improve the detection accuracy of appearance features based on local binary patterns (LBP) for AU detection in videos. For this purpose, we propose two novel methodologies. The first uses three fundamental image processing tools as a pre-processing step prior to applying the LBP transform on the facial texture. These tools each enhance the descriptive ability of LBP by emphasizing different transient appearance characteristics, and are shown to increase AU detection accuracy significantly in our experiments. The second uses multiple local curvature Gabor binary patterns (LCGBP) for the same problem and achieves state-of-the-art performance on a dataset of mostly posed facial expressions. The curvature information of the face, as well as the proposed multiple-filter-size scheme, is very effective in recognizing these individual facial actions. In the second part, we propose to take advantage of the co-occurrence relations between the AUs, which we can learn from training examples. We use this information in a multi-label discriminant Laplacian embedding (DLE) scheme to train our system with SIFT features extracted around the salient and transient landmarks on the face. The system is first validated, without the DLE, on a challenging dataset containing many occlusions and head pose variations; we then show the performance of the full system in the FERA 2015 challenge on AU occurrence detection. The challenge consists of two difficult datasets that contain spontaneous facial actions at different intensities. We demonstrate that our proposed system achieves the best results on these datasets for detecting AUs. The third and last part of the thesis contains an application of how this automatic AU detection system can be used in real-life situations, particularly for detecting cognitive distraction. Our contribution in this part is two-fold. First, we present a novel visual database of people driving a simulator while visual and cognitive distraction is induced via secondary tasks. The subjects were recorded using three near-infrared camera-lighting systems, which makes the setup very suitable for real driving conditions, i.e. with large head pose and ambient light variations. Second, we propose an original framework to automatically discriminate cognitive distraction sequences from baseline sequences by extracting features from continuous AU signals and exploiting the cross-correlations between them. We achieve a very high classification accuracy in our subject-based experiments and a lower yet acceptable performance in the subject-independent tests. Based on these results, we discuss how facial expressions related to this complex mental state are individual rather than universal, and how the proposed system could be used in a vehicle to help decrease human error in traffic accidents.
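    To illustrate the cross-correlation idea from the last part (a sketch under assumptions, not the thesis configuration), the snippet below turns pairwise correlations between continuous AU signals into a sequence-level feature vector and feeds it to a generic classifier; the au_crosscorr_features helper and the SVM are hypothetical choices.

```python
# Minimal sketch of using cross-correlations between continuous AU signals as
# sequence-level features. The feature construction and the SVM are assumptions.
import numpy as np
from sklearn.svm import SVC

def au_crosscorr_features(au_signals):
    """au_signals: array (n_frames, n_aus) of AU intensities for one sequence."""
    corr = np.corrcoef(au_signals, rowvar=False)    # pairwise AU correlations
    iu = np.triu_indices_from(corr, k=1)
    return np.nan_to_num(corr[iu])                  # upper triangle as a feature vector

# Toy usage: classify baseline vs. cognitive-distraction sequences.
rng = np.random.default_rng(1)
X = np.stack([au_crosscorr_features(rng.random((200, 12))) for _ in range(30)])
y = rng.integers(0, 2, size=30)
clf = SVC(kernel="rbf").fit(X, y)
```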

    Towards Automation and Human Assessment of Objective Skin Quantification

    The goal of this study is to provide an objective criterion for computerised skin quality assessment. Human judgement is influenced by a variety of facial features. Utilising eye-tracking technology to better understand human visual behaviour, this research examined the influence of facial characteristics on skin evaluation and age estimation. The results revealed that individuals perform well at age estimation when facial features are apparent. This research also examines the performance and perception of machine learning algorithms for various skin attributes, comparing traditional machine learning techniques with deep learning approaches: Support Vector Machines (SVM) and Convolutional Neural Networks (CNNs) were used as classification algorithms, with CNNs outperforming SVM. The primary difficulty in training deep learning algorithms is the need for large-scale datasets; this thesis proposes two high-resolution face datasets to address the research community's need for face images for studying faces and skin quality. Additionally, a study of machine-generated skin patches using Generative Adversarial Networks (GANs) is conducted. Dermatologists evaluated the machine-generated images against real ones; only 38% correctly distinguished real from fake. Lastly, the performance of human perception and machine algorithms is compared using heat-maps from the eye-tracking experiment and the machine learning predictions for age estimation. The findings indicate that both humans and machines make predictions in a similar manner.

    Methods and techniques for analyzing human factors facets on drivers

    International Mention in the doctoral degree.
    With millions of cars moving daily, driving is the most performed activity worldwide. Unfortunately, according to the World Health Organization (WHO), every year around 1.35 million people worldwide die in road traffic accidents and, in addition, between 20 and 50 million people are injured, making road traffic accidents the second leading cause of death among people between the ages of 5 and 29. According to the WHO, human errors, such as speeding, driving under the influence of drugs, fatigue, or distraction at the wheel, are the underlying cause of most road accidents. Global reports on road safety, such as "Road safety in the European Union. Trends, statistics, and main challenges" prepared by the European Commission in 2018, present statistical analyses relating road accident mortality rates to periods segmented by hours and days of the week. This report revealed that the highest incidence of mortality regularly occurs in the afternoons of working days, coinciding with the period when the volume of traffic increases and when any human error is much more likely to cause a traffic accident. Accordingly, mitigating human errors in driving is a challenge, and there is currently a growing trend toward technological solutions that integrate driver information into advanced driving systems to improve driver performance and ergonomics. The study of human factors in the field of driving is a multidisciplinary field in which several areas of knowledge converge, notably psychology, physiology, instrumentation, signal processing, machine learning, the integration of information and communication technologies (ICTs), and the design of human-machine communication interfaces. The main objective of this thesis is to exploit knowledge related to the different facets of human factors in the field of driving. Specific objectives include identifying driving-related tasks, detecting unfavorable cognitive states in the driver, such as stress, and, transversally, proposing an architecture for the integration and coordination of driver monitoring systems with other active safety systems. It should be noted that the specific objectives address the critical aspects of each of the issues considered. Identifying driving-related tasks is one of the primary aspects of the conceptual framework of driver modeling. Identifying the maneuvers that a driver performs requires first training a model with examples of each maneuver to be identified. To this end, a methodology was established to build a dataset that relates the handling of the driving controls (steering wheel, pedals, gear lever, and turn indicators) to a series of adequately identified maneuvers. This methodology consisted of designing different driving scenarios in a realistic driving simulator for each type of maneuver, including stops, overtaking, turns, and specific maneuvers such as the U-turn and the three-point turn. From the perspective of detecting unfavorable cognitive states in the driver, stress can impair cognitive faculties, causing failures in the decision-making process. Physiological signals, such as measurements derived from heart rhythm or changes in the electrical properties of the skin, are reliable indicators when assessing whether a person is going through an episode of acute stress. However, the detection of stress patterns is still an open problem.
Despite advances in sensor design for the non-invasive collection of physiological signals, certain factors prevent reaching models capable of detecting stress patterns in any subject. This thesis addresses two aspects of stress detection: the collection of physiological values during stress elicitation, both through laboratory techniques such as the Stroop test and through driving tests; and the detection of stress by designing a process flow based on unsupervised learning techniques, delving into the problems associated with the intra- and inter-individual variability of physiological measures that prevents the achievement of generalist models. Finally, in addition to developing models that address the different aspects of monitoring, the orchestration of monitoring systems and active safety systems is a transversal and essential aspect of improving safety, ergonomics, and the driving experience. Both for integration into test platforms and for integration into final systems, the problem with deploying multiple active safety systems lies in the adoption of monolithic models where system-specific functionality runs in isolation, without considering aspects such as cooperation and interoperability with other safety systems. This thesis addresses the development of more complex systems in which monitoring systems condition the operability of multiple active safety systems. To this end, a mediation architecture is proposed to coordinate the reception and delivery of the data flows generated by the various systems involved, including external sensors (lasers, external cameras), cabin sensors (cameras, smartwatches), detection models, deliberative models, delivery systems, and human-machine communication interfaces. Ontology-based data modeling plays a crucial role in structuring all this information and consolidating the semantic representation of the driving scene, thus allowing the development of models based on data fusion.
I would like to thank the Ministry of Economy and Competitiveness for granting me the predoctoral fellowship BES-2016-078143, corresponding to the project TRA2015-63708-R, which provided me the opportunity to conduct all my Ph.D. activities, including completing an international internship.
Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de Madrid. Chair: José María Armingol Moreno. Secretary: Felipe Jiménez Alonso. Panel member: Luis Mart
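    As a minimal sketch of the unsupervised stress-detection idea described above (assuming heart-rate-variability and electrodermal features per time window; the feature set, per-subject scaling, and k-means clustering are illustrative choices, not the thesis design):

```python
# Illustrative sketch: per-subject normalisation of physiological features followed
# by unsupervised clustering into baseline vs. stress windows. All choices are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def stress_labels(subject_windows):
    """subject_windows: array (n_windows, n_features), e.g. [mean_HR, RMSSD, SCL, SCR_rate]."""
    # Normalise within the subject to reduce inter-individual baseline differences.
    features = StandardScaler().fit_transform(subject_windows)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
    # Heuristic: the cluster with the higher mean heart rate is labelled "stress".
    stress_cluster = int(subject_windows[labels == 1, 0].mean() >
                         subject_windows[labels == 0, 0].mean())
    return (labels == stress_cluster).astype(int)
```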

    Facial Expression Recognition Using Multiresolution Analysis

    Facial expression recognition from images or videos attracts the interest of the research community owing to its applications in human-computer interaction and intelligent transportation systems. Expressions cause non-rigid motion of the facial muscles, thereby changing the orientations of facial curves. Wavelets and Gabor wavelets have been used effectively for recognizing these oriented features. Although wavelets are the most popular multiresolution method, they have limited orientation selectivity (directionality). Gabor wavelets are highly directional, but they are not multiresolution methods in the true sense of the term. The proposed work applies directional multiresolution representations such as curvelets and contourlets to explore the multiresolution space in multiple ways for extracting effective facial features. Extensive comparisons between different multiresolution transforms and state-of-the-art methods are provided to demonstrate the promise of the work. The problem of drowsiness detection, a special case of expression recognition, is also addressed using a proposed feature extraction method.
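    The work itself uses curvelet and contourlet transforms; as a stand-in sketch of the general subband-energy feature idea, the snippet below uses an ordinary 2-D wavelet decomposition from PyWavelets, since curvelet/contourlet libraries are less standard. The wavelet, level count, and energy statistic are assumptions.

```python
# Hedged sketch of multiresolution feature extraction via subband energies.
# Standard 2-D wavelets stand in for the curvelets/contourlets used in the thesis.
import numpy as np
import pywt

def subband_energy_features(face_image, wavelet="db4", levels=3):
    """face_image: 2-D grayscale array. Returns energies of all detail subbands."""
    coeffs = pywt.wavedec2(face_image, wavelet=wavelet, level=levels)
    features = []
    for detail_level in coeffs[1:]:              # skip the approximation band
        for band in detail_level:                # horizontal, vertical, diagonal details
            features.append(np.mean(band ** 2))  # subband energy
    return np.array(features)
```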

    Affective and Implicit Tagging using Facial Expressions and Electroencephalography.

    Recent years have seen an explosion of user-generated, untagged multimedia data, generating a need for efficient search and retrieval of this data. The predominant method for content-based tagging is through manual annotation. Consequently, automatic tagging is currently the subject of intensive research. However, it is clear that the process will not be fully automated in the foreseeable future. We propose to involve the user and investigate methods for implicit tagging, wherein users' responses to the multimedia content are analysed in order to generate descriptive tags. We approach this problem through the modalities of facial expressions and EEG signals. We investigate tag validation and affective tagging using EEG signals. The former relies on the detection of event-related potentials triggered in response to the presentation of invalid tags alongside multimedia material. We demonstrate significant differences in users' EEG responses for valid versus invalid tags, and present results towards single-trial classification. For affective tagging, we propose methodologies to map EEG signals onto the valence-arousal space and perform both binary classification and regression into this space. We apply these methods in a real-time affective recommendation system. We also investigate the analysis of facial expressions for implicit tagging. This relies on a dynamic texture representation using non-rigid registration, which we first evaluate on the problem of facial action unit recognition. We present results on well-known datasets (with both posed and spontaneous expressions) comparable to the state of the art in the field. Finally, we present a multi-modal approach that fuses both modalities for affective tagging. We perform classification in the valence-arousal space based on these modalities and present results for both feature-level and decision-level fusion. We demonstrate improvement in the results when using both modalities, suggesting that the modalities contain complementary information.
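    As a small sketch of decision-level fusion of the two modalities described above (the weighting scheme, class layout, and toy inputs are assumptions, not the thesis setup):

```python
# Minimal sketch: decision-level fusion by averaging class probabilities from a
# facial-expression model and an EEG model. Weights and inputs are illustrative.
import numpy as np

def fuse_decisions(face_probs, eeg_probs, face_weight=0.5):
    """face_probs, eeg_probs: arrays (n_samples, n_classes) of per-modality probabilities."""
    fused = face_weight * face_probs + (1.0 - face_weight) * eeg_probs
    return fused.argmax(axis=1)   # predicted valence-arousal class per sample

# Toy usage with random probabilities standing in for real model outputs.
rng = np.random.default_rng(2)
face_p = rng.dirichlet(np.ones(4), size=10)
eeg_p = rng.dirichlet(np.ones(4), size=10)
labels = fuse_decisions(face_p, eeg_p)
```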