327 research outputs found

    Deep Neural Network and Data Augmentation Methodology for off-axis iris segmentation in wearable headsets

    A data augmentation methodology is presented and applied to generate a large dataset of off-axis iris regions and train a low-complexity deep neural network. Despite its low complexity, the resulting network achieves a high level of accuracy in iris region segmentation for challenging off-axis eye patches. Interestingly, this network is also shown to achieve high levels of performance for regular, frontal segmentation of iris regions, comparing favorably with state-of-the-art techniques of significantly higher complexity. Due to its lower complexity, this network is well suited for deployment in embedded applications such as augmented and mixed reality headsets.

    Computationally efficient deformable 3D object tracking with a monocular RGB camera

    182 p. Monocular RGB cameras are present in most settings and devices, including embedded environments like robots, cars and home automation. Most of these environments have in common a significant presence of human operators with whom the system has to interact. This context provides the motivation to use the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications. However, monocular images do not have depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene using a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene. Several methods attempt to solve this task, including modern regression methods based on Deep Neural Networks. However, despite their strong results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like embedded or onboard systems present in robotics and automotive applications, among others. This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and onboard systems used in the automotive industry. The information extracted from such devices gives valuable insight into human behaviour, a crucial element in improving human-machine interaction. We tested the proposed approaches in different challenging application fields like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices.

    Estimation of the QoE for video streaming services based on facial expressions and gaze direction

    As multimedia technologies evolve, the need to control their quality becomes ever more important, making Quality of Experience (QoE) measurement a key priority. Machine Learning (ML) can support this task by providing models to analyse the information extracted from multimedia. The applications of ML models can be divided into the following categories: 1) QoE modelling: ML is used to define QoE models which provide an output (e.g., perceived QoE score) for any given input (e.g., QoE influence factor). 2) QoE monitoring in case of encrypted traffic: ML is used to analyse passively monitored traffic data to obtain insight into degradations perceived by end-users. 3) Big data analytics: ML is used to extract meaningful and useful information from the collected data, which can then be converted into actionable knowledge and used in managing QoE. QoE estimation can be carried out using two approaches: the objective approach and the subjective one. As the names suggest, they refer to the kind of information that the model analyses. The objective approach analyses objective features extracted from the network connection and from the media used. The state of the art also includes approaches that use features extracted from human behaviour as objective parameters. The subjective approach, instead, is based on ratings, where participants are asked to rate the perceived quality using different scales. This approach is time-consuming, and for this reason not all users agree to complete the questionnaire. The direct evolution of this approach is therefore the adoption of ML models: a model can substitute for the questionnaire and evaluate the QoE, depending on the data that it analyses. By modelling the human response to the perceived quality of multimedia, QoE researchers found that different parameters can be extracted from users, such as the Electroencephalogram (EEG), the Electrocardiogram (ECG), and brain waves. The main problem with these techniques is the hardware: the user must wear electrodes for ECG and EEG, and even if the results obtained from these methods are relevant, their use in a real context may not be feasible. For this reason, my studies have focused on developing a completely unobtrusive Machine Learning framework based on facial reactions.

    Biometric Presentation Attack Detection for Mobile Devices Using Gaze Information

    Facial recognition systems are among the most widely deployed in biometric applications. However, such systems are vulnerable to presentation attacks (spoofing), where a person tries to disguise themselves as someone else by mimicking their biometric data and thereby gain access to the system. Significant research attention has been directed toward developing robust strategies for detecting such attacks and thus assuring the security of these systems in real-world applications. This thesis is focused on presentation attack detection for face recognition systems using a gaze tracking approach. The proposed challenge-response presentation attack detection system assesses the gaze of the user in response to a randomly moving stimulus on the screen. The user is required to track the moving stimulus with their gaze using natural head/eye movements. If the response is adequately similar to the challenge, the access attempt is seen as genuine. The attack scenarios considered in this work included the use of hand-held displayed photos, 2D masks, and 3D masks. Due to the nature of the proposed challenge-response approaches for presentation attack detection, none of the existing public databases were appropriate and a new database has been collected. The Kent Gaze Dynamics Database (KGDD) consists of 2,400 sets of genuine and attack-based presentation attempts collected from 80 participants. The use of a mobile device was simulated on a desktop PC for two possible geometries corresponding to mobile phone and tablet devices. Three different types of challenge trajectories were used in this data collection exercise. A number of novel gaze-based features were explored to develop the presentation attack detection algorithm. Initial experiments using the KGDD provided an encouraging indication of the potential of the proposed system for attack detection. In order to explore the feasibility of the scheme on a real hand-held device, another database, the Mobile KGDD (MKGDD), was collected from 30 participants using a single mobile device (Google Nexus 6), to test the proposed features. Comprehensive experimental analysis has been performed on the two collected databases for each of the proposed features. Performance evaluation results indicate that the proposed gaze-based features are effective in discriminating between genuine and presentation attack attempts.

    A Review and Analysis of Eye-Gaze Estimation Systems, Algorithms and Performance Evaluation Methods in Consumer Platforms

    This paper reviews research on eye gaze estimation techniques and applications, which has progressed in diverse ways over the past two decades. Several generic eye gaze use-cases are identified: desktop, TV, head-mounted, automotive and handheld devices. Analysis of the literature leads to the identification of several platform-specific factors that influence gaze tracking accuracy. A key outcome from this review is the realization of a need to develop standardized methodologies for performance evaluation of gaze tracking systems and achieve consistency in their specification and comparative evaluation. To address this need, the concept of a methodological framework for practical evaluation of different gaze tracking systems is proposed. Comment: 25 pages, 13 figures, Accepted for publication in IEEE Access in July 201

    Deep learning systems for estimating visual attention in robot-assisted therapy of children with autism and intellectual disability

    Recent studies suggest that some children with autism prefer robots as tutors for improving their social interaction and communication abilities, which are impaired due to their disorder. Indeed, research has focused on developing a very promising form of intervention named Robot-Assisted Therapy. This area of intervention poses many challenges, including the necessary flexibility and adaptability to real unconstrained therapeutic settings, which are different from the constrained lab settings where most of the technology is typically tested. Among the most common impairments of children with autism and intellectual disability is social attention, which includes difficulties in establishing the correct visual focus of attention. This article presents an investigation into the use of novel deep learning neural network architectures for automatically estimating whether the child is focusing their visual attention on the robot during a therapy session, which is an indicator of their engagement. To study the application, the authors gathered data from a clinical experiment in an unconstrained setting, which provided low-resolution videos recorded by the robot camera during the child–robot interaction. Two deep learning approaches are implemented in several variants and compared with a standard algorithm for face detection to verify the feasibility of estimating the status of the child directly from the robot sensors without relying on bulky external settings, which can distress the child with autism. One of the proposed approaches demonstrated very high accuracy, and it can be used for off-line continuous assessment during the therapy or for autonomously adapting the intervention in future robots with better computational capabilities.

    Eye-tracking assistive technologies for individuals with amyotrophic lateral sclerosis

    Amyotrophic lateral sclerosis, also known as ALS, is a progressive nervous system disorder that affects nerve cells in the brain and spinal cord, resulting in the loss of muscle control. For individuals with ALS whose mobility is limited to the movement of the eyes, eye-tracking-based applications can be used to achieve some basic tasks with certain digital interfaces. This paper presents a review of existing eye-tracking software and hardware, through which their application as an assistive technology to cope with ALS is sketched. Eye-tracking also provides a suitable alternative for controlling game elements. Furthermore, artificial intelligence has been utilized to improve eye-tracking technology, with significant improvement in calibration and accuracy. Gaps in the literature are highlighted in the study to offer a direction for future research.