
    Computational Methods for Measurement of Visual Attention from Videos towards Large-Scale Behavioral Analysis

    Visual attention is a central aspect of human social behavior, visual navigation, and interaction with the world, revealing information about a person's social, cognitive, and affective states. Although monitor-based and wearable eye trackers are widely available, they are insufficient for large-scale collection of naturalistic gaze data in face-to-face social interactions or during interactions with 3D environments. Wearable eye trackers are burdensome to participants and raise issues of calibration, compliance, cost, and battery life. The ability to automatically measure attention from ordinary videos would deliver scalable, dense, and objective measurements for use in practice. This thesis investigates computational methods for measuring visual attention from videos using computer vision, and their use for quantifying visual social cues such as eye contact and joint attention. Specifically, three methods are investigated. First, I present methods for detecting looks to the camera in first-person video and their use for eye contact detection; experimental results show that the presented method achieves the first human expert-level detection of eye contact. Second, I develop a method for tracking heads in 3D space to measure attentional shifts. Lastly, I propose spatiotemporal deep neural networks for detecting time-varying attention targets in video and present their application to the detection of shared attention and joint attention. The method achieves state-of-the-art results on several benchmark datasets for attention measurement, as well as the first empirical result on clinically relevant gaze-shift classification. The presented approaches link gaze estimation to the broader tasks of action recognition and dynamic visual scene understanding, and hold potential as useful tools for understanding attention in contexts such as human social interactions, skill assessment, and human-robot interaction. (Ph.D. thesis)
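The geometric intuition behind detecting looks to the camera can be sketched in a few lines. The following is a hypothetical simplification, not the thesis's learned detector: it treats the estimated gaze direction as a unit vector derived from yaw/pitch angles and thresholds its angular distance from the camera's optical axis (the function names and the 10° threshold are illustrative assumptions).

```python
import math

def eye_contact_score(yaw_deg, pitch_deg):
    """Angular distance (degrees) between an estimated gaze direction and
    the camera's optical axis; 0 means looking straight at the camera.
    Hypothetical simplification of a learned eye-contact detector."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    # Gaze as a unit vector in camera coordinates (z points out of the lens).
    gaze = (math.sin(yaw) * math.cos(pitch),
            math.sin(pitch),
            math.cos(yaw) * math.cos(pitch))
    camera_axis = (0.0, 0.0, 1.0)
    dot = sum(g * c for g, c in zip(gaze, camera_axis))
    # Clamp before acos to guard against floating-point drift.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def is_eye_contact(yaw_deg, pitch_deg, threshold_deg=10.0):
    """Binary eye-contact decision via an (assumed) angular threshold."""
    return eye_contact_score(yaw_deg, pitch_deg) <= threshold_deg
```

A learned detector replaces this fixed threshold with a classifier trained on appearance, which is what makes human expert-level performance attainable; the geometry above only illustrates the quantity being classified.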

    Multi-View Digital Representation Of Social Behaviours In Children And Action Recognition Methods

    Autism spectrum disorder (ASD) affects at least 1% of children globally. It is partially defined by delays in social behaviours such as eye contact and joint attention during social interaction, and is associated with evidence of reduced heart rate variability (HRV) in both static and socially stressful environments. Currently, no validated artificial intelligence or signal processing algorithms are available to objectively quantify behavioural and physiological markers in unrestricted interactive play environments to assist in the diagnosis of ASD. This thesis proposes that social behavioural and physiological markers of children with ASD can be objectively quantified through a synergistic digital approach drawing on multi-modal and multi-view data sources. First, a novel deep learning (DL) framework for social behaviour recognition using a fusion of multi-view and multi-modal predictions is proposed. It uses true-colour images and moving-trajectory (optical flow) images extracted from fixed-camera video recordings to detect eye contact between children and caregivers in free play, while elucidating distinctive digital features of eye contact behaviour across individual social interaction settings. Moreover, for the first time, a support vector machine model with feature selection is implemented, along with statistical analysis, to identify effective facial features and facial orientations for identifying ASD during joint attention episodes in free play. Furthermore, a customised NeuroKit2 toolbox was validated using the open-source QT database and a clinical baseline social interaction task. This toolbox automates the extraction of HRV metrics and enables between-group comparisons of physiological markers. The work highlights the importance of developing explainable algorithms that objectively quantify multi-modal digital markers. It offers the potential for digitalised phenotypes to aid the assessment of ASD and intervention in naturalistic social interaction.
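The HRV metrics that toolboxes such as NeuroKit2 automate are, at their core, simple time-domain statistics over RR intervals. The following minimal sketch (not the thesis's customised toolbox; the function name and chosen metrics are assumptions) computes two standard measures, SDNN and RMSSD, from a list of RR intervals in milliseconds:

```python
import math

def hrv_metrics(rr_ms):
    """Time-domain HRV metrics from RR intervals (milliseconds).
    Simplified analogue of what HRV toolboxes like NeuroKit2 automate."""
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of all RR intervals
    # (overall variability).
    sdnn = math.sqrt(sum((rr - mean_rr) ** 2 for rr in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive differences
    # (short-term, parasympathetically mediated variability).
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d ** 2 for d in diffs) / len(diffs))
    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}
```

Between-group comparisons then reduce to computing such metrics per participant and applying conventional statistical tests across the ASD and control groups.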