
    Image and Video-Based Autism Spectrum Disorder Detection via Deep Learning

    People with Autism Spectrum Disorder (ASD) show atypical attention to social stimuli and aberrant gaze when viewing images of the physical world. However, it is unknown how they perceive the world from a first-person perspective. In this study, we used machine learning to classify photos taken in three different categories (people, indoors, and outdoors) as either having been taken by individuals with ASD or by peers without ASD. Our classifier effectively discriminated photos from all three categories but was particularly successful at classifying photos of people, with >80% accuracy. Importantly, the visualization of our model revealed critical features that led to successful discrimination and showed that our model adopted a strategy similar to that of ASD experts. Furthermore, for the first time, we showed that photos taken by individuals with ASD contained fewer salient objects, especially in the central visual field. Notably, our model outperformed ASD experts at classifying these photos. Together, we demonstrate an effective and novel method that is capable of discerning photos taken by individuals with ASD and revealing aberrant visual attention in ASD from a unique first-person perspective. Our method may in turn provide an objective measure for evaluations of individuals with ASD. People with ASD also show atypical behavior when performing the same actions as peers without ASD. However, it is challenging to extract this feature efficiently from spatial and temporal information. In this study, we applied a Graph Convolutional Network (GCN) to 2D skeleton sequences to classify videos of the same actions (brushing teeth and washing the face) as recorded either from individuals with ASD or from peers without ASD. Furthermore, we adopted an adaptive graph mechanism that allows the model to learn a kernel flexibly and exclusively for each layer, so the model can learn more useful and robust features. Our classifier effectively reaches 80% accuracy. Our method may play an important role in the evaluations of individuals with ASD.
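    The adaptive graph mechanism described above can be pictured as a spatial graph convolution over skeleton joints, with a learnable per-layer residual added to the fixed skeleton adjacency. The following is a minimal PyTorch sketch under that assumption; module and variable names are illustrative and not taken from the authors' implementation.

```python
# Minimal sketch of an adaptive graph convolution over 2D skeleton sequences,
# assuming inputs of shape (batch, channels, frames, joints). Illustrative only.
import torch
import torch.nn as nn

class AdaptiveGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels, num_joints, adjacency):
        super().__init__()
        # Fixed skeleton adjacency plus a learnable, layer-specific residual graph.
        self.register_buffer("A", adjacency)                      # (J, J)
        self.A_learned = nn.Parameter(torch.zeros(num_joints, num_joints))
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):                                          # x: (N, C, T, J)
        A = self.A + self.A_learned                                # adaptive kernel for this layer
        x = torch.einsum("nctj,jk->nctk", x, A)                    # aggregate over joints
        return self.proj(x)

class SkeletonGCNClassifier(nn.Module):
    def __init__(self, adjacency, num_joints=17, num_classes=2):
        super().__init__()
        self.gcn1 = AdaptiveGraphConv(2, 64, num_joints, adjacency)
        self.gcn2 = AdaptiveGraphConv(64, 128, num_joints, adjacency)
        self.head = nn.Linear(128, num_classes)

    def forward(self, x):                                          # x: (N, 2, T, J) joint coordinates
        x = torch.relu(self.gcn1(x))
        x = torch.relu(self.gcn2(x))
        x = x.mean(dim=(2, 3))                                     # pool over time and joints
        return self.head(x)                                        # ASD vs. non-ASD logits
```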

    Face Image and Video Analysis in Biometrics and Health Applications

    Computer Vision (CV) enables computers and systems to derive meaningful information from acquired visual inputs, such as images and videos, and make decisions based on the extracted information. Its goal is to acquire, process, analyze, and understand the information by developing a theoretical and algorithmic model. Biometrics are distinctive and measurable human characteristics used to label or describe individuals by combining computer vision with knowledge of human physiology (e.g., face, iris, fingerprint) and behavior (e.g., gait, gaze, voice). The face is one of the most informative biometric traits. Many studies have investigated the human face from the perspectives of various disciplines, ranging from computer vision and deep learning to neuroscience and biometrics. In this work, we analyze face characteristics from digital images and videos in the areas of morphing attack and defense, and autism diagnosis. For face morphing attack generation, we proposed a transformer-based generative adversarial network that generates more visually realistic morphing attacks by combining different losses, such as face matching distance, facial landmark-based loss, perceptual loss, and pixel-wise mean squared error. In the face morphing attack detection study, we designed a fusion-based few-shot learning (FSL) method to learn discriminative features from face images for few-shot morphing attack detection (FS-MAD), and extended the current binary detection into multiclass classification, namely few-shot morphing attack fingerprinting (FS-MAF). In the autism diagnosis study, we developed a discriminative few-shot learning method to analyze hour-long video data and explored the fusion of facial dynamics for facial trait classification of autism spectrum disorder (ASD) at three severity levels. The results show outstanding performance of the proposed fusion-based few-shot framework on the dataset. In addition, we explored the possibility of performing face micro-expression spotting and feature analysis on autism video data to classify ASD and control groups. The results indicate the effectiveness of subtle facial expression changes for autism diagnosis.
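    The generation study combines several objectives into a single generator loss. A hedged sketch of such a combination is shown below, assuming pretrained identity, landmark, and perceptual networks; the extractors, weights, and function names are placeholders rather than the authors' settings.

```python
# Illustrative combination of the losses named in the abstract (face matching
# distance, landmark-based loss, perceptual loss, pixel-wise MSE) into one
# generator objective. Feature extractors and weights are placeholders.
import torch.nn.functional as F

def morphing_generator_loss(generated, target, face_embedder, landmark_net,
                            perceptual_net, weights=(1.0, 1.0, 1.0, 1.0)):
    w_id, w_lmk, w_perc, w_pix = weights
    # Face matching distance between identity embeddings.
    id_loss = 1.0 - F.cosine_similarity(face_embedder(generated),
                                        face_embedder(target), dim=-1).mean()
    # Facial landmark-based loss on predicted landmark coordinates.
    lmk_loss = F.l1_loss(landmark_net(generated), landmark_net(target))
    # Perceptual loss on deep features (e.g., from a pretrained VGG).
    perc_loss = F.l1_loss(perceptual_net(generated), perceptual_net(target))
    # Pixel-wise mean squared error.
    pix_loss = F.mse_loss(generated, target)
    return w_id * id_loss + w_lmk * lmk_loss + w_perc * perc_loss + w_pix * pix_loss
```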

    DEEP INFERENCE ON MULTI-SENSOR DATA

    Computer vision-based intelligent autonomous systems engage various types of sensors to perceive the world they navigate in. Vision systems perceive their environments through inferences on entities (structures, humans) and their attributes (pose, shape, materials) that are sensed using RGB and Near-InfraRed (NIR) cameras, LAser Detection And Ranging (LADAR), radar and so on. This leads to challenging and interesting problems in efficient data-capture, feature extraction, and attribute estimation, not only for RGB but various other sensors. In some cases, we encounter very limited amounts of labeled training data. In certain other scenarios we have sufficient data, but annotations are unavailable for supervised learning. This dissertation explores two approaches to learning under conditions of minimal to no ground truth. The first approach applies projections on training data that make learning efficient by improving training dynamics. The first and second topics in this dissertation belong to this category. The second approach makes learning without ground truth possible via knowledge transfer from a labeled source domain to an unlabeled target domain through projections to domain-invariant shared latent spaces. The third and fourth topics in this dissertation belong to this category. For the first topic we study the feasibility and efficacy of identifying shapes in LADAR data in several measurement modes. We present results on efficient parameter learning with less data (for both traditional machine learning as well as deep models) on LADAR images. We use a LADAR apparatus to obtain range information from a 3-D scene by emitting laser beams and collecting the reflected rays from target objects in the region of interest. The Agile Beam LADAR concept makes the measurement and interpretation process more efficient using a software-defined architecture that leverages computational imaging principles. Using these techniques, we show that object identification and scene understanding can be accurately performed in the LADAR measurement domain, thereby rendering the efforts of pixel-based scene reconstruction superfluous. Next, we explore the effectiveness of deep features extracted by Convolutional Neural Networks (CNNs) in the Discrete Cosine Transform (DCT) domain for various image classification tasks such as pedestrian and face detection, material identification and object recognition. We perform the DCT operation on the feature maps generated by convolutional layers in CNNs. We compare the performance of the same network with the same hyper-parameters with or without the DCT step. Our results indicate that a DCT operation incorporated into the network after the first convolution layer can have certain advantages such as convergence over fewer training epochs and sparser weight matrices that are more conducive to pruning and hashing techniques. Next, we present an adversarial deep domain adaptation (ADA)-based approach for training deep neural networks that fit 3D meshes on humans in monocular RGB input images. Estimating a 3D mesh from a 2D image is helpful in harvesting complete 3D information about body pose and shape. However, learning such an estimation task in a supervised way is challenging owing to the fact that ground truth 3D mesh parameters for real humans do not exist. We propose a domain adaptation based single-shot (no re-projection, no iterative refinement), end-to-end training approach with joint optimization on real and synthetic images on a shared common task.
Through joint inference on real and synthetic data, the network extracts domain-invariant features that are further used to estimate the 3D mesh parameters in a single shot with no supervision on real samples. While we compute a regression loss on synthetic samples with ground truth mesh parameters, knowledge is transferred from synthetic to real data through ADA without direct ground truth for supervision. Finally, we propose a partially supervised method for satellite image super-resolution by learning a unified representation of samples from different domains (captured by different sensors) in a shared latent space. The training samples are drawn from two datasets which we refer to as the source and target domains. The source domain consists of fewer samples which are of higher resolution and contain very detailed and accurate annotations. In contrast, samples from the target domain are low-resolution and the available ground truth is sparse. The pipeline consists of a feature extractor and a super-resolving module which are trained end-to-end. Using a deep feature extractor, we jointly learn (on two datasets) a common embedding space for all samples. Partial supervision is available for the samples in the source domain which have high-resolution ground truth. Adversarial supervision is used to successfully super-resolve low-resolution RGB satellite imagery from the target domain without direct paired supervision from high-resolution counterparts.
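    The DCT-after-the-first-convolution idea described in this abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch, assuming the 2D DCT is applied channel-wise via precomputed orthonormal DCT-II basis matrices; the surrounding network, layer sizes, and names are placeholders rather than the dissertation's actual architecture.

```python
# Sketch: insert a 2D DCT on the feature maps produced by the first conv layer.
# The DCT is applied per channel using precomputed DCT-II basis matrices.
import math
import torch
import torch.nn as nn

def dct_matrix(n):
    """Orthonormal DCT-II basis of size (n, n)."""
    k = torch.arange(n).unsqueeze(1).float()
    i = torch.arange(n).unsqueeze(0).float()
    basis = torch.cos(math.pi * (i + 0.5) * k / n)
    basis[0] *= math.sqrt(1.0 / n)
    basis[1:] *= math.sqrt(2.0 / n)
    return basis

class DCT2d(nn.Module):
    def __init__(self, height, width):
        super().__init__()
        self.register_buffer("Dh", dct_matrix(height))
        self.register_buffer("Dw", dct_matrix(width))

    def forward(self, x):                        # x: (N, C, H, W)
        return self.Dh @ x @ self.Dw.t()         # 2D DCT per channel

class ConvDCTNet(nn.Module):
    def __init__(self, num_classes, feat_hw=32):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.dct = DCT2d(feat_hw, feat_hw)       # DCT right after the first convolution
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                        # x: (N, 3, feat_hw, feat_hw)
        x = torch.relu(self.conv1(x))
        x = self.dct(x)
        x = torch.relu(self.conv2(x))
        return self.head(x.mean(dim=(2, 3)))     # global average pool, then classify
```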

    Textural features for fingerprint liveness detection

    The main topic of my research during these three years concerned biometrics, and in particular Fingerprint Liveness Detection (FLD), namely the recognition of fake fingerprints. Fingerprint spoofing is a topical issue, as evidenced by the release of the latest iPhone and Samsung Galaxy models with an embedded fingerprint reader as an alternative to passwords. Several videos posted on YouTube show how to violate these devices by using fake fingerprints, which demonstrates how the problem of vulnerability to spoofing constitutes a threat to existing fingerprint recognition systems. Despite the fact that many algorithms have been proposed so far, none of them has shown the ability to clearly discriminate between real and fake fingertips. In my work, after a study of the state of the art, I paid special attention to the so-called textural algorithms. I first used the LBP (Local Binary Pattern) algorithm and then worked on the introduction of the LPQ (Local Phase Quantization) and BSIF (Binarized Statistical Image Features) algorithms in the FLD field. In the last two years I worked especially on what we called the “user-specific” problem: in the extracted features we noticed the presence of characteristics related not only to liveness but also to the different users. We have been able to improve the obtained results by identifying and removing, at least partially, this user-specific characteristic. Since 2009 the Department of Electrical and Electronic Engineering of the University of Cagliari and the Department of Electrical and Computer Engineering of Clarkson University have organized the Fingerprint Liveness Detection Competition (LivDet). I have been involved in the organization of both the second and third editions of the competition (LivDet 2011 and LivDet 2013), and I am currently involved in the acquisition of live and fake fingerprints that will be inserted in three of the LivDet 2015 datasets.
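    As a concrete illustration of the textural approach named above, the sketch below extracts a uniform Local Binary Pattern (LBP) histogram per fingerprint image and trains a binary live/fake classifier. It is a hedged example using scikit-image and scikit-learn; the dataset handling, parameter values, and the SVM choice are assumptions, not the thesis configuration.

```python
# Textural liveness-detection sketch: uniform LBP histogram features + SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_image, points=8, radius=1):
    """Uniform LBP histogram used as a textural feature vector."""
    lbp = local_binary_pattern(gray_image, points, radius, method="uniform")
    n_bins = points + 2                           # uniform patterns plus the non-uniform bin
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist

def train_liveness_classifier(images, labels):
    """images: list of 2D grayscale arrays; labels: 1 = live, 0 = fake."""
    features = np.stack([lbp_histogram(img) for img in images])
    clf = SVC(kernel="rbf")
    clf.fit(features, labels)
    return clf
```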

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    This reprint focuses on applications that combine synthetic aperture radar and deep learning technology. It aims to further promote the development of SAR image intelligent interpretation technology. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day and all-weather working capacity gives it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecasting, and traffic monitoring. It is valuable and meaningful, therefore, to study SAR-based remote sensing applications. In recent years, deep learning, represented by convolutional neural networks, has promoted significant progress in the computer vision community, e.g., in face recognition, driverless vehicles, and the Internet of Things (IoT). Deep learning enables computational models with multiple processing layers to learn data representations at multiple levels of abstraction. This can greatly improve the performance of various applications. This reprint provides a platform for researchers to handle the above significant challenges and present their innovative and cutting-edge research results when applying deep learning to SAR, in various manuscript types, e.g., articles, letters, reviews and technical reports.

    The effect of work related mechanical stress on the peripheral temperature of the hand

    The evolution and developments in modern industry have resulted in a wide range of occupational activities, some of which can lead to industrial injuries. Due to the activities of occupational medicine, much progress has been made in transforming the way that operatives perform their tasks. However, there are still many occupations where manual tasks have become more repetitive, contributing to the development of conditions that affect the upper limbs. Repetitive Strain Injury is one classification of these conditions, related to the overuse of repetitive movements. Hand Arm Vibration Syndrome is a subtype of this classification directly related to the operation of instruments and machinery that involve vibration. These conditions affect a large number of individuals, and are costly in terms of work absence, loss of income and compensation. While such conditions can be difficult to avoid, they can be monitored and controlled, with prevention usually the least expensive solution. In medico-legal situations it may be difficult to determine the location or the degree of injury, and therefore determining the relevant compensation due is complicated by the absence of objective and quantifiable methods. This research is an investigation into the development of an objective, quantitative and reproducible diagnostic procedure for work-related upper limb disorders. A set of objective mechanical provocation tests for the hands, associated with vascular challenge, has been developed. Infrared thermal imaging was used to monitor the temperature changes using a well-defined capture protocol. Normal reference values have been measured, and a computational tool was used to facilitate the process and standardise image processing. These objective tests are reproducible and have demonstrated good discrimination (p<0.05) between groups of healthy controls and subjects with work-related injuries, though not between individuals. A maximum value for thermal symmetry of 0.5±0.3°C for the whole upper limbs has been established for use as a reference. The tests can be used to monitor occupations at risk, aiming to reduce the impact of these conditions, reducing work-related injury costs, and providing early detection. In a medico-legal setting this can also provide important objective information in proof of injury and ultimately in objectively establishing whether or not there is a case for compensation.
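    The thermal symmetry reference above can be illustrated with a small computation over calibrated thermal images. The sketch below assumes temperature arrays in °C for matched left and right limb regions; the region extraction and threshold handling are illustrative assumptions, not the study's protocol.

```python
# Sketch: thermal symmetry between matched left/right limb regions from
# calibrated infrared images (temperature arrays in degrees Celsius).
import numpy as np

def thermal_symmetry(left_region, right_region):
    """Absolute difference of mean temperatures (°C) over matched regions."""
    return abs(float(np.mean(left_region)) - float(np.mean(right_region)))

def flag_asymmetry(left_region, right_region, reference=0.5, tolerance=0.3):
    """Flag the measurement if asymmetry exceeds the reference upper bound."""
    delta = thermal_symmetry(left_region, right_region)
    return delta > reference + tolerance, delta
```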

    Deep Neural Networks and Data for Automated Driving

    This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, which face many challenges. How much data do we need for training and testing? How can we use synthetic data to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: how do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem, particularly for DNNs employed in automated driving: what are useful validation techniques, and what about safety? This book unites the views of both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification and, last but not least, safety are at its core. This book is unique: in its first part, an extended survey of all the relevant aspects is provided. The second part contains the detailed technical elaboration of the various questions mentioned above.