Face Image and Video Analysis in Biometrics and Health Applications
Computer Vision (CV) enables computers and systems to derive meaningful information from acquired visual inputs, such as images and videos, and make decisions based on the extracted information. Its goal is to acquire, process, analyze, and understand the information by developing a theoretical and algorithmic model. Biometrics are distinctive and measurable human characteristics used to label or describe individuals by combining computer vision with knowledge of human physiology (e.g., face, iris, fingerprint) and behavior (e.g., gait, gaze, voice). The face is one of the most informative biometric traits. Many studies have investigated the human face from the perspectives of various disciplines, ranging from computer vision and deep learning to neuroscience and biometrics. In this work, we analyze face characteristics from digital images and videos in the areas of morphing attack and defense, and autism diagnosis. For face morphing attack generation, we proposed a transformer-based generative adversarial network that generates more visually realistic morphing attacks by combining different losses, such as face matching distance, facial landmark-based loss, perceptual loss, and pixel-wise mean square error. In the face morphing attack detection study, we designed a fusion-based few-shot learning (FSL) method to learn discriminative features from face images for few-shot morphing attack detection (FS-MAD), and extended the current binary detection to multiclass classification, namely few-shot morphing attack fingerprinting (FS-MAF). In the autism diagnosis study, we developed a discriminative few-shot learning method to analyze hour-long video data and explored the fusion of facial dynamics for facial trait classification of autism spectrum disorder (ASD) at three severity levels. The results show outstanding performance of the proposed fusion-based few-shot framework on the dataset.
We further explored the possibility of performing face micro-expression spotting and feature analysis on autism video data to classify ASD and control groups. The results indicate the effectiveness of subtle facial expression changes for autism diagnosis.
Action-based Early Autism Diagnosis Using Contrastive Feature Learning
Autism, also known as Autism Spectrum Disorder (ASD), is a neurological
disorder. Its main symptoms include difficulty in (verbal and/or non-verbal)
communication, and rigid/repetitive behavior. These symptoms are often hard to
distinguish from the behavior of a typical (control) individual, so the
disorder often remains undiagnosed in early childhood, leading to delayed
treatment. Since the learning curve is steep at an early age, an early
diagnosis of autism could allow adequate interventions to be made at the right
time, which might positively affect the growth of an autistic child. Further,
the traditional methods of autism diagnosis require multiple visits to a
specialized psychiatrist, which can be time-consuming. In this
paper, we present a learning-based approach to automate autism diagnosis using
simple and small action video clips of subjects. This task is particularly
challenging because the amount of annotated data available is small, and the
variations between samples from the two categories (ASD and control) are
generally hard to distinguish. This is also evident from the poor performance
of a binary classifier learned using the cross-entropy loss on top of a
baseline encoder. To address this, we adopt contrastive feature learning in
both self-supervised and supervised learning frameworks, and show that these can lead to
a significant increase in the prediction accuracy of a binary classifier on
this task. We further validate this by conducting thorough experimental
analyses under different set-ups on two publicly available datasets.
Comment: This preprint has not undergone peer review (when applicable) or any
postsubmission improvements or corrections. The Version of Record of this
article is published in Multimedia Systems (2023), and is available online at
https://doi.org/10.1007/s00530-023-01132-
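As an illustration of the supervised contrastive objective mentioned in the abstract above, the following sketch computes a SupCon-style loss over a small batch of embeddings. It is not the paper's implementation: the function names, the temperature value, and the toy embeddings are assumptions chosen for illustration, and a real system would apply such a loss to encoder outputs for video clips.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """SupCon-style loss: pull same-label embeddings together, push others apart.

    For each anchor i with at least one positive p (same label, p != i), average
    -log( exp(s_ip / t) / sum_{a != i} exp(s_ia / t) ) over its positives,
    where s is the cosine similarity and t the (assumed) temperature.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With labels that match the geometry of the embeddings, the loss is close to zero; with labels that pair dissimilar embeddings, it is much larger, which is the signal a contrastive encoder is trained to reduce.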
Video-based Behavior Understanding of Children for Objective Diagnosis of Autism
One of the major diagnostic criteria for Autism Spectrum Disorder (ASD) is the recognition of stereotyped behaviors. However, it primarily relies on parental interviews and clinical observations, which result in a prolonged diagnosis cycle that prevents children with ASD from receiving timely treatment. To help clinicians speed up the diagnosis process, we propose a computer-vision-based solution. First, we collected and annotated a novel dataset for action recognition tasks in videos of children with ASD in an uncontrolled environment. Second, we propose a multi-modality fusion network based on 3D CNNs. In the first stage of our method, we pre-process the RGB videos to get the ROI (child) using the YOLOv5 and DeepSORT algorithms. For optical flow extraction, we use the RAFT algorithm. In the second stage, we perform extensive experiments on different deep learning frameworks to propose a baseline. In the last stage, a multi-modality-based late fusion network is proposed to classify and evaluate the performance of children with ASD. The results revealed that the multi-modality fusion network achieves the best accuracy compared to the other methods. The baseline results also demonstrate the potential of an action-recognition-based system to assist clinicians in a reliable, accurate, and timely diagnosis of ASD.
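The late fusion step described above can be sketched as a weighted average of the class probabilities produced by two streams. The abstract does not say how the fusion is computed, so score averaging, the function names, the `rgb_weight` parameter, and the example logits are all assumptions for illustration only.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Fuse two streams after classification by averaging their probabilities.

    rgb_weight is a hypothetical mixing weight: 1.0 trusts only the RGB
    stream, 0.0 only the optical-flow stream.
    Returns the fused probabilities and the index of the predicted class.
    """
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    fused = [
        rgb_weight * a + (1.0 - rgb_weight) * b for a, b in zip(p_rgb, p_flow)
    ]
    predicted = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted


# Example with made-up scores for three action classes.
fused, predicted = late_fusion([2.0, 1.0, 0.1], [0.5, 3.0, 0.2])
```

Because each stream is classified independently before the scores are combined, either stream can be retrained or replaced without touching the other, which is the usual reason for choosing late over early fusion.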
Salient Object Detection Techniques in Computer Vision-A Survey.
Detection and localization of regions of images that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the fields of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly categorized into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both the conventional and the deep learning-based categories are reviewed in detail. Relevant saliency modeling trends, with key issues, core techniques, and the scope for future research work, are discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases on some large-scale public datasets. Different metrics used to assess the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards the end.
Deep Learning for Spatial and Temporal Video Localization
In this thesis, we propose to develop novel deep learning algorithms for video localization tasks, including spatio-temporal action localization, temporal action localization, and spatio-temporal visual grounding, which require localizing the spatio-temporal or temporal locations of targets in videos.
First, we propose a new Progressive Cross-stream Cooperation (PCSC) framework for the spatio-temporal action localization task. The basic idea is to utilize both spatial region (resp., temporal segment proposals) and features from one stream (i.e., the Flow/RGB stream) to help another stream (i.e., the RGB/Flow stream) to iteratively generate better bounding boxes in the spatial domain (resp., temporal segments in the temporal domain). By first using our newly proposed PCSC framework for spatial localization and then applying our temporal PCSC framework for temporal localization, the action localization results are progressively improved.
Second, we propose a progressive cross-granularity cooperation (PCG-TAL) framework to effectively take advantage of complementarity between the anchor-based and frame-based paradigms, as well as between two-view clues (i.e., appearance and motion) for the temporal action localization task. The whole framework can be learned in an end-to-end fashion, whilst the temporal action localization performance can be gradually boosted in a progressive manner.
Finally, we propose a two-step visual-linguistic transformer-based framework called STVGBert for the spatio-temporal visual grounding task, which consists of a Spatial Visual Grounding network (SVG-net) and a Temporal Boundary Refinement network (TBR-net). Different from the existing works for the video grounding tasks, our proposed framework does not rely on any pre-trained object detector. For all our proposed approaches, we conduct extensive experiments on publicly available datasets to demonstrate their effectiveness.
WATCHING PEOPLE: ALGORITHMS TO STUDY HUMAN MOTION AND ACTIVITIES
Nowadays, human motion analysis is one of the most active research topics in Computer Vision, and it is receiving increasing attention from both the industrial and scientific communities.
The growing interest in human motion analysis is motivated by the increasing number of promising applications, ranging from surveillance, human–computer interaction, and virtual reality to healthcare, sports, computer games, and video conferencing, to name a few.
The aim of this thesis is to give an overview of the various tasks involved in visual motion analysis of the human body and to present the issues and possible solutions related to it.
In this thesis, visual motion analysis is categorized into three major areas related to the interpretation of human motion: tracking of human motion using a virtual pan-tilt-zoom (vPTZ) camera, recognition of human motions, and segmentation of human behaviors.
In the field of human motion tracking, a virtual environment for PTZ cameras (vPTZ) is presented to overcome the mechanical limitations of PTZ cameras. The vPTZ is built on equirectangular images acquired by 360° cameras, and it allows not only the development of pedestrian tracking algorithms but also the comparison of their performance. On the basis of this virtual environment, three novel pedestrian tracking algorithms for 360° cameras were developed, two of which adopt a tracking-by-detection approach while the last adopts a Bayesian approach.
The action recognition problem is addressed by an algorithm that represents actions in terms of multinomial distributions of frequent sequential patterns of different length. Frequent sequential patterns are series of data descriptors that occur many times in the data. The proposed method learns a codebook of frequent sequential patterns by means of an apriori-like algorithm. An action is then represented with a Bag-of-Frequent-Sequential-Patterns approach.
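The idea of a Bag-of-Frequent-Sequential-Patterns representation can be sketched as follows. This is a simplified illustration, not the thesis's algorithm: it assumes that descriptors have already been quantized into discrete symbols, that patterns are contiguous subsequences, and that support is the number of training sequences containing a pattern. The function names and toy sequences are hypothetical.

```python
from collections import Counter


def mine_codebook(sequences, min_support, max_length):
    """Apriori-like mining of frequent contiguous patterns, level by level.

    A pattern of length k is counted only if both of its (k - 1)-long
    sub-patterns were frequent at the previous level (apriori pruning).
    """
    codebook = []
    frequent_prev = None
    for length in range(1, max_length + 1):
        support = Counter()
        for seq in sequences:
            # Distinct windows, so each sequence adds at most 1 to a support.
            windows = {
                tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)
            }
            for w in windows:
                if frequent_prev is None or (
                    w[:-1] in frequent_prev and w[1:] in frequent_prev
                ):
                    support[w] += 1
        frequent = {p for p, c in support.items() if c >= min_support}
        if not frequent:
            break  # no longer pattern can be frequent
        codebook.extend(sorted(frequent))
        frequent_prev = frequent
    return codebook


def bag_of_patterns(seq, codebook):
    """Normalized histogram of codebook-pattern occurrences in one sequence."""
    counts = []
    for pattern in codebook:
        k = len(pattern)
        counts.append(
            sum(1 for i in range(len(seq) - k + 1) if tuple(seq[i:i + k]) == pattern)
        )
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy symbol sequences standing in for quantized frame descriptors.
train = [list("abcab"), list("abcc"), list("cabd")]
codebook = mine_codebook(train, min_support=2, max_length=3)
histogram = bag_of_patterns(list("abcab"), codebook)
```

The resulting histogram is a multinomial distribution over patterns of different lengths, which can then be fed to any standard classifier.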
In the last part of this thesis, a methodology to semi-automatically annotate behavioral data given a small set of manually annotated data is presented. The resulting methodology is not only effective in the semi-automated annotation task but can also be used in the presence of abnormal behaviors, as demonstrated empirically by testing the system on data collected from children affected by neuro-developmental disorders.
IDET: Iterative Difference-Enhanced Transformers for High-Quality Change Detection
Change detection (CD) aims to detect change regions within an image pair
captured at different times, playing a significant role for diverse real-world
applications. Nevertheless, most existing works focus on designing advanced
network architectures to map the feature difference to the final change map
while ignoring the influence of the quality of the feature difference. In this
paper, we study CD from a new perspective, i.e., how to optimize the
feature difference to highlight changes and suppress unchanged regions, and
propose a novel module denoted as iterative difference-enhanced transformers
(IDET). IDET contains three transformers: two transformers for extracting the
long-range information of the two images and one transformer for enhancing the
feature difference. In contrast to the previous transformers, the third
transformer takes the outputs of the first two transformers to guide the
enhancement of the feature difference iteratively. To achieve more effective
refinement, we further propose the multi-scale IDET-based change detection that
uses multi-scale representations of the images for multiple feature difference
refinements and proposes a coarse-to-fine fusion strategy to combine all
refinements. Our final CD method outperforms seven state-of-the-art methods on
six large-scale datasets under diverse application scenarios, which
demonstrates the importance of feature difference enhancements and the
effectiveness of IDET.
Comment: conferenc
Recent Advances in Image Restoration with Applications to Real World Problems
In the past few decades, imaging hardware has improved tremendously in terms of resolution, enabling the widespread use of images in many diverse applications on Earth and in planetary missions. However, practical issues associated with image acquisition still affect image quality. Some of these issues, such as blurring, measurement noise, mosaicing artifacts, and low spatial or spectral resolution, can seriously affect the accuracy of the aforementioned applications. This book intends to provide the reader with a glimpse of the latest developments and recent advances in image restoration, which includes image super-resolution, image fusion to enhance spatial, spectral, and temporal resolution, and the generation of synthetic images using deep learning techniques. Some practical applications are also included.
Biometric Systems
Because of the accelerating progress in biometrics research and the latest nation-state threats to security, this book's publication is not only timely but also much needed. This volume contains seventeen peer-reviewed chapters reporting the state of the art in biometrics research: security issues, signature verification, fingerprint identification, wrist vascular biometrics, ear detection, face detection and identification (including a new survey of face recognition), person re-identification, electrocardiogram (ECG) recognition, and several multi-modal systems. This book will be a valuable resource for graduate students, engineers, and researchers interested in understanding and investigating this important field of study.