73 research outputs found

    Ubiquitous Technologies for Emotion Recognition

    Emotions play a very important role in how we think and behave. As such, the emotions we feel every day can compel us to act and influence the decisions and plans we make about our lives. Being able to measure, analyze, and better comprehend how or why our emotions may change is thus of much relevance to understand human behavior and its consequences. Despite the great efforts made in the past in the study of human emotions, it is only now, with the advent of wearable, mobile, and ubiquitous technologies, that we can aim to sense and recognize emotions, continuously and in real time. This book brings together the latest experiences, findings, and developments regarding ubiquitous sensing, modeling, and the recognition of human emotions

    Behavior Monitoring Using Visual Data and Immersive Environments

    University of Minnesota Ph.D. dissertation. August 2017. Major: Computer Science. Advisor: Nikolaos Papanikolopoulos. 1 computer file (PDF); viii, 99 pages. Mental health disorders are the leading cause of disability in the United States and Canada, accounting for 25 percent of all years of life lost to disability and premature mortality (Disability Adjusted Life Years or DALYs). Furthermore, in the United States alone, spending for mental disorder related care amounted to approximately $201 billion in 2013. Given these costs, significant effort has been spent on researching ways to mitigate the detrimental effects of mental illness. Commonly, observational studies are employed in research on mental disorders. However, observers must watch activities, either live or recorded, and then code the behavior. This process is often long and requires significant effort. Automating these kinds of labor-intensive processes can allow these studies to be performed more effectively. This thesis presents efforts to use computer vision and modern interactive technologies to aid in the study of mental disorders. Motor stereotypies are a class of behavior known to co-occur in some patients diagnosed with autism spectrum disorders. Results are presented for activity classification in these behaviors. Behaviors in the context of environment, setup and task were also explored in relation to obsessive compulsive disorder (OCD). Cleaning compulsions are a known symptom of some persons with OCD. Techniques were created to automate coding of handwashing behavior as part of an OCD study to understand the difference between subjects with different diagnoses. Instrumenting the experiment and coding the videos was a limiting factor in this study. Varied and repeatable environments can be enabled through the use of virtual reality. An end-to-end platform was created to investigate this approach. This system allows the creation of immersive environments that are capable of eliciting symptoms. By controlling the stimulus presented and observing the reaction in a simulated system, new ways of assessment are developed. Evaluation was performed to measure the ability to monitor subject behavior and a protocol was established for the system's future use

    Visual and Camera Sensors

    This book includes 13 papers published in Special Issue ("Visual and Camera Sensors") of the journal Sensors. The goal of this Special Issue was to invite high-quality, state-of-the-art research papers dealing with challenging issues in visual and camera sensors

    Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

    This dissertation addresses the problem of learning video representations, which is defined here as transforming the video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, and by a 3D graph with pixels as nodes. This dissertation contributes in proposing a set of models to localize, track, segment, recognize and assess actions such as (1) image-set models via aggregating subset features given by regularizing normalized CNNs, (2) image-set models via inter-frame principal recovery and sparsely coding residual actions, (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with motion model added, (4) spatiotemporal models 3D graph and 3D CNN to model time as a space dimension, (5) supervised hashing by jointly learning embedding and quantization, respectively. State-of-the-art performances are achieved for tasks such as quantifying facial pain and human diving. 
Primary conclusions of this dissertation are categorized as follows: (i) An image set can capture facial actions that are about collective representation; (ii) Sparse and low-rank representations can have the expression, identity and pose cues untangled and can be learned via an image-set model and also a linear model; (iii) The norm is related to recognizability; similarity metrics and loss functions matter; (iv) Combining the MIL-based boosting tracker with the Particle Filter motion model induces a good trade-off between appearance similarity and motion consistency; (v) Segmenting an object locally makes it amenable to assigning shape priors; it is feasible to learn knowledge such as shape priors online from Web data with weak supervision; (vi) Representing videos as 3D graphs works locally in both space and time; 3D CNNs work effectively when given temporally meaningful clips as input; (vii) Richly labeled images or videos help to learn better hash functions after learning binary embedded codes than random projections do. In addition, models proposed for videos can be adapted to other sequential images such as volumetric medical images which are not included in this dissertation

    Learning discriminative features for human motion understanding

    Human motion understanding has attracted considerable interest in recent research for its applications to video surveillance, content-based search and healthcare. With different capturing methods, human motion can be recorded in various forms (e.g. skeletal data, video, image, etc.). Compared to the 2D video and image, skeletal data recorded by motion capture device contains full 3D movement information. To begin with, we first look into a gait motion analysis problem based on 3D skeletal data. We propose an automatic framework for identifying musculoskeletal and neurological disorders among older people based on 3D skeletal motion data. In this framework, a feature selection strategy and two new gait features are proposed to choose an optimal feature set from the input features to optimise classification accuracy. Due to self-occlusion caused by single shooting angle, 2D video and image are not able to record full 3D geometric information. Therefore, viewpoint variation dramatically affects the performance on lots of 2D based applications (e.g. arbitrary view action recognition and image-based 3D human shape reconstruction). Leveraging view-invariance from the 3D model is a popular idea to improve the performance on 2D computer vision problems. Therefore, in the second contribution, we adopt 3D models built with computer graphics technology to assist in solving the problem of arbitrary view action recognition. As a solution, a new transfer dictionary learning framework that utilises computer graphics technologies to synthesise realistic 2D and 3D training videos is proposed, which can project a real-world 2D video into a view-invariant sparse representation. In the third contribution, 3D models are utilised to build an end-to-end 3D human shape reconstruction system, which can recover the 3D human shape from a single image without any prior parametric model. 
In contrast to most existing methods that calculate 3D joint locations, the method proposed in this thesis can produce a richer and more useful point cloud based representation. Synthesised high-quality 2D images and dense 3D point clouds are used to train a CNN-based encoder and 3D regression module. It can be concluded that the methods introduced in this thesis try to explore human motion understanding from 3D to 2D. We investigate how to compensate for the lack of full geometric information in 2D based applications with view-invariance learnt from 3D models

    Advances in video motion analysis research for mature and emerging application areas

    Intelligent Sensors for Human Motion Analysis

    The book, "Intelligent Sensors for Human Motion Analysis," contains 17 articles published in the Special Issue of the Sensors journal. These articles deal with many aspects related to the analysis of human movement. New techniques and methods for pose estimation, gait recognition, and fall detection have been proposed and verified. Some of them will trigger further research, and some may become the backbone of commercial systems

    3D-3D Deformable Registration and Deep Learning Segmentation based Neck Diseases Analysis in MRI

    Whiplash, cervical dystonia (CD), neck pain and work-related upper limb disorder (WRULD) are the most common diseases in the cervical region. Headaches, stiffness, sensory disturbance to the legs and arms, optical problems, aching in the back and shoulder, and auditory and visual problems are common symptoms seen in patients with these diseases. CD patients may also suffer tormenting spasticity in some neck muscles, with the symptoms possibly being acute and persisting for a long time, sometimes a lifetime. Whiplash-associated disorders (WADs) may occur due to sudden forward and backward movements of the head and neck occurring during a sporting activity or vehicle or domestic accident. These diseases affect private industries, insurance companies and governments, with the socio-economic costs significantly related to work absences, long-term sick leave, early disability and disability support pensions, health care expenses, reduced productivity and insurance claims. Therefore, diagnosing and treating neck-related diseases are important issues in clinical practice. The reason for these afflictions resulting from accident is the impairment of the cervical muscles which undergo atrophy or pseudo-hypertrophy due to fat infiltrating into them. These morphological changes have to be determined by identifying and quantifying their bio-markers before applying any medical intervention. Volumetric studies of neck muscles are reliable indicators of the proper treatments to apply. Radiation therapy, chemotherapy, injection of a toxin or surgery could be possible ways of treating these diseases. However, the dosages required should be precise because the neck region contains some sensitive organs, such as nerves, blood vessels and the trachea and spinal cord. Image registration and deep learning-based segmentation can help to determine appropriate treatments by analyzing the neck muscles. 
However, this is a challenging task for medical images due to complexities such as many muscles crossing multiple joints and attaching to many bones. Also, their shapes and sizes vary greatly across populations whereas their cross-sectional areas (CSAs) do not change in proportion to the heights and weights of individuals, with their sizes varying more significantly between males and females than ages. Therefore, the neck's anatomical variabilities are much greater than those of other parts of the human body. Some other challenges which make analyzing neck muscles very difficult are their compactness, similar gray-level appearances, intra-muscular fat, sliding due to cardiac and respiratory motions, false boundaries created by intramuscular fat, low resolution and contrast in medical images, noise, inhomogeneity and background clutter with the same composition and intensity. Furthermore, a patient's mode, position and neck movements during the capture of an image create variability. However, very little significant research work has been conducted on analyzing neck muscles. Although previous image registration efforts form a strong basis for many medical applications, none can satisfy the requirements of all of them because of the challenges associated with their implementation and low accuracy which could be due to anatomical complexities and variabilities or the artefacts of imaging devices. In existing methods, multi-resolution- and heuristic-based methods are popular. However, the above issues cause conventional multi-resolution-based registration methods to be trapped in local minima due to their low degrees of freedom in their geometrical transforms. Although heuristic-based methods are good at handling large mismatches, they require pre-segmentation and are computationally expensive. Also, current deformable methods often face statistical instability problems and many local optima when dealing with small mismatches. 
On the other hand, deep learning-based methods have achieved significant success over the last few years. Although a deeper network can learn more complex features and yields better performances, its depth cannot be increased as this would cause the gradient to vanish during training and result in training difficulties. Recently, researchers have focused on attention mechanisms for deep learning but current attention models face a challenge in the case of an application with compact and similar small multiple classes, large variability, low contrast and noise. The focus of this dissertation is on the design of 3D-3D image registration approaches as well as deep learning-based semantic segmentation methods for analyzing neck muscles. In the first part of this thesis, a novel object-constrained hierarchical registration framework for aligning inter-subject neck muscles is proposed. Firstly, to handle large-scale local minima, it uses a coarse registration technique which optimizes a new edge position difference (EPD) similarity measure to align large mismatches. Also, a new transformation based on the discrete periodic spline wavelet (DPSW), affine and free-form-deformation (FFD) transformations are exploited. Secondly, to avoid the monotonous nature of using transformations in multiple stages, a fine registration technique, which uses a double-pushing system by changing the edges in the EPD and switching the transformation's resolutions, is designed to align small mismatches. The EPD helps in both the coarse and fine techniques to implement object-constrained registration via controlling edges which is not possible using traditional similarity measures. Experiments are performed on clinical 3D magnetic resonance imaging (MRI) scans of the neck, with the results showing that the EPD is more effective than the mutual information (MI) and the sum of squared differences (SSD) measures in terms of the volumetric Dice similarity coefficient (DSC). 
Also, the proposed method is compared with two state-of-the-art approaches with ablation studies of inter-subject deformable registration and achieves better accuracy, robustness and consistency. However, as this method is computationally complex and has a problem handling large-scale anatomical variabilities, another 3D-3D registration framework with two novel contributions is proposed in the second part of this thesis. Firstly, a two-stage heuristic search optimization technique for handling large mismatches, which uses a minimal user hypothesis regarding these mismatches and is computationally fast, is introduced. It brings a moving image hierarchically closer to a fixed one using MI and EPD similarity measures in the coarse and fine stages, respectively, while the images do not require pre-segmentation as is necessary in traditional heuristic optimization-based techniques. Secondly, a region of interest (ROI) EPD-based registration framework for handling small mismatches using salient anatomical information (AI), in which a convex objective function is formed through a unique shape created from the desired objects in the ROI, is proposed. It is compared with two state-of-the-art methods on a neck dataset, with the results showing that it is superior in terms of accuracy and is computationally fast. In the last part of this thesis, an evaluation study of recent U-Net-based convolutional neural networks (CNNs) is performed on a neck dataset. It comprises 6 recent models, the U-Net, U-Net with a conditional random field (CRF-Unet), attention U-Net (A-Unet), nested U-Net or U-Net++, multi-feature pyramid (MFP)-Unet and recurrent residual U-Net (R2Unet) and 4 with more comprehensive modifications, the multi-scale U-Net (MS-Unet), parallel multi-scale U-Net (PMSUnet), recurrent residual attention U-Net (R2A-Unet) and R2A-Unet++ in neck muscles segmentation, with analyses of the numerical results indicating that the R2Unet architecture achieves the best accuracy. 
Also, two deep learning-based semantic segmentation approaches are proposed. In the first, a new two-stage U-Net++ (TS-UNet++) uses two different types of deep CNNs (DCNNs), rather than a single type as in the traditional multi-stage method, with the U-Net++ in the first stage and the U-Net in the second. More convolutional blocks are added after the input and before the output layers of the multi-stage approach to better extract the low- and high-level features. A new concatenation-based fusion structure, which is incorporated in the architecture to allow deep supervision, helps to increase the depth of the network without aggravating the gradient-vanishing problem. Then, more convolutional layers are added after each concatenation of the fusion structure to extract more representative features. The proposed network is compared with the U-Net, U-Net++ and two-stage U-Net (TS-UNet) on the neck dataset, with the results indicating that it outperforms the others. In the second approach, an explicit attention method, in which the attention is performed through a ROI evolved from ground truth via dilation, is proposed. It does not require any additional CNN, as does a cascaded approach, to localize the ROI. Attention in a CNN is sensitive to the area of the ROI. This dilated ROI is more capable of capturing relevant regions and suppressing irrelevant ones than a bounding box and region-level coarse annotation, and is used during training of any CNN. Coarse annotation, which does not require any detailed pixel-wise delineation and can be performed by a novice, is used during testing. This proposed ROI-based attention method, which can handle compact and similar small multiple classes with objects with large variabilities, is compared with the automatic A-Unet and U-Net, and performs best
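
Two ideas from the abstract above can be illustrated with a small sketch: the Dice similarity coefficient (DSC) used to score overlap, and an ROI obtained by dilating a ground-truth mask and then used as a hard attention mask. This is not the thesis's implementation, which the listing does not describe. The function names (`dice`, `dilate`, `apply_roi`), the 2D nested-list masks (the thesis works on 3D MRI volumes) and the 4-connected structuring element are all assumptions made for illustration.

```python
def dice(a, b):
    """Dice similarity coefficient between two binary masks (nested lists of 0/1)."""
    inter = sum(x and y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    # Two empty masks are treated as a perfect match (a convention, assumed here).
    return 1.0 if total == 0 else 2.0 * inter / total


def dilate(mask, iterations=1):
    """Binary dilation with a 4-connected (cross-shaped) structuring element."""
    h, w = len(mask), len(mask[0])
    for _ in range(iterations):
        out = [row[:] for row in mask]  # copy so the input is never mutated
        for i in range(h):
            for j in range(w):
                if mask[i][j]:
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            out[ni][nj] = 1
        mask = out
    return mask


def apply_roi(image, roi):
    """Zero out every pixel outside the ROI (a hard, explicit attention mask)."""
    return [[p if r else 0 for p, r in zip(ri, rr)] for ri, rr in zip(image, roi)]


# Hypothetical toy data: a one-pixel "muscle" in a 3x3 image.
ground_truth = [[0, 0, 0],
                [0, 1, 0],
                [0, 0, 0]]
image = [[5, 5, 5] for _ in range(3)]

roi = dilate(ground_truth, iterations=1)   # ROI grown from the ground truth
attended = apply_roi(image, roi)           # pixels outside the ROI are suppressed
overlap = dice(ground_truth, roi)          # overlap between the mask and its ROI
```

In this toy setup the dilated ROI keeps a margin of context around the labelled region, and the number of dilation iterations controls how large that margin is.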