811 research outputs found

An original framework for understanding human actions and body language by using deep neural networks

Author: MASSARONI CRISTIANO
Publication date: 28/02/2020

The evolution of both Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, which people often use to communicate information non-verbally. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. The processing of body movements, in turn, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition and video surveillance. In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided. The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them suitable for analysing body movements. All the modules were tested on challenging, well-known datasets from the state of the art, showing remarkable results compared to current methods in the literature.
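
As a rough illustration of why LSTM units suit sequences of body movements, the sketch below steps a single LSTM cell over a short sequence of per-frame feature vectors using the standard gate equations. It is not the architecture from the thesis: the function names (`init_params`, `lstm_step`, `run_sequence`), the random weights, and the dimensions are all assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_params(input_size, hidden_size, seed=0):
    # Hypothetical small random weights for the four gates (i, f, o, g).
    rng = random.Random(seed)
    params = {}
    for gate in ("i", "f", "o", "g"):
        params["W" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        params["b" + gate] = [0.0] * hidden_size
    return params

def lstm_step(params, x, h_prev, c_prev):
    # Concatenate the current frame with the previous hidden state.
    z = list(x) + list(h_prev)

    def pre(gate):
        return [a + b for a, b in zip(matvec(params["W" + gate], z),
                                      params["b" + gate])]

    i = [sigmoid(v) for v in pre("i")]    # input gate
    f = [sigmoid(v) for v in pre("f")]    # forget gate
    o = [sigmoid(v) for v in pre("o")]    # output gate
    g = [math.tanh(v) for v in pre("g")]  # candidate cell update
    # The cell state carries long-term context across frames.
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c

def run_sequence(params, frames, hidden_size):
    # Feed the frames in order and return the final hidden state.
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in frames:
        h, c = lstm_step(params, x, h, c)
    return h
```

In a setting like the one described, each frame would plausibly hold skeleton joint coordinates, and the final hidden state would feed a classifier; both of those choices are assumptions here, not details taken from the abstract.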

Archivio della ricerca- Università di Roma La Sapienza

Towards emotion recognition in immersive virtual environments: a method for facial emotion recognition

Author: Amara Kahina
Aouam Djamel
Djekoune Oualid
Guerroudji Mohamed Amine
Larbes Cherif
Ramzan Naeem
Zenati Nadia
Publication venue: CEUR Workshop Proceedings
Publication date: 25/05/2021

Research Repository and Portal - University of the West of Scotland

Utilising the Intel RealSense camera for measuring health outcomes in clinical research

Author: A Baldominos
A Bandini
B Byrom
B Galna
Bill Byrom
Francesco Luke Siena
G Downing
H Sin
HA Alabbasi
J Chhor
L Luna-Oliva
MA Cidota
O Ferche
O Ťupa
Paul Watts
Philip Breedon
R Ortiz-Gutiérrez
S Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/02/2018

Applications utilising 3D camera technologies for the measurement of health outcomes in the health and wellness sector continue to expand. The Intel® RealSense™ is one of the leading 3D depth-sensing cameras currently available on the market and lends itself to use in many applications, including robotics, automation, and medical systems. One of the most prominent areas is the production of interactive solutions for rehabilitation, which includes gait analysis and facial tracking. Advancements in depth camera technology have resulted in a noticeable increase in the integration of these technologies into portable platforms, suggesting significant future potential for pervasive in-clinic and field-based health assessment solutions. This paper reviews the technical capabilities of the Intel RealSense technology, discusses its application to clinical research, and includes examples where the Intel RealSense camera range has been used for the measurement of health outcomes. This review supports the use of the technology to develop robust, objective movement and mobility-based endpoints to enable accurate tracking of the effects of treatment interventions in clinical trials.

Crossref

Nottingham Trent Institutional Repository (IRep)

Adaptive 3D facial action intensity estimation and emotion recognition

Author: Hossain Mohammed Alamgir
Zhang Li
Zhang Yang
Publication venue: 'Elsevier BV'
Publication date: 15/02/2015

Automatic recognition of facial emotion has been widely studied for various computer vision tasks (e.g. health monitoring, driver state surveillance and personalized learning). Most existing facial emotion recognition systems, however, either have not fully considered subject-independent dynamic features or were limited to 2D models, and thus are not robust enough for real-life recognition tasks with subject variation, head movement and illumination change. Moreover, there is a lack of systematic research on the effective detection of newly arrived novel emotion classes. To address these challenges, we present a real-time 3D facial Action Unit (AU) intensity estimation and emotion recognition system. It automatically selects 16 motion-based facial feature sets using minimal-redundancy–maximal-relevance criterion based optimization and estimates the intensities of 16 diagnostic AUs using feedforward Neural Networks and Support Vector Regressors. We also propose a set of six novel adaptive ensemble classifiers for robust classification of the six basic emotions and the detection of newly arrived unseen novel emotion classes (emotions that are not included in the training set). Distance-based clustering and uncertainty measures of the base classifiers within each ensemble model are used to inform the novel class detection. Evaluated with the Bosphorus 3D database, the system achieved its best performance of 0.071 overall Mean Squared Error (MSE) for AU intensity estimation using Support Vector Regressors, and 92.2% average accuracy for the recognition of the six basic emotions using the proposed ensemble classifiers. In comparison with related work, our research outperforms other state-of-the-art research on 3D facial emotion recognition for the Bosphorus database. Moreover, in on-line real-time evaluation with real human subjects, the proposed system also shows superior real-time performance, with 84% recognition accuracy and great flexibility and adaptation for the detection of newly arrived novel emotions (e.g. ‘contempt’, which is not included in the six basic emotions).
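
The minimal-redundancy–maximal-relevance (mRMR) selection mentioned above can be sketched as a greedy loop. This is a minimal illustration, not the paper's implementation: it assumes features have already been discretized, uses the difference form of the criterion (relevance minus mean redundancy), and the names `mutual_information` and `mrmr_select` are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Mutual information (in nats) between two discrete sequences.
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def mrmr_select(features, labels, k):
    # features: dict mapping a feature name to a list of discrete values.
    # Greedily pick up to k features, each time maximizing
    # relevance to the labels minus mean redundancy with earlier picks.
    relevance = {name: mutual_information(vals, labels)
                 for name, vals in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two identical features and a third that is equally relevant but independent of them, the second pick goes to the independent feature, because the duplicate's redundancy cancels its relevance.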

Northumbria Research Link

Crossref

Anglia Ruskin Research

Teesside University's Research Repository

Facial analysis with depth maps and deep learning

Author: Brito Paulo
Publication date: 20/12/2018

Doctoral thesis in Web Science and Technology, in association with the Universidade de Trás-os-Montes e Alto Douro, presented to the Universidade Aberta.
Collecting and analyzing multimodal sensor data of a human face in real time is an important problem in computer vision, with applications in medical analysis and monitoring, entertainment, and security. However, due to the demanding nature of the problem, there is a lack of affordable and easy-to-use systems with real-time annotation capability, 3D analysis, replay capability, and a frame rate capable of detecting facial patterns in working environments. In the context of an ongoing effort to develop tools to support the monitoring and evaluation of human affective state in working environments, this research investigates the applicability of a facial analysis approach to map and evaluate human facial patterns. Our objective is to investigate a set of systems and techniques that make it possible to answer the question of how to use multimodal sensor data to obtain a classification system that identifies facial patterns. With that in mind, tools will be developed to implement a real-time system able to recognize facial patterns from 3D data. The challenge is to interpret this multimodal sensor data, classify it with deep learning algorithms, and fulfill the following requirements: annotation capability, 3D analysis, and replay capability. In addition, the system must be able to continuously improve the output of the classification model through a training process, in order to improve and evaluate different patterns of the human face. FACE ANALYSYS is a tool developed in the context of this doctoral thesis to research the relations between various sensor data and human facial affective state. This work is useful for developing an appropriate visualization system that gives better insight into large amounts of behavioral data.

Repositório Aberto da Universidade Aberta

A Framework for Students Profile Detection

Author: Calado Jorge Miguel da Silva
Publication date: 01/06/2017

Some of the biggest problems facing Higher Education Institutions are student drop-out and academic disengagement. Physical or psychological disabilities, socio-economic or academic marginalization, and emotional and affective problems are some of the factors that can lead to them. This problem is worsened by the shortage of educational resources that can bridge the communication gap between faculty staff and the affective needs of these students. This dissertation focuses on the development of a framework capable of collecting analytic data from an array of emotions, affects and behaviours, acquired either by human observation, such as by a teacher in a classroom or a psychologist, or by electronic sensors and automatic analysis software, such as eye tracking devices, emotion detection through facial expression recognition software, automatic gait and posture detection, and others. The framework provides guidance for compiling the gathered data in an ontology, to enable the extraction of pattern outliers via machine learning, which assists the profiling of students in critical situations, such as disengagement, attention deficit, drop-out, and other sociological issues. Consequently, it is possible to set real-time alerts when these profile conditions are detected, so that appropriate experts can verify the situation and employ effective procedures. The goal is that, by providing insightful real-time cognitive data and facilitating the profiling of students' problems, a faster personalized response to help the student is enabled, allowing improvements in academic performance.

Repositório da Universidade Nova de Lisboa

Human Motion Anticipation and Recognition from RGB-D

Author: Barsoum Emad
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Predicting and understanding the dynamics of human motion has many applications, such as motion synthesis, augmented reality, security, education, reinforcement learning, autonomous vehicles, and many others. In this thesis, we create a novel end-to-end pipeline that can predict multiple future poses from the same input and, in addition, can classify the entire sequence. Our focus is on the following two aspects of human motion understanding. (1) Probabilistic human action prediction: given a sequence of human poses as input, we sample multiple possible future poses from the same input sequence using a new GAN-based network. (2) Human motion understanding: given a sequence of human poses as input, we classify the actual action performed in the sequence and improve the classification performance using the representation learned from the prediction network. We also demonstrate how to improve model training from noisy labels, using facial expression recognition as an example. More specifically, we have 10 taggers label each input image, and compare four different approaches: majority voting, multi-label learning, probabilistic label drawing, and cross-entropy loss. We show that the traditional majority voting scheme does not perform as well as the last two approaches, which fully leverage the label distribution. We shared the enhanced FER+ data set, with multiple labels for each face image, with the research community (https://github.com/Microsoft/FERPlus). For predicting and understanding human motion, we propose a novel sequence-to-sequence model trained with an improved version of generative adversarial networks (GAN). Our model, which we call HP-GAN2, learns a probability density function of future human poses conditioned on previous poses. It predicts multiple sequences of possible future human poses, each from the same input sequence but seeded with a different vector z drawn from a random distribution. Moreover, to quantify the quality of the non-deterministic predictions, we simultaneously train a motion-quality-assessment model that learns the probability that a given skeleton pose sequence is real or fake human motion. To classify the action performed in a video clip, we took two approaches. In the first approach, we train on a sequence of skeleton poses from scratch, using random parameter initialization with the same network architecture used in the discriminator of the HP-GAN2 model. In the second approach, we take the discriminator of the HP-GAN2 network, extend it with an action classification branch, and fine-tune the end-to-end model on the classification task, since the discriminator in HP-GAN2 has learned to differentiate between fake and real human motion. Our hypothesis is that if the discriminator network can differentiate between synthetic and real skeleton poses, then it has also learned some of the dynamics of real human motion, and that those dynamics are useful in classification as well. We show through multiple experiments that this is indeed the case. Our model therefore learns to predict multiple future sequences of human poses from the same input sequence. We also show that the discriminator learns a general representation of human motion by using the learned features in an action recognition task, and we train a motion-quality-assessment network that measures the probability that a given sequence of poses is valid human motion. We test our model on two of the largest human pose datasets: NTURGB-D and Human3.6M. We train on both single and multiple action types. The predictive power of our model for motion estimation is demonstrated by generating multiple plausible futures from the same input and showing the effect of each of the several loss functions in the ablation study. We also show the advantage of switching to GAN from WGAN-GP, which we used in our previous work. Furthermore, we show that it takes less than half the number of epochs to train an activity recognition network when using the features learned from the discriminator.
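
The four ways of turning crowd-sourced votes into training targets, as named in the abstract above, can be sketched as follows. This is an illustrative reading, not the thesis code: the function names, the `min_votes` threshold for the multi-label mask, and the example vote counts are assumptions.

```python
import math
import random

def label_distribution(votes):
    # Normalize per-class vote counts into a probability distribution
    # (the soft target used with a cross-entropy loss).
    total = sum(votes)
    return [v / total for v in votes]

def majority_target(votes):
    # Majority voting: one-hot target on the most-voted class.
    best = max(range(len(votes)), key=lambda k: votes[k])
    return [1.0 if k == best else 0.0 for k in range(len(votes))]

def multi_label_mask(votes, min_votes=2):
    # Multi-label learning: every class with enough votes counts as acceptable.
    # The threshold is an assumption for this sketch.
    return [1.0 if v >= min_votes else 0.0 for v in votes]

def drawn_target(votes, rng):
    # Probabilistic label drawing: sample one class in proportion to its votes;
    # repeated draws (for example once per epoch) follow the label distribution.
    k = rng.choices(range(len(votes)), weights=votes, k=1)[0]
    return [1.0 if j == k else 0.0 for j in range(len(votes))]

def cross_entropy(target, predicted, eps=1e-12):
    # Cross-entropy between a target distribution and predicted probabilities.
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))
```

For example, with hypothetical votes `[6, 3, 1, 0]` from 10 taggers, majority voting keeps only the first class, while the soft target retains the 30% and 10% minority opinions, which is the information the abstract says the last two approaches exploit.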

Columbia University Academic Commons

Toward a flexible facial analysis framework in OpenISS for visual effects

Author: Shen Yiran
Publication date: 29/08/2019

Facial analysis, including tasks such as face detection, facial landmark detection, and facial expression recognition, is a significant research domain in computer vision for visual effects. It can be used in various domains, such as facial feature mapping for movie animation, biometrics and face recognition for security systems, and driver fatigue monitoring for transportation safety assistance. Most applications involve basic face and landmark detection as preliminary analysis steps before proceeding to more specialized processing. As technology develops, plenty of implementations and resources for each task are available to researchers, but the key properties missing from all of them are flexibility and usability. Integrating functional components involves complex configuration at each connection point, which is typically problematic and leads to poor reusability and adjustability. The lack of support for integrating different functional components greatly increases the research effort and cost for individual researchers, which led us to the idea of providing a framework that addresses the issue in a general way. To address this problem, we propose a user-friendly and highly expandable facial analysis framework. It contains a core that supports fundamental services for the framework, and a facial analysis module composed of implementations of facial analysis tasks. We evaluate our framework and achieve our goal of instantiating a specialized facial analysis framework, which performs face detection, facial landmark detection, and facial expression recognition. As a whole, this framework addresses the industry problem of lacking an execution platform for integrated facial analysis implementations and fills a gap in the visual effects industry.

Concordia University Research Repository