
    ModDrop: adaptive multi-modal gesture recognition

    We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving the uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique ensures robustness of the classifier to missing signals in one or several channels, so that it produces meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. Comment: 14 pages, 7 figures
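
    The core ModDrop mechanism, randomly zeroing whole modality streams during fused training so the network learns cross-modal correlations without becoming dependent on any single channel, can be sketched roughly as follows. This is a minimal illustration only; the encoder shapes, drop probability, and PyTorch framing are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ModDropFusion(nn.Module):
    """Toy sketch of ModDrop-style fusion: each modality has its own
    (separately pre-initialised) encoder, and during fused training whole
    modality streams are randomly zeroed before the shared classifier."""

    def __init__(self, modality_dims, hidden=128, n_classes=20, p_drop=0.5):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for d in modality_dims]
        )
        self.p_drop = p_drop  # probability of dropping a modality during training
        self.classifier = nn.Sequential(
            nn.Linear(hidden * len(modality_dims), hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, inputs):
        feats = []
        for enc, x in zip(self.encoders, inputs):
            h = enc(x)
            if self.training and torch.rand(1).item() < self.p_drop:
                h = torch.zeros_like(h)  # drop the whole modality channel
            feats.append(h)
        return self.classifier(torch.cat(feats, dim=-1))

# usage: three hypothetical modalities (skeleton, hand crop features, audio)
model = ModDropFusion(modality_dims=[60, 256, 40])
batch = [torch.randn(8, d) for d in (60, 256, 40)]
logits = model(batch)  # predictions remain valid even if a stream is zeroed
```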

    A preliminary study of micro-gestures:dataset collection and analysis with multi-modal dynamic networks

    Abstract. Micro-gestures (MG) are gestures that people perform spontaneously during communication. This thesis presents a preliminary exploration of micro-gestures. By recording sequences of body gestures produced spontaneously during games, an MG dataset is built with a Kinect V2 sensor. The novel term ‘micro-gesture’ is proposed based on an analysis of the properties of this dataset. Two neural network architectures are implemented for the micro-gesture segmentation and recognition tasks: a DBN-HMM model for skeleton data and a 3DCNN-HMM model for RGB-D data. We also explore a method for extracting the neutral states used in the HMM structure by detecting the activity level of the gesture sequences; the method is simple to derive and implement, and proves to be effective. The DBN-HMM and 3DCNN-HMM architectures are evaluated on the MG dataset and optimized for the properties of micro-gestures. Experimental results show that these two models achieve micro-gesture segmentation and recognition with satisfactory accuracy. The work on micro-gestures presented in this thesis also opens a new research path for gesture recognition, and we believe it can serve as a baseline for future research on micro-gestures.
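
    The neutral-state extraction step mentioned above can be illustrated with a short sketch: frames whose overall joint motion falls below a small activity threshold are treated as neutral (resting) states for the HMM. The mean-displacement measure and the threshold value are assumptions for illustration, not the thesis' exact formulation.

```python
import numpy as np

def label_neutral_frames(skeleton, threshold=0.01):
    """Label low-activity frames as 'neutral' HMM states.

    skeleton: array of shape (T, J, 3) with 3-D joint positions per frame.
    Returns a boolean array of length T (True = neutral/resting frame).
    A frame is considered neutral when the mean joint displacement with
    respect to the previous frame falls below a small threshold."""
    disp = np.linalg.norm(np.diff(skeleton, axis=0), axis=-1)  # (T-1, J)
    activity = disp.mean(axis=1)                               # per-frame activity level
    neutral = np.concatenate([[True], activity < threshold])   # first frame treated as neutral
    return neutral

# toy usage on a synthetic 100-frame, 25-joint (Kinect V2-like) sequence
seq = np.cumsum(0.005 * np.random.randn(100, 25, 3), axis=0)
print(label_neutral_frames(seq).sum(), "frames labelled neutral")
```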

    Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition

    This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth images and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatiotemporal representations using deep neural networks suited to each input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, opening the door to the use of deep learning techniques to further explore multimodal time series data.
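
    A rough sketch of the hybrid neural-network/HMM decoding idea is given below: per-frame state posteriors produced by a network such as the DBN or 3DCNN are turned into scaled emission scores and decoded with Viterbi over the HMM transition structure. The prior-division step and the toy uniform transition matrix are generic hybrid-HMM conventions assumed for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def viterbi_from_posteriors(posteriors, priors, trans, init):
    """Hybrid NN/HMM decoding sketch: the network provides per-frame state
    posteriors p(state | obs); dividing by the state priors yields scaled
    likelihoods p(obs | state) up to a constant, which are then decoded
    with standard Viterbi using the HMM transition matrix."""
    log_emit = np.log(posteriors + 1e-12) - np.log(priors + 1e-12)  # (T, S)
    log_trans = np.log(trans + 1e-12)                               # (S, S)
    T, S = log_emit.shape
    delta = np.log(init + 1e-12) + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # rows: previous state, cols: next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy usage: 5 frames, 3 states, uniform priors and transitions
post = np.random.dirichlet(np.ones(3), size=5)
print(viterbi_from_posteriors(post, np.full(3, 1/3), np.full((3, 3), 1/3), np.full(3, 1/3)))
```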

    Gesture and sign language recognition with deep learning


    RoboREIT: an Interactive Robotic Tutor with Instructive Feedback Component for Requirements Elicitation Interview Training

    [Context] Among the many requirements elicitation techniques, interviewing stakeholders is the most popular. The success of an interview depends on the collaboration of the interviewee, which can be fostered by the interviewer's preparedness and communication skills. Mastering these skills requires experience and practice interviews. [Problem] Practical training is resource-heavy, as it calls for the time and effort of a stakeholder for each student, which may not be feasible for a large number of students. [Method] To address this scalability problem, this paper proposes RoboREIT, an interactive Robotic tutor for Requirements Elicitation Interview Training. The humanoid robotic component of RoboREIT responds to the questions of the interviewer, which the interviewer chooses from a set of predefined alternatives for a particular scenario. After the interview session, RoboREIT provides contextual feedback to the interviewer on their performance and allows the student to inspect their mistakes. RoboREIT is extensible with various scenarios. [Results] We performed an exploratory user study to evaluate RoboREIT and demonstrate its applicability in requirements elicitation interview training. The quantitative and qualitative analyses of the users' responses reveal the appreciation of RoboREIT and provide further suggestions for improving it. [Contribution] Our study is the first in the literature that utilizes a social robot in requirements elicitation interview education. RoboREIT's innovative design incorporates replaying faulty interview stages, allowing the student to learn from mistakes by practicing a second time. All participants praised the feedback component, which is not present in the state of the art, for being helpful in identifying mistakes. A favorable response rate of 81% for the system's usefulness indicates the participants' positive perception. Comment: Author submitted manuscript
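
    One possible way to picture the scenario structure described above (predefined question alternatives, scripted robot replies, and replay of faulty stages with feedback) is sketched below. The data layout, field names, and selection callback are purely hypothetical illustrations, not RoboREIT's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class QuestionOption:
    text: str          # the interview question the student may pick
    reply: str         # the (robotic) stakeholder's scripted answer
    is_good: bool      # whether this is considered a well-formed question
    feedback: str = "" # contextual feedback shown after the session

@dataclass
class InterviewStage:
    prompt: str
    options: list = field(default_factory=list)

def run_session(stages, choose):
    """Run through the scenario; return the stages where a poor question was
    chosen so they can be replayed with feedback after the session."""
    faulty = []
    for stage in stages:
        picked = stage.options[choose(stage)]
        print("Student:", picked.text)
        print("Robot:  ", picked.reply)
        if not picked.is_good:
            faulty.append((stage, picked))
    return faulty

# toy usage with a single hypothetical scenario stage
stage = InterviewStage(
    prompt="Opening the interview",
    options=[
        QuestionOption("Can you walk me through a typical working day?",
                       "Sure. I usually start by checking pending orders.", True),
        QuestionOption("So you obviously need a mobile app, right?",
                       "Well, I am not sure what you mean.", False,
                       "Avoid leading questions; let the stakeholder describe their needs."),
    ],
)
faulty_stages = run_session([stage], choose=lambda s: 1)
```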

    Advances in the neurocognition of music and language


    Face Image and Video Analysis in Biometrics and Health Applications

    Computer Vision (CV) enables computers and systems to derive meaningful information from acquired visual inputs, such as images and videos, and to make decisions based on the extracted information. Its goal is to acquire, process, analyze, and understand this information by developing theoretical and algorithmic models. Biometrics are distinctive and measurable human characteristics used to label or describe individuals, combining computer vision with knowledge of human physiology (e.g., face, iris, fingerprint) and behavior (e.g., gait, gaze, voice). The face is one of the most informative biometric traits, and many studies have investigated it from the perspectives of disciplines ranging from computer vision and deep learning to neuroscience and biometrics. In this work, we analyze face characteristics from digital images and videos in the areas of morphing attack and defense, and autism diagnosis. For face morphing attack generation, we proposed a transformer-based generative adversarial network that produces more visually realistic morphing attacks by combining several losses: face matching distance, a facial-landmark-based loss, perceptual loss, and pixel-wise mean squared error. For face morphing attack detection, we designed a fusion-based few-shot learning (FSL) method to learn discriminative features from face images for few-shot morphing attack detection (FS-MAD), and extended the current binary detection task into multiclass classification, namely few-shot morphing attack fingerprinting (FS-MAF). In the autism diagnosis study, we developed a discriminative few-shot learning method to analyze hour-long video data and explored the fusion of facial dynamics for facial trait classification of autism spectrum disorder (ASD) at three severity levels. The results show outstanding performance of the proposed fusion-based few-shot framework on the dataset. We further explored the possibility of performing facial micro-expression spotting and feature analysis on autism video data to classify ASD and control groups; the results indicate the effectiveness of subtle facial expression changes for autism diagnosis.
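
    The combined generator loss mentioned for morphing attack generation (face matching distance, landmark-based loss, perceptual loss, and pixel-wise MSE) might look roughly like the sketch below. The weighting scheme and the stand-in feature extractors are assumptions; a real morphing loss would also compare the morph against both contributing identities rather than a single target.

```python
import torch
import torch.nn.functional as F

def morph_generator_loss(generated, target, embed_net, landmark_net, perceptual_net,
                         w_id=1.0, w_lmk=1.0, w_perc=1.0, w_pix=1.0):
    """Weighted combination of the loss terms named in the abstract.
    embed_net, landmark_net and perceptual_net are placeholders for a
    face-recognition embedding model, a facial-landmark detector and a
    perceptual feature extractor; all weights are illustrative only."""
    id_loss   = 1.0 - F.cosine_similarity(embed_net(generated), embed_net(target)).mean()
    lmk_loss  = F.mse_loss(landmark_net(generated), landmark_net(target))
    perc_loss = F.mse_loss(perceptual_net(generated), perceptual_net(target))
    pix_loss  = F.mse_loss(generated, target)
    return w_id * id_loss + w_lmk * lmk_loss + w_perc * perc_loss + w_pix * pix_loss

# toy usage with stand-in feature extractors (flattened pixels)
g, t = torch.rand(2, 3, 112, 112), torch.rand(2, 3, 112, 112)
flat = lambda x: x.flatten(1)
print(morph_generator_loss(g, t, flat, flat, flat).item())
```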