An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements, meanwhile, plays a key role in the action recognition and affective computing fields: the former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN)-based method for the recognition of sign language and semaphoric hand gestures is proposed; the second presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, the last module provides a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs).
The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, which makes them well suited to analysing body movements.
All the modules were tested on challenging datasets, well known in the state of the art, showing remarkable results compared to current literature methods.
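To illustrate why LSTMs can carry long-term context across a movement sequence, a single forward step of an LSTM cell can be sketched in plain Python. This is a toy scalar-state sketch with placeholder weights, not the thesis's implementation; real models use weight matrices per gate.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar input and state (toy sizes).

    w maps each gate name to a (w_x, w_h, b) triple of weights.
    """
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate
    c = f * c_prev + i * g   # cell state mixes old memory with new input
    h = o * math.tanh(c)     # hidden state exposed to the next layer
    return h, c

# Run a short gesture-like scalar sequence through the cell:
weights = {k: (0.5, 0.5, 0.0) for k in ("i", "f", "o", "g")}
h, c = 0.0, 0.0
for x in [0.1, 0.9, -0.4, 0.7]:
    h, c = lstm_step(x, h, c, weights)
```

The forget gate f is what lets the cell retain (f near 1) or discard (f near 0) information from many steps earlier, which is the property the thesis relies on for body-movement sequences.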
Driver Behavior Recognition via Interwoven Deep Convolutional Neural Nets With Multi-Stream Inputs
Recognizing driver behaviors is becoming vital for in-vehicle systems that
seek to reduce the incidence of car accidents rooted in cognitive distraction.
In this paper, we harness the exceptional feature extraction abilities of deep
learning and propose a dedicated Interwoven Deep Convolutional Neural Network
(InterCNN) architecture to tackle the accurate classification of driver
behaviors in real-time. The proposed solution exploits information from
multi-stream inputs, i.e., in-vehicle cameras with different fields of view and
optical flows computed from the recorded images, and merges the abstract
features it extracts through multiple fusion layers. This builds a tight
ensembling system, which significantly improves the robustness of the model. We
further introduce a temporal voting scheme based on historical inference
instances, in order to enhance accuracy. Experiments conducted with a real
world dataset that we collect in a mock-up car environment demonstrate that the
proposed InterCNN with MobileNet convolutional blocks can classify 9 different
behaviors with 73.97% accuracy, and 5 aggregated behaviors with 81.66%
accuracy. Our architecture is highly computationally efficient, as it performs
inferences within 15ms, which satisfies the real-time constraints of
intelligent cars. In addition, our InterCNN is robust to lossy input, as the
classification remains accurate even when two of the input streams are occluded.
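The temporal voting scheme described above can be approximated by a majority vote over the last few frame-level predictions. This is a sketch; the window size and tie-breaking rule here are assumptions, not the paper's exact scheme.

```python
from collections import Counter, deque

def temporal_vote(predictions, window=5):
    """Smooth per-frame labels by majority vote over a sliding window
    of the most recent inference instances."""
    history = deque(maxlen=window)
    smoothed = []
    for label in predictions:
        history.append(label)
        # most_common(1) breaks ties by first occurrence in the window
        smoothed.append(Counter(history).most_common(1)[0][0])
    return smoothed

# A spurious single-frame misclassification is voted away:
frames = ["drive", "drive", "phone", "drive", "drive", "drive"]
print(temporal_vote(frames, window=3))
# prints ['drive', 'drive', 'drive', 'drive', 'drive', 'drive']
```

Because each vote only looks at a fixed-size window of past predictions, the scheme adds negligible latency on top of the per-frame inference, consistent with the real-time constraint above.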
From Unimodal to Multimodal: improving the sEMG-Based Pattern Recognition via deep generative models
Multimodal hand gesture recognition (HGR) systems can achieve higher
recognition accuracy than unimodal ones. However, acquiring multimodal gesture data
typically requires users to wear additional sensors, thereby increasing
hardware costs. This paper proposes a novel generative approach to improve
Surface Electromyography (sEMG)-based HGR accuracy via virtual Inertial
Measurement Unit (IMU) signals. Specifically, we first trained a deep
generative model, based on the intrinsic correlation between forearm sEMG
and forearm IMU signals, to generate virtual forearm IMU signals from the
input sEMG signals. Subsequently, the sEMG signals and virtual IMU
signals were fed into a multimodal Convolutional Neural Network (CNN) model for
gesture recognition. To evaluate the performance of the proposed approach, we
conducted experiments on 6 databases, including 5 publicly available databases
and our collected database comprising 28 subjects performing 38 gestures,
containing both sEMG and IMU data. The results show that our proposed approach
outperforms the sEMG-based unimodal HGR method (with increases of
2.15%-13.10%). It demonstrates that incorporating virtual IMU signals,
generated by deep generative models, can significantly enhance the accuracy of
sEMG-based HGR. The proposed approach represents a successful attempt to
transition from unimodal HGR to multimodal HGR without additional sensor
hardware.
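The pipeline above — generate virtual IMU features from sEMG, then fuse both modalities before classification — can be caricatured as a late-fusion step in plain Python. Both generate_virtual_imu and the weights below are hypothetical stand-ins for the paper's trained generative model and multimodal CNN.

```python
def generate_virtual_imu(semg_features):
    """Stand-in for the trained generative model: maps sEMG features
    to a same-length virtual IMU feature vector."""
    return [0.5 * v for v in semg_features]  # placeholder transform

def fuse_and_score(semg_features, weights):
    """Concatenate real sEMG features with virtual IMU features
    (multimodal fusion), then apply a linear scoring layer as a
    stand-in for the multimodal CNN head."""
    fused = semg_features + generate_virtual_imu(semg_features)
    return sum(w * v for w, v in zip(weights, fused))

semg = [0.2, 0.8, 0.1]                  # toy sEMG feature vector
w = [1.0, -0.5, 0.3, 0.2, 0.2, 0.2]     # toy fused-layer weights
score = fuse_and_score(semg, w)
```

The point of the sketch is structural: the classifier sees a feature vector twice as long as the sEMG input, yet at test time the user still wears only the sEMG sensor, since the IMU half is synthesized.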
Deep Learning on Facial Expression Detection: Artificial Neural Network Model Implementation
A person's moods, emotions, and even medical issues are frequently reflected directly in their facial expressions. As a result, the fields of social science and human-computer interaction have recently begun to pay more attention to facial emotion detection. The primary focus of this study is the automatic recognition of human facial expressions using an artificial neural network (ANN) model and a technique based on straightforward convolution. The dataset used is a self-mined dataset obtained by web scraping Google Images with the Selenium package for Python. It contains seven categories of fundamental human expressions likely to be encountered on a daily basis, namely anger, confusion, contempt, crying, sadness, disgust, and happiness, for a total of 6,016 photos. The goal of this research is to determine how accurate the artificial neural network model can be in predicting
Learning efficient haptic shape exploration with a rigid tactile sensor array
Haptic exploration is a key skill for both robots and humans to discriminate
and handle unknown objects or to recognize familiar objects. Its active nature
is evident in humans who from early on reliably acquire sophisticated
sensory-motor capabilities for active exploratory touch and directed manual
exploration that associates surfaces and object properties with their spatial
locations. This is in stark contrast to robotics. In this field, the relative
lack of good real-world interaction models - along with very restricted sensors
and a scarcity of suitable training data to leverage machine learning methods -
has so far rendered haptic exploration a largely underdeveloped skill. In the
present work, we connect recent advances in recurrent models of visual
attention with previous insights about the organisation of human haptic search
behavior, exploratory procedures and haptic glances for a novel architecture
that learns a generative model of haptic exploration in a simulated
three-dimensional environment. The proposed algorithm simultaneously optimizes
main perception-action loop components: feature extraction, integration of
features over time, and the control strategy, while continuously acquiring data
online. We train a multi-module neural network comprising a feature
extractor and a recurrent neural network module that aids pose control by
storing and combining sequential sensory data. The resulting haptic meta-controller for
the rigid tactile sensor array moving in a physics-driven
simulation environment, called the Haptic Attention Model, performs a sequence
of haptic glances, and outputs corresponding force measurements. The resulting
method has been successfully tested with four different objects, achieving
strong results while performing object contour exploration optimized for
its own sensor morphology.
Human-Centric Machine Vision
Recently, algorithms for processing visual information have greatly evolved, providing efficient and effective solutions to cope with the variability and complexity of real-world environments. These achievements have led to the development of Machine Vision systems that go beyond typical industrial applications, where environments are controlled and tasks are very specific, towards innovative solutions that address the everyday needs of people. Human-Centric Machine Vision can help solve problems raised by the needs of our society, e.g. security and safety, health care, medical imaging, and human-machine interfaces. In such applications it is necessary to handle changing, unpredictable, and complex situations, and to account for the presence of humans.