Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network
We propose a heterogeneous multi-task learning framework for human pose
estimation from monocular images with a deep convolutional neural network. In
particular, we simultaneously learn a pose-joint regressor and a sliding-window
body-part detector in a deep network architecture. We show that including the
body-part detection task helps to regularize the network, directing it to
converge to a good solution. We report competitive and state-of-the-art results on
several data sets. We also empirically show that the learned neurons in the
middle layer of our network are tuned to localized body parts.
P-CNN: Pose-based CNN Features for Action Recognition
This work targets human action recognition in video. While recent methods
typically represent actions by statistics of local video features, here we
argue for the importance of a representation derived from human pose. To this
end we propose a new Pose-based Convolutional Neural Network descriptor (P-CNN)
for action recognition. The descriptor aggregates motion and appearance
information along tracks of human body parts. We investigate different schemes
of temporal aggregation and experiment with P-CNN features obtained both for
automatically estimated and manually annotated human poses. We evaluate our
method on the recent and challenging JHMDB and MPII Cooking datasets. For both
datasets our method shows consistent improvement over the state of the art.
Comment: ICCV, December 2015, Santiago, Chile
Evaluation of Deep Learning based Pose Estimation for Sign Language Recognition
Human body pose estimation and hand detection are two important tasks for
systems that perform computer vision-based sign language recognition (SLR).
However, both tasks are challenging, especially when the input is color video
with no depth information. Many algorithms have been proposed in the literature
for these tasks, and some of the most successful recent algorithms are based on
deep learning. In this paper, we introduce a dataset for human pose estimation
in the SLR domain. We evaluate the performance of two deep-learning-based pose
estimation methods by performing user-independent experiments on our dataset.
We also perform transfer learning, and we obtain results that demonstrate that
transfer learning can improve pose estimation accuracy. The dataset and results
from these methods can serve as a useful baseline for future work.
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements, meanwhile, plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first one, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) is proposed for the recognition of sign language and semaphoric hand gestures; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided.
The performance of LSTM-RNNs is explored in depth, since their ability to model the long-term contextual information of temporal sequences makes them suitable for analysing body movements.
All the modules were tested on challenging, well-known datasets, showing remarkable results compared to the methods in the current literature.
Person Recognition in Personal Photo Collections
Recognising persons in everyday photos presents major challenges (occluded
faces, different clothing, locations, etc.) for machine vision. We propose a
convnet-based person recognition system, for which we provide an in-depth
analysis of the informativeness of different body cues, the impact of training data,
and the common failure modes of the system. In addition, we discuss the
limitations of existing benchmarks and propose more challenging ones. Our
method is simple and is built on open source and open data, yet it improves the
state-of-the-art results on a large dataset of social media photos (PIPA).
Comment: Accepted to ICCV 2015, revise