Learning a Pose Lexicon for Semantic Action Recognition
This paper presents a novel method for learning a pose lexicon comprising
semantic poses defined by textual instructions and their associated visual
poses defined by visual features. The proposed method simultaneously takes two
input streams, semantic poses and visual pose candidates, and statistically
learns a mapping between them to construct the lexicon. With the learned
lexicon, action recognition can be cast as the problem of finding the maximum
translation probability of a sequence of semantic poses given a stream of
visual pose candidates. Experiments evaluating pre-trained and zero-shot action
recognition conducted on MSRC-12 gesture and WorkoutSu-10 exercise datasets
were used to verify the efficacy of the proposed method.
Comment: Accepted by the 2016 IEEE International Conference on Multimedia and Expo (ICME 2016). 6-page paper with 4 pages of supplementary material.
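The "maximum translation probability" decoding described above can be sketched as a standard Viterbi-style dynamic program: given per-frame probabilities of each visual pose candidate under each semantic pose, plus transition probabilities between semantic poses, the most likely semantic pose sequence is recovered. This is an illustrative sketch only; the array names, toy probabilities, and the exact factorization are assumptions, not taken from the paper.

```python
import numpy as np

def viterbi_decode(emission, transition, prior):
    """Most probable sequence of semantic poses given visual observations.

    emission:   (T, S) array, emission[t, s] = P(observation_t | pose s)
    transition: (S, S) array, transition[i, j] = P(pose j | pose i)
    prior:      (S,)  array, initial pose probabilities
    """
    T, S = emission.shape
    log_delta = np.log(prior) + np.log(emission[0])
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = log_delta[:, None] + np.log(transition)  # (S, S) candidate scores
        backptr[t] = scores.argmax(axis=0)                # best predecessor per pose
        log_delta = scores.max(axis=0) + np.log(emission[t])
    # Backtrack the best path from the highest-scoring final pose
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy example: 2 semantic poses, 3 frames of visual evidence
emission = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])
transition = np.array([[0.7, 0.3], [0.4, 0.6]])
prior = np.array([0.5, 0.5])
print(viterbi_decode(emission, transition, prior))  # [0, 1, 1]
```

With these toy probabilities the decoder switches from pose 0 to pose 1 after the first frame, since the later observations strongly favour pose 1.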
Skeleton-based action recognition using translation-scale invariant image mapping and multi-scale deep CNN
This paper presents an image classification based approach to the skeleton-based
video action recognition problem. First, a dataset-independent, translation- and
scale-invariant image mapping method is proposed, which transforms skeleton
videos into colour images, named skeleton-images. Second, a multi-scale deep
convolutional neural network (CNN) architecture is proposed, which can be built
on and fine-tuned from powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and
ResNet. Even though skeleton-images are very different from natural images, this
fine-tuning strategy still works well. Finally, we show that our method also
works well on 2D skeleton video data. We achieve state-of-the-art results on the
popular benchmark datasets NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular,
on the largest and most challenging NTU RGB+D dataset, as well as on UTD-MHAD
and MSRC-12, our method outperforms other methods by a large margin, which
demonstrates the efficacy of the proposed approach.
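A translation- and scale-invariant skeleton-to-image mapping of the kind described above can be sketched as follows: centre the joint coordinates on the per-sequence mean (translation invariance), rescale each coordinate channel to a fixed range (scale invariance), and lay frames and joints out as image columns and rows with x/y/z mapped to RGB. This is a minimal sketch under those assumptions, not the paper's exact mapping; the function name and layout choices are hypothetical.

```python
import numpy as np

def skeleton_to_image(joints):
    """Map a skeleton sequence to a pseudo-colour image.

    joints: (T, J, 3) array of 3D joint coordinates over T frames.
    Returns a (J, T, 3) uint8 image: rows = joints, columns = frames,
    with x/y/z coordinates mapped to the R/G/B channels.
    """
    # Translation invariance: subtract the per-sequence mean position
    centred = joints - joints.reshape(-1, 3).mean(axis=0)
    # Scale invariance: normalise each coordinate channel to [0, 255]
    lo = centred.min(axis=(0, 1), keepdims=True)
    hi = centred.max(axis=(0, 1), keepdims=True)
    scaled = (centred - lo) / np.maximum(hi - lo, 1e-8) * 255.0
    return scaled.transpose(1, 0, 2).astype(np.uint8)

# Toy sequence: 4 frames of a 20-joint skeleton
rng = np.random.default_rng(0)
img = skeleton_to_image(rng.normal(size=(4, 20, 3)))
print(img.shape)  # (20, 4, 3)
```

The resulting fixed-layout colour image can then be resized and fed to a pre-trained CNN like any natural image.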
Image recognition of multi-perspective data for intelligent analysis of gestures and actions
The BERMUDA project started in January 2015 and was successfully completed after just under three years, in August 2017. A technical set-up and image-processing and analysis software were developed to record and evaluate multi-perspective videos. Using two cameras positioned relatively far from one another with tilted axes, synchronized videos were recorded both in the laboratory and in real-life settings. The evaluation comprised background elimination, body-part classification, clustering, assignment to persons, and finally reconstruction of the skeletons. Based on the skeletons, machine learning techniques were developed to recognize the persons' poses and, building on these, the actions performed. It was, for example, possible to detect the action of a punch, which is relevant in security contexts, with a precision of 51.3 % and a recall of 60.6 %.
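The punch-detection figures quoted above follow the standard definitions of precision (fraction of detections that are correct) and recall (fraction of actual punches detected). A minimal sketch; the counts below are hypothetical numbers chosen only to reproduce the reported percentages, not the project's actual confusion matrix.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts roughly matching the reported 51.3 % / 60.6 %
p, r = precision_recall(tp=20, fp=19, fn=13)
print(round(p, 3), round(r, 3))  # 0.513 0.606
```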