Learning a Pose Lexicon for Semantic Action Recognition
This paper presents a novel method for learning a pose lexicon comprising
semantic poses defined by textual instructions and their associated visual
poses defined by visual features. The proposed method simultaneously takes two
input streams, semantic poses and visual pose candidates, and statistically
learns a mapping between them to construct the lexicon. With the learned
lexicon, action recognition can be cast as the problem of finding the maximum
translation probability of a sequence of semantic poses given a stream of
visual pose candidates. Experiments evaluating pre-trained and zero-shot action
recognition conducted on MSRC-12 gesture and WorkoutSu-10 exercise datasets
were used to verify the efficacy of the proposed method.
Comment: Accepted by the 2016 IEEE International Conference on Multimedia and Expo (ICME 2016). 6-page paper with 4 pages of supplementary material.
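The "maximum translation probability" decoding described above can be sketched as a standard Viterbi-style dynamic program: given per-frame probabilities of each visual pose candidate under each semantic pose, plus transition probabilities between semantic poses, the most likely semantic pose sequence is recovered. This is an illustrative sketch only; the array names, toy probabilities, and the exact factorization are assumptions, not taken from the paper.

```python
import numpy as np

def viterbi_decode(emission, transition, prior):
    """Most probable sequence of semantic poses given visual observations.

    emission:   (T, S) array, emission[t, s] = P(observation_t | pose s)
    transition: (S, S) array, transition[i, j] = P(pose j | pose i)
    prior:      (S,)  array, initial pose probabilities
    """
    T, S = emission.shape
    log_delta = np.log(prior) + np.log(emission[0])
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = log_delta[:, None] + np.log(transition)  # (S, S) candidate scores
        backptr[t] = scores.argmax(axis=0)                # best predecessor per pose
        log_delta = scores.max(axis=0) + np.log(emission[t])
    # Backtrack the best path from the highest-scoring final pose
    path = [int(log_delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Toy example: 2 semantic poses, 3 frames of visual evidence
emission = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]])
transition = np.array([[0.7, 0.3], [0.4, 0.6]])
prior = np.array([0.5, 0.5])
print(viterbi_decode(emission, transition, prior))  # [0, 1, 1]
```

With these toy probabilities the decoder switches from pose 0 to pose 1 after the first frame, since the later observations strongly favour pose 1.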
Skeleton-based action recognition using translation-scale invariant image mapping and multi-scale deep CNN
This paper presents an image classification based approach to the skeleton-based
video action recognition problem. First, a dataset-independent, translation- and
scale-invariant image mapping method is proposed, which transforms skeleton
videos into colour images, named skeleton-images. Second, a multi-scale deep
convolutional neural network (CNN) architecture is proposed, which can be built
on and fine-tuned from powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and
ResNet. Even though skeleton-images are very different from natural images, this
fine-tuning strategy still works well. Finally, we show that our method also
works well on 2D skeleton video data. We achieve state-of-the-art results on the
popular benchmark datasets NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular,
on the largest and most challenging NTU RGB+D dataset, as well as on UTD-MHAD
and MSRC-12, our method outperforms other methods by a large margin, which
demonstrates the efficacy of the proposed approach.
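A translation- and scale-invariant skeleton-to-image mapping of the kind described above can be sketched as follows: centre the joint coordinates on the per-sequence mean (translation invariance), rescale each coordinate channel to a fixed range (scale invariance), and lay frames and joints out as image columns and rows with x/y/z mapped to RGB. This is a minimal sketch under those assumptions, not the paper's exact mapping; the function name and layout choices are hypothetical.

```python
import numpy as np

def skeleton_to_image(joints):
    """Map a skeleton sequence to a pseudo-colour image.

    joints: (T, J, 3) array of 3D joint coordinates over T frames.
    Returns a (J, T, 3) uint8 image: rows = joints, columns = frames,
    with x/y/z coordinates mapped to the R/G/B channels.
    """
    # Translation invariance: subtract the per-sequence mean position
    centred = joints - joints.reshape(-1, 3).mean(axis=0)
    # Scale invariance: normalise each coordinate channel to [0, 255]
    lo = centred.min(axis=(0, 1), keepdims=True)
    hi = centred.max(axis=(0, 1), keepdims=True)
    scaled = (centred - lo) / np.maximum(hi - lo, 1e-8) * 255.0
    return scaled.transpose(1, 0, 2).astype(np.uint8)

# Toy sequence: 4 frames of a 20-joint skeleton
rng = np.random.default_rng(0)
img = skeleton_to_image(rng.normal(size=(4, 20, 3)))
print(img.shape)  # (20, 4, 3)
```

The resulting fixed-layout colour image can then be resized and fed to a pre-trained CNN like any natural image.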
Image recognition of multi-perspective data for intelligent analysis of gestures and actions
The BERMUDA project started in January 2015 and was successfully completed after just under three years, in August 2017. A technical set-up and image-processing and analysis software were developed to record and evaluate multi-perspective videos. Using two cameras positioned relatively far from one another with tilted axes, synchronized videos were recorded both in the laboratory and in real-life settings. The evaluation comprised background elimination, body-part classification, clustering, assignment to persons, and finally reconstruction of the skeletons. Based on the skeletons, machine learning techniques were developed to recognize the persons' poses and, building on these, the actions performed. It was, for example, possible to detect the action of a punch, which is relevant in security contexts, with a precision of 51.3 % and a recall of 60.6 %.
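The punch-detection figures quoted above follow the standard definitions of precision (fraction of detections that are correct) and recall (fraction of actual punches detected). A minimal sketch; the counts below are hypothetical numbers chosen only to reproduce the reported percentages, not the project's actual confusion matrix.

```python
def precision_recall(tp, fp, fn):
    """Standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts roughly matching the reported 51.3 % / 60.6 %
p, r = precision_recall(tp=20, fp=19, fn=13)
print(round(p, 3), round(r, 3))  # 0.513 0.606
```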