1,136 research outputs found

    A preliminary study of micro-gestures:dataset collection and analysis with multi-modal dynamic networks

    Get PDF
    Abstract. Micro-gestures (MG) are gestures that people performed spontaneously during communication situations. A preliminary exploration of Micro-Gesture is made in this thesis. By collecting recorded sequences of body gestures in a spontaneous state during games, a MG dataset is built through Kinect V2. A novel term ‘micro-gesture’ is proposed by analyzing the properties of MG dataset. Implementations of two sets of neural network architectures are achieved for micro-gestures segmentation and recognition task, which are the DBN-HMM model and the 3DCNN-HMM model for skeleton data and RGB-D data respectively. We also explore a method for extracting neutral states used in the HMM structure by detecting the activity level of the gesture sequences. The method is simple to derive and implement, and proved to be effective. The DBN-HMM and 3DCNN-HMM architectures are evaluated on MG dataset and optimized for the properties of micro-gestures. Experimental results show that we are able to achieve micro-gesture segmentation and recognition with satisfied accuracy with these two models. The work we have done about the micro-gestures in this thesis also explores a new research path for gesture recognition. Therefore, we believe that our work could be widely used as a baseline for future research on micro-gestures

    Source coding for transmission of reconstructed dynamic geometry: a rate-distortion-complexity analysis of different approaches

    Get PDF
    Live 3D reconstruction of a human as a 3D mesh with commodity electronics is becoming a reality. Immersive applications (i.e. cloud gaming, tele-presence) benefit from effective transmission of such content over a bandwidth limited link. In this paper we outline different approaches for compressing live reconstructed mesh geometry based on distributing mesh reconstruction functions between sender and receiver. We evaluate rate-performance-complexity of different configurations. First, we investigate 3D mesh compression methods (i.e. dynamic/static) from MPEG-4. Second, we evaluate the option of using octree based point cloud compression and receiver side surface reconstruction

    Network streaming and compression for mixed reality tele-immersion

    Get PDF
    Bulterman, D.C.A. [Promotor]Cesar, P.S. [Copromotor

    Geometric 3D point cloud compression

    Get PDF
    The use of 3D data in mobile robotics applications provides valuable information about the robot’s environment but usually the huge amount of 3D information is unmanageable by the robot storage and computing capabilities. A data compression is necessary to store and manage this information but preserving as much information as possible. In this paper, we propose a 3D lossy compression system based on plane extraction which represent the points of each scene plane as a Delaunay triangulation and a set of points/area information. The compression system can be customized to achieve different data compression or accuracy ratios. It also supports a color segmentation stage to preserve original scene color information and provides a realistic scene reconstruction. The design of the method provides a fast scene reconstruction useful for further visualization or processing tasks.This work has been supported by the Spanish Government DPI2013-40534-R grant

    Cross Modal Distillation for Supervision Transfer

    Full text link
    In this work we propose a technique that transfers supervision between images from different modalities. We use learned representations from a large labeled modality as a supervisory signal for training representations for a new unlabeled paired modality. Our method enables learning of rich representations for unlabeled modalities and can be used as a pre-training procedure for new modalities with limited labeled data. We show experimental results where we transfer supervision from labeled RGB images to unlabeled depth and optical flow images and demonstrate large improvements for both these cross modal supervision transfers. Code, data and pre-trained models are available at https://github.com/s-gupta/fast-rcnn/tree/distillationComment: Updated version (v2) contains additional experiments and result

    NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

    Full text link
    Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI
    • …
    corecore