908 research outputs found

    Distributed Video Coding for Multiview and Video-plus-depth Coding

    Get PDF

    Kernelized Multiview Projection for Robust Action Recognition

    Get PDF
    Conventional action recognition algorithms adopt a single type of feature or a simple concatenation of multiple features. In this paper, we propose to better fuse and embed different feature representations for action recognition using a novel spectral coding algorithm called Kernelized Multiview Projection (KMP). Computing the kernel matrices from different features/views via time-sequential distance learning, KMP can encode different features with different weights to achieve a low-dimensional and semantically meaningful subspace where the distribution of each view is sufficiently smooth and discriminative. More crucially, KMP is linear for the reproducing kernel Hilbert space, which allows it to be competent for various practical applications. We demonstrate KMP’s performance for action recognition on five popular action datasets and the results are consistently superior to state-of-the-art techniques

    Depth-based Multi-View 3D Video Coding

    Get PDF

    NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

    Full text link
    Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI

    Co-operative surveillance cameras for high quality face acquisition in a real-time door monitoring system

    Get PDF
    A poster session on co-operative surveillance cameras for high quality face acquisition in a real-time door monitoring syste

    Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention

    Full text link
    Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene and employ decision-level fusion to integrate these heterogenous data. However, this fusion method may not fully utilize the complementarity of different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present a novel GPU-friendly supervised contrastive learning framework SuMoCo to learn better representations. Furthermore, We fine-grained the test split of the DAD dataset to enable the multi-class recognition of drivers' activities. Experiments on this enhanced database demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0\%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses. The code and annotations are publicly available.Comment: 9 pages (1 for reference); accepted by the 6th Multimodal Learning and Applications Workshop (MULA) at CVPR 202
    • …
    corecore