217 research outputs found

    Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn

    This paper presents an image-classification-based approach to the skeleton-based video action recognition problem. First, a dataset-independent translation-scale invariant image mapping method is proposed, which transforms the skeleton videos into colour images, named skeleton-images. Second, a multi-scale deep convolutional neural network (CNN) architecture is proposed, which can be built and fine-tuned on powerful pre-trained CNNs, e.g., AlexNet, VGGNet, and ResNet. Even though the skeleton-images are very different from natural images, the fine-tuning strategy still works well. Finally, we show that our method also works well on 2D skeleton video data. We achieve state-of-the-art results on the popular benchmark datasets, e.g., NTU RGB+D, UTD-MHAD, MSRC-12, and G3D. In particular, on the large and challenging NTU RGB+D, UTD-MHAD, and MSRC-12 datasets, our method outperforms other methods by a large margin, which demonstrates the efficacy of the proposed method.
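
    The abstract does not give the mapping itself, so the following is only a minimal sketch of one way a translation-scale invariant skeleton-to-image mapping could look. It assumes a sequence stored as a (frames, joints, 3) array, places joints on rows and frames on columns, and uses the x, y, z coordinates as the three colour channels. The function name `skeleton_to_image` and the normalisation details are illustrative assumptions, not the authors' method.

```python
import numpy as np


def skeleton_to_image(seq):
    """Map a skeleton sequence (frames, joints, 3) to a uint8 image (joints, frames, 3).

    Hypothetical normalisation: subtract the per-axis minimum (removes
    translation) and divide by the largest per-axis range (removes scale
    while keeping the aspect ratio between x, y and z).
    """
    seq = np.asarray(seq, dtype=np.float64)
    axis_min = seq.min(axis=(0, 1))                 # shape (3,)
    span = (seq.max(axis=(0, 1)) - axis_min).max()  # one scalar for all axes
    if span == 0:
        span = 1.0                                  # constant sequence: avoid division by zero
    pixels = np.rint(255.0 * (seq - axis_min) / span).astype(np.uint8)
    return pixels.transpose(1, 0, 2)                # rows = joints, columns = frames


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.normal(size=(20, 25, 3))         # 20 frames, 25 joints
    image = skeleton_to_image(sequence)
    print(image.shape, image.dtype)
```

    Because only offsets and one global scale are removed, a shifted and uniformly rescaled copy of the same sequence maps to (almost) the same image, up to rounding.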

    Multi-View Region Adaptive Multi-temporal DMM and RGB Action Recognition

    Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system. It uses a novel Multiple View Region Adaptive Multi-resolution in time Depth Motion Map (MV-RAMDMM) formulation combined with appearance information. Multiple-stream 3D Convolutional Neural Networks (CNNs) are trained on the different views and time resolutions of the region adaptive Depth Motion Maps. Multiple views are synthesised to enhance view invariance. The region adaptive weights, based on localised motion, accentuate and differentiate the parts of actions that possess faster motion. Dedicated 3D CNN streams for multi-time resolution appearance information (RGB) are also included. These help to identify and differentiate between small object interactions. A pre-trained 3D CNN is fine-tuned for each stream and used along with multi-class Support Vector Machines (SVMs). Average score fusion is applied to the outputs. The developed approach is capable of recognising both human action and human-object interaction. Three public domain datasets, MSR 3D Action, Northwestern UCLA multi-view actions, and MSR 3D daily activity, are used to evaluate the proposed solution. The experimental results demonstrate the robustness of this approach compared with state-of-the-art algorithms. Comment: 14 pages, 6 figures, 13 tables. Submitte
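
    Average score fusion, as mentioned in the abstract, can be sketched in a few lines. The example below assumes each stream yields a (samples, classes) array of per-class scores on a comparable scale; the function name `average_score_fusion` and the toy scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def average_score_fusion(stream_scores):
    """Fuse per-stream class scores by averaging them.

    stream_scores: list of arrays, each of shape (samples, classes).
    Returns the fused scores and the predicted class index per sample.
    """
    stacked = np.stack([np.asarray(s, dtype=np.float64) for s in stream_scores])
    fused = stacked.mean(axis=0)            # average over the stream axis
    return fused, fused.argmax(axis=1)


if __name__ == "__main__":
    # Toy scores for two streams, two samples, three classes (made up).
    depth_stream = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
    rgb_stream = [[0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
    fused, predicted = average_score_fusion([depth_stream, rgb_stream])
    print(fused)
    print(predicted)
```

    Averaging treats every stream as equally reliable; a weighted mean would be the natural variation if some streams are known to be stronger.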

    Detection of tennis activities with wearable sensors

    This paper aims to design and implement a system capable of distinguishing between different activities carried out during a tennis match. The goal is to correctly classify a set of tennis strokes. The system must be robust to variability in the height, age or sex of any subject who performs the actions. A new database is developed to meet this objective. The system is based on two sensor nodes that use Bluetooth Low Energy (BLE) wireless technology to communicate with a PC, which acts as a central device and collects the information received from the sensors. The data provided by these sensors are processed to calculate their spectrograms. Feature extraction and activity classification are then carried out by applying deep learning techniques with semi-supervised training. Preliminary results obtained with a data set of eight players, four women and four men, show that our approach is able to address the problem of the diversity of human constitutions, weight and sex of different players, achieving accuracy greater than 96.5% in recognizing the tennis strokes of a new player never seen before by the system.
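
    The spectrogram step can be illustrated with a short-time Fourier transform. The sketch below is a generic NumPy implementation and not the authors' pipeline: the sampling rate, window length, hop size and the synthetic test signal are all assumptions chosen for the example.

```python
import numpy as np


def spectrogram(signal, fs, win_len=64, hop=32):
    """Power spectrogram of a 1-D signal using a Hann-windowed STFT.

    Returns (freqs, times, power) with power of shape (freq_bins, frames).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one window")
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # window centres in seconds
    return freqs, times, power.T


if __name__ == "__main__":
    fs = 100.0                                 # hypothetical sensor sampling rate in Hz
    t = np.arange(200) / fs
    accel = np.sin(2 * np.pi * 10.0 * t)       # synthetic 10 Hz motion component
    freqs, times, power = spectrogram(accel, fs)
    print(power.shape, freqs[power.mean(axis=1).argmax()])
```

    Each sensor axis would be transformed separately in this scheme, and the resulting time-frequency maps can then be fed to a classifier as image-like inputs.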

    Artificial Intelligence Of Things For Ubiquitous Sports Analytics

    Enabling mobile devices to perform in-the-wild sports analytics, particularly swing tracking, remains an open question. A crucial challenge is to develop robust methods that can operate across various sports (e.g., golf and tennis), different sensors (cameras and IMU), and diverse human users. Traditional approaches typically rely on vision-based or IMU-based methods to extract key points from subjects in order to predict trajectories. However, these methods struggle to generate accurate swing tracking, as vision-based techniques are susceptible to occlusion, and IMU sensors are notorious for accumulated errors. In this thesis, we propose several innovative solutions by leveraging AIoT, including the IoT with ubiquitous wearable devices such as smartphones and smart wristbands, and harnessing the power of AI such as deep neural networks, to achieve ubiquitous sports analytics. We make three main technical contributions: a tailored deep neural network design, network model automatic search, and model domain adaptation to address the problem of heterogeneity among devices, human subjects, and sports for ubiquitous sports analytics. In Chapter 2, we begin with the design of a prototype that combines IMU and depth sensor fusion, along with a tailored deep neural network, to address the occlusion problems faced by depth sensors during swings. To recover swing trajectories with fine-grained details, we propose a CNN-LSTM architecture that learns multi-modalities within depth and IMU sensor fusion. In Chapter 3, we develop a framework to reduce the overhead of model design for new devices, sports, and human users. By designing a regression-based stochastic NAS method, we improve swing-tracking algorithms through automatic model generation. We also extend our studies to include unseen human users, sensor devices, and sports.
    Leveraging a domain adaptation method, we propose a framework that eliminates the need for tedious training data collection and labeling for new users, devices, and sports via adversarial learning. In Chapter 4, we present a framework to alleviate the model parameter selection process in NAS, as introduced in Chapter 3. By employing zero-cost proxies, we search for the optimal swing tracking architecture without training, in a significantly larger candidate model pool. We demonstrate that the proposed method outperforms state-of-the-art approaches in swing tracking, as well as in adapting to different subjects, sports, and devices. Overall, this thesis develops a series of innovative machine learning algorithms to enable ubiquitous IoT wearable devices to perform accurate swing analytics (e.g., tracking, analysis, and assessment) in real-world conditions.

    Multi-set canonical correlation analysis for 3D abnormal gait behaviour recognition based on virtual sample generation

    Small sample datasets and reliance on two-dimensional (2D) approaches are challenges for vision-based abnormal gait behaviour recognition (AGBR). The lack of three-dimensional (3D) structure of the human body limits 2D-based methods in abnormal gait virtual sample generation (VSG). In this paper, 3D AGBR based on VSG and multi-set canonical correlation analysis (3D-AGRBMCCA) is proposed. First, the unstructured point cloud data of gait are obtained by using a structured light sensor. A 3D parametric body model is then deformed to fit the point cloud data, both in shape and posture. The features of the point cloud data are then converted to a high-level structured representation of the body. The parametric body model is used for VSG based on the estimated body pose and shape data. Symmetry virtual samples, pose-perturbation virtual samples and various body-shape virtual samples with multiple views are generated to extend the training samples. The spatial-temporal features of the abnormal gait behaviour from different views, body pose and shape parameters are then extracted by a convolutional neural network based Long Short-Term Memory network. These are projected onto a uniform pattern space using deep learning based multi-set canonical correlation analysis. Experiments on four publicly available datasets show that the proposed system performs well under various conditions.