25 research outputs found

    Skeletal Movement to Color Map: A Novel Representation for 3D Action Recognition with Inception Residual Networks

    This paper was presented at the 25th IEEE International Conference on Image Processing (ICIP). We propose a novel skeleton-based representation for 3D action recognition in videos using Deep Convolutional Neural Networks (D-CNNs). Two key issues are addressed: first, how to construct a robust representation that easily captures the spatial-temporal evolution of motions from skeleton sequences; second, how to design D-CNNs capable of learning discriminative features from the new representation in an effective manner. To address these tasks, a skeleton-based representation, namely SPMF (Skeleton Pose-Motion Feature), is proposed. The SPMFs are built from two of the most important properties of a human action: postures and their motions. Therefore, they are able to effectively represent complex actions. For learning and recognition tasks, we design and optimize new D-CNNs based on the idea of Inception Residual networks to predict actions from SPMFs. Our method is evaluated on two challenging datasets, MSR Action3D and NTU-RGB+D. Experimental results indicate that the proposed method surpasses state-of-the-art methods whilst requiring less computation.

    SAR-NAS: Skeleton-based Action Recognition via Neural Architecture Searching

    This paper presents a study of the automatic design of neural network architectures for skeleton-based action recognition. Specifically, we encode a skeleton-based action instance into a tensor and carefully define a set of operations to build two types of network cells: normal cells and reduction cells. The recently developed DARTS (Differentiable Architecture Search) is adopted to search for an effective network architecture built upon the two types of cells. All operations are 2D based in order to reduce the overall computation and search space. Experiments on the challenging NTU RGB+D and Kinetics datasets have verified that most of the networks developed to date for skeleton-based action recognition are likely not compact and efficient. The proposed method provides an approach to search for a compact network that is able to achieve comparable or even better performance than the state-of-the-art methods.
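
As a rough illustration of the DARTS idea mentioned in this abstract: during search, each edge of a cell computes a softmax-weighted mixture of candidate operations, and after search the highest-weighted operation is kept. The sketch below uses scalar inputs and three toy operations; the names `mixed_op`, `ops` and `alphas` are hypothetical and are not taken from the SAR-NAS paper, whose actual operation set is not given in the abstract.

```python
import math

def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_op(x, ops, alphas):
    """Continuous relaxation of an edge: weighted sum of all candidate ops."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, ops))

# Toy candidate operations (assumed for illustration only).
ops = [
    lambda x: x,        # identity / skip connection
    lambda x: 0.0,      # "zero" operation (no connection)
    lambda x: 2.0 * x,  # stand-in for a learned transformation
]

# Equal architecture parameters give equal weights of 1/3 each.
alphas = [0.0, 0.0, 0.0]
out = mixed_op(3.0, ops, alphas)

# After search, the edge is discretized by keeping the op with the largest alpha.
trained_alphas = [0.1, -1.0, 0.7]
chosen = max(range(len(trained_alphas)), key=lambda i: trained_alphas[i])
```

In a real search the `alphas` would be optimized by gradient descent alongside the network weights; here they are fixed by hand to keep the example self-contained.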

    Frustration recognition from speech during game interaction using wide residual networks

    Background: Although frustration is a common emotional reaction while playing games, an excessive level of frustration can harm users' experiences, discouraging them from undertaking further game interactions. The automatic detection of players' frustration enables the development of adaptive systems which, through real-time difficulty adjustment, would adapt the game to the user's specific needs, thus maximising players' experience and guaranteeing the game's success. To this end, we present our speech-based approach for the automatic detection of frustration during game interactions, a specific task still under-explored in research. Method: The experiments were performed on the Multimodal Game Frustration Database (MGFD), an audiovisual dataset, collected within the Wizard-of-Oz framework, specially tailored to investigate verbal and facial expressions of frustration during game interactions. We explored the performance of a variety of acoustic feature sets, including Mel-Spectrograms and Mel-Frequency Cepstral Coefficients (MFCCs), as well as the low-dimensional knowledge-based acoustic feature set eGeMAPS. Given the steady improvements achieved by Convolutional Neural Networks (CNNs) in speech recognition tasks, and unlike the MGFD baseline, which is based on a Long Short-Term Memory (LSTM) architecture and a Support Vector Machine (SVM) classifier, in the present work we consider commonly used CNNs, including ResNets, VGG, and AlexNet. Furthermore, given the still open debate on the suitability of shallow vs. deep networks, we also examine the performance of two of the latest deep CNNs, i.e., WideResNets and EfficientNet. Results: Our best result, achieved with WideResNets and Mel-Spectrogram features, increases the system performance from 58.8% Unweighted Average Recall (UAR) to 93.1% UAR for speech-based automatic frustration recognition.
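
The UAR figures quoted in this abstract are the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. A minimal sketch of that computation follows; the function name and the toy labels are made up for illustration and are not drawn from MGFD.

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class contributes equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Toy, imbalanced example (hypothetical labels).
y_true = ["frustrated", "frustrated", "neutral", "neutral", "neutral", "neutral"]
y_pred = ["frustrated", "neutral", "neutral", "neutral", "neutral", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the recall is 0.5 for "frustrated" and 1.0 for "neutral", so UAR is 0.75 while plain accuracy is 5/6, which shows why UAR is the stricter measure on imbalanced data.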

    Gesture Scoring Based on Gaussian Distance-Improved DTW

    The power industry has been dedicated to applying virtual reality (VR) technology to build training systems in virtual environments, enabling personnel to complete skill training in realistic simulated environments while ensuring their safety. Conventional action scoring systems struggle to provide accurate scores for fine movements. Accurate scoring of fine movements can help workers identify their shortcomings during power operations, thus improving learning efficiency, which is of great significance for power operation training in virtual environments. This paper proposes a power-operation-oriented VR action evaluation method based on the Gaussian distance-improved dynamic time warping (DTW) algorithm and the temporal convolutional network (TCN) model. First, the adaptive adapter is used to extract one-dimensional features from the three-dimensional data of the data gloves. Then, based on the TCN model, action data with significant discrepancies are filtered out. Finally, the obtained data are input into the Gaussian distance-improved DTW algorithm, where the path size is calculated. Corresponding scoring criteria are established on the basis of the path size to evaluate the actions. The results demonstrate that the VR action evaluation method based on the Gaussian distance-improved DTW algorithm and the TCN model significantly improves the accuracy of evaluating fine movements compared to traditional evaluation algorithms.
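
The final step of this abstract, accumulating a warping-path cost with DTW, can be sketched as follows. The abstract does not give the exact form of the Gaussian distance, so the local cost used here, 1 - exp(-(a - b)^2 / (2 * sigma^2)), is an assumption chosen for illustration; `gaussian_distance`, `dtw_cost` and `sigma` are hypothetical names, and the TCN filtering and scoring criteria are not reproduced.

```python
import math

def gaussian_distance(a, b, sigma=1.0):
    """Assumed local cost: 0 for identical values, approaching 1 for distant ones."""
    return 1.0 - math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))

def dtw_cost(x, y, dist=gaussian_distance):
    """Classic DTW: accumulated cost of the cheapest warping path between x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = minimal accumulated cost aligning x[:i] with y[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# One-dimensional toy feature sequences (made-up values).
reference = [0.0, 1.0, 2.0]
slower = [0.0, 0.0, 1.0, 2.0]   # same motion, performed more slowly
different = [0.0, 1.0, 5.0]     # deviates at the end

same_cost = dtw_cost(reference, slower)
diff_cost = dtw_cost(reference, different)
```

A scoring rule would then map a smaller path cost to a higher score. Because the assumed local cost saturates near 1, a single large outlier cannot dominate the total, which is one plausible motivation for a Gaussian-shaped distance.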

    Hierarchical long short-term memory for action recognition based on 3D skeleton joints from Kinect sensor

    Action recognition has been used in a wide range of applications such as human-computer interaction, intelligent video surveillance systems, video summarization, and robotics. Recognizing actions is important for intelligent agents to understand, learn from, and interact with the environment. Recent technology that allows the acquisition of RGB+D and 3D skeleton data, together with the development of deep learning models, has significantly increased the performance of action recognition models. In this research, a hierarchical Long Short-Term Memory (LSTM) model is proposed to recognize actions based on 3D skeleton joints from a Kinect sensor. The model uses the 3D axes of the skeleton joints and groups the joints in each axis into parts, namely, spine, left and right arm, left and right hand, and left and right leg. To fit the hierarchically structured layers of the LSTM, the parts are concatenated into spine, arms, hands, and legs and then concatenated into the body. The model combines the body representation from each axis into a single final body, which is fed to the final layer to classify the action. The performance is measured using cross-view and cross-subject evaluation, achieving accuracies of 0.854 and 0.837, respectively, on 10 action classes of the NTU RGB+D dataset.