2 research outputs found

    Human Action Recognition using Multi-Kernel Learning for Temporal Residual Network

    This paper was presented at the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications. Deep learning has led to a series of breakthroughs in the human action recognition field. Given the powerful representational ability of residual networks (ResNet), performance in many computer vision tasks, including human action recognition, has improved. Motivated by the success of ResNet, we use the residual network and its variations to obtain feature representations. Bearing in mind the importance of appearance and motion information for action representation, our network utilizes both for feature extraction. Appearance and motion features are further fused for action classification using a multi-kernel support vector machine (SVM). We also investigate the fusion of dense trajectories with the proposed network to boost its performance. We evaluate our proposed methods on a benchmark dataset (HMDB-51), and the results show that multi-kernel learning performs better than fusing the classification scores from the deep network's SoftMax layer. Our proposed method also performs well compared with recent state-of-the-art methods. Sergio A. Velastin has received funding from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, the Ministerio de Economía, Industria y Competitividad (COFUND2013-51509), the Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander. The authors also acknowledge support from the Higher Education Commission, Pakistan.
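The kernel-level fusion of appearance and motion features described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn and NumPy, replaces the ResNet features with synthetic vectors, and combines one RBF kernel per modality with a fixed weight, whereas multi-kernel learning proper would learn the kernel weights. The names `combined_kernel`, `make_toy_features` and `weight` are hypothetical.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def combined_kernel(app_a, mot_a, app_b, mot_b, weight=0.5):
    """Weighted sum of one RBF kernel per modality (appearance, motion).

    Assumption: the weight is fixed by hand here; a full multi-kernel
    learning method would optimise it together with the classifier.
    """
    k_app = rbf_kernel(app_a, app_b)  # gamma defaults to 1 / n_features
    k_mot = rbf_kernel(mot_a, mot_b)
    return weight * k_app + (1.0 - weight) * k_mot


def make_toy_features(n_per_class, rng):
    """Synthetic stand-ins for CNN appearance and motion descriptors."""
    labels = np.repeat([0, 1], n_per_class)
    # Shift the class means apart so the toy problem is easy to separate.
    appearance = rng.normal(size=(2 * n_per_class, 16)) + 2.0 * labels[:, None]
    motion = rng.normal(size=(2 * n_per_class, 8)) - 2.0 * labels[:, None]
    return appearance, motion, labels


rng = np.random.default_rng(0)
app_train, mot_train, y_train = make_toy_features(40, rng)
app_test, mot_test, y_test = make_toy_features(20, rng)

# SVC(kernel="precomputed") expects the train-vs-train Gram matrix for fit
# and the test-vs-train kernel matrix for predict/score.
gram_train = combined_kernel(app_train, mot_train, app_train, mot_train)
gram_test = combined_kernel(app_test, mot_test, app_train, mot_train)

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(gram_train, y_train)
accuracy = clf.score(gram_test, y_test)
```

Fusing at the kernel level lets a single SVM see both modalities jointly. The alternative the abstract compares against, late fusion, would instead train one classifier per stream and combine their SoftMax scores afterwards.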

    Modern architectures convolutional neural networks in human activity recognition

    In recent years, many researchers have focused on using convolutional neural networks to perform human activity recognition, as evidenced by the emergence of a number of convolutional neural network architectures such as LeNet-5, AlexNet and VGG16 and modern architectures such as ResNet, Inception V3, Inception-ResNet, MobileNet V2, NASNet and PNASNet. The main characteristic of a convolutional neural network (CNN) is its ability to extract features automatically from input images, which facilitates the processes of activity recognition and classification. Convolutional networks derive more relevant and complex features with every additional layer. In addition, CNNs have achieved perfect classification on highly similar activities that were previously extremely difficult to classify. In this paper, we evaluate modern convolutional neural networks in terms of their human activity recognition accuracy, and we compare the results with state-of-the-art methods. In our research, we used two public data sets: HMDB (shooting gun, kicking, falling to the floor, punching) and the Weizmann dataset (walking, running, jumping, bending, one-hand waving, two-hand waving, jumping in place, jumping jack, skipping). Our experimental results indicated that the CNN with the NASNet architecture achieves the best performance of the six CNN architectures on both human activity data sets (HMDB and Weizmann).