6 research outputs found

    A Review of Physical Human Activity Recognition Chain Using Sensors

    Get PDF
    In the era of the Internet of Medical Things (IoMT), healthcare monitoring plays a vital role. Improving lifestyles, encouraging healthy behaviours, and reducing chronic disease are urgently needed. However, tracking and monitoring critical conditions of elderly people and patients is a great challenge, and healthcare services for them are crucial to achieving a high level of safety. Physical human activity recognition using wearable devices is used to monitor and recognize the activities of elderly people and patients. The main aim of this review is to highlight the human activity recognition chain, which includes sensing technologies, preprocessing and segmentation, feature extraction methods, and classification techniques. Challenges and future trends are also highlighted.
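
    To make the recognition chain concrete, here is a minimal Python sketch of a wearable-sensor pipeline (sliding-window segmentation, simple time-domain features, and a classifier). The signal, window sizes, features, and classifier choice are illustrative assumptions, not taken from the review:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sliding_windows(signal, win=128, step=64):
    """Segment a (T, 3) accelerometer stream into fixed-size windows."""
    return np.stack([signal[i:i + win]
                     for i in range(0, len(signal) - win + 1, step)])

def extract_features(windows):
    """Simple time-domain features per window: mean, std, min, max per axis."""
    feats = [windows.mean(axis=1), windows.std(axis=1),
             windows.min(axis=1), windows.max(axis=1)]
    return np.concatenate(feats, axis=1)

# Hypothetical data: a raw tri-axial accelerometer stream and per-window labels.
stream = np.random.randn(10_000, 3)          # stand-in for real sensor data
X = extract_features(sliding_windows(stream))
y = np.random.randint(0, 6, len(X))          # stand-in activity labels

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict(X[:5]))                    # recognized activity classes
```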

    Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos

    No full text
    Recognition and classification of human actions for the annotation of unconstrained video sequences has proven challenging because of variations in the environment, the appearance of actors, the manner in which the same action is performed by different persons, its speed and duration, and the points of view from which the event is observed. This variability is reflected in the difficulty of defining effective descriptors and deriving appropriate and effective codebooks for action categorization. In this paper we propose a novel and effective solution to classify human actions in unconstrained videos. It improves on previous contributions through the definition of a novel local descriptor that uses image gradient and optic flow to model, respectively, the appearance and motion of human actions at interest point regions. In the formation of the codebook we employ radius-based clustering with soft assignment in order to create a rich vocabulary that can account for the high variability of human actions. We show that our solution achieves very good performance with no need for parameter tuning. We also show that a strong reduction in computation time can be obtained by applying codebook size reduction with Deep Belief Networks, with little loss of accuracy.
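
    As an illustration of the codebook step, here is a minimal NumPy sketch of radius-based clustering with soft assignment in its generic form; the radius, kernel width, and random descriptors are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def radius_based_codebook(descriptors, radius):
    """Leader-style radius clustering: a descriptor farther than `radius`
    from every existing codeword seeds a new codeword."""
    centers = []
    for d in descriptors:
        if not centers or min(np.linalg.norm(d - c) for c in centers) > radius:
            centers.append(d)
    return np.array(centers)

def soft_assign_histogram(descriptors, centers, sigma):
    """Soft assignment: each descriptor votes for every codeword with a
    Gaussian weight of its distance, instead of a single hard vote."""
    hist = np.zeros(len(centers))
    for d in descriptors:
        dists = np.linalg.norm(centers - d, axis=1)
        w = np.exp(-dists**2 / (2 * sigma**2))
        hist += w / w.sum()
    return hist / len(descriptors)

# Illustrative usage on random local descriptors (stand-ins for the
# gradient/optic-flow descriptors described in the paper).
descs = np.random.randn(500, 64)
codebook = radius_based_codebook(descs, radius=8.0)
bow = soft_assign_histogram(descs, codebook, sigma=4.0)
```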

    A PYRAMIDAL APPROACH FOR DESIGNING DEEP NEURAL NETWORK ARCHITECTURES

    Get PDF
    Developing an intelligent system capable of learning discriminative high-level features from high-dimensional data lies at the core of solving many computer vision (CV) and machine learning (ML) tasks. Scene and human action recognition from videos is an important topic in CV and ML, with applications including video surveillance, robotics, human-computer interaction, and video retrieval. Several bio-inspired, hand-crafted feature extraction systems have been proposed for processing temporal data. However, deep learning techniques have recently come to dominate CV and ML through their good performance on large-scale datasets. One of the most widely used deep learning techniques is the convolutional neural network (CNN) and its variations, e.g. ConvNet, 3DCNN, and C3D. The CNN kernel scheme reduces the number of parameters with respect to fully connected neural networks. Recent deep CNNs have more layers and more kernels per layer than early CNNs and consequently a large number of parameters. In addition, they violate the plausible pyramidal architecture of biological neural networks, because the number of filters increases at each higher layer, which makes convergence during training difficult. In this dissertation, we address three main questions central to pyramidal structure and deep neural networks: 1) Is it worthwhile to adopt a pyramidal architecture for a generalized recognition system? 2) How can the pyramidal neural network (PyraNet) be enhanced to recognize actions and dynamic scenes in videos? 3) What is the impact of imposing a pyramidal structure on a deep CNN?
    In the first part of the thesis, we provide a brief review of the work done on action and dynamic scene recognition using traditional computer vision and machine learning approaches. In addition, we give a historical and present-day overview of pyramidal neural networks and how deep learning emerged.
    In the second part, we introduce a strictly pyramidal deep architecture for dynamic scene and human action recognition, based on the 3DCNN model and the image pyramid concept. We introduce a new 3D weighting scheme that provides a simple connection scheme with lower computational and memory costs, resulting in fewer learnable parameters than other neural networks. 3DPyraNet extracts features from both the spatial and temporal dimensions while preserving a biological structure, and is thereby capable of capturing the motion information encoded in multiple adjacent frames. The 3DPyraNet model is extended with three modifications: 1) changing the input image size; 2) changing the receptive field and overlap size in the correlation layers; and 3) adding a linear classifier at the end to classify the learned features. The result is a discriminative approach to spatiotemporal feature learning for action and dynamic scene recognition. In combination with a linear SVM classifier, our model outperforms state-of-the-art methods in one-vs-all accuracy on three video benchmark datasets (KTH, Weizmann, and Maryland), and gives competitive accuracy on a fourth (YUPENN).
    In the last part of the thesis, we investigate to what extent a CNN may take advantage of the pyramidal structure typical of biological neurons. A generalized statement over the convolutional layers, from the input up to the fully connected layer, is introduced that further helps in understanding and designing a successful deep network. It reduces ambiguity, the number of parameters, and their size on disk without degrading overall accuracy. It also yields a general guideline for modeling a deep architecture by keeping a certain ratio of filters in the early layers versus the deeper layers. Competitive results are achieved compared to similarly well-engineered deeper architectures on four benchmark datasets. The same approach is further applied to person re-identification; less ambiguity in the features increases Rank-1 performance and yields results better than or comparable to state-of-the-art deep models.
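
    As a rough illustration of the strictly pyramidal idea (filter counts that decrease, rather than grow, with depth), here is a minimal PyTorch sketch of a 3D CNN for video clips. The layer sizes, filter counts, and classifier are illustrative assumptions, not the thesis's 3DPyraNet, which uses correlation layers with a dedicated 3D weighting scheme rather than standard convolutions:

```python
import torch
import torch.nn as nn

class Pyramidal3DNet(nn.Module):
    """Toy 3D CNN whose filter counts *decrease* with depth (pyramidal),
    unlike typical CNNs that widen at higher layers."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(32, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        # Linear classifier on the learned features, mirroring the third
        # modification described in the abstract.
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, clip):  # clip: (batch, channels, frames, height, width)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

# Illustrative forward pass on a random 16-frame RGB clip.
model = Pyramidal3DNet()
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 6])
```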