311 research outputs found

    A new framework for deep learning video based Human Action Recognition on the edge

    Nowadays, video surveillance systems are commonly found in most public and private spaces. These systems typically consist of a network of cameras that feed into a central node; however, the processing aspect is evolving towards distributed approaches that leverage edge computing. Such distributed systems can effectively address the detection of people or events at each individual node. Most of these systems rely on deep learning and segmentation algorithms, which enable high performance but usually at a significant computational cost, hindering real-time execution. This paper presents an approach for people detection and action recognition in the wild, optimized for running on the edge and able to work in real time on an embedded platform. Human Action Recognition (HAR) is performed with a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network. The input to the LSTM is an ad-hoc, lightweight feature vector obtained from the bounding box of each person detected in the video surveillance image. The resulting system is highly portable and easily scalable, providing a powerful tool for real-world video surveillance applications (in-the-wild, real-time action recognition). The proposal has been exhaustively evaluated and compared against other state-of-the-art (SOTA) proposals on five datasets: four widely used ones (KTH, WEIZMANN, WVU, IXMAS) and a novel one (GBA), recorded in the wild, that includes several people performing different actions simultaneously. The results validate the proposal, since it achieves SOTA accuracy in a much more challenging real video surveillance scenario while running on lightweight embedded hardware. Funding: European Commission; Agencia Estatal de Investigación; Universidad de Alcalá.
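    The core of the described pipeline (a lightweight feature vector from each detected person's bounding box, fed frame by frame into an LSTM classifier) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the feature contents, dimensions, and class count are assumptions.

```python
# Minimal sketch (not the authors' code): per-person bounding-box features
# over time -> LSTM -> action class. Feature layout and sizes are assumed.
import torch
import torch.nn as nn

class BBoxLSTMClassifier(nn.Module):
    def __init__(self, feat_dim=8, hidden_dim=64, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, time, feat_dim) -- one feature vector per frame and person
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])          # logits over action classes

def bbox_features(box, prev_box, img_w, img_h):
    """Illustrative per-frame features from an (x, y, w, h) bounding box:
    normalized position/size, aspect ratio, and frame-to-frame motion."""
    x, y, w, h = box
    px, py, pw, ph = prev_box
    return torch.tensor([x / img_w, y / img_h, w / img_w, h / img_h,
                         w / max(h, 1e-6),
                         (x - px) / img_w, (y - py) / img_h,
                         (w * h - pw * ph) / (img_w * img_h)])

# Usage: a 30-frame track of one person -> a single action prediction.
model = BBoxLSTMClassifier()
track = torch.randn(1, 30, 8)              # stand-in for real bbox features
print(model(track).argmax(dim=1))
```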

    Action Classification in Human Robot Interaction Cells in Manufacturing

    Action recognition has become a prerequisite for fluent Human-Robot Interaction (HRI) due to the high degree of movement flexibility involved. With the improvements in machine learning algorithms, robots are gradually transitioning into more human-populated areas; HRI systems therefore require robots to possess sufficient cognition. Action recognition algorithms require massive training datasets, structural information about objects in the environment, and models that are inexpensive in terms of computational complexity. In addition, many such algorithms are trained on datasets derived from daily activities, and algorithms trained on non-industrial datasets may perform unfavorably when implementing models and validating actions in an industrial context. This study proposes a lightweight deep learning model for classifying low-level actions in an assembly setting. The model is based on optical flow feature extraction and MobileNetV2-SSD action classification, and it is trained and assessed on a dataset of actual industrial activities. The experimental outcomes show that the presented method does not require extensive preprocessing and is therefore promising in terms of the feasibility of action recognition for mutual performance monitoring in real-world HRI applications. The test results show 80% accuracy for the low-level RGB action classes. The study's primary objective is to generate experimental results that may be used as a reference for future HRI algorithms based on the InHARD dataset.
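    A rough sketch of the two stages named in the abstract, dense optical flow encoding followed by MobileNetV2 classification, is given below. It is an illustration under my own assumptions: the SSD detection stage is omitted, and the class count and preprocessing choices are placeholders, not the paper's released code.

```python
# Sketch: Farneback dense optical flow encoded as an image, classified with a
# MobileNetV2 backbone. Class count and preprocessing are assumed placeholders.
import cv2
import numpy as np
import torch
from torchvision.models import mobilenet_v2

def flow_image(prev_gray, gray):
    """Encode dense optical flow as an HSV->RGB image (hue = direction,
    value = magnitude)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

num_actions = 10                                  # placeholder, not from the paper
net = mobilenet_v2(weights=None)
net.classifier[1] = torch.nn.Linear(net.last_channel, num_actions)

def classify(prev_frame, frame):
    """Classify the motion between two consecutive BGR frames."""
    rgb = flow_image(cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY),
                     cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    x = torch.from_numpy(cv2.resize(rgb, (224, 224))).permute(2, 0, 1)
    x = x.float().unsqueeze(0) / 255.0
    return net(x).argmax(dim=1)
```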

    Skeleton driven action recognition using an image-based spatial-temporal representation and convolution neural network

    Individuals with Autism Spectrum Disorder (ASD) typically present difficulties in engaging and interacting with their peers. Thus, researchers have been developing different technological solutions as support tools for children with ASD. Social robots, one example of these technological solutions, are often unaware of their game partners, preventing the automatic adaptation of their behavior to the user. One source of information that can enrich this interaction, and consequently adapt the system's behavior, is the recognition of the user's actions using RGB cameras and/or depth sensors. The present work proposes a method to automatically detect, in real time, typical and stereotypical actions of children with ASD, using an Intel RealSense camera and the Nuitrack SDK to detect and extract the user's joint coordinates. The pipeline starts by mapping the temporal and spatial joint dynamics onto a color image-based representation. Usually, the positions of the joints in the final image are clustered into groups. To verify whether the sequence of the joints in the final image representation can influence the model's performance, two main experiments were conducted: in the first, the order of the grouped joints in the sequence was changed, and in the second, the joints were randomly ordered. In each experiment, statistical methods were used in the analysis. Based on the experiments conducted, statistically significant differences were found concerning the joint sequence in the image, indicating that the order of the joints might impact the model's performance. The final model, a Convolutional Neural Network (CNN) trained on the different actions (typical and stereotypical), was used to classify the different patterns of behavior, achieving a mean accuracy of 92.4% ± 0.0% on the test data. The entire pipeline ran on average at 31 FPS. This work has been supported by FCT (Fundação para a Ciência e Tecnologia) within the R&D Units Project Scope UIDB/00319/2020. Vinicius Silva thanks FCT for the PhD scholarship SFRH/BD/133314/2017.
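    The image-based spatio-temporal encoding described (joints as image rows, frames as columns, normalized coordinates as color channels) can be sketched roughly as below; the joint count, window length, and CNN layout are placeholder assumptions rather than the authors' exact design.

```python
# Sketch: skeleton sequence -> color image (rows = joints, columns = frames,
# RGB = normalized x/y/z) -> small CNN classifier. Sizes are assumed.
import numpy as np
import torch
import torch.nn as nn

def joints_to_image(seq, joint_order):
    """seq: (T, J, 3) joint coordinates; joint_order: row ordering.
    The abstract reports that this row ordering can affect performance."""
    seq = seq[:, joint_order, :]                        # reorder joints
    lo, hi = seq.min(axis=(0, 1)), seq.max(axis=(0, 1))
    img = (seq - lo) / (hi - lo + 1e-8)                 # scale to [0, 1]
    return img.transpose(1, 0, 2).astype(np.float32)    # (J, T, 3)

class SkeletonCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):                               # x: (B, 3, J, T)
        return self.fc(self.features(x).flatten(1))

# Usage with placeholder values (19 joints, 60-frame window, 5 classes).
seq = np.random.rand(60, 19, 3)
img = joints_to_image(seq, joint_order=list(range(19)))
x = torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0) # (1, 3, J, T)
print(SkeletonCNN(num_classes=5)(x).shape)
```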

    NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

    Research on depth-based human activity analysis achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic number of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. This dataset contains 120 different action classes including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset, and show the advantage of applying deep learning methods for 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework is proposed for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. [The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp] Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
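    For readers unfamiliar with the one-shot setting mentioned above, the sketch below illustrates the generic evaluation protocol (matching query samples against a single exemplar per novel class in an embedding space). It is not the paper's APSR framework, and `embed` stands in for any pretrained 3D activity encoder.

```python
# Generic one-shot recognition protocol: one exemplar per novel class,
# queries matched by cosine similarity in an embedding space.
import torch
import torch.nn.functional as F

def one_shot_accuracy(embed, exemplars, queries):
    """exemplars: dict class_id -> one sample tensor
    queries: list of (sample tensor, true class_id) pairs"""
    with torch.no_grad():
        keys = list(exemplars)
        proto = F.normalize(
            torch.stack([embed(exemplars[c]) for c in keys]), dim=1)
        correct = 0
        for sample, label in queries:
            q = F.normalize(embed(sample), dim=0)
            pred = keys[int((proto @ q).argmax())]   # cosine-similarity match
            correct += int(pred == label)
    return correct / len(queries)
```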

    Artificial Vision Algorithms for Socially Assistive Robot Applications: A Review of the Literature

    Today, computer vision algorithms are very important for different fields and applications, such as closed-circuit television security, health status monitoring, recognition of a specific person or object, and robotics. On this topic, the present paper provides a recent review of the literature on computer vision algorithms (recognition and tracking of faces, bodies, and objects) oriented towards socially assistive robot applications. The performance, frames-per-second (FPS) processing speed, and hardware used to run the algorithms are highlighted by comparing the available solutions. Moreover, this paper provides general information for researchers interested in knowing which vision algorithms are available, enabling them to select the one most suitable for their robotic system application. Funding: Conacyt doctoral scholarship (CVU No. 64683).

    A lightweight temporal attention-based convolution neural network for driver's activity recognition in edge

    Low inference latency and accurate response to environment changes play a crucial role in automated driving systems, especially in current Level 3 automated driving. Rapid and reliable recognition of a driver's non-driving-related activities (NDRAs) is important for designing an intelligent takeover strategy that ensures a safe and quick control transition. This paper proposes a novel lightweight temporal attention-based convolutional neural network (LTA-CNN) module dedicated to edge computing platforms, specifically for NDRA recognition. This module effectively learns spatial and temporal representations at a relatively low computational cost. Its superiority has been demonstrated on an NDRA recognition dataset, where it achieves 81.01% classification accuracy, an 8.37% increase over the best result of an efficient network (MobileNet V3) reported in the literature. The inference latency has been evaluated to demonstrate its effectiveness in real applications: the latest NVIDIA Jetson AGX Orin can complete one inference in only 63 ms.
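    The general mechanism of temporal attention over per-frame CNN features (learning a weight per frame, pooling the frame features with those weights, then classifying) can be sketched as follows; the feature dimension and class count are assumptions, and this is an illustration of the mechanism rather than the paper's LTA-CNN module.

```python
# Sketch: attention-weighted temporal pooling of per-frame CNN features,
# followed by a linear classifier. Dimensions and class count are assumed.
import torch
import torch.nn as nn

class TemporalAttentionHead(nn.Module):
    def __init__(self, feat_dim=1280, num_classes=7):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)        # one attention score per frame
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats):                      # feats: (B, T, feat_dim)
        weights = torch.softmax(self.score(feats), dim=1)   # (B, T, 1)
        pooled = (weights * feats).sum(dim=1)                # attention pooling
        return self.classifier(pooled)

# Usage: 16 frames of features from a lightweight per-frame backbone.
feats = torch.randn(2, 16, 1280)
print(TemporalAttentionHead()(feats).argmax(dim=1))
```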