9 research outputs found

    Simple yet efficient real-time pose-based action recognition

    Full text link
    Recognizing human actions is a core challenge for autonomous systems, as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real time. In order to train corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time and recognize their actions in real time with standard monocular camera sensors. For action recognition, we encode the human pose into a new data format called Encoded Human Pose Image (EHPI), which can then be classified using standard methods from the computer vision community. With this simple procedure we achieve competitive state-of-the-art performance in pose-based action detection and can ensure real-time performance. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data.
    Comment: Submitted to IEEE Intelligent Transportation Systems Conference (ITSC) 2019. Code will be available soon at https://github.com/noboevbo/ehpi_action_recognitio
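    A rough sketch of this kind of pose-to-image encoding is given below (assumptions: 2D keypoints already normalized to [0, 1], joints as image rows, frames as columns, x and y coordinates written to separate color channels; the function name and sizes are illustrative, and the exact EHPI layout may differ):

    # Minimal sketch of a pose-to-image encoding in the spirit of EHPI.
    # Assumes keypoints are normalized to [0, 1]; names and sizes are illustrative.
    import numpy as np

    def encode_pose_sequence(keypoints, num_frames=32):
        """keypoints: array of shape (T, J, 2) holding normalized (x, y) per joint."""
        T, J, _ = keypoints.shape
        # Pad (by repeating the last frame) or crop to a fixed temporal length.
        if T < num_frames:
            pad = np.repeat(keypoints[-1:], num_frames - T, axis=0)
            keypoints = np.concatenate([keypoints, pad], axis=0)
        else:
            keypoints = keypoints[-num_frames:]
        # Rows = joints, columns = frames, channels = (x, y, unused), scaled to 8 bit.
        ehpi = np.zeros((J, num_frames, 3), dtype=np.uint8)
        ehpi[..., 0] = (keypoints[..., 0].T * 255).astype(np.uint8)  # x coordinates
        ehpi[..., 1] = (keypoints[..., 1].T * 255).astype(np.uint8)  # y coordinates
        return ehpi  # can now be fed to any standard image classifier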

    Autonomous Driving: Framework for Pedestrian Intention Estimation in a Real World Scenario

    Full text link
    Rapid advancements in driver assistance technology will lead to the integration of fully autonomous vehicles on our roads, where they will interact with other road users. Because driverless vehicles make interaction through eye contact impossible, we describe a framework for estimating the crossing intentions of pedestrians in order to reduce the uncertainty that the lack of eye contact between road users creates. The framework was deployed in a real vehicle and tested in three experimental cases that showed a variety of communication messages to pedestrians in a shared-space scenario. Results from the performed field tests showed the feasibility of the presented approach. © 2020 IEEE.
    This work was supported by the Austrian Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK) Endowed Professorship for Sustainable Transport Logistics 4.0 and the Spanish Ministry of Economy, Industry and Competitiveness under the TRA2015-63708-R and TRA2016-78886-C3-1-R projects.

    Skeleton driven action recognition using an image-based spatial-temporal representation and convolution neural network

    Get PDF
    Individuals with Autism Spectrum Disorder (ASD) typically present difficulties in engaging and interacting with their peers. Thus, researchers have been developing different technological solutions as support tools for children with ASD. Social robots, one example of these technological solutions, are often unaware of their game partners, preventing the automatic adaptation of their behavior to the user. Information that can be used to enrich this interaction and, consequently, adapt the system's behavior is the recognition of different actions of the user by using RGB cameras and/or depth sensors. The present work proposes a method to automatically detect, in real time, typical and stereotypical actions of children with ASD by using the Intel RealSense and the Nuitrack SDK to detect and extract the user's joint coordinates. The pipeline starts by mapping the temporal and spatial joint dynamics onto a color image-based representation. Usually, the positions of the joints in the final image are clustered into groups. To verify whether the sequence of the joints in the final image representation can influence the model's performance, two main experiments were conducted: in the first, the order of the grouped joints in the sequence was changed, and in the second, the joints were randomly ordered. In each experiment, statistical methods were used in the analysis. Based on the experiments conducted, statistically significant differences were found concerning the joint sequence in the image, indicating that the order of the joints might impact the model's performance. The final model, a Convolutional Neural Network (CNN) trained on the different actions (typical and stereotypical), was used to classify the different patterns of behavior, achieving a mean accuracy of 92.4% ± 0.0% on the test data. The entire pipeline ran on average at 31 FPS.
    This work has been supported by FCT—Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. Vinicius Silva thanks FCT for the PhD scholarship SFRH/BD/133314/2017.
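    The abstract describes a CNN applied to this image-based spatial-temporal joint representation. A minimal sketch of such a classifier is given below (assumptions: PyTorch, 3-channel inputs with joints as rows and frames as columns, and a small set of action classes; the paper's actual architecture is not specified here and may differ):

    # Minimal PyTorch sketch of a CNN over image-like skeleton maps.
    # Input size (joints x frames) and the number of classes are illustrative.
    import torch
    import torch.nn as nn

    class SkeletonImageCNN(nn.Module):
        def __init__(self, num_classes=6):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),  # robust to the exact joint/frame counts
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x):  # x: (batch, 3, joints, frames)
            x = self.features(x).flatten(1)
            return self.classifier(x)

    # Example: logits for a batch of 8 maps with 19 joints over 32 frames.
    logits = SkeletonImageCNN()(torch.rand(8, 3, 19, 32))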

    Online human action recognition with spatial and temporal skeleton features using a distributed camera network

    Get PDF
    Online action recognition is an important task for human-centered intelligent services. However, it remains a highly challenging problem due to the large variety and uncertainty of the spatial and temporal scales of human actions. In this paper, the following core ideas are proposed to deal with the online action recognition problem. First, we combine spatial and temporal skeleton features to represent human actions; these include not only geometrical features but also multiscale motion features, so that both the spatial and the temporal information of the actions are covered. We use an efficient one-dimensional convolutional neural network to fuse the spatial and temporal features and train it for action recognition. Second, we propose a group sampling method to combine previous action frames with current action frames; it is based on the hypothesis that neighboring frames are largely redundant, and the sampling mechanism ensures that long-term contextual information is also considered. Third, the skeletons from multiview cameras are fused in a distributed manner, which can improve the human pose accuracy in the case of occlusions. Finally, we propose a RESTful client-server service architecture to deploy the proposed online action recognition module on a remote server as a public service, so that camera networks with limited onboard computational resources can benefit from this architecture. We evaluated our model on the JHMDB and UT-Kinect data sets, achieving highly promising accuracies of 80.1% and 96.9%, respectively. Our online experiments show that our memory group sampling mechanism is far superior to the traditional sliding-window approach.
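    A rough sketch of how such a memory group sampling step could look is given below (assumptions: a per-track buffer of skeleton feature frames, fixed memory and recent-frame budgets, and one representative frame kept per group of older frames; names and sizes are illustrative, not the paper's implementation):

    # Illustrative sketch of memory group sampling (not the paper's exact scheme).
    # Keeps the most recent frames intact and subsamples older frames in groups,
    # exploiting the redundancy of neighboring frames to retain long-term context.
    import numpy as np

    def group_sample(frames, num_memory=16, num_recent=16):
        """frames: array-like of per-frame skeleton features, oldest first."""
        frames = np.asarray(frames)
        recent = frames[-num_recent:]                     # current frames, kept as-is
        memory = frames[:-num_recent] if len(frames) > num_recent else frames[:0]
        if len(memory) == 0:
            # Not enough history yet: repeat the oldest available frame as padding.
            sampled = np.repeat(recent[:1], num_memory, axis=0)
        else:
            # One representative frame per group of older frames.
            idx = np.linspace(0, len(memory) - 1, num_memory).round().astype(int)
            sampled = memory[idx]
        return np.concatenate([sampled, recent], axis=0)  # combined input for the 1D CNN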

    Simple yet efficient real-time pose-based action recognition

    No full text
    Recognizing human actions is a core challenge for autonomous systems as they directly share the same space with humans. Systems must be able to recognize and assess human actions in real time. To train the corresponding data-driven algorithms, a significant amount of annotated training data is required. We demonstrate a pipeline to detect humans, estimate their pose, track them over time and recognize their actions in real time with standard monocular camera sensors. For action recognition, we transform noisy human pose estimates into an image-like format we call Encoded Human Pose Image (EHPI). This encoded information can further be classified using standard methods from the computer vision community. With this simple procedure, we achieve competitive state-of-the-art performance in pose-based action detection and can ensure real-time performance. In addition, we show a use case in the context of autonomous driving to demonstrate how such a system can be trained to recognize human actions using simulation data.