
    Learning Human Kinematics by Modeling Temporal Correlations between Joints for Video-based Human Pose Estimation

    Estimating human poses from videos is critical in human-computer interaction. By precisely estimating human poses, a robot can provide an appropriate response to the human. Most existing approaches use optical flow, RNNs, or CNNs to extract temporal features from videos. Despite the positive results of these attempts, most of them only integrate features along the temporal dimension in a straightforward way, ignoring temporal correlations between joints. In contrast to previous methods, we propose a plug-and-play kinematics modeling module (KMM) based on a domain-cross attention mechanism to explicitly model the temporal correlation between joints across different frames. Specifically, the proposed KMM models the temporal correlation between any two joints by calculating their temporal similarity. In this way, KMM can learn the motion cues of each joint. Using these motion cues (temporal domain) and the historical positions of the joints (spatial domain), KMM can infer the initial positions of joints in the current frame in advance. In addition, we present a kinematics modeling network (KIMNet) based on the KMM, which obtains the final positions of joints by combining pose features with the initial joint positions. By explicitly modeling temporal correlations between joints, KIMNet can infer currently occluded joints from all joints at the previous moment. Furthermore, because the KMM is implemented as an attention mechanism, it maintains the high resolution of the features. It can therefore transfer rich historical pose information to the current frame, which provides effective pose information for locating occluded joints. Our approach achieves state-of-the-art results on two standard video-based pose estimation benchmarks. Moreover, the proposed KIMNet shows robustness to occlusion, demonstrating the effectiveness of the proposed method.
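    The following is a minimal sketch, not the authors' code, of how the temporal correlation between joints could be computed as an attention-style similarity between per-joint features from the previous and current frames; the function name joint_temporal_attention and the feature dimensions are illustrative assumptions.

        import numpy as np

        def joint_temporal_attention(prev_feats, curr_feats):
            """Illustrative sketch: score the temporal similarity between every joint in
            the previous frame and every joint in the current frame, then aggregate the
            previous-frame features as motion cues for each current joint.

            prev_feats, curr_feats: arrays of shape (num_joints, feat_dim).
            """
            d = prev_feats.shape[1]
            # Dot-product similarity between current-frame queries and previous-frame keys.
            scores = curr_feats @ prev_feats.T / np.sqrt(d)            # (J, J)
            # Softmax over the previous-frame joints.
            weights = np.exp(scores - scores.max(axis=1, keepdims=True))
            weights /= weights.sum(axis=1, keepdims=True)
            # Each current joint gathers motion cues from all previous-frame joints.
            motion_cues = weights @ prev_feats                          # (J, feat_dim)
            return motion_cues, weights

        # Toy usage: 17 joints with 64-dimensional features (hypothetical sizes).
        rng = np.random.default_rng(0)
        cues, attn = joint_temporal_attention(rng.normal(size=(17, 64)),
                                              rng.normal(size=(17, 64)))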

    Physical Violence Detection System to Prevent Student Mental Health Disorders Based on Deep Learning

    Physical violence by students in the educational environment occurs frequently and can lead to criminal acts. Moreover, repeated acts of physical violence can be considered non-verbal bullying, which can harm the victim, causing physical injury, mental health disorders, impaired social relationships, and decreased academic performance. However, current monitoring of violent acts has a weakness, namely weak supervision by the school. A deep learning-based physical violence detection system built on an LSTM network is the proposed solution to this problem. In this research, we develop a Convolutional Neural Network to detect acts of violence. The Convolutional Neural Network extracts frame-level features from videos, and these frame-level features are then processed by a long short-term memory network with convolutional gates. Together, the Convolutional Neural Network and the convolutional LSTM capture local spatio-temporal features, enabling local motion analysis in video. The performance of the proposed feature extraction pipeline is evaluated on standard benchmark datasets in terms of recognition accuracy. A comparison of the results with state-of-the-art techniques reveals the promising capability of the proposed method for recognising violent videos. The trained and tested model will be integrated into a violence detection system, which can make detecting acts of violence in the school environment easier and faster.
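    A hedged sketch of the kind of frame-level CNN plus convolutional LSTM pipeline the abstract describes, written with PyTorch; the layer sizes, the ConvLSTMCell, and the ViolenceDetector class are illustrative assumptions, not the paper's configuration.

        import torch
        import torch.nn as nn

        class ConvLSTMCell(nn.Module):
            """Minimal convolutional LSTM cell: the gates are computed with convolutions,
            so the spatial structure of the frame features is preserved."""
            def __init__(self, in_ch, hidden_ch, kernel_size=3):
                super().__init__()
                self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                                       kernel_size, padding=kernel_size // 2)

            def forward(self, x, state):
                h, c = state
                i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
                i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
                c = f * c + i * g
                h = o * torch.tanh(c)
                return h, c

        class ViolenceDetector(nn.Module):
            """Frame-level CNN features fed through a ConvLSTM, then a clip-level classifier."""
            def __init__(self):
                super().__init__()
                self.cnn = nn.Sequential(
                    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
                self.convlstm = ConvLSTMCell(64, 64)
                self.head = nn.Linear(64, 2)  # violent vs. non-violent

            def forward(self, clip):                       # clip: (batch, time, 3, H, W)
                b, t, _, h, w = clip.shape
                hs = clip.new_zeros(b, 64, h // 4, w // 4)
                cs = torch.zeros_like(hs)
                for step in range(t):                      # recur over frames
                    feats = self.cnn(clip[:, step])
                    hs, cs = self.convlstm(feats, (hs, cs))
                return self.head(hs.mean(dim=(2, 3)))      # pool spatially, classify clip

        # Toy usage: a batch of two 8-frame 64x64 clips.
        logits = ViolenceDetector()(torch.randn(2, 8, 3, 64, 64))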

    SSHA: Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model

    Current human-based surveillance systems are prone to inadequate availability and reliability. Artificial intelligence-based solutions are compelling, considering their reliability and precision in the face of the increasing adoption of surveillance systems. Highly efficient and precise machine learning models are required to effectively utilize the extensive volume of high-definition surveillance imagery. This study focuses on improving the accuracy of the methods and models used in automated surveillance systems to recognize and localize human violence in video footage. The proposed model uses an I3D backbone pretrained on the Kinetics dataset and achieves state-of-the-art accuracies of 90.4% and 98.7% on the RWF and Hockey datasets, respectively. The semi-supervised hard attention mechanism enables the proposed method to fully capture the information available in a high-resolution video by processing only the necessary video regions in great detail. Comment: 11 pages, 4 figures, 4 equations, 3 tables, 1 algorithm
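    A loose sketch of the hard-attention idea described above: instead of downsampling the whole high-definition frame, a selected region is cropped at full resolution and only that region is passed to the action backbone. The crop_attended_region helper, the fixed crop location, and the stand-in 3D CNN below are illustrative assumptions; the paper itself uses an I3D backbone pretrained on Kinetics and learns where to attend.

        import torch
        import torch.nn as nn

        def crop_attended_region(frames, center, size=224):
            """Hard attention: crop a fixed-size window around the selected location,
            so the backbone sees the chosen region at full resolution.

            frames: (time, 3, H, W); center: (row, col) in pixel coordinates.
            """
            _, _, h, w = frames.shape
            half = size // 2
            r = int(min(max(center[0], half), h - half))   # keep the window inside the frame
            c = int(min(max(center[1], half), w - half))
            return frames[:, :, r - half:r + half, c - half:c + half]

        # Stand-in for a Kinetics-pretrained I3D backbone (any small 3D CNN for illustration).
        backbone = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, 2))

        clip = torch.randn(16, 3, 720, 1280)                # 16 high-definition frames
        region = crop_attended_region(clip, center=(360, 640))
        logits = backbone(region.permute(1, 0, 2, 3).unsqueeze(0))  # (1, 3, T, 224, 224)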

    Using Cost Simulation and Computer Vision to Inform Probabilistic Cost Estimates

    Cost estimating is a critical task in the construction process. Building cost estimates using historical data from previously performed projects has long been recognized as one of the better methods to generate precise construction bids. However, cost and productivity data are typically gathered at the summary level for cost-control purposes, so the possible ranges of production rates and costs associated with construction activities lack accuracy and comprehensiveness, and the robustness of the resulting cost estimates is minimal. Thus, this study proposes exploring a range of cost and productivity data to better inform the potential outcomes of cost estimates by using probabilistic cost simulation and computer vision techniques for activity production rate analysis. Chapter two employed the Monte Carlo Simulation approach to compute a range of cost outcomes and find the optimal construction methods for large-scale concrete construction; the probabilistic cost simulation approach helps decision-makers better understand the probable cost consequences of different construction methods and make more informed decisions based on project characteristics. Chapter three experimented with a computer vision-based skeletal pose estimation model and a recurrent neural network to recognize human activities; the activity recognition algorithm was employed to help interpret construction activities into productivity information for automated labor productivity tracking. Chapter four implemented computer vision-based object detection and object tracking algorithms to automatically collect construction productivity data. The productivity data collected were used to inform the probabilistic cost estimates, and Monte Carlo Simulation was adopted to explore potential cost outcomes and sensitive cost factors in the overall construction project. The study demonstrated how computer vision techniques and probabilistic cost simulation improve the reliability of cost estimates to support construction decision-making. Advisor: Philip Baruth
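    A minimal sketch of a Monte Carlo cost simulation of the kind the study describes, assuming triangular distributions for the production rate and crew cost of a single activity; the activity, quantities, and dollar figures are invented for illustration only, whereas a real study would fit the distributions to tracked productivity data.

        import numpy as np

        rng = np.random.default_rng(42)
        N = 10_000  # number of Monte Carlo trials

        # Hypothetical activity: large-scale concrete placement.
        quantity = 5_000                    # cubic yards to place (illustrative)
        # Production rate (cy/day) and crew cost ($/day) sampled from triangular
        # distributions defined by (min, mode, max) values.
        rate = rng.triangular(80, 110, 140, size=N)
        crew_cost = rng.triangular(4_000, 4_500, 5_500, size=N)

        duration = quantity / rate          # days, one value per trial
        total_cost = duration * crew_cost   # dollars, one value per trial

        # Summarize the simulated cost distribution for the estimator.
        p10, p50, p90 = np.percentile(total_cost, [10, 50, 90])
        print(f"P10 ${p10:,.0f}  P50 ${p50:,.0f}  P90 ${p90:,.0f}")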