150 research outputs found
Learning Human Kinematics by Modeling Temporal Correlations between Joints for Video-based Human Pose Estimation
Estimating human poses from videos is critical in human-computer interaction.
By precisely estimating human poses, a robot can provide an appropriate
response to the human. Most existing approaches use optical flow, RNNs, or
CNNs to extract temporal features from videos. Despite the positive results of
these attempts, most of them only straightforwardly integrate features along
the temporal dimension, ignoring temporal correlations between joints. In
contrast to previous methods, we propose a plug-and-play kinematics modeling
module (KMM) based on the domain-cross attention mechanism to model the
temporal correlation between joints across different frames explicitly.
Specifically, the proposed KMM models the temporal correlation between any two
joints by calculating their temporal similarity. In this way, KMM can learn the
motion cues of each joint. Using the motion cues (temporal domain) and
historical positions of joints (spatial domain), KMM can infer the initial
positions of joints in the current frame in advance. In addition, we present a
kinematics modeling network (KIMNet) based on the KMM for obtaining the final
positions of joints by combining pose features and initial positions of joints.
By explicitly modeling temporal correlations between joints, KIMNet can infer
occluded joints in the current frame from all joints in the previous frame.
Furthermore, the KMM is implemented with an attention mechanism, which allows
it to maintain the high resolution of features. Therefore, it can transfer rich
historical pose information to the current frame, which provides effective pose
information for locating occluded joints. Our approach achieves
state-of-the-art results on two standard video-based pose estimation
benchmarks. Moreover, the proposed KIMNet shows some robustness to
occlusion, demonstrating the effectiveness of the proposed method.
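The idea of relating every joint in the current frame to every joint in an earlier frame can be illustrated with a generic scaled dot-product attention. The sketch below is not the authors' KMM: the function name `predict_initial_positions`, the feature vectors, and the choice to form the initial position as an attention-weighted average of previous positions are all assumptions made for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_initial_positions(curr_feats, prev_feats, prev_positions):
    """Hypothetical helper: for each joint in the current frame, attend over
    all joints of the previous frame and average their (x, y) positions."""
    scale = math.sqrt(len(curr_feats[0]))
    initial, weights = [], []
    for q in curr_feats:
        # Similarity between this joint and every joint of the previous frame.
        scores = [dot(q, k) / scale for k in prev_feats]
        w = softmax(scores)
        x = sum(wi * p[0] for wi, p in zip(w, prev_positions))
        y = sum(wi * p[1] for wi, p in zip(w, prev_positions))
        initial.append((x, y))
        weights.append(w)
    return initial, weights

# Toy input (assumed values): two joints with 2-D features.
feats = [[1.0, 0.0], [0.0, 1.0]]
positions = [(0.0, 0.0), (10.0, 10.0)]
initial, weights = predict_initial_positions(feats, feats, positions)
```

Because each attention row sums to one, every estimate stays inside the range spanned by the previous positions. That makes this simple form a weak motion model, which is one reason to treat it only as an illustration of the attention step.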
Physical Violence Detection System to Prevent Student Mental Health Disorders Based on Deep Learning
Physical violence among students occurs frequently in educational environments and can lead to criminal acts. Moreover, repeated acts of physical violence can be considered non-verbal bullying. Such bullying harms the victim, causing physical and mental health disorders, impaired social relationships, and decreased academic performance. However, current monitoring of violent acts has a key weakness, namely weak supervision by the school. A deep learning-based physical violence detection system built on an LSTM network is a solution to this problem. In this research, we develop a Convolutional Neural Network to detect acts of violence. The Convolutional Neural Network extracts frame-level features from videos. These frame-level features are then processed by a long short-term memory with convolutional gates. Together, the Convolutional Neural Network and the convolutional long short-term memory capture local spatio-temporal features, enabling local video motion analysis. The performance of the proposed feature extraction pipeline is evaluated on standard benchmark datasets in terms of recognition accuracy. A comparison of the results with state-of-the-art techniques reveals the promising capabilities of the proposed method for recognising violent videos. The trained and tested model will be integrated into a violence detection system that makes it easier and faster to detect acts of violence in the school environment.
SSHA: Video Violence Recognition and Localization Using a Semi-Supervised Hard Attention Model
Current human-based surveillance systems are prone to inadequate availability
and reliability. Artificial intelligence-based solutions are compelling,
considering their reliability and precision in the face of an increasing
adoption of surveillance systems. Exceedingly efficient and precise machine
learning models are required to effectively utilize the extensive volume of
high-definition surveillance imagery. This study focuses on improving the
accuracy of the methods and models used in automated surveillance systems to
recognize and localize human violence in video footage. The proposed model uses
an I3D backbone pretrained on the Kinetics dataset and has achieved
state-of-the-art accuracy of 90.4% and 98.7% on RWF and Hockey datasets,
respectively. The semi-supervised hard attention mechanism has enabled the
proposed method to fully capture the available information in a high-resolution
video by processing the necessary video regions in great detail.
Comment: 11 pages, 4 figures, 4 equations, 3 tables, 1 algorith
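Hard attention, in general terms, means committing to a few regions and processing only those in detail rather than weighting the whole frame. The sketch below shows only that selection step under assumed inputs: the per-patch `scores` grid, the function name `select_hard_attention_patches`, and the fixed square patch size are hypothetical, and the semi-supervised training that would produce such scores is not shown.

```python
def select_hard_attention_patches(frame, scores, patch, k):
    """Hypothetical helper: crop the k highest-scoring patch x patch regions.

    frame  : 2-D list of pixel values (rows of equal length)
    scores : 2-D list; scores[i][j] rates the patch whose top-left
             corner is at (i * patch, j * patch)
    """
    ranked = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        key=lambda t: t[0],
        reverse=True,
    )
    crops = []
    for _, i, j in ranked[:k]:
        top, left = i * patch, j * patch
        # Full-resolution crop of the selected region only.
        crop = [row[left:left + patch] for row in frame[top:top + patch]]
        crops.append(((top, left), crop))
    return crops

# Toy 4x4 frame and a 2x2 score grid (assumed values).
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
scores = [[0.1, 0.9], [0.3, 0.2]]
selected = select_hard_attention_patches(frame, scores, patch=2, k=1)
```

Only the selected crops would be passed to the expensive recognition model, so per-frame cost depends on `k` and the patch size rather than on the full frame resolution.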
Using Cost Simulation and Computer Vision to Inform Probabilistic Cost Estimates
Cost estimating is a critical task in the construction process. Building cost estimates using historical data from previously performed projects has long been recognized as one of the better methods for generating precise construction bids. However, cost and productivity data are typically gathered at the summary level for cost-control purposes. As a result, the possible ranges of production rates and costs associated with construction activities lack accuracy and comprehensiveness, and the robustness of cost estimates is minimal. This study therefore explores a range of cost and productivity data to better inform the potential outcomes of cost estimates, using probabilistic cost simulation and computer vision techniques for activity production rate analysis.
Chapter two employed the Monte Carlo Simulation approach to compute a range of cost outcomes and find the optimal construction methods for large-scale concrete construction. The probabilistic cost simulation approach helps decision-makers better understand the probable cost consequences of different construction methods and make more informed decisions based on the project characteristics.
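A Monte Carlo cost simulation of this kind can be sketched in a few lines. The example below is illustrative only: the triangular unit-cost distributions, the quantities, and the names `simulate_total_cost` and `percentile` are assumptions, not values or methods taken from the thesis.

```python
import random

def simulate_total_cost(activities, n_trials=10_000, seed=0):
    """Hypothetical helper: sample total project cost n_trials times.

    activities: list of (quantity, (low, mode, high)) where the tuple gives
    an assumed triangular distribution for the unit cost.
    Returns the sampled totals, sorted in ascending order.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        total = 0.0
        for quantity, (low, mode, high) in activities:
            # random.triangular takes (low, high, mode) in that order.
            total += quantity * rng.triangular(low, high, mode)
        totals.append(total)
    totals.sort()
    return totals

def percentile(sorted_values, p):
    # Nearest-rank style percentile on an already sorted list.
    idx = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[idx]

# Assumed inputs: two activities with made-up quantities and unit-cost ranges.
activities = [(100, (90.0, 100.0, 130.0)), (50, (40.0, 45.0, 60.0))]
totals = simulate_total_cost(activities)
p10, p50, p90 = (percentile(totals, p) for p in (10, 50, 90))
```

Reporting a spread such as the 10th, 50th, and 90th percentiles, rather than a single point estimate, is what lets alternative construction methods be compared on their probable cost consequences.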
Chapter three experimented with a computer vision-based skeletal pose estimation model and a recurrent neural network to recognize human activities. The activity recognition algorithm was employed to help translate construction activities into productivity information for automated labor productivity tracking.
Chapter four implemented computer vision-based object detection and object tracking algorithms to automatically track construction productivity data. The productivity data collected were used to inform the probabilistic cost estimates. Monte Carlo Simulation was adopted to explore potential cost outcomes and sensitive cost factors in the overall construction project. The study demonstrated how computer vision techniques and probabilistic cost simulation can improve the reliability of cost estimates to support construction decision-making.
Advisor: Philip Baruth