A Quadruple Diffusion Convolutional Recurrent Network for Human Motion Prediction
Recurrent neural networks (RNNs) have become popular for human motion prediction thanks to their ability to capture temporal dependencies. However, they have limited capacity for modeling the complex spatial relationships in the human skeletal structure. In this work, we present a novel diffusion convolutional recurrent predictor for spatial and temporal movement forecasting, in which multi-step random walks traverse bidirectionally along an adaptive graph to model the interdependency among body joints. In the temporal domain, existing methods rely on a single forward predictor whose produced motion gradually drifts from the true trajectory, leading to error accumulation over time. We propose to supplement the forward predictor with a forward discriminator to alleviate such motion drift in the long term under adversarial training. The solution is further enhanced by a backward predictor and a backward discriminator, so that the system can also look into the past to effectively reduce the error at early frames. The two-way spatial diffusion convolutions and two-way temporal predictors together form a quadruple network. Furthermore, we train our framework on the velocity derived from observed motion dynamics, rather than on static poses, which effectively reduces the discontinuity problem in early predictions. Our method outperforms the state of the art on both 3D and 2D datasets, including the Human3.6M, CMU Motion Capture and Penn Action datasets. The results also show that our method correctly predicts both high-dynamic and low-dynamic movement trends with less motion drift.
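As a concrete illustration of the two-way spatial component, the following is a minimal sketch of a DCRNN-style bidirectional diffusion convolution over an adaptive joint graph in PyTorch. The class name, tensor shapes, and the learned adjacency initialization are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class BidirectionalDiffusionConv(nn.Module):
    """Two-way diffusion convolution over an adaptive body-joint graph (sketch)."""

    def __init__(self, num_joints: int, in_dim: int, out_dim: int, k_steps: int = 3):
        super().__init__()
        self.k_steps = k_steps
        # Adaptive adjacency over body joints, learned end to end (assumption).
        self.adj = nn.Parameter(torch.rand(num_joints, num_joints))
        # One weight matrix per diffusion step and per walk direction.
        self.weights = nn.Parameter(torch.randn(2, k_steps, in_dim, out_dim) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_joints, in_dim)
        a = torch.relu(self.adj)
        p_fwd = a / a.sum(dim=1, keepdim=True).clamp(min=1e-6)           # forward random walk
        p_bwd = a.t() / a.t().sum(dim=1, keepdim=True).clamp(min=1e-6)   # backward random walk
        out = 0.0
        for d, p in enumerate((p_fwd, p_bwd)):
            h = x
            for k in range(self.k_steps):
                h = torch.einsum('ij,bjc->bic', p, h)    # one multi-step diffusion hop
                out = out + h @ self.weights[d, k]
        return out
```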
AlignBodyNet: deep learning-based alignment of non-overlapping partial body point clouds from a single depth camera
This paper proposes a novel deep learning framework that generates omnidirectional 3D point clouds of human bodies by registering the front- and back-facing partial scans captured by a single depth camera. Our approach does not require calibration-assisting devices or canonical postures, nor does it make assumptions about an initial alignment or correspondences between the partial scans. This is achieved by factoring this challenging problem into (i) building virtual correspondences for the partial scans, and (ii) implicitly predicting the rigid transformation between the two partial scans via the predicted virtual correspondences. In this study, we regress the SMPL vertices from the two partial scans to build the virtual correspondences. The main challenges are (i) estimating the body shape and pose under clothing from a single partial dressed-body point cloud, and (ii) ensuring the bodies predicted from the front- and back-facing inputs are the same. We thus propose a novel deep neural network dubbed AlignBodyNet, which introduces shape-interrelated features and a shape-constraint loss to resolve this problem. We also provide a simple yet efficient method for generating real-world partial scans from complete models, which addresses the lack of quantitative comparisons on real-world data for various tasks, including partial registration, shape completion, and view synthesis. Experiments on synthetic and real-world data show that our method achieves state-of-the-art performance in both objective and subjective terms.
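Once both partial scans have been regressed to SMPL vertices, the two vertex sets are in one-to-one correspondence, so the rigid transform between the scans can be recovered in closed form. The sketch below uses the standard Kabsch/Procrustes solution for this second stage; the function name is illustrative, and the paper's contribution lies in predicting the correspondences themselves.

```python
import numpy as np

def rigid_transform_from_correspondences(src: np.ndarray, dst: np.ndarray):
    """Return rotation R and translation t such that dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points (e.g. SMPL vertices).
    """
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    h = (src - src_c).T @ (dst - dst_c)            # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))         # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_c - r @ src_c
    return r, t
```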
A Two-stream Convolutional Network for Musculoskeletal and Neurological Disorders Prediction
Musculoskeletal and neurological disorders are the most common causes of walking problems among older people, and they often lead to diminished quality of life. Analyzing walking motion data manually requires trained professionals, and the evaluations may not always be objective. To facilitate early diagnosis, recent deep learning-based methods have shown promising results for automated analysis, discovering patterns that traditional machine learning methods have not found. We observe that existing work mostly applies deep learning to individual joint features, such as the time series of joint positions. Because inter-joint features, such as the distance between the feet (i.e. the stride width), are hard to discover from the generally smaller-scale medical datasets, these methods usually perform sub-optimally. We therefore propose a solution that explicitly takes both individual joint features and inter-joint features as input, relieving the system of the need to discover more complicated features from small data. Owing to the distinctive nature of the two feature types, we introduce a two-stream framework, with one stream learning from the time series of joint positions and the other from the time series of relative joint displacement. We further develop a mid-layer fusion module that combines the patterns discovered in the two streams for diagnosis, yielding a complementary representation of the data for better prediction performance. We validate our system on a benchmark dataset of 3D skeletal motion involving 45 patients with musculoskeletal and neurological disorders, achieving a prediction accuracy of 95.56% and outperforming state-of-the-art methods.
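A minimal sketch of the two-stream idea is shown below: one stream consumes joint positions over time, the other a displacement-based signal, and a mid-layer fusion module combines them before classification. The layer sizes, the recurrent backbones, the use of frame-to-frame displacement as the second stream, and the concatenation-based fusion are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamClassifier(nn.Module):
    """Two-stream skeleton classifier with mid-layer fusion (sketch)."""

    def __init__(self, num_joints: int, hidden: int = 64, num_classes: int = 2):
        super().__init__()
        in_dim = num_joints * 3
        self.pos_stream = nn.GRU(in_dim, hidden, batch_first=True)   # individual joint features
        self.disp_stream = nn.GRU(in_dim, hidden, batch_first=True)  # displacement-based features
        self.fusion = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # positions: (batch, frames, num_joints * 3)
        displacements = positions[:, 1:] - positions[:, :-1]
        _, h_pos = self.pos_stream(positions)
        _, h_disp = self.disp_stream(displacements)
        fused = self.fusion(torch.cat([h_pos[-1], h_disp[-1]], dim=-1))  # mid-layer fusion
        return self.head(fused)
```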
CP-AGCN: Pytorch-based attention informed graph convolutional network for identifying infants at risk of cerebral palsy
Early prediction is clinically considered one of the essential parts of cerebral palsy (CP) treatment. We propose a low-cost and interpretable classification system to support CP prediction based on the General Movements Assessment (GMA). We design a PyTorch-based, attention-informed graph convolutional network that identifies infants at risk of CP at an early stage from skeletal data extracted from RGB videos. We also design a frequency-binning module that learns CP-related movements in the frequency domain while filtering out noise. Our system requires only consumer-grade RGB videos for training and supports interactive-time CP prediction with an interpretable classification result.
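A minimal sketch of a frequency-binning step in this spirit is given below: joint trajectories are moved into the frequency domain with an FFT, high frequencies are discarded as noise, and the remaining magnitudes are pooled into coarse bins. The bin count, cutoff ratio, and pooling are illustrative assumptions rather than the paper's exact module.

```python
import torch

def frequency_bins(joint_series: torch.Tensor, num_bins: int = 8, keep_ratio: float = 0.5):
    """joint_series: (batch, frames, channels) skeletal time series."""
    spectrum = torch.fft.rfft(joint_series, dim=1).abs()   # magnitude spectrum per channel
    cutoff = int(spectrum.shape[1] * keep_ratio)           # crude low-pass: drop top frequencies
    spectrum = spectrum[:, :cutoff]
    # Average magnitudes within roughly equal-width frequency bins.
    binned = torch.stack(
        [chunk.mean(dim=1) for chunk in spectrum.chunk(num_bins, dim=1)], dim=1)
    return binned   # (batch, num_bins, channels)
```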
A new method to evaluate the dynamic air gap thickness and garment sliding of virtual clothes during walking
With the development of e-shopping, there has been significant growth in online clothing purchases. However, virtual clothing fit evaluation is still under-researched. In the literature, the thickness of the air layer between the human body and the clothes is the dominant geometric indicator for evaluating clothing fit. However, this approach has only been applied to stationary positions of the manikin or human body. Physical indicators, such as the pressure/tension of a virtual garment fitted on a virtual body in continuous motion, have also been proposed for clothing fit evaluation. Neither the geometric nor the physical evaluation considers the interaction of the garment with the body, e.g. the sliding of the garment along the human body. In this study, a new framework is proposed to automatically determine the dynamic air gap thickness. First, the dynamic dressed-character sequence is simulated in 3D clothing software by importing the body parameters, cloth parameters and a walking motion. Second, a cost function is defined to map the garment from the previous frame into the local coordinate system of the next frame, from which the dynamic air gap thickness between the clothes and the human body is determined. Third, a new metric called the 3D garment vector field (3DGVF) is proposed to represent the movement flow of the dynamic virtual garment, whose directional changes are measured by cosine similarity. Experimental results show that our method is more sensitive to small changes in air gap thickness than the state of the art, allowing it to evaluate clothing fit in a virtual environment more effectively.
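Below is a minimal sketch of measuring directional change in a garment vector field via cosine similarity. Here the "vector field" is approximated by per-vertex displacements between consecutive frames; this is an illustrative reading of the 3DGVF metric, not the authors' code.

```python
import numpy as np

def directional_change(verts_prev, verts_curr, verts_next, eps=1e-8):
    """verts_*: (N, 3) garment vertex positions at three consecutive frames."""
    v1 = verts_curr - verts_prev          # flow into the current frame
    v2 = verts_next - verts_curr          # flow out of the current frame
    cos = (v1 * v2).sum(axis=1) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1) + eps)
    return 1.0 - cos.mean()               # 0 = same direction; larger = more sliding
```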
A Sampling Approach to Generating Closely Interacting 3D Pose-pairs from 2D Annotations
We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs for a given motion category, e.g. wrestling or salsa dance. Since close interactions are difficult to acquire with 3D sensors, our approach utilizes the abundance of existing video data covering many human activities. Instead of treating the data generation problem as one of reconstruction, either through 3D acquisition or direct 2D-to-3D lifting of video annotations, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. Focusing on efficient sampling over the space of close interactions, rather than over pose spaces, we develop a novel representation called interaction coordinates (IC) to encode both the poses and their interactions in an integrated manner. The plausibility of a 3D pose-pair is then defined in terms of the ICs and with respect to the annotated 2D pose-pairs from video. We show that our sampling-based approach efficiently synthesizes a large volume of plausible, closely interacting 3D pose-pairs that provide good coverage of the input 2D pose-pairs.
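For concreteness, the sketch below shows a generic Metropolis-Hastings loop over a pose-pair state, which is the family of MCMC samplers the approach builds on. The plausibility() callable (standing in for the IC-based score against annotated 2D pose-pairs, assumed non-negative) and the Gaussian proposal are illustrative assumptions.

```python
import numpy as np

def metropolis_hastings(init_pair, plausibility, num_samples=10000, step=0.05, rng=None):
    """init_pair: numpy array encoding a 3D pose-pair; plausibility: state -> score >= 0."""
    rng = rng or np.random.default_rng(0)
    state, score = init_pair, plausibility(init_pair)
    samples = []
    for _ in range(num_samples):
        proposal = state + rng.normal(scale=step, size=state.shape)   # perturb the pose-pair
        prop_score = plausibility(proposal)
        if rng.random() < min(1.0, prop_score / max(score, 1e-12)):   # accept/reject
            state, score = proposal, prop_score
        samples.append(state.copy())
    return samples
```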
A review and benchmark on state-of-the-art steel defects detection
Steel, a critical material in the construction, automobile, and railroad manufacturing industries, often presents defects that can lead to equipment failure, significant safety risks, and costly downtime. This research evaluates the performance of state-of-the-art object detection models in detecting defects on steel surfaces, while addressing the challenges of limited defect data and lengthy model training times. Five existing state-of-the-art object detection models (Faster R-CNN, Deformable DETR, Double-Head R-CNN, RetinaNet, and the deformable convolutional network) were benchmarked on the Northeastern University (NEU) steel dataset. This selection covers a broad spectrum of methodologies, including two-stage detectors, single-stage detectors, transformers, and a model incorporating deformable convolutions. The deformable convolutional network achieved the highest accuracy of 77.28% on the NEU dataset under a fivefold cross-validation protocol. The other models also performed notably, with accuracies in the 70–75% range. Certain models exhibited particular strengths in detecting specific defects, indicating potential areas for future research and model improvement. The findings provide a comprehensive foundation for future research in steel defect detection and have significant implications for practical applications. Automating the defect detection task could improve quality control processes in the steel industry, leading to safer and more reliable steel products and protecting workers by removing the human factor from hazardous environments.
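A minimal sketch of the fivefold cross-validation protocol behind the reported numbers is shown below. The train_model() and evaluate_accuracy() helpers are placeholders for whichever detector is being benchmarked; only the fold logic is generic.

```python
import numpy as np
from sklearn.model_selection import KFold

def fivefold_benchmark(samples, train_model, evaluate_accuracy, seed=0):
    """samples: list of (image, annotations) pairs; returns mean and std accuracy."""
    kfold = KFold(n_splits=5, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in kfold.split(samples):
        model = train_model([samples[i] for i in train_idx])
        scores.append(evaluate_accuracy(model, [samples[i] for i in test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```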
A Pose-based Feature Fusion and Classification Framework for the Early Prediction of Cerebral Palsy in Infants
The early diagnosis of cerebral palsy is an area that has recently seen significant multi-disciplinary research. Diagnostic tools such as the General Movements Assessment (GMA) have produced some very promising results. Automating these processes could improve the accessibility of the assessment and also enhance our understanding of infant movement development. Previous works have established the viability of using pose-based features extracted from RGB video sequences to classify infant body movements based upon the GMA. In this paper, we propose a series of new and improved features, and a feature fusion pipeline, for this classification task. We also introduce the RVI-38 dataset, a series of videos captured as part of routine clinical care. Using this challenging dataset, we establish the robustness of several motion features for classification, which subsequently informs the design of our proposed GMA-based feature fusion framework. We evaluate the framework's classification performance on both the RVI-38 dataset and the publicly available MINI-RGBD dataset, and implement several other methods from the literature for direct comparison on these two independent datasets. Our experimental results and feature analysis show that our proposed pose-based method performs well across both datasets. The proposed features capture finer detail than previous methods and further model GMA-specific body movements. They also exploit additional body-part-specific information to improve the overall classification performance, whilst retaining GMA-relevant, interpretable, and shareable features.
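Below is a minimal sketch of a pose-based feature fusion pipeline in this spirit: several hand-crafted feature sets extracted from 2D pose sequences are concatenated and passed to a lightweight classifier. The specific feature functions and the choice of classifier are illustrative placeholders, not the paper's actual feature set.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fuse_features(pose_seq):
    """pose_seq: (frames, joints, 2) 2D joint trajectories from RGB video."""
    velocity = np.diff(pose_seq, axis=0)
    feats = [
        pose_seq.std(axis=0).ravel(),            # positional variability per joint
        np.abs(velocity).mean(axis=0).ravel(),   # mean speed per joint
        velocity.std(axis=0).ravel(),            # movement-smoothness proxy
    ]
    return np.concatenate(feats)                 # fused feature vector

# Usage: stack fused vectors per infant, then fit a shallow classifier.
# X = np.stack([fuse_features(s) for s in sequences]); y = labels
# clf = RandomForestClassifier(n_estimators=200).fit(X, y)
```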
Posture-based and Action-based Graphs for Boxing Skill Visualization
Automatic evaluation of sports skills has been an active research area. However, most existing research focuses on low-level features such as movement speed and strength. In this work, we propose a framework for automatic motion analysis and visualization that allows us to evaluate high-level skills such as the richness of actions, the flexibility of transitions and the unpredictability of action patterns. The core of our framework is the construction and visualization of a posture-based graph, which focuses on the standard postures for launching and ending actions, and an action-based graph, which focuses on the preference for actions and their transition probabilities. We further propose two numerical indices, the Connectivity Index and the Action Strategy Index, to assess skill level from these graphs. We demonstrate our framework on motions captured from different boxers, and experimental results show that our system can effectively visualize their strengths and weaknesses.
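The sketch below illustrates the action-based graph: edges hold the empirical transition probabilities between recognized boxing actions, and a mean row-entropy score serves as a stand-in for the unpredictability aspect of the Action Strategy Index. The paper's exact index definitions differ; this is an illustrative analogue only.

```python
import numpy as np

def action_graph(action_sequence, num_actions):
    """action_sequence: list of integer action labels in temporal order."""
    counts = np.zeros((num_actions, num_actions))
    for a, b in zip(action_sequence[:-1], action_sequence[1:]):
        counts[a, b] += 1
    probs = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
    return probs   # row i: transition distribution out of action i

def unpredictability(probs, eps=1e-12):
    # Mean row entropy: higher means less predictable action patterns.
    return float(-(probs * np.log(probs + eps)).sum(axis=1).mean())
```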