
    A graph-based approach can improve keypoint detection of complex poses: a proof-of-concept on injury occurrences in alpine ski racing

    For most applications, 2D keypoint detection works well and offers a simple and fast tool to analyse human movements. However, there remain many situations where even the best state-of-the-art algorithms reach their limits and fail to detect human keypoints correctly. Such situations occur especially when individual body parts are occluded or twisted, or when the whole person is flipped. Such twisted and rotated body positions occur frequently when analysing injuries in alpine ski racing. To improve the detection of keypoints for this application, we developed a novel method that refines keypoint estimates by rotating the input videos. We select the best rotation for every frame with a graph-based global solver. Thereby, we improve keypoint detection of an arbitrary pose estimation algorithm, in particular for 'hard' keypoints. In the current proof-of-concept study, we show that our approach outperforms standard keypoint detection in all categories and in all metrics, by a large margin in injury-related out-of-balance and fall situations, and also outperforms previous methods in performance and robustness. The Injury Ski II dataset was made publicly available, aiming to facilitate the future investigation of sports accidents based on computer vision.
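The abstract above does not spell out the graph-based global solver, so the following is only a minimal sketch of one way such a per-frame rotation selection could work: a shortest path through a layered graph (frames by candidate rotations), solved with dynamic programming. The cost definition, the function name `select_rotations`, and the `smooth_weight` parameter are assumptions for illustration, not the authors' method.

```python
import numpy as np

def select_rotations(unary_cost, angles, smooth_weight=0.01):
    """Pick one rotation index per frame via a shortest path (Viterbi-style).

    unary_cost[t, r]: assumed cost of rotation r in frame t
                      (e.g. 1 - mean keypoint confidence).
    angles[r]:        candidate rotation angle in degrees.
    smooth_weight:    hypothetical penalty per degree of change between frames.
    """
    unary_cost = np.asarray(unary_cost, dtype=float)
    angles = np.asarray(angles, dtype=float)
    n_frames, n_rot = unary_cost.shape

    # Circular angular difference between every pair of candidate rotations.
    diff = np.abs(angles[:, None] - angles[None, :]) % 360.0
    pairwise = smooth_weight * np.minimum(diff, 360.0 - diff)

    best = unary_cost[0].copy()                  # best cost ending in each rotation
    back = np.zeros((n_frames, n_rot), dtype=int)
    for t in range(1, n_frames):
        total = best[:, None] + pairwise         # total[prev, cur]
        back[t] = np.argmin(total, axis=0)       # best predecessor per rotation
        best = total[back[t], np.arange(n_rot)] + unary_cost[t]

    # Backtrack from the cheapest final state.
    path = [int(np.argmin(best))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

With `smooth_weight=0` this reduces to picking the cheapest rotation in each frame independently; a larger weight trades per-frame cost against temporal consistency of the chosen rotation.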

    Improved 2D Keypoint Detection in Out-of-Balance and Fall Situations -- combining input rotations and a kinematic model

    Injury analysis may be one of the most beneficial applications of deep learning-based human pose estimation. To facilitate further research on this topic, we provide an injury-specific 2D dataset for alpine skiing, covering 533 images in total. We further propose a post-processing routine that combines rotational information with a simple kinematic model. We could improve detection results in fall situations by up to 21% regarding the [email protected] metric. Comment: extended abstract, 4 pages, 3 figures, 2 tables

    Deep learning-based 2D keypoint detection in alpine ski racing – A performance analysis of state-of-the-art algorithms applied to regular skiing and injury situations

    Objectives: In this study, we examined the practicability of deep learning-based 2D keypoint detection applied to regular skiing and injury situations (i.e., out-of-balance situations and fall situations) on an alpine ski racing track. Methods: We therefore created a regular skiing- and injury situation-specific dataset (hereinafter called "Injury Ski Dataset"), on which the state-of-the-art keypoint detection algorithms OpenPose, Mask-R-CNN, AlphaPose and DCPose were compared. The performance of each keypoint detector was evaluated by calculating the mean per joint position error (MPJPE) and the percentage of correct keypoints (PCK). Failure cases and common error patterns were further investigated by a visual analysis. Results: We observed the best results for regular skiing, with 81%–92% of all keypoints detected correctly at an MPJPE of 9 (2) to 14 (3) pixels. In injury situations, self-occlusions and rare poses became more likely, as did occlusions due to snow spray and motion blur. As a result, the performance in out-of-balance situations decreased to 68%–80% (PCK), while in fall situations, only 35%–54% of all keypoints were detected correctly, with mean errors of 26–36 pixels. Among all algorithms, AlphaPose was the most robust and achieved the best results. Conclusions: PCK and MPJPE for regular skiing were in the range of manual annotation errors and can be considered low enough for further biomechanical analysis. For fall situations, keypoint detection should be further improved. Regarding the future development of a deep learning tool for injury analysis in alpine skiing, we propose to fine-tune a well-performing keypoint detector, such as AlphaPose, on a ski- and injury-specific dataset, such as ours.
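The two evaluation metrics named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not give the PCK normalisation, so the reference length `ref_length` (for example a torso or head size in pixels) and the threshold factor `alpha` are hypothetical parameters.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per joint position error: mean Euclidean distance in pixels."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pck(pred, gt, ref_length, alpha=0.2):
    """Fraction of keypoints within alpha * ref_length of the ground truth.

    ref_length and alpha are assumed normalisation choices, not taken
    from the study.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dist = np.linalg.norm(pred - gt, axis=-1)
    return float((dist <= alpha * ref_length).mean())

# Three keypoints with errors of 5, 0 and 20 pixels.
gt = [[0, 0], [10, 0], [0, 10]]
pred = [[3, 4], [10, 0], [0, 30]]
print(mpjpe(pred, gt))                 # mean of 5, 0 and 20
print(pck(pred, gt, ref_length=50))    # threshold of 10 px: two of three correct
```

MPJPE reports how far off detections are on average, while PCK reports how many detections fall within a tolerance, which is why the abstract quotes both.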

    Integration of a skier-specific keypoint detection model in a hybrid 3D motion capture pipeline

    Introduction & Purpose: Alpine skiing, like many outdoor sports, presents significant challenges for motion capture due to its large capture volumes, high athlete speeds, variable environmental conditions, and occlusions, e.g., due to snow spray. While traditional marker-based motion capture systems offer the highest precision in the lab, they are usually unsuitable for outdoor settings. Sensor-based methods, such as inertial measurement units, may suffer from inaccuracies due to sensor noise and drift, and they provide only relative segment positions (Fasel et al., 2018). Therefore, recent studies in alpine skiing preferably used video-based systems (Heinrich et al., 2023; Spörri, 2016). These methods rely on multi-camera setups that require synchronization and camera calibration. However, the extensive manual digitization required for both keypoints and reference points introduces a substantial workload in post-processing, particularly when cameras must pan, tilt, and zoom to cover large capture volumes (Spörri, 2016). Recent advancements in computer vision have unveiled great potential for human motion capture, especially in automating much of the manual work required for video-based systems (Fang et al., 2017; Redmon et al., 2016; Zwölfer et al., 2023a). We therefore developed a novel, hybrid 3D motion capture approach that automates the detection of reference points using a reference point detection algorithm and the digitization of keypoints using a skier-specific keypoint detection algorithm. This reduces the reliance on manual digitization and enhances the scalability and practicality of large-scale motion capture in outdoor environments. The aim of this study was to a) evaluate the performance of the skier-specific keypoint detection against manual digitization and b) determine the impact of the skier-specific keypoint detection on the overall performance of the hybrid 3D motion capture pipeline.
Methods: The experimental setup involved a multi-camera system comprising eight Sony AX53 cameras uniformly distributed around a capture volume on a ski slope in Zürs am Arlberg, AT. The capture volume measured approximately 250 x 80 x 20 meters. To calibrate the cameras, approximately 300 reference points were placed within this area and surveyed geodetically. Each reference point was equipped with a 9 x 9 cm cube that displayed Aruco markers on all sides, enabling their automated detection by our reference point detection algorithm. This arrangement allowed for continuous calibration of each camera in every frame, accommodating panning, tilting, and zooming. Camera calibration and 3D reconstruction were performed using the Direct Linear Transformation (DLT) method (Abdel-Aziz & Karara, 1971). In total, ten state-certified Austrian ski instructors performed eight runs according to the progression levels of the Austrian ski curriculum (Österreichischer Skischulverband, 2018). To develop a keypoint detection model capable of detecting a skier, including equipment, e.g., skis and poles, we fine-tuned AlphaPose's HALPE26 model (Fang et al., 2017), which was designed to estimate 26 body keypoints for general motions, on a skier-specific dataset. Training was done for 200 epochs with a learning rate of 10^-3 and data augmentation enabled. AlphaPose was chosen for its proven performance in alpine skiing scenarios (Zwölfer et al., 2023a). For the skier-specific dataset, we manually digitized six runs, marking 24 keypoints, including 18 body keypoints, ski tips, ski tails, and poles, in each image. The six runs were selected to include slow, medium and high-speed skiing of one male and one female subject. These digitized images complement the datasets built by Bachmann et al. (2019) and later by Zwölfer et al. (2023b) and Heinrich et al. (2023), all using the same set of keypoints.
In total, our comprehensive skier-specific dataset comprised about 15,000 images, with approximately two-thirds of all images originating from the current measurement. The accuracy of our model was evaluated in 2D image space by calculating the mean per joint position error (MPJPE), percentage of correct keypoints (PCK), and mean average precision (mAP) metrics on a test set of about 2,000 images that were excluded from training. In addition, we determined the impact of the keypoint detection algorithm on the hybrid motion capture pipeline. We therefore processed one of the manually digitized runs with the skier-specific model as well as the HALPE26 model. Using the calibration matrices, we reconstructed the 3D motion of the skier for each method and calculated the mean lengths of eight body segments (upper arms, forearms, thighs, and shanks). We compared the measured physical lengths and the mean segment lengths reconstructed from manually digitized keypoints, keypoints processed by the HALPE26 model, and our skier-specific model. We also quantified the variation of segment lengths by calculating their mean standard deviation across all 250 frames of the run. Results on segment lengths (mean values and variations) were calculated without any smoothing or filtering.
Results: Our skier-specific keypoint detection model achieved a PCK of 98%, a mAP of 0.97, and an MPJPE of 10.32 pixels on the test set. Visual assessment of the detected keypoints supported these quantitative results, showing only a few flawed detections. Most inaccuracies involved ski tails or poles and were primarily due to occlusions. Representative images showcasing the model's performance are displayed in Figure 1 (left). The plausibility of the skier-specific model for 3D reconstruction is demonstrated in the 3D visualization of a sample run shown in Figure 1 (right).
Differences between measured and reconstructed segment lengths (mean across all frames) ranged from a minimum of 0.2 cm to a maximum of 2.3 cm, with only small differences observed among the different keypoint detection methods investigated. The evaluation of the variations in segment lengths revealed a mean standard deviation of 4.6 cm for manually annotated frames, compared to 4.5 cm for frames processed by the HALPE26 model. The mean standard deviation for frames processed by our skier-specific model was reduced to 3.4 cm.
Discussion: Our results demonstrated that the accuracy and precision of our model are at least on par with manual digitization, as evidenced both visually and through quantitative evaluation. On the 2D detection level, the PCK, MPJPE, and mAP metrics reflected the model's high performance, aligning with previous studies (Bachmann et al., 2019; Zwölfer et al., 2023a), which reported comparable MPJPE values when applying keypoint detection to regular skiing scenarios. On the 3D reconstruction level, differences between measured and reconstructed segment lengths for all three keypoint detection methods were within the range of typical measurement errors of about 2 cm. However, on this single run, our new model outperformed manual digitization and plain AlphaPose detections in terms of segment length variation. This was especially surprising, as our 3D reconstruction did not enforce temporal smoothness or kinematic constraints such as segment length consistency. This suggests that our model may benefit synergistically from both the pretrained data and manual annotations, possibly averaging out the low precision in manual digitization. While these variations in segment lengths may appear large, no smoothing or filtering was applied. For biomechanical analysis, results can be significantly improved by smoothing the data, e.g., using splines.
By integrating our skier-specific keypoint detection model into our hybrid motion capture pipeline, we reduced the manual work required for digitizing all frames in all perspectives from dozens of hours for a single run to just a few minutes of computing time. It is essential to mention that this evaluation was limited to a single run. Moreover, the accuracy of the 3D data heavily relies on accurate camera calibration and temporal synchronization. Therefore, ablation studies to evaluate the influence of the reference point detection algorithm and the temporal synchronization method will be conducted in a future study.
Conclusion: We implemented a skier-specific keypoint detection model capable of detecting a skier, including skis and poles, which showed good performance in both 2D image space and 3D reconstruction. By eliminating the manual digitization workload, the hybrid 3D motion capture pipeline facilitates large-scale motion capture in similar outdoor settings and enhances the scalability of biomechanical research in outdoor sports like alpine skiing. Additionally, this method allows us to automatically digitize the remaining runs recorded during this field study, resulting in an extensive 3D dataset crucial for the future development of fully computer vision-based motion capture methods.
References: Abdel-Aziz, Y. I., & Karara, H. M. (1971). Direct linear transformation from comparator coordinates into object space coordinates in close-range photogrammetry. Photogrammetric Engineering and Remote Sensing, 38(1), 49-55. Bachmann, R., Spörri, J., Fua, P., & Rhodin, H. (2019). Motion capture from pan-tilt cameras with unknown orientation. arXiv, 1908.11676. https://doi.org/10.48550/arXiv.1908.11676 Fang, H. S., Xie, S., Tai, Y. W., & Lu, C. (2017). RMPE: Regional multi-person pose estimation. 2017 IEEE International Conference on Computer Vision (ICCV), 2353-2362. 
https://doi.ieeecomputersociety.org/10.1109/ICCV.2017.256 Fasel, B., Spörri, J., Chardonnens, J., Kröll, J., Müller, E., & Aminian, K. (2018). Joint inertial sensor orientation drift reduction for highly dynamic movements. IEEE Journal of Biomedical and Health Informatics, 22(1), 77-86. https://doi.org/10.1109/JBHI.2017.2659758 Heinrich, D., van den Bogert, A., Mössner, M., & Nachbauer, W. (2023). Model-based estimation of muscle and ACL forces during turning maneuvers in alpine skiing. Scientific Reports, 13, Article 9026. https://doi.org/10.1038/s41598-023-35775-4 Österreichischer Skischulverband. (2018). Snowsport Austria - Die Österreichische Skischule - Vom Einstieg zur Perfektion. In vier Stufen zum Erfolg (2nd ed.) [Snowsport Austria - The Austrian Ski School - From entry to perfection. Four steps to success]. Brüder Hollinek. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779-788. https://doi.ieeecomputersociety.org/10.1109/CVPR.2016.91 Spörri, J. (2016). Research dedicated to sports injury prevention – the ‘sequence of prevention’ on the example of alpine ski racing. University of Salzburg, Austria. http://dx.doi.org/10.13140/RG.2.2.28451.89126 Zwölfer, M., Heinrich, D., Wandt, B., Rhodin, H., Spörri, J., & Nachbauer, W. (2023a). Deep learning-based 2D keypoint detection in alpine skiing – A performance analysis of state-of-the-art algorithms applied to regular skiing and injury situations. JSAMS Plus, 2, Article 100034. https://doi.org/10.1016/j.jsampl.2023.100034 Zwölfer, M., Heinrich, D., Wandt, B., Rhodin, H., Spörri, J., & Nachbauer, W. (2023b). A graph-based approach can improve keypoint detection of complex poses: a proof-of-concept on injury occurrences in alpine ski racing. Scientific Reports, 13, Article 21465. https://doi.org/10.1038/s41598-023-47875-
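The 3D reconstruction and segment-length evaluation described in the abstract above can be illustrated with a short sketch. It assumes each camera is described by a 3x4 projection matrix and uses the standard homogeneous linear (DLT-style) triangulation solved by SVD; the classic 11-parameter DLT formulation cited in the abstract may differ in detail. The function names and toy camera matrices are hypothetical.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Linear triangulation of one 3D point from two or more views.

    proj_mats: list of 3x4 projection matrices, one per camera.
    points_2d: list of (u, v) image coordinates of the same keypoint.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear equations in the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # Least-squares solution: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]

def segment_length_stats(joint_a, joint_b):
    """Mean and (population) standard deviation of a segment length over frames.

    joint_a, joint_b: arrays of shape (n_frames, 3) with 3D joint positions.
    """
    a = np.asarray(joint_a, dtype=float)
    b = np.asarray(joint_b, dtype=float)
    lengths = np.linalg.norm(a - b, axis=1)
    return float(lengths.mean()), float(lengths.std())

# Toy example: two cameras one unit apart along x, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
uv1 = X_true[:2] / X_true[2]
uv2 = (X_true[:2] + np.array([-1.0, 0.0])) / X_true[2]
print(triangulate_dlt([P1, P2], [uv1, uv2]))  # recovers X_true up to rounding
```

Because no temporal smoothing or segment-length constraint is applied in such a per-frame reconstruction, the standard deviation of a segment's length across frames serves as a simple indicator of keypoint precision, which is how the abstract uses it.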
