Accurate 3D tracking of hand and finger movements poses significant
challenges in computer vision. The potential applications span multiple
domains, including human-computer interaction, virtual reality, industry, and
medicine. While gesture recognition has achieved remarkable accuracy,
quantifying fine movements remains a hurdle, particularly in clinical
applications where the assessment of hand dysfunctions and rehabilitation
training outcomes necessitate precise measurements. Several novel and
lightweight frameworks based on Deep Learning have emerged to address this
issue; however, their performance in accurately and reliably measuring finger
movements requires validation against well-established gold-standard systems.
In this paper, we validate the hand-tracking framework implemented in
Google MediaPipe Hand (GMH) and an enhanced version, GMH-D, which
exploits the depth estimation of an RGB-Depth camera to achieve more accurate
tracking of 3D movements. Three dynamic exercises commonly administered by
clinicians to assess hand dysfunctions, namely Hand Opening-Closing, Single
Finger Tapping, and Multiple Finger Tapping, are considered. Results demonstrate
high temporal and spectral consistency of both frameworks with the gold
standard. However, the enhanced GMH-D framework exhibits superior accuracy in
spatial measurements compared to the baseline GMH, for both slow and fast
movements. Overall, our study contributes to the advancement of hand-tracking
technology, establishes a validation procedure as good practice for
demonstrating the efficacy of deep-learning-based hand tracking, and shows that
GMH-D is a reliable framework for assessing 3D hand movements in clinical
applications.
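As a rough illustration of the idea behind GMH-D described above, one way to combine MediaPipe's 2D landmarks with the metric depth of an aligned RGB-D frame is sketched below. This is a minimal sketch under our own assumptions (the helper name, the depth-map format, and the choice to substitute the sensor depth for MediaPipe's relative z are illustrative, not the authors' implementation).

```python
import numpy as np
import mediapipe as mp

def enhance_landmarks_with_depth(rgb_image, depth_map):
    """Hypothetical GMH-D-style fusion: keep MediaPipe's 2D landmark positions
    but replace its relative z estimate with the metric depth read from an
    RGB-D depth map aligned to the colour frame.

    rgb_image: HxWx3 uint8 RGB frame
    depth_map: HxW array of depth values (e.g. metres), pixel-aligned to rgb_image
    """
    h, w = depth_map.shape
    with mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        results = hands.process(rgb_image)
    if not results.multi_hand_landmarks:
        return None

    points_3d = []
    for lm in results.multi_hand_landmarks[0].landmark:
        # MediaPipe returns normalised image coordinates; convert to pixel indices.
        u = min(max(int(lm.x * w), 0), w - 1)
        v = min(max(int(lm.y * h), 0), h - 1)
        z = depth_map[v, u]  # metric depth from the RGB-D sensor at the landmark
        points_3d.append((lm.x, lm.y, z))
    return np.array(points_3d)
```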