Abstract
Accurate decoding of motor intent from biosignals
is essential for controlling upper-limb prostheses.
We propose a novel high-dimensional multimodal deep
learning framework that fuses surface electromyography
(sEMG) and B-mode ultrasound (US) images to estimate
metacarpophalangeal and proximal interphalangeal joint
angles continuously. The framework employs a shared Encoder–
Decoder–Regression architecture integrating transposed
convolutions, multi-head cross-attention, and long
short-term memory layers to jointly capture spatiotemporal
features from both modalities. In this model, each modality
is processed by its own encoder and decoder, and the
resulting feature maps are fused before being passed to the
regression head. To improve cross-subject generalization
and reduce data requirements for new users, we introduce
a transfer learning strategy with parameter freezing. Experiments
on data from seven subjects show that, compared
with sEMG-only and US-only baselines, the fusion model
reduces test RMSE by 1.873° (21.02%) and 0.794° (10.15%),
and increases test local correlation by 0.069 (10.02%) and
0.039 (5.48%) (p < 0.05), demonstrating the potential of multimodal
fusion for amputee rehabilitation. Ablation studies
further confirm that the full CNN+LSTM+Attention model
achieves the best performance, reducing test RMSE by
2.022° (22.32%) and increasing test local correlation by
0.053 (7.52%) (p < 0.05). Furthermore, fine-tuning the pretrained
model with only 25% of a new subject’s data yields
performance comparable to full retraining, highlighting the
framework’s data efficiency.

This work was supported by Startup funding at The University of Alabama (FOAP #13009-214271-200).
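To make the described pipeline concrete, the sketch below shows one plausible PyTorch realization of the dual-branch design summarized in the abstract: a per-modality CNN encoder/decoder with transposed convolutions, multi-head cross-attention fusion of the sEMG and US features, an LSTM over the temporal dimension, a regression head for the joint angles, and encoder freezing for subject-specific fine-tuning. All module names, layer sizes, and input shapes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a dual-branch sEMG + US fusion
# regressor: per-modality encoder/decoder, cross-attention fusion, LSTM,
# and a joint-angle regression head. Shapes and sizes are assumptions.
import torch
import torch.nn as nn


class ModalityBranch(nn.Module):
    """Encoder-decoder for one modality (sEMG maps or B-mode US frames)."""

    def __init__(self, in_ch: int, feat_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(                      # downsampling conv encoder
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                      # transposed-conv decoder
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(16, feat_dim))

    def forward(self, x):                                  # x: (B*T, C, H, W)
        return self.proj(self.decoder(self.encoder(x)))


class FusionRegressor(nn.Module):
    """Fuses branch features with multi-head cross-attention, then LSTM + regression."""

    def __init__(self, n_joints: int = 10, feat_dim: int = 128):
        super().__init__()
        self.emg_branch = ModalityBranch(in_ch=1, feat_dim=feat_dim)
        self.us_branch = ModalityBranch(in_ch=1, feat_dim=feat_dim)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads=4, batch_first=True)
        self.lstm = nn.LSTM(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, n_joints)                # MCP/PIP joint angles

    def forward(self, emg, us):                            # emg, us: (B, T, 1, H, W)
        B, T = emg.shape[:2]
        e = self.emg_branch(emg.flatten(0, 1)).view(B, T, -1)
        u = self.us_branch(us.flatten(0, 1)).view(B, T, -1)
        fused, _ = self.cross_attn(query=e, key=u, value=u)  # sEMG attends to US
        out, _ = self.lstm(fused)
        return self.head(out)                              # (B, T, n_joints)


# Transfer-learning sketch: freeze the pretrained modality branches and
# fine-tune only the fusion, LSTM, and regression layers on the small
# amount of data available from a new subject.
model = FusionRegressor()
for p in list(model.emg_branch.parameters()) + list(model.us_branch.parameters()):
    p.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```

Freezing the modality branches mirrors the abstract's parameter-freezing strategy: only the fusion and regression layers are updated during fine-tuning, which is what allows a small fraction of a new subject's data to suffice.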