Search CORE

878 research outputs found

From Unimodal to Multimodal: improving the sEMG-Based Pattern Recognition via deep generative models

Author: Ren Linyan
Wei Wentao
Publication venue
Publication date: 08/08/2023
Field of study

Multimodal hand gesture recognition (HGR) systems can achieve higher recognition accuracy. However, acquiring multimodal gesture recognition data typically requires users to wear additional sensors, thereby increasing hardware costs. This paper proposes a novel generative approach to improve Surface Electromyography (sEMG)-based HGR accuracy via virtual Inertial Measurement Unit (IMU) signals. Specifically, we trained a deep generative model based on the intrinsic correlation between forearm sEMG signals and forearm IMU signals to generate virtual forearm IMU signals from the input forearm sEMG signals at first. Subsequently, the sEMG signals and virtual IMU signals were fed into a multimodal Convolutional Neural Network (CNN) model for gesture recognition. To evaluate the performance of the proposed approach, we conducted experiments on 6 databases, including 5 publicly available databases and our collected database comprising 28 subjects performing 38 gestures, containing both sEMG and IMU data. The results show that our proposed approach outperforms the sEMG-based unimodal HGR method (with increases of 2.15%-13.10%). It demonstrates that incorporating virtual IMU signals, generated by deep generative models, can significantly enhance the accuracy of sEMG-based HGR. The proposed approach represents a successful attempt to transition from unimodal HGR to multimodal HGR without additional sensor hardware

arXiv.org e-Print Archive

Sensing and adaption for long-term hand rehabilitation

Author: Boyd Peter Ralph
Publication venue
Publication date: 01/01/2020
Field of study

Portsmouth University Research Portal (Pure)

MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things

Author: Liang Paul Pu
Mo Shentong
Morency Louis-Philippe
Salakhutdinov Russ
Publication venue
Publication date: 10/11/2023
Field of study

The Internet of Things (IoT), the network integrating billions of smart physical devices embedded with sensors, software, and communication technologies for the purpose of connecting and exchanging data with other devices and systems, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio for prediction tasks involving the pose, gaze, activities, and gestures of humans as well as the touch, contact, pose, 3D of physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for impact in understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To develop machine learning technologies for IoT, this paper proposes MultiIoT, the most expansive IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges involving (1) learning from many sensory modalities, (2) fine-grained interactions across long temporal ranges, and (3) extreme heterogeneity due to unique structure and noise topologies in real-world sensors. We also release a set of strong modeling baselines, spanning modality and task-specific methods to multisensory and multitask models to encourage future research in multisensory representation learning for IoT

arXiv.org e-Print Archive

Deep combination of radar with optical data for gesture recognition: role of attention in fusion architectures

Author: Nguyen H.
Nguyen H.
Towakel P.
Towakel P.
Windridge D.
Windridge D.
Publication venue: IEEE
Publication date: 01/01/2023
Field of study

Multimodal time series classification is an important aspect of human gesture recognition, in which limitations of individual sensors can be overcome by combining data from multiple modalities. In a deep learning pipeline, the attention mechanism further allows for a selective, contextual concentration on relevant features. However, while the standard attention mechanism is an effective tool when working with Natural Language Processing (NLP), it is not ideal when working with temporally- or spatially-sparse multi-modal data. In this paper, we present a novel attention mechanism, Multi-Modal Attention Preconditioning (MMAP). We first demonstrate that MMAP outperforms regular attention for the task of classification of modalities involving temporal and spatial sparsity and secondly investigate the impact of attention in the fusion of radar and optical data for gesture recognition via three specific modalities: dense spatiotemporal optical data, spatially sparse/temporally dense kinematic data, and sparse spatiotemporal radar data. We explore the effect of attention on early, intermediate, and late fusion architectures and compare eight different pipelines in terms of accuracy and their ability to preserve detection accuracy when modalities are missing. Results highlight fundamental differences between late and intermediate attention mechanisms in respect to the fusion of radar and optical data

Middlesex University Research Repository

FAF: A novel multimodal emotion recognition approach integrating face, body and text

Author: Ding Weiping
Fang Zhongyu
Gao Baopeng
He Aoyun
Ma Lei
Yu Qihui
Zhang Tong
Publication venue
Publication date: 20/11/2022
Field of study

Multimodal emotion analysis performed better in emotion recognition depending on more comprehensive emotional clues and multimodal emotion dataset. In this paper, we developed a large multimodal emotion dataset, named "HED" dataset, to facilitate the emotion recognition task, and accordingly propose a multimodal emotion recognition method. To promote recognition accuracy, "Feature After Feature" framework was used to explore crucial emotional information from the aligned face, body and text samples. We employ various benchmarks to evaluate the "HED" dataset and compare the performance with our method. The results show that the five classification accuracy of the proposed multimodal fusion method is about 83.75%, and the performance is improved by 1.83%, 9.38%, and 21.62% respectively compared with that of individual modalities. The complementarity between each channel is effectively used to improve the performance of emotion recognition. We had also established a multimodal online emotion prediction platform, aiming to provide free emotion prediction to more users

arXiv.org e-Print Archive

Analysis of the Efficacy of Real-Time Hand Gesture Detection with Hog and Haar-Like Features Using SVM Classification

Author: Pournima S.
Prabha S. Santhana
Theresa W. Gracy
Thilagavathy D.
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/12/2022
Field of study

The field of hand gesture recognition has recently reached new heights thanks to its widespread use in domains like remote sensing, robotic control, and smart home appliances, among others. Despite this, identifying gestures is difficult because of the intransigent features of the human hand, which make the codes used to decode them illegible and impossible to compare. Differentiating regional patterns is the job of pattern recognition. Pattern recognition is at the heart of sign language. People who are deaf or mute may understand the spoken language of the rest of the world by learning sign language. Any part of the body may be used to create signs in sign language. The suggested system employs a gesture recognition system trained on Indian sign language. The methods of preprocessing, hand segmentation, feature extraction, gesture identification, and classification of hand gestures are discussed in this work as they pertain to hand gesture sign language. A hybrid approach is used to extract the features, which combines the usage of Haar-like features with the application of Histogram of Oriented Gradients (HOG).The SVM classifier is then fed the characteristics it has extracted from the pictures in order to make an accurate classification. A false rejection error rate of 8% is achieved while the accuracy of hand gesture detection is improved by 93.5%

International Journal on Recent and Innovation Trends in Computing and Communication

Robust Latent Representations via Cross-Modal Translation and Alignment

Author: Brutti Alessio
Cavallaro Andrea
Rajan Vandana
Publication venue
Publication date: 01/01/2021
Field of study

Multi-modal learning relates information across observation modalities of the same physical phenomenon to leverage complementary information. Most multi-modal machine learning methods require that all the modalities used for training are also available for testing. This is a limitation when the signals from some modalities are unavailable or are severely degraded by noise. To address this limitation, we aim to improve the testing performance of uni-modal systems using multiple modalities during training only. The proposed multi-modal training framework uses cross-modal translation and correlation-based latent space alignment to improve the representations of the weaker modalities. The translation from the weaker to the stronger modality generates a multi-modal intermediate encoding that is representative of both modalities. This encoding is then correlated with the stronger modality representations in a shared latent space. We validate the proposed approach on the AVEC 2016 dataset for continuous emotion recognition and show the effectiveness of the approach that achieves state-of-the-art (uni-modal) performance for weaker modalities

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler