43 research outputs found

    Surface Electromyography and Artificial Intelligence for Human Activity Recognition - A Systematic Review on Methods, Emerging Trends Applications, Challenges, and Future Implementation

    Get PDF
    Human activity recognition (HAR) has become increasingly popular in recent years due to its potential to meet the growing needs of various industries. Electromyography (EMG) is essential in various clinical and biological settings. It is a metric that helps doctors diagnose conditions that affect muscle activation patterns and monitor patients’ progress in rehabilitation, disease diagnosis, motion intention recognition, etc. This review summarizes the various research papers based on HAR with EMG. Over recent years, the integration of Artificial Intelligence (AI) has catalyzed remarkable advancements in the classification of biomedical signals, with a particular focus on EMG data. Firstly, this review meticulously curates a wide array of research papers that have contributed significantly to the evolution of EMG-based activity recognition. By surveying the existing literature, we provide an insightful overview of the key findings and innovations that have propelled this field forward. It explore the various approaches utilized for preprocessing EMG signals, including noise reduction, baseline correction, filtering, and normalization, ensure that the EMG data is suitably prepared for subsequent analysis. In addition, we unravel the multitude of techniques employed to extract meaningful features from raw EMG data, encompassing both time-domain and frequency-domain features. These techniques are fundamental to achieving a comprehensive characterization of muscle activity patterns. Furthermore, we provide an extensive overview of both Machine Learning (ML) and Deep Learning (DL) classification methods, showcasing their respective strengths, limitations, and real-world applications in recognizing diverse human activities from EMG signals. In examining the hardware infrastructure for HAR with EMG, the synergy between hardware and software is underscored as paramount for enabling real-time monitoring. Finally, we also discovered open issues and future research direction that may point to new lines of inquiry for ongoing research toward EMG-based detection.publishedVersio

    Machine learning methods for sign language recognition: a critical review and analysis.

    Get PDF
    Sign language is an essential tool to bridge the communication gap between normal and hearing-impaired people. However, the diversity of over 7000 present-day sign languages with variability in motion position, hand shape, and position of body parts making automatic sign language recognition (ASLR) a complex system. In order to overcome such complexity, researchers are investigating better ways of developing ASLR systems to seek intelligent solutions and have demonstrated remarkable success. This paper aims to analyse the research published on intelligent systems in sign language recognition over the past two decades. A total of 649 publications related to decision support and intelligent systems on sign language recognition (SLR) are extracted from the Scopus database and analysed. The extracted publications are analysed using bibliometric VOSViewer software to (1) obtain the publications temporal and regional distributions, (2) create the cooperation networks between affiliations and authors and identify productive institutions in this context. Moreover, reviews of techniques for vision-based sign language recognition are presented. Various features extraction and classification techniques used in SLR to achieve good results are discussed. The literature review presented in this paper shows the importance of incorporating intelligent solutions into the sign language recognition systems and reveals that perfect intelligent systems for sign language recognition are still an open problem. Overall, it is expected that this study will facilitate knowledge accumulation and creation of intelligent-based SLR and provide readers, researchers, and practitioners a roadmap to guide future direction

    Robust visual speech recognition using optical flow analysis and rotation invariant features

    Get PDF
    The focus of this thesis is to develop computer vision algorithms for visual speech recognition system to identify the visemes. The majority of existing speech recognition systems is based on audio-visual signals and has been developed for speech enhancement and is prone to acoustic noise. Considering this problem, aim of this research is to investigate and develop a visual only speech recognition system which should be suitable for noisy environments. Potential applications of such a system include the lip-reading mobile phones, human computer interface (HCI) for mobility-impaired users, robotics, surveillance, improvement of speech based computer control in a noisy environment and for the rehabilitation of the persons who have undergone a laryngectomy surgery. In the literature, there are several models and algorithms available for visual feature extraction. These features are extracted from static mouth images and characterized as appearance and shape based features. However, these methods rarely incorporate the time dependent information of mouth dynamics. This dissertation presents two optical flow based approaches of visual feature extraction, which capture the mouth motions in an image sequence. The motivation for using motion features is, because the human perception of lip-reading is concerned with the temporal dynamics of mouth motion. The first approach is based on extraction of features from the optical flow vertical component. The optical flow vertical component is decomposed into multiple non-overlapping fixed scale blocks and statistical features of each block are computed for successive video frames of an utterance. To overcome the issue of large variation in speed of speech, each utterance is normalized using simple linear interpolation method. In the second approach, four directional motion templates based on optical flow are developed, each representing the consolidated motion information in an utterance in four directions (i.e.,up, down, left and right). This approach is an evolution of a view based approach known as motion history image (MHI). One of the main issues with the MHI method is its motion overwriting problem because of self-occlusion. DMHIs seem to solve this issue of overwriting. Two types of image descriptors, Zernike moments and Hu moments are used to represent each image of DMHIs. A support vector machine (SVM) classifier was used to classify the features obtained from the optical flow vertical component, Zernike and Hu moments separately. For identification of visemes, a multiclass SVM approach was employed. A video speech corpus of seven subjects was used for evaluating the efficiency of the proposed methods for lip-reading. The experimental results demonstrate the promising performance of the optical flow based mouth movement representations. Performance comparison between DMHI and MHI based on Zernike moments, shows that the DMHI technique outperforms the MHI technique. A video based adhoc temporal segmentation method is proposed in the thesis for isolated utterances. It has been used to detect the start and the end frame of an utterance from an image sequence. The technique is based on a pair-wise pixel comparison method. The efficiency of the proposed technique was tested on the available data set with short pauses between each utterance

    Signal processing and analytics of multimodal biosignals

    Get PDF
    Ph. D. ThesisBiosignals have been extensively studied by researchers for applications in diagnosis, therapy, and monitoring. As these signals are complex, they have to be crafted as features for machine learning to work. This begs the question of how to extract features that are relevant and yet invariant to uncontrolled extraneous factors. In the last decade or so, deep learning has been used to extract features from the raw signals automatically. Furthermore, with the proliferation of sensors, more raw signals are now available, making it possible to use multi-view learning to improve on the predictive performance of deep learning. The purpose of this work is to develop an effective deep learning model of the biosignals and make use of the multi-view information in the sequential data. This thesis describes two proposed methods, namely: (1) The use of a deep temporal convolution network to provide the temporal context of the signals to the deeper layers of a deep belief net. (2) The use of multi-view spectral embedding to blend the complementary data in an ensemble. This work uses several annotated biosignal data sets that are available in the open domain. They are non-stationary, noisy and non-linear signals. Using these signals in their raw form without feature engineering will yield poor results with the traditional machine learning techniques. By passing abstractions that are more useful through the deep belief net and blending the complementary data in an ensemble, there will be improvement in performance in terms of accuracy and variance, as shown by the results of 10-fold validations.Nanyang Polytechni

    Ensemble approach on enhanced compressed noise EEG data signal in wireless body area sensor network

    Get PDF
    The Wireless Body Area Sensor Network (WBASN) is used for communication among sensor nodes operating on or inside the human body in order to monitor vital body parameters and movements. One of the important applications of WBASN is patients’ healthcare monitoring of chronic diseases such as epileptic seizure. Normally, epileptic seizure data of the electroencephalograph (EEG) is captured and compressed in order to reduce its transmission time. However, at the same time, this contaminates the overall data and lowers classification accuracy. The current work also did not take into consideration that large size of collected EEG data. Consequently, EEG data is a bandwidth intensive. Hence, the main goal of this work is to design a unified compression and classification framework for delivery of EEG data in order to address its large size issue. EEG data is compressed in order to reduce its transmission time. However, at the same time, noise at the receiver side contaminates the overall data and lowers classification accuracy. Another goal is to reconstruct the compressed data and then recognize it. Therefore, a Noise Signal Combination (NSC) technique is proposed for the compression of the transmitted EEG data and enhancement of its classification accuracy at the receiving side in the presence of noise and incomplete data. The proposed framework combines compressive sensing and discrete cosine transform (DCT) in order to reduce the size of transmission data. Moreover, Gaussian noise model of the transmission channel is practically implemented to the framework. At the receiving side, the proposed NSC is designed based on weighted voting using four classification techniques. The accuracy of these techniques namely Artificial Neural Network, Naïve Bayes, k-Nearest Neighbour, and Support Victor Machine classifiers is fed to the proposed NSC. The experimental results showed that the proposed technique exceeds the conventional techniques by achieving the highest accuracy for noiseless and noisy data. Furthermore, the framework performs a significant role in reducing the size of data and classifying both noisy and noiseless data. The key contributions are the unified framework and proposed NSC, which improved accuracy of the noiseless and noisy EGG large data. The results have demonstrated the effectiveness of the proposed framework and provided several credible benefits including simplicity, and accuracy enhancement. Finally, the research improves clinical information about patients who not only suffer from epilepsy, but also neurological disorders, mental or physiological problems

    Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems

    Get PDF
    A thesis submitted to the Faculty of Engineering and the Built Environment, University of the Witwatersrand, Johannesburg, in ful lment of the requirements for the degree of Doctor of Philosophy. Johannesburg, September 2016Having survived the ordeal of a laryngectomy, the patient must come to terms with the resulting loss of speech. With recent advances in portable computing power, automatic lip-reading (ALR) may become a viable approach to voice restoration. This thesis addresses the image processing aspect of ALR, and focuses three contributions to colour-based lip segmentation. The rst contribution concerns the colour transform to enhance the contrast between the lips and skin. This thesis presents the most comprehensive study to date by measuring the overlap between lip and skin histograms for 33 di erent colour transforms. The hue component of HSV obtains the lowest overlap of 6:15%, and results show that selecting the correct transform can increase the segmentation accuracy by up to three times. The second contribution is the development of a new lip segmentation algorithm that utilises the best colour transforms from the comparative study. The algorithm is tested on 895 images and achieves percentage overlap (OL) of 92:23% and segmentation error (SE) of 7:39 %. The third contribution focuses on the impact of the histogram threshold on the segmentation accuracy, and introduces a novel technique called Adaptive Threshold Optimisation (ATO) to select a better threshold value. The rst stage of ATO incorporates -SVR to train the lip shape model. ATO then uses feedback of shape information to validate and optimise the threshold. After applying ATO, the SE decreases from 7:65% to 6:50%, corresponding to an absolute improvement of 1:15 pp or relative improvement of 15:1%. While this thesis concerns lip segmentation in particular, ATO is a threshold selection technique that can be used in various segmentation applications.MT201
    corecore