
    Generating intelligible audio speech from visual speech

    This work is concerned with generating intelligible audio speech from a video of a person talking. Regression and classification methods are proposed first to estimate static spectral envelope features from active appearance model (AAM) visual features. Two further methods are then developed to incorporate temporal information into the prediction: a feature-level method using multiple frames and a model-level method based on recurrent neural networks. Speech excitation information is not available from the visual signal, so methods to artificially generate aperiodicity and fundamental frequency are developed. These are combined within the STRAIGHT vocoder to produce a speech signal. The various systems are optimised through objective tests before applying subjective intelligibility tests, which determine a word accuracy of 85% from a set of human listeners on the GRID audio-visual speech database. This compares favourably with a previous regression-based baseline system, which achieved a word accuracy of 33%.
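    As a rough illustration of the feature-level temporal method described above, the sketch below stacks each AAM frame with its neighbours and regresses to a static spectral envelope. The context width, feature dimensions, and the ridge regressor are illustrative assumptions, not the thesis's exact configuration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def stack_context(X, k=2):
    """Concatenate each frame with its k neighbours on either side."""
    T, d = X.shape
    pad = np.pad(X, ((k, k), (0, 0)), mode="edge")
    return np.hstack([pad[i:i + T] for i in range(2 * k + 1)])

# Toy stand-ins: 1000 frames of 30-dim AAM features -> 25-dim envelope.
rng = np.random.default_rng(0)
aam = rng.standard_normal((1000, 30))
envelope = rng.standard_normal((1000, 25))

X = stack_context(aam, k=2)        # (1000, 150) multi-frame input
model = Ridge(alpha=1.0).fit(X, envelope)
pred_envelope = model.predict(X)   # per-frame spectral envelope estimate
```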

    EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

    The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech.
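    A minimal, hypothetical sketch of the front end such an EMG-to-speech system needs: band-pass filtering a surface EMG channel and extracting simple per-frame time-domain features that a regression model could map to acoustic parameters. The filter band, frame sizes, and feature choices here are assumptions for illustration, not the thesis's actual design.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def emg_frames(emg, fs=1000, frame_ms=25, hop_ms=10):
    # Band-pass to the usual surface-EMG activity range (assumed 20-450 Hz).
    b, a = butter(4, [20, 450], btype="bandpass", fs=fs)
    x = filtfilt(b, a, emg)
    frame, hop = int(fs * frame_ms / 1000), int(fs * hop_ms / 1000)
    feats = []
    for start in range(0, len(x) - frame + 1, hop):
        w = x[start:start + frame]
        feats.append([w.mean(),                          # frame mean
                      np.abs(w - w.mean()).mean(),       # mean absolute value
                      (np.diff(np.sign(w)) != 0).mean()])  # zero-crossing rate
    return np.asarray(feats)  # (n_frames, 3), input to an acoustic regressor
```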

    Automatic heart rate detection from FBG sensors using sensor fusion and enhanced empirical mode decomposition

    Cardiovascular diseases are the world's leading cause of death. Real-time monitoring of patients who have cardiovascular abnormalities can provide comprehensive and preventative health care. We investigate the role of complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) and sensor fusion for automatic heart rate detection from a mat with embedded Fiber Bragg Grating (FBG) sensor arrays. The fusion is performed in the time domain by averaging the readings of the sensors in each sensor array. Subsequently, CEEMDAN is applied to obtain the interbeat intervals. Experiments are performed with 10 human subjects (males and females) lying in two different positions on a bed for a period of 20 minutes. The overall system performance is assessed against reference ECG signals. The mean relative absolute error (average and standard deviation) is 0.049 and 0.019 for the fused sensors, and 0.047 and 0.038 for the best single sensor. Sensor fusion together with CEEMDAN proved robust against motion artifacts caused by body movements.
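    A hedged sketch of the processing chain described above: average the FBG sensors in an array (time-domain fusion), decompose with CEEMDAN, and detect beats on one intrinsic mode function (IMF). It uses the PyEMD package; the sampling rate, choice of IMF index, and peak-distance threshold are illustrative assumptions.

```python
import numpy as np
from PyEMD import CEEMDAN
from scipy.signal import find_peaks

fs = 100                               # assumed FBG sampling rate (Hz)
sensors = np.random.randn(4, 60 * fs)  # toy stand-in for one sensor array
fused = sensors.mean(axis=0)           # time-domain fusion by averaging

imfs = CEEMDAN()(fused)                # ensemble EMD with adaptive noise
cardiac = imfs[2]                      # assume one IMF isolates the cardiac band

# Inter-beat intervals from peak-to-peak spacing (peaks >= 0.4 s apart).
peaks, _ = find_peaks(cardiac, distance=int(0.4 * fs))
ibi = np.diff(peaks) / fs              # inter-beat intervals in seconds
bpm = 60.0 / ibi.mean()                # average heart rate estimate
```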

    Speech Recognition using Surface Electromyography


    Machine learning and inferencing for the decomposition of speech mixtures

    In this dissertation, we present and evaluate a novel approach for incorporating machine learning and inferencing into the time-frequency decomposition of speech signals in the context of speaker-independent multi-speaker pitch tracking. The pitch tracking performance of the resulting algorithm is comparable to that of a state-of-the-art machine-learning algorithm for multi-pitch tracking while being significantly more computationally efficient and requiring much less training data. Multi-pitch tracking is a time-frequency signal processing problem in which mutual interference of the harmonics from different speakers makes it challenging to design an algorithm that reliably estimates the fundamental frequency trajectories of the individual speakers. The current state-of-the-art in speaker-independent multi-pitch tracking utilizes 1) a deep neural network for producing spectrograms of individual speakers and 2) another deep neural network that acts upon the individual spectrograms and the original audio's spectrogram to produce estimates of the pitch tracks of the individual speakers. However, the implementation of this Multi-Spectrogram Machine-Learning (MS-ML) algorithm can be computationally intensive, making it impractical for hardware platforms such as embedded devices where computational power is limited. Instead of utilizing deep neural networks to estimate the pitch values directly, we have derived and evaluated a fault recognition and diagnosis (FRD) framework that utilizes machine learning and inferencing techniques to recognize potential faults in the pitch tracks produced by a traditional multi-pitch tracking algorithm. The result of this fault-recognition phase is then used to trigger a fault-diagnosis phase aimed at resolving the recognized fault(s) through adaptive adjustment of the time-frequency analysis of the input signal. The pitch estimates produced by the resulting FRD-ML algorithm are found to be comparable in accuracy to those produced via the MS-ML algorithm. However, our evaluation of the FRD-ML algorithm shows it to have significant advantages over the MS-ML algorithm. Specifically, the number of multiplications per second in FRD-ML is found to be two orders of magnitude less, while the number of additions per second is about the same as in the MS-ML algorithm. Furthermore, the amount of training data required to achieve optimal performance is found to be two orders of magnitude less for the FRD-ML algorithm in comparison to the MS-ML algorithm. The reduction in the number of multiplications per second means it is more feasible to implement the multi-pitch tracking solution on hardware platforms with limited computational power, such as embedded devices, rather than relying on Graphics Processing Units (GPUs) or cloud computing. The reduction in training data size makes the algorithm more flexible in terms of configuring for different application scenarios, such as training for different languages where there may not be a large amount of training data.
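    The FRD loop described above can be rendered schematically as follows: a conventional multi-pitch tracker makes a cheap first pass, a lightweight classifier flags suspect frames, and only those frames are re-analysed with an adjusted time-frequency analysis. Every function name here is a placeholder, not the dissertation's API.

```python
import numpy as np

def frd_ml_track(audio, fs, tracker, fault_clf, refine):
    """tracker:   audio -> (pitch_tracks, per-frame features)
    fault_clf: features -> boolean array of suspect frames
    refine:    (audio, fs, frame index) -> corrected pitch values"""
    pitches, feats = tracker(audio, fs)       # cheap first pass
    suspect = fault_clf(feats)                # fault-recognition phase
    for t in np.flatnonzero(suspect):         # fault-diagnosis phase:
        pitches[:, t] = refine(audio, fs, t)  # adjusted time-frequency analysis
    return pitches                            # (n_speakers, n_frames)
```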

    Corticomuscular co-activation based hybrid brain-computer interface for motor recovery monitoring

    The effect of a corticomuscular co-activation based hybrid brain-computer interface (h-BCI) on post-stroke neurorehabilitation has not been explored yet. A major challenge in this area is to find an appropriate corticomuscular feature that can not only drive an h-BCI but also serve as a biomarker for motor recovery monitoring. Our previous study established the feasibility of a new method of measuring corticomuscular co-activation called correlation of band-limited power time-courses (CBPT) of EEG and EMG signals, outperforming the traditional EEG-EMG coherence in terms of accurately controlling a robotic hand exoskeleton device by stroke patients. In this paper, we have evaluated the neurophysiological significance of CBPT for motor recovery monitoring by conducting a 5-week longitudinal pilot trial on 4 chronic hemiparetic stroke patients. Results show that the CBPT variations correlated significantly (p < 0.05) with the dynamic changes in motor outcome measures during the therapy for all the patients. As bandpower-based biomarkers are popular in the literature, a comparison with such biomarkers has also been made to cross-verify whether the changes in CBPT are indeed neurophysiological. The study thus concludes that CBPT can serve as a biomarker for motor recovery monitoring while also serving as a corticomuscular co-activation feature for h-BCI based neurorehabilitation. Despite the observed significant positive change between pre- and post-intervention motor outcomes, the clinical effectiveness of CBPT remains subject to further controlled trials on a larger cohort.
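    A hedged sketch of a CBPT-style feature: band-pass the EEG and EMG, take their power time-courses via the Hilbert envelope, and correlate the two. The frequency bands and sampling rate below are illustrative assumptions; the paper defines the exact parameters used with the exoskeleton.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_power(x, lo, hi, fs):
    # Band-limited power time-course via the Hilbert envelope.
    b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
    return np.abs(hilbert(filtfilt(b, a, x))) ** 2

def cbpt(eeg, emg, fs=256):
    p_eeg = band_power(eeg, 8, 30, fs)       # assumed mu/beta EEG band
    p_emg = band_power(emg, 20, 100, fs)     # assumed EMG activity band
    return np.corrcoef(p_eeg, p_emg)[0, 1]   # correlation of power time-courses
```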

    Data-driven time-frequency analysis of multivariate data

    Empirical Mode Decomposition (EMD) is a data-driven method for the decomposition and time-frequency analysis of real-world nonstationary signals. Its main advantages over other time-frequency methods are its locality, data-driven nature, multiresolution-based decomposition, higher time-frequency resolution and its ability to capture oscillations of any type (nonharmonic signals). These properties have made EMD a viable tool for real-world nonstationary data analysis. Recent advances in sensor and data acquisition technologies have brought to light new classes of signals containing typically several data channels. Currently, such signals are almost invariably processed channel-wise, which is suboptimal. It is, therefore, imperative to design multivariate extensions of the existing nonlinear and nonstationary analysis algorithms, as they are expected to give more insight into the dynamics and the interdependence between multiple channels of such signals. To this end, this thesis presents multivariate extensions of the empirical mode decomposition algorithm and illustrates their advantages with regard to multivariate nonstationary data analysis. Some important properties of such extensions are also explored, including their ability to exhibit wavelet-like dyadic filter bank structures for white Gaussian noise (WGN), and their capacity to align similar oscillatory modes from multiple data channels. Owing to the generality of the proposed methods, an improved multivariate EMD-based algorithm is introduced which solves some inherent problems in the original EMD algorithm. Finally, to demonstrate the potential of the proposed methods, simulations on the fusion of multiple real-world signals (wind, images and inertial body motion data) support the analysis.
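    One property noted above can be checked directly: applied to white Gaussian noise, EMD behaves like a dyadic filter bank, with each successive IMF carrying roughly half the mean frequency of the previous one. The sketch below uses PyEMD's univariate EMD as a stand-in; the multivariate extensions discussed in the thesis apply the same idea jointly across channels.

```python
import numpy as np
from PyEMD import EMD

rng = np.random.default_rng(0)
wgn = rng.standard_normal(4096)   # white Gaussian noise input
imfs = EMD()(wgn)                 # data-driven decomposition into IMFs

for k, imf in enumerate(imfs):
    # Mean frequency estimated from zero-crossing density; it roughly
    # halves from one IMF to the next, as in a dyadic filter bank.
    zc = (np.diff(np.sign(imf)) != 0).sum()
    print(f"IMF {k}: ~{zc / 2 / len(imf):.4f} cycles/sample")
```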