
    Pre-processing of Speech Signals for Robust Parameter Estimation


    Data-driven multivariate and multiscale methods for brain computer interface

    This thesis focuses on the development of data-driven multivariate and multiscale methods for brain computer interface (BCI) systems. The electroencephalogram (EEG), the most convenient means of measuring neurophysiological activity owing to its noninvasive nature, is mainly considered. The nonlinearity and nonstationarity inherent in EEG, together with its multichannel recording nature, require a new set of data-driven multivariate techniques to estimate features more accurately for enhanced BCI operation. A long-term goal is also to enable an alternative EEG recording strategy suited to long-term, portable monitoring. Empirical mode decomposition (EMD) and local mean decomposition (LMD), fully data-driven adaptive tools, are considered to decompose the nonlinear and nonstationary EEG signal into a set of components which are highly localised in time and frequency. It is shown that the complex and multivariate extensions of EMD, which can exploit common oscillatory modes within multivariate (multichannel) data, can be used to accurately estimate and compare the amplitude and phase information among multiple sources, a key step in feature extraction for BCI systems. A complex extension of local mean decomposition is also introduced and its operation is illustrated on two-channel neuronal spike streams. Common spatial patterns (CSP), a standard feature extraction technique for BCI applications, is also extended to the complex domain using augmented complex statistics. Depending on the circularity or noncircularity of a complex signal, one of the complex CSP algorithms can be chosen to produce the best classification performance between two different EEG classes. Using these complex and multivariate algorithms, two cognitive brain studies are investigated towards a more natural and intuitive design of advanced BCI systems.
Firstly, a Yarbus-style auditory selective attention experiment is introduced to measure the user's attention to one sound source among a mixture of sound stimuli, aimed at improving the usefulness of hearing instruments such as hearing aids. Secondly, emotion experiments elicited by taste and taste recall are examined to determine pleasant and unpleasant responses to a food for the implementation of affective computing. The separation between the two emotional responses is examined using real- and complex-valued common spatial pattern methods. Finally, we introduce a novel approach to brain monitoring based on EEG recordings from within the ear canal, embedded on a custom-made hearing aid earplug. The new platform promises the possibility of both short- and long-term continuous use for standard brain monitoring and interfacing applications.
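In the real-valued case, CSP reduces to a generalized eigenvalue problem on the two class covariance matrices. A minimal sketch of standard real-valued CSP (not the complex extension developed in the thesis), computed via whitening and eigendecomposition:

```python
import numpy as np

def csp_filters(X1, X2):
    """Real-valued CSP. X1, X2: (trials, channels, samples) arrays for the
    two classes; returns spatial filters W (as rows) and the eigenvalues
    of the whitened class-1 covariance."""
    def avg_cov(X):
        # trial-wise normalized spatial covariance, averaged over trials
        covs = [t @ t.T / np.trace(t @ t.T) for t in X]
        return np.mean(covs, axis=0)

    C1, C2 = avg_cov(X1), avg_cov(X2)
    # whiten the composite covariance: P (C1 + C2) P.T = I
    d, U = np.linalg.eigh(C1 + C2)
    P = (U / np.sqrt(d)).T
    # eigenvectors of the whitened class-1 covariance give the filters
    mu, V = np.linalg.eigh(P @ C1 @ P.T)
    W = V.T @ P
    return W, mu
```

Filters at the two ends of the eigenvalue spectrum maximise variance for one class while minimising it for the other; log-variances of the spatially filtered trials are the usual classification features.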

    Speech Enhancement for Automatic Analysis of Child-Centered Audio Recordings

    Analysis of child-centred daylong naturalistic audio recordings has become a de facto research protocol in the scientific study of child language development. Researchers increasingly use these recordings to understand the linguistic environment a child encounters in routine interactions with the world. The recordings are captured by a microphone that the child wears throughout the day. Being naturalistic, they contain many unwanted everyday sounds, which degrade the performance of speech analysis tasks. The purpose of this thesis is to investigate the utility of speech enhancement (SE) algorithms in the automatic analysis of such recordings. To this effect, several classical signal processing and modern machine learning-based SE methods were employed 1) as denoisers for speech corrupted with additive noise sampled from real-life child-centred daylong recordings and 2) as front-ends for the downstream speech processing tasks of addressee classification (infant- vs. adult-directed speech) and automatic syllable count estimation from speech. The downstream tasks were conducted on data derived from a set of geographically, culturally, and linguistically diverse child-centred daylong audio recordings. Denoising performance was evaluated through objective quality metrics (spectral distortion and instrumental intelligibility) and through downstream task performance. Finally, the objective evaluation results were compared with the downstream task results to determine whether objective metrics can serve as a reasonable proxy for selecting an SE front-end for a downstream task. The results show that a recently proposed Long Short-Term Memory (LSTM)-based progressive learning architecture provides the largest performance gains in the downstream tasks in comparison with the other SE methods and baseline results. Classical signal processing-based SE methods also achieve competitive performance. From the comparison of objective assessments and downstream task results, no predictive relationship between task-independent objective metrics and downstream task performance was found.
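Spectral distortion, one of the objective metrics mentioned, is commonly computed as a frame-wise log-spectral distance between the clean and processed signals. A minimal sketch under that assumption (the exact metric used in the thesis may differ):

```python
import numpy as np

def log_spectral_distance(clean, enhanced, n_fft=256, hop=128):
    """Frame-wise log-spectral distance in dB: RMS over frequency of the
    log power-spectrum difference, averaged over frames."""
    def frames(x):
        n = (len(x) - n_fft) // hop + 1
        idx = np.arange(n_fft)[None, :] + hop * np.arange(n)[:, None]
        return x[idx] * np.hanning(n_fft)   # windowed analysis frames

    S1 = np.abs(np.fft.rfft(frames(clean), axis=1)) ** 2
    S2 = np.abs(np.fft.rfft(frames(enhanced), axis=1)) ** 2
    eps = 1e-10                             # guards against log of zero
    d = 10.0 * np.log10((S1 + eps) / (S2 + eps))
    return float(np.mean(np.sqrt(np.mean(d ** 2, axis=1))))
```

A perfect enhancer gives a distance of zero; larger values indicate more spectral distortion relative to the clean reference.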

    Speech enhancement by perceptual adaptive wavelet de-noising

    This thesis summarizes and compares existing wavelet de-noising methods. The most popular methods for wavelet transform, adaptive thresholding, and musical-noise suppression have been analyzed theoretically and evaluated through Matlab simulation. Based on this work, a new speech enhancement system using adaptive wavelet de-noising is proposed. Each step of standard wavelet thresholding is improved by optimized adaptive algorithms. The quantile-based adaptive noise estimate and the a posteriori SNR-based threshold adjuster complement each other; their combination integrates the advantages of both approaches and balances noise removal against speech preservation. To improve the final perceptual quality, a musical-noise analysis and smoothing algorithm and a Teager Energy Operator based silent-segment smoothing module are also introduced into the system. Experimental results demonstrate the capability of the proposed system in both stationary and non-stationary noise environments.
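The standard wavelet-thresholding pipeline the thesis builds on is: decompose, estimate the noise level, soft-threshold the detail coefficients, reconstruct. A minimal Haar-based sketch with the classical MAD noise estimate and universal threshold (the thesis's quantile-based noise estimate and SNR-based threshold adjuster are not reproduced here):

```python
import numpy as np

def haar_dwt(x):
    """One-level orthonormal Haar transform (len(x) must be even)."""
    s = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    return s, d

def haar_idwt(s, d):
    x = np.empty(2 * len(s))
    x[0::2] = (s + d) / np.sqrt(2)
    x[1::2] = (s - d) / np.sqrt(2)
    return x

def denoise(x, levels=3):
    """Multi-level Haar denoising with universal soft thresholding."""
    approx, details = x, []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # noise estimate from finest-scale details (median absolute deviation)
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))        # universal threshold
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0) for d in details]
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx
```

The signal length must be divisible by 2**levels; practical systems use longer wavelets and, as in this thesis, adapt the threshold to the local SNR instead of fixing it globally.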

    Objective Assessment of Machine Learning Algorithms for Speech Enhancement in Hearing Aids

    Speech enhancement in assistive hearing devices has been an area of research for many decades. Noise reduction is particularly challenging because of the wide variety of noise sources and the non-stationarity of speech and noise. Digital signal processing (DSP) algorithms deployed in modern hearing aids for noise reduction rely on certain assumptions about the statistical properties of undesired signals. This can hinder accurate estimation of different noise types, which subsequently leads to suboptimal noise reduction. In this research, a relatively unexplored technique based on deep learning, a recurrent neural network (RNN), is used to perform noise reduction and dereverberation for assisting hearing-impaired listeners. For noise reduction, the performance of the deep learning model was evaluated objectively and compared with that of the open Master Hearing Aid (openMHA), a conventional signal processing based framework, and a deep neural network (DNN) based model. It was found that the RNN model can suppress noise and improve speech understanding better than both the conventional hearing aid noise reduction algorithm and the DNN model. The same RNN model was shown to reduce reverberation components given proper training. A real-time implementation of the deep learning model is also discussed.
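Deep-learning enhancers of this kind commonly estimate a time-frequency mask that is applied to the noisy spectrogram. As an illustration only (not the RNN evaluated here), the sketch below applies an oracle ideal ratio mask computed from known clean and noise components:

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    n = (len(x) - n_fft) // hop + 1
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n)[:, None]
    return np.fft.rfft(x[idx] * np.hanning(n_fft), axis=1)

def istft(X, n_fft=256, hop=128):
    # weighted overlap-add with per-sample window normalization
    win = np.hanning(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    out = np.zeros(hop * (len(X) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n_fft] += f * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def ideal_ratio_mask(clean, noise):
    """Oracle mask; a trained model would predict this from noisy input."""
    S, N = np.abs(stft(clean)), np.abs(stft(noise))
    return S / (S + N + 1e-10)

def enhance(noisy, mask):
    return istft(mask * stft(noisy))
```

In a real system the mask is the network's output; the oracle version here gives an upper bound on what mask-based enhancement can achieve.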

    Current state of digital signal processing in myoelectric interfaces and related applications

    This review discusses the critical issues and recommended practices from the perspective of myoelectric interfaces. The major benefits and challenges of myoelectric interfaces are evaluated. The article aims to fill gaps left by previous reviews and to identify avenues for future research. Recommendations are given, for example, for electrode placement, sampling rate, segmentation, and classifiers. Four groups of applications where myoelectric interfaces have been adopted are identified: assistive technology, rehabilitation technology, input devices, and silent speech interfaces. The state-of-the-art applications in each of these groups are presented. Peer reviewed.

    An adaptive autoregressive pre-whitener for speech and acoustic signals based on parametric NMF

    A common assumption in many speech and acoustic processing methods is that the noise is white and Gaussian (WGN). Although this assumption yields simple and computationally attractive methods, it is often too crude in many applications. In this paper, we introduce a general-purpose, online pre-whitener which can be used as a pre-processor for methods based on the WGN assumption, improving their reliability and performance in applications with colored noise. The pre-whitener is a time-varying filter whose coefficients are found using a parametric non-negative matrix factorization (NMF), based on autoregressive (AR) mixture modeling of both the noise component and the signal component constituting the noisy signal. We show that the proposed pre-whitener outperforms other types of pre-whitener, especially in applications with non-stationary noise. We also perform a large number of experiments to quantify the benefits of using a pre-whitener as a pre-processor for methods based on the WGN assumption. The applications of interest were pitch estimation and time-of-arrival (TOA) estimation, where the WGN assumption is very popular.
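The idea behind AR pre-whitening: fit an autoregressive model to the colored noise and inverse-filter the signal so that the residual is approximately white. A minimal least-squares sketch (the paper's parametric-NMF coefficient estimation is not reproduced here):

```python
import numpy as np

def fit_ar(x, order=8):
    """Least-squares AR fit: x[n] ~ sum_k a[k] * x[n-1-k]."""
    y = x[order:]
    X = np.column_stack([x[order - 1 - k: len(x) - 1 - k] for k in range(order)])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def prewhiten(x, a):
    """Inverse filter: e[n] = x[n] - sum_k a[k] * x[n-1-k]."""
    order = len(a)
    X = np.column_stack([x[order - 1 - k: len(x) - 1 - k] for k in range(order)])
    return x[order:] - X @ a
```

The residual e then better satisfies the WGN assumption, so estimators derived under it (e.g. for pitch or TOA) can be applied with less bias.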

    Some New Results on the Estimation of Sinusoids in Noise


    Smart helmet: wearable multichannel ECG & EEG

    Modern wearable technologies have enabled continuous recording of vital signs; however, for activities such as cycling, motor racing, or military engagement, a helmet with embedded sensors would provide maximum convenience and the opportunity to monitor simultaneously both the vital signs and the electroencephalogram (EEG). To this end, we investigate the feasibility of recording the electrocardiogram (ECG), respiration, and EEG from face-lead locations, by embedding multiple electrodes within a standard helmet. The electrode positions are at the lower jaw, mastoids, and forehead, while for validation purposes a respiration belt around the thorax and a reference ECG from the chest serve as ground truth to assess the performance. The within-helmet EEG is verified by exposing the subjects to periodic visual and auditory stimuli and screening the recordings for the steady-state evoked potentials in response to these stimuli. Cycling and walking are chosen as real-world activities to illustrate how to deal with the irregular motion artifacts so induced, which contaminate the recordings. We also propose a multivariate R-peak detection algorithm suitable for such noisy environments. Recordings in real-world scenarios support a proof of concept of the feasibility of recording vital signs and EEG from the proposed smart helmet.
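For context, a single-channel R-peak detector in the spirit of Pan-Tompkins (the multivariate algorithm proposed in the paper is not reproduced here): differentiate, square, integrate over a moving window, then threshold and refine on the raw ECG:

```python
import numpy as np

def detect_r_peaks(ecg, fs=250.0):
    """Single-channel R-peak detection sketch."""
    energy = np.diff(ecg) ** 2                        # emphasize QRS slopes
    win = max(int(0.12 * fs), 1)                      # ~120 ms integration window
    mwi = np.convolve(energy, np.ones(win) / win, mode="same")
    thr = 0.5 * np.max(mwi)                           # crude fixed threshold
    # find contiguous regions above threshold
    above = np.concatenate(([False], mwi > thr, [False]))
    edges = np.diff(above.astype(int))
    starts, ends = np.where(edges == 1)[0], np.where(edges == -1)[0]
    peaks, half = [], win // 2
    for s, e in zip(starts, ends):
        c = s + np.argmax(mwi[s:e])                   # centre of the QRS energy
        lo, hi = max(c - half, 0), min(c + half + 1, len(ecg))
        peaks.append(lo + np.argmax(ecg[lo:hi]))      # refine on the raw signal
    return np.array(peaks)
```

A fixed threshold at half the global maximum is a simplification; robust detectors adapt the threshold over time, and a multivariate variant would fuse evidence across channels before peak picking.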

    A comparison of two auditory front-end models for horizontal localization of concurrent speakers in adverse acoustic scenarios

    Ears are complex instruments which help humans understand what is happening around them. By using two ears, a person can focus their attention on a specific sound source. The first auditory models appeared in the literature in the previous century; nowadays, new approaches extend those findings. Extensive research has been carried out over the years, but many details of auditory processing remain unclear. In this thesis, two auditory models will be analyzed and compared.
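One classical binaural cue such front-end models exploit for horizontal localization is the interaural time difference (ITD), which can be estimated by cross-correlating the two ear signals. A minimal sketch (not either of the models compared in the thesis):

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=0.8e-3):
    """Estimate the interaural time difference in seconds by picking the
    cross-correlation peak; positive when the left channel leads."""
    cc = np.correlate(left, right, mode="full")
    lags = np.arange(-(len(right) - 1), len(left))   # lag axis of cc
    keep = np.abs(lags) <= int(max_itd * fs)         # physically plausible ITDs
    best = lags[keep][np.argmax(cc[keep])]
    # with numpy's correlation convention, a negative lag means left leads
    return -best / fs
```

The 0.8 ms search limit reflects the maximum acoustic path difference around a human head; the sign of the estimate indicates the side of the source, its magnitude the azimuth.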