103 research outputs found
Data-driven multivariate and multiscale methods for brain computer interface
This thesis focuses on the development of data-driven multivariate and multiscale methods
for brain computer interface (BCI) systems. The electroencephalogram (EEG), the
most convenient means to measure neurophysiological activity due to its noninvasive nature,
is mainly considered. The nonlinearity and nonstationarity inherent in EEG, together with its
multichannel recording nature, require a new set of data-driven multivariate techniques to
estimate features more accurately for enhanced BCI operation. A further long-term goal
is to enable an alternative EEG recording strategy that supports portable, continuous
monitoring.
Empirical mode decomposition (EMD) and local mean decomposition (LMD), fully
data-driven adaptive tools, are considered to decompose the nonlinear and nonstationary
EEG signal into a set of components which are highly localised in time and frequency. It
is shown that the complex and multivariate extensions of EMD, which can exploit common
oscillatory modes within multivariate (multichannel) data, can be used to accurately
estimate and compare amplitude and phase information among multiple sources, a
key step in feature extraction for BCI systems. A complex extension of local mean decomposition
is also introduced and its operation is illustrated on two-channel neuronal
spike streams. Common spatial pattern (CSP), a standard feature extraction technique
for BCI applications, is also extended to the complex domain using augmented complex
statistics. Depending on the circularity or noncircularity of a complex signal, the
appropriate complex CSP algorithm can be chosen to produce the best classification
performance between two different EEG classes.
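As background for the real-valued baseline on which the complex extensions build, standard CSP can be sketched as a joint diagonalisation of the two class covariance matrices. This is a generic illustration, not the thesis implementation; the `csp_filters` name and the (trials, channels, samples) data layout are assumptions for the example.

```python
import numpy as np

def csp_filters(class_a, class_b):
    """Common spatial pattern filters for two classes of multichannel trials.

    class_a, class_b: arrays of shape (trials, channels, samples).
    Returns a (channels, channels) matrix whose rows are spatial filters,
    ordered so the first filter maximises variance for class A relative to B.
    """
    def avg_cov(trials):
        covs = []
        for x in trials:
            c = x @ x.T
            covs.append(c / np.trace(c))  # trace-normalise each trial
        return np.mean(covs, axis=0)

    ca, cb = avg_cov(class_a), avg_cov(class_b)
    # Whiten the composite covariance, then diagonalise class A in the
    # whitened space (equivalent to the generalised eigenvalue problem).
    evals, evecs = np.linalg.eigh(ca + cb)
    whiten = evecs @ np.diag(evals ** -0.5) @ evecs.T
    d, b = np.linalg.eigh(whiten @ ca @ whiten.T)
    order = np.argsort(d)[::-1]  # descending variance for class A
    return b[:, order].T @ whiten
```

Projecting trials onto the first and last filters and taking log-variances yields the classic CSP feature vector for a two-class classifier.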
Using these complex and multivariate algorithms, two cognitive brain studies are
investigated for a more natural and intuitive design of advanced BCI systems. Firstly, a Yarbus-style auditory selective attention experiment is introduced to measure the user's
attention to one sound source within a mixture of sound stimuli, with the aim of improving
the usefulness of hearing instruments such as hearing aids. Secondly, emotional responses
elicited by taste and taste recall are examined to determine the pleasure or displeasure
evoked by a food, towards the implementation of affective computing. The separation
between the two emotional responses is examined using real- and complex-valued common
spatial pattern methods.
Finally, we introduce a novel approach to brain monitoring based on EEG recordings
from within the ear canal, with the electrodes embedded in a custom-made hearing aid
earplug. The new platform offers the possibility of both short- and long-term continuous
use for standard brain monitoring and interfacing applications.
Speech Enhancement for Automatic Analysis of Child-Centered Audio Recordings
Analysis of child-centred daylong naturalistic audio recordings has become a de facto research protocol in the scientific study of child language development. Researchers are increasingly using these recordings to understand the linguistic environment a child encounters in her routine interactions with the world. The audio is captured by a microphone that the child wears throughout the day. Being naturalistic, the recordings contain many unwanted everyday sounds that degrade the performance of speech analysis tasks. The purpose of this thesis is to investigate the utility of speech enhancement (SE) algorithms in the automatic analysis of such recordings. To this end, several classical signal processing and modern machine learning-based SE methods were employed 1) as denoisers for speech corrupted with additive noise sampled from real-life child-centred daylong recordings and 2) as front-ends for the downstream speech processing tasks of addressee classification (infant- vs. adult-directed speech) and automatic syllable count estimation from speech. The downstream tasks were conducted on data derived from a set of geographically, culturally, and linguistically diverse child-centred daylong audio recordings. Denoising performance was evaluated through objective quality metrics (spectral distortion and instrumental intelligibility) and through downstream task performance. Finally, the objective evaluation results were compared with the downstream task results to determine whether objective metrics can serve as a reasonable proxy for selecting an SE front-end for a downstream task. The results show that a recently proposed Long Short-Term Memory (LSTM)-based progressive learning architecture provides the largest performance gains in the downstream tasks in comparison with the other SE methods and baseline results. Classical signal processing-based SE methods also achieve competitive performance.
From the comparison of objective assessment and downstream task performance results, no predictive relationship was found between task-independent objective metrics and downstream task performance.
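As a point of reference for the classical signal-processing SE methods benchmarked above, textbook magnitude spectral subtraction can be sketched as follows. This generic baseline is an illustration only; the function name, frame sizes, and spectral-floor constant are assumptions, not the thesis code.

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, frame=256, hop=128):
    """Magnitude spectral subtraction with overlap-add resynthesis.

    noisy: noisy speech samples; noise_est: a noise-only excerpt used to
    estimate the average noise magnitude spectrum.
    """
    win = np.hanning(frame)

    def frames(x):
        n = 1 + (len(x) - frame) // hop
        return np.stack([x[i * hop:i * hop + frame] * win for i in range(n)])

    # Average noise magnitude spectrum from the noise-only excerpt.
    noise_mag = np.abs(np.fft.rfft(frames(noise_est), axis=1)).mean(axis=0)
    spec = np.fft.rfft(frames(noisy), axis=1)
    # Subtract the noise magnitude; a small spectral floor limits musical noise.
    mag = np.maximum(np.abs(spec) - noise_mag, 0.05 * np.abs(spec))
    clean = np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=frame, axis=1)
    # Overlap-add, normalised by the summed analysis windows.
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i, f in enumerate(clean):
        out[i * hop:i * hop + frame] += f
        norm[i * hop:i * hop + frame] += win
    return out / np.maximum(norm, 1e-8)
```

The noisy phase is reused unchanged, which is the standard compromise in this family of methods.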
Speech enhancement by perceptual adaptive wavelet de-noising
This thesis summarizes and compares existing wavelet de-noising methods. The most popular methods of wavelet transform, adaptive thresholding, and musical noise suppression are analyzed theoretically and evaluated through Matlab simulation. Based on this work, a new speech enhancement system using adaptive wavelet de-noising is proposed. Each step of standard wavelet thresholding is improved by optimized adaptive algorithms. The quantile-based adaptive noise estimate and the a posteriori SNR-based threshold adjuster are complementary: their combination integrates the advantages of the two approaches and balances noise removal against speech preservation. To improve the final perceptual quality, an innovative musical noise analysis and smoothing algorithm and a Teager Energy Operator-based silent segment smoothing module are also introduced into the system. Experimental results demonstrate the capability of the proposed system in both stationary and non-stationary noise environments.
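To illustrate the wavelet-thresholding core that the proposed system refines, a minimal one-level Haar soft-threshold denoiser might look as follows. This is a sketch with a fixed threshold; the thesis uses adaptive, quantile- and SNR-driven thresholds and deeper decompositions.

```python
import numpy as np

def haar_denoise(x, threshold):
    """One-level Haar wavelet soft-threshold denoiser (len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-pass (approximation)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-pass (detail)
    # Soft thresholding shrinks small detail coefficients, which are
    # assumed to be dominated by noise, towards zero.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # Inverse one-level Haar transform.
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out
```

With threshold 0 the transform is perfectly invertible, which makes the reconstruction step easy to verify in isolation.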
Objective Assessment of Machine Learning Algorithms for Speech Enhancement in Hearing Aids
Speech enhancement in assistive hearing devices has been an area of research for many decades. Noise reduction is particularly challenging because of the wide variety of noise sources and the non-stationarity of speech and noise. Digital signal processing (DSP) algorithms deployed in modern hearing aids for noise reduction rely on certain assumptions about the statistical properties of undesired signals. These assumptions can hinder accurate estimation of different noise types, which in turn leads to suboptimal noise reduction. In this research, a relatively unexplored technique based on deep learning, a Recurrent Neural Network (RNN), is used to perform noise reduction and dereverberation to assist hearing-impaired listeners. For noise reduction, the performance of the deep learning model was evaluated objectively and compared with that of open Master Hearing Aid (openMHA), a conventional signal processing based framework, and a Deep Neural Network (DNN) based model. It was found that the RNN model suppresses noise and improves speech understanding better than the conventional hearing aid noise reduction algorithm and the DNN model. With proper training, the same RNN model was also shown to reduce reverberation components. A real-time implementation of the deep learning model is also discussed.
Current state of digital signal processing in myoelectric interfaces and related applications
This review discusses the critical issues and recommended practices from the perspective of myoelectric interfaces. The major benefits and challenges of myoelectric interfaces are evaluated. The article aims to fill gaps left by previous reviews and to identify avenues for future research. Recommendations are given, for example, for electrode placement, sampling rate, segmentation, and classifiers. Four groups of applications where myoelectric interfaces have been adopted are identified: assistive technology, rehabilitation technology, input devices, and silent speech interfaces. The state-of-the-art applications in each of these groups are presented.
An adaptive autoregressive pre-whitener for speech and acoustic signals based on parametric NMF
A common assumption in many speech and acoustic processing methods is that the noise is white Gaussian noise (WGN). Although this assumption yields simple and computationally attractive methods, it is often too crude in many applications. In this paper, we introduce a general-purpose, online pre-whitener that can be used as a pre-processor for methods based on the WGN assumption, improving their reliability and performance in applications with colored noise. The pre-whitener is a time-varying filter whose coefficients are found using a parametric non-negative matrix factorization (NMF), based on autoregressive (AR) mixture modeling of both the noise component and the signal component constituting the noisy signal. Compared with other types of pre-whiteners, the proposed pre-whitener shows the best performance, especially in applications with non-stationary noise. We also perform a large number of experiments to quantify the benefits of using a pre-whitener as a pre-processor for methods based on the WGN assumption. The applications of interest were pitch estimation and time-of-arrival (TOA) estimation, where the WGN assumption is very popular.
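The underlying idea of AR pre-whitening, stripped of the paper's parametric-NMF coefficient estimation, can be sketched with a fixed-order Yule-Walker fit. Note the assumptions: this toy version fits one time-invariant AR model to the whole signal, whereas the proposed pre-whitener is a time-varying filter with NMF-estimated coefficients.

```python
import numpy as np

def ar_prewhiten(x, order):
    """Whiten a zero-mean signal with an AR filter fitted via Yule-Walker.

    Returns the prediction-error (approximately white) signal and the
    estimated AR coefficients a[1..order].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([x[:n - k] @ x[k:] for k in range(order + 1)]) / n
    # Solve the Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Prediction error: e[n] = x[n] - sum_k a_k x[n-k].
    e = x[order:].copy()
    for k in range(1, order + 1):
        e -= a[k - 1] * x[order - k:n - k]
    return e, a
```

Feeding `e` instead of `x` into a WGN-based estimator is exactly the pre-processing role the paper studies.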
Smart helmet: wearable multichannel ECG & EEG
Modern wearable technologies have enabled continuous recording of vital signs; however, for activities such as cycling, motor racing, or military engagement, a helmet with embedded sensors would provide maximum convenience and the opportunity to monitor both the vital signs and the electroencephalogram (EEG) simultaneously. To this end, we investigate the feasibility of recording the electrocardiogram (ECG), respiration, and EEG from face-lead locations by embedding multiple electrodes within a standard helmet. The electrode positions are at the lower jaw, mastoids, and forehead, while for validation purposes a respiration belt around the thorax and a reference ECG from the chest serve as ground truth. The within-helmet EEG is verified by exposing the subjects to periodic visual and auditory stimuli and screening the recordings for the steady-state evoked potentials elicited in response. Cycling and walking are chosen as real-world activities to illustrate how to deal with the irregular motion artifacts that contaminate such recordings. We also propose a multivariate R-peak detection algorithm suitable for such noisy environments. Recordings in real-world scenarios support a proof of concept for the feasibility of recording vital signs and EEG from the proposed smart helmet.
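As a simplified illustration of energy-based R-peak detection, a single-lead toy baseline (not the multivariate algorithm proposed in the paper; the threshold factor and refractory period are assumptions) might look like this:

```python
import numpy as np

def detect_r_peaks(ecg, fs, refractory=0.25):
    """Toy single-lead R-peak detector: baseline removal, energy
    thresholding, local-maximum picking, and a refractory period."""
    x = np.asarray(ecg, dtype=float)
    k = int(0.2 * fs)  # ~200 ms moving average removes baseline wander
    x = x - np.convolve(x, np.ones(k) / k, mode="same")
    energy = x ** 2
    thresh = 4 * np.mean(energy)          # crude global threshold
    min_gap = int(refractory * fs)        # suppress double detections
    peaks, last = [], -min_gap
    for n in range(1, len(energy) - 1):
        is_max = energy[n] >= energy[n - 1] and energy[n] > energy[n + 1]
        if is_max and energy[n] > thresh and n - last >= min_gap:
            peaks.append(n)
            last = n
    return np.array(peaks)
```

A multivariate detector in the spirit of the paper would instead fuse evidence across several leads before thresholding, which is what makes it robust to motion artifacts on any single lead.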
A comparison of two auditory front-end models for horizontal localization of concurrent speakers in adverse acoustic scenarios
Ears are complex instruments that help humans understand what is happening around them. By using two ears, a person can focus attention on a specific sound source. The first auditory models appeared in the literature in the previous century; nowadays, new approaches extend those earlier findings. Extensive research has been carried out over the years, but many details of auditory processing remain unclear. In this thesis, two auditory models are analyzed and compared.
- …