18 research outputs found

    Neuromorphic model for sound source segregation

    While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electroencephalography (EEG) experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixtures. Other methods develop models of the noise characteristics. Source segregation of simultaneous speech mixtures with a single microphone recording and no knowledge of the target speaker is still a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features that belong to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, this method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments, we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and of its psychophysical manifestations in navigating complex sensory environments. 
Results from the EEG experiments provide further insights into the assumptions behind the model and provide motivation for future single-unit studies that can provide more direct evidence for the principle of temporal coherence.
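The temporal coherence idea described in this abstract can be illustrated with a toy sketch. This is not the dissertation's model: it only assumes that feature channels driven by the same source share a temporal envelope, and groups channels by their correlation with a chosen anchor channel. The function names, the sinusoidal envelopes, and the 0.5 threshold are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def coherent_channels(channels, anchor, threshold=0.5):
    """Indices of channels whose time course correlates with the anchor channel."""
    ref = channels[anchor]
    return [i for i, ch in enumerate(channels) if pearson(ref, ch) > threshold]

# Two hypothetical sources with different temporal envelopes (3 vs 5 cycles).
T = 200
env_a = [1.0 + math.sin(2 * math.pi * 3 * t / T) for t in range(T)]
env_b = [1.0 + math.sin(2 * math.pi * 5 * t / T) for t in range(T)]

# Four feature channels: channels 0 and 2 follow source A, 1 and 3 follow source B.
channels = [
    [1.0 * v for v in env_a],
    [0.8 * v for v in env_b],
    [0.5 * v for v in env_a],
    [1.2 * v for v in env_b],
]

stream_a = coherent_channels(channels, anchor=0)  # channels coherent with channel 0
stream_b = coherent_channels(channels, anchor=1)  # channels coherent with channel 1
```

In this toy setting the grouping needs no prior knowledge of either source; choosing the anchor is the step where knowledge about a target speaker could be introduced.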

    Modified cyclic shift tree denoising technique with fewer number of sweep for wave V detection

    Nowadays, in developing countries, Newborn Hearing Screening (NHS) has become one of the most important recommendations in modern pediatric audiology because of the importance of early detection in newborns, as the first six months of life are the critical period for learning to communicate. The Auditory Brainstem Response (ABR) is an electrophysiological response in the electroencephalogram generated in the brainstem in response to an acoustic stimulus. The conventional method is accurate but time-consuming, especially in the presence of noise interference. The objective of this research is to reduce screening time by implementing an enhanced signal processing method and to reduce the influence of noise interference. This thesis applies the Wavelet Kalman Filter (WKF), Cyclic Shift Tree Denoising (CSTD), and Modified Cyclic Shift Tree Denoising (MCSTD) to overcome these problems. MCSTD is a modification of CSTD that combines wavelets, the Kalman filter (KF), and CSTD. The modified approach was compared with averaging, WKF, and CSTD to identify an effective wavelet method for denoising that gives rapid and accurate extraction of ABRs. Results show that MCSTD outperforms the other methods, giving the highest SNR value, and can detect wave V with the number of sweeps reduced to 512 for chirp stimuli and 1024 for click stimuli.
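The averaging baseline mentioned in this abstract, and why the number of sweeps matters, can be sketched as follows. This is a minimal simulation under stated assumptions, not the thesis's pipeline: the Gaussian-bump "response", the noise level, and all function names are hypothetical, and the noise is assumed independent across sweeps (in which case averaging N sweeps is expected to improve SNR by roughly 10·log10(N) dB).

```python
import math
import random

def average_sweeps(sweeps):
    """Point-by-point mean across sweeps (each sweep is a list of samples)."""
    n = len(sweeps)
    return [sum(samples) / n for samples in zip(*sweeps)]

def snr_db(estimate, template):
    """SNR in dB of an estimate relative to a known clean template."""
    signal_power = sum(s * s for s in template) / len(template)
    noise_power = sum((e - s) ** 2 for e, s in zip(estimate, template)) / len(template)
    return 10 * math.log10(signal_power / noise_power)

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical clean response: a single Gaussian bump standing in for wave V.
template = [0.5 * math.exp(-((t - 120) / 10.0) ** 2) for t in range(200)]

def record_sweeps(count, noise_std=1.0):
    """Simulate `count` sweeps of the template buried in Gaussian noise."""
    return [[s + rng.gauss(0.0, noise_std) for s in template] for _ in range(count)]

snr_few = snr_db(average_sweeps(record_sweeps(16)), template)
snr_many = snr_db(average_sweeps(record_sweeps(512)), template)
```

Comparing `snr_few` with `snr_many` shows the cost of plain averaging: each further gain in SNR requires many more sweeps, which is the screening-time problem the denoising methods in the abstract aim to reduce.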


    Human listeners can reliably recognize speech in complex listening environments. The underlying neural mechanisms, however, remain unclear and cannot yet be emulated by any artificial system. In this dissertation, we study how speech is represented in the human auditory cortex and how the neural representation contributes to reliable speech recognition. Cortical activity from normal hearing human subjects is noninvasively recorded using magnetoencephalography, during natural speech listening. It is first demonstrated that neural activity from auditory cortex is precisely synchronized to the slow temporal modulations of speech, when the speech signal is presented in a quiet listening environment. How this neural representation is affected by acoustic interference is then investigated. Acoustic interference degrades speech perception via two mechanisms, informational masking and energetic masking, which are addressed respectively by using a competing speech stream and a stationary noise as the interfering sound. When two speech streams are presented simultaneously, cortical activity is predominantly synchronized to the speech stream the listener attends to, even if the unattended, competing speech stream is 8 dB more intense. When speech is presented together with spectrally matched stationary noise, cortical activity remains precisely synchronized to the temporal modulations of speech until the noise is 9 dB more intense. Critically, the accuracy of neural synchronization to speech predicts how well individual listeners can understand speech in noise. Further analysis reveals that two neural sources contribute to speech synchronized cortical activity, one with a shorter response latency of about 50 ms and the other with a longer response latency of about 100 ms. 
The longer-latency component, but not the shorter-latency component, shows selectivity to the attended speech and invariance to background noise, indicating a transition in auditory cortex from encoding the acoustic scene to encoding the behaviorally important auditory object. Taken together, we have demonstrated that during natural speech comprehension, neural activity in the human auditory cortex is precisely synchronized to the slow temporal modulations of speech. This neural synchronization is robust to acoustic interference, whether speech or noise, and therefore provides a strong candidate for the neural basis of acoustic-background-invariant speech recognition.


    A key aspect of human auditory cognition is establishing efficient and reliable representations of the acoustic environment, especially at the level of auditory cortex. Since the inception of encoding models that relate sound to neural response, three longstanding questions have remained open. The first concerns the apparently insurmountable problem of fundamental changes to cortical responses depending on certain categories of sound (e.g. simple tones versus environmental sound). The second is how to integrate inner or subjective perceptual experiences into sound encoding models, given that these models presuppose existing, direct physical stimulation, which is sometimes missing. The third is how context and learning fine-tune these encoding rules, as adaptive changes that compensate for impoverished conditions, which is particularly important for communication sounds. In this series, each question is addressed by analysis of mappings from sound stimuli delivered to and/or perceived by a listener, to large-scale cortically sourced response time series from magnetoencephalography. It is first shown that the divergent, categorical modes of sensory coding may be unified by exploring acoustic representations other than the traditional spectrogram, such as temporal transient maps. Encoding models for artificial random tones, music, and speech stimulus classes were substantially matched in their structure when represented from acoustic energy increases, consistent with the existence of a domain-general common baseline processing stage. Separately, the matter of the perceptual experience of sound via cortical responses is addressed via stereotyped rhythmic patterns that normally entrain cortical responses with equal periodicity. 
Here, it is shown that under conditions of perceptual restoration, namely cases where a listener nonetheless reports hearing a specific sound pattern in the midst of noise, one may access such endogenous representations in the form of evoked cortical oscillations at the same rhythmic rate. Finally, with regard to natural speech, it is shown that extensive prior experience from repeated listening to the same sentence materials may facilitate the ability to reconstruct the original stimulus even where noise replaces it, and may also expedite normal cortical processing times in listeners. Overall, the findings demonstrate cases in which sensory and perceptual coding approaches jointly continue to expand the enquiry into listeners’ personal experience of the communication-rich soundscape.

    Optimized Biosignals Processing Algorithms for New Designs of Human Machine Interfaces on Parallel Ultra-Low Power Architectures

    The aim of this dissertation is to explore Human Machine Interfaces (HMIs) in a variety of biomedical scenarios. The research addresses typical challenges in wearable and implantable devices for diagnostic, monitoring, and prosthetic purposes, suggesting a methodology for tailoring such applications to cutting-edge embedded architectures. The main challenge is the enhancement of high-level applications, including Machine Learning (ML) algorithms, using parallel programming and specialized hardware to improve performance. The majority of these algorithms are computationally intensive, posing significant challenges for deployment on embedded devices, which have several limitations in terms of memory size, maximum operating frequency, and battery life. The proposed solutions take advantage of a Parallel Ultra-Low Power (PULP) architecture, tailoring the processing to specific target architectures, heavily optimizing the execution, and exploiting software and hardware resources. The thesis starts by describing a methodology that can be considered a guideline for efficiently implementing algorithms on embedded architectures. This is followed by several case studies in the biomedical field, starting with the analysis of a Hand Gesture Recognition application based on the Hyperdimensional Computing algorithm, which allows fast on-chip re-training, and a comparison with the state-of-the-art Support Vector Machine (SVM); a Brain-Computer Interface (BCI) that detects the response of the brain to a visual stimulus follows. Furthermore, a seizure detection application is presented, exploring different solutions for the dimensionality reduction of the input signals. The last part is dedicated to an exploration of typical modules for the development of optimized ECG-based applications.


    Older adults frequently report that they can hear what they have been told but cannot understand the meaning. This is particularly true in noisy conditions, where the additional challenge of suppressing irrelevant noise (i.e. a competing talker) adds another layer of difficulty to their speech understanding. Hearing aids improve speech perception in quiet, but their success in noisy environments has been modest, suggesting that peripheral hearing loss may not be the only factor in older adults’ perceptual difficulties. Recent animal studies have shown that auditory synapses and cells undergo significant age-related changes that could impact the integrity of temporal processing in the central auditory system. Psychoacoustic studies carried out in humans have also shown that hearing loss can explain the decline in older adults’ performance in quiet compared to younger adults, but these psychoacoustic measurements are not accurate in describing auditory deficits in noisy conditions. These results suggest that temporal auditory processing deficits could play an important role in explaining the reduced ability of older adults to process speech in noisy environments. The goals of this dissertation were to understand how age affects neural auditory mechanisms and at which level in the auditory system these changes are particularly relevant for explaining speech-in-noise problems. Specifically, we used non-invasive neuroimaging techniques to tap into the midbrain and the cortex in order to analyze how auditory stimuli are processed in younger (our standard) and older adults. We will also attempt to investigate a possible interaction between processing carried out in the midbrain and cortex.

    Optimizing Common Spatial Pattern for a Motor Imagery-based BCI by Eigenvector Filteration

    One of the fundamental criteria for the successful application of a brain-computer interface (BCI) system is the extraction of significant features that capture invariant characteristics specific to each brain state. Distinct features play an important role in enabling a computer to associate different electroencephalogram (EEG) signals with different brain states. To ease the workload on the feature extractor and enhance separability between different brain states, the data are often transformed or filtered before feature extraction. The common spatial patterns (CSP) approach can achieve this by linearly projecting the multichannel EEG data into a surrogate data space through a weighted summation of the appropriate channels. However, the choice of spatial filters strongly affects the projection of the data and has a direct impact on classification. This paper presents an optimized pattern selection method for the CSP filter that improves classification accuracy. Based on the hypothesis that values close to zero in the CSP filter introduce noise rather than useful information, the CSP filter is modified by analyzing it and removing the degradative or insignificant values. This hypothesis is tested by comparing the BCI results of eight subjects using the conventional CSP filters and the optimized CSP filter. In the majority of cases, the latter produces better performance in terms of overall classification accuracy.
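The filter-pruning hypothesis in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the paper's method: the abstract does not say how "close to zero" is decided, so the relative threshold below, the example weights, and the function names are all hypothetical. The CSP filter itself is taken as given (one weight per EEG channel).

```python
def prune_filter(weights, rel_threshold=0.1):
    """Zero out weights whose magnitude is below rel_threshold * max |weight|."""
    peak = max(abs(w) for w in weights)
    cutoff = rel_threshold * peak
    return [w if abs(w) >= cutoff else 0.0 for w in weights]

def apply_filter(weights, sample):
    """Project one multichannel EEG sample onto a spatial filter (weighted sum)."""
    return sum(w * x for w, x in zip(weights, sample))

# Hypothetical CSP filter over five channels and one hypothetical EEG sample.
csp_filter = [0.82, -0.03, 0.01, -0.56, 0.04]
sample = [1.0, 2.0, -1.0, 0.5, 3.0]

pruned = prune_filter(csp_filter, rel_threshold=0.1)
projected_full = apply_filter(csp_filter, sample)
projected_pruned = apply_filter(pruned, sample)
```

With these example weights, only the two dominant channels contribute to `projected_pruned`; whether that helps classification is the empirical question the paper tests on eight subjects.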

    Speech Processes for Brain-Computer Interfaces

    Speech interfaces have become widely used and are integrated in many applications and devices. However, speech interfaces require the user to produce intelligible speech, which might be hindered by loud environments, concern about bothering bystanders, or a general inability to produce speech due to disabilities. Decoding a user's imagined speech instead of actual speech would solve this problem. Such a Brain-Computer Interface (BCI) based on imagined speech would enable fast and natural communication without the need to actually speak out loud. These interfaces could provide a voice to otherwise mute people. This dissertation investigates BCIs based on speech processes using functional Near Infrared Spectroscopy (fNIRS) and Electrocorticography (ECoG), two brain activity imaging modalities on opposing ends of an invasiveness scale. Brain activity data have a low signal-to-noise ratio and complex spatio-temporal and spectral coherence. To analyze these data, techniques from the areas of machine learning, neuroscience, and Automatic Speech Recognition are combined in this dissertation to facilitate robust classification of detailed speech processes while simultaneously illustrating the underlying neural processes. fNIRS is an imaging modality based on cerebral blood flow. It only requires affordable hardware and can be set up within minutes in a day-to-day environment. Therefore, it is ideally suited for convenient user interfaces. However, the hemodynamic processes measured by fNIRS are slow in nature, and the technology therefore offers poor temporal resolution. We investigate speech in fNIRS and demonstrate classification of speech processes for BCIs based on fNIRS. ECoG provides ideal signal properties by invasively measuring electrical potentials artifact-free directly on the brain surface. 
High spatial resolution and temporal resolution down to millisecond sampling provide localized information with timing accurate enough to capture the fast processes underlying speech production. This dissertation presents the Brain-to-Text system, which harnesses automatic speech recognition technology to decode a textual representation of continuous speech from ECoG. This could allow users to compose messages or to issue commands through a BCI. While the decoding of a textual representation is unparalleled for device control and typing, direct communication would be even more natural if the full expressive power of speech, including emphasis and prosody, could be provided. For this purpose, a second system is presented, which directly synthesizes neural signals into audible speech and could enable conversation with friends and family through a BCI. Up to now, both the Brain-to-Text system and the synthesis system operate on audibly produced speech. To bridge the gap to the final frontier of neural prostheses based on imagined speech processes, we investigate the differences between audibly produced and imagined speech and present first results towards BCIs based on imagined speech processes. This dissertation demonstrates the usage of speech processes as a paradigm for BCI for the first time. Speech processes offer a fast and natural interaction paradigm which will help patients and healthy users alike to communicate efficiently with computers and with friends and family through BCIs.