    When two vowels with different fundamental frequencies (F0s) are presented concurrently, listeners often hear two voices producing different vowels on different pitches. Parsing of this simultaneous speech can also be affected by the signal-to-noise ratio (SNR) in the auditory scene. The extraction and interaction of F0 and SNR cues may occur at multiple levels of the auditory system. The major aims of this dissertation are to elucidate the neural mechanisms and time course of concurrent speech perception in clean and degraded listening conditions, along with its behavioral correlates. In two complementary experiments, electrical brain activity (EEG) was recorded at cortical (EEG Study #1) and subcortical (FFR Study #2) levels while participants heard double-vowel stimuli whose F0s differed by zero or four semitones (STs), presented in either clean or noise-degraded (+5 dB SNR) conditions. Behaviorally, listeners were more accurate in identifying both vowels for larger F0 separations (i.e., 4 ST; with pitch cues), and this F0-benefit was more pronounced at more favorable SNRs. Time-frequency analysis of cortical EEG oscillations (i.e., brain rhythms) revealed a dynamic time course for concurrent speech processing that depended on both extrinsic (SNR) and intrinsic (pitch) acoustic factors. Early high-frequency activity reflected pre-perceptual encoding of acoustic features (~200 ms) and the quality (i.e., SNR) of the speech signal (~250-350 ms), whereas later-evolving low-frequency rhythms (~400-500 ms) reflected post-perceptual, cognitive operations that covaried with listening effort and task demands. Analysis of subcortical responses indicated that FFRs provided a high-fidelity representation of double-vowel stimuli and of the spectro-temporal nonlinear properties of the peripheral auditory system. 
FFR activity largely reflected the neural encoding of stimulus features (exogenous coding) rather than perceptual outcomes, but timbre (F1) could predict the speed in noise conditions. Taken together, the results of this dissertation suggest that subcortical auditory processing reflects mostly exogenous (acoustic) feature encoding, in stark contrast to cortical activity, which reflects perceptual and cognitive aspects of concurrent speech perception. By studying multiple brain indices underlying an identical task, these studies provide a more comprehensive window into the hierarchy of brain mechanisms and the time course of concurrent speech processing.
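
The two stimulus manipulations named in this abstract, a four-semitone F0 separation and a +5 dB SNR, can be made concrete with a short sketch. It is only an illustration under stated assumptions: the 100 Hz base F0, the 8 kHz sampling rate, the equal-amplitude harmonic complexes standing in for vowels (no formants), and all function names are hypothetical and are not taken from the dissertation.

```python
import math
import random


def shift_semitones(f0_hz, semitones):
    """Return the frequency lying `semitones` above f0_hz (equal temperament)."""
    return f0_hz * 2.0 ** (semitones / 12.0)


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power matches snr_db, then add it."""
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    return [s + n for s, n in zip(signal, scaled)], scaled


def harmonic_complex(f0_hz, n_harmonics, fs_hz, duration_s):
    """Crude vowel stand-in: equal-amplitude harmonics of f0 (assumption: no formants)."""
    n = int(fs_hz * duration_s)
    return [
        sum(math.sin(2 * math.pi * f0_hz * h * t / fs_hz)
            for h in range(1, n_harmonics + 1))
        for t in range(n)
    ]


FS = 8000                           # hypothetical sampling rate (Hz)
F0_A = 100.0                        # hypothetical F0 of the first "vowel"
F0_B = shift_semitones(F0_A, 4)     # second "vowel", four semitones higher

rng = random.Random(0)
vowel_a = harmonic_complex(F0_A, 5, FS, 0.1)
vowel_b = harmonic_complex(F0_B, 5, FS, 0.1)
double_vowel = [x + y for x, y in zip(vowel_a, vowel_b)]
noise = [rng.gauss(0.0, 1.0) for _ in double_vowel]

noisy, scaled_noise = mix_at_snr(double_vowel, noise, 5.0)
achieved_snr_db = 10 * math.log10(power(double_vowel) / power(scaled_noise))
print(round(F0_B, 2), round(achieved_snr_db, 2))
```

A four-semitone separation corresponds to a frequency ratio of 2^(4/12), roughly 1.26, so the two F0s in this toy mixture are about 100 Hz and 126 Hz.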

    Data-driven multivariate and multiscale methods for brain computer interface

    This thesis focuses on the development of data-driven multivariate and multiscale methods for brain computer interface (BCI) systems. The electroencephalogram (EEG), the most convenient means of measuring neurophysiological activity owing to its noninvasive nature, is the main signal considered. The nonlinearity and nonstationarity inherent in EEG, together with its multichannel recording nature, require a new set of data-driven multivariate techniques to estimate features more accurately for enhanced BCI operation. A further long-term goal is to enable an alternative EEG recording strategy for long-term and portable monitoring. Empirical mode decomposition (EMD) and local mean decomposition (LMD), both fully data-driven adaptive tools, are used to decompose the nonlinear and nonstationary EEG signal into a set of components that are highly localised in time and frequency. It is shown that the complex and multivariate extensions of EMD, which can exploit common oscillatory modes within multivariate (multichannel) data, can be used to accurately estimate and compare the amplitude and phase information among multiple sources, a key step in feature extraction for BCI systems. A complex extension of local mean decomposition is also introduced and its operation is illustrated on two-channel neuronal spike streams. Common spatial pattern (CSP), a standard feature extraction technique for BCI applications, is also extended to the complex domain using augmented complex statistics. Depending on the circularity or noncircularity of a complex signal, one of the complex CSP algorithms can be chosen to produce the best classification performance between two different EEG classes. Using these complex and multivariate algorithms, two cognitive brain studies are investigated for a more natural and intuitive design of advanced BCI systems. 
Firstly, a Yarbus-style auditory selective attention experiment is introduced to measure the user's attention to one sound source among a mixture of sound stimuli, with the aim of improving the usefulness of hearing instruments such as hearing aids. Secondly, emotional responses elicited by taste and taste recall are examined to determine the pleasure or displeasure associated with a food, for the implementation of affective computing. The separation between the two emotional responses is examined using real and complex-valued common spatial pattern methods. Finally, we introduce a novel approach to brain monitoring based on EEG recordings from within the ear canal, with electrodes embedded on a custom-made hearing aid earplug. The new platform promises the possibility of both short- and long-term continuous use for standard brain monitoring and interfacing applications.
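
The abstract above chooses between complex CSP variants according to the circularity of the signal. One common second-order measure from augmented complex statistics compares the pseudo-covariance E[z z] with the covariance E[z z*]. The sketch below shows that measure only; whether the thesis uses this particular test is an assumption, and the function name and toy sequences are hypothetical.

```python
def circularity_coefficient(z):
    """Return |pseudo-covariance| / covariance for a complex sequence.

    A value near 0 indicates a second-order circular (proper) signal;
    a value of 1 indicates a maximally noncircular signal.
    """
    n = len(z)
    mean = sum(z) / n
    centered = [v - mean for v in z]
    covariance = sum(abs(v) ** 2 for v in centered) / n      # E[z z*]
    pseudo_covariance = sum(v * v for v in centered) / n     # E[z z]
    return abs(pseudo_covariance) / covariance


# Toy sequences (hypothetical): equal power on both axes vs. purely real.
circular = [1, 1j, -1, -1j]
noncircular = [1, -1, 2, -2]
print(circularity_coefficient(circular), circularity_coefficient(noncircular))
```

In a two-channel setting, z could be formed as one channel plus j times the other; a coefficient well above zero would then suggest that an algorithm built on augmented statistics has extra second-order information to exploit.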

    Noise processing in the auditory system with applications in speech enhancement

    Abstract: The auditory system is extremely efficient in extracting auditory information in the presence of background noise. However, speech enhancement algorithms, aimed at removing the background noise from a degraded speech signal, are not achieving results that are near the efficacy of the auditory system. The purpose of this study is thus to first investigate how noise affects the spiking activity of neurons in the auditory system and then use the brain activity in the presence of noise to design better speech enhancement algorithms. In order to investigate how noise affects the spiking activity of neurons, we first design a generalized linear model that relates the spiking activity of neurons to intrinsic and extrinsic covariates that can affect their activity, such as noise. From this model, we extract two metrics, one that shows the effects of noise on the spiking activity and another the relative effects of vocalization compared to noise. We use these metrics to analyze neural data, recorded from a structure of the auditory system named the inferior colliculus (IC), while presenting noisy vocalizations. We studied the effect of different kinds of noises (non-stationary, white and natural stationary), different vocalizations, different input sound levels and signal-to-noise ratios (SNR). We found that the presence of non-stationary noise increases the spiking activity of neurons, regardless of the SNR, input level or vocalization type. The presence of white or natural stationary noises however causes a great diversity of responses where the activity of sites could increase, decrease or remain unchanged. This shows that the noise invariance previously reported in the IC depends on the noisy conditions, which had not been observed before. We then address the problem of speech enhancement using information from the brain's processing in the presence of noise. 
It has been shown before that the brain waves of a listener correlate strongly with the speech of the speaker to whom the listener attends. Given this, we design two speech enhancement algorithms with a denoising autoencoder structure, namely the Brain Enhanced Speech Denoiser (BESD) and the U-shaped Brain Enhanced Speech Denoiser (U-BESD). These algorithms take advantage of the attended auditory information present in the brain activity of the listener to denoise multi-talker speech. The U-BESD is built upon the BESD with the addition of skip connections and dilated convolutions. Compared to previously proposed approaches, BESD and U-BESD are trained in a single neural architecture, lowering the complexity of the algorithm. We investigate two experimental settings. In the first, referred to as the speaker-specific setting, the attended speaker is known; in the second, referred to as the speaker-independent setting, no prior information is available about the attended speaker. In the speaker-specific setting, we show that both the BESD and U-BESD algorithms surpass a similar denoising autoencoder. Moreover, we show that in the speaker-independent setting, U-BESD surpasses the performance of the only known approach that also uses the brain's activity.
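
The generalized linear model mentioned in this abstract relates spike counts to covariates such as the presence of noise. A minimal sketch of that idea is a Poisson GLM with a log link and a single binary "noise on" covariate, fitted by Newton-Raphson. The covariate coding, the toy spike counts, and the use of exp(b1) as a noise-effect summary are illustrative assumptions; they are not the model or the metrics defined in the thesis.

```python
import math


def fit_poisson_glm(x, y, n_iter=25):
    """Fit log(rate) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian) entries.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) applied to the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical spike counts per trial: 0 = vocalization alone, 1 = with noise.
noise_on = [0, 0, 0, 0, 1, 1, 1, 1]
spike_counts = [2, 3, 1, 2, 5, 6, 4, 5]

b0, b1 = fit_poisson_glm(noise_on, spike_counts)
rate_ratio = math.exp(b1)   # multiplicative change in firing rate when noise is present
print(round(math.exp(b0), 3), round(rate_ratio, 3))
```

With a single binary covariate the fit reproduces the two group means, so exp(b1) is simply the ratio of mean counts with and without noise. A fuller model would add further covariates, for example spiking history or sound level, as extra terms in the linear predictor.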

    Neuromorphic model for sound source segregation

    While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electroencephalography (EEG) experiments using both simple tone-based stimuli and more natural speech stimuli. Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixtures. Other methods develop models of the noise characteristics. Source segregation of simultaneous speech mixtures with a single microphone recording and no knowledge of the target speaker is still a challenge. Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features that belong to different sources to perform unsupervised monaural source segregation. While it requires no prior information about the target speaker, this method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments, we collect neurological evidence to support the principle behind the model. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and of its psychophysical manifestations in navigating complex sensory environments. 
Results from EEG experiments provide further insights into the assumptions behind the model and provide motivation for future single-unit studies that can provide more direct evidence for the principle of temporal coherence.

    Augmented Unreality: Synesthetic Artworks & Audio-Visual Hallucinations

    During ‘altered states of consciousness’ (ASCs), such as those produced by psychedelic drugs, an individual may experience substantial changes to mood, thoughts and perception, and have subjective experiences of visual or auditory hallucinations. In his discussion of the AIM (Activation, Input, Modulation) model of consciousness, Hobson (2003, 44–46) distinguishes the imagery of dreams and hallucinations as ‘internal’ sensory inputs, in contrast with the ‘external’ inputs that are received via the senses from the surrounding environment during normal waking consciousness. For the purposes of this chapter, external inputs correspond with physical ‘reality,’ while the internal inputs generated by the brain during dreams or hallucinations shall be considered as ‘unreality.’ Reproduced by permission of Oxford University Press.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) the argument that objects may be stored in and retrieved from a pre-attentional store during this task gains further weight.

    How musical rhythms entrain the human brain: clarifying the neural mechanisms of sensory-motor entrainment to rhythms

    When listening to music, people across cultures tend to spontaneously perceive and move the body along a periodic pulse-like meter. Increasing evidence suggests that this ability is supported by neural mechanisms that selectively amplify periodicities corresponding to the perceived metric pulses. However, the nature of these neural mechanisms, i.e., the endogenous or exogenous factors that may selectively enhance meter periodicities in brain responses to rhythm, remains largely unknown. This question was investigated in a series of studies in which the electroencephalogram (EEG) of healthy participants was recorded while they listened to musical rhythm. From this EEG, selective contrast at meter periodicities in the elicited neural activity was captured using frequency-tagging, a method allowing direct comparison of this contrast between the sensory input, EEG response, biologically-plausible models of auditory subcortical processing, and behavioral output. The results show that the selective amplification of meter periodicities is shaped by a continuously updated combination of factors including sound spectral content, long-term training and recent context, irrespective of attentional focus and beyond auditory subcortical nonlinear processing. Together, these observations demonstrate that perception of rhythm involves a number of processes that transform the sensory input via fixed low-level nonlinearities, but also through flexible mappings shaped by prior experience at different timescales. These higher-level neural mechanisms could represent a neurobiological basis for the remarkable flexibility and stability of meter perception relative to the acoustic input, which is commonly observed within and across individuals. 
Fundamentally, the current results add to the evidence that evolution has endowed the human brain with an extraordinary capacity to organize, transform, and interact with rhythmic signals, to achieve adaptive behavior in a complex dynamic environment.
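
Frequency-tagging, as used in this abstract, reads out response amplitude at a known set of rhythm-related frequencies and asks how much the meter-related ones stand out. The sketch below illustrates the principle on a synthetic signal. The 1.25 Hz meter frequency, the amplitudes, the 8 s duration, and the z-score summary are assumptions chosen for illustration and are not the analysis pipeline of the thesis.

```python
import cmath
import math


def amplitude_at(signal, fs_hz, freq_hz):
    """Single-frequency DFT: peak amplitude of the sinusoidal component at freq_hz."""
    n = len(signal)
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * k / fs_hz)
              for k, v in enumerate(signal))
    return 2.0 * abs(acc) / n


def meter_zscore(amplitudes, meter_freqs):
    """Mean z-score of the meter-related frequencies among all tagged frequencies."""
    values = list(amplitudes.values())
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return sum((amplitudes[f] - mean) / sd for f in meter_freqs) / len(meter_freqs)


FS = 100.0                                     # hypothetical sampling rate (Hz)
DURATION_S = 8.0                               # whole number of cycles for every tagged frequency
TAGGED = {1.25: 1.0, 2.5: 0.3, 3.75: 0.3, 5.0: 0.3}   # frequency -> synthetic amplitude
METER = [1.25]                                 # assumed meter-related frequency

n_samples = int(FS * DURATION_S)
response = [
    sum(a * math.sin(2 * math.pi * f * k / FS) for f, a in TAGGED.items())
    for k in range(n_samples)
]

amps = {f: amplitude_at(response, FS, f) for f in TAGGED}
z_meter = meter_zscore(amps, METER)
print({f: round(a, 3) for f, a in amps.items()}, round(z_meter, 3))
```

Because the same summary can be computed for the sound envelope, a model output, and the EEG, comparing the resulting values across those signals is what allows a selective enhancement of meter frequencies to be attributed to one stage of processing rather than another.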

    Electrophysiological assessment of audiovisual integration in speech perception

