Relating the fundamental frequency of speech with EEG using a dilated convolutional network
To investigate how speech is processed in the brain, we can model the
relation between features of a natural speech signal and the corresponding
recorded electroencephalogram (EEG). Typically, linear regression models are
used: either the EEG is predicted from the speech, or the speech is
reconstructed from the EEG, and the correlation between the predicted and
actual signal quantifies neural tracking. However, given the nonlinear nature of the brain, the
modeling ability of linear models is limited. Recent studies introduced
nonlinear models to relate the speech envelope to EEG. We set out to include
other features of speech that are not coded in the envelope, notably the
fundamental frequency of the voice (f0). F0 is a higher-frequency feature
primarily coded at the brainstem to midbrain level. We present a
dilated-convolutional model to provide evidence of neural tracking of the f0.
We show that a combination of f0 and the speech envelope improves the
performance of a state-of-the-art envelope-based model. This suggests the
dilated-convolutional model can extract non-redundant information from both f0
and the envelope. We also show the ability of the dilated-convolutional model
to generalize to subjects not included during training. This latter finding
will accelerate f0-based hearing diagnosis.
Comment: Accepted for Interspeech 202
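The paper's exact architecture is not given here, but the core mechanism of a dilated convolutional network, the exponential growth of temporal context with stacked dilated layers, can be sketched in a few lines. The kernel size and dilation factors below are illustrative choices, not the authors' model:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D convolution with a dilated kernel (zero-padded on the left)."""
    k = len(kernel)
    span = (k - 1) * dilation                 # samples spanned by the kernel
    xp = np.concatenate([np.zeros(span), np.asarray(x, dtype=float)])
    out = np.zeros(len(x))
    for i in range(len(x)):
        for j in range(k):
            out[i] += kernel[j] * xp[span + i - j * dilation]
    return out

def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated convolution layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Kernel size 3 with dilations 1, 2, 4, 8 already covers
# 1 + 2*(1 + 2 + 4 + 8) = 31 samples of context.
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Doubling the dilation at each layer is what lets such a model relate a slow feature like the envelope and a faster one like f0 to EEG over long windows without a correspondingly deep network.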
Exploring the effect of stimulus complexity, frequency and intonation on the FFR
Abstract:
Frequency following responses (FFRs) can be evoked by a wide range of stimuli. As a result, studies often use different stimuli, complicating the comparison of their results. Moreover, it is not clear which stimuli provide the largest response SNRs, which is important information in the context of clinical applications. To bring some clarity to this matter, we explored the parameter space of three important stimulus parameters and studied their effect on the SNR of the FFR in normal-hearing individuals. The first parameter is the complexity of the stimulus, ranging from simple modulated tones, through Klatt-synthesized vowels, to natural vowels. In the first case, we study the FFR in response to the modulation frequency; for the other two, we study the FFR to the fundamental frequency of the voice. Second, we compare response SNRs across different frequency ranges, i.e. around 100, 150 or 200 Hz. Third, we study how intonation, i.e. the direction of variation of the fundamental frequency, affects the response SNR. We considered three cases: upward, flat, or downward intonation. FFRs are measured with 64-channel EEG and processed with a Fourier Analyzer. Data collection for this study is ongoing and preliminary results will be presented at the conference.
Acknowledgements:
This research is funded by FWO (Research Foundation Flanders) within the framework of the TBM-project LUISTER (T002216N) and jointly by Cochlear Ltd. and Flanders Innovation & Entrepreneurship (formerly IWT), project 50432. Financial support was also provided by an SB PhD fellowship from FWO to Jana Van Canneyt.
Signal processing techniques to extract the frequency following response
Frequency following responses (FFRs) are auditory potentials that have the same periodicity as the evoking stimulus, which can range from simple tones to natural speech. They are often used to study auditory and language processing. Different techniques exist to analyze FFRs, but the performance of these methods has not yet been compared.
Objectives and methods
Our goal was to compare techniques to analyze the FFR. The methods we considered are:
1. FFT-based method (assumes steady frequency)
2. Cross-correlation with the exact frequency reference (only possible when it is available)
3. Cross-correlation with the reference estimated following Aiken et al. 2008
4. Cross-correlation with the fundamental waveform (Forte et al. 2017)
Since we expect performance to depend on stimulus characteristics, our comparison included different stimulus types (i.e. modulated noise, artificial vowels, natural vowels and words) and different frequency contours (steady, up or down).
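Methods 1 and 2 above can be sketched on synthetic data with deliberately simple implementations (peak spectral magnitude in an assumed frequency band, and normalized correlation with the exact reference); the Aiken- and Forte-style reference estimators are not reproduced here. The sketch also illustrates why a bin-based estimate can underestimate a response whose frequency varies:

```python
import numpy as np

def fft_bin_strength(x, fs, f_lo, f_hi):
    """Method 1 (FFT-based): largest spectral magnitude in [f_lo, f_hi].
    Assumes the response frequency is steady."""
    spec = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return float(spec[band].max())

def xcorr_strength(x, reference):
    """Method 2: normalized correlation with the exact frequency reference."""
    x = x - x.mean()
    r = reference - reference.mean()
    return float(np.dot(x, r) / (np.linalg.norm(x) * np.linalg.norm(r)))

fs = 1000.0
t = np.arange(2000) / fs
# A response whose frequency sweeps from 90 to 110 Hz (an "up" contour):
resp = np.sin(2 * np.pi * (90 * t + 10 * t**2 / t[-1]))
# The sweep smears energy over many FFT bins (a steady 100 Hz tone of the
# same amplitude would give a bin magnitude of 0.5), while correlation with
# the exact reference is unaffected for a noise-free response.
print(fft_bin_strength(resp, fs, 80, 120), xcorr_strength(resp, resp))
```

This is the bin-based underestimation for non-steady frequencies in miniature: the FFT method's single-bin peak drops well below the steady-tone value, while the exact-reference correlation does not.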
Conclusions
Preliminary results show that the FFT-based method sometimes underestimates response strength because of its bin-based approach. Cross-correlation with the exact reference performs best when that reference is available, but the Aiken-based method performs equally well, validating it as a way to estimate the reference. The fundamental-waveform method outperforms the Aiken method for word stimuli, but not for the other stimuli we studied, probably because it takes amplitude variation into account.
Acknowledgements:
This research is funded by FWO (Research Foundation Flanders) within the TBM-project LUISTER (T002216N) and jointly by Cochlear Ltd. and Flanders Innovation & Entrepreneurship (formerly IWT), project 50432. Financial support was also provided by an SB PhD fellowship from FWO to Jana Van Canneyt.
Auditory potentials to the fundamental frequency of the voice in cochlear implant listeners
Abstract:
Cochlear implant patients have considerable difficulty with voice gender recognition and intonation perception. Both these speech characteristics are strongly linked to the fundamental frequency of the voice.
In a first experiment, we studied how cochlear implant listeners perceive frequencies in the range of the fundamental frequency of the voice. We used EEG to measure auditory steady-state responses (ASSRs) to modulated pulse trains (bipolar, 900 pps) with modulation frequencies of 100, 200 and 300 Hz. We found that, for modulation frequencies above 100 Hz, ASSRs have very small amplitudes and often cannot be measured above the background noise. This contrasts with findings in normal-hearing subjects, where significant ASSRs can be found up to at least 450 Hz. A similar drop in performance above 100 Hz was found in studies of behavioral modulation detection in cochlear implant listeners. These findings indicate that cochlear implant listeners struggle to perceive amplitude modulations above 100 Hz, which may explain their difficulty interpreting the fundamental frequency of the voice.
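In the study these pulse trains were delivered electrically through the implant; purely as a numerical sketch of the stimulus idea (one-sample pulses and a hypothetical sampling rate, not the clinical stimulation setup), an amplitude-modulated pulse train could be generated as:

```python
import numpy as np

def modulated_pulse_train(fs, dur, pulse_rate, mod_freq, mod_depth=1.0):
    """Pulse train at `pulse_rate` pulses per second whose amplitude is
    sinusoidally modulated at `mod_freq` Hz (peak normalised to 1)."""
    n = int(fs * dur)
    t = np.arange(n) / fs
    train = np.zeros(n)
    idx = (np.arange(int(pulse_rate * dur)) * fs / pulse_rate).astype(int)
    train[idx] = 1.0                     # one-sample pulses
    envelope = 1.0 + mod_depth * np.sin(2 * np.pi * mod_freq * t)
    return train * envelope / (1.0 + mod_depth)

# 1 s of a 900 pps train modulated at 100 Hz, as in the first experiment.
fs = 44100.0
stim = modulated_pulse_train(fs, dur=1.0, pulse_rate=900, mod_freq=100)
```

The ASSR then follows the 100 Hz modulation of the pulse amplitudes rather than the 900 pps carrier, which is why the modulation frequency is the parameter varied across conditions.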
ASSRs have the limitation that they require the stimulus frequency to be steady, which is highly unnatural for the fundamental frequency of the voice. In a second experiment, we aim to study the auditory processing of the naturally varying fundamental frequency of the voice in cochlear implant patients with frequency following responses (FFRs). Before setting up experiments with cochlear implant listeners, we explored the effect of three important stimulus parameters in normal-hearing individuals. First, we studied how response SNR varies with the complexity of the stimulus, ranging from simple modulated tones, through Klatt-synthesized vowels, to natural vowels. Second, we compared response SNRs across different frequency regions, i.e. around 100, 150 or 200 Hz. Third, we studied how the direction of variation of the fundamental frequency, i.e. upward, flat, or downward, affects the response SNR. The results of this exploration will be presented at the conference.
Acknowledgements:
This research is funded by FWO (Research Foundation Flanders) within the framework of the TBM-project LUISTER (T002216N) and jointly by Cochlear Ltd. and Flanders Innovation & Entrepreneurship (formerly IWT), project 50432. Financial support was also provided by an SB PhD fellowship from FWO to Jana Van Canneyt.
From modulated noise to natural speech: The effect of stimulus parameters on the envelope following response
Envelope following responses (EFRs) can be evoked by a wide range of auditory stimuli, but for many stimulus parameters the effect on EFR strength is not fully understood. This complicates the comparison of earlier studies and the design of new ones, and the optimal stimulus parameters remain unknown. To help resolve this issue, we investigated the effects of four important stimulus parameters and their interactions on the EFR. Responses were measured in 16 normal-hearing subjects, evoked by stimuli with four levels of stimulus complexity (amplitude-modulated noise, artificial vowels, natural vowels and vowel-consonant-vowel combinations), three fundamental frequencies (105 Hz, 185 Hz and 245 Hz), three fundamental frequency contours (upward sweeping, downward sweeping and flat) and three vowel identities (Flemish /a:/, /u:/ and /i:/). We found that EFRs evoked by artificial vowels were on average 4-6 dB SNR larger than responses evoked by the other stimulus complexities, probably because of (unnaturally) strong higher harmonics. Moreover, response amplitude decreased with fundamental frequency, but response SNR remained largely unaffected. Third, fundamental frequency variation within the stimulus did not impact EFR strength, as long as the rate of change remained low (which was not the case for sweeping natural vowels). Finally, the vowel /i:/ appeared to evoke larger response amplitudes than /a:/ and /u:/, but statistical power was too low to confirm this. Vowel-dependent differences in response strength have been suggested to stem from destructive interference between response components. We show how a model of the auditory periphery can simulate these interference patterns and predict response strength. Altogether, the results of this study can guide stimulus choice for future EFR research and practical applications.
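The proposed interference mechanism can be illustrated with a toy phasor model: if the scalp EFR is the sum of several f0-rate components arriving with different latencies, their relative phases determine whether they add or cancel. The latency values below are illustrative only, not the auditory-periphery model used in the study:

```python
import numpy as np

def summed_response_amplitude(f0, delays_ms, weights=None):
    """Toy phasor model: the scalp response is a sum of f0-rate components
    arriving with different latencies; relative phase sets the interference."""
    delays = np.asarray(delays_ms, dtype=float) / 1000.0
    if weights is None:
        weights = np.ones_like(delays)
    phasors = weights * np.exp(-2j * np.pi * f0 * delays)
    return float(np.abs(phasors.sum()))

# Two equal components in phase add constructively...
print(summed_response_amplitude(100, [0.0, 0.0]))  # 2.0
# ...but cancel when the latency difference is half an f0 cycle (5 ms at 100 Hz).
print(summed_response_amplitude(100, [0.0, 5.0]))
```

Because different vowels excite different cochlear regions, and hence different component latencies, this simple picture is enough to show how response strength can become vowel dependent even when each component alone is robust.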