
    A Subband-Based SVM Front-End for Robust ASR

    This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting appropriate SVM kernels for classification in frequency subbands and of combining the individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband-based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering at signal-to-noise ratios (SNRs) below 12 dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front-ends across the full range of noise levels.
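    The subband-ensemble idea can be sketched in a few lines: split each waveform into frequency subbands, train one SVM per band, and combine the bands by averaging their decision scores. The sketch below is an illustration under assumed choices (FFT band masks, an RBF kernel, four bands, a toy two-class dataset); it is not the paper's actual kernel selection or TIMIT setup.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    def subband_split(waveforms, n_bands=4):
        """Split each waveform into frequency subbands via FFT band masks."""
        spec = np.fft.rfft(waveforms, axis=1)
        edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
        bands = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            masked = np.zeros_like(spec)
            masked[:, lo:hi] = spec[:, lo:hi]
            bands.append(np.fft.irfft(masked, n=waveforms.shape[1], axis=1))
        return bands

    # toy data: two "phoneme" classes as noisy low/high-frequency tones
    rng = np.random.default_rng(0)
    t = np.arange(256) / 256.0
    X = np.vstack([np.sin(2 * np.pi * 8 * t) + 0.3 * rng.standard_normal((50, 256)),
                   np.sin(2 * np.pi * 60 * t) + 0.3 * rng.standard_normal((50, 256))])
    y = np.array([0] * 50 + [1] * 50)

    # one SVM per subband; the ensemble averages the per-band decision scores
    train_bands = subband_split(X)
    clfs = [SVC(kernel="rbf", gamma="scale").fit(Xb, y) for Xb in train_bands]
    scores = np.mean([clf.decision_function(Xb)
                      for clf, Xb in zip(clfs, train_bands)], axis=0)
    pred = (scores > 0).astype(int)
    print("ensemble training accuracy:", (pred == y).mean())
    ```

    Averaging decision scores is only one of several ensemble combination rules; weighted voting or stacking over the per-band classifiers fits the same skeleton.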

    Source Separation for Hearing Aid Applications


    Linear and nonlinear adaptive filtering and their applications to speech intelligibility enhancement


    Rapid Adaptive Plasticity in Auditory Cortex

    Navigating the acoustic environment entails actively listening for different sound sources, extracting signal from a background of noise, identifying the salient features of a signal, and determining which parts of it are relevant. Humans and animals in natural environments perform such acoustic tasks routinely and have to adapt in real time to changes in the environment and in the features of the acoustic signals surrounding them. Rapid plasticity has been reported as a possible mechanism underlying the ability to perform these tasks. Previous studies report that neurons in primary auditory cortex (A1) undergo changes in spectro-temporal tuning that enhance the discriminability between different sound classes, modulating their tuning to enhance the task-relevant feature. This thesis investigates rapid task-related plasticity in two distinct directions: first, I investigate the effect of manipulating task difficulty on this type of plasticity; second, I expand the investigation of rapid plasticity into higher-order auditory areas. With increasing task difficulty, the response of A1 neurons is altered to increasingly suppress the representation of the noise while enhancing the representation of the signal. Compared with A1, neurons in secondary auditory cortex (PEG) further enhance the discriminability of the sound classes through an even greater enhancement of the target response. Taken together, these results indicate that adaptive neural plasticity is a plausible mechanism underlying the performance of novel auditory behaviors in real time, and they provide insights into the development of behaviorally significant representations of sound in auditory cortex.

    Single-Microphone Speech Enhancement Inspired by Auditory System

    Enhancing the quality of speech in noisy environments has been an active area of research due to the abundance of applications dealing with the human voice and the dependence of their performance on this quality. While original approaches in the field mostly addressed this problem in a purely statistical framework, in which the goal was to estimate speech from its sum with other independent processes (noise), during the last decade the attention of the scientific community has turned to the functionality of the human auditory system. Much effort has been put into bridging the gap between the performance of speech processing algorithms and that of the average human by borrowing models suggested for sound processing in the auditory system. In this thesis, we introduce algorithms for speech enhancement inspired by two of these models: the cortical representation of sounds and the hypothesized role of temporal coherence in auditory scene analysis. After an introduction to the auditory system and the speech enhancement framework, we first show how traditional speech enhancement techniques such as Wiener filtering can benefit at the feature extraction level from the discriminatory capabilities of the spectro-temporal representation of sounds in the cortex, i.e. the cortical model. We next focus on feature processing, as opposed to the extraction stage, in speech enhancement systems, taking advantage of models hypothesized for human attention in sound segregation. We demonstrate a mask-based enhancement method in which the temporal coherence of features is used as a criterion to elicit information about their sources and, more specifically, to form the masks needed to suppress the noise. Lastly, we explore how the two blocks for feature extraction and manipulation can be merged into one in a manner consistent with our knowledge of the auditory system, through the use of regularized non-negative matrix factorization to optimize the feature extraction while simultaneously accounting for temporal dynamics to separate noise from speech.
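    A generic semi-supervised NMF noise-suppression scheme gives a feel for the last idea: learn noise bases from noise-only frames, fit the noisy spectrogram with those bases held fixed, and build a soft Wiener-like mask from the speech part of the factorization. This is a plain multiplicative-update sketch on toy data, not the thesis's regularized, temporally-aware formulation; all dimensions and the toy "spectrogram" are assumptions.

    ```python
    import numpy as np

    def nmf(V, r, W_fixed=None, n_iter=200, eps=1e-9, rng=None):
        """Multiplicative-update NMF: V ~ W @ H. Optionally keep leading W columns fixed."""
        if rng is None:
            rng = np.random.default_rng(0)
        n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
        W = rng.random((V.shape[0], r))
        H = rng.random((r, V.shape[1]))
        if W_fixed is not None:
            W[:, :n_fixed] = W_fixed
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
            if W_fixed is not None:
                W[:, :n_fixed] = W_fixed      # noise bases stay fixed
        return W, H

    # toy magnitude "spectrogram": a speech-like band of peaks plus broadband noise
    rng = np.random.default_rng(1)
    F, T = 64, 100
    noise = np.abs(rng.standard_normal((F, T))) * 0.2
    speech = np.zeros((F, T))
    speech[10:14, 20:80] = 1.0
    V = speech + noise

    # 1) learn noise bases from noise-only frames, 2) fit the mixture with them fixed
    Wn, _ = nmf(noise, r=4)
    W, H = nmf(V, r=8, W_fixed=Wn)
    V_speech = W[:, 4:] @ H[4:, :]            # speech part of the factorization
    mask = V_speech / (W @ H + 1e-9)          # soft (Wiener-like) mask in [0, 1]
    estimate = mask * V
    ```

    The mask here plays the role of the coherence-derived masks in the thesis: it scales each time-frequency bin by the fraction of modeled energy attributed to speech.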

    Towards understanding the role of central processing in release from masking

    People with normal hearing have the ability to listen to a desired target sound while filtering out unwanted sounds in the background. However, most patients with hearing impairment struggle in noisy environments, a perceptual deficit which current hearing aids and cochlear implants cannot resolve. Even though peripheral dysfunction of the ears undoubtedly contributes to this deficit, mounting evidence has implicated central processing in the inability to detect sounds in background noise. Therefore, it is essential to better understand the underlying neural mechanisms by which target sounds are dissociated from competing maskers. This research focuses on two phenomena that help suppress background sounds: 1) dip-listening, and 2) directional hearing. When background noise fluctuates slowly over time, both humans and animals can listen in the dips of the noise envelope to detect target sound, a phenomenon referred to as dip-listening. Detection of target sound is facilitated by a central neuronal mechanism called envelope locking suppression. At both positive and negative signal-to-noise ratios (SNRs), the presence of target energy can suppress the strength by which neurons in auditory cortex track background sound, at least in anesthetized animals. However, in humans and animals, most of the perceptual advantage gained by listening in the dips of fluctuating noise emerges when a target is softer than the background sound. This raises the possibility that SNR shapes the reliance on different processing strategies, a hypothesis tested here in awake behaving animals. Neural activity of Mongolian gerbils is measured by chronic implantation of silicon probes in the core auditory cortex. Using appetitive conditioning, gerbils detect target tones in the presence of temporally fluctuating amplitude-modulated background noise, called the masker. Using rate- vs. timing-based decoding strategies, analysis of single-unit activity shows that both mechanisms can be used for detecting tones at positive SNR. However, only temporal decoding provides an SNR-invariant readout strategy that is viable at both positive and negative SNRs. In addition to dip-listening, spatial cues can facilitate the dissociation of target sounds from background noise. Specifically, an important cue for computing sound direction is the time difference in arrival of acoustic energy reaching each ear, called the interaural time difference (ITD). ITDs allow localization of low-frequency sounds from left to right inside the listener's head, also called sound lateralization. Models of sound localization commonly assume that sound lateralization from interaural time differences is level invariant. Here, two prevalent theories of sound localization are observed to make opposing predictions. The labelled-line model encodes location through tuned representations of spatial location and predicts that perceived direction is level invariant. In contrast, the hemispheric-difference model encodes location through spike rate and predicts that perceived direction becomes medially biased at low sound levels. In this research, through behavioral experiments on sound lateralization, the computation of sound location with ITDs is tested. Four groups of normally hearing listeners lateralize sounds based on ITDs as a function of sound intensity, exposure hemisphere, and stimulus history. Stimuli consist of low-frequency band-limited white noise. Statistical analysis, which partials out overall differences between listeners, is inconsistent with the place-coding scheme of sound localization, and supports the hypothesis that human sound localization is instead encoded through a population rate code.
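    The ITD cue itself is easy to make concrete: delay one ear's copy of a band-limited noise and recover the delay from the cross-correlation peak. The sampling rate, delay, and band edges below are hypothetical values chosen for illustration, not the study's stimulus parameters.

    ```python
    import numpy as np

    fs = 44100
    itd_samples = 15                      # ~0.34 ms at 44.1 kHz, a lateral source
    rng = np.random.default_rng(2)

    # band-limited white noise, as in the lateralization stimuli
    noise = rng.standard_normal(4096)
    spec = np.fft.rfft(noise)
    spec[:20] = 0                         # crude band-pass: zero out the band edges
    spec[400:] = 0
    sig = np.fft.irfft(spec)

    left = sig
    right = np.roll(sig, itd_samples)     # simulate a delayed right-ear signal

    # the cross-correlation peak over candidate lags recovers the ITD
    lags = np.arange(-40, 41)
    xc = [np.dot(left, np.roll(right, -k)) for k in lags]
    est = lags[int(np.argmax(xc))]
    print("estimated ITD (samples):", est)
    ```

    A binaural model in the Jeffress (labelled-line) tradition reads the lag axis as a place code, whereas a hemispheric-difference model compares pooled rates across two broad channels; the raw cross-correlation above is the common front end to both.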

    The effect of uncertainty in MEG-to-MRI coregistrations on MEG inverse problems

    For high precision in source estimates of magnetoencephalography (MEG) data, high accuracy of the coregistration of sources and sensors is mandatory. Usually, the source space is derived from magnetic resonance imaging (MRI). Sensor-to-MRI coregistrations are the focus of this thesis. The quality of coregistrations is assessed and the effect of their uncertainties on source estimates is analyzed. Both topics, the quality assessment and the propagation of uncertainties to source estimates, are treated separately. In this thesis, the target registration error (TRE) is proposed as a criterion for the quality of sensor-to-MRI coregistrations. TRE measures the effect of uncertainty in coregistrations at all points of interest. In total, 5544 data sets with sensor-to-head coregistrations and 128 with head-to-MRI coregistrations, all from a single MEG laboratory, were analyzed. An adaptive Metropolis algorithm was used to estimate the optimal coregistration and to sample the coregistration parameters (rotation and translation). I found an average TRE between 1.3 and 2.3 mm at the head surface. A mean absolute difference in coregistration parameters between the Metropolis and iterative closest point algorithms of (1.9 ± 1.5)° and (1.1 ± 0.9) mm was found. A paired-sample t-test indicated a significant improvement in goal function minimization by using the Metropolis algorithm. The sampled parameters allowed computation of TRE on the entire grid of the MRI volume. Hence, I recommend the Metropolis algorithm for head-to-MRI coregistrations. The propagation of coregistration uncertainty to source estimates was performed by using pseudospectral approximations of the beamformer and standardized low resolution tomography (sLORETA). This approach was tested for auditory, visual and somatosensory brain activity with different signal-to-noise ratios and source orientation constraints on datasets of 20 subjects.
    By using pseudospectral approximations as efficient surrogates, the spatial distribution of the source estimate maximum was sampled for 50000 coregistrations. From the results, it can be concluded that it is possible to apply stochastic spectral methods to MEG source estimation with high accuracy. The investigated effects of coregistration uncertainties on source estimates are small: typically, the maximum location varied within a range of 5 mm, which is in the range of the localization errors. Pseudospectral approximations of the source estimates reduced computation times considerably, by a factor of approximately 10000 for the beamformer and 50000 for sLORETA compared to the exact original computations.
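    The TRE concept admits a simple Monte-Carlo reading: sample small rigid perturbations of the coregistration and record the displacement they cause at each point of interest. The sketch below does exactly that with assumed parameter spreads and random target points; the thesis instead draws the perturbations from an adaptive Metropolis posterior over the actual head-to-MRI fit.

    ```python
    import numpy as np

    def rigid(points, rot_deg, trans_mm):
        """Apply a rigid transform (Euler angles in degrees, translation in mm)."""
        a, b, c = np.deg2rad(rot_deg)
        Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
        Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
        Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
        return points @ (Rz @ Ry @ Rx).T + trans_mm

    rng = np.random.default_rng(3)
    # hypothetical head-surface target points (mm), centered on the origin
    targets = rng.standard_normal((200, 3)) * 80

    # sample coregistration parameters around the optimum (spreads are assumptions)
    n = 2000
    disp = np.empty((n, targets.shape[0]))
    for i in range(n):
        rot = rng.normal(0, 0.5, 3)       # degrees
        tr = rng.normal(0, 1.0, 3)        # mm
        disp[i] = np.linalg.norm(rigid(targets, rot, tr) - targets, axis=1)

    tre = disp.mean(axis=0)               # per-target TRE across sampled transforms
    print(f"mean TRE: {tre.mean():.2f} mm")
    ```

    Because the displacement grows with distance from the rotation center, TRE computed this way varies over the head surface, which is why the thesis evaluates it on the entire MRI grid rather than at a single fiducial.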