14 research outputs found

    Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization

    This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects of reverberation. The second stage uses this dereverberated signal as an initial estimate within a non-negative matrix factorization framework, which jointly estimates a sparse representation of the clean speech signal and an estimate of the convolutional distortion. The proposed feature enhancement method, when used in conjunction with automatic speech recognizer back-end processing, is shown to improve recognition performance compared to three other state-of-the-art techniques.
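    The first stage described above rests on distribution matching. As a rough illustration only, the sketch below maps each observed value to the reference value at the same empirical quantile. The function name `match_distribution` and the toy numbers are hypothetical, and the sketch ignores the decorrelated transformation domain and long temporal context used in the paper.

```python
from bisect import bisect_right


def match_distribution(observed, reference):
    """Map each observed value to the reference value at the same empirical quantile.

    Generic quantile matching, assumed here for illustration; it is not the
    paper's exact procedure.
    """
    obs_sorted = sorted(observed)
    ref_sorted = sorted(reference)
    n_obs, n_ref = len(obs_sorted), len(ref_sorted)
    matched = []
    for x in observed:
        # Empirical CDF value of x among the observed samples, in (0, 1].
        q = bisect_right(obs_sorted, x) / n_obs
        # Index of the reference sample sitting at the same quantile.
        idx = min(n_ref - 1, max(0, round(q * n_ref) - 1))
        matched.append(ref_sorted[idx])
    return matched


# Hypothetical toy values standing in for reverberant and clean features.
observed = [0.9, 0.4, 0.7, 0.5]
reference = [0.0, 1.0, 2.0, 3.0]
matched = match_distribution(observed, reference)
print(matched)
```

    The mapping keeps the rank order of the observed values while forcing their distribution onto that of the reference samples.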

    Sensitivity of the human auditory cortex to acoustic degradation of speech and non-speech sounds

    The perception of speech is usually an effortless and reliable process even in highly adverse listening conditions. In addition to external sound sources, the intelligibility of speech can be reduced by degradation of the structure of the speech signal itself, for example by digital compression of sound. This kind of distortion may be even more detrimental to speech intelligibility than external distortion, given that the auditory system will not be able to utilize sound source-specific acoustic features, such as spatial location, to separate the distortion from the speech signal. The perceptual consequences of acoustic distortions on speech intelligibility have been extensively studied. However, the cortical mechanisms of speech perception in adverse listening conditions are not well known at present, particularly in situations where the speech signal itself is distorted. The aim of this thesis was to investigate the cortical mechanisms underlying speech perception in conditions where speech is less intelligible due to external distortion or as a result of digital compression. In the studies of this thesis, the intelligibility of speech was varied either by digital compression or by addition of stochastic noise. Cortical activity related to the speech stimuli was measured using magnetoencephalography (MEG). The results indicated that degradation of speech sounds by digital compression enhanced the evoked responses originating from the auditory cortex, whereas addition of stochastic noise did not modulate the cortical responses. Furthermore, it was shown that if the distortion was presented continuously in the background, the transient activity of the auditory cortex was delayed. On the perceptual level, digital compression reduced the comprehensibility of speech more than additive stochastic noise. 
In addition, it was demonstrated that prior knowledge of speech content enhanced the intelligibility of distorted speech substantially, and this perceptual change was associated with an increase in cortical activity within several regions adjacent to the auditory cortex. In conclusion, the results of this thesis show that the auditory cortex is very sensitive to the acoustic features of the distortion, while at later processing stages, several cortical areas reflect the intelligibility of speech. These findings suggest that the auditory system rapidly adapts to the variability of the auditory environment, and can efficiently utilize previous knowledge of speech content in deciphering acoustically degraded speech signals.
Speech perception is usually effortless and reliable even in very poor listening conditions. However, besides interfering sources in the environment, speech intelligibility can also deteriorate when the structure of the speech signal is altered, for example by compressing digital audio. Such distortion can reduce intelligibility even more than external interference, because the auditory system cannot use properties of the sound source, such as the direction of arrival, to separate the distortion from the speech. The effects of acoustic distortions on speech perception have been studied extensively, but the associated brain mechanisms are still rather poorly understood, especially in situations where the speech signal itself is degraded. The aim of this thesis was to study the brain mechanisms of speech perception in situations where the speech signal is harder to understand because of either an external sound source or digital compression. In the four sub-studies of the thesis, the intelligibility of short speech sounds and of continuous speech was manipulated either through digital compression or by adding random noise to the speech signal. Brain activity related to the speech stimuli was studied with magnetoencephalography measurements.
The studies found that evoked responses generated in the auditory cortex grew stronger when the speech sound was digitally compressed. By contrast, random noise added to the speech sounds did not affect the evoked responses. Furthermore, when continuous distortion was presented in the background of the speech sounds, activation of the auditory cortex was delayed as the intensity of the distortion increased. Listening tests showed that digital compression reduces the intelligibility of speech sounds more strongly than random noise. In addition, prior knowledge of the speech content was shown to improve the intelligibility of distorted speech significantly, which was reflected in brain activity in areas adjacent to the auditory cortex: intelligible speech elicited greater activation than poorly intelligible speech. The results of the thesis show that the auditory cortex is highly sensitive to acoustic distortions of speech sounds, and that at later processing stages several brain areas adjacent to the auditory cortex reflect the intelligibility of speech. The results suggest that the auditory system adapts rapidly to variations in the sound environment, among other things by using prior knowledge of speech content when interpreting a distorted speech signal.

    Recognition of Reverberant Speech by Missing Data Imputation and NMF Feature Enhancement

    The problem of reverberation in speech recognition is addressed in this study by extending a noise-robust feature enhancement method based on non-negative matrix factorization. The signal model of the observation as a linear combination of sample spectrograms is augmented by a mel-spectral feature domain convolution to account for the effects of room reverberation. The proposed method is contrasted with missing data techniques for reverberant speech, and evaluated for speech recognition performance using the REVERB challenge corpus. Our results indicate consistent gains in recognition performance compared to the baseline system, with a relative improvement in word error rate of 42.6% for the optimal case.
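    The signal model above, an observation expressed as a non-negative linear combination of sample spectrograms, can be illustrated with a minimal sketch. It assumes a fixed dictionary `W`, a squared-error cost, and the standard multiplicative update for the activations `H`. The cost function, the toy matrices and all function names are assumptions chosen for illustration, and the reverberation convolution from the abstract is left out.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def squared_error(v, w, h):
    """Sum of squared differences between V and the approximation W H."""
    approx = matmul(w, h)
    return sum((x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra))


def update_activations(v, w, h, n_iter=50, eps=1e-12):
    """Multiplicative updates H <- H * (W^T V) / (W^T W H) with W held fixed."""
    wt = transpose(w)
    wtv = matmul(wt, v)
    wtw = matmul(wt, w)
    for _ in range(n_iter):
        wtwh = matmul(wtw, h)
        h = [
            [h[k][t] * wtv[k][t] / (wtwh[k][t] + eps) for t in range(len(h[0]))]
            for k in range(len(h))
        ]
    return h


# Hypothetical toy data: 3 spectral bands, 2 dictionary atoms, 3 frames.
W = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
V = [[2.0, 0.0, 1.0], [1.5, 1.0, 1.0], [1.0, 2.0, 1.0]]
H0 = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

H = update_activations(V, W, H0)
err_before = squared_error(V, W, H0)
err_after = squared_error(V, W, H)
print(err_before, err_after)
```

    Because the updates only multiply by non-negative ratios, the activations stay non-negative throughout, which is what lets the columns of `W` be read as additive spectral building blocks.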

    Lateralization and Binaural Interaction of Middle-Latency and Late-Brainstem Components of the Auditory Evoked Response

    We used magnetoencephalography to examine lateralization and binaural interaction of the middle-latency and late-brainstem components of the auditory evoked response (the MLR and SN10, respectively). Click stimuli were presented either monaurally, or binaurally with left- or right-leading interaural time differences (ITDs). While early MLR components, including the N19 and P30, were larger for monaural stimuli presented contralaterally (by approximately 30% and 36% in the left and right hemispheres, respectively), later components, including the N40 and P50, were larger ipsilaterally. In contrast, MLRs elicited by binaural clicks with left- or right-leading ITDs did not differ. Depending on filter settings, weak binaural interaction could be observed as early as the P13 but was clearly much larger for later components, beginning at the P30, indicating some degree of binaural linearity up to early stages of cortical processing. The SN10, an obscure late-brainstem component, was observed consistently in individuals and showed linear binaural additivity. The results indicate that while the MLR is lateralized in response to monaural stimuli, and not to ITDs, this lateralization reverses from primarily contralateral to primarily ipsilateral as early as 40 ms post stimulus and is never as large as that seen with fMRI.
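    Binaural interaction is conventionally quantified as the difference between the binaural response and the sum of the two monaural responses, so a value of zero corresponds to the linear additivity mentioned above. The sketch below shows that computation on made-up waveform samples. The function name and the numbers are hypothetical and are not taken from the study.

```python
def binaural_interaction(binaural, left, right):
    """Sample-wise difference: binaural response minus summed monaural responses."""
    return [b - (l + r) for b, l, r in zip(binaural, left, right)]


# Hypothetical evoked-response samples (arbitrary units), not measured data.
left = [0.0, 1.0, 2.0]
right = [0.0, 1.5, 1.0]
binaural = [0.0, 2.5, 2.0]

bic = binaural_interaction(binaural, left, right)
print(bic)
```

    In this toy case the first two samples are perfectly additive, while the third shows a binaural response smaller than the monaural sum.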