Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments
We propose a spatial diffuseness feature for deep neural network (DNN)-based
automatic speech recognition to improve recognition accuracy in reverberant and
noisy environments. The feature is computed in real time from multiple
microphone signals, without requiring knowledge or estimation of the direction
of arrival, and represents the relative amount of diffuse noise in each
time-frequency bin. Using the diffuseness feature as an additional input to a
DNN-based acoustic model is shown to reduce the word error rate on the REVERB
challenge corpus, compared both to log-mel spectral (logmelspec) features
extracted from the noisy signals and to features enhanced by spectral subtraction.
Comment: accepted for ICASSP201
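The diffuseness cue described above can be illustrated with a toy estimator: coherent (direct) sound yields a short-time inter-channel coherence magnitude near one, while diffuse noise lowers it. The sketch below uses 1 minus the smoothed coherence magnitude as a crude per-bin proxy; this is not the estimator of the paper, and all parameter values are illustrative.

```python
import numpy as np

def diffuseness_proxy(x1, x2, nfft=512, hop=128, alpha=0.8):
    """Crude per-time-frequency-bin diffuseness proxy from two mic channels.

    Recursively smooths auto- and cross-power spectra, computes the
    short-time coherence magnitude, and maps it to a [0, 1] diffuseness
    value: ~0 for coherent (direct) sound, higher for diffuse noise.
    Toy illustration only, not the estimator used in the paper.
    """
    win = np.hanning(nfft)
    frames = (len(x1) - nfft) // hop + 1
    phi11 = phi22 = phi12 = None
    D = np.zeros((frames, nfft // 2 + 1))
    for t in range(frames):
        seg1 = np.fft.rfft(win * x1[t * hop : t * hop + nfft])
        seg2 = np.fft.rfft(win * x2[t * hop : t * hop + nfft])
        p11, p22 = np.abs(seg1) ** 2, np.abs(seg2) ** 2
        p12 = seg1 * np.conj(seg2)
        if phi11 is None:  # initialize the recursive averages
            phi11, phi22, phi12 = p11, p22, p12
        else:
            phi11 = alpha * phi11 + (1 - alpha) * p11
            phi22 = alpha * phi22 + (1 - alpha) * p22
            phi12 = alpha * phi12 + (1 - alpha) * p12
        coh = np.abs(phi12) / np.sqrt(phi11 * phi22 + 1e-12)
        D[t] = np.clip(1.0 - coh, 0.0, 1.0)
    return D
```

Feeding identical signals to both channels gives near-zero diffuseness everywhere, while two independent noise signals give a high average value, matching the intended behavior of the feature.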
Motivations underlying self-infliction of pain during thinking for pleasure
Previous research suggested that people prefer to administer unpleasant electric shocks to themselves rather than be left alone with their thoughts, because engaging in thinking is an unpleasant activity. The present research examined this negative-reinforcement hypothesis by giving participants a choice of distracting themselves with the generation of electric shocks ranging from no pain to intense pain. Four experiments (N = 254) replicated the finding that a large proportion of participants opted to administer painful shocks to themselves during the thinking period. However, participants administered strong electric shocks to themselves even when an innocuous response option generating no shock or only a mild shock was available. Furthermore, participants inflicted pain on themselves when they were assisted in generating pleasant thoughts during the waiting period, with no difference between the pleasant and unpleasant thought conditions. Overall, these results call into question the claim that the primary motivation for the self-administration of painful shocks is the avoidance of thinking. Instead, the self-infliction of pain appears to have been attractive to many participants because they were curious about the shocks, their intensities, and the effects the shocks would have on them.
Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation
We present an approach to reduce the performance disparity between geographic
regions in automatic speech recognition (ASR) without degrading performance
for the overall user population. A
popular approach is to fine-tune the model with data from regions where the ASR
model has a higher word error rate (WER). However, when the ASR model is
adapted to get better performance on these high-WER regions, its parameters
wander from the previous optimal values, which can lead to worse performance in
other regions. In our proposed method, we utilize the elastic weight
consolidation (EWC) regularization loss to identify directions in parameter
space along which the ASR weights can vary to improve on high-error regions,
while still maintaining performance on the speaker population overall. Our
results demonstrate that EWC can reduce the WER in the region with the
highest WER by 3.2% relative while reducing the overall WER by 1.3%
relative. We also evaluate the role of language and acoustic models in ASR
fairness and propose a clustering algorithm to identify WER disparities based
on geographic region.
Comment: Accepted for publication at Interspeech 202
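The EWC idea above, penalizing movement along parameter directions that matter for regions already served well, can be sketched on a toy quadratic loss. The penalty is (λ/2) Σᵢ Fᵢ (θᵢ − θᵢ*)², where F is a per-parameter Fisher-information estimate and θ* the previous optimum. The Fisher values, target, and λ below are invented for illustration; a real ASR model would estimate F from the original training data.

```python
import numpy as np

def adapt(theta0, theta_star_old, fisher, lam, steps=200, lr=0.1):
    """Gradient descent on a toy 'high-WER region' loss plus the EWC penalty.

    region loss:  0.5 * ||theta - target_new||^2   (stand-in for fine-tuning)
    EWC penalty:  0.5 * lam * sum(fisher * (theta - theta_star_old)^2)
    """
    target_new = np.array([2.0, 2.0])  # illustrative new-region optimum
    theta = theta0.copy()
    for _ in range(steps):
        grad = (theta - target_new) + lam * fisher * (theta - theta_star_old)
        theta -= lr * grad
    return theta
```

With Fisher values [10, 0.1] (first parameter important to the old regions, second not), the first parameter stays near the old optimum at 0 while the second moves almost all the way to the new target, matching the closed-form optimum θᵢ = targetᵢ / (1 + λFᵢ) when θ* = 0. This is exactly the trade-off the abstract describes: adapting to high-WER regions along directions the old task does not care about.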
Cross-utterance ASR Rescoring with Graph-based Label Propagation
We propose a novel approach for ASR N-best hypothesis rescoring with
graph-based label propagation by leveraging cross-utterance acoustic
similarity. In contrast to conventional neural language model (LM) based ASR
rescoring/reranking models, our approach focuses on acoustic information and
conducts the rescoring collaboratively among utterances, instead of
individually. Experiments on the VCTK dataset demonstrate that our approach
consistently improves ASR performance, as well as fairness across speaker
groups with different accents. Our approach provides a low-cost solution for
mitigating the majoritarian bias of ASR systems, without the need to train new
domain- or accent-specific models.
Comment: To appear in IEEE ICASSP 202
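Collaborative rescoring across utterances can be sketched with the classic label-propagation update F ← αSF + (1 − α)Y over an acoustic-similarity graph; this is a generic sketch, not the paper's exact formulation, and the similarity matrix and scores below are invented.

```python
import numpy as np

def propagate_scores(sim, scores, alpha=0.5, iters=50):
    """Graph-based propagation of per-utterance N-best hypothesis scores.

    sim:    (n, n) symmetric, nonnegative acoustic-similarity matrix
    scores: (n, k) initial ASR scores for k N-best hypotheses per utterance
    Iterates F <- alpha * S @ F + (1 - alpha) * scores with row-normalized
    S, so acoustically similar utterances pull their rescored hypothesis
    scores toward each other instead of being rescored individually.
    """
    S = sim / sim.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    F = scores.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * scores
    return F
```

On a small example with two acoustically similar utterances and one dissimilar one, propagation pulls the similar pair's score vectors together while leaving the isolated utterance essentially unchanged, which is the mechanism behind the cross-utterance smoothing described above.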