
    Spatial Diffuseness Features for DNN-Based Speech Recognition in Noisy and Reverberant Environments

    We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time-frequency bin. It is shown that using the diffuseness feature as an additional input to a DNN-based acoustic model leads to a reduced word error rate for the REVERB challenge corpus, compared both to logmelspec features extracted from the noisy signals and to features enhanced by spectral subtraction.
    Comment: accepted for ICASSP201
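    The abstract describes a per-time-frequency-bin diffuseness feature derived from multiple microphone signals. As a rough illustration only, the sketch below maps the short-time spatial coherence of a two-microphone pair to a diffuseness-like value in [0, 1] by comparing it against the ideal diffuse-field coherence. The heuristic mapping, the recursive-averaging constant alpha, and the microphone spacing d are assumptions for this sketch, not the estimator used in the paper.

```python
# Illustrative sketch: a heuristic diffuseness-like value per time-frequency bin
# from the short-time spatial coherence of a two-microphone pair.
# NOT the estimator from the paper; parameters (d, alpha) are assumptions.
import numpy as np
from scipy.signal import stft

def diffuseness_feature(x1, x2, fs, d=0.08, c=343.0, nfft=512, alpha=0.8):
    """x1, x2: time-domain signals from two microphones; d: mic spacing in meters."""
    f, _, X1 = stft(x1, fs=fs, nperseg=nfft)
    _, _, X2 = stft(x2, fs=fs, nperseg=nfft)

    eps = 1e-12
    # Recursively averaged auto- and cross-power spectra over time frames.
    P11 = np.zeros(X1.shape, dtype=float)
    P22 = np.zeros(X1.shape, dtype=float)
    P12 = np.zeros(X1.shape, dtype=complex)
    for t in range(X1.shape[1]):
        prev = t - 1 if t > 0 else 0
        P11[:, t] = alpha * P11[:, prev] + (1 - alpha) * np.abs(X1[:, t]) ** 2
        P22[:, t] = alpha * P22[:, prev] + (1 - alpha) * np.abs(X2[:, t]) ** 2
        P12[:, t] = alpha * P12[:, prev] + (1 - alpha) * X1[:, t] * np.conj(X2[:, t])

    # Complex spatial coherence per time-frequency bin.
    gamma = P12 / np.sqrt(P11 * P22 + eps)

    # Ideal diffuse-field coherence for omnidirectional mics:
    # sin(2*pi*f*d/c) / (2*pi*f*d/c), i.e. np.sinc(2*f*d/c) in numpy's convention.
    gamma_diff = np.sinc(2 * f * d / c)[:, None]

    # Heuristic mapping: ~0 for a fully coherent plane wave (|gamma| near 1),
    # ~1 when coherence drops to the diffuse-field level. Unreliable at very
    # low frequencies, where the diffuse-field coherence itself is near 1.
    diffuseness = (1.0 - np.abs(gamma)) / (1.0 - np.abs(gamma_diff) + eps)
    return np.clip(diffuseness, 0.0, 1.0)  # shape: (freq bins, time frames)
```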

    Motivations underlying self-infliction of pain during thinking for pleasure

    Previous research suggested that people prefer to administer unpleasant electric shocks to themselves rather than be left alone with their thoughts, because engaging in thinking is an unpleasant activity. The present research examined this negative reinforcement hypothesis by giving participants a choice of distracting themselves with electric shocks ranging from no pain to intense pain. Four experiments (N = 254) replicated the finding that a large proportion of participants opted to administer painful shocks to themselves during the thinking period. However, they administered strong electric shocks to themselves even when an innocuous response option generating no shock or only a mild one was available. Furthermore, participants inflicted pain on themselves even when they were assisted in generating pleasant thoughts during the waiting period, with no difference between the pleasant and unpleasant thought conditions. Overall, these results cast doubt on the claim that the primary motivation for the self-administration of painful shocks is avoidance of thinking. Instead, the self-infliction of pain appears to have been attractive to many participants because they were curious about the shocks, their intensities, and the effects they would have on them.

    Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

    We present an approach to reducing the performance disparity between geographic regions without degrading performance on the overall user population for ASR. A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER). However, when the ASR model is adapted to improve performance in these high-WER regions, its parameters drift away from the previous optimal values, which can lead to worse performance in other regions. In our proposed method, we use the elastic weight consolidation (EWC) regularization loss to identify directions in parameter space along which the ASR weights can vary to improve performance in high-error regions while still maintaining performance on the overall speaker population. Our results demonstrate that EWC can reduce the WER in the region with the highest WER by 3.2% relative while reducing the overall WER by 1.3% relative. We also evaluate the role of language and acoustic models in ASR fairness and propose a clustering algorithm to identify WER disparities based on geographic region.
    Comment: Accepted for publication at Interspeech 202
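    For readers unfamiliar with EWC, the following is a minimal sketch of how such a regularizer is commonly implemented, assuming a PyTorch model and a diagonal Fisher approximation; the names (diagonal_fisher, ewc_penalty, ewc_lambda, old_params) are illustrative and not taken from the paper.

```python
# Minimal EWC sketch for fine-tuning on a high-WER region while anchoring
# parameters that matter for the original (general) data. Illustrative only.
import torch

def diagonal_fisher(model, dataloader, loss_fn):
    """Approximate the diagonal Fisher information on the original data."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}
    model.eval()
    n_batches = 0
    for inputs, targets in dataloader:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
        n_batches += 1
    return {n: f / max(n_batches, 1) for n, f in fisher.items()}

def ewc_penalty(model, old_params, fisher):
    """Quadratic penalty pulling important parameters toward their old values.

    old_params: {name: p.detach().clone()} captured before fine-tuning.
    """
    loss = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            loss = loss + (fisher[n] * (p - old_params[n]) ** 2).sum()
    return loss

# Fine-tuning step on high-WER-region data (sketch):
#   total_loss = asr_loss(model, region_batch) \
#                + ewc_lambda * ewc_penalty(model, old_params, fisher)
#   total_loss.backward(); optimizer.step()
```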

    Cross-utterance ASR Rescoring with Graph-based Label Propagation

    We propose a novel approach to ASR N-best hypothesis rescoring with graph-based label propagation that leverages cross-utterance acoustic similarity. In contrast to conventional neural language model (LM)-based ASR rescoring/reranking models, our approach focuses on acoustic information and performs the rescoring collaboratively across utterances rather than individually. Experiments on the VCTK dataset demonstrate that our approach consistently improves ASR performance as well as fairness across speaker groups with different accents. Our approach provides a low-cost solution for mitigating the majoritarian bias of ASR systems, without the need to train new domain- or accent-specific models.
    Comment: To appear in IEEE ICASSP 202
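    As a generic illustration of graph-based label propagation (not the paper's exact formulation), the sketch below builds a similarity graph from per-utterance acoustic embeddings and iteratively blends each utterance's initial scores with those of its acoustically similar neighbours. The Gaussian kernel, the shared candidate set across utterances, and all parameter names are assumptions for this sketch.

```python
# Illustrative label-propagation sketch over an utterance similarity graph.
# Assumes each utterance has an acoustic embedding and initial scores over a
# shared set of K candidates. Not the formulation from the paper.
import numpy as np

def propagate_scores(embeddings, init_scores, alpha=0.8, n_iters=20, sigma=1.0):
    """
    embeddings:  (U, D) acoustic embeddings, one per utterance.
    init_scores: (U, K) initial scores (e.g., ASR posteriors) over K candidates.
    Returns refined (U, K) scores after propagation over the similarity graph.
    """
    # Gaussian-kernel affinity between utterances, with self-loops removed.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    W = np.exp(-dists ** 2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Iterative propagation: blend neighbours' scores with the original scores.
    F = init_scores.copy()
    for _ in range(n_iters):
        F = alpha * S @ F + (1 - alpha) * init_scores
    return F
```

    In this kind of scheme, alpha controls how much each utterance trusts its acoustic neighbours versus its own first-pass scores; the refined scores can then be combined with the original N-best scores for reranking.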