A Bayesian Network View on Acoustic Model-Based Techniques for Robust Speech Recognition
This article provides a unifying Bayesian network view on various approaches
for acoustic model adaptation, missing feature, and uncertainty decoding that
are well-known in the literature of robust automatic speech recognition. The
representatives of these classes can often be deduced from a Bayesian network
that extends the conventional hidden Markov models used in speech recognition.
These extensions, in turn, can in many cases be motivated from an underlying
observation model that relates clean and distorted feature vectors. By
converting the observation models into a Bayesian network representation, we
formulate the corresponding compensation rules leading to a unified view on
known derivations as well as to new formulations for certain approaches. The
generic Bayesian perspective provided in this contribution thus highlights
structural differences and similarities between the analyzed approaches.
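As a hedged illustration of the kind of relation the abstract refers to (the article's own derivations may differ in detail), a simple additive observation model and the compensation rule obtained by marginalizing the unobserved clean features can be sketched as follows, where y_t is the distorted feature vector, x_t the clean one, n_t the distortion, and q_t the HMM state:

```latex
% Sketch only: a simple additive observation model (practical models for
% log-spectral features are typically nonlinear) and the resulting
% uncertainty-decoding-style compensation rule.
\begin{align}
  \mathbf{y}_t &= \mathbf{x}_t + \mathbf{n}_t\\
  p(\mathbf{y}_t \mid q_t) &= \int p(\mathbf{y}_t \mid \mathbf{x}_t)\,
      p(\mathbf{x}_t \mid q_t)\,\mathrm{d}\mathbf{x}_t
\end{align}
```

Replacing the point observation likelihood with such a marginal is what turns the conventional HMM into the extended Bayesian network the abstract describes.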
Multisensory causal inference in the brain
At any given moment, our brain processes multiple inputs from its different sensory modalities (vision, hearing, touch, etc.). In deciphering this array of sensory information, the brain has to solve two problems: (1) which of the inputs originate from the same object and should be integrated, and (2) for the sensations originating from the same object, how best to integrate them. Recent behavioural studies suggest that the human brain solves these problems using optimal probabilistic inference, known as Bayesian causal inference. However, how and where the underlying computations are carried out in the brain has remained unknown. By combining neuroimaging-based decoding techniques and computational modelling of behavioural data, a new study now sheds light on how multisensory causal inference maps onto specific brain areas. The results suggest that the complexity of neural computations increases along the visual hierarchy, and they link specific components of the causal inference process with specific visual and parietal regions.
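As a hedged sketch of the kind of computation involved (the function name and parameter values below are illustrative assumptions, not the study's fitted model), the standard Bayesian causal inference scheme first infers the posterior probability that two cues share a common cause and then mixes the integrated and segregated estimates accordingly:

```python
import numpy as np

def gauss(x, mu, var):
    """Gaussian density N(x; mu, var)."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def causal_inference(x_v, x_a, sigma_v=2.0, sigma_a=8.0, sigma_p=15.0, p_common=0.5):
    """Bayesian causal inference for one visual/auditory cue pair (model averaging).

    All noise/prior parameters are illustrative assumptions, not fitted values.
    """
    var_v, var_a, var_p = sigma_v ** 2, sigma_a ** 2, sigma_p ** 2

    # Likelihood of both cues under a single common cause (cause integrated out).
    denom_c1 = var_v * var_a + var_v * var_p + var_a * var_p
    like_c1 = np.exp(-0.5 * ((x_v - x_a) ** 2 * var_p + x_v ** 2 * var_a
                             + x_a ** 2 * var_v) / denom_c1) \
              / (2 * np.pi * np.sqrt(denom_c1))

    # Likelihood under two independent causes.
    like_c2 = gauss(x_v, 0.0, var_v + var_p) * gauss(x_a, 0.0, var_a + var_p)

    # Posterior probability of a common cause (problem 1: integrate or not?).
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Precision-weighted estimates under each causal hypothesis (problem 2).
    s_c1 = (x_v / var_v + x_a / var_a) / (1 / var_v + 1 / var_a + 1 / var_p)
    s_v_c2 = (x_v / var_v) / (1 / var_v + 1 / var_p)
    s_a_c2 = (x_a / var_a) / (1 / var_a + 1 / var_p)

    # Model averaging: weight each hypothesis by its posterior probability.
    s_v = post_c1 * s_c1 + (1 - post_c1) * s_v_c2
    s_a = post_c1 * s_c1 + (1 - post_c1) * s_a_c2
    return post_c1, s_v, s_a

print(causal_inference(x_v=5.0, x_a=12.0))
```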
Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions
When a number of speakers are simultaneously active, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be robustly recognized. Independent component analysis (ICA) has proven to be a valuable tool for this purpose. However, ICA outputs can still contain strong residual components of the interfering speakers whenever noise or reverberation is high. In such cases, nonlinear postprocessing can be applied to the ICA outputs to reduce the remaining interference. To remain robust to the artefacts and loss of information caused by this postprocessing, recognition can be greatly enhanced by considering the processed speech feature vector as a random variable with time-varying uncertainty, rather than as deterministic. The aim of this paper is to show the potential to improve recognition of multiple overlapping speech signals through nonlinear postprocessing combined with uncertainty-based decoding techniques.
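As a hedged sketch of such a processing chain (not the paper's exact implementation; the helper function and the use of an instantaneous FastICA stage are assumptions), two microphone signals could be unmixed and the residual interference attenuated with a soft time-frequency mask, whose complement can serve as a crude per-bin uncertainty for a downstream uncertainty-based decoder:

```python
import numpy as np
from scipy.signal import stft
from sklearn.decomposition import FastICA

def separate_and_mask(mixtures, fs=16000, nperseg=512):
    """ICA separation followed by soft time-frequency masking (illustrative).

    mixtures: array of shape (n_samples, n_channels), e.g. two microphones.
    Returns masked magnitude spectrograms and a per-bin uncertainty proxy.
    """
    # Instantaneous ICA as a stand-in for the (usually convolutive) separation step.
    ica = FastICA(n_components=mixtures.shape[1], whiten="unit-variance",
                  random_state=0)
    sources = ica.fit_transform(mixtures)                 # (n_samples, n_sources)

    # STFT of each estimated source signal.
    spec = np.stack([stft(s, fs=fs, nperseg=nperseg)[2] for s in sources.T])
    mag = np.abs(spec)                                    # (n_sources, n_freq, n_frames)
    power = mag ** 2

    # Soft mask: each source keeps the time-frequency bins it dominates.
    masks = power / (power.sum(axis=0, keepdims=True) + 1e-10)
    masked = masks * mag

    # Crude per-bin uncertainty, usable by an uncertainty-based decoder.
    uncertainty = 1.0 - masks
    return masked, uncertainty
```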
Reconstruction-based speech enhancement from robust acoustic features
This paper proposes a method of speech enhancement in which a clean speech signal is reconstructed from a sinusoidal model of speech production and a set of acoustic speech features. The acoustic features are estimated from noisy speech and comprise, for each frame, a voicing classification (voiced, unvoiced or non-speech), the fundamental frequency (for voiced frames) and the spectral envelope. Rather than using different algorithms to estimate each parameter, a single statistical model is developed. This comprises a set of acoustic models and is similar to the acoustic modelling used in speech recognition, which allows noise and speaker adaptation to be applied to acoustic feature estimation to improve robustness. Objective and subjective tests compare reconstruction-based enhancement with other enhancement methods and show the proposed method to be highly effective at removing noise.
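As a rough sketch of the reconstruction idea (the function, frame length and flat envelope below are assumed details, not the paper's synthesis procedure), a voiced frame can be regenerated as a sum of harmonics of the estimated fundamental frequency with amplitudes drawn from the estimated spectral envelope:

```python
import numpy as np

def synthesize_voiced_frame(f0, envelope, fs=16000, frame_len=400):
    """Sinusoidal-model synthesis of one voiced frame (illustrative only).

    f0: estimated fundamental frequency in Hz.
    envelope: callable mapping frequency in Hz -> linear amplitude,
              standing in for the estimated spectral envelope.
    """
    t = np.arange(frame_len) / fs
    n_harmonics = int((fs / 2) // f0)          # keep harmonics below Nyquist
    frame = np.zeros(frame_len)
    for k in range(1, n_harmonics + 1):
        fk = k * f0
        frame += envelope(fk) * np.cos(2 * np.pi * fk * t)   # phase assumed zero
    return frame

# Example: 120 Hz pitch with a flat illustrative envelope; unvoiced frames would
# instead use envelope-shaped noise, and non-speech frames silence.
frame = synthesize_voiced_frame(120.0, lambda f: 0.1)
```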
Boosting Cross-Domain Speech Recognition with Self-Supervision
The cross-domain performance of automatic speech recognition (ASR) could be
severely hampered due to the mismatch between training and testing
distributions. Since the target domain usually lacks labeled data, and domain
shifts exist at acoustic and linguistic levels, it is challenging to perform
unsupervised domain adaptation (UDA) for ASR. Previous work has shown that
self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by
exploiting the self-supervision available in unlabeled data. However, these
self-supervision signals also degrade under mismatched domain distributions,
an issue that previous work fails to address. This work presents a
systematic UDA framework to fully utilize the unlabeled data with
self-supervision in the pre-training and fine-tuning paradigm. On the one hand,
we apply continued pre-training and data replay techniques to mitigate the
domain mismatch of the SSL pre-trained model. On the other hand, we propose a
domain-adaptive fine-tuning approach based on the PL technique with three
unique modifications: Firstly, we design a dual-branch PL method to decrease
the sensitivity to the erroneous pseudo-labels; Secondly, we devise an
uncertainty-aware confidence filtering strategy to improve pseudo-label
correctness; Thirdly, we introduce a two-step PL approach to incorporate target
domain linguistic knowledge, thus generating more accurate target domain
pseudo-labels. Experimental results on various cross-domain scenarios
demonstrate that the proposed approach effectively boosts the cross-domain
performance and significantly outperforms previous approaches.

Comment: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 202
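As a hedged illustration of the confidence-filtering ingredient (the function name, dict keys and thresholds below are assumptions, not the paper's actual design), pseudo-labeled utterances could be kept only when the model's average token confidence is high and a simple uncertainty proxy, such as disagreement between two decoding passes, is low:

```python
import numpy as np

def filter_pseudo_labels(utterances, conf_threshold=0.9, max_disagreement=0.1):
    """Keep pseudo-labeled utterances that pass confidence and uncertainty checks.

    Each utterance dict is assumed to carry:
      "tokens_a", "tokens_b": hypotheses from two decoding passes (e.g. two
                              branches or dropout samples) as token lists,
      "confidences":          per-token posterior confidences for pass A.
    """
    kept = []
    for utt in utterances:
        avg_conf = float(np.mean(utt["confidences"]))
        a, b = utt["tokens_a"], utt["tokens_b"]
        # Disagreement proxy: fraction of positions where the hypotheses differ.
        mismatches = sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))
        disagreement = mismatches / max(len(a), len(b), 1)
        if avg_conf >= conf_threshold and disagreement <= max_disagreement:
            kept.append({"tokens": a, "weight": avg_conf})
    return kept
```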