91 research outputs found
Speech Dereverberation Based on Integrated Deep and Ensemble Learning Algorithm
Reverberation, which is generally caused by sound reflections from walls,
ceilings, and floors, can result in severe performance degradation of acoustic
applications. Due to a complicated combination of attenuation and time-delay
effects, the reverberation property is difficult to characterize, and it
remains a challenging task to effectively retrieve the anechoic speech signals
from reverberation ones. In the present study, we proposed a novel integrated
deep and ensemble learning algorithm (IDEA) for speech dereverberation. The
IDEA consists of offline and online phases. In the offline phase, we train
multiple dereverberation models, each aiming to precisely dereverb speech
signals in a particular acoustic environment; then a unified fusion function is
estimated that aims to integrate the information of multiple dereverberation
models. In the online phase, an input utterance is first processed by each of
the dereverberation models. The outputs of all models are integrated
accordingly to generate the final anechoic signal. We evaluated the IDEA on
designed acoustic environments, including both matched and mismatched
conditions of the training and testing data. Experimental results confirm that
the proposed IDEA outperforms single deep-neural-network-based dereverberation
model with the same model architecture and training data
Deep neural network techniques for monaural speech enhancement: state of the art analysis
Deep neural networks (DNN) techniques have become pervasive in domains such
as natural language processing and computer vision. They have achieved great
success in these domains in task such as machine translation and image
generation. Due to their success, these data driven techniques have been
applied in audio domain. More specifically, DNN models have been applied in
speech enhancement domain to achieve denosing, dereverberation and
multi-speaker separation in monaural speech enhancement. In this paper, we
review some dominant DNN techniques being employed to achieve speech
separation. The review looks at the whole pipeline of speech enhancement from
feature extraction, how DNN based tools are modelling both global and local
features of speech and model training (supervised and unsupervised). We also
review the use of speech-enhancement pre-trained models to boost speech
enhancement process. The review is geared towards covering the dominant trends
with regards to DNN application in speech enhancement in speech obtained via a
single speaker.Comment: conferenc
Ensemble Hierarchical Extreme Learning Machine for Speech Dereverberation
ata-driven deep learning solutions, which are gradient-based neural architectures, have proven useful in overcoming some limitations of traditional signal processing techniques. However, a large number of reverberated-anechoic training utterance pairs covering as many environmental conditions as possible is required to achieve robust performance in unseen testing conditions. In this study, we propose to address the data requirement issue while preserving the advantages of deep neural structures leveraging upon hierarchical extreme learning machines (HELMs), which are not gradient-based neural architectures. In particular, an ensemble HELM learning framework is established to effectively recover anechoic speech from a reverberated one based on a spectral mapping. In addition to the ensemble learning framework, we further derive two novel HELM models, namely highway HELM, termed HELM(Hwy), and residual HELM, termed HELM(Res), both incorporating low-level features to enrich the information for spectral mapping. We evaluated the proposed ensemble learning framework using simulated and measured impulse responses by employing TIMIT, MHINT, and REVERB corpora. Experimental results show that the proposed framework outperforms both traditional methods and a recently proposed integrated deep and ensemble learning algorithm in terms of standardized objective and subjective evaluations under matched and mismatched testing conditions for simulated and measured impulse responses
Integrating Plug-and-Play Data Priors with Weighted Prediction Error for Speech Dereverberation
Speech dereverberation aims to alleviate the detrimental effects of
late-reverberant components. While the weighted prediction error (WPE) method
has shown superior performance in dereverberation, there is still room for
further improvement in terms of performance and robustness in complex and noisy
environments. Recent research has highlighted the effectiveness of integrating
physics-based and data-driven methods, enhancing the performance of various
signal processing tasks while maintaining interpretability. Motivated by these
advancements, this paper presents a novel dereverberation frame-work, which
incorporates data-driven methods for capturing speech priors within the WPE
framework. The plug-and-play strategy (PnP), specifically the regularization by
denoising (RED) strategy, is utilized to incorporate speech prior information
learnt from data during the optimization problem solving iterations.
Experimental results validate the effectiveness of the proposed approach
GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration
Pre-trained diffusion models have been successfully used as priors in a
variety of linear inverse problems, where the goal is to reconstruct a signal
from noisy linear measurements. However, existing approaches require knowledge
of the linear operator. In this paper, we propose GibbsDDRM, an extension of
Denoising Diffusion Restoration Models (DDRM) to a blind setting in which the
linear measurement operator is unknown. GibbsDDRM constructs a joint
distribution of the data, measurements, and linear operator by using a
pre-trained diffusion model for the data prior, and it solves the problem by
posterior sampling with an efficient variant of a Gibbs sampler. The proposed
method is problem-agnostic, meaning that a pre-trained diffusion model can be
applied to various inverse problems without fine-tuning. In experiments, it
achieved high performance on both blind image deblurring and vocal
dereverberation tasks, despite the use of simple generic priors for the
underlying linear operators
Feature enhancement of reverberant speech by distribution matching and non-negative matrix factorization
This paper describes a novel two-stage dereverberation feature enhancement method for noise-robust automatic speech recognition. In the first stage, an estimate of the dereverberated speech is generated by matching the distribution of the observed reverberant speech to that of clean speech, in a decorrelated transformation domain that has a long temporal context in order to address the effects of reverberation. The second stage uses this dereverberated signal as an initial estimate within a non-negative matrix factorization framework, which jointly estimates a sparse representation of the clean speech signal and an estimate of the convolutional distortion. The proposed feature enhancement method, when used in conjunction with automatic speech recognizer back-end processing, is shown to improve the recognition performance compared to three other state-of-the-art techniques
- …