Efficient Acoustic Echo Suppression with Condition-Aware Training
The topic of deep acoustic echo control (DAEC) has seen many approaches with
various model topologies in recent years. Convolutional recurrent networks
(CRNs), consisting of a convolutional encoder and decoder encompassing a
recurrent bottleneck, are repeatedly employed due to their ability to preserve
near-end speech even in double-talk (DT) conditions. However, past architectures
are either computationally complex or trade smaller model sizes for a
decrease in performance. We propose an improved CRN topology which, compared to
other realizations of this class of architectures, not only saves parameters
and computational complexity, but also shows improved performance in DT,
outperforming both baseline architectures FCRN and CRUSE. Striving for a
condition-aware training, we also demonstrate the importance of a high
proportion of double-talk and the lack of value of near-end-only speech in DAEC
training data. Finally, we show how to control the trade-off between aggressive
echo suppression and near-end speech preservation by fine-tuning with
condition-aware component loss functions.
Comment: 5 pages, accepted to WASPAA 202
On the Importance of Harmonic Phase Modification for Improved Speech Signal Reconstruction
Phase importance in single-channel speech enhancement. The current study addresses two questions: 1) STFT or harmonic phase? 2) Harmonic phase: unwrapped phase versus linear phase?
Joint Single-Channel Speech Separation and Speaker Identification
In this paper, we propose a closed-loop system to improve the performance of single-channel speech separation in a speaker-independent scenario. The system is composed of two interconnected blocks: a separation block and a speaker identification block. The improvement is accomplished by incorporating the speaker identities found by the speaker identification block as additional information for the separation block, which converts the speaker-independent separation problem to a speaker-dependent one where the speaker codebooks are known. Simulation results show that the closed-loop system enhances the quality of the separated output signals. To assess the improvements, the results are reported in terms of PESQ for both target and masked signals. Index Terms — Single-channel speech separation, speaker identification, sinusoidal mixture estimator, vector quantization, Gaussian mixture model.
- …