Search CORE

327 research outputs found

Bidirectional truncated recurrent neural networks for efficient speech denoising

Author: Brakel Philémon
Schrauwen Benjamin
Stroobandt Dirk
Publication venue
Publication date: 01/01/2013
Field of study

We propose a bidirectional truncated recurrent neural network architecture for speech denoising. Recent work showed that deep recurrent neural networks perform well at speech denoising tasks and outperform feed forward architectures [1]. However, recurrent neural networks are difficult to train and their simulation does not allow for much parallelization. Given the increasing availability of parallel computing architectures like GPUs this is disadvantageous. The architecture we propose aims to retain the positive properties of recurrent neural networks and deep learning while remaining highly parallelizable. Unlike a standard recurrent neural network, it processes information from both past and future time steps. We evaluate two variants of this architecture on the Aurora2 task for robust ASR where they show promising results. The models outperform the ETSI2 advanced front end and the SPLICE algorithm under matching noise conditions.We propose a bidirectional truncated recurrent neural network architecture for speech denoising. Recent work showed that deep recurrent neural networks perform well at speech denoising tasks and outperform feed forward architectures [1]. However, recurrent neural networks are difficult to train and their simulation does not allow for much parallelization. Given the increasing availability of parallel computing architectures like GPUs this is disadvantageous. The architecture we propose aims to retain the positive properties of recurrent neural networks and deep learning while remaining highly parallelizable. Unlike a standard recurrent neural network, it processes information from both past and future time steps. We evaluate two variants of this architecture on the Aurora2 task for robust ASR where they show promising results. The models outperform the ETSI2 advanced front end and the SPLICE algorithm under matching noise conditions.P

Ghent University Academic Bibliography

Archivsystem Ask23

High-dimensional sequence transduction

Author: Bengio Yoshua
Boulanger-Lewandowski Nicolas
Vincent Pascal
Publication venue
Publication date: 09/12/2012
Field of study

We investigate the problem of transforming an input sequence into a high-dimensional output sequence in order to transcribe polyphonic audio music into symbolic notation. We introduce a probabilistic model based on a recurrent neural network that is able to learn realistic output distributions given the input and we devise an efficient algorithm to search for the global mode of that distribution. The resulting method produces musically plausible transcriptions even under high levels of noise and drastically outperforms previous state-of-the-art approaches on five datasets of synthesized sounds and real recordings, approximately halving the test error rate

arXiv.org e-Print Archive

CiteSeerX

Crossref

The estimation and application of unnormalized statistical models

Author: Brakel Philémon
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2014
Field of study

Ghent University Academic Bibliography

On the Effectiveness of Low-Rank Matrix Factorization for LSTM Model Compression

Author: Barezi Elham J.
Fung Pascale
Madotto Andrea
Shin Jamin
Winata Genta Indra
Publication venue: Waseda Institute for the Study of Language and Information
Publication date: 01/01/2019
Field of study

Waseda University Repository

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

Author: Ai Yang
Ling Zhen-Hua
Lu Ye-Xin
Publication venue
Publication date: 17/08/2023
Field of study

Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome the above issue, in this paper, we proposed MP-SENet, a novel Speech Enhancement Network which explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by time-frequency Transformers along both time and frequency dimensions. The encoder aims to encode time-frequency representations derived from the input distorted magnitude and phase spectra. The decoder comprises dual-stream magnitude and phase decoders, directly enhancing magnitude and wrapped phase spectra by incorporating a magnitude estimation architecture and a phase parallel estimation architecture, respectively. To train the MP-SENet model effectively, we define multi-level loss functions, including mean square error and perceptual metric loss of magnitude spectra, anti-wrapping loss of phase spectra, as well as mean square error and consistency loss of short-time complex spectra. Experimental results demonstrate that our proposed MP-SENet excels in high-quality speech enhancement across multiple tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it successfully avoids the bidirectional compensation effect between the magnitude and phase, leading to a better harmonic restoration. Notably, for the speech denoising task, the MP-SENet yields a state-of-the-art performance with a PESQ of 3.60 on the public VoiceBank+DEMAND dataset.Comment: Submmited to IEEE Transactions on Audio, Speech and Language Processin

arXiv.org e-Print Archive