Low Rank and Sparsity Analysis Applied to Speech Enhancement via Online Estimated Dictionary
In this letter, we propose an online estimated local dictionary based single-channel speech enhancement algorithm, which focuses on low-rank and sparse matrix decomposition. In the proposed algorithm, a noisy speech spectrogram is decomposed into low-rank background noise components and an activation of the online speech dictionary, on which both low-rank and sparsity constraints are imposed. This decomposition takes advantage of the locally estimated exemplars' high expressiveness on speech components and also accommodates nonstationary background noise. The local dictionary is obtained by estimating the speech presence probability (SPP) with an expectation–maximization algorithm, in which a generalized Gamma prior for the speech magnitude spectrum is used. The proposed algorithm is evaluated using the signal-to-distortion ratio and the perceptual evaluation of speech quality. The results show that the proposed algorithm achieves significant improvements at various SNRs when compared to four other speech enhancement algorithms: the improved Karhunen–Loève transform approach, SPP-based MMSE, nonnegative matrix factorization-based robust principal component analysis (RPCA), and RPCA.
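As a rough illustration of the low-rank + sparse split this abstract builds on, the sketch below runs plain RPCA (inexact augmented Lagrangian) on a magnitude spectrogram. It does not reproduce the paper's online dictionary estimation or SPP-based prior, and all parameter values are illustrative.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_decompose(S, lam=None, n_iter=100, mu=1.0):
    """Split a magnitude spectrogram S into a low-rank part L (background
    noise) and a sparse part E (speech activity) by solving
    min ||L||_* + lam * ||E||_1  s.t.  S = L + E  (inexact ALM)."""
    m, n = S.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))   # common default for RPCA
    L = np.zeros_like(S)
    E = np.zeros_like(S)
    Y = np.zeros_like(S)                 # Lagrange multiplier
    for _ in range(n_iter):
        # Low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(S - E + Y / mu, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: soft-thresholding
        E = soft_threshold(S - L + Y / mu, lam / mu)
        # Dual ascent on the equality constraint
        Y += mu * (S - L - E)
    return L, E
```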
Adaptive Noise Reduction for Sound Event Detection Using Subband-Weighted NMF
Sound event detection in real-world environments suffers from interference by non-stationary and time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF). First, a noise dictionary is learned from the input noisy signal by robust NMF, which supports adaptation to noise variations. The estimated noise dictionary is then used in a supervised source separation framework in combination with a pre-trained event dictionary. Second, to improve separation quality, we extend the basic NMF model to a weighted form, with the aim of varying the relative importance of the different components when separating a target sound event from noise. With properly designed weights, the separation process is forced to rely more on the dominant event components, while the noise is greatly suppressed. The proposed method is evaluated on the dataset of the rare sound event detection task of the DCASE 2017 challenge and achieves results comparable to the top-ranking system based on convolutional recurrent neural networks (CRNNs). The proposed weighted NMF method shows excellent noise reduction ability, improving the F-score by 5% compared to the unweighted approach.
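A minimal sketch of the weighted-NMF separation step described above, assuming a fixed concatenated dictionary and per-bin weights W supplied by the caller; the paper's dictionary learning and weight design are not reproduced here.

```python
import numpy as np

def weighted_nmf_activations(V, B, W, n_iter=200, eps=1e-9):
    """Estimate activations H for a fixed dictionary B under the weighted
    Euclidean NMF objective ||W * (V - B @ H)||_F^2, where W holds per-bin
    weights emphasizing dominant event components."""
    H = np.random.rand(B.shape[1], V.shape[1])
    WV = W * V
    for _ in range(n_iter):
        WBH = W * (B @ H)
        # Multiplicative update: keeps H non-negative throughout
        H *= (B.T @ WV) / (B.T @ WBH + eps)
    return H
```

With B formed by concatenating a pre-trained event dictionary and the noise dictionary, a Wiener-style mask built from the event part of B @ H over the full reconstruction recovers the target event.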
Hybrid sparse and low-rank time-frequency signal decomposition
We propose a new hybrid (or morphological) generative model that decomposes a signal into two (and possibly more) layers. Each layer is a linear combination of localised atoms from a time-frequency dictionary. One layer has a low-rank time-frequency structure while the other has a sparse structure. The time-frequency resolutions of the dictionaries describing each layer may differ. Our contribution builds on the recently introduced Low-Rank Time-Frequency Synthesis (LRTFS) model and proposes an iterative algorithm similar to the popular iterative shrinkage/thresholding algorithm. We illustrate the capacities of the proposed model and estimation procedure on a tonal + transient audio decomposition example. Index terms: low-rank time-frequency synthesis, sparse component analysis, hybrid/morphological decompositions, non-negative matrix factorisation.
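The sketch below illustrates the kind of ISTA-style alternating proximal scheme the abstract describes, under assumed simplifications: two STFT dictionaries (long window for the tonal layer, short window for the transient layer), singular value thresholding as the low-rank proximal step and soft-thresholding as the sparse one. This is not the LRTFS algorithm itself (LRTFS models the low-rank layer differently); all names and parameters are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

def _fit_len(y, n):
    """Truncate or zero-pad a signal to length n."""
    return y[:n] if len(y) >= n else np.pad(y, (0, n - len(y)))

def _svt(Z, tau):
    """Singular value thresholding on the magnitude (low-rank prox), phase kept."""
    U, s, Vt = np.linalg.svd(np.abs(Z), full_matrices=False)
    mag = np.maximum(U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt, 0.0)
    return mag * np.exp(1j * np.angle(Z))

def _soft(Z, tau):
    """Complex soft-thresholding (sparse prox)."""
    return np.maximum(np.abs(Z) - tau, 0.0) * np.exp(1j * np.angle(Z))

def hybrid_decompose(x, fs, n_tonal=4096, n_trans=256, n_iter=50,
                     lam_lr=0.05, lam_sp=0.05, step=0.5):
    """Alternating proximal-gradient (ISTA-like) updates on two STFT layers,
    x ~ tonal + transient: the tonal layer is pushed low-rank, the transient
    layer sparse."""
    _, _, A = stft(x, fs, nperseg=n_tonal)
    _, _, B = stft(x, fs, nperseg=n_trans)
    A = np.zeros_like(A)
    B = np.zeros_like(B)
    for _ in range(n_iter):
        _, xa = istft(A, fs, nperseg=n_tonal)
        _, xb = istft(B, fs, nperseg=n_trans)
        r = x - _fit_len(xa, len(x)) - _fit_len(xb, len(x))  # data residual
        _, _, Ra = stft(r, fs, nperseg=n_tonal)
        _, _, Rb = stft(r, fs, nperseg=n_trans)
        A = _svt(A + step * Ra, step * lam_lr)   # low-rank prox, tonal layer
        B = _soft(B + step * Rb, step * lam_sp)  # sparse prox, transient layer
    _, tonal = istft(A, fs, nperseg=n_tonal)
    _, trans = istft(B, fs, nperseg=n_trans)
    return _fit_len(tonal, len(x)), _fit_len(trans, len(x))
```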
Single Channel auditory source separation with neural network
Although distinguishing different sounds in a noisy environment is a relatively easy task for humans, source separation has long been extremely difficult in audio signal processing. The problem is challenging for three reasons: the large variety of sound types, the abundance of mixing conditions, and the unclear mechanism by which sources are distinguished, especially similar-sounding ones.
In recent years, neural-network-based methods have achieved impressive successes on various problems, including speech enhancement, where the task is to separate clean speech from a noisy mixture. However, current deep-learning-based source separators do not perform well on real recorded noisy speech and, more importantly, are not applicable to more general source separation scenarios such as overlapped speech.
In this thesis, we first propose extensions to the current mask learning network for speech enhancement to fix the scale mismatch problem that commonly occurs in real recorded audio. We solve this problem by adding two restoration layers to the existing mask learning network. We also propose a residual learning architecture for speech enhancement, further improving the network's generalization across different recording conditions. We evaluate the proposed speech enhancement models on CHiME-3 data. Without retraining the acoustic model, the best bidirectional LSTM with residual connections yields a 25.13% relative WER reduction on real data and 34.03% on simulated data.
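As a sketch of the mask-learning setup discussed here, the PyTorch snippet below shows a BLSTM mask estimator with a residual connection between recurrent layers. It omits the restoration layers; layer sizes and architecture details are assumptions, not the thesis configuration.

```python
import torch
import torch.nn as nn

class ResidualBLSTMMask(nn.Module):
    """Mask-learning enhancer sketch: a two-layer BLSTM with a residual
    connection predicts a [0, 1] time-frequency mask that is applied to
    the noisy magnitude spectrogram."""
    def __init__(self, n_freq=257, hidden=300):
        super().__init__()
        self.blstm1 = nn.LSTM(n_freq, hidden, batch_first=True,
                              bidirectional=True)
        self.blstm2 = nn.LSTM(2 * hidden, hidden, batch_first=True,
                              bidirectional=True)
        self.mask = nn.Sequential(nn.Linear(2 * hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):          # (batch, frames, n_freq)
        h1, _ = self.blstm1(noisy_mag)
        h2, _ = self.blstm2(h1)
        h2 = h2 + h1                       # residual connection across layers
        m = self.mask(h2)                  # soft time-frequency mask in [0, 1]
        return m * noisy_mag               # enhanced magnitude estimate
```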
Then we propose a novel neural-network-based model called “deep clustering” for more general source separation tasks. We train a deep network to assign contrastive embedding vectors to each time-frequency region of the spectrogram, so as to implicitly predict the segmentation labels of the target spectrogram from the input mixture. This yields a deep-network analogue of spectral clustering, in that the embeddings form a low-rank pairwise affinity matrix that approximates the ideal affinity matrix while enabling much faster computation. At test time, the clustering step “decodes” the segmentation implicit in the embeddings by optimizing K-means with respect to the unknown assignments. Experiments on single-channel mixtures show that a speaker-independent model trained on two-speaker and three-speaker mixtures improves signal quality for mixtures of held-out speakers by an average of over 10 dB.
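To make the test-time “decoding” step concrete, here is a minimal sketch: K-means on the learned per-bin embeddings yields one binary mask per source. The trained embedding network is assumed given; shapes and names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def decode_deep_clustering(embeddings, mix_mag, n_sources=2):
    """Deep-clustering decoding: cluster per-bin embeddings with K-means,
    turn cluster labels into binary time-frequency masks, and apply them.
    embeddings: (frames, freqs, d) output of a trained embedding network
    mix_mag:    (frames, freqs) mixture magnitude spectrogram"""
    F, Q, d = embeddings.shape
    labels = KMeans(n_clusters=n_sources, n_init=10).fit_predict(
        embeddings.reshape(-1, d))
    masks = [(labels == k).reshape(F, Q) for k in range(n_sources)]
    return [m * mix_mag for m in masks]   # masked magnitude per source
```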
We then propose an extension of deep clustering, named the “deep attractor” network, that allows efficient end-to-end training. In the proposed model, attractor points for each source are first created from the acoustic signals by finding the centroids of the sources in the embedding space; these attractors pull together the time-frequency bins corresponding to each source and are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. We show that this framework achieves even better results.
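A compact sketch of the attractor computation as described: assignment-weighted centroids of the embeddings act as attractors, and a softmax over embedding-attractor similarity produces the masks. The oracle assignments Y correspond to the training-time setup; variable names are illustrative.

```python
import torch

def attractor_masks(V, Y, mix_mag):
    """Deep-attractor sketch.
    V:       (bins, d) embeddings, one per time-frequency bin
    Y:       (bins, n_src) source assignments (training-time oracle)
    mix_mag: (bins,) mixture magnitudes"""
    # Attractor per source: centroid of that source's embeddings
    A = (Y.t() @ V) / (Y.sum(dim=0, keepdim=True).t() + 1e-8)  # (n_src, d)
    sim = V @ A.t()                  # similarity of each bin to each attractor
    M = torch.softmax(sim, dim=1)    # soft masks, one column per source
    return M * mix_mag.unsqueeze(1)  # masked magnitude per source
```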
Lastly, we introduce two applications of the proposed models: singing voice separation and a smart hearing aid device. For the former, we propose a multi-task architecture that combines deep clustering with a classification-based network and achieves a new state-of-the-art separation result, improving the signal-to-noise ratio by 11.1 dB on music and 7.9 dB on singing voice. For the smart hearing aid device, we combine neural decoding with the separation network: the system first decodes the user's attention, which is then used to guide the separator toward the target source. Both objective and subjective studies show that the proposed system can accurately decode attention and significantly improve the user experience.
Denoising sound signals in a bioinspired non-negative spectro-temporal domain
The representation of sound signals at the cochlear and auditory cortical level has been studied as an alternative to classical analysis methods. In this work, we put forward a recently proposed feature extraction method called the approximate auditory cortical representation, based on an approximation to the statistics of discharge patterns at the primary auditory cortex. The proposed approach estimates a non-negative sparse coding with a combined dictionary of atoms. These atoms represent the spectro-temporal receptive fields of auditory cortical neurons and are calculated from the auditory spectrograms of the clean signal and the noise. Denoising is carried out on noisy signals by reconstructing the signal while discarding the atoms corresponding to the noise. Experiments are presented using synthetic (chirps) and real (speech) data in the presence of additive noise. For the evaluation of the new method and its variants, we used two objective measures: the perceptual evaluation of speech quality and the segmental signal-to-noise ratio. Results show that the proposed method improves the quality of the signals, mainly under severe degradation.
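A minimal sketch of the denoising principle stated above, with sparse-NMF multiplicative updates standing in for the paper's actual inference: code the noisy spectrogram over a concatenated speech + noise dictionary, then reconstruct from the speech atoms only. Dictionary contents and parameters are assumptions.

```python
import numpy as np

def nn_sparse_denoise(S, D_speech, D_noise, lam=0.1, n_iter=200, eps=1e-9):
    """Non-negative sparse coding over a combined dictionary, minimizing
    ||S - D @ H||_F^2 + lam * ||H||_1 with H >= 0, then reconstruction
    using only the speech atoms (noise-atom activations are discarded)."""
    D = np.hstack([D_speech, D_noise])   # combined dictionary of atoms
    H = np.random.rand(D.shape[1], S.shape[1])
    for _ in range(n_iter):
        # Multiplicative update for sparse NMF activations (H stays >= 0)
        H *= (D.T @ S) / (D.T @ (D @ H) + lam + eps)
    k = D_speech.shape[1]
    return D_speech @ H[:k]              # denoised spectrogram estimate
```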