2 research outputs found

Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

Authors: Blondel Mathieu, Rolet Antoine, Sawada Hiroshi, Seguy Vivien
Publication venue: Springer Science and Business Media LLC
Publication date: 15/02/2018

Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier transform (STFT) spectrogram frequencies, which takes into account how humans perceive sound. We give empirical evidence that using our proposed optimal transport NMF leads to perceptually better results than Euclidean NMF, for both isolated voice reconstruction and BSS tasks. Finally, we demonstrate how to use optimal transport for cross-domain sound processing tasks, where the frequencies represented in the input spectrograms may differ from one spectrogram to another.

Comment: 22 pages, 7 figures, 2 additional file
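To illustrate why a transport cost between spectrogram frequencies can behave differently from a Euclidean loss, here is a minimal pure-Python sketch. It is not the authors' NMF algorithm: the function name `sinkhorn_cost`, the quadratic ground cost on normalized bin indices, the regularization strength, and the toy single-peak spectra are all assumptions made for illustration (the abstract describes a perceptually motivated cost, which is not reproduced here).

```python
import math


def sinkhorn_cost(a, b, cost, reg=0.05, n_iters=200):
    """Approximate the OT cost between histograms a and b (each summing to 1)
    using Sinkhorn iterations on an entropy-regularized problem."""
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost.
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Each plan entry is u[i] * kernel[i][j] * v[j]; sum its cost.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def euclidean(a, b):
    """Plain Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def peak(n_bins, k):
    """Toy normalized spectrum with all energy in bin k (hypothetical data)."""
    return [1.0 if i == k else 0.0 for i in range(n_bins)]


N_BINS = 8
# Assumed ground cost: squared distance between normalized bin indices.
COST = [[((i - j) / N_BINS) ** 2 for j in range(N_BINS)] for i in range(N_BINS)]

target = peak(N_BINS, 2)
near = peak(N_BINS, 3)  # energy one bin away from the target
far = peak(N_BINS, 7)   # energy five bins away from the target

e_near, e_far = euclidean(target, near), euclidean(target, far)
ot_near = sinkhorn_cost(target, near, COST)
ot_far = sinkhorn_cost(target, far, COST)

print("Euclidean:", e_near, e_far)
print("OT cost:  ", ot_near, ot_far)
```

In this toy setting the Euclidean distance treats both mismatches identically because the peaks do not overlap, while the transport cost grows with how far the energy has to move along the frequency axis.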

arXiv.org e-Print Archive

Directory of Open Access Journals

Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport

Authors: Peeters Geoffroy, Richard Gaël, Torres Bernardo
Publication date: 15/01/2024

In neural audio signal processing, pitch conditioning has been used to enhance the performance of synthesizers. However, jointly training pitch estimators and synthesizers is a challenge when using a standard audio-to-audio reconstruction loss, leading to reliance on external pitch trackers. To address this issue, we propose using a spectral loss function inspired by optimal transportation theory that minimizes the displacement of spectral energy. We validate this approach through an unsupervised autoencoding task that fits a harmonic template to harmonic signals. We jointly estimate the fundamental frequency and amplitudes of harmonics using a lightweight encoder and reconstruct the signals using a differentiable harmonic synthesizer. The proposed approach offers a promising direction for improving unsupervised parameter estimation in neural audio applications.

Comment: Accepted in ICASSP 202
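The idea of a loss that measures the displacement of spectral energy can be sketched with the one-dimensional Wasserstein-1 distance, which for histograms on a shared, evenly spaced frequency grid equals the summed absolute difference of their cumulative distributions. The sketch below is an illustration under those assumptions, not the loss defined in the paper; `spectral_w1`, `squared_error`, and the single-peak toy spectra are hypothetical names and data.

```python
def spectral_w1(spec_a, spec_b):
    """1-D Wasserstein-1 distance between two magnitude spectra on the same
    evenly spaced frequency grid, measured in units of bins."""
    total_a, total_b = sum(spec_a), sum(spec_b)
    cdf_a = cdf_b = 0.0
    dist = 0.0
    for x, y in zip(spec_a, spec_b):
        # Normalize each spectrum to unit mass and accumulate its CDF.
        cdf_a += x / total_a
        cdf_b += y / total_b
        dist += abs(cdf_a - cdf_b)
    return dist


def squared_error(spec_a, spec_b):
    """Bin-wise squared error, a stand-in for a standard spectral loss."""
    return sum((x - y) ** 2 for x, y in zip(spec_a, spec_b))


def peak(n_bins, k):
    """Toy spectrum with a single partial at bin k (hypothetical data)."""
    return [1.0 if i == k else 0.0 for i in range(n_bins)]


N_BINS = 16
target = peak(N_BINS, 10)

# Candidate pitch estimates approaching the target bin.
candidates = [4, 7, 10]
w1_values = [spectral_w1(peak(N_BINS, k), target) for k in candidates]
se_values = [squared_error(peak(N_BINS, k), target) for k in candidates]

print("W1:           ", w1_values)
print("squared error:", se_values)
```

In this toy case the transport distance shrinks steadily as the candidate peak approaches the target, whereas the bin-wise error stays constant until the peaks coincide, which hints at why a bin-wise reconstruction loss gives a pitch estimator little to follow.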

arXiv.org e-Print Archive