2 research outputs found
Blind Source Separation with Optimal Transport Non-negative Matrix Factorization
Optimal transport as a loss for machine learning optimization problems has
recently gained a lot of attention. Building upon recent advances in
computational optimal transport, we develop an optimal transport non-negative
matrix factorization (NMF) algorithm for supervised speech blind source
separation (BSS). Optimal transport allows us to design and leverage a cost
between short-time Fourier transform (STFT) spectrogram frequencies, which
takes into account how humans perceive sound. We give empirical evidence that
using our proposed optimal transport NMF leads to perceptually better results
than Euclidean NMF, for both isolated voice reconstruction and BSS tasks.
Finally, we demonstrate how to use optimal transport for cross-domain sound
processing tasks, where frequencies represented in the input spectrograms may
be different from one spectrogram to another.
Comment: 22 pages, 7 figures, 2 additional file
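To illustrate why a transport cost defined between spectrogram frequencies can behave differently from a Euclidean loss, here is a minimal sketch, not the authors' algorithm. It compares single-peak spectral frames using an entropy-regularized optimal transport cost computed with Sinkhorn iterations. The squared log-frequency cost matrix, the regularization strength `eps`, the frequency grid, and all function names are assumptions chosen for illustration; the abstract does not specify the cost actually used.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.05, n_iter=200):
    """Entropy-regularized OT cost between histograms a and b (each sums to 1).

    C[i, j] is the cost of moving mass from frequency bin i to bin j.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # match the column marginal b
        u = a / (K @ v)                  # match the row marginal a
    P = u[:, None] * K * v[None, :]      # transport plan
    return float(np.sum(P * C))

# Hypothetical frequency grid: 40 bins from 100 Hz to 4000 Hz.
freqs = np.linspace(100.0, 4000.0, 40)

# Assumed cost: squared distance in log-frequency, scaled to [0, 1].
log_f = np.log2(freqs)
C = (log_f[:, None] - log_f[None, :]) ** 2
C = C / C.max()

def peak(bin_index, n_bins=40):
    """Normalized spectrum with all energy in one frequency bin."""
    x = np.zeros(n_bins)
    x[bin_index] = 1.0
    return x

target = peak(10)
near = peak(11)   # energy shifted by one bin
far = peak(30)    # energy shifted by twenty bins

l2_near = np.linalg.norm(target - near)
l2_far = np.linalg.norm(target - far)
ot_near = sinkhorn_cost(target, near, C)
ot_far = sinkhorn_cost(target, far, C)

print("Euclidean:", l2_near, l2_far)
print("Transport:", ot_near, ot_far)
```

In this toy setting the Euclidean distance is the same for the small and the large shift, while the transport cost grows with how far the energy has to move along the frequency axis.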
Unsupervised Harmonic Parameter Estimation Using Differentiable DSP and Spectral Optimal Transport
In neural audio signal processing, pitch conditioning has been used to
enhance the performance of synthesizers. However, jointly training pitch
estimators and synthesizers is a challenge when using standard audio-to-audio
reconstruction loss, leading to reliance on external pitch trackers. To address
this issue, we propose using a spectral loss function inspired by optimal
transportation theory that minimizes the displacement of spectral energy. We
validate this approach through an unsupervised autoencoding task that fits a
harmonic template to harmonic signals. We jointly estimate the fundamental
frequency and amplitudes of harmonics using a lightweight encoder and
reconstruct the signals using a differentiable harmonic synthesizer. The
proposed approach offers a promising direction for improving unsupervised
parameter estimation in neural audio applications.
Comment: Accepted in ICASSP 202
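The idea of a spectral loss that measures the displacement of spectral energy can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: it uses the one-dimensional Wasserstein-1 distance computed from cumulative sums of normalized spectra, a hypothetical harmonic template made of Gaussian peaks, and a plain grid search over candidate fundamental frequencies in place of the encoder and differentiable synthesizer described in the abstract. All names and parameter values are illustrative.

```python
import numpy as np

def harmonic_spectrum(f0, freqs, n_harmonics=5, width=10.0):
    """Hypothetical harmonic template: Gaussian peaks at k * f0 with 1/k amplitudes."""
    spec = np.zeros_like(freqs)
    for k in range(1, n_harmonics + 1):
        spec += np.exp(-0.5 * ((freqs - k * f0) / width) ** 2) / k
    return spec / spec.sum()             # normalize to unit mass

def wasserstein_1d(p, q, freqs):
    """Wasserstein-1 distance between two normalized spectra on a uniform grid.

    In one dimension this equals the integral of |CDF_p - CDF_q|.
    """
    df = freqs[1] - freqs[0]
    return float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * df)

freqs = np.linspace(0.0, 4000.0, 2001)   # uniform grid with 2 Hz spacing
target = harmonic_spectrum(220.0, freqs)

# The loss grows as the candidate f0 moves away from the true value.
shifted = [230.0, 260.0, 300.0]
w_losses = [wasserstein_1d(harmonic_spectrum(f, freqs), target, freqs)
            for f in shifted]

# Grid search as a stand-in for gradient-based training (an assumption).
candidates = np.arange(150.0, 351.0, 10.0)
grid_losses = [wasserstein_1d(harmonic_spectrum(f, freqs), target, freqs)
               for f in candidates]
best_f0 = float(candidates[int(np.argmin(grid_losses))])

print("Losses for shifted f0:", w_losses)
print("Estimated f0:", best_f0)
```

Because the loss keeps increasing with the frequency offset even when the harmonic peaks no longer overlap, it gives a useful signal about which direction to move the pitch estimate, which is the property the abstract relies on.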