Spherical-harmonics-based sound field decomposition and multichannel NMF for sound source separation
Direction Specific Ambisonics Source Separation with End-To-End Deep Learning
Ambisonics is a scene-based spatial audio format that offers several useful
features compared to object-based formats, such as efficient whole-scene
rotation and versatility. However, it does not provide direct access to the
individual source signals, so these must be separated from the mixture
when required. This is typically done with linear spherical harmonics (SH)
beamforming. In this paper, we explore deep-learning-based source separation on
static Ambisonics mixtures. In contrast to most source separation approaches,
which separate a fixed number of sources of specific sound types, we focus on
separating arbitrary sounds from specific directions. Specifically, we propose
three operating modes that combine a source separation neural network with SH
beamforming: refinement, implicit, and mixed mode. We show that a neural
network can implicitly associate conditioning directions with the spatial
information contained in the Ambisonics scene to extract specific sources. We
evaluate the three proposed approaches and compare them to SH beamforming on
musical mixtures generated from the musdb18 dataset and on mixtures generated
from the FUSS dataset for universal source separation, under both anechoic and
room conditions. Results show that the proposed approaches offer improved
separation performance and spatial selectivity compared to conventional SH
beamforming.
Comment: To be published in Acta Acustica. Code and listening examples:
https://github.com/francesclluis/direction-ambisonics-source-separatio
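To make the conventional baseline concrete, here is a minimal sketch of linear SH beamforming on a static first-order Ambisonics scene. It assumes ACN channel ordering (W, Y, Z, X), SN3D normalization, far-field plane-wave sources, and a max-directivity (hypercardioid) order weighting. The paper's beamformer order and weighting may differ, and the function names are hypothetical illustrations, not taken from the linked repository. The example shows why a purely linear beam has limited spatial selectivity: a source 90° off-axis still leaks through at a quarter of its amplitude.

```python
import numpy as np


def sh_first_order_sn3d(azimuth, elevation):
    """Real first-order SH in ACN order (W, Y, Z, X), SN3D normalization.

    Angles are in radians; azimuth is counter-clockwise from the front (x axis),
    elevation is upward from the horizontal plane.
    """
    return np.array([
        1.0,                                  # ACN 0: W (omnidirectional)
        np.sin(azimuth) * np.cos(elevation),  # ACN 1: Y
        np.sin(elevation),                    # ACN 2: Z
        np.cos(azimuth) * np.cos(elevation),  # ACN 3: X
    ])


def encode(signal, azimuth, elevation):
    """Encode a mono signal as a plane wave into a (4, T) first-order scene."""
    return np.outer(sh_first_order_sn3d(azimuth, elevation), signal)


def sh_beamform(scene, azimuth, elevation):
    """Linear SH beamformer steered to (azimuth, elevation).

    With SN3D signals, the SH addition theorem gives sum_m Y(look) Y(src) = P_n(cos gamma).
    Weighting order n by (2n + 1) and dividing by (N + 1)^2 = 4 yields the
    hypercardioid pattern (1 + 3 cos gamma) / 4, i.e. unit gain in the look direction.
    """
    order_weights = np.array([1.0, 3.0, 3.0, 3.0])  # (2n + 1) for n = 0, 1, 1, 1
    w = order_weights * sh_first_order_sn3d(azimuth, elevation) / 4.0
    return w @ scene  # weighted sum over the 4 SH channels -> (T,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_front = rng.standard_normal(1000)
    s_left = rng.standard_normal(1000)
    # Static two-source mixture: one source in front, one at 90 degrees to the left.
    scene = encode(s_front, 0.0, 0.0) + encode(s_left, np.pi / 2, 0.0)
    out = sh_beamform(scene, 0.0, 0.0)
    # Residual leakage of the off-axis source relative to its own level (about 0.25).
    print(np.std(out - s_front) / np.std(s_left))
```

Because the beam pattern is fixed at (1 + 3 cos γ)/4, the off-axis source at 90° passes with gain 0.25, and a source directly behind passes with gain -0.5. This residual leakage is the limited spatial selectivity that the learned approaches described in the abstract aim to improve on.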