Improving Sound Event Detection In Domestic Environments Using Sound Separation
Performing sound event detection on real-world recordings often implies
dealing with overlapping target sound events and non-target sounds, also
referred to as interference or noise. Until now, these problems have mainly
been tackled at the classifier level. We propose to use sound separation as a
pre-processing step for sound event detection. In this paper we start from a
sound separation model trained on the Free Universal Sound Separation dataset
and the DCASE 2020 task 4 sound event detection baseline. We explore different
methods to combine separated sound sources and the original mixture within the
sound event detection system. Furthermore, we investigate the impact of
adapting the sound separation model to the sound event detection data on both
the sound separation and the sound event detection performance.
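One way to combine separated sources with the original mixture, as the abstract describes, is late fusion of frame-level class probabilities. The sketch below is a minimal illustration under assumed shapes and names (`combine_predictions`, the fixed weighting); the paper's actual combination methods may differ.

```python
import numpy as np

def combine_predictions(mixture_probs, source_probs_list, weight=0.5):
    """Late-fusion sketch: average frame-level class probabilities
    from the original mixture with those from separated sources.

    mixture_probs: (frames, classes) array from SED on the mixture.
    source_probs_list: list of (frames, classes) arrays, one per
        separated source.  All names here are illustrative, not the
        paper's actual API.
    """
    source_avg = np.mean(source_probs_list, axis=0)
    return weight * mixture_probs + (1.0 - weight) * source_avg

# Toy usage: 3 frames, 2 classes, 2 separated sources.
mix = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
srcs = [np.array([[0.8, 0.2], [0.4, 0.6], [0.6, 0.4]]),
        np.array([[1.0, 0.0], [0.0, 1.0], [0.4, 0.6]])]
combined = combine_predictions(mix, srcs)
```

Other combination strategies (feeding separated sources as extra input channels, or remixing them before classification) would fit the same interface.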
Sampling-Frequency-Independent Universal Sound Separation
This paper proposes a universal sound separation (USS) method capable of
handling untrained sampling frequencies (SFs). USS aims to separate
arbitrary sources of different types and can be the key technique to realize a
source separator that can be universally used as a preprocessor for any
downstream tasks. To realize a universal source separator, there are two
essential properties: universalities with respect to source types and recording
conditions. The former property has been studied in the USS literature, which
has greatly increased the number of source types that can be handled by a
single neural network. However, the latter property (e.g., SF) has received
less attention despite its necessity. Since the SF varies widely depending on
the downstream tasks, the universal source separator must handle a wide variety
of SFs. In this paper, to encompass the two properties, we propose an
SF-independent (SFI) extension of a computationally efficient USS network,
SuDoRM-RF. The proposed network uses our previously proposed SFI convolutional
layers, which can handle various SFs by generating convolutional kernels in
accordance with an input SF. Experiments show that signal resampling can
degrade USS performance and that the proposed method works more consistently
than signal-resampling-based methods across various SFs.
Comment: Submitted to ICASSP202
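The idea of generating a convolutional kernel in accordance with the input SF can be illustrated by sampling a fixed continuous-time prototype filter at whatever rate the input uses. This is a simplified stand-in for the SFI layer, not the SuDoRM-RF extension itself; the windowed-sinc prototype and every name below are assumptions.

```python
import numpy as np

def make_sfi_kernel(sf, num_taps=33, cutoff_hz=4000.0):
    """Sample a continuous-time lowpass prototype (windowed sinc)
    at sampling frequency `sf` to obtain a discrete kernel.

    The analog filter is fixed; only the discrete kernel changes
    with the SF, which mirrors the SFI idea in spirit.  This is an
    illustrative sketch, not the paper's parameterized layer.
    """
    # Tap positions in seconds, centered around zero.
    t = (np.arange(num_taps) - (num_taps - 1) / 2) / sf
    kernel = 2.0 * cutoff_hz * np.sinc(2.0 * cutoff_hz * t)
    kernel *= np.hamming(num_taps)       # reduce truncation ripple
    return kernel / np.sum(kernel)       # unit DC gain

# The same analog prototype yields different discrete kernels
# at different sampling frequencies.
k16 = make_sfi_kernel(16000.0)
k44 = make_sfi_kernel(44100.0)
```

Because the kernel is regenerated per SF rather than fixed at training time, no signal resampling step is needed before the network.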
Audio Prompt Tuning for Universal Sound Separation
Universal sound separation (USS) is a task to separate arbitrary sounds from
an audio mixture. Existing USS systems are capable of separating arbitrary
sources, given a few examples of the target sources as queries. However,
separating arbitrary sounds with a single system is challenging, and the
robustness is not always guaranteed. In this work, we propose audio prompt
tuning (APT), a simple yet effective approach to enhance existing USS systems.
Specifically, APT improves the separation performance of specific sources
through training a small number of prompt parameters with limited audio
samples, while maintaining the generalization of the USS model by keeping its
parameters frozen. We evaluate the proposed method on MUSDB18 and ESC-50
datasets. Compared with the baseline model, APT improves the
signal-to-distortion ratio by 0.67 dB and 2.06 dB, respectively, when using
the full training set of each dataset. Moreover, APT with only 5 audio samples
even outperforms the baseline system trained on the full ESC-50 training data,
indicating the great potential of few-shot APT.
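The core mechanism described above, training a small number of prompt parameters while the USS model's weights stay frozen, can be sketched as prepending trainable prompt tokens to the audio embedding sequence. Everything below (the linear stand-in for the separation network, the shapes) is illustrative, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "separation model" weights: a stand-in linear layer.
W_frozen = rng.standard_normal((8, 8))

# Small trainable prompt: a few embedding vectors prepended to the
# input sequence.  Only these would receive gradient updates.
prompt = np.zeros((4, 8))                    # 4 prompt tokens, dim 8
audio_embed = rng.standard_normal((16, 8))   # 16 audio frames

def forward(prompt, audio_embed):
    """Prepend prompt tokens to the audio embeddings, then apply the
    frozen layer.  Illustrative only: the real APT system conditions
    a full USS network, not a single matrix multiply."""
    x = np.concatenate([prompt, audio_embed], axis=0)  # (4 + 16, 8)
    return x @ W_frozen

out = forward(prompt, audio_embed)  # shape (20, 8)

# The trainable budget is tiny compared with the frozen model,
# which is what preserves the USS model's generalization.
assert prompt.size < W_frozen.size
```

Training would backpropagate a separation loss into `prompt` alone, leaving `W_frozen` untouched.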
Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation
The audio-visual sound separation field assumes visible sources in videos,
but this excludes invisible sounds beyond the camera's view. Current methods
struggle with such sounds lacking visible cues. This paper introduces a novel
"Audio-Visual Scene-Aware Separation" (AVSA-Sep) framework. It includes a
semantic parser for visible and invisible sounds and a separator for
scene-informed separation. AVSA-Sep successfully separates both sound types,
with joint training and cross-modal alignment enhancing effectiveness.
Comment: Accepted at ICCV 2023 - AV4D, 4 figures, 3 tables