5 research outputs found
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
In this paper, we propose a novel four-stage data augmentation approach to
ResNet-Conformer based acoustic modeling for sound event localization and
detection (SELD). First, we explore two spatial augmentation techniques, namely
audio channel swapping (ACS) and multi-channel simulation (MCS), to deal with
data sparsity in SELD. ACS and MDS focus on augmenting the limited training
data with expanding direction of arrival (DOA) representations such that the
acoustic models trained with the augmented data are robust to localization
variations of acoustic sources. Next, time-domain mixing (TDM) and
time-frequency masking (TFM) are also investigated to deal with overlapping
sound events and data diversity. Finally, ACS, MCS, TDM and TFM are combined in
a step-by-step manner to form an effective four-stage data augmentation scheme.
Tested on the Detection and Classification of Acoustic Scenes and Events
(DCASE) 2020 data sets, our proposed augmentation approach greatly improves the
system performance, ranking our submitted system in the first place in the SELD
task of DCASE 2020 Challenge. Furthermore, we employ a ResNet-Conformer
architecture to model both global and local context dependencies of an audio
sequence to yield further gains over those architectures used in the DCASE 2020
SELD evaluations.Comment: 12 pages, 8 figure