Acoustic event detection for multiple overlapping similar sources
Many current paradigms for acoustic event detection (AED) are not adapted to
the organic variability of natural sounds, and/or they assume a limit on the
number of simultaneous sources: often only one source, or one source of each
type, may be active. These aspects are highly undesirable for applications such
as bird population monitoring. We introduce a simple method modelling the
onsets, durations and offsets of acoustic events to avoid intrinsic limits on
polyphony or on inter-event temporal patterns. We evaluate the method in a case
study with over 3000 zebra finch calls. In comparison against an HMM-based
method, we find it more accurate at recovering acoustic events and more robust
for estimating calling rates.
Comment: Accepted for WASPAA 201
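The abstract's key point, that events are modelled by their onsets, durations and offsets with no cap on polyphony, can be illustrated with a small sketch. The event tuples, function names and the toy data below are hypothetical, not the paper's actual representation:

```python
import numpy as np

# Hypothetical event list: (onset_s, duration_s, label). Illustrative only.
events = [(0.10, 0.25, "call"), (0.20, 0.30, "call"), (1.50, 0.20, "call")]

def calling_rate(events, recording_len_s):
    """Events per second over a recording of the given length."""
    return len(events) / recording_len_s

def max_polyphony(events):
    """Maximum number of simultaneously active events.

    Nothing here imposes an intrinsic limit on how many events may
    overlap: we simply sweep over onset (+1) and offset (-1) points.
    """
    points = []
    for onset, dur, _ in events:
        points.append((onset, +1))          # event starts
        points.append((onset + dur, -1))    # event ends
    points.sort()
    active = peak = 0
    for _, delta in points:
        active += delta
        peak = max(peak, active)
    return peak
```

With the toy data above, the first two calls overlap, so the sweep reports a polyphony of two; any number of overlapping sources would be handled the same way.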
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
In this paper, we propose a novel four-stage data augmentation approach to
ResNet-Conformer based acoustic modeling for sound event localization and
detection (SELD). First, we explore two spatial augmentation techniques, namely
audio channel swapping (ACS) and multi-channel simulation (MCS), to deal with
data sparsity in SELD. ACS and MCS focus on augmenting the limited training
data by expanding direction of arrival (DOA) representations such that the
acoustic models trained with the augmented data are robust to localization
variations of acoustic sources. Next, time-domain mixing (TDM) and
time-frequency masking (TFM) are also investigated to deal with overlapping
sound events and data diversity. Finally, ACS, MCS, TDM and TFM are combined in
a step-by-step manner to form an effective four-stage data augmentation scheme.
Tested on the Detection and Classification of Acoustic Scenes and Events
(DCASE) 2020 data sets, our proposed augmentation approach greatly improves the
system performance, ranking our submitted system in the first place in the SELD
task of DCASE 2020 Challenge. Furthermore, we employ a ResNet-Conformer
architecture to model both global and local context dependencies of an audio
sequence to yield further gains over those architectures used in the DCASE 2020
SELD evaluations.
Comment: 12 pages, 8 figures
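Two of the augmentations named above are simple to sketch in isolation. The snippet below is a minimal illustration of channel swapping (ACS-style) and time-domain mixing (TDM-style) on toy multichannel arrays; the function names, shapes and random data are assumptions for demonstration, not the paper's pipeline, and a real ACS step must also transform the DOA labels to match the new channel order:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 4-channel clips, shape (channels, samples). Illustrative only.
clip_a = rng.standard_normal((4, 16000))
clip_b = rng.standard_normal((4, 16000))

def channel_swap(x, perm):
    """ACS-style augmentation: permute the audio channels.

    In SELD training the DOA targets must be remapped consistently
    with `perm`, which is omitted here for brevity.
    """
    return x[list(perm), :]

def time_domain_mix(x, y, alpha=0.5):
    """TDM-style augmentation: mix two clips to simulate overlapping
    sound events. The union of both clips' event labels would be kept."""
    return alpha * x + (1.0 - alpha) * y
```

Mixing produces training examples with overlapping events, while channel swapping multiplies the effective spatial coverage of a limited dataset.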
Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Instrument classification is one of the fields in Music Information Retrieval
(MIR) that has attracted a lot of research interest. However, the majority of
that research deals with monophonic music, while efforts on polyphonic material
mainly focus on predominant instrument recognition. In this paper, we propose
an approach for instrument classification in polyphonic music using purely
monophonic data, which involves performing data augmentation by mixing different
audio segments. A variety of data augmentation techniques focusing on different
sonic aspects, such as overlaying audio segments of the same genre, as well as
pitch and tempo-based synchronization, are explored. We utilize Convolutional
Neural Networks for the classification task, comparing shallow to deep network
architectures. We further investigate the usage of a combination of the above
classifiers, each trained on a single augmented dataset. An ensemble of
VGG-like classifiers, trained on non-augmented, pitch-synchronized,
tempo-synchronized and genre-similar excerpts, respectively, yields the best
results, achieving slightly above 80% in terms of label ranking average
precision (LRAP) on the IRMAS test set.
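The core augmentation idea, overlaying monophonic excerpts to create pseudo-polyphonic training examples, can be sketched as follows. The function, gain parameter and labels below are hypothetical and stand in for whichever mixing, pitch- or tempo-synchronization step is applied:

```python
import numpy as np

def mix_segments(seg_a, label_a, seg_b, label_b, gain_b=0.8):
    """Overlay two monophonic excerpts into one pseudo-polyphonic
    example; the training target becomes the union of both labels."""
    n = min(len(seg_a), len(seg_b))          # trim to the shorter segment
    mix = seg_a[:n] + gain_b * seg_b[:n]
    peak = np.max(np.abs(mix))
    if peak > 1.0:                           # normalize to avoid clipping
        mix = mix / peak
    return mix, sorted({label_a, label_b})
```

The same pattern extends to the variants described above: restricting the pair to the same genre, pitch-shifting one excerpt before mixing, or time-stretching to align tempi.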