Anomalous sound detection for machine condition monitoring has great
potential in the development of Industry 4.0. However, anomalous machine sounds
are rarely available in practice, because machines usually operate under normal
conditions. Therefore, models must learn acoustic representations from normal
sounds alone during training and then detect anomalous sounds at test time. In
this article, we
propose a self-supervised dual-path Transformer (SSDPT) network to detect
anomalous sounds in machine monitoring. The SSDPT network splits the acoustic
features into segments and employs several DPT blocks for time and frequency
modeling. Each DPT block uses attention modules to alternately model the
interactions along the frequency and time dimensions of the segmented acoustic
features. To address the lack of anomalous sounds, we adopt a self-supervised
learning approach that trains the network on normal sounds only.
Specifically, this approach randomly masks the acoustic features and
reconstructs them while jointly classifying machine identity, which improves
anomalous sound detection performance. We evaluated our method on the
DCASE 2021 Task 2 dataset. The experimental results show that the SSDPT network
achieves a significantly higher harmonic mean AUC score than current
state-of-the-art anomalous sound detection methods.
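The segmentation and dual-path modeling described above can be illustrated with a small pure-Python sketch. It is not the SSDPT implementation: the time-axis segmentation, the segment length, the single-head attention with no learned query/key/value projections, the residual connections, and the helper names (`segment_features`, `self_attention`, `dual_path_block`) are all assumptions made for illustration. The sketch splits a (time x frequency) feature matrix into non-overlapping segments, then, within each segment, attends across frequency bins and afterwards across frames.

```python
import math
from typing import List

Matrix = List[List[float]]


def segment_features(spec: Matrix, seg_len: int) -> List[Matrix]:
    """Split a (time x freq) feature matrix into non-overlapping time segments.

    Trailing frames that do not fill a whole segment are dropped (an assumption).
    """
    n_segs = len(spec) // seg_len
    return [spec[i * seg_len:(i + 1) * seg_len] for i in range(n_segs)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    Queries, keys, and values are the tokens themselves (no learned
    projections), a simplification for illustration only.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used here as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def dual_path_block(segment: Matrix) -> Matrix:
    """Hypothetical dual-path block on one (time x freq) segment.

    Frequency path: each frequency bin is a token that attends to the other bins.
    Time path: each frame is a token that attends to the other frames.
    """
    freq_tokens = transpose(segment)  # (freq x time)
    freq_out = add(freq_tokens, self_attention(freq_tokens))
    time_tokens = transpose(freq_out)  # back to (time x freq)
    return add(time_tokens, self_attention(time_tokens))


# Usage: 10 frames x 4 frequency bins of toy values.
spec = [[float((t * 7 + f * 3) % 5) for f in range(4)] for t in range(10)]
segments = segment_features(spec, seg_len=4)  # 2 segments; the last 2 frames are dropped
outputs = [dual_path_block(seg) for seg in segments]
```

Each block preserves the segment shape, so stacking several `dual_path_block` calls alternates the frequency and time paths repeatedly, which is one way to read "several DPT blocks" in the abstract.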
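The self-supervised objective described above (random masking, reconstruction, and joint machine-identity classification) can be sketched as a joint loss. This is a hedged sketch, not the paper's training code: masking whole time frames, the mask ratio, filling masked frames with zeros, computing the squared error only on masked frames, the weight `alpha`, and all function names are assumptions. The network itself is abstracted away, so the reconstruction and the ID logits are passed in directly.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def random_frame_mask(spec: Matrix, mask_ratio: float,
                      rng: random.Random) -> Tuple[Matrix, List[int]]:
    """Zero out a random subset of time frames; return the masked copy and indices."""
    n_frames = len(spec)
    n_mask = max(1, int(round(mask_ratio * n_frames)))
    masked_idx = sorted(rng.sample(range(n_frames), n_mask))
    masked = [row[:] for row in spec]
    for t in masked_idx:
        masked[t] = [0.0] * len(spec[t])
    return masked, masked_idx


def masked_mse(target: Matrix, recon: Matrix, masked_idx: Sequence[int]) -> float:
    """Mean squared reconstruction error over the masked frames only."""
    errs = [(target[t][f] - recon[t][f]) ** 2
            for t in masked_idx for f in range(len(target[t]))]
    return sum(errs) / len(errs)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for machine-ID classification (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def joint_loss(target: Matrix, recon: Matrix, masked_idx: Sequence[int],
               id_logits: Sequence[float], id_label: int, alpha: float = 1.0) -> float:
    """Reconstruction loss plus an alpha-weighted ID loss (alpha is hypothetical)."""
    return masked_mse(target, recon, masked_idx) + alpha * cross_entropy(id_logits, id_label)


# Usage with toy data: 4 frames x 2 bins, half of the frames masked.
rng = random.Random(0)
spec = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked, idx = random_frame_mask(spec, mask_ratio=0.5, rng=rng)
# A perfect reconstruction leaves only the ID term; uniform logits over 4 IDs give log(4).
loss = joint_loss(spec, spec, idx, [0.0, 0.0, 0.0, 0.0], id_label=2)
```

In a real model, `masked` would be fed to the network, which would produce both the reconstruction and the ID logits, and `loss` would be minimized over normal training clips only.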
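The harmonic mean AUC score used in the evaluation above aggregates per-section AUCs so that poor performance on any single machine section pulls the overall score down. The following is a minimal sketch, assuming higher anomaly scores mean "more anomalous" and labels of 1 (anomalous) and 0 (normal); the section names and scores are made-up illustrations. The DCASE 2021 Task 2 official score also includes partial AUC (pAUC), which this sketch omits.

```python
from statistics import harmonic_mean
from typing import Dict, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random anomalous clip outscores
    a random normal clip (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both normal and anomalous clips")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def harmonic_mean_auc(sections: Dict[str, Tuple[List[float], List[int]]]) -> float:
    """Harmonic mean of the per-section AUC scores."""
    return harmonic_mean([auc(s, y) for s, y in sections.values()])


# Illustrative sections (hypothetical names and scores).
sections = {
    "fan_section_00": ([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]),   # AUC = 0.75
    "valve_section_00": ([0.2, 0.3, 0.9, 0.7], [0, 0, 1, 1]),  # AUC = 1.0
}
score = harmonic_mean_auc(sections)  # 6/7, about 0.857
```

The arithmetic mean of the same two AUCs would be 0.875; the harmonic mean is lower because it weights the weaker section more heavily.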