3 research outputs found

Anomalous Sound Detection using unsupervised and semi-supervised autoencoders and gammatone audio representation

Author: Cobos Maximo
Naranjo-Alcazar Javier
Perez-Castanos Sergi
Zuccarello Pedro
Publication date: 27/06/2020

Anomalous sound detection (ASD) is currently one of the most topical subjects in machine listening. Unsupervised detection is attracting considerable interest because it is immediately applicable in many fields. In industrial processes, for example, the early detection of machine malfunctions or damage can yield large savings and improve process efficiency. An unsupervised ASD solution suits this problem because no machine has to be damaged merely to obtain anomalous audio data for the training stage. This paper proposes a novel framework based on convolutional autoencoders (both unsupervised and semi-supervised) and a Gammatone-based representation of the audio. The results obtained by these architectures substantially exceed the baseline results.
Comment: Submitted to DCASE2020 Workshop, Workshop on Detection and Classification of Acoustic Scenes and Events

arXiv.org e-Print Archive

Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma

Author: Harada Noboru
Kawachi Yuta
Uematsu Hisashi
Koizumi Yuma
Saito Shoichiro
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 22/10/2018

This paper proposes a novel optimization principle and its implementation for unsupervised anomaly detection in sound (ADS) using an autoencoder (AE). The goal of unsupervised-ADS is to detect unknown anomalous sound without training data of anomalous sound. Use of an AE as a normal model is a state-of-the-art technique for unsupervised-ADS. To decrease the false positive rate (FPR), the AE is trained to minimize the reconstruction error of normal sounds and the anomaly score is calculated as the reconstruction error of the observed sound. Unfortunately, since this training procedure does not take into account the anomaly score for anomalous sounds, the true positive rate (TPR) does not necessarily increase. In this study, we define an objective function based on the Neyman-Pearson lemma by considering ADS as a statistical hypothesis test. The proposed objective function trains the AE to maximize the TPR under an arbitrary low FPR condition. To calculate the TPR in the objective function, we consider that the set of anomalous sounds is the complementary set of normal sounds and simulate anomalous sounds by using a rejection sampling algorithm. Through experiments using synthetic data, we found that the proposed method improved the performance measures of ADS under low FPR conditions. In addition, we confirmed that the proposed method could detect anomalous sounds in real environments.
Comment: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 201
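
The operating point described in this abstract (reconstruction error as the anomaly score, a threshold fixed by a low FPR, and the TPR measured at that threshold) can be illustrated with a minimal Python sketch. It only evaluates scores at a chosen FPR; it does not implement the paper's Neyman-Pearson objective or its rejection sampling. The function names and the score values are hypothetical and invented for illustration.

```python
def reconstruction_error(observed, reconstructed):
    """Mean squared error between a feature vector and its AE reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(observed, reconstructed)) / len(observed)


def threshold_at_fpr(normal_scores, target_fpr):
    """Pick a threshold so that at most target_fpr of the normal scores exceed it."""
    ordered = sorted(normal_scores)
    n = len(ordered)
    # Number of normal scores allowed to lie strictly above the threshold.
    allowed = min(int(target_fpr * n), n - 1)
    return ordered[n - allowed - 1]


def rate_above(scores, threshold):
    """Fraction of scores flagged as anomalous (score > threshold)."""
    return sum(s > threshold for s in scores) / len(scores)


# Hypothetical anomaly scores, invented for illustration only.
normal_scores = [0.10, 0.12, 0.15, 0.11, 0.30, 0.14, 0.13, 0.16, 0.18, 0.12]
anomalous_scores = [0.25, 0.40, 0.17, 0.35]

threshold = threshold_at_fpr(normal_scores, target_fpr=0.1)
fpr = rate_above(normal_scores, threshold)     # measured on normal sounds
tpr = rate_above(anomalous_scores, threshold)  # measured on anomalous sounds
print(threshold, fpr, tpr)
```

With these made-up scores, one normal sound in ten exceeds the threshold, so the empirical FPR meets the 0.1 target, and the TPR is whatever fraction of anomalous scores lies above the same threshold.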

arXiv.org e-Print Archive

Inspection of Visible and Invisible Features of Objects with Image and Sound Signal Processing

Publication date: 03/12/2008

In this paper, we propose a new method that inspects visual and non-visual features of objects simultaneously by using image and sound signal processing techniques. A hammering test discriminates a property of an object from the sound generated when the object is struck with a hammer. This test can investigate non-visual features such as the inner structure of an object, e.g., the existence of defects and cracks inside it. However, it depends on human experience and skill. In addition, when the test is performed over a wide area, the hammering positions must be recorded manually one by one. To solve these problems, this paper proposes a hammering test system consisting of two video cameras that acquire image and sound signals of a hammering scene. The shape of the object (visual feature) is measured by image signal processing from the 3-D measurement of each hammering position, and the thickness or material (non-visual feature) is estimated by sound signal processing in the time and frequency domains. The validity of the proposed method is shown through experiments.
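
As a rough illustration of frequency-domain processing of a hammering sound, the Python sketch below finds the strongest spectral peak of a synthetic, decaying tone with a naive DFT. The abstract does not say which sound features the authors use, so the choice of the dominant frequency, the function name, and the synthetic signal parameters are all assumptions made for this example.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring the DC bin."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        # Naive DFT coefficient for bin k (O(n) per bin, fine for short clips).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Hypothetical hammering sound: a 440 Hz resonance that decays after the strike.
SAMPLE_RATE = 4000   # samples per second (assumed)
NUM_SAMPLES = 400    # 0.1 s of audio, giving 10 Hz per DFT bin
strike = [
    math.exp(-30.0 * i / SAMPLE_RATE) * math.sin(2 * math.pi * 440.0 * i / SAMPLE_RATE)
    for i in range(NUM_SAMPLES)
]

peak_hz = dominant_frequency(strike, SAMPLE_RATE)
print(peak_hz)
```

In a real system the peak frequency would be one of several candidate features; a practical implementation would more likely use an FFT library than this quadratic-time loop.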

CiteSeerX