Robust Sound Event Classification using Deep Neural Networks
The automatic recognition of sound events by computers is an important aspect of emerging applications such as automated surveillance, machine hearing and auditory scene understanding. Recent advances in machine learning, as well as in computational models of the human auditory system, have contributed to advances in this increasingly popular research field. Robust sound event classification, the ability to recognise sounds under real-world noisy conditions, is an especially challenging task. Classification methods translated from the speech recognition domain, using features such as mel-frequency cepstral coefficients, have been shown to perform reasonably well for the sound event classification task, although spectrogram-based or auditory image analysis techniques reportedly achieve superior performance in noise.
This paper outlines a sound event classification framework that compares auditory image front-end features with spectrogram image-based front-end features, using support vector machine and deep neural network classifiers. Performance is evaluated on a standard robust classification task at different levels of corrupting noise, and with several system enhancements, and is shown to compare favourably with current state-of-the-art classification techniques.
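The spectrogram image-based front end described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: it computes a log-magnitude spectrogram "image" with a plain NumPy short-time Fourier transform, which a downstream SVM or DNN classifier could then consume; the frame length and hop size are illustrative choices.

```python
import numpy as np

def spectrogram_image(signal, frame_len=256, hop=128):
    """Compute a log-magnitude spectrogram 'image' from a 1-D signal
    using a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1))  # per-frame magnitude spectra
    return np.log1p(spectrum).T                     # frequency x time image

# Example: 1 second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
img = spectrogram_image(np.sin(2 * np.pi * 440 * t))
print(img.shape)  # (129, 124): 129 frequency bins x 124 time frames
```

Each column of the resulting matrix is one analysis frame; treating the matrix as a 2-D image is what lets image-style classifiers and noise-robust image features be applied to audio.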
Educational video classification by using a transcript to image transform and supervised learning
In this work, we present a method for automatic topic classification of educational videos using a speech transcript transform. Our method works as follows: First, speech recognition is used to generate video transcripts. Then, the transcripts are converted into images using a statistical co-occurrence transformation that we designed. Finally, a classifier is used to produce video category labels for a transcript image input. For our classifiers, we report results using a convolutional neural network (CNN) and a principal component analysis (PCA) model.
To evaluate our method, we used the Khan Academy on a Stick dataset, which contains 2,545 videos, each labeled with one or two of 13 categories. Experiments show that our method is effective and strongly competitive with other supervised learning-based methods.
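The abstract does not spell out the statistical co-occurrence transformation, so the sketch below is an assumption about its general shape: each transcript is mapped to a vocab x vocab matrix whose pixel (i, j) counts how often words i and j occur near each other, giving a fixed-size "image" that a CNN or PCA model can classify. The vocabulary, window size, and normalization are all illustrative choices.

```python
import numpy as np

def transcript_to_image(transcript, vocab, window=5):
    """Map a transcript to a vocab x vocab co-occurrence 'image':
    pixel (i, j) counts how often vocab words i and j appear within
    `window` tokens of each other, normalized to [0, 1]."""
    index = {w: k for k, w in enumerate(vocab)}
    tokens = [index[w] for w in transcript.lower().split() if w in index]
    img = np.zeros((len(vocab), len(vocab)))
    for pos, a in enumerate(tokens):
        for b in tokens[pos + 1 : pos + window]:  # words within the window
            img[a, b] += 1
            img[b, a] += 1                        # keep the image symmetric
    peak = img.max()
    return img / peak if peak else img

# Toy example with a 4-word vocabulary (hypothetical category terms)
vocab = ["cell", "energy", "force", "mass"]
img = transcript_to_image("the cell stores energy and the cell uses energy", vocab)
print(img.shape)  # (4, 4)
```

Because every transcript maps to the same image dimensions regardless of its length, standard image classifiers can be trained on the outputs directly.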
A privacy-preserving method using secret key for convolutional neural network-based speech classification
In this paper, we propose a privacy-preserving method with a secret key for convolutional neural network (CNN)-based speech classification tasks. Recently, many privacy-preservation methods have been developed in the image classification field; in contrast, little speech classification research has considered these risks. To promote research on privacy preservation for speech classification, we provide an encryption method with a secret key for CNN-based speech classification systems. The encryption method is based on a random invertible matrix. Speech data encrypted with the correct key can be accepted by a model whose kernel is encrypted using the inverse of that random matrix. Although the encrypted speech data is strongly distorted, the classification tasks can be performed correctly when the correct key is provided. Additionally, we evaluate the difficulty of reconstructing the original information from the encrypted spectrograms and waveforms. In our experiments, the proposed encryption methods are applied to automatic speech recognition (ASR) and automatic speaker verification (ASV) tasks. The results show that, when the correct secret key is provided, the encrypted data can be used exactly as the original data in transformer-based ASR and x-vector-based ASV with self-supervised front-end systems. The robustness of the encrypted data against reconstruction attacks is also illustrated.

Comment: To appear in the 31st European Signal Processing Conference (EUSIPCO 2023).
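The key idea, that an input encrypted with matrix K is accepted by a kernel encrypted with K's inverse, can be shown in a few lines. This is a sketch of the commuting-key principle only, not the paper's full system: the dimensions, the linear "kernel" W, and the Gaussian sampling of K are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Secret key: a random matrix K, invertible with probability 1 for Gaussian entries
K = rng.standard_normal((dim, dim))
K_inv = np.linalg.inv(K)

x = rng.standard_normal(dim)       # a frame of speech features (stand-in)
W = rng.standard_normal((4, dim))  # the model's first linear kernel (stand-in)

x_enc = K @ x                      # encrypted input: strongly distorted
W_enc = W @ K_inv                  # encrypted kernel: W K^{-1}

# With the matching key, the outputs agree: (W K^{-1})(K x) = W x
assert np.allclose(W_enc @ x_enc, W @ x)

# With a wrong key, the encrypted model produces garbage
K_wrong = rng.standard_normal((dim, dim))
assert not np.allclose(W_enc @ (K_wrong @ x), W @ x)
```

The first layer's output, and hence everything downstream, is unchanged only when encryption and kernel use the same key, which is why classification works for the correct key while the encrypted data itself remains distorted.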