Domestic Activities Classification from Audio Recordings Using Multi-scale Dilated Depthwise Separable Convolutional Network
Domestic activities classification (DAC) from audio recordings aims at
classifying audio recordings into pre-defined categories of domestic
activities, which is an effective way to estimate the daily activities
performed in a home environment. In this paper, we propose a method for DAC from
audio recordings using a multi-scale dilated depthwise separable convolutional
network (DSCN). The DSCN is a lightweight neural network with a small number of
parameters and is thus suitable for deployment on portable terminals with limited
computing resources. To expand the receptive field without increasing the number
of DSCN parameters, dilated convolution is used in place of normal convolution,
which further improves the DSCN's performance. In addition, the embeddings
of various scales learned by the dilated DSCN are concatenated as a multi-scale
embedding for representing property differences among various classes of
domestic activities. Evaluation on the public dataset of Task 5 of the 2018
challenge on Detection and Classification of Acoustic Scenes and Events
(DCASE-2018) shows that both dilated convolution and multi-scale
embedding contribute to the performance improvement of the proposed method, and
that the proposed method outperforms methods based on state-of-the-art
lightweight networks in terms of classification accuracy.
Comment: 5 pages, 2 figures, 4 tables. Accepted for publication in IEEE
MMSP202
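As a rough illustration of why a depthwise separable layer is lightweight and why dilation widens the receptive field at no parameter cost, the sketch below counts weights and computes the span of a dilated kernel. The channel counts, kernel size, and dilation rates are illustrative assumptions, not values taken from the paper, and the helper names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel plus a 1 x 1 pointwise mix."""
    return c_in * k * k + c_in * c_out


def effective_kernel(k, dilation):
    """Span (in input samples) covered by a k-tap kernel with the given dilation."""
    return dilation * (k - 1) + 1


if __name__ == "__main__":
    c_in, c_out, k = 64, 128, 3  # assumed sizes, for illustration only
    print(standard_conv_params(c_in, c_out, k))        # 73728
    print(depthwise_separable_params(c_in, c_out, k))  # 8768
    # Dilation changes the span of the kernel, not the number of weights.
    for d in (1, 2, 4):
        print(d, effective_kernel(k, d))  # spans 3, 5, 9
```

With these assumed sizes the separable layer needs roughly one eighth of the weights, while the dilation rate only changes how far apart the kernel taps are placed.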
Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network
Automatic estimation of domestic activities from audio can be used to solve
many problems, such as reducing the labor cost of caring for elderly people.
This study focuses on solving the problem of domestic activity clustering from
audio. The target of domestic activity clustering is to cluster audio clips
which belong to the same category of domestic activity into one cluster in an
unsupervised way. In this paper, we propose a method of domestic activity
clustering using a depthwise separable convolutional autoencoder network. In
the proposed method, initial embeddings are learned by the depthwise separable
convolutional autoencoder, and a clustering-oriented loss is designed to
jointly optimize embedding refinement and cluster assignment. Different methods
are evaluated on a public dataset (a derivative of the SINS dataset) used in
the challenge on Detection and Classification of Acoustic Scenes and Events
(DCASE) in 2018. Our method obtains a normalized mutual information (NMI)
score of 54.46% and a clustering accuracy (CA) score of 63.64%, and
outperforms state-of-the-art methods in terms of NMI and CA. In addition, both
the computational complexity and the memory requirement of our method are lower
than those of previous deep-model-based methods. Code:
https://github.com/vinceasvp/domestic-activity-clustering-from-audio
Comment: 6 pages, 5 figures, 4 tables. Accepted by IEEE MMSP 202
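The two metrics named above can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: NMI is normalized here by the arithmetic mean of the two entropies (other normalizations exist, so this choice is an assumption), and clustering accuracy is found by brute force over one-to-one cluster-to-class mappings, which is only practical for a handful of clusters. The labels in the example are made up.

```python
from collections import Counter
from itertools import permutations
from math import log


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(truth, pred):
    n = len(truth)
    joint = Counter(zip(truth, pred))
    pt, pp = Counter(truth), Counter(pred)
    return sum((c / n) * log(c * n / (pt[t] * pp[p]))
               for (t, p), c in joint.items())


def nmi(truth, pred):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    denom = entropy(truth) + entropy(pred)
    return 2.0 * mutual_information(truth, pred) / denom if denom > 0 else 1.0


def clustering_accuracy(truth, pred):
    """Best accuracy over one-to-one cluster-to-class mappings (brute force)."""
    classes = list(dict.fromkeys(truth))
    clusters = list(dict.fromkeys(pred))
    # Pad with None so surplus clusters can stay unmatched.
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, p in zip(truth, pred) if mapping[p] == t)
        best = max(best, hits)
    return best / len(truth)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # made-up ground-truth classes
    pred = [1, 1, 0, 0, 0, 0, 2, 2, 2]   # made-up cluster assignments
    print(nmi(truth, pred))
    print(clustering_accuracy(truth, pred))
```

For larger numbers of clusters, the brute-force search would normally be replaced by an assignment solver such as the Hungarian algorithm.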
Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet
We present work on low-complexity acoustic scene classification (ASC) with
multiple devices, namely Subtask A of Task 1 of the DCASE2021 challenge.
This subtask focuses on classifying audio samples from multiple devices with a
low-complexity model, where two main difficulties need to be overcome. First,
the audio samples are recorded by different devices, so there is a mismatch
between recording devices across samples. We reduce the negative impact of this
mismatch with several strategies, including
data augmentation (e.g., mix-up, spectrum correction, pitch shift), a
multi-patch network structure, and channel attention. Second, the model size
should be smaller than a threshold (e.g., 128 KB required by the DCASE2021
challenge). To meet this condition, we adopt a ResNet with both depthwise
separable convolution and channel attention as the backbone network, and
perform model compression. In summary, we propose a low-complexity ASC method
using data augmentation and a lightweight ResNet. Evaluated on the official
development and evaluation datasets, our method obtains classification accuracy
scores of 71.6% and 66.7%, respectively; and obtains Log-loss scores of 1.038
and 1.136, respectively. Our final model size is 110.3 KB, which is smaller than
the maximum of 128 KB.
Comment: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th
IEEE International Conference on Signal Processing (IEEE ICSP
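Of the augmentations listed above, mix-up is the simplest to illustrate: two training examples and their one-hot labels are blended with a weight drawn from a Beta distribution. The sketch below uses plain Python lists and only the standard library. The `alpha` value and the toy feature vectors are assumptions for illustration and are not taken from the paper.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples; alpha is an assumed hyperparameter, not from the paper."""
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    x_a, y_a = [1.0, 2.0, 3.0], [1.0, 0.0]  # toy features, one-hot label
    x_b, y_b = [3.0, 2.0, 1.0], [0.0, 1.0]
    x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b, rng=rng)
    print(lam, x_mix, y_mix)
```

The mixed label stays a valid probability vector, because it is a convex combination of two one-hot vectors.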
Design, Implementation and Evaluation of an Acoustic Event Detection Strategy (Diseño, implementación y evaluación de una estrategia de detección de eventos acústicos)
The objective of this work is the study and development of an acoustic event detection system. To this end, we start from an existing system and evaluate it under different conditions and on several databases, including modifications to its architecture and control parameters. The system is based on neural networks, which, after a training stage, produce models capable of making predictions on input data. The input data consist of audio files and their labels, taken from databases available online.
During a period of information gathering, several publicly available databases and systems for sound event detection were collected. One of these systems was selected as the reference system for this work. The reference system was first trained and evaluated on its default database to verify that it works correctly. It was then evaluated on a dataset drawn from some of the collected databases, containing some of the sound event classes that the system is able to detect.
In the same way, the reference system was trained and evaluated on another database containing sound event classes different from those it used previously; this variant was dubbed “sistema urban”. Finally, several parameters of the neural network used in the system were changed, and the reference system and “sistema urban” were retrained and re-evaluated for each modification, with the results described for each case.
Grado en Ingeniería en Sistemas de Telecomunicació