    Domestic Activities Classification from Audio Recordings Using Multi-scale Dilated Depthwise Separable Convolutional Network

    Domestic activities classification (DAC) from audio recordings aims to classify audio recordings into pre-defined categories of domestic activities, which is an effective way to estimate the daily activities performed in a home environment. In this paper, we propose a method for DAC from audio recordings using a multi-scale dilated depthwise separable convolutional network (DSCN). The DSCN is a lightweight neural network with a small number of parameters and is thus suitable for deployment on portable terminals with limited computing resources. To expand the receptive field without increasing the number of parameters, dilated convolution is used in place of normal convolution, which further improves the DSCN's performance. In addition, the embeddings of various scales learned by the dilated DSCN are concatenated into a multi-scale embedding that represents the property differences among classes of domestic activities. Evaluated on the public dataset of Task 5 of the 2018 challenge on Detection and Classification of Acoustic Scenes and Events (DCASE-2018), the results show that both dilated convolution and multi-scale embedding contribute to the performance improvement of the proposed method, and that the proposed method outperforms methods based on state-of-the-art lightweight networks in terms of classification accuracy.
    Comment: 5 pages, 2 figures, 4 tables. Accepted for publication in IEEE MMSP202
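
The two claims in this abstract (fewer parameters, and a wider receptive field at the same parameter count) can be checked with simple arithmetic. The sketch below is illustrative only: the layer sizes are hypothetical and are not taken from the paper, and the function names are invented for this example.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms omitted)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise mix."""
    return c_in * k * k + c_in * c_out


def effective_kernel(k, dilation):
    """Span covered along one axis by a k-tap kernel with the given dilation rate."""
    return dilation * (k - 1) + 1


# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(64, 128, 3)
separable = depthwise_separable_params(64, 128, 3)
print(standard, separable)

# Dilation widens the span without adding weights: same k, larger coverage.
for rate in (1, 2, 4):
    print(rate, effective_kernel(3, rate))
```

For this hypothetical layer the separable form needs roughly one eighth of the weights, while raising the dilation rate from 1 to 4 widens a 3-tap kernel's span from 3 to 9 positions with no additional parameters.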

    Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network

    Automatic estimation of domestic activities from audio can be used to solve many problems, such as reducing the labor cost of nursing elderly people. This study focuses on the problem of domestic activity clustering from audio, whose goal is to group audio clips that belong to the same category of domestic activity into one cluster in an unsupervised way. In this paper, we propose a method of domestic activity clustering using a depthwise separable convolutional autoencoder network. In the proposed method, initial embeddings are learned by the depthwise separable convolutional autoencoder, and a clustering-oriented loss is designed to jointly optimize embedding refinement and cluster assignment. Different methods are evaluated on a public dataset (a derivative of the SINS dataset) used in the 2018 challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). Our method obtains a normalized mutual information (NMI) score of 54.46% and a clustering accuracy (CA) score of 63.64%, and outperforms state-of-the-art methods in terms of NMI and CA. In addition, both the computational complexity and the memory requirement of our method are lower than those of previous deep-model-based methods.
    Code: https://github.com/vinceasvp/domestic-activity-clustering-from-audio
    Comment: 6 pages, 5 figures, 4 tables. Accepted by IEEE MMSP 202
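
For readers unfamiliar with the two metrics reported here, the following is a minimal sketch of how NMI and clustering accuracy are commonly computed. It is not the authors' evaluation code. Two choices are assumptions made for this example: NMI is normalized by the arithmetic mean of the two entropies (other normalizations exist), and the best cluster-to-class mapping is found by brute force, which is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import permutations
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(true, pred):
    """Mutual information divided by the mean of the two entropies (assumed normalization)."""
    n = len(true)
    count_true, count_pred = Counter(true), Counter(pred)
    joint = Counter(zip(true, pred))
    mi = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    denom = (entropy(true) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


def clustering_accuracy(true, pred):
    """Accuracy under the best one-to-one mapping from clusters to classes."""
    classes = sorted(set(true))
    clusters = sorted(set(pred))
    # Pad with None so surplus clusters can be left unmatched.
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for assignment in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        hits = sum(mapping[p] == t for t, p in zip(true, pred))
        best = max(best, hits)
    return best / len(true)


# Toy labels: cluster ids are permuted relative to the classes but otherwise perfect.
truth = [0, 0, 1, 1, 2, 2]
found = [1, 1, 0, 0, 2, 2]
print(nmi(truth, found), clustering_accuracy(truth, found))
```

Both metrics ignore the arbitrary numbering of clusters, which is why the permuted toy labels above still count as a perfect clustering. For larger numbers of clusters, the brute-force search would normally be replaced by the Hungarian algorithm.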

    Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

    We present work on low-complexity acoustic scene classification (ASC) with multiple devices, namely Subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples from multiple devices with a low-complexity model, and two main difficulties must be overcome. First, the audio samples are recorded by different devices, so there is a device mismatch among the samples. We reduce the negative impact of this mismatch with several strategies, including data augmentation (e.g., mix-up, spectrum correction, pitch shift) and the use of a multi-patch network structure and channel attention. Second, the model size must be smaller than a threshold (e.g., the 128 KB required by the DCASE2021 challenge). To meet this condition, we adopt a ResNet with both depthwise separable convolution and channel attention as the backbone network, and perform model compression. In summary, we propose a low-complexity ASC method using data augmentation and a lightweight ResNet. Evaluated on the official development and evaluation datasets, our method obtains classification accuracy scores of 71.6% and 66.7%, respectively, and log-loss scores of 1.038 and 1.136, respectively. Our final model size is 110.3 KB, which is below the maximum of 128 KB.
    Comment: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th IEEE International Conference on Signal Processing (IEEE ICSP
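
Of the augmentations listed, mix-up is the simplest to illustrate. The sketch below shows the general technique only: it blends two feature vectors and their one-hot labels with a weight drawn from a Beta distribution. The `alpha` value, the function name, and the toy inputs are assumptions for this example and are not taken from the paper.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (features, one-hot label) pairs with weight lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Toy inputs: two 4-bin "spectra" belonging to two different scene classes.
rng = random.Random(0)
x, y, lam = mixup([1.0, 2.0, 3.0, 4.0], [1.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0], [0.0, 1.0], rng=rng)
print(lam, x, y)
```

The mixed label stays a valid probability vector because the same weight is applied to both labels, so training on it with a cross-entropy loss needs no further change.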

    Design, Implementation and Evaluation of an Acoustic Event Detection Strategy (Diseño, implementación y evaluación de una estrategia de detección de eventos acústicos)

    The objective of this work is the study and development of an acoustic event detection system. For this purpose, we start from a previous system, which is evaluated under different conditions and on several datasets, including modifications to its architecture and control parameters. This system uses neural networks, which, after a training stage, produce models capable of making predictions on input data. The input data consists of audio files and their labels, taken from datasets available online. During an initial information-gathering period, several publicly available datasets and systems for sound event detection were collected, and one of these systems was selected as the reference system for this work. The reference system is first trained and evaluated on its default dataset to verify that it works correctly. It was then evaluated on a dataset, drawn from some of the collected datasets, that contains some of the sound event classes the system is able to detect. In the same way, the reference system was trained and evaluated on another dataset that contains sound event classes different from those previously used by the system; this variant was dubbed “sistema urban”. To conclude, several changes are made to some parameters of the neural network used in this system, the reference system and “sistema urban” are trained and evaluated again for each modification, and the results obtained in each case are described.
    Grado en Ingeniería en Sistemas de Telecomunicació