
    Environmental Sound Classification with Parallel Temporal-spectral Attention

    Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC). Recently, temporal attention mechanisms have been used in CNNs to capture useful information from the relevant time frames for audio classification, especially for weakly labelled data where the onset and offset times of the sound events are not annotated. In these methods, however, the inherent spectral characteristics and variations are not explicitly exploited when obtaining the deep features. In this paper, we propose a novel parallel temporal-spectral attention mechanism for CNNs to learn discriminative sound representations, which enhances the temporal and spectral features by capturing the importance of different time frames and frequency bands. Parallel branches are constructed so that temporal attention and spectral attention can be applied separately, mitigating interference from segments in which no sound event is present. Experiments on three ESC datasets and two acoustic scene classification (ASC) datasets show that our method improves classification performance and is also robust to noise. Comment: submitted to INTERSPEECH 2020
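    The abstract does not reproduce the model itself; the following is a minimal PyTorch sketch of what a parallel temporal-spectral attention block could look like. All class names, layer choices, and tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ParallelTemporalSpectralAttention(nn.Module):
    """Hypothetical sketch: weights CNN feature maps along the time and
    frequency axes in two parallel branches, then fuses the results."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions produce per-frame / per-band attention logits
        self.temporal_conv = nn.Conv2d(channels, 1, kernel_size=1)
        self.spectral_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, freq)
        # Temporal branch: average over frequency, softmax over time frames
        t_logits = self.temporal_conv(x).mean(dim=3, keepdim=True)   # (B,1,T,1)
        t_weights = torch.softmax(t_logits, dim=2)
        # Spectral branch: average over time, softmax over frequency bands
        f_logits = self.spectral_conv(x).mean(dim=2, keepdim=True)   # (B,1,1,F)
        f_weights = torch.softmax(f_logits, dim=3)
        # Enhance the features in parallel and fuse by summation
        return x * t_weights + x * f_weights

# Usage: attend over a dummy CNN feature map
features = torch.randn(4, 64, 128, 32)   # (batch, channels, time, freq)
attn = ParallelTemporalSpectralAttention(64)
out = attn(features)                     # same shape as the input
```

    Keeping the two branches parallel (rather than stacking them) is what lets each branch down-weight uninformative frames or bands independently, which is the motivation stated in the abstract.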

    Deep recurrent neural networks with attention mechanisms for respiratory anomaly classification

    In recent years, a variety of deep learning techniques and methods have been adopted to provide AI solutions to problems in the medical field, one specific area being audio-based classification of medical datasets. This research aims to create a novel deep learning architecture for this purpose, with a variety of different layer structures implemented for audio classification. Specifically, bidirectional Long Short-Term Memory (BiLSTM) and Gated Recurrent Unit (GRU) networks, in conjunction with an attention mechanism, are implemented in this research for chronic and non-chronic lung disease and COVID-19 diagnosis. We employ two audio datasets, i.e. the Respiratory Sound and the Coswara datasets, to evaluate the proposed model architectures for lung disease classification. The Respiratory Sound Database contains audio data for lung conditions such as Chronic Obstructive Pulmonary Disease (COPD) and asthma, while the Coswara dataset contains coughing audio samples associated with COVID-19. After a comprehensive evaluation and experimentation process, the proposed attention BiLSTM network (A-BiLSTM) proves to be the most performant architecture, achieving accuracy rates of 96.2% and 96.8% on the Respiratory Sound and Coswara datasets, respectively. Our research indicates that the BiLSTM and attention mechanism were effective in improving audio classification performance across the various lung condition diagnoses.
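    For readers unfamiliar with attention-pooled recurrent classifiers, here is a minimal sketch of an attention BiLSTM of the kind the abstract describes. The feature dimensionality, hidden size, and the use of MFCC-style frame features are assumptions; the class and variable names are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn

class AttentionBiLSTM(nn.Module):
    """Hypothetical A-BiLSTM sketch: a BiLSTM encodes the frame sequence and
    a learned attention layer pools it into a single clip-level vector."""
    def __init__(self, n_features: int = 40, hidden: int = 128, n_classes: int = 2):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)          # scores each time step
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features), e.g. MFCC frames of a recording
        h, _ = self.bilstm(x)                         # (batch, time, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (batch, time, 1)
        pooled = (weights * h).sum(dim=1)             # attention-weighted summary
        return self.classifier(pooled)                # class logits

# Usage: classify a batch of clips represented as 40-dim frame features
clips = torch.randn(8, 300, 40)
model = AttentionBiLSTM(n_features=40, n_classes=2)
logits = model(clips)                                 # (8, 2)
```

    The attention pooling replaces a plain "last hidden state" readout, so frames carrying diagnostic cues (e.g. a cough burst) can dominate the clip-level representation.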

    CnnSound: Convolutional Neural Networks for the Classification of Environmental Sounds

    The classification of environmental sounds (ESC) has been increasingly studied in recent years. The main reason is that environmental sounds are part of our daily life, and associating them with the environment we live in is important in several respects, as ESC is used in areas such as managing smart cities, determining location from environmental sounds, surveillance systems, machine hearing, and environmental monitoring. ESC is, however, more difficult than other sound classification tasks because environmental recordings contain many sources of background noise, which makes the sounds harder to model and classify. The main aim of this study is therefore to develop a more robust convolutional neural network (CNN) architecture. For this purpose, 150 different CNN-based models were designed by changing the number of layers and the values of the tuning parameters used in each layer. To test the accuracy of the models, the UrbanSound8K environmental sound database was used. The sounds in this dataset were first converted into images of size 32x32x3. The proposed CNN model yielded an accuracy of up to 82.5%, higher than its classical counterpart. Given the limited fine-tuning involved, the obtained accuracy is satisfactory compared with other studies on UrbanSound8K when both accuracy and computational complexity are considered. The results also suggest that further improvement is possible, given the low complexity of the proposed CNN architecture and its applicability in real-world settings.
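    As a rough illustration of the setup described (32x32x3 sound images classified into the ten UrbanSound8K categories), here is a small PyTorch CNN. It is deliberately simple and is not one of the 150 architectures evaluated in the study; layer sizes and names are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class SmallSoundCNN(nn.Module):
    """Illustrative CNN for 32x32x3 sound 'images' (e.g. resized spectrograms)
    and the 10 UrbanSound8K classes; not the authors' exact architecture."""
    def __init__(self, n_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, 32, 32) sound images
        return self.classifier(self.features(x))

# Usage: forward a dummy batch of 32x32x3 inputs
batch = torch.randn(16, 3, 32, 32)
model = SmallSoundCNN()
logits = model(batch)   # (16, 10)
```

    The study's point about low computational complexity follows directly from this kind of design: two small convolutional blocks over 32x32 inputs keep the parameter count and inference cost modest.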