Sample Mixed-Based Data Augmentation for Domestic Audio Tagging
Audio tagging has attracted increasing attention over the last decade and has various potential applications in many fields. The objective of audio tagging is to predict the labels of an audio clip. Recently, deep learning methods have been applied to audio tagging and have achieved state-of-the-art performance. However, due to the limited size of audio tagging datasets such as the DCASE data, the trained models tend to overfit, which results in poor generalization on new data. Previous data augmentation methods such as pitch shifting, time stretching and adding background noise do not show much improvement in audio tagging. In this paper, we explore sample-mixed data augmentation for the domestic audio tagging task, including mixup, SamplePairing and extrapolation. We apply a convolutional recurrent neural network (CRNN) with an attention module, taking log-scaled mel spectrograms as input, as the baseline system. In our experiments, we achieve a state-of-the-art equal error rate (EER) of 0.10 on the DCASE 2016 Task 4 dataset with the mixup approach, outperforming the baseline system without data augmentation.
Comment: submitted to the workshop on Detection and Classification of Acoustic Scenes and Events 2018 (DCASE 2018), 19-20 November 2018, Surrey, UK
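The sample-mixed augmentations named above (mixup in particular) are straightforward to reproduce. The sketch below is a minimal, hypothetical illustration of mixup applied to batches of log-mel spectrogram features with multi-hot tag labels; the array names, batch shapes and Beta parameter are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def mixup_batch(features, labels, alpha=0.2, rng=None):
    """Mixup augmentation: blend random pairs of examples and their labels.

    features: (batch, time, mel_bins) log-mel spectrograms
    labels:   (batch, num_tags) multi-hot tag vectors
    alpha:    Beta-distribution parameter controlling mixing strength
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)            # mixing coefficient in (0, 1)
    perm = rng.permutation(len(features))   # random partner for each example
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y

# Toy usage with random data standing in for real audio clips
x = np.random.randn(8, 240, 64).astype(np.float32)           # 8 clips, 240 frames, 64 mel bins
y = np.random.randint(0, 2, size=(8, 7)).astype(np.float32)  # 7 domestic audio tags
x_aug, y_aug = mixup_batch(x, y, alpha=0.2)
```

SamplePairing and extrapolation can be obtained from the same routine by fixing or extending the mixing coefficient (e.g. averaging inputs while keeping one label, or pushing lam outside [0, 1]).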
Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification
In this paper, we propose a domain adaptation framework to address the device mismatch issue in acoustic scene classification, leveraging neural label embedding (NLE) and relational teacher-student learning (RTSL). By taking into account the structural relationships between acoustic scene classes, our proposed framework captures relationships that are intrinsically device-independent. In the training stage, transferable knowledge is condensed into the NLE from the source domain. In the adaptation stage, a novel RTSL strategy is then adopted to learn adapted target models without the paired source-target data often required in conventional teacher-student learning. The proposed framework is evaluated on the DCASE 2018 Task 1b dataset. Experimental results based on AlexNet-L deep classification models confirm the effectiveness of our proposed approach in mismatch situations. NLE-alone adaptation compares favourably with conventional device adaptation and teacher-student based adaptation techniques. NLE with RTSL further improves the classification accuracy.
Comment: Accepted by Interspeech 2020
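The abstract does not spell out the NLE and RTSL objectives, so the following is only a hypothetical sketch of how such terms could look: an embedding-matching term that pulls a target-domain clip's embedding toward its class's source-trained label embedding, plus a relational term that matches pairwise distances, so no paired source-target clips are needed. All names and the exact loss forms are assumptions, not the authors' implementation.

```python
import numpy as np

def nle_relational_losses(student_emb, class_ids, label_emb):
    """Hypothetical sketch of NLE- and relation-based adaptation terms.

    student_emb: (batch, dim) embeddings of target-domain clips
    class_ids:   (batch,) integer scene-class labels for each clip
    label_emb:   (num_classes, dim) source-domain neural label embeddings
    """
    targets = label_emb[class_ids]  # per-clip label embedding
    nle_loss = np.mean(np.sum((student_emb - targets) ** 2, axis=1))

    # Relational term: pairwise distances among student embeddings should
    # mirror pairwise distances among the corresponding label embeddings.
    def pdist(x):
        diff = x[:, None, :] - x[None, :, :]
        return np.sqrt(np.sum(diff ** 2, axis=-1) + 1e-12)

    rel_loss = np.mean((pdist(student_emb) - pdist(targets)) ** 2)
    return nle_loss, rel_loss

# Toy usage with random numbers standing in for real model outputs
emb = np.random.randn(16, 128)
ids = np.random.randint(0, 10, size=16)
nle = np.random.randn(10, 128)
print(nle_relational_losses(emb, ids, nle))
```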
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
To improve device robustness, a key feature of any competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination of two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme is investigated. Moreover, novel data augmentation schemes for ASC are also investigated. Evaluated on DCASE 2020 Task 1a, our results show that the proposed ASC system attains state-of-the-art accuracy on the development set, where our best system, a two-stage fusion of CNN ensembles, delivers an 81.9% average accuracy on the multi-device test data and obtains a significant improvement on unseen devices. Finally, neural saliency analysis with class activation mapping (CAM) gives new insights into the patterns learnt by our models.
Comment: Submitted to ICASSP 2021. Code available: https://github.com/MihawkHu/DCASE2020_task
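The abstract describes an ad-hoc score combination of a three-class (broad) and a ten-class (fine) CNN. Below is a minimal sketch of one plausible fusion rule, weighting each fine-class posterior by the posterior of its parent broad class and renormalising; the class grouping, the fusion rule and all names are assumptions, since the abstract does not specify them (the linked repository contains the authors' actual implementation).

```python
import numpy as np

# Assumed grouping of the ten DCASE 2020 Task 1a scene classes into three broad classes
FINE_CLASSES = ["airport", "shopping_mall", "metro_station",
                "street_pedestrian", "public_square", "street_traffic", "park",
                "tram", "bus", "metro"]
COARSE_OF = {"airport": "indoor", "shopping_mall": "indoor", "metro_station": "indoor",
             "street_pedestrian": "outdoor", "public_square": "outdoor",
             "street_traffic": "outdoor", "park": "outdoor",
             "tram": "transportation", "bus": "transportation", "metro": "transportation"}
COARSE_CLASSES = ["indoor", "outdoor", "transportation"]

def two_stage_scores(coarse_probs, fine_probs):
    """Weight each fine-class probability by its parent broad-class probability,
    then renormalise to obtain the combined scores."""
    weights = np.array([coarse_probs[COARSE_CLASSES.index(COARSE_OF[c])]
                        for c in FINE_CLASSES])
    combined = fine_probs * weights
    return combined / combined.sum()

# Toy usage with made-up posteriors from the two CNNs
coarse = np.array([0.2, 0.7, 0.1])                    # indoor / outdoor / transportation
fine = np.full(len(FINE_CLASSES), 1.0 / len(FINE_CLASSES))
scores = two_stage_scores(coarse, fine)
print(FINE_CLASSES[int(np.argmax(scores))])
```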