Improving semi-supervised learning for audio classification with FixMatch
Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data, which remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset, acoustic scene classification, showing that there is still room for improvement.
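The core of FixMatch described above is a pseudo-labelling rule: predictions on weakly augmented unlabeled audio provide hard labels, which are only used when the model is confident, and the loss is then computed against predictions on strongly augmented versions of the same clips. A minimal NumPy sketch of that rule follows; the function name, the 0.95 confidence threshold, and the batch shapes are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over class logits, shape (batch, classes)."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """Pseudo-labelling loss on a batch of unlabeled examples (sketch).

    weak_logits:   model outputs for weakly augmented inputs, shape (B, C)
    strong_logits: model outputs for strongly augmented inputs, shape (B, C)

    Only examples whose weak prediction is confident (max probability at or
    above the threshold) contribute. The loss is the cross-entropy of the
    strong-augmentation prediction against the hard pseudo-label obtained
    from the weak-augmentation prediction.
    """
    weak_probs = softmax(weak_logits)
    confidence = weak_probs.max(axis=1)
    pseudo_labels = weak_probs.argmax(axis=1)
    mask = confidence >= threshold          # which examples pass the threshold
    if not mask.any():
        return 0.0, mask                    # no confident pseudo-labels yet
    strong_probs = softmax(strong_logits)
    picked = strong_probs[mask, pseudo_labels[mask]]
    loss = -np.log(picked + 1e-12).mean()
    return float(loss), mask
```

In a training loop this term would be added to the usual supervised cross-entropy on the labeled subset; early in training the mask is mostly empty, so the unlabeled loss ramps up on its own as the model becomes confident.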
Data augmentation for instrument classification robust to audio effects
Paper presented at the 22nd International Conference on Digital Audio Effects (DAFx-19), held from 2 to 6 September 2019 in Birmingham, United Kingdom.

Reusing recorded sounds (sampling) is a key component in Electronic Music Production (EMP), which has been present since its early days and is at the core of genres like hip-hop or jungle. Commercial and non-commercial services allow users to obtain collections of sounds (sample packs) to reuse in their compositions. Automatic classification of one-shot instrumental sounds enables automatic categorisation of the sounds contained in these collections, allowing easier navigation and better characterisation. Automatic instrument classification has mostly targeted the classification of unprocessed isolated instrumental sounds or the detection of predominant instruments in mixed music tracks. For this classification to be useful in audio databases for EMP, it has to be robust to the audio effects applied to unprocessed sounds. In this paper we evaluate how a state-of-the-art model trained with a large dataset of one-shot instrumental sounds performs when classifying instruments processed with audio effects. In order to evaluate the robustness of the model, we use data augmentation with audio effects and evaluate how each effect influences the classification accuracy.

This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 765068, MIP-Frontiers. We thank Matthew Davies for reviewing a draft of this paper and providing helpful feedback.
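The augmentation strategy the abstract describes, processing training sounds with audio effects, can be sketched in a few lines of NumPy. The specific effects below (gain, additive noise at a target SNR, tanh soft clipping as a stand-in for overdrive) and their parameter ranges are illustrative assumptions, not the effect chain evaluated in the paper.

```python
import numpy as np

def apply_gain(x, db):
    """Scale the waveform by a gain given in decibels."""
    return x * 10 ** (db / 20)

def add_noise(x, snr_db, rng):
    """Add white noise at a target signal-to-noise ratio in dB."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(noise_power), size=x.shape)

def soft_clip(x, drive=4.0):
    """Tanh waveshaper: a simple distortion that keeps output in [-1, 1]."""
    return np.tanh(drive * x)

def augment(x, rng):
    """Randomly chain the effects above on one waveform.

    Each effect is applied with probability 0.5, with parameters drawn
    from illustrative ranges, mimicking effect-based data augmentation.
    """
    effects = [
        lambda s: apply_gain(s, rng.uniform(-6.0, 6.0)),
        lambda s: add_noise(s, rng.uniform(20.0, 40.0), rng),
        lambda s: soft_clip(s, rng.uniform(2.0, 8.0)),
    ]
    for fx in effects:
        if rng.random() < 0.5:
            x = fx(x)
    return x
```

To measure robustness in the spirit of the paper, one would classify both the clean and the effected version of each sound and compare accuracy per effect; the sketch above only covers the signal-processing side.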