Low-Resource Music Genre Classification with Advanced Neural Model Reprogramming
Transfer learning (TL) approaches have shown promising results when handling
tasks with limited training data. However, considerable memory and
computational resources are often required for fine-tuning pre-trained neural
networks with target domain data. In this work, we introduce a novel method for
leveraging pre-trained models for low-resource (music) classification based on
the concept of Neural Model Reprogramming (NMR). NMR aims at re-purposing a
pre-trained model from a source domain to a target domain by modifying the
input of a frozen pre-trained model. In addition to the known,
input-independent, reprogramming method, we propose an advanced reprogramming
paradigm: Input-dependent NMR, to increase adaptability to complex input data
such as musical audio. Experimental results suggest that a neural model
pre-trained on large-scale datasets can successfully perform music genre
classification by using this reprogramming method. The two proposed
Input-dependent NMR TL methods outperform fine-tuning-based TL methods on a
small genre classification dataset.
Comment: Submitted to ICASSP 2023. Some experimental results were reduced due
to the space limit. The implementation will be available at
https://github.com/biboamy/music-repr
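To make the distinction between the two reprogramming variants concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the additive form of the input modification, the tiny fixed-weight `frozen_model`, the linear noise generator `G`, and the genre names in the label map are all assumptions chosen for illustration. The only point it shows is that the pre-trained model stays untouched while the input is altered, either by one shared perturbation or by a perturbation computed from each input.

```python
def frozen_model(x):
    """Stand-in for a frozen pre-trained classifier (hypothetical weights).

    The weights are constants: nothing in this sketch ever updates them.
    """
    W = ((0.5, -0.2, 0.1), (-0.3, 0.4, 0.2))  # 2 source classes, 3 input features
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def reprogram_independent(x, delta):
    """Input-independent variant: the same trainable delta is added to every input."""
    return [xi + di for xi, di in zip(x, delta)]


def reprogram_dependent(x, G):
    """Input-dependent variant: the perturbation is computed from the input itself.

    G is a small trainable matrix (an assumed, deliberately simple generator).
    """
    noise = [sum(g * xi for g, xi in zip(row, x)) for row in G]
    return [xi + ni for xi, ni in zip(x, noise)]


def predict(x_reprogrammed, label_map):
    """Run the frozen model, then map its source class to a target label."""
    scores = frozen_model(x_reprogrammed)
    source_class = max(range(len(scores)), key=scores.__getitem__)
    return label_map[source_class]
```

In a real setting only `delta` or `G` (and possibly the label mapping) would be trained on the target data, which is why reprogramming needs far fewer trainable parameters than fine-tuning the whole network.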
Transfer learning by supervised pre-training for audio-based music classification
Very few large-scale music research datasets are publicly available. There is an increasing need for such datasets, because the shift from physical to digital distribution in the music industry has given the listener access to a large body of music, which needs to be cataloged efficiently and be easily browsable. Additionally, deep learning and feature learning techniques are becoming increasingly popular for music information retrieval applications, and they typically require large amounts of training data to work well. In this paper, we propose to exploit an available large-scale music dataset, the Million Song Dataset (MSD), for classification tasks on other datasets, by reusing models trained on the MSD for feature extraction. This transfer learning approach, which we refer to as supervised pre-training, was previously shown to be very effective for computer vision problems. We show that features learned from MSD audio fragments in a supervised manner, using tag labels and user listening data, consistently outperform features learned in an unsupervised manner in this setting, provided that the learned feature extractor is of limited complexity. We evaluate our approach on the GTZAN, 1517-Artists, Unique and Magnatagatune datasets
- …
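The supervised pre-training idea described in the abstract above (reuse a model trained on a large source dataset as a fixed feature extractor, then train a simple classifier on a smaller target dataset) can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: `pretrained_features` is a hypothetical stand-in with made-up weights, and a nearest-centroid classifier is swapped in for whatever classifier the authors used, purely to keep the example short.

```python
def pretrained_features(x):
    """Stand-in for a feature extractor learned on a large source dataset.

    Hypothetical fixed weights with a ReLU; it is reused as-is and never retrained.
    """
    W = ((1.0, 0.0, -1.0), (0.0, 1.0, 1.0))
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def fit_centroids(examples):
    """Average the extracted features per label on the small target dataset.

    examples: list of (input_vector, label) pairs.
    """
    sums, counts = {}, {}
    for x, label in examples:
        feats = pretrained_features(x)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(x, centroids):
    """Assign the label whose centroid is closest in feature space."""
    feats = pretrained_features(x)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))
```

A usage example with toy data: fit the centroids on a handful of labeled target examples, then classify unseen inputs using the same frozen extractor.

```python
train = [([2.0, 0.0, 0.0], "a"), ([3.0, 0.0, 0.0], "a"),
         ([0.0, 2.0, 0.0], "b"), ([0.0, 3.0, 0.0], "b")]
centroids = fit_centroids(train)
label = classify([4.0, 0.0, 0.0], centroids)
```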