7,833 research outputs found
Deep Learning and Music Adversaries
An adversary is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, exploiting the parameters of the system to find the minimal perturbation of the input image such that the network misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the input to the systems is magnitude spectral frames, which requires special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different deep architectures, we find that this adversary is very effective in defeating the resulting systems. We find the convolutional networks are more robust, however, compared with systems based on a majority vote over individually classified audio frames. Furthermore, we integrate the adversary into the training of new deep systems, but do not find that this improves their resilience against the same adversary.
Automatic Music Genre Classification of Audio Signals with Machine Learning Approaches
Musical genre classification is put into context by explaining the structures in music and how music is analyzed and perceived by humans. The growth of music databases in personal collections and on the Internet has created a great demand for music information retrieval, and especially for automatic musical genre classification. In this research we focused on combining information from the audio signal rather than from different sources. This paper presents a comprehensive machine learning approach to the problem of automatic musical genre classification using the audio signal. The proposed approach uses two feature vectors, a support vector machine (SVM) classifier with a polynomial kernel function, and machine learning algorithms. More specifically, two feature sets are proposed for representing frequency domain, temporal domain, cepstral domain and modulation frequency domain audio features. With our proposed features, the SVM acts as a strong base learner in AdaBoost, so the performance of the SVM classifier cannot be improved by boosting. The final genre classification is obtained from the set of individual results according to a weighted-combination late fusion method, which outperformed the trained fusion method. Music genre classification accuracies of 78% and 81% are reported on the GTZAN dataset over ten musical genres and the ISMIR2004 genre dataset over six musical genres, respectively. We observed higher classification accuracies with the ensembles than with the individual classifiers; the improvements on the GTZAN and ISMIR2004 genre datasets are three percent on average. This ensemble approach shows that it is possible to improve classification accuracy by using different types of domain-based audio features.
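The weighted-combination late fusion mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen or what form the individual results take, so this assumes each classifier outputs a list of class probabilities and that the weights are given; the function name `late_fusion` and the example values are hypothetical.

```python
def late_fusion(prob_sets, weights):
    """Fuse per-classifier class probabilities by a normalised weighted sum.

    prob_sets: one list of class probabilities per classifier (assumed format).
    weights:   one non-negative weight per classifier (assumed to be given).
    Returns (index of the winning class, fused scores).
    """
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(prob_sets[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_sets, weights):
        if len(probs) != n_classes:
            raise ValueError("all classifiers must score the same classes")
        for k, p in enumerate(probs):
            fused[k] += (w / total) * p

    # The class with the largest fused score is the final decision.
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused


# Two hypothetical classifiers scoring two genres; the second is trusted more.
label, scores = late_fusion([[0.6, 0.4], [0.2, 0.8]], weights=[1, 3])
```

Because the weights are normalised, the fused scores remain a probability distribution whenever each input is one.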
Sequential Complexity as a Descriptor for Musical Similarity
We propose string compressibility as a descriptor of temporal structure in
audio, for the purpose of determining musical similarity. Our descriptors are
based on computing track-wise compression rates of quantised audio features,
using multiple temporal resolutions and quantisation granularities. To verify
that our descriptors capture musically relevant information, we incorporate our
descriptors into similarity rating prediction and song year prediction tasks.
We base our evaluation on a dataset of 15500 track excerpts of Western popular
music, for which we obtain 7800 web-sourced pairwise similarity ratings. To
assess the agreement among similarity ratings, we perform an evaluation under
controlled conditions, obtaining a rank correlation of 0.33 between intersected
sets of ratings. Combined with bag-of-features descriptors, we obtain
performance gains of 31.1% and 10.9% for similarity rating prediction and song
year prediction. For both tasks, analysis of selected descriptors reveals that
representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted versio
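The idea of a compression rate computed on quantised features at several temporal resolutions and quantisation granularities can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the compressor, the features, or the quantisation scheme, so the sketch uses zlib, a one-dimensional feature sequence, uniform quantisation, and block averaging for coarser time scales. All function names and parameter values are hypothetical.

```python
import zlib


def downsample(values, factor):
    """Average consecutive blocks of `factor` samples (coarser time scale)."""
    blocks = (values[i:i + factor] for i in range(0, len(values), factor))
    return [sum(b) / len(b) for b in blocks]


def quantise(values, levels):
    """Uniformly quantise values into `levels` symbols (at most 256)."""
    if not values:
        raise ValueError("empty feature sequence")
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    lo, hi = min(values), max(values)
    if hi == lo:
        return bytes(len(values))  # constant sequence: a single symbol
    scale = (levels - 1) / (hi - lo)
    return bytes(int(round((v - lo) * scale)) for v in values)


def compression_rate(values, levels=8, factor=1):
    """Compressed size divided by symbol count; lower means more regular."""
    symbols = quantise(downsample(values, factor), levels)
    return len(zlib.compress(symbols, 9)) / len(symbols)


def descriptor(values, level_grid=(4, 8), factor_grid=(1, 4)):
    """One compression rate per (granularity, resolution) pair."""
    return [compression_rate(values, lv, f)
            for lv in level_grid for f in factor_grid]
```

A strictly repeating sequence should yield a much lower rate than a noisy one, which is the sense in which compressibility reflects temporal structure. For very short sequences the fixed zlib overhead dominates, so rates are only comparable across tracks of similar length.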