242 research outputs found

    An Analysis of the GTZAN Music Genre Dataset

    Get PDF

    The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

    Get PDF
    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.Comment: 29 pages, 7 figures, 6 tables, 128 reference

    Beat histogram features for rhythm-based musical genre classification using multiple novelty functions

    Get PDF
    In this paper we present beat histogram features for multiple level rhythm description and evaluate them in a musical genre classification task. Audio features pertaining to various musical content categories and their related novelty functions are extracted as a basis for the creation of beat histograms. The proposed features capture not only amplitude, but also tonal and general spectral changes in the signal, aiming to represent as much rhythmic information as possible. The most and least informative features are identified through feature selection methods and are then tested using Support Vector Machines on five genre datasets concerning classification accuracy against a baseline feature set. Results show that the presented features provide comparable classification accuracy with respect to other genre classification approaches using periodicity histograms and display a performance close to that of much more elaborate up-to-date approaches for rhythm description. The use of bar boundary annotations for the texture frames has provided an improvement for the dance-oriented Ballroom dataset. The comparably small number of descriptors and the possibility of evaluating the influence of specific signal components to the general rhythmic content encourage the further use of the method in rhythm description tasks

    Deep Learning and Music Adversaries

    Get PDF
    OA Monitor ExerciseOA Monitor ExerciseAn {\em adversary} is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, which exploits the parameters of the system to find the minimal perturbation of the input image such that the network misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the input to the systems is magnitude spectral frames, which requires special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different deep architectures, we find that this adversary is very effective in defeating the resulting systems. We find the convolutional networks are more robust, however, compared with systems based on a majority vote over individually classified audio frames. Furthermore, we integrate the adversary into the training of new deep systems, but do not find that this improves their resilience against the same adversary
    • …
    corecore