5,960 research outputs found

    The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

    Get PDF
    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents.Comment: 29 pages, 7 figures, 6 tables, 128 reference

    Sequential Complexity as a Descriptor for Musical Similarity

    Get PDF
    We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy.Comment: 13 pages, 9 figures, 8 tables. Accepted versio

    Methodological contributions by means of machine learning methods for automatic music generation and classification

    Get PDF
    189 p.Ikerketa lan honetan bi gai nagusi landu dira: musikaren sorkuntza automatikoa eta sailkapena. Musikaren sorkuntzarako bertso doinuen corpus bat hartu da abiapuntu moduan doinu ulergarri berriak sortzeko gai den metodo bat sortzeko. Doinuei ulergarritasuna hauen barnean dauden errepikapen egiturek ematen dietela suposatu da, eta metodoaren hiru bertsio nagusi aurkeztu dira, bakoitzean errepikapen horien definizio ezberdin bat erabiliz.Musikaren sailkapen automatikoan hiru ataza garatu dira: generoen sailkapena, familia melodikoen taldekatzea eta konposatzaileen identifikazioa. Musikaren errepresentazio ezberdinak erabili dira ataza bakoitzerako, eta ikasketa automatikoko hainbat teknika ere probatu dira, emaitzarik hoberenak zeinek ematen dituen aztertzeko.Gainbegiratutako sailkapenaren alorrean ere binakako sailkapenaren gainean lana egin da, aurretik existitzen zen metodo bat optimizatuz. Hainbat datu baseren gainean probatu da garatutako teknika, baita konposatzaile klasikoen piezen ezaugarriez osatutako datu base batean ere

    A Survey of Evaluation in Music Genre Recognition

    Get PDF

    Modelling Digital Media Objects

    Get PDF

    Methodological contributions by means of machine learning methods for automatic music generation and classification

    Get PDF
    189 p.Ikerketa lan honetan bi gai nagusi landu dira: musikaren sorkuntza automatikoa eta sailkapena. Musikaren sorkuntzarako bertso doinuen corpus bat hartu da abiapuntu moduan doinu ulergarri berriak sortzeko gai den metodo bat sortzeko. Doinuei ulergarritasuna hauen barnean dauden errepikapen egiturek ematen dietela suposatu da, eta metodoaren hiru bertsio nagusi aurkeztu dira, bakoitzean errepikapen horien definizio ezberdin bat erabiliz.Musikaren sailkapen automatikoan hiru ataza garatu dira: generoen sailkapena, familia melodikoen taldekatzea eta konposatzaileen identifikazioa. Musikaren errepresentazio ezberdinak erabili dira ataza bakoitzerako, eta ikasketa automatikoko hainbat teknika ere probatu dira, emaitzarik hoberenak zeinek ematen dituen aztertzeko.Gainbegiratutako sailkapenaren alorrean ere binakako sailkapenaren gainean lana egin da, aurretik existitzen zen metodo bat optimizatuz. Hainbat datu baseren gainean probatu da garatutako teknika, baita konposatzaile klasikoen piezen ezaugarriez osatutako datu base batean ere

    Classification Accuracy Is Not Enough:On the Evaluation of Music Genre Recognition Systems

    Get PDF

    Leaning Robust Sequence Features via Dynamic Temporal Pattern Discovery

    Get PDF
    As a major type of data, time series possess invaluable latent knowledge for describing the real world and human society. In order to improve the ability of intelligent systems for understanding the world and people, it is critical to design sophisticated machine learning algorithms for extracting robust time series features from such latent knowledge. Motivated by the successful applications of deep learning in computer vision, more and more machine learning researchers put their attentions on the topic of applying deep learning techniques to time series data. However, directly employing current deep models in most time series domains could be problematic. A major reason is that temporal pattern types that current deep models are aiming at are very limited, which cannot meet the requirement of modeling different underlying patterns of data coming from various sources. In this study we address this problem by designing different network structures explicitly based on specific domain knowledge such that we can extract features via most salient temporal patterns. More specifically, we mainly focus on two types of temporal patterns: order patterns and frequency patterns. For order patterns, which are usually related to brain and human activities, we design a hashing-based neural network layer to globally encode the ordinal pattern information into the resultant features. It is further generalized into a specially designed Recurrent Neural Networks (RNN) cell which can learn order patterns in an online fashion. On the other hand, we believe audio-related data such as music and speech can benefit from modeling frequency patterns. Thus, we do so by developing two types of RNN cells. The first type tries to directly learn the long-term dependencies on frequency domain rather than time domain. The second one aims to dynamically filter out the noise frequencies based on temporal contexts. By proposing various deep models based on different domain knowledge and evaluating them on extensive time series tasks, we hope this work can provide inspirations for others and increase the community\u27s interests on the problem of applying deep learning techniques to more time series tasks

    Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction

    Full text link
    Explaining recommendations enables users to understand whether recommended items are relevant to their needs and has been shown to increase their trust in the system. More generally, if designing explainable machine learning models is key to check the sanity and robustness of a decision process and improve their efficiency, it however remains a challenge for complex architectures, especially deep neural networks that are often deemed "black-box". In this paper, we propose a novel formulation of interpretable deep neural networks for the attribution task. Differently to popular post-hoc methods, our approach is interpretable by design. Using masked weights, hidden features can be deeply attributed, split into several input-restricted sub-networks and trained as a boosted mixture of experts. Experimental results on synthetic data and real-world recommendation tasks demonstrate that our method enables to build models achieving close predictive performances to their non-interpretable counterparts, while providing informative attribution interpretations.Comment: 14th ACM Conference on Recommender Systems (RecSys '20
    corecore