18 research outputs found

    The GTZAN dataset: Its contents, its faults, their effects on evaluation, and its future use

    The GTZAN dataset appears in at least 100 published works, and is the most-used public dataset for evaluation in machine listening research for music genre recognition (MGR). Our recent work, however, shows GTZAN has several faults (repetitions, mislabelings, and distortions), which challenge the interpretability of any result derived using it. In this article, we disprove the claims that all MGR systems are affected in the same ways by these faults, and that the performances of MGR systems in GTZAN are still meaningfully comparable since they all face the same faults. We identify and analyze the contents of GTZAN, and provide a catalog of its faults. We review how GTZAN has been used in MGR research, and find few indications that its faults have been known and considered. Finally, we rigorously study the effects of its faults on evaluating five different MGR systems. The lesson is not to banish GTZAN, but to use it with consideration of its contents. Comment: 29 pages, 7 figures, 6 tables, 128 references.
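    As a practical illustration of working with the catalogued faults, the sketch below screens a local GTZAN copy for bit-identical repetitions before building evaluation splits. The directory layout and file extension are assumptions, and byte hashing only catches exact duplicates, not the mislabelings or distorted repetitions identified in the article.

```python
# Sketch: flag bit-identical clips in a local GTZAN copy by hashing file bytes.
# Assumes GTZAN is unpacked as <root>/<genre>/<genre>.<nnnnn>.au (hypothetical layout);
# exact-byte hashing only finds verbatim repetitions, not re-recordings or distortions.
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(root: str) -> dict[str, list[Path]]:
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*.au")):
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Keep only hashes shared by more than one file.
    return {h: ps for h, ps in groups.items() if len(ps) > 1}

if __name__ == "__main__":
    for digest, paths in find_exact_duplicates("gtzan/genres").items():
        print(digest, [p.name for p in paths])
```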

    Hybrids and Fragments: Music, Genre, Culture and Technology

    Technologies are fundamental to music and its marketing and dissemination, as is the categorisation of music by genre. In this research we examine the relationship between musical genre and technology by studying genre proliferation, fragmentation and hybridity. We compare the movement of musical artists between genres in various technological eras, and evaluate the connections between the dissemination of music and its categorisation. Cultural hybridity and fragmentation are thought by many scholars to be the norm in the globalised era, and the online music environment appears to be populated by hybrid genres and micro-genres. To examine this, we study the representation of musical genre on the Internet. We acquire data from three main sources: The Echo Nest, a music-intelligence system, and two collectively constructed knowledge-bases, Wikidata and MusicBrainz. We discover geographical and commercial biases. We calculate genre inception dates in order to examine category proliferation, and construct networks from these data, using the relationships between artists and genres to establish structure. Using network analyses to quantify genre hybridity, we find increasing hybridisation, peaking at various periods in different datasets. Statistical analyses, comparing hybridity within our various data, validate our method and reveal a relationship between the activity of editing music information and the movement of musical artists between musical genres. We also find evidence for the fragmentation of genre and the appearance of micro-genres. We consider artists that are invisible in mainstream systems using data from three alternative platforms, Bandcamp, CD Baby and SoundCloud, and examine rapid genre proliferation in Spotify. We then discuss hybridity and fragmentation in relation to postmodernity, hypermodernity and unimodernity, music and genre within society, and the ways genre intersects with technology.
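    A minimal sketch of the kind of artist-genre network described above, assuming networkx and an invented edge list; the hybridity measure here is simply the number of distinct genres linked to an artist, which may differ from the thesis's own definition.

```python
# Sketch: bipartite artist-genre network and a simple per-artist hybridity count.
# The edge list is invented for illustration; the thesis's actual hybridity metric
# (derived from its Echo Nest / Wikidata / MusicBrainz data) may be defined differently.
import networkx as nx

artist_genre_pairs = [          # hypothetical (artist, genre) assignments
    ("Artist A", "house"), ("Artist A", "techno"),
    ("Artist B", "folk"),
    ("Artist C", "hip hop"), ("Artist C", "electronic"), ("Artist C", "jazz"),
]

G = nx.Graph()
for artist, genre in artist_genre_pairs:
    G.add_node(artist, kind="artist")
    G.add_node(genre, kind="genre")
    G.add_edge(artist, genre)

# Hybridity here = number of distinct genres an artist is connected to.
hybridity = {n: G.degree(n) for n, d in G.nodes(data=True) if d["kind"] == "artist"}
print(hybridity)  # {'Artist A': 2, 'Artist B': 1, 'Artist C': 3}
```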

    Large-Scale Pattern Discovery in Music

    This work focuses on extracting patterns in musical data from very large collections. The problem is split into two parts. First, we build such a large collection, the Million Song Dataset, to provide researchers access to commercial-size datasets. Second, we use this collection to study cover song recognition, which involves finding harmonic patterns from audio features. Regarding the Million Song Dataset, we detail how we built the original collection from an online API, and how we encouraged other organizations to participate in the project. The result is the largest research dataset with heterogeneous sources of data available to music technology researchers. We demonstrate some of its potential and discuss the impact it already has on the field. On cover song recognition, we must revisit the existing literature since there are no publicly available results on a dataset of more than a few thousand entries. We present two solutions to tackle the problem: one using a hashing method, and one using a higher-level feature computed from the chromagram (dubbed the 2DFTM). We further investigate the 2DFTM since it has the potential to be a relevant representation for any task involving audio harmonic content. Finally, we discuss the future of the dataset and the hope of seeing more work making use of the different sources of data that are linked in the Million Song Dataset. Regarding cover songs, we explain how this might be a first step towards defining a harmonic manifold of music, a space where harmonic similarities between songs would be more apparent.
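    To make the 2DFTM idea concrete, the sketch below computes the two-dimensional Fourier transform magnitude of a beat-synchronous chroma patch; the 12x75 patch size, the normalisation, and the NumPy implementation are assumptions for illustration rather than the exact pipeline of the thesis.

```python
# Sketch: 2D Fourier transform magnitude (2DFTM) of a beat-synchronous chroma patch.
# Taking the magnitude of the 2D FFT discards phase, which makes the representation
# invariant to key transposition (a circular shift in pitch) and to time offsets.
import numpy as np

def chroma_patch_2dftm(chroma_patch: np.ndarray) -> np.ndarray:
    """chroma_patch: array of shape (12, n_beats); returns a flattened 2DFTM vector."""
    magnitude = np.abs(np.fft.fft2(chroma_patch))
    magnitude = np.fft.fftshift(magnitude)         # centre the low "frequencies"
    return magnitude.flatten() / (np.linalg.norm(magnitude) + 1e-12)

rng = np.random.default_rng(0)
patch = rng.random((12, 75))                       # stand-in for real chroma features
feature = chroma_patch_2dftm(patch)
print(feature.shape)                               # (900,)
```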

    Workshop proceedings: CBRecSys 2014. Workshop on New Trends in Content-based Recommender Systems


    Expressive re-performance

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 167-171). Many music enthusiasts abandon music studies because they are frustrated by the amount of time and effort it takes to learn to play interesting songs. There are two major components to performance: the technical requirement of correctly playing the notes, and the emotional content conveyed through expressivity. While technical details like pitch and note order are largely set, expression, which is accomplished through timing, dynamics, vibrato, and timbre, is more personal. This thesis develops expressive re-performance, which entails the simplification of the technical requirements of music-making to allow a user to experience music beyond his technical level, with particular focus on expression. Expressive re-performance aims to capture the fantasy and sound of a favorite recording by using audio extraction to split out the original target solo and giving expressive control over that solo to a user. The re-performance experience starts with an electronic mimic of a traditional instrument with which the user steps through a recording. Data generated from the user's actions is parsed to determine note changes and expressive intent. Pitch is innate to the recording, allowing the user to concentrate on expressive gesture. Two pre-processing systems are necessary: analysis to discover note starts, and extraction. Extraction of the solo is done through user-provided mimicry of the target combined with Probabilistic Latent Component Analysis with Dirichlet hyperparameters. Audio elongation to match the user's performance is performed using time-stretching. The instrument interfaces used were Akai's Electronic Wind Controller (EWI), Fender's Squier Stratocaster Guitar and Controller, and a Wii-mote. Tests of the system and concept were performed using the EWI and Wii-mote for re-performance of two songs. User response indicated that while the interaction was fun, it did not succeed at enabling significant expression. Users expressed difficulty learning to use the EWI during the short test window and had insufficient interest in the offered songs. Both problems should be possible to overcome with further test time and system development. Users expressed interest in the concept of a real-instrument mimic and found the audio extractions to be sufficient. Follow-on work to address issues discovered during the testing phase is needed to further validate the concept and explore means of developing expressive re-performance as a learning tool. By Laurel S. Pardue. S.M.
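    The audio-elongation step can be illustrated with a small sketch that lengthens an extracted solo note to match how long the user holds it; librosa's phase-vocoder time stretch and the file names are assumptions standing in for whatever implementation the thesis used.

```python
# Sketch: elongate an extracted solo segment to match the duration a user held a note.
# librosa's phase-vocoder time stretch is assumed as a stand-in; rate < 1 lengthens audio.
import librosa
import soundfile as sf

def stretch_segment(segment, sr, target_seconds):
    original_seconds = len(segment) / sr
    rate = original_seconds / target_seconds      # < 1 when the user holds the note longer
    return librosa.effects.time_stretch(segment, rate=rate)

y, sr = librosa.load("extracted_solo.wav", sr=None)    # hypothetical extracted solo
note = y[int(0.0 * sr):int(0.5 * sr)]                  # a 0.5 s note from the recording
stretched = stretch_segment(note, sr, target_seconds=0.8)
sf.write("stretched_note.wav", stretched, sr)
```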

    Modeling and predicting emotion in music

    With the explosion of vast and easily-accessible digital music libraries over the past decade, there has been a rapid expansion of research towards automated systems for searching and organizing music and related data. Online retailers now offer vast collections of music, spanning tens of millions of songs, available for immediate download. While these online stores present a drastically different dynamic than the record stores of the past, consumers still arrive with the same request: recommendation of music that is similar to their tastes. For both recommendation and curation, the vast digital music libraries of today necessarily require powerful automated tools. The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task. Myriad features, such as harmony, timbre, interpretation, and lyrics, affect emotion, and the mood of a piece may also change over its duration. Furthermore, in developing automated systems to organize music in terms of emotional content, we are faced with a problem that oftentimes lacks a well-defined answer; there may be considerable disagreement regarding the perception and interpretation of the emotions of a song, or even ambiguity within the piece itself. Automatic identification of musical mood is a topic still in its early stages, though it has received increasing attention in recent years. Such work offers potential not just to revolutionize how we buy and listen to our music, but to provide deeper insight into the understanding of human emotions in general. This work seeks to relate core concepts from psychology to those of signal processing in order to understand how to extract information relevant to musical emotion from an acoustic signal. The methods discussed here survey existing features using psychology studies and develop new features using basis functions learned directly from magnitude spectra. Furthermore, this work presents a wide breadth of approaches to developing functional mappings between acoustic data and emotion space parameters. Using these models, a framework is constructed for content-based modeling and prediction of musical emotion. Ph.D., Electrical Engineering -- Drexel University, 201
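    As a minimal sketch of a functional mapping from acoustic data to emotion-space parameters, the example below fits a linear regressor from placeholder features to (valence, arousal) pairs; the features, labels, and model choice are stand-ins for illustration, not the dissertation's actual models.

```python
# Sketch: mapping acoustic features to a two-dimensional (valence, arousal) emotion
# space with a linear regressor. Features and labels below are random placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))          # stand-in for per-clip acoustic features (e.g. MFCC stats)
Y = rng.uniform(-1, 1, size=(200, 2))   # stand-in for annotated (valence, arousal) pairs

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.25, random_state=0)
model = Ridge(alpha=1.0).fit(X_tr, Y_tr)            # Ridge supports multi-output regression
print("valence/arousal predictions:", model.predict(X_te[:3]))
```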