19 research outputs found

    User-centric Music Information Retrieval

    The rapid growth of the Internet and advances in Web technologies have given users access to large amounts of on-line music data, including acoustic signals, lyrics, style/mood labels, and user-assigned tags. This progress has made music listening more enjoyable, but it raises the question of how to organize this data and, more generally, how computer programs can assist users in their music experience. An important subject in computer-aided music listening is music retrieval: efficiently helping users locate the music they are looking for. Traditionally, songs were organized in a hierarchical structure such as genre > artist > album > track to facilitate navigation. However, users' intentions are often hard to capture in such a simple structure: they may want to listen to music of a particular mood, style, or topic, or to songs similar to given samples. This motivated us to work on user-centric music retrieval systems that improve users' satisfaction. Traditional music information retrieval research was mainly concerned with classification, clustering, identification, and similarity search over acoustic music data, using feature extraction algorithms and machine learning techniques. More recently, the field has focused on exploiting other types of data, such as lyrics, user access patterns, and user-defined tags, and on non-genre classification targets such as mood labels and styles. This dissertation investigated and developed effective data mining techniques for (1) organizing and annotating music data with styles, moods, and user-assigned tags; (2) performing effective analysis of music data with features from diverse information sources; and (3) recommending songs to users by utilizing both content features and user access patterns.
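The third goal above, blending content features with user access patterns, can be illustrated with a minimal hybrid-similarity sketch. All data, the contour of the feature vectors, and the `alpha` blending weight are hypothetical illustrations, not the dissertation's actual method:

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy data: one row per song.
content = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # content feature vectors
plays = [[5, 4, 0], [0, 1, 0], [1, 0, 3]]       # play counts across three users

def hybrid_similarity(i, j, alpha=0.5):
    # Blend content similarity with access-pattern (co-play) similarity.
    return alpha * cosine(content[i], content[j]) + \
           (1 - alpha) * cosine(plays[i], plays[j])
```

Under this toy data, song 0 comes out closer to song 1 (similar features, shared listeners) than to song 2; a recommender of the kind described would rank candidate songs by such a blended score.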

    Content-based music retrieval by acoustic query

    Ph.D. (Doctor of Philosophy)

    Cover song identification using compression-based distance measures (Lainakappaleiden tunnistaminen tiedon tiivistämiseen perustuvia etäisyysmittoja käyttäen)

    Measuring similarity in music data is a problem with various potential applications. In recent years, the task known as cover song identification has gained widespread attention. In cover song identification, the goal is to determine whether a piece of music is a different rendition of a previously released version of a composition. The task is quite trivial for a human listener but highly challenging for a computer. This research approaches the problem from an information-theoretic starting point. Assuming that cover versions share musical information with the original performance, we strive to measure the degree of this common information as the amount of computational resources needed to turn one version into another. Using a similarity measure known as the normalized compression distance, we approximate the non-computable Kolmogorov complexity by the length of an object when compressed with a real-world data compression algorithm. If two pieces of music share musical information, we should be able to compress one using a model learned from the other. In order to use compression-based similarity measuring, the meaningful musical information must be extracted from the raw audio signal. The most commonly used representation for this task is the chromagram: a sequence of real-valued vectors describing the temporal tonal content of a piece of music. Measuring the similarity between two chromagrams effectively with a data compression algorithm requires further processing to extract relevant features and to find a more suitable discrete representation for them. Here, the challenge is to process the data without losing the distinguishing characteristics of the music. In this research, we study the difficult nature of cover song identification and search for an effective compression-based system for the task.
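The normalized compression distance described above can be sketched in a few lines, approximating Kolmogorov complexity with the compressed length produced by an off-the-shelf compressor (here Python's zlib; the toy byte strings are illustrative only, not real musical data):

```python
import zlib

def c_len(data: bytes) -> int:
    # Approximate Kolmogorov complexity by compressed length (zlib, max level).
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized compression distance: near 0 for similar inputs,
    # near 1 for unrelated ones.
    cx, cy, cxy = c_len(x), c_len(y), c_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

motif = b"C E G C E G A F " * 40       # a repetitive "melody" string
variant = motif.replace(b"A", b"B")    # a slightly altered rendition
unrelated = bytes(range(256)) * 3      # structurally unrelated data
```

Here `ncd(motif, variant)` comes out much smaller than `ncd(motif, unrelated)`: the concatenation of two similar strings compresses almost as well as either alone. This is exactly the property that compression-based cover song identification exploits once the audio has been reduced to a discrete representation.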
    Harmonic and melodic features, different representations for them, commonly used data compression algorithms, and several other variables of the problem are addressed thoroughly. The research seeks to shed light on how different choices in the scheme contribute to the performance of the system. Additional attention is paid to combining different features, with several combination strategies studied. Extensive empirical evaluation of the identification system has been performed using large sets of real-world music data. The evaluations show that compression-based similarity measuring performs relatively well but fails to reach the accuracy of an existing solution that measures similarity using common subsequences. The best compression-based results are obtained by combining distances based on two harmonic representations, derived from chromagrams via hidden Markov model chord estimation, with an octave-folded version of the extracted salient melody representation. The most significant reason for the shortfall in compression performance is the scarce amount of data available for a single piece of music; this was partially overcome by internal data duplication. As a whole, the process is solid and provides a practical foundation for an information-theoretic approach to cover song identification.
    Cover songs are musical performances in which a different artist offers a new interpretation of the version made by a song's original performer. Sometimes covers closely resemble the originals; sometimes the versions share only nominal similarities. For human listeners, identifying a cover is usually easy if the original performance is familiar. Automatic, algorithm-based identification of cover songs is, however, a considerably more challenging problem, and no fully satisfactory solutions have yet been presented. Solving it would have several potentially valuable research and commercial applications, such as automatic plagiarism detection. This dissertation treats automatic cover song identification from an information-theoretic starting point. The research investigates whether the tonal similarity contained in songs can be measured in such a way that different performances can be recognized as interpretations of fundamentally the same composition. Similarity is measured with a metric based on data compression algorithms, which requires extracting and representing the most compositionally distinctive features of each piece of music. The study is conducted on a large corpus of popular music in audio form. The dissertation works through several stages of the problem: the parameters involved in processing the signal data; how the representation extracted from the signal can be converted into string form so that the result still captures the song's essential musical characteristics; and how the resulting string data can be further processed to improve identification. In addition, the dissertation examines how various musical differences between versions (tempo, key, arrangements) affect identification and how their influence on the measurement can be minimized. The suitability of the most common data compression algorithms as a measurement method for this problem is also studied, and the research shows how several different representations extracted from the same song can be combined to achieve better identification accuracy. As its final result, the dissertation presents a compression-based system for cover song identification, discusses its main strengths and weaknesses, and assesses what makes automatic cover song identification as challenging a problem as it is.

    Content-based retrieval of melodies using artificial neural networks

    Human listeners are capable of spontaneously organizing and remembering a continuous stream of musical notes. A listener automatically segments a melody into phrases, from which an entire melody may be learnt and later recognized. This ability makes human listeners ideal for the task of retrieving melodies by content. This research introduces two neural networks, known as SONNET-MAP and ReTREEve, which attempt to model this behaviour. SONNET-MAP functions as a melody segmenter, whereas ReTREEve is specialized towards content-based retrieval (CBR). Typically, CBR systems represent melodies as strings of symbols drawn from a finite alphabet, thereby reducing the retrieval process to the task of approximate string matching. SONNET-MAP and ReTREEve, which are derived from Nigrin's SONNET architecture, offer a novel approach to these traditional systems, and indeed to CBR in general. Based on melodic grouping cues, SONNET-MAP segments a melody into phrases. Parallel SONNET modules form independent, sub-symbolic representations of the pitch and rhythm dimensions of each phrase. These representations are then bound using associative maps, forming a two-dimensional representation of each phrase. This organizational scheme enables SONNET-MAP to segment melodies into phrases using both the pitch and rhythm features of each melody. The boundary points formed by these melodic phrase segments are then used to populate the ReTREEve network. ReTREEve is organized in the same parallel fashion as SONNET-MAP; in addition, melodic phrases are aggregated by a further layer, forming a two-dimensional, hierarchical memory structure for each entire melody. Melody retrieval is accomplished by matching input queries, whether perfect (for example, a fragment from the original melody) or imperfect (for example, a fragment derived from humming), against learned phrases and phrase-sequence templates.
    Using a sample of fifty melodies composed by The Beatles, results show that the use of both pitch and rhythm during the retrieval process significantly improves retrieval results over networks that use only pitch or only rhythm. Additionally, queries that are aligned along phrase boundaries are retrieved using significantly fewer notes than those that are not, indicating the importance of a human-based approach to melody segmentation. Moreover, depending on query degradation, different melodic features prove more adept at retrieval than others. The experiments presented in this thesis represent the largest empirical test of SONNET-based networks ever performed. As far as we are aware, the combined SONNET-MAP and ReTREEve networks constitute the first self-organizing CBR system capable of automatic segmentation and retrieval of melodies using various features of pitch and rhythm.
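The traditional string-matching baseline that these networks are contrasted with can be sketched as follows. The contour alphabet (U/D/R for up, down, repeat) and the toy melody database are illustrative assumptions, not the thesis's actual encoding or data:

```python
def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance via the classic dynamic-programming recurrence.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Hypothetical melodies encoded as pitch-contour strings.
database = {"melody_a": "UUDRUDDU", "melody_b": "DDUURRDU"}

def retrieve(query: str) -> str:
    # Approximate matching: return the melody whose encoding is
    # closest to the query under edit distance.
    return min(database, key=lambda name: edit_distance(query, database[name]))
```

With this toy data, `retrieve("UUDRUD")` picks `melody_a` even though the query is only a fragment, mirroring how an imperfect hummed query is matched against stored melodies in symbol-string CBR systems.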

    Content-based visualisation to aid common navigation of musical audio
