
    Melody retrieval on the Web

    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2001. Includes bibliographical references (p. 87-90). The emergence of digital music on the Internet requires new information retrieval methods adapted to its specific characteristics and needs. While music retrieval based on textual information, such as title, composer, or subject classification, has been implemented in many existing systems, retrieval of a piece of music based on its musical content, especially an incomplete, imperfect recollection of a fragment of the music, has not yet been fully explored. This thesis explores both theoretical and practical issues involved in a web-based melody retrieval system. I built a query-by-humming system, which can find a piece of music in a digital music repository based on a few hummed notes. Since an input query (a hummed melody) may contain various errors due to the uncertainty of the user's memory or the limits of the user's singing ability, the system should be able to tolerate errors. Furthermore, extracting melodies to build a melody database is also a complicated task. Therefore, melody representation, query construction, melody matching, and melody extraction are critical for an efficient and robust query-by-humming system, and these are the main tasks addressed in the thesis. Compared to previous systems, a new and more effective melody representation and corresponding matching methods, which combine both pitch and rhythmic information, were adopted; a whole set of tools and deliverable software were implemented; and experiments were conducted to evaluate system performance as well as to explore other melody perception issues. Experimental results demonstrate that our methods incorporating rhythmic information, rather than the pitch-only methods of previous systems, helped improve the effectiveness of a query-by-humming system. by Wei Chai. S.M.
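The pitch side of the melody-matching idea can be sketched as approximate matching over transposition-invariant interval sequences. This is a simplification of the thesis's combined pitch-and-rhythm representation (the function names are illustrative, and the rhythm component is omitted):

```python
def intervals(pitches):
    """Pitch contour as successive semitone steps (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(q, m):
    """Classic dynamic-programming edit distance, which tolerates the
    insertions, deletions, and substitutions a hummed query may contain."""
    prev = list(range(len(m) + 1))
    for i, qi in enumerate(q, 1):
        cur = [i]
        for j, mj in enumerate(m, 1):
            cost = 0 if qi == mj else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]

# A hummed query (MIDI pitch numbers) sung two semitones above melody_a:
query    = [62, 64, 66, 62]
melody_a = [60, 62, 64, 60]   # same contour, different key
melody_b = [60, 65, 63, 68]   # unrelated contour
assert edit_distance(intervals(query), intervals(melody_a)) < \
       edit_distance(intervals(query), intervals(melody_b))
```

Because matching runs on intervals rather than absolute pitches, a query hummed in the wrong key still ranks the correct melody first.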

    Creative Support Musical Composition System: a study on Multiple Viewpoints Representations in Variable Markov Oracle

    The mid-20th century witnessed the emergence of an area of study focused on the automatic generation of musical content by computational means. Early examples concentrate on offline processing of musical data; more recently, the community has moved towards interactive, real-time musical systems. Furthermore, a recent trend stresses the importance of assistive technology, which promotes a user-in-the-loop approach by offering multiple suggestions for a given creative problem. In this context, my research aims to foster new software tools for creative support systems, where algorithms can participate collaboratively in the composition flow. In greater detail, I seek a tool that learns from variable-length musical data to provide real-time feedback during the composition process. In light of the multidimensional and hierarchical structure of music, I aim to study representations that abstract its temporal patterns, to foster the generation of multiple ranked solutions for a given musical context. Ultimately, the subjective choice is left to the user, to whom a limited number of 'optimal' solutions are provided. A symbolic music representation expressed as Multiple Viewpoint Models, combined with the Variable Markov Oracle (VMO) automaton, is used to test the interaction between the multi-dimensionality of the representation and the optimality of the VMO model in providing style-coherent, novel, and diverse solutions. To evaluate the system, an experiment was conducted to validate the tool in an expert-based scenario with composition students, using the Creativity Support Index test.
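A minimal sketch of the multiple-viewpoints idea, assuming a toy `Note` type: each viewpoint derives a parallel symbol sequence (here pitch interval and duration ratio) from the same notes, and a sequence model such as the VMO could then be trained per viewpoint. This illustrates only the representation, not the VMO itself:

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int       # MIDI note number
    duration: float  # in beats

def viewpoints(notes):
    """Derive two derived viewpoints from one note sequence: the pitch
    interval and the duration ratio between successive notes."""
    pairs = list(zip(notes, notes[1:]))
    return {
        "interval":  [b.pitch - a.pitch for a, b in pairs],
        "dur_ratio": [b.duration / a.duration for a, b in pairs],
    }

phrase = [Note(60, 1.0), Note(62, 1.0), Note(64, 0.5), Note(60, 2.0)]
vp = viewpoints(phrase)
assert vp["interval"] == [2, 2, -4]
assert vp["dur_ratio"] == [1.0, 0.5, 4.0]
```

Both derived sequences are invariant to transposition and global tempo, which is what makes viewpoint representations attractive for pattern learning.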

    Controllable music performance synthesis via hierarchical modelling

    Musical expression requires control of both what notes are played and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a three-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or to utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and, as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools that empower individuals across a diverse range of musical experience.
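The three-level hierarchy can be sketched with plain data types. This is an illustrative skeleton only, not the MIDI-DDSP implementation; `render` is a hypothetical stand-in for the learned "synthesis given performance" prior:

```python
from dataclasses import dataclass

@dataclass
class NoteLevel:
    pitch: int        # MIDI note number
    start: float      # seconds
    duration: float   # seconds

@dataclass
class PerformanceLevel:
    note: NoteLevel
    dynamics: float      # 0..1 loudness
    vibrato: float       # modulation depth
    articulation: float  # 0..1 legato..staccato

@dataclass
class SynthesisLevel:
    performance: PerformanceLevel
    f0: list         # per-frame fundamental frequency (Hz)
    harmonics: list  # per-frame harmonic amplitude

def render(perf: PerformanceLevel, frames: int = 4) -> SynthesisLevel:
    """Toy stand-in for the 'synthesis given performance' prior: derive
    flat per-frame f0/amplitude curves from the performance attributes."""
    f0 = [440.0 * 2 ** ((perf.note.pitch - 69) / 12)] * frames
    amps = [perf.dynamics] * frames
    return SynthesisLevel(perf, f0, amps)

concert_a = PerformanceLevel(NoteLevel(69, 0.0, 1.0), 0.8, 0.1, 0.5)
assert render(concert_a).f0[0] == 440.0
```

Editing at any one level (say, raising `dynamics`) leaves the levels above it untouched, which is the intervention property the abstract describes.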

    Music Synchronization, Audio Matching, Pattern Detection, and User Interfaces for a Digital Music Library System

    Over the last two decades, growing efforts to digitize our cultural heritage have been made. Most of these digitization initiatives pursue one or both of the following goals: to conserve the documents - especially those threatened by decay - and to provide remote access on a grand scale. These trends can be observed for music documents as well, and by now several digital music libraries are in existence. An important characteristic of these music libraries is an inherent multimodality resulting from the large variety of available digital music representations, such as scanned scores, symbolic scores, audio recordings, and videos. In addition, for each piece of music there exists not just one document of each type, but many. Considering and exploiting this multimodality and multiplicity, the DFG-funded digital library initiative PROBADO MUSIC aimed at developing a novel user-friendly interface for content-based retrieval, document access, navigation, and browsing in large music collections. The implementation of such a front end requires the multimodal linking and indexing of the music documents during preprocessing. As the considered music collections can be very large, the automated, or at least semi-automated, calculation of these structures is desirable. The field of music information retrieval (MIR) is particularly concerned with the development of suitable procedures, and it was the goal of PROBADO MUSIC to include existing and newly developed MIR techniques to realize the envisioned digital music library system. In this context, the present thesis discusses the following three MIR tasks: music synchronization, audio matching, and pattern detection. We identify particular issues in these fields and provide algorithmic solutions as well as prototypical implementations. In music synchronization, for each position in one representation of a piece of music, the corresponding position in another representation is calculated.
    This thesis focuses on the task of aligning scanned score pages of orchestral music with audio recordings. Here, a previously unconsidered piece of information is the textual specification of transposing instruments provided in the score. Our evaluations show that neglecting such information can result in a measurable loss of synchronization accuracy. Therefore, we propose an OCR-based approach for detecting and interpreting the transposition information in orchestral scores. For a given audio snippet, audio matching methods automatically calculate all musically similar excerpts within a collection of audio recordings. In this context, subsequence dynamic time warping (SSDTW) is a well-established approach, as it allows for local and global tempo variations between the query and the retrieved matches. Moving to real-life digital music libraries with larger audio collections, however, the quadratic runtime of SSDTW results in untenable response times. To improve the response time, this thesis introduces a novel index-based approach to SSDTW-based audio matching. We combine the idea of inverted file lists introduced by Kurth and Müller (Efficient index-based audio matching, 2008) with the shingling techniques often used in audio identification scenarios. In pattern detection, all repeating patterns within one piece of music are determined. Usually, pattern detection operates on symbolic score documents and is often used in the context of computer-aided motivic analysis. Envisioned as a new feature of the PROBADO MUSIC system, this thesis proposes a string-based approach to pattern detection and a novel interactive front end for result visualization and analysis.
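Plain subsequence DTW (the quadratic baseline, before any indexing) can be sketched as follows: the start and end of the alignment are free within the database sequence, and the minimum over three predecessor cells absorbs local tempo variation. This illustrates the baseline being sped up, not the inverted-file/shingling index itself, and it operates on toy scalar features rather than real audio descriptors:

```python
def subsequence_dtw(query, seq):
    """Cost of aligning the whole query to the best-matching subsequence
    of `seq` (free start and end within `seq`, absolute-difference cost)."""
    prev = [abs(query[0] - s) for s in seq]      # free start: no accumulated cost
    for q in query[1:]:
        cur = [prev[0] + abs(q - seq[0])]
        for j in range(1, len(seq)):
            step = min(prev[j], prev[j - 1], cur[j - 1])  # tempo flexibility
            cur.append(abs(q - seq[j]) + step)
        prev = cur
    return min(prev)                             # free end: best final column

noise = [9, 9, 8, 9]
database = noise + [1, 2, 3, 4] + noise          # query hidden mid-sequence
assert subsequence_dtw([1, 2, 3, 4], database) == 0.0
assert subsequence_dtw([1, 2, 3, 4], noise) > 0.0
```

Both loops run over the full query and sequence lengths, which is exactly the quadratic cost that motivates the index-based approach.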

    Literary review of content-based music recognition paradigms

    During the last few decades, a need for novel retrieval strategies for large audio databases emerged, as millions of digital audio documents became accessible to everyone through the Internet. It became essential that users could search for songs about which they had no prior information, using only the content of the audio as a query. In practice, this means that when a user hears an unknown song on the radio and wants more information about it, he or she can simply record a sample of the song with a mobile device and send it to a music recognition application as a query. Query results are then presented on the screen with all the necessary metadata, such as the song name and artist. The retrieval systems are expected to perform quickly and accurately against large databases that may contain millions of songs, which poses many challenges for researchers. This thesis is a literature review of audio retrieval paradigms that allow querying for songs using only their audio content, such as audio fingerprinting. It also addresses the typical problems and challenges of audio retrieval and compares how each of the proposed paradigms performs in these challenging scenarios.
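The query-by-example workflow described above is commonly implemented with an inverted index of audio fingerprints. A toy sketch, assuming hashable "peak" features per frame; real schemes (e.g. Shazam-style landmark hashing) derive the hashed pairs from spectrogram peaks rather than raw values:

```python
import hashlib
from collections import defaultdict

def landmarks(frames):
    """Toy fingerprints: hash each pair of successive feature values."""
    return [hashlib.md5(f"{a}:{b}".encode()).hexdigest()[:8]
            for a, b in zip(frames, frames[1:])]

# Inverted index: fingerprint -> ids of songs containing it
songs = {"songA": [3, 7, 7, 2, 9], "songB": [5, 5, 1, 8, 8]}
index = defaultdict(set)
for sid, frames in songs.items():
    for fp in landmarks(frames):
        index[fp].add(sid)

def identify(sample):
    """Vote for the indexed song sharing the most fingerprints with the sample."""
    votes = defaultdict(int)
    for fp in landmarks(sample):
        for sid in index.get(fp, ()):
            votes[sid] += 1
    return max(votes, key=votes.get) if votes else None

assert identify([7, 7, 2, 9]) == "songA"   # short excerpt is enough
```

Lookup cost scales with the sample's fingerprint count rather than the database size, which is why fingerprinting systems stay fast at millions of songs.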

    Cover song identification using compression-based distance measures

    Measuring similarity in music data is a problem with various potential applications. In recent years, the task known as cover song identification has gained widespread attention. In cover song identification, the purpose is to determine whether a piece of music is a different rendition of a previous version of the composition. The task is quite trivial for a human listener, but highly challenging for a computer. This research approaches the problem from an information-theoretic starting point. Assuming that cover versions share musical information with the original performance, we strive to measure the degree of this common information as the amount of computational resources needed to turn one version into another. Using a similarity measure known as normalized compression distance, we approximate the non-computable Kolmogorov complexity by the length of an object when compressed with a real-world data compression algorithm. If two pieces of music share musical information, we should be able to compress one using a model learned from the other. In order to use compression-based similarity measuring, the meaningful musical information needs to be extracted from the raw audio signal data. The most commonly used representation for this task is the chromagram: a sequence of real-valued vectors describing the temporal tonal content of a piece of music. Measuring the similarity between two chromagrams effectively with a data compression algorithm requires further processing to extract relevant features and to find a more suitable discrete representation for them. Here, the challenge is to process the data without losing the distinguishing characteristics of the music. In this research, we study the difficult nature of cover song identification and search for an effective compression-based system for the task.
    Harmonic and melodic features, different representations for them, commonly used data compression algorithms, and several other variables of the problem are addressed thoroughly. The research seeks to shed light on how different choices in the scheme contribute to the performance of the system. Additional attention is paid to combining different features, with several combination strategies studied. Extensive empirical evaluation of the identification system has been performed using large sets of real-world music data. Evaluations show that the compression-based similarity measuring performs relatively well, but fails to achieve the accuracy of an existing solution that measures similarity using common subsequences. The best compression-based results are obtained by combining distances based on two harmonic representations obtained from chromagrams using hidden Markov model chord estimation, and an octave-folded version of the extracted salient melody representation. The most distinct reason for the shortcoming in compression performance is the scarce amount of data available for a single piece of music; this was partially overcome by internal data duplication. As a whole, the process is solid and provides a practical foundation for an information-theoretic approach to cover song identification.
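The normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), has a direct off-the-shelf approximation; a minimal sketch with zlib standing in for the real-world compressor C (the thesis evaluates several compressors, and real inputs would be discretized chromagram strings rather than text):

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, approximating Kolmogorov
    complexity with zlib's compressed length (level 9)."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

original  = b"do re mi fa sol " * 40
cover     = b"do re mi fa sol " * 38 + b"do re mi fa la "  # near-copy
unrelated = bytes(range(256)) * 3

# Shared structure compresses away in the concatenation, so the
# cover version sits closer to the original than unrelated data.
assert ncd(original, cover) < ncd(original, unrelated)
```

The intuition matches the abstract: if one version can be compressed "using a model learned from the other", C(xy) is barely larger than C(x), and the distance approaches zero.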

    Generative rhythmic models

    A system for generative rhythmic modeling is presented. The work aims to explore computational models of creativity, realizing them in a system designed for real-time generation of semi-improvisational music. This is envisioned as an attempt to develop musical intelligence in the context of structured improvisation, and by doing so to enable and encourage new forms of musical control and performance; the systems described in this work, already capable of real-time creation, have been designed with the explicit intention of embedding them in a variety of performance-based systems. A model of qaida, a solo tabla form, is presented, along with the results of an online survey comparing it to a professional tabla player's recording on the dimensions of musicality, creativity, and novelty. The qaida model generates a bank of rhythmic variations by reordering subphrases. Selections from this bank are sequenced using a feature-based approach. An experimental extension to modeling layer- and loop-based forms of electronic music is presented, in which the initial modeling approach is generalized. Starting from a seed track, the layer-based model utilizes audio analysis techniques such as blind source separation and onset-based segmentation to generate layers, which are shuffled and recombined to produce novel music in a manner analogous to the qaida model. M.S. Committee Chair: Chordia, Parag; Committee Member: Freeman, Jason; Committee Member: Weinberg, Gil.
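The generate-then-rank loop of the qaida model can be sketched as reordering subphrases into a bank and sorting it by a feature. The `smoothness` feature below is a hypothetical stand-in for the thesis's feature-based sequencing, and the bol strings are illustrative:

```python
import random

def variation_bank(subphrases, n=30, seed=1):
    """Bank of rhythmic variations: random reorderings of the subphrases."""
    rng = random.Random(seed)
    bank = {tuple(rng.sample(subphrases, len(subphrases))) for _ in range(n)}
    return list(bank)

def smoothness(phrase):
    """Toy ranking feature: total change in subphrase length between
    neighbours (lower = smoother rhythmic flow)."""
    return sum(abs(len(a) - len(b)) for a, b in zip(phrase, phrase[1:]))

seed_phrase = [("dha", "ge"), ("na",), ("dha", "ti", "ta"), ("na", "ka")]
ranked = sorted(variation_bank(seed_phrase), key=smoothness)
assert smoothness(ranked[0]) <= smoothness(ranked[-1])
```

Swapping in a different feature (density, accent pattern, similarity to the seed) changes which variations are selected without touching the generation step, mirroring the model's separation of bank generation from sequencing.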

    Perception and modeling of segment boundaries in popular music


    Content-based music classification, summarization and retrieval

    Ph.D. (Doctor of Philosophy)