    Synchronizing Sequencing Software to a Live Drummer

    Copyright 2013 Massachusetts Institute of Technology. MIT allows authors to archive published versions of their articles after an embargo period. The article is available at

    Multi-channel approaches for musical audio content analysis

    The goal of this research project is to undertake a critical evaluation of signal representations for musical audio content analysis. In particular, it will contrast three different means of analysing micro-rhythmic content in Afro-Latin American music, namely: i) stereo or mono mixed recordings; ii) separated sources obtained via state-of-the-art musical audio source separation techniques; and iii) perfectly separated multi-track stems. In total the project comprises the following five objectives: i) to compile a dataset of mixed and multi-channel recordings of Brazilian Maracatu musicians; ii) to conceive methods for the analysis of rhythmic micro-variations and for pattern recognition; iii) to explore diverse music source separation approaches that preserve micro-rhythmic content; iv) to evaluate the performance of several automatic onset estimation approaches; and v) to compare the rhythmic analysis obtained from the original multi-channel sources with that obtained from the separated ones, in order to evaluate separation quality with regard to microtiming identification.

    Computational Tonality Estimation: Signal Processing and Hidden Markov Models

    This thesis investigates computational musical tonality estimation from an audio signal. We present a hidden Markov model (HMM) in which relationships between chords and keys are expressed as probabilities of emitting observable chords from a hidden key sequence. The model is tested first using symbolic chord annotations as observations, and gives excellent global key recognition rates on a set of Beatles songs. The initial model is extended for audio input by using an existing chord recognition algorithm, which allows it to be tested on a much larger database. We show that a simple model of the upper partials in the signal improves percentage scores. We also present a variant of the HMM which has a continuous observation probability density, but show that the discrete version gives better performance. Then follows a detailed analysis of the effects on key estimation and computation time of changing the low-level signal processing parameters. We find that much of the high-frequency information can be omitted without loss of accuracy, and that significant computational savings can be made by applying a threshold to the transform kernels. Results show that there is no single ideal set of parameters for all music, but that tuning the parameters can make a difference to accuracy. We discuss methods of evaluating more complex tonal changes than a single global key, and compare a metric that measures similarity to a ground truth with metrics that are rooted in music retrieval. We show that the two kinds of measure give different results, and so recommend that the choice of evaluation metric be determined by the intended application. Finally, we draw together our conclusions and use them to suggest directions for continuing this research in tonality model development, feature extraction, evaluation methodology, and applications of computational tonality estimation.
    Engineering and Physical Sciences Research Council (EPSRC)
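
    The idea of decoding a hidden key sequence from observed chords can be illustrated with a minimal Viterbi sketch. This is not the thesis's model: the two keys, the four chord labels and every probability below are invented toy values, and the function name viterbi_keys is hypothetical. It only shows how discrete emission probabilities (chord given key) and transition probabilities (key to key) combine into a most likely key path.

```python
import math

def viterbi_keys(chords, keys, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely hidden key sequence for a list of observed chord labels."""
    # Work in log-probabilities to avoid underflow on long chord sequences.
    score = {k: math.log(start_p[k]) + math.log(emit_p[k].get(chords[0], floor))
             for k in keys}
    back = []  # one dict of back-pointers per observation after the first
    for chord in chords[1:]:
        new_score, pointers = {}, {}
        for k in keys:
            # Best previous key from which to arrive in key k.
            prev = max(keys, key=lambda j: score[j] + math.log(trans_p[j][k]))
            new_score[k] = (score[prev] + math.log(trans_p[prev][k])
                            + math.log(emit_p[k].get(chord, floor)))
            pointers[k] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final key.
    path = [max(keys, key=lambda k: score[k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy parameters (assumptions for illustration only).
keys = ["C major", "G major"]
start_p = {"C major": 0.5, "G major": 0.5}
trans_p = {"C major": {"C major": 0.9, "G major": 0.1},
           "G major": {"C major": 0.1, "G major": 0.9}}
emit_p = {"C major": {"C": 0.40, "F": 0.30, "G": 0.25, "D": 0.05},
          "G major": {"G": 0.40, "D": 0.30, "C": 0.25, "F": 0.05}}

chords = ["C", "F", "C", "F", "D", "G", "D", "G"]
path = viterbi_keys(chords, keys, start_p, trans_p, emit_p)
print(path)
```

    With these toy values the decoded path stays in C major for the first four chords and modulates to G major for the last four, because the high self-transition probability discourages switching key until the chord evidence clearly favours it.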

    Computational methods for percussion music analysis : the afro-uruguayan candombe drumming as a case study

    Most of the research conducted on information technologies applied to music has been largely limited to a few mainstream styles of so-called 'Western' music. The resulting tools often do not generalize properly or cannot be easily extended to other music traditions. Culture-specific approaches have therefore recently been proposed as a way to build richer and more general computational models for music. This thesis aims to contribute to the computer-aided study of rhythm, focusing on percussion music and seeking appropriate solutions from a culture-specific perspective, with Afro-Uruguayan candombe drumming as a case study. The choice is mainly motivated by the challenging rhythmic characteristics of candombe, which are troublesome for most existing analysis methods. In this way, the thesis attempts to push forward the boundaries of current music technologies. The thesis offers an overview of the historical, social and cultural context in which candombe drumming is embedded, along with a description of the rhythm. One of the specific contributions of the thesis is the creation of annotated datasets of candombe drumming suitable for computational rhythm analysis. Performances were purposely recorded and annotated with metrical information, onset locations, and sections. A dataset of annotated recordings for beat and downbeat tracking was publicly released, and an audio-visual dataset of performances was obtained, which serves both documentary and research purposes. Part of the dissertation focused on the discovery and analysis of rhythmic patterns from audio recordings. A representation in the form of a map of rhythmic patterns based on spectral features was devised. The types of analysis that can be conducted with the proposed methods are illustrated with some experiments.
The dissertation also systematically approached (to the best of our knowledge, for the first time) the study and characterization of the micro-rhythmical properties of candombe drumming. The findings suggest that micro-timing is a structural component of the rhythm, producing a sort of characteristic "swing". The rest of the dissertation was devoted to the automatic inference and tracking of the metric structure from audio recordings. A supervised Bayesian scheme for rhythmic pattern tracking was proposed, of which a software implementation was publicly released. The results give additional evidence of the generalizability of the Bayesian approach to complex rhythms from different music traditions. Finally, the downbeat detection task was formulated as a data compression problem. This resulted in a novel method that proved to be effective for a large part of the dataset and opens up some interesting threads for future research.
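
    A common way to quantify micro-timing from beat and onset annotations is to measure how far each onset falls from an isochronous subdivision grid. The sketch below is an illustration under stated assumptions, not the method of the thesis: the function name microtiming_deviations is hypothetical, the default of four subdivisions per beat is an assumed parameter, and the beat and onset times are invented toy data.

```python
from bisect import bisect_right

def microtiming_deviations(onsets, beats, subdivisions=4):
    """For each onset inside the annotated beats, return (grid slot, deviation).

    The deviation is a fraction of the local beat duration: negative means the
    onset is ahead of the isochronous grid, positive means it is behind.
    """
    result = []
    for t in onsets:
        i = bisect_right(beats, t) - 1           # index of the beat at or before t
        if i < 0 or i >= len(beats) - 1:
            continue                             # onset lies outside the annotated beats
        duration = beats[i + 1] - beats[i]
        phase = (t - beats[i]) / duration        # position within the beat, in [0, 1)
        slot = round(phase * subdivisions)       # nearest grid slot, 0..subdivisions
        deviation = phase - slot / subdivisions
        # A slot equal to `subdivisions` is the next beat, reported as slot 0.
        result.append((slot % subdivisions, deviation))
    return result

# Toy annotations in seconds (assumptions for illustration only).
beats = [0.0, 0.5, 1.0]
onsets = [0.0, 0.135, 0.25, 0.385, 0.5, 0.98]

deviations = microtiming_deviations(onsets, beats)
for slot, dev in deviations:
    print(slot, round(dev, 3))
```

    Averaging the deviations per grid slot over a whole recording would then show whether a given subdivision is systematically played early or late, which is the kind of regularity a "swing" description refers to.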

    Robust and Efficient Joint Alignment of Multiple Musical Performances

    Automatic annotation of musical audio for interactive applications

    As machines become more and more portable, and part of our everyday life, it becomes apparent that developing interactive and ubiquitous systems is an important aspect of new music applications created by the research community. We are interested in developing a robust layer for the automatic annotation of audio signals, to be used in various applications, from music search engines to interactive installations, and in various contexts, from embedded devices to audio content servers. We propose adaptations of existing signal processing techniques to a real-time context. Amongst these annotation techniques, we concentrate on low- and mid-level tasks such as onset detection, pitch tracking, tempo extraction and note modelling. We present a framework to extract these annotations and evaluate the performance of different algorithms. The first task is to detect onsets and offsets in audio streams with short latencies. The segmentation of audio streams into temporal objects enables various manipulations and the analysis of metrical structure. The evaluation of different algorithms and their adaptation to real time are described. We then tackle the problem of fundamental frequency estimation, again trying to reduce both the delay and the computational cost. Different algorithms are implemented for real time and tested on monophonic recordings and complex signals. Spectral analysis can be used to label the temporal segments; the estimation of higher-level descriptions is also approached. Techniques for the modelling of note objects and the localisation of beats are implemented and discussed. Applications of our framework include live and interactive music installations and, more generally, tools for composers and sound engineers. Speed optimisations may bring a significant improvement to various automated tasks, such as automatic classification and recommendation systems.
We describe the design of our software solution, for our research purposes and in view of its integration within other systems.
    EU-FP6-IST-507142 project SIMAC (Semantic Interaction with Music Audio Contents); EPSRC grants GR/R54620 and GR/S75802/01
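
    The constraint of detecting onsets with short latency can be illustrated with a causal detector that looks only at the current and past frames. The sketch below is a generic energy-difference detector with an adaptive threshold, chosen for brevity; it is not one of the algorithms evaluated in the thesis, and the function name stream_onsets, its parameters and the synthetic test signal are all assumptions made for illustration.

```python
from collections import deque

def stream_onsets(samples, sample_rate, frame=256, history=8, ratio=2.0, floor=1e-4):
    """Yield onset times in seconds, using only the current and past frames."""
    past = deque(maxlen=history)   # energy rises of the most recent frames
    prev_energy = 0.0
    armed = True                   # False while still inside an already reported onset
    for start in range(0, len(samples) - frame + 1, frame):
        block = samples[start:start + frame]
        energy = sum(x * x for x in block) / frame
        rise = max(0.0, energy - prev_energy)    # half-wave rectified energy difference
        # Adaptive threshold: a fixed floor plus a multiple of the recent mean rise.
        mean_rise = sum(past) / len(past) if past else 0.0
        threshold = floor + ratio * mean_rise
        if rise > threshold:
            if armed:
                yield start / sample_rate        # report the start of the frame
                armed = False
        else:
            armed = True
        past.append(rise)
        prev_energy = energy

# Synthetic signal (assumption): two bursts separated by silence, 8 kHz sampling.
burst = [0.5 if n % 2 == 0 else -0.5 for n in range(512)]
signal = [0.0] * 1024 + burst + [0.0] * 1024 + burst

onset_times = list(stream_onsets(signal, 8000))
print(onset_times)
```

    Because the decision for a frame is made as soon as that frame is complete, the latency is bounded by one frame (256 samples, or 32 ms at the assumed 8 kHz rate); shortening the frame lowers the latency at the cost of a noisier energy estimate.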

    Compositional Hierarchical Model for Music Information Retrieval (Kompozicionalni hierarhični model za pridobivanje informacij iz glasbe)

    In recent years, deep architectures, most commonly based on neural networks, have advanced the state of the art in many research areas. Due to the popularity and the success of deep neural networks, other deep architectures, including compositional models, have been put aside from mainstream research. This dissertation presents the compositional hierarchical model as a novel deep architecture for music processing. Our main motivation was to develop and explore an alternative non-neural deep architecture for music processing which would be transparent, meaning that the encoded knowledge would be interpretable, trained in an unsupervised manner and on small datasets, and useful as a feature extractor for classification tasks, as well as a transparent model for unsupervised pattern discovery. We base our work on compositional models, as compositionality is inherent in music. The proposed compositional hierarchical model learns a multi-layer hierarchical representation of the analyzed music signals in an unsupervised manner. It provides transparent insights into the learned concepts and their structure. It can be used as a feature extractor: its output can be used for classification tasks using existing machine learning techniques. Moreover, the model's transparency enables an interpretation of the learned concepts, so the model can be used for analysis (exploration of the learned hierarchy) or discovery-oriented (inferring the hierarchy) tasks, which is difficult with most neural network based architectures. The proposed model uses relative coding of the learned concepts, which eliminates the need for large annotated training datasets that are essential in deep architectures with a large number of parameters. Relative coding contributes to slim models, which are fast to execute and have low memory requirements.
The model also incorporates several biologically inspired mechanisms that are modeled on mechanisms that exist at the lower levels of human perception (e.g., lateral inhibition in the human ear) and that significantly affect perception. The proposed model is evaluated on several music information retrieval tasks and its results are compared to the current state of the art. The dissertation is structured as follows. In the first chapter we present the motivation for the development of the new model. In the second chapter we elaborate on the related work in music information retrieval and review other compositional and transparent models. Chapter three gives a thorough description of the proposed model. The model structure and its learning and inference methods are explained, as well as the incorporated biologically inspired mechanisms. The model is then applied to several different music domains, which are divided according to the type of input data. In this we follow the timeline of the development and the implementation of the model. In chapter four, we present the model's application to audio recordings, specifically for two tasks: automatic chord estimation and multiple fundamental frequency estimation. In chapter five, we present the model's application to symbolic music representations. We concentrate on pattern discovery, emphasizing the model's ability to tackle such problems. We also evaluate the model as a feature generator for tune family classification. Finally, in chapter six, we show the latest progress in developing the model for representing rhythm and show that it exhibits a high degree of robustness in extracting high-level rhythmic structures from music signals.
We conclude the dissertation by summarizing our work and the results, elaborating on forthcoming work in the development of the model and its future applications.