    Proceedings of the 6th International Workshop on Folk Music Analysis, 15-17 June, 2016

    The Folk Music Analysis Workshop brings together computational music analysis and ethnomusicology. Both symbolic and audio representations of music are considered, and a broad range of scientific approaches is applied (signal processing, graph theory, deep learning). The workshop features talks from international researchers in areas such as Indian classical music, Iranian singing, Ottoman-Turkish Makam music scores, Flamenco singing, Irish traditional music, Georgian traditional music and Dutch folk songs. The invited guest speakers were Anja Volk (Utrecht University) and Peter Browne (Technological University Dublin).

    Improve automatic detection of animal call sequences with temporal context

    Funding: This work was supported by the US Office of Naval Research (grant no. N00014-17-1-2867). Many animals rely on long-form communication, in the form of songs, for vital functions such as mate attraction and territorial defence. We explored the prospect of improving automatic recognition performance by using the temporal context inherent in song. The ability to accurately detect sequences of calls has implications for conservation and biological studies. We show that the performance of a convolutional neural network (CNN), designed to detect song notes (calls) in short-duration audio segments, can be improved by combining it with a recurrent network designed to process sequences of learned representations from the CNN on a longer time scale. The combined system of independently trained CNN and long short-term memory (LSTM) network models exploits the temporal patterns between song notes. We demonstrate the technique using recordings of fin whale (Balaenoptera physalus) songs, which comprise patterned sequences of characteristic notes. We evaluated several variants of the CNN + LSTM network. Relative to the baseline CNN model, the CNN + LSTM models reduced performance variance, offering a 9-17% increase in area under the precision-recall curve and a 9-18% increase in peak F1-scores. These results show that the inclusion of temporal information may offer a valuable pathway for improving the automatic recognition and transcription of wildlife recordings.
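
The two metrics quoted in this abstract, area under the precision-recall curve and peak F1-score, can be illustrated with a short sketch. The abstract does not say how the authors computed them, so the step-wise area (average precision) used here, the function names, and the toy scores are all assumptions made for illustration.

```python
import numpy as np

def precision_recall(scores, labels):
    """Precision and recall at every rank, sweeping the threshold downward.

    Assumes scores are distinct and labels contain at least one positive.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    order = np.argsort(-scores, kind="stable")  # highest score first
    ranked = labels[order]
    tp = np.cumsum(ranked)          # true positives accepted so far
    fp = np.cumsum(1.0 - ranked)    # false positives accepted so far
    precision = tp / (tp + fp)
    recall = tp / ranked.sum()
    return precision, recall

def average_precision(precision, recall):
    """Step-wise area under the precision-recall curve."""
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(precision * recall_steps))

def peak_f1(precision, recall):
    """Largest F1-score over all thresholds."""
    denom = np.maximum(precision + recall, 1e-12)  # guard against 0/0
    return float(np.max(2.0 * precision * recall / denom))

if __name__ == "__main__":
    # Hypothetical detector scores and ground-truth labels (1 = call present).
    p, r = precision_recall([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1])
    print(average_precision(p, r), peak_f1(p, r))
```

Peak F1 summarises the single best operating point, whereas the area summarises behaviour across all thresholds, which is presumably why the abstract reports both.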

    Computational methods for percussion music analysis : the afro-uruguayan candombe drumming as a case study

    Most of the research conducted on information technologies applied to music has been largely limited to a few mainstream styles of so-called 'Western' music. The resulting tools often do not generalize properly or cannot be easily extended to other music traditions. Culture-specific approaches have therefore recently been proposed as a way to build richer and more general computational models for music. This thesis aims to contribute to the computer-aided study of rhythm, with a focus on percussion music and on the search for appropriate solutions from a culture-specific perspective, taking Afro-Uruguayan candombe drumming as a case study. The choice is mainly motivated by its challenging rhythmic characteristics, which are troublesome for most existing analysis methods. In this way, the thesis attempts to push forward the boundaries of current music technologies. The thesis offers an overview of the historical, social and cultural context in which candombe drumming is embedded, along with a description of the rhythm. One of the specific contributions of the thesis is the creation of annotated datasets of candombe drumming suitable for computational rhythm analysis. Performances were purposely recorded and annotated with metrical information, onset locations, and sections. A dataset of annotated recordings for beat and downbeat tracking was publicly released, and an audio-visual dataset of performances was obtained, which serves both documentary and research purposes. Part of the dissertation focused on the discovery and analysis of rhythmic patterns from audio recordings. A representation in the form of a map of rhythmic patterns based on spectral features was devised. The types of analysis that can be conducted with the proposed methods are illustrated with some experiments.
The dissertation also systematically approached (to the best of our knowledge, for the first time) the study and characterization of the micro-rhythmical properties of candombe drumming. The findings suggest that micro-timing is a structural component of the rhythm, producing a sort of characteristic "swing". The rest of the dissertation was devoted to the automatic inference and tracking of the metric structure from audio recordings. A supervised Bayesian scheme for rhythmic pattern tracking was proposed, and a software implementation of it was publicly released. The results give additional evidence of the generalizability of the Bayesian approach to complex rhythms from different music traditions. Finally, the downbeat detection task was formulated as a data compression problem. This resulted in a novel method that proved to be effective for a large part of the dataset and opens up some interesting threads for future research.

    Detection of Whale Acoustic Signals in the Northern Gulf of Mexico LADC-GEMM Database

    Low-pass Fourier filtering, wavelet filtering, and matched-filter detection were used to detect baleen whale signals in northern Gulf of Mexico data collected by the Littoral Acoustic Demonstration Center (LADC) consortium. Some potential low-frequency signals appeared in the matched-filter output. The shape of some of these signals is consistent with one of the typical signal shapes of fin whales: vertical down-sweeps at 18 s intervals. The shape of others is consistent with one of the call types of Bryde's whales: down-sweeps at 7 s intervals. A high-pass Fourier filter was also used to find high-frequency toothed whale sounds in the Gulf of Mexico data. Sounds featuring click trains and codas belonging to sperm whales were clearly identified.
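
As a rough illustration of the matched-filter step described in this abstract, the sketch below correlates a synthetic recording with a down-sweep template and picks the peaks. Only the 18 s spacing comes from the abstract. The sampling rate, sweep frequencies, noise level, threshold, and all function names are assumptions chosen for the example, not the parameters used on the LADC-GEMM data.

```python
import numpy as np

def downsweep(fs, duration, f_start, f_end):
    """Linear chirp whose frequency falls from f_start to f_end (Hz)."""
    t = np.arange(int(fs * duration)) / fs
    # The phase is the integral of the linearly decreasing frequency.
    phase = 2 * np.pi * (f_start * t + 0.5 * (f_end - f_start) / duration * t ** 2)
    return np.sin(phase)

def matched_filter(signal, template):
    """Cross-correlate the signal with a unit-energy copy of the template."""
    unit = template / np.linalg.norm(template)
    return np.correlate(signal, unit, mode="valid")

def pick_peaks(output, fs, threshold, min_gap_s):
    """Start times (s) of the strongest peaks above threshold, min_gap_s apart."""
    min_gap = int(min_gap_s * fs)
    picks = []
    for i in np.argsort(output)[::-1]:      # strongest response first
        if output[i] < threshold:
            break
        if all(abs(i - j) >= min_gap for j in picks):
            picks.append(int(i))
    return [p / fs for p in sorted(picks)]

def demo():
    fs = 200                                 # assumed sampling rate (Hz)
    template = downsweep(fs, duration=1.0, f_start=30.0, f_end=15.0)
    rng = np.random.default_rng(0)
    signal = 0.5 * rng.standard_normal(60 * fs)   # 60 s of synthetic noise
    for start in (5.0, 23.0, 41.0):          # calls spaced 18 s apart
        i = int(start * fs)
        signal[i:i + template.size] += template
    output = matched_filter(signal, template)
    return pick_peaks(output, fs, threshold=5.0, min_gap_s=2.0)

if __name__ == "__main__":
    print(demo())
```

Normalising the template to unit energy keeps the noise level of the filter output equal to that of the input, which makes a fixed threshold easier to reason about.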

    Engineering systematic musicology : methods and services for computational and empirical music research

    One of the main research questions of *systematic musicology* is how people make sense of their musical environment. It is concerned with signification and meaning-formation, and it relates musical structures to the effects of music. These fundamental aspects can be approached from many different directions. One could take a cultural perspective, where music is considered a phenomenon of human expression, firmly embedded in tradition. Another approach would be a cognitive perspective, where music is considered an acoustical signal whose perception involves categorizations linked to representations and learning. A performance perspective, where music is the outcome of human interaction, is an equally valid view. To understand a phenomenon, combining multiple perspectives often makes sense. The methods employed within each of these approaches turn questions into concrete musicological research projects. It is safe to say that today many of these methods draw upon digital data and tools. Some of those general methods are feature extraction from audio and movement signals, machine learning, classification and statistics. However, the problem is that, very often, the *empirical and computational methods require technical solutions* beyond the skills of researchers, who typically have a humanities background. At that point, these researchers need access to specialized technical knowledge to advance their research. My PhD work should be seen within the context of that tradition. In many respects I adopt a problem-solving attitude to problems that are posed by research in systematic musicology. This work *explores solutions that are relevant for systematic musicology*. It does this by engineering solutions for measurement problems in empirical research and by developing research software that facilitates computational research. These solutions are placed in an engineering-humanities plane. The first axis of the plane contrasts *services* with *methods*.
Methods *in* systematic musicology propose ways to generate new insights into music-related phenomena or contribute to how research can be done. Services *for* systematic musicology, on the other hand, support or automate research tasks, which allows the scope of research to change. A shift in scope allows researchers to cope with larger data sets, which offer a broader view of the phenomenon. The second axis indicates how important Music Information Retrieval (MIR) techniques are in a solution. MIR techniques are contrasted with various techniques to support empirical research. My research resulted in a total of thirteen solutions, which are placed in this plane. Descriptions of seven of these are bundled in this dissertation. Three fall into the methods category and four into the services category. For example, Tarsos presents a method to compare performance practice with theoretical scales on a large scale. SyncSink is an example of a service.

    Compositional hierarchical model for music information retrieval (Kompozicionalni hierarhični model za pridobivanje informacij iz glasbe)

    In recent years, deep architectures, most commonly based on neural networks, have advanced the state of the art in many research areas. Due to the popularity and the success of deep neural networks, other deep architectures, including compositional models, have been put aside from mainstream research. This dissertation presents the compositional hierarchical model as a novel deep architecture for music processing. Our main motivation was to develop and explore an alternative non-neural deep architecture for music processing which would be transparent, meaning that the encoded knowledge would be interpretable, trained in an unsupervised manner and on small datasets, and useful as a feature extractor for classification tasks, as well as a transparent model for unsupervised pattern discovery. We base our work on compositional models, as compositionality is inherent in music. The proposed compositional hierarchical model learns a multi-layer hierarchical representation of the analyzed music signals in an unsupervised manner. It provides transparent insights into the learned concepts and their structure. It can be used as a feature extractor: its output can be used for classification tasks using existing machine learning techniques. Moreover, the model's transparency enables an interpretation of the learned concepts, so the model can be used for analysis (exploration of the learned hierarchy) or discovery-oriented (inferring the hierarchy) tasks, which is difficult with most neural-network-based architectures. The proposed model uses relative coding of the learned concepts, which eliminates the need for large annotated training datasets that are essential in deep architectures with a large number of parameters. Relative coding contributes to slim models, which are fast to execute and have low memory requirements.
The model also incorporates several biologically inspired mechanisms that are modeled on mechanisms that exist at the lower levels of human perception (e.g., lateral inhibition in the human ear) and that significantly affect perception. The proposed model is evaluated on several music information retrieval tasks and its results are compared to the current state of the art. The dissertation is structured as follows. In the first chapter we present the motivation for the development of the new model. In the second chapter we elaborate on the related work in music information retrieval and review other compositional and transparent models. Chapter three gives a thorough description of the proposed model. The model structure and its learning and inference methods are explained, as well as the incorporated biologically inspired mechanisms. The model is then applied to several different music domains, which are divided according to the type of input data. In this we follow the timeline of the development and the implementation of the model. In chapter four, we present the model's application to audio recordings, specifically for two tasks: automatic chord estimation and multiple fundamental frequency estimation. In chapter five, we present the model's application to symbolic music representations. We concentrate on pattern discovery, emphasizing the model's ability to tackle such problems. We also evaluate the model as a feature generator for tune family classification. Finally, in chapter six, we show the latest progress in developing the model for representing rhythm and show that it exhibits a high degree of robustness in extracting high-level rhythmic structures from music signals.
We conclude the dissertation by summarizing our work and the results, and by elaborating on forthcoming work in the development of the model and its future applications.

    Automatic Drum Transcription and Source Separation

    While research has been carried out on automated polyphonic music transcription, to date the problem of automated polyphonic percussion transcription has not received the same degree of attention. A related problem is that of sound source separation, which attempts to separate a mixture signal into its constituent sources. This thesis focuses on the task of polyphonic percussion transcription and sound source separation of a limited set of drum instruments, namely the drums found in the standard rock/pop drum kit. As there was little previous research on polyphonic percussion transcription, a broad review of music information retrieval methods, including previous polyphonic percussion systems, was carried out to determine whether any methods were of potential use in the area of polyphonic drum transcription. Following on from this, a review was conducted of general source separation and redundancy reduction techniques, such as Independent Component Analysis and Independent Subspace Analysis, as these techniques have shown potential in separating mixtures of sources. Upon completion of the review, it was decided that a combination of the blind separation approach, Independent Subspace Analysis (ISA), with the use of prior knowledge as used in music information retrieval methods, was the best approach to tackling the problem of polyphonic percussion transcription as well as that of sound source separation. A number of new algorithms which combine the use of prior knowledge with the source separation abilities of techniques such as ISA are presented. These include sub-band ISA, Prior Subspace Analysis (PSA), and an automatic modelling and grouping technique which is used in conjunction with PSA to perform polyphonic percussion transcription. These approaches are demonstrated to be effective in the task of polyphonic percussion transcription, and PSA is also demonstrated to be capable of transcribing drums in the presence of pitched instruments.