20 research outputs found

    Multi-channel approaches for musical audio content analysis

    The goal of this research project is to undertake a critical evaluation of signal representations for musical audio content analysis. In particular, it contrasts three different means of analysing micro-rhythmic content in Afro-Latin American music, namely through the use of: i) stereo or mono mixed recordings; ii) separated sources obtained via state-of-the-art musical audio source separation techniques; and iii) perfectly separated multi-track stems. In total the project comprises the following five objectives: i) to compile a dataset of mixed and multi-channel recordings of Brazilian Maracatu musicians; ii) to conceive methods for the analysis and pattern recognition of rhythmic micro-variations; iii) to explore diverse music source separation approaches that preserve micro-rhythmic content; iv) to evaluate the performance of several automatic onset estimation approaches; and v) to compare the rhythmic analysis obtained from the original multi-channel sources with that obtained from the separated ones, in order to evaluate separation quality with regard to microtiming identification.

    Tracking beats and microtiming in Afro-Latin American music using conditional random fields and deep learning

    Work presented at ISMIR 2019: 20th Conference of the International Society for Music Information Retrieval, Delft, Netherlands, 4-8 November 2019 (postprint). Events in music frequently exhibit small-scale temporal deviations (microtiming) with respect to the underlying regular metrical grid. In some cases, as in music from the Afro-Latin American tradition, such deviations appear systematically, disclosing their structural importance in rhythmic and stylistic configuration. In this work we explore the idea of automatically and jointly tracking beats and microtiming in timekeeper instruments of Afro-Latin American music, in particular Brazilian samba and Uruguayan candombe. To that end, we propose a language model based on conditional random fields that integrates beat and onset likelihoods as observations. We derive those activations using deep neural networks and evaluate the approach's performance on manually annotated data using a scheme adapted to this task. We assess our approach in controlled conditions suitable for these timekeeper instruments, and study the microtiming profiles' dependency on genre and performer, illustrating promising aspects of this technique towards a more comprehensive understanding of these music traditions.
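    As a minimal sketch of the kind of microtiming measurement discussed above, the snippet below expresses each detected onset as a deviation from its nearest estimated beat, normalised by the local inter-beat interval. It assumes beat times and onset times have already been obtained (for example from neural-network activations); the function name and normalisation choice are illustrative assumptions, not the authors' CRF model or evaluation scheme.

```python
import numpy as np

def microtiming_deviations(beat_times, onset_times):
    """Deviation of each onset from its nearest beat, as a fraction of the
    local inter-beat interval. Illustrative toy measure only; not the CRF
    model or evaluation scheme from the paper."""
    beat_times = np.asarray(beat_times, dtype=float)
    deviations = []
    for onset in onset_times:
        i = int(np.argmin(np.abs(beat_times - onset)))  # nearest tracked beat
        # local inter-beat interval around the matched beat
        if i + 1 < len(beat_times):
            ibi = beat_times[i + 1] - beat_times[i]
        else:
            ibi = beat_times[i] - beat_times[i - 1]
        deviations.append((onset - beat_times[i]) / ibi)
    return np.asarray(deviations)

# usage: onsets slightly ahead of or behind a 0.5 s beat grid
print(microtiming_deviations([0.0, 0.5, 1.0, 1.5], [0.02, 0.48, 1.05]))
# approx. [0.04, -0.04, 0.1]
```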

    Interactive real-time musical systems

    PhD thesis. This thesis focuses on the development of automatic accompaniment systems. We investigate previous systems and look at a range of approaches that have been attempted for the problem of beat tracking. Most beat trackers are intended for music information retrieval purposes, where a 'black box' approach is tested on a wide variety of music genres. We highlight some of the difficulties facing offline beat trackers and design a new approach for the problem of real-time drum tracking, developing a system, B-Keeper, which makes reasonable assumptions about the nature of the signal and is provided with useful prior knowledge. Having developed the system with offline studio recordings, we go on to test it with human players. Existing offline evaluation methods seem less suitable for a performance system, since we also wish to evaluate the interaction between musician and machine. Although statistical data may reveal quantifiable measurements of the system's predictions and behaviour, we also want to test how well it functions within the context of a live performance. To do so, we devise an evaluation strategy that contrasts a machine-controlled accompaniment with one controlled by a human. We also present recent work on real-time multiple pitch tracking, which is then extended to provide automatic accompaniment for harmonic instruments such as guitar. By aligning salient notes in the output from a dual pitch tracking process, we make changes to the tempo of the accompaniment in order to align it with a live stream. By demonstrating the system's ability to align offline tracks, we show that, under restricted initial conditions, the algorithm works well as an alignment tool.
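    The tempo-adjustment idea mentioned above (nudging the accompaniment so that detected note onsets line up with their expected positions) can be illustrated with a toy proportional controller. The function, its gain, and its clamping range are assumptions made for this sketch and are not B-Keeper's or the thesis' actual control law.

```python
def adjust_tempo(current_bpm, expected_time, detected_time,
                 gain=0.3, bpm_range=(40.0, 240.0)):
    """Nudge the accompaniment tempo towards the live player.

    Toy proportional controller: if the detected onset arrives after the
    expected time, the player is running late, so the beat period is
    lengthened (the tempo slows down), and vice versa. Hypothetical
    parameters; not the algorithm from the thesis.
    """
    error = detected_time - expected_time          # seconds; positive = player is late
    seconds_per_beat = 60.0 / current_bpm
    new_seconds_per_beat = max(seconds_per_beat + gain * error, 1e-3)
    new_bpm = 60.0 / new_seconds_per_beat
    return min(max(new_bpm, bpm_range[0]), bpm_range[1])

# usage: at 120 BPM, a note expected at 2.00 s is detected at 2.10 s
print(adjust_tempo(120.0, 2.00, 2.10))  # roughly 113 BPM, i.e. slow down slightly
```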

    Automatic Music Transcription: An Overview


    A Survey of Music Generation in the Context of Interaction

    In recent years, machine learning, and in particular generative adversarial networks (GANs) and attention-based neural networks (transformers), have been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g. generating a Bach-style chorale) or style transfer (e.g. classical to jazz) based on large amounts of recorded or transcribed music, which in turn also allows for fairly straightforward "performance" evaluation. However, most of these models are not suitable for human-machine co-creation through live interaction, nor is it clear how such models and the resulting creations would be evaluated. This article presents a thorough review of music representation, feature analysis, heuristic algorithms, statistical and parametric modelling, and human and automatic evaluation measures, along with a discussion of which approaches and models seem most suitable for live interaction.

    Probabilistic Models of Rhythmic Expectation & Synchronisation

    Rhythms are central to human speech and music; the tradition of using computational models to understand the unique human talent for flexibly processing and synchronising movement with auditory rhythms dates back to the early history of cognitive science. Previous approaches to developing computational cognitive models of rhythm processing are either based on discrete (symbolic) processing and representations, simulating long-term learning of rhythms within a musical culture, or on continuous (sub-symbolic) processing, simulating the short-term dynamics of synchronisation to musical rhythms. The disconnect between these existing models limits our psychological understanding of rhythm processing, particularly at the intersection of two core cognitive processes: enculturation and entrainment. Motivated by the predictive processing framework, the present research characterises rhythmic expectation and synchronisation through a series of formal inference problems, and incrementally defines a hierarchical probabilistic generative model capable of performing continuous Bayesian inference about the phase and meter of a rhythmic stimulus by interfacing with discrete representations of its statistical regularities. The model is evaluated through computational studies which offer insights into complex and poorly understood aspects of the cognitive processing underlying rhythm perception and production: how long-term music-cultural experience shapes synchronisation to rhythmic patterns; the perception of expressive microtiming in musical rhythms; and the effect of synchronised movement on meter perception. Together, the results show that complex rhythm processing can be understood as a hierarchy of inferential predictions about a rhythmic stimulus at different temporal scales. The multiscale probabilistic models developed in this thesis are continuous-time variational Bayesian filters for point process data, with rate functions driven by variable-order Markov models. These models are applicable to any time series consisting of event sequences, and may prove particularly promising in domains such as speech research.
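    The discrete component named in the abstract, statistical regularities captured by variable-order Markov models over event sequences, can be illustrated with a minimal back-off predictor. The class name, symbol alphabet, and back-off rule below are assumptions made for the sketch; the thesis' continuous-time variational Bayesian filters are far richer than this.

```python
from collections import defaultdict

class ToyVOMM:
    """Minimal variable-order Markov predictor over symbolic event sequences.

    Only illustrates the discrete 'statistical regularities' component named
    in the abstract; not the models developed in the thesis.
    """

    def __init__(self, max_order=3):
        self.max_order = max_order
        # context (tuple of symbols) -> counts of the following symbol
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequence):
        for order in range(self.max_order + 1):
            for i in range(order, len(sequence)):
                context = tuple(sequence[i - order:i])
                self.counts[context][sequence[i]] += 1

    def predict(self, history):
        """Back off from the longest matching context to shorter ones."""
        for order in range(min(self.max_order, len(history)), -1, -1):
            context = tuple(history[len(history) - order:])
            if context in self.counts:
                nxt = self.counts[context]
                total = sum(nxt.values())
                return {symbol: count / total for symbol, count in nxt.items()}
        return {}

# usage: predict the next duration class ('q' = quarter note, 'e' = eighth note)
model = ToyVOMM(max_order=2)
model.train(["q", "q", "e", "e", "q", "q", "e", "e", "q"])
print(model.predict(["e", "e"]))  # {'q': 1.0}
```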

    16th Sound and Music Computing Conference SMC 2019 (28–31 May 2019, Malaga, Spain)

    The 16th Sound and Music Computing Conference (SMC 2019) took place in Malaga, Spain, 28-31 May 2019, and was organized by the Application of Information and Communication Technologies Research group (ATIC) of the University of Malaga (UMA). The associated SMC 2019 Summer School took place 25-28 May 2019, and the First International Day of Women in Inclusive Engineering, Sound and Music Computing Research (WiSMC 2019) took place on 28 May 2019. The SMC 2019 topics of interest covered a wide selection of areas related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, etc.

    Time-domain music source separation for choirs and ensembles

    Music source separation is the task of separating musical sources from an audio mixture. It has various direct applications, including automatic karaoke generation, enhancing musical recordings, and 3D-audio upmixing, but also has implications for other downstream music information retrieval tasks such as multi-instrument transcription. However, the majority of research has focused on fixed-stem separation of vocals, drums, and bass. While such models have highlighted the capabilities of source separation using deep learning, their implications are limited to very few use cases. Such models are unable to separate most other instruments due to insufficient training data. Moreover, class-based separation inherently prevents such models from separating monotimbral mixtures. This thesis focuses on separating musical sources without requiring timbral distinction among the sources. Preliminary attempts focus on the separation of vocal harmonies from choral ensembles using time-domain models with permutation invariant training. The method performs well but fails to generalise across datasets, mainly due to a lack of sizeable clean training data. Recognising the challenge of obtaining sizeable, bleed-free data for ensemble recordings, a new high-quality synthesised dataset, "EnsembleSet", is presented and used to train a monotimbral ensemble separation model for string ensembles. Moreover, a model trained with permutation invariant training is found to be capable of separating mixtures of identical, distinct, and unseen timbres. Although models trained on EnsembleSet can separate mixtures from unseen real-world datasets, performance drops are observed for out-of-domain test data. Subsequently, improving cross-dataset performance using fine-tuning is explored for time-domain and complex-domain separation models. The performance of these models with different training strategies and in different musical contexts is further investigated to achieve a better understanding of the behaviour of these timbre-agnostic separation models. The techniques developed in this work are currently being utilised in industry for vocal harmony separation and also lay the groundwork for future exploration toward universal source separation based on monophonic sound event separation.
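    The permutation invariant training criterion named in the abstract can be sketched as follows: compute the reconstruction loss under every possible assignment of estimated sources to reference sources and keep the best one per mixture. This generic PyTorch sketch (function name, MSE objective, tensor shapes) is an assumption for illustration, not the thesis' model or training code.

```python
import itertools
import torch

def pit_loss(estimates, references):
    """Permutation invariant training (PIT) loss sketch.

    Computes the mean-squared reconstruction error under every possible
    assignment of estimated sources to reference sources and keeps the best
    one per mixture. Generic illustration only, not the thesis' actual loss.

    estimates, references: tensors of shape (batch, n_sources, n_samples)
    """
    n_sources = estimates.shape[1]
    per_perm_losses = []
    for perm in itertools.permutations(range(n_sources)):
        permuted = estimates[:, list(perm), :]
        # MSE per batch item under this particular source assignment
        per_perm_losses.append(((permuted - references) ** 2).mean(dim=(1, 2)))
    # shape (n_perms, batch): take the best permutation per item, then average
    return torch.stack(per_perm_losses).min(dim=0).values.mean()

# usage: a batch of 2 mixtures, each with 3 estimated and 3 reference sources
estimates = torch.randn(2, 3, 1000)
references = torch.randn(2, 3, 1000)
print(pit_loss(estimates, references))
```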

    ESCOM 2017 Proceedings
