
    Improving MIDI-audio alignment with acoustic features

    This paper describes a technique to improve the accuracy of dynamic time warping-based MIDI-audio alignment. The technique implements a hidden Markov model that uses aperiodicity and power estimates from the signal as observations and the results of a dynamic time warping alignment as a prior. In addition to improving the overall alignment, this technique also identifies the transient and steady-state sections of each note. This information is important for describing various aspects of a musical performance, including both pitch and rhythm.
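    The dynamic time warping stage that this HMM refines can be sketched in a few lines. This is an illustrative sketch only: the toy power tracks and the absolute-difference local distance are stand-ins, and the paper's HMM refinement stage is not reproduced.

    ```python
    def dtw(a, b):
        """Return the DTW cost and warping path between feature sequences a and b."""
        n, m = len(a), len(b)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])              # local distance
                cost[i][j] = d + min(cost[i - 1][j],       # insertion
                                     cost[i][j - 1],       # deletion
                                     cost[i - 1][j - 1])   # match
        # Backtrack to recover the alignment path.
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                          (cost[i - 1][j], i - 1, j),
                          (cost[i][j - 1], i, j - 1))
        return cost[n][m], path[::-1]

    midi_power = [0.1, 0.8, 0.9, 0.3, 0.1]          # toy "score" feature track
    audio_power = [0.1, 0.2, 0.85, 0.9, 0.25, 0.1]  # toy performance track
    total, path = dtw(midi_power, audio_power)
    ```

    The path returned by the backtracking pass is what a refinement model would take as its prior over frame correspondences.
    
    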

    Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips

    This paper discusses real-time alignment of audio signals of a music performance to the corresponding score (a.k.a. score following) that can handle tempo changes, errors, and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are common. Simple extensions of previously proposed algorithms are not applicable in these situations for scores of practical length because of their large computational complexity. To cope with this problem, we present two hidden Markov models of monophonic performance with errors and arbitrary repeats/skips, and derive efficient score-following algorithms under the assumption that the prior probability distributions of score positions before and after repeats/skips are independent of each other. We confirmed real-time operation of the algorithms with music scores of practical length (around 10,000 notes) on a modern laptop, and their ability to recover the score position within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and an extension to polyphonic signals are also discussed. Comment: 12 pages, 8 figures; version accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing.
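    The computational payoff of the independence assumption described above can be sketched as one step of an online forward filter over score positions: because the jump-destination prior does not depend on the departure position, the jump mass can be pooled once in O(n) rather than evaluated per position pair in O(n^2). The transition probabilities and the Gaussian pitch-observation model here are illustrative placeholders, not the paper's models.

    ```python
    import math

    def forward_step(belief, obs_pitch, score_pitches,
                     p_stay=0.3, p_next=0.65, p_jump=0.05, sigma=0.5):
        n = len(belief)
        # Jump prior independent of the departure position: pool the mass once.
        jump_mass = p_jump * sum(belief) / n
        new = [jump_mass] * n
        for i in range(n):
            new[i] += p_stay * belief[i]           # stay on the same note
            if i > 0:
                new[i] += p_next * belief[i - 1]   # advance to the next note
        # Gaussian likelihood of the observed pitch at each score position.
        for i in range(n):
            new[i] *= math.exp(-0.5 * ((obs_pitch - score_pitches[i]) / sigma) ** 2)
        z = sum(new)
        return [x / z for x in new]

    score = [60, 62, 64, 65, 67]  # toy monophonic score (MIDI note numbers)
    belief = [1.0 / len(score)] * len(score)
    for pitch in [60, 62, 64]:
        belief = forward_step(belief, pitch, score)
    # belief now concentrates on the third score note
    ```
    
    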

    Coherent Time Modeling of semi-Markov Models with Application to Real-Time Audio-to-Score Alignment

    This paper proposes a novel insight into the problem of duration modeling for recognition setups where events are inferred from time signals using a probabilistic framework. When prior knowledge about the duration of events is available, hidden Markov or hidden semi-Markov models allow the setting of individual duration distributions but give no clue about their choice. We propose two criteria of temporal coherency for such applications and prove that they are implied by statistical properties such as infinite divisibility and log-concavity. We conclude by showing practical consequences of these properties in a real-time audio-to-score alignment experiment.
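    The log-concavity property named above is easy to test numerically for a discrete duration distribution: p is log-concave iff p[k]^2 >= p[k-1] * p[k+1] for all interior k. A sketch, contrasting a Poisson duration model (log-concave) with a bimodal two-Poisson mixture (not log-concave); the choice of these two distributions is illustrative, not from the paper.

    ```python
    import math

    def poisson_pmf(lam, k):
        return math.exp(-lam) * lam ** k / math.factorial(k)

    def is_log_concave(p, eps=1e-12):
        """Check p[k]^2 >= p[k-1] * p[k+1] for all interior indices."""
        return all(p[k] ** 2 + eps >= p[k - 1] * p[k + 1]
                   for k in range(1, len(p) - 1))

    poisson = [poisson_pmf(4.0, k) for k in range(20)]
    mixture = [0.5 * poisson_pmf(2.0, k) + 0.5 * poisson_pmf(12.0, k)
               for k in range(20)]
    # poisson passes the test; the bimodal mixture fails it in the valley
    # between its two modes.
    ```
    
    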

    Master of Science

    Multiple Instance Learning (MIL) is a type of supervised learning with missing data. Each example (a.k.a. bag) contains one or more instances. In the training set, labels are available only at the bag level; the task is to label both bags and instances in the test set. In most practical MIL problems, there is a relationship between the instances of a bag, and capturing this relationship may help learn the underlying concept better. We present an algorithm that uses the structure of bags along with the features of instances. The key idea is to allow a structured support vector machine (SVM) to "guess" the true underlying structure, so long as it is consistent with the bag labels. This idea is formalized, and a new cutting-plane algorithm is proposed for the optimization. To verify the idea, we implemented our algorithm for a particular kind of structure: hidden Markov models. We performed experiments on three datasets and found this algorithm to work better than existing MIL algorithms. We present the details of these experiments and the effects of varying the different hyperparameters. A key contribution of our work is a very simple loss function with only one hyperparameter, which can be tuned using a small portion of the training set. The thesis of this work is that it is possible and desirable to exploit the structural relationship between the instances in a bag, even though that structure is not observed at training time (i.e., the correct labels for the instances are unknown). Our work opens a new direction for solving the MIL problem, and we suggest a few ideas to take it further.
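    The "guess a structure consistent with the bag labels" idea can be sketched as constrained latent inference: among all instance labelings compatible with the bag label, keep the highest-scoring one. A real implementation scores labelings with an HMM inside a structured SVM; the per-instance linear score and exhaustive search below are illustrative stand-ins only.

    ```python
    from itertools import product

    def best_consistent_labeling(instance_scores, bag_label):
        """Exhaustively search instance labelings consistent with the bag label."""
        best, best_score = None, float("-inf")
        for labeling in product([0, 1], repeat=len(instance_scores)):
            # Consistency: a positive bag needs at least one positive instance,
            # a negative bag needs all-negative instances.
            if bag_label == 1 and not any(labeling):
                continue
            if bag_label == 0 and any(labeling):
                continue
            score = sum(s if y == 1 else -s
                        for s, y in zip(instance_scores, labeling))
            if score > best_score:
                best, best_score = labeling, score
        return best, best_score

    labeling, score = best_consistent_labeling([-2.0, 0.5, -1.0], bag_label=1)
    ```

    In the cutting-plane setting this inner maximization is what generates the most violated constraint at each iteration; dynamic programming over the HMM structure replaces the exhaustive search.
    
    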

    A Coupled Duration-Focused Architecture for Real-Time Music-to-Score Alignment

    The capacity for real-time synchronization and coordination is a common ability among trained musicians performing a music score, and it presents an interesting challenge for machine intelligence. Compared to speech recognition, which has influenced many music information retrieval systems, music's temporal dynamics and complexity pose challenging problems for common approximations regarding the time modeling of data streams. In this paper, we propose a design for a real-time music-to-score alignment system. Given a live recording of a musician playing a music score, the system is capable of following the musician within the score in real time and of decoding the tempo (or pace) of the performance. The proposed design features two coupled audio and tempo agents within a unique probabilistic inference framework that adaptively updates its parameters based on the real-time context. Online decoding is achieved through the collaboration of the coupled agents in a hidden hybrid Markov/semi-Markov framework, where the prediction feedback of one agent affects the behavior of the other. We evaluate both the real-time alignment and the proposed temporal model. An implementation of the presented system has been widely used in real concert situations worldwide, and readers are encouraged to access the actual system and experiment with the results.
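    The tempo-agent side of such a coupled design can be sketched as a smoothing update: the current tempo estimate is pulled toward the tempo locally implied by the observed duration of the note just completed. This geometric-smoothing rule and its `rate` parameter are illustrative assumptions, not the paper's inference.

    ```python
    def update_tempo(tempo_bpm, score_beats, observed_seconds, rate=0.5):
        """Blend the current tempo with the tempo implied by the last note.

        score_beats: notated duration of the completed note, in beats.
        observed_seconds: how long the performer actually took.
        rate: 0 ignores the observation, 1 adopts the implied tempo outright.
        """
        implied_bpm = score_beats * 60.0 / observed_seconds
        # Geometric interpolation keeps the estimate positive and symmetric
        # with respect to speeding up vs. slowing down.
        return tempo_bpm ** (1.0 - rate) * implied_bpm ** rate

    tempo = 120.0
    tempo = update_tempo(tempo, score_beats=1.0, observed_seconds=1.0)
    # a quarter note held for a full second implies 60 BPM, so the
    # estimate moves partway from 120 BPM toward 60 BPM
    ```
    
    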

    Song following by automatic speech recognition and temporal alignment

    Score following is defined as the computer-based synchronization between a known musical score and the audio signal of a performer playing that score. In the particular case of the singing voice, there is still room to improve existing algorithms, especially for real-time score following. The goal of this project is therefore to build a robust, real-time score-following system that uses the digitized singing-voice signal and the song lyrics. The proposed software uses several features of the sung voice (energy, vowel matching, and the signal's zero-crossing count) and matches them against the musical score in MusicXML format. These features, extracted for each frame, are aligned to the phonetic units of the score. In parallel with this short-term alignment, the system adds a second, more reliable level of position estimation by associating a segmentation of the signal into singing blocks with continuously sung sections of the score. The system's performance is evaluated by presenting offline alignments obtained on 3 song excerpts performed by 2 different singers, a man and a woman, in English and in French.
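    Two of the per-frame features named above (energy and zero-crossing count) are straightforward to compute; this sketch does so on a toy sine signal. The frame length and test tone are arbitrary choices, and the vowel-matching feature and MusicXML parsing are not reproduced.

    ```python
    import math

    def frame_features(signal, frame_len=256):
        """Return (energy, zero-crossing count) for each non-overlapping frame."""
        feats = []
        for start in range(0, len(signal) - frame_len + 1, frame_len):
            frame = signal[start:start + frame_len]
            energy = sum(x * x for x in frame) / frame_len
            zero_crossings = sum(
                1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
            feats.append((energy, zero_crossings))
        return feats

    sr = 8000  # sample rate, Hz
    tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(1024)]
    feats = frame_features(tone)
    # a steady 440 Hz tone yields roughly constant energy (~0.5) and
    # about 2 * 440 * 256 / 8000 ≈ 28 zero crossings per frame
    ```
    
    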