Handling Asynchrony in Audio-Score Alignment
Aligning a canonical score to an audio recording of a musical performance can provide very good information about the timing of individual notes. However, a score representation frequently treats multiple note events as simultaneous, whereas in reality different performers will start notes at slightly differing times, and these timing details may be significant in the analysis of performance and expression. Using an example of a four-part a cappella vocal piece where each voice was recorded separately, we compare note onset and offset times obtained by manual annotation to three different types of alignment: forced alignment of each part individually to its corresponding track, simultaneous alignment of the polyphonic score to the full audio, and independent alignment of single parts to the polyphonic audio. In each case, we examine the kinds of errors that occur. We discuss how standard dynamic time warping may be extended so that it retains the advantages of polyphonic alignment while allowing ostensibly simultaneous notes to have different onset and offset times.
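For readers unfamiliar with the standard dynamic time warping (DTW) that this abstract takes as its starting point, the following is a minimal sketch of the textbook algorithm, not the authors' extended method. The function name `dtw`, the absolute-difference cost, and the use of bare MIDI pitch numbers as features are all assumptions made for illustration; real systems typically align chroma or spectral frames.

```python
def dtw(score, audio, dist=lambda x, y: abs(x - y)):
    """Textbook DTW between two feature sequences (illustrative only).

    Returns the total alignment cost and the warping path as a list of
    (score_index, audio_index) pairs.
    """
    n, m = len(score), len(audio)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of score[:i] with audio[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(score[i - 1], audio[j - 1])
            D[i][j] = local + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1][j - 1], i - 1, j - 1),  # advance both sequences
            (D[i - 1][j], i - 1, j),          # advance the score only
            (D[i][j - 1], i, j - 1),          # advance the audio only
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path
```

Because each score event maps to a contiguous run of audio frames, notes that the score marks as simultaneous share a single onset in this basic formulation, which is the limitation the abstract discusses.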
Improving MIDI-audio alignment with acoustic features
This paper describes a technique to improve the accuracy of dynamic time warping-based MIDI-audio alignment. The technique implements a hidden Markov model that uses aperiodicity and power estimates from the signal as observations and the results of a dynamic time warping alignment as a prior. In addition to improving the overall alignment, this technique also identifies the transient and steady-state sections of each note. This information is important for describing various aspects of a musical performance, including both pitch and rhythm.
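As a rough illustration of how a hidden Markov model can separate transient from steady-state frames, the sketch below runs generic log-domain Viterbi decoding over a two-state model. It is not the paper's model: the function name `viterbi`, the two-state layout, and every probability used with it are hypothetical, and the initial distribution merely stands in for a prior such as one derived from a DTW alignment.

```python
import math


def viterbi(log_init, log_trans, log_obs):
    """Most likely state sequence for an HMM, computed in the log domain.

    log_init[s]     : log prior of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log likelihood of the frame at time t under state s
    """
    n_states = len(log_init)
    delta = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_obs)):
        new_delta, ptr = [], []
        for s in range(n_states):
            # Best predecessor of state s at time t.
            best = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][s] + log_obs[t][s])
        delta = new_delta
        backpointers.append(ptr)

    # Trace the best path backwards from the most likely final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_log(values):
    """Helper: elementwise natural log of a list of positive probabilities."""
    return [math.log(v) for v in values]
```

A toy usage, with state 0 as "transient" and state 1 as "steady state", might treat a frame's aperiodicity `a` as the transient likelihood and `1 - a` as the steady-state likelihood; that mapping is an assumption chosen only to keep the example small.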
Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips
This paper discusses real-time alignment of audio signals of music
performance to the corresponding score (a.k.a. score following) which can
handle tempo changes, errors and arbitrary repeats and/or skips (repeats/skips)
in performances. This type of score following is particularly useful in
automatic accompaniment for practices and rehearsals, where errors and
repeats/skips are often made. Simple extensions of the algorithms previously
proposed in the literature are not applicable in these situations for scores of
practical length due to the problem of large computational complexity. To cope
with this problem, we present two hidden Markov models of monophonic
performance with errors and arbitrary repeats/skips, and derive efficient
score-following algorithms with an assumption that the prior probability
distributions of score positions before and after repeats/skips are independent
from each other. We confirmed real-time operation of the algorithms with music
scores of practical length (around 10000 notes) on a modern laptop and their
tracking ability to the input performance within 0.7 s on average after
repeats/skips in clarinet performance data. Further improvements and extension
for polyphonic signals are also discussed.
Comment: 12 pages, 8 figures, version accepted in IEEE/ACM Transactions on
Audio, Speech, and Language Processing
Coherent Time Modeling of semi-Markov Models with Application to Real-Time Audio-to-Score Alignment
This paper proposes a new insight into the problem of duration modeling for recognition setups where events are inferred from time signals using a probabilistic framework. When prior knowledge about the duration of events is available, hidden Markov or semi-Markov models allow individual duration distributions to be set but give no guidance on how to choose them. We propose two criteria of temporal coherency for such applications and prove that they are fulfilled by statistical properties such as infinite divisibility and log-concavity. We conclude by showing practical consequences of these properties in a real-time audio-to-score alignment experiment.
Master of Science thesis
Multiple Instance Learning (MIL) is a type of supervised learning with missing data. Here, each example (a.k.a. bag) has one or more instances. In the training set, we have only labels at bag level. The task is to label both bags and instances from the test set. In most practical MIL problems, there is a relationship between the instances of a bag. Capturing this relationship may help learn the underlying concept better. We present an algorithm that uses the structure of bags along with the features of instances. The key idea is to allow a structured support vector machine (SVM) to "guess" at the true underlying structure, so long as it is consistent with the bag labels. This idea is formalized and a new cutting plane algorithm is proposed for optimization. To verify this idea, we implemented our algorithm for a particular kind of structure: hidden Markov models. We performed experiments on three datasets and found this algorithm to work better than the existing algorithms in MIL. We present the details of these experiments and the effects of varying different hyperparameters. The key contribution of our work is a very simple loss function with only one hyperparameter that needs to be tuned using a small portion of the training set. The thesis of this work is that it is possible and desirable to exploit the structural relationship between instances in a bag, even though that structure is not observed at training time (i.e., correct labels for all the instances are unknown). Our work opens a new direction for solving the MIL problem. We suggest a few ideas to further our work in this direction.
A Coupled Duration-Focused Architecture for Real-Time Music-to-Score Alignment
The capacity for realtime synchronization and coordination is a common ability among trained musicians performing a music score that presents an interesting challenge for machine intelligence. Compared to speech recognition, which has influenced many music information retrieval systems, music's temporal dynamics and complexity pose challenging problems to common approximations regarding time modeling of data streams. In this paper, we propose a design for a realtime music-to-score alignment system. Given a live recording of a musician playing a music score, the system is capable of following the musician in realtime within the score and decoding the tempo (or pace) of the performance. The proposed design features two coupled audio and tempo agents within a unique probabilistic inference framework that adaptively updates its parameters based on the realtime context. Online decoding is achieved through the collaboration of the coupled agents in a Hidden Hybrid Markov/semi-Markov framework where prediction feedback of one agent affects the behavior of the other. We perform evaluations for both realtime alignment and the proposed temporal model. An implementation of the presented system has been widely used in real concert situations worldwide, and readers are encouraged to access the actual system and experiment with the results.
Song following by automatic speech recognition and temporal alignment (Suivi de chansons par reconnaissance automatique de parole et alignement temporel)
Score following is defined as the computer-based synchronization of a known musical score with the audio signal of a performer playing that score. In the particular case of the singing voice, there is still room to improve existing algorithms, especially for real-time score following. The goal of this project is therefore to implement a robust, real-time score-following program that uses the digitized singing-voice signal and the song lyrics. The proposed software combines several features of the singing voice (energy, correspondence with vowels, and the number of zero crossings of the signal) and matches them against the musical score in MusicXML format. These features, extracted for each frame, are aligned with the phonetic units of the score. In parallel with this short-term alignment, the system adds a second, more reliable level of position estimation by associating a segmentation of the signal into singing blocks with continuously sung sections of the score. System performance is evaluated by presenting the alignments obtained offline on 3 song excerpts performed by 2 different people, a man and a woman, in English and in French.
REFINING MUSIC SIGNAL TO LYRIC TEXT SYNCHRONIZATION FROM LINE-LEVEL TO SYLLABLE-LEVEL BY CONSTRAINING DYNAMIC TIME WARPING SEARCH
Master's (Master of Science)