Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips
This paper discusses real-time alignment of audio signals of music
performance to the corresponding score (a.k.a. score following) which can
handle tempo changes, errors and arbitrary repeats and/or skips (repeats/skips)
in performances. This type of score following is particularly useful in
automatic accompaniment for practices and rehearsals, where errors and
repeats/skips are often made. Simple extensions of the algorithms previously
proposed in the literature are not applicable in these situations for scores of
practical length due to the problem of large computational complexity. To cope
with this problem, we present two hidden Markov models of monophonic
performance with errors and arbitrary repeats/skips, and derive efficient
score-following algorithms with an assumption that the prior probability
distributions of score positions before and after repeats/skips are independent
of each other. We confirmed that the algorithms run in real time on a modern
laptop with music scores of practical length (around 10000 notes), and that they
recover tracking of the input performance within 0.7 s on average after
repeats/skips in clarinet performance data. Further improvements and extension
for polyphonic signals are also discussed.
Comment: 12 pages, 8 figures, version accepted in IEEE/ACM Transactions on
Audio, Speech, and Language Processing
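The computational benefit of the independence assumption can be illustrated with a single forward update. The sketch below assumes, for illustration only, that the probability of a repeat/skip from score position i to position j factorizes as leave[i] * enter[j]. Under that assumption the jump contribution is computed once per frame, so the update costs O(N) instead of O(N^2). The function names, the simple stay/step local topology, and the parameter names are hypothetical and are not the models defined in the paper.

```python
def forward_step_fast(alpha, stay, step, leave, enter, emit):
    """One normalized forward update in O(N) time.

    alpha[i]: current probability of score position i
    stay[i]:  probability of remaining at position i (assumed local topology)
    step[i]:  probability of moving from i to i + 1 (assumed local topology)
    leave[i]: probability of leaving i through a repeat/skip (assumption)
    enter[j]: probability of resuming at j after a repeat/skip (assumption)
    emit[j]:  likelihood of the current audio frame at position j
    """
    n = len(alpha)
    # Total mass leaving through a repeat/skip; shared by every target position.
    jump_mass = sum(alpha[i] * leave[i] for i in range(n))
    new_alpha = []
    for j in range(n):
        local = alpha[j] * stay[j]
        if j > 0:
            local += alpha[j - 1] * step[j - 1]
        new_alpha.append((local + jump_mass * enter[j]) * emit[j])
    total = sum(new_alpha)
    return [a / total for a in new_alpha]


def forward_step_naive(alpha, stay, step, leave, enter, emit):
    """Same update using the full N x N transition matrix, in O(N^2) time."""
    n = len(alpha)
    new_alpha = []
    for j in range(n):
        acc = 0.0
        for i in range(n):
            a_ij = leave[i] * enter[j]  # repeat/skip part
            if i == j:
                a_ij += stay[i]
            if j == i + 1:
                a_ij += step[i]
            acc += alpha[i] * a_ij
        new_alpha.append(acc * emit[j])
    total = sum(new_alpha)
    return [a / total for a in new_alpha]
```

Both functions return the same distribution; only the fast version remains practical when the score has thousands of positions.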
Statistical Piano Reduction Controlling Performance Difficulty
We present a statistical-modelling method for piano reduction, i.e.
converting an ensemble score into piano scores, that can control performance
difficulty. While previous studies have focused on describing the conditions for
playable piano scores, playability depends on the player's skill and can change
continuously with the tempo. We thus computationally quantify performance difficulty as well
as musical fidelity to the original score, and formulate the problem as
optimization of musical fidelity under constraints on difficulty values. First,
performance difficulty measures are developed by means of probabilistic
generative models for piano scores and the relation to the rate of performance
errors is studied. Second, to describe musical fidelity, we construct a
probabilistic model integrating a prior piano-score model and a model
representing how ensemble scores are likely to be edited. An iterative
optimization algorithm for piano reduction is developed based on statistical
inference of the model. We confirm the effect of the iterative procedure; we
find that subjective difficulty and musical fidelity monotonically increase
with controlled difficulty values; and we show that incorporating sequential
dependence of pitches and fingering motion in the piano-score model improves
the quality of reduction scores in high-difficulty cases.
Comment: 12 pages, 7 figures, version accepted to APSIPA Transactions on
Signal and Information Processing
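The constrained formulation described in this abstract can be written schematically as follows. All symbols are assumptions introduced here for illustration: E is the ensemble score, R a candidate piano score, F a fidelity measure, D_k the difficulty measures, and \bar{D}_k the target difficulty values. The paper's own notation and exact objective may differ.

```latex
\hat{R} = \operatorname*{arg\,max}_{R} \; F(R \mid E)
\quad \text{subject to} \quad D_k(R) \le \bar{D}_k \quad \text{for all } k
```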
Rhythm Transcription of Polyphonic MIDI Performances Based on a Merged-output HMM for Multiple Voices
(Abstract to follow)
MULTI-STEP CHORD SEQUENCE PREDICTION BASED ON AGGREGATED MULTI-SCALE ENCODER-DECODER NETWORKS
This paper studies the prediction of chord progressions for jazz music by relying on machine learning models. The motivation of our study comes from the recent success of neural networks for performing automatic music composition. Although high accuracies are obtained in single-step prediction scenarios, most models fail to generate accurate multi-step chord predictions. In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels. Specifically, the input and ground truth labels are merged into increasingly large temporal bags, on which we train a family of encoder-decoder networks for each temporal scale. In a second step, we use these pre-trained encoder bottleneck features at each scale in order to train a final encoder-decoder network. Furthermore, we rely on different reductions of the initial chord alphabet into three adapted chord alphabets. We perform evaluations against several state-of-the-art models and show that our multi-scale architecture outperforms existing methods in terms of accuracy and perplexity, while requiring relatively few parameters. We analyze musical properties of the results, showing the influence of downbeat position within the analysis window on accuracy, and evaluate errors using a musically-informed distance metric.
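The temporal aggregation of labels into increasingly large bags could be sketched as follows. The bag sizes (1, 2, 4), the use of summed one-hot vectors as the merge rule, and all function names are illustrative assumptions rather than details taken from the paper.

```python
def one_hot(index, alphabet_size):
    """Encode a chord label index as a one-hot vector."""
    return [1 if k == index else 0 for k in range(alphabet_size)]


def aggregate(vectors, bag_size):
    """Merge consecutive label vectors into bags by element-wise summation.

    The last bag may be shorter if the sequence length is not a multiple
    of bag_size.
    """
    bags = []
    for start in range(0, len(vectors), bag_size):
        chunk = vectors[start:start + bag_size]
        bags.append([sum(column) for column in zip(*chunk)])
    return bags


def multi_scale(labels, alphabet_size, scales=(1, 2, 4)):
    """Build one aggregated sequence per temporal scale (assumed scales)."""
    vectors = [one_hot(label, alphabet_size) for label in labels]
    return {scale: aggregate(vectors, scale) for scale in scales}


# Four chord labels drawn from a hypothetical three-symbol alphabet.
bags = multi_scale([0, 1, 1, 2], alphabet_size=3)
```

In this sketch each scale yields a shorter sequence of coarser targets, on which a separate encoder-decoder network could then be trained.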