Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips
This paper discusses real-time alignment of audio signals of music
performance to the corresponding score (a.k.a. score following) which can
handle tempo changes, errors and arbitrary repeats and/or skips (repeats/skips)
in performances. This type of score following is particularly useful in
automatic accompaniment for practices and rehearsals, where errors and
repeats/skips are often made. Simple extensions of the algorithms previously
proposed in the literature are not applicable in these situations for scores of
practical length because of their large computational complexity. To cope
with this problem, we present two hidden Markov models of monophonic
performance with errors and arbitrary repeats/skips, and derive efficient
score-following algorithms under the assumption that the prior probability
distributions of score positions before and after repeats/skips are independent
of each other. We confirmed real-time operation of the algorithms with music
scores of practical length (around 10000 notes) on a modern laptop, and their
ability to track the input performance within 0.7 s on average after
repeats/skips in clarinet performance data. Further improvements and an
extension to polyphonic signals are also discussed.
Comment: 12 pages, 8 figures, version accepted in IEEE/ACM Transactions on
Audio, Speech, and Language Processing
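Why the independence assumption makes arbitrary repeats/skips affordable can be illustrated with a single forward-algorithm step. The sketch below is not the authors' model: it assumes a hypothetical left-to-right HMM with one state per score position, constant probabilities `p_stay`, `p_next` and `p_skip`, and a destination distribution `skip_dest` that does not depend on where the jump started. Under that factorization the sum over all source positions collapses to one scalar, so each update costs O(N) instead of O(N^2) for N score positions.

```python
import numpy as np


def forward_step(alpha, obs_lik, p_stay, p_next, p_skip, skip_dest):
    """One normalized forward-algorithm update for a toy score-following HMM.

    alpha     -- filtered probabilities of the N score positions at time t-1
    obs_lik   -- likelihood of the current audio frame at each position
    p_stay, p_next, p_skip -- hypothetical transition probabilities
    skip_dest -- hypothetical distribution over destinations of a repeat/skip
    """
    pred = p_stay * alpha                 # stay on the same note
    pred[1:] += p_next * alpha[:-1]       # advance to the next note
    # Repeat/skip: source and destination factors are independent, so
    # sum_i alpha[i] * p_skip * skip_dest[j] == p_skip * alpha.sum() * skip_dest[j].
    pred += p_skip * alpha.sum() * skip_dest
    post = pred * obs_lik                 # weight by the observation likelihood
    return post / post.sum()
```

The result matches the dense update `alpha @ A` with an explicit N×N transition matrix `A`, but never builds that matrix, which is what keeps scores of thousands of notes tractable frame by frame.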
Dynamic Bayesian networks for symbolic polyphonic pitch modeling
The performance of many MIR analysis algorithms, most importantly polyphonic pitch transcription, can be improved by introducing musicological knowledge into the estimation process. We have developed a probabilistically rigorous musicological model that takes into account dependencies between consecutive musical notes and consecutive chords, as well as the dependencies between chords, notes and the observed note saliences. We investigate its modeling potential by measuring and comparing the cross-entropy on symbolic (MIDI) data.
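The evaluation criterion, cross-entropy on symbolic data, can be made concrete with a far simpler model than the DBN. The sketch below assumes a hypothetical first-order Markov model over consecutive MIDI pitch numbers with additive smoothing; it ignores chords and saliences entirely. It reports the average number of bits per predicted note on held-out data, so a lower value means the model predicts the data better.

```python
import math
from collections import Counter


def bigram_cross_entropy(train, test, vocab_size=128, alpha=1.0):
    """Average bits per note of `test` under a smoothed pitch-bigram model.

    train, test -- sequences of MIDI pitch numbers (0-127)
    alpha       -- additive smoothing constant (hypothetical choice)
    """
    if len(test) < 2:
        raise ValueError("test sequence needs at least two notes")
    pair_counts = Counter(zip(train, train[1:]))   # counts of (previous, current)
    prev_counts = Counter(train[:-1])              # counts of previous notes
    total_bits = 0.0
    for prev, cur in zip(test, test[1:]):
        p = (pair_counts[(prev, cur)] + alpha) / (prev_counts[prev] + alpha * vocab_size)
        total_bits -= math.log2(p)
    return total_bits / (len(test) - 1)
```

With no training data every transition has probability 1/128, giving exactly 7 bits per note; any model that captures real note-to-note dependencies should score below that baseline.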
Rhythm Transcription of Polyphonic MIDI Performances Based on a Merged-output HMM for Multiple Voices
(Abstract to follow.)
Robust estimation of directions-of-arrival in diffuse noise based on matrix-space sparsity
We consider the estimation of the Directions-Of-Arrival (DOA) of target signals in diffuse noise. The state-of-the-art MUltiple SIgnal Classification (MUSIC) algorithm requires accurate identification of the signal subspace. In diffuse noise, however, it is difficult to identify this subspace directly from the observed spatial covariance matrix. In our approach, we estimate the target spatial covariance matrix, so that we can identify the orthogonal complement of the signal subspace as its null space. We present a unified framework for modeling noise covariance in a matrix space, which generalizes four state-of-the-art diffuse noise models. We propose two alternative algorithms for estimating the target spatial covariance matrix, namely Low-rank Matrix Completion (LMC) and Trace Norm Minimization (TNM). Both denoise the observed spatial covariance matrix via orthogonal projection onto the orthogonal complement of the noise matrix subspace. The missing component lying in the noise matrix subspace is then completed by exploiting the low rank of the target spatial covariance matrix. Large-scale experiments with real-world noise show that TNM with a certain noise model outperforms conventional MUSIC based on Generalized EigenValue Decomposition (GEVD) by 5% in terms of precision averaged over the dataset.
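Once a target spatial covariance matrix is available (from LMC or TNM in the paper, neither of which is reproduced here), the MUSIC step itself is short. The sketch below assumes a hypothetical uniform linear array with half-wavelength spacing, a far-field narrowband model and a known number of sources. It takes the eigenvectors with the smallest eigenvalues as a basis of the null space and evaluates the MUSIC pseudo-spectrum 1 / ||E_n^H a(θ)||^2, which peaks where the steering vector a(θ) is orthogonal to that null space.

```python
import numpy as np


def steering_vector(theta_deg, n_mics, spacing=0.5):
    """Far-field steering vector of a uniform linear array (spacing in wavelengths)."""
    k = np.arange(n_mics)
    return np.exp(-2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))


def music_spectrum(r_target, n_sources, grid_deg):
    """MUSIC pseudo-spectrum computed from a (denoised) target spatial covariance."""
    n_mics = r_target.shape[0]
    _, vecs = np.linalg.eigh(r_target)           # eigenvalues in ascending order
    null_basis = vecs[:, : n_mics - n_sources]   # orthogonal complement of signal subspace
    spec = np.empty(len(grid_deg))
    for idx, theta in enumerate(grid_deg):
        proj = null_basis.conj().T @ steering_vector(theta, n_mics)
        spec[idx] = 1.0 / (np.sum(np.abs(proj) ** 2) + 1e-12)  # small floor avoids 1/0
    return spec
```

Feeding it the rank-one covariance of a single noiseless source at 20° puts the spectral peak on the 20° grid point; the point of the paper is that with diffuse noise the observed covariance is not low-rank, which is why the target covariance must be estimated first.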
Feature-Dependent Allophone Clustering
We propose a novel method for clustering allophones, called Feature-Dependent Allophone Clustering (FD-AC), that determines a feature-dependent HMM topology automatically. Existing methods for allophone clustering are based on parameter sharing between allophone models whose feature vector sequences behave similarly. However, not all features of the vector sequences necessarily share a common allophone clustering structure. We expect that the vector sequences can be modeled better by allocating the optimal allophone clustering structure to each feature. In this paper, we propose Feature-Dependent Successive State Splitting (FD-SSS) as an implementation of FD-AC. In speaker-dependent continuous phoneme recognition experiments, HMMs created by FD-SSS reduced the error rates by about 10% compared with conventional HMMs that have a common allophone clustering structure for all features.
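The claim that each feature may prefer its own allophone clustering can be illustrated with a single binary split. The sketch below is not FD-SSS: it assumes hypothetical per-allophone frame data, models each feature dimension of a cluster with one maximum-likelihood Gaussian, and, separately for every dimension, picks the split of allophones (ordered by their mean) with the largest log-likelihood gain. Different dimensions can therefore end up with different groupings.

```python
import numpy as np


def gauss_ll(x):
    """Log-likelihood of 1-D samples under their own ML Gaussian (variance floored)."""
    var = max(float(x.var()), 1e-6)
    return -0.5 * len(x) * (np.log(2 * np.pi * var) + 1.0)


def best_split_per_feature(data):
    """For each feature dimension, find the best two-way split of the allophones.

    data maps a (hypothetical) allophone name to an array of frames, shape (n, D).
    Returns {dimension: (left_set, right_set)}, or None for a dimension where no
    split increases the likelihood.
    """
    names = list(data)
    n_dims = next(iter(data.values())).shape[1]
    result = {}
    for d in range(n_dims):
        # Order allophones by their mean in this dimension, then try every cut point.
        order = sorted(names, key=lambda name: data[name][:, d].mean())
        base = gauss_ll(np.concatenate([data[name][:, d] for name in order]))
        best_gain, best_split = 0.0, None
        for k in range(1, len(order)):
            left = np.concatenate([data[name][:, d] for name in order[:k]])
            right = np.concatenate([data[name][:, d] for name in order[k:]])
            gain = gauss_ll(left) + gauss_ll(right) - base
            if gain > best_gain:
                best_gain, best_split = gain, (set(order[:k]), set(order[k:]))
        result[d] = best_split
    return result
```

If allophones "a" and "b" agree in dimension 0 but "b" and "c" agree in dimension 1, the sketch returns {a, b} | {c} for the first dimension and {a} | {b, c} for the second, a structure that a single shared clustering cannot express.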
Asynchronous-Transition HMM
We propose a new class of hidden Markov model (HMM) called the asynchronous-transition HMM (AT-HMM). Unlike conventional HMMs, in which hidden state transitions occur simultaneously for all features, the new class of HMM allows state transitions to be asynchronous between individual features, to better model the asynchronous timing of acoustic feature changes. In this paper, we focus on a particular class of AT-HMM with sequential constraints based on a novel concept of “state tying along time”. To maximize the advantage of the new model, we also introduce a feature-wise state tying technique. Speaker-dependent speech recognition experiments demonstrated error reduction rates of more than 30% and 50% in phoneme and isolated word recognition, respectively, compared with conventional HMMs.
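The benefit of letting features change state at different times can be seen by comparing a synchronous alignment, in which all features share one state sequence, with an unconstrained per-feature alignment. This is a simplified stand-in for AT-HMM, not the model itself: it assumes hypothetical unit-variance Gaussian states, treats all transitions as equally likely, and drops the sequential constraints and state tying described in the paper.

```python
import numpy as np


def left_to_right_viterbi(loglik):
    """Best-path score and state sequence for a left-to-right HMM.

    loglik[t, s] is the log-likelihood of frame t in state s; the path starts
    in state 0, ends in state S-1 and advances by at most one state per frame.
    """
    n_frames, n_states = loglik.shape
    delta = np.full((n_frames, n_states), -np.inf)
    back = np.zeros((n_frames, n_states), dtype=int)
    delta[0, 0] = loglik[0, 0]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = delta[t - 1, s]
            move = delta[t - 1, s - 1] if s > 0 else -np.inf
            back[t, s] = s if stay >= move else s - 1
            delta[t, s] = max(stay, move) + loglik[t, s]
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return delta[-1, -1], path[::-1]


def gaussian_loglik(x, means):
    """Unit-variance Gaussian log-likelihoods (up to a constant); x: (T,), means: (S,)."""
    return -0.5 * (x[:, None] - means[None, :]) ** 2


def sync_vs_async(features, means):
    """features: (T, D) observations; means: (S, D) hypothetical state means."""
    n_dims = features.shape[1]
    # Synchronous: one shared state sequence scores all features jointly.
    joint = sum(gaussian_loglik(features[:, d], means[:, d]) for d in range(n_dims))
    sync_score, _ = left_to_right_viterbi(joint)
    # Asynchronous: each feature is aligned with its own state sequence.
    per_feature = [left_to_right_viterbi(gaussian_loglik(features[:, d], means[:, d]))
                   for d in range(n_dims)]
    async_score = sum(score for score, _ in per_feature)
    return sync_score, async_score, [path for _, path in per_feature]
```

Because the shared path is also available to every feature individually, the asynchronous score can never be lower than the synchronous one; it is strictly higher whenever features change at different frames, as in a two-feature sequence whose first feature switches at frame 3 and second at frame 6.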
Multichannel harmonic and percussive component separation by joint modeling of spatial and spectral continuity
This paper considers the blind separation of the harmonic and percussive components of multichannel music signals. We model the contribution of each source to all mixture channels in the time-frequency domain via a spatial covariance matrix, which encodes its spatial characteristics, and a scalar spectral variance, which represents its spectral structure. We then exploit the spatial continuity and the different spectral continuity structures of harmonic and percussive components as prior information to derive maximum a posteriori (MAP) estimates of the parameters using the expectation-maximization (EM) algorithm. Experimental results on professional musical mixtures show the effectiveness of the proposed approach.
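In this model each source image at a time-frequency bin has covariance v_j R_j, where v_j is the scalar spectral variance and R_j the spatial covariance matrix. The MAP/EM parameter estimation is not shown here; assuming the parameters are already estimated, multichannel Wiener filtering is a common way to recover the harmonic and percussive images in this framework (the abstract does not say which estimator the authors use). The sketch below handles a single bin with hypothetical values.

```python
import numpy as np


def wiener_source_images(x, variances, spatial_covs):
    """Multichannel Wiener estimates of the source images at one TF bin.

    x            -- (I,) complex mixture STFT coefficients over I channels
    variances    -- scalar spectral variances v_j, one per source
    spatial_covs -- (I, I) Hermitian spatial covariance matrices R_j
    """
    source_covs = [v * r for v, r in zip(variances, spatial_covs)]
    mix_cov = sum(source_covs)                  # covariance of the mixture
    weighted = np.linalg.solve(mix_cov, x)      # mix_cov^{-1} x
    return [c @ weighted for c in source_covs]  # v_j R_j mix_cov^{-1} x
```

Since the filters sum to the identity, the estimated harmonic and percussive images always add back up to the mixture, and a source whose spectral variance is zero at a bin receives nothing there.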
Substroke Approach to HMM-based On-line Kanji Handwriting Recognition
A new method is proposed for on-line handwriting recognition of Kanji characters. The method employs substroke HMMs as the minimum units that constitute Japanese Kanji characters and utilizes the direction of pen motion. The main motivation is to fully utilize continuous speech recognition algorithms by treating a Kanji character like a spoken sentence, substrokes like phonemes, and Kanji structure like grammar. The proposed system consists of input feature analysis, substroke HMMs, a character structure dictionary and a decoder. The present approach has the following advantages over conventional methods that employ whole-character HMMs: 1) much smaller memory requirements for the dictionary and models; 2) fast recognition through efficient substroke network search; 3) the capability of recognizing characters not included in the training data, provided they are defined as sequences of substrokes in the dictionary; 4) the capability of recognizing characters written in various stroke orders, using multiple definitions per character in the dictionary; and 5) easy adaptation of the HMMs to the user with only a few sample characters.
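The way character models are composed from shared substroke models through the dictionary can be sketched with a toy recognizer. Everything below is hypothetical: pen motion is quantized into 8 direction codes, each substroke model is a short left-to-right chain of states that each prefer one direction, emission probabilities are fixed, pen-up substrokes are omitted, and the decoder scores every dictionary definition with its own Viterbi pass instead of searching a shared substroke network.

```python
import math

# Hypothetical substroke inventory: each model is a list of states, and each
# state is labelled with the pen direction (0 = right, 2 = down, ...) it prefers.
SUBSTROKE_MODELS = {"right": [0, 0], "down": [2, 2]}

# Hypothetical character structure dictionary; a character may have several
# definitions, e.g. one per stroke order.
DICTIONARY = {
    "一": [["right"]],
    "丨": [["down"]],
    "十": [["right", "down"], ["down", "right"]],
}

# Fixed emission log-probabilities: 0.9 for the preferred direction,
# the remaining 0.1 shared among the other 7 directions.
LOG_MATCH, LOG_MISMATCH = math.log(0.9), math.log(0.1 / 7)


def compose(definition):
    """Concatenate the shared substroke models into one character state chain."""
    return [d for name in definition for d in SUBSTROKE_MODELS[name]]


def viterbi_score(states, observed):
    """Best left-to-right alignment score (stay or advance one state per frame)."""
    neg_inf = float("-inf")
    delta = [neg_inf] * len(states)
    delta[0] = LOG_MATCH if observed[0] == states[0] else LOG_MISMATCH
    for obs in observed[1:]:
        new = [neg_inf] * len(states)
        for s, direction in enumerate(states):
            best_prev = max(delta[s], delta[s - 1] if s > 0 else neg_inf)
            new[s] = best_prev + (LOG_MATCH if obs == direction else LOG_MISMATCH)
        delta = new
    return delta[-1]


def recognize(observed):
    """Return the character whose best-scoring definition scores highest."""
    return max(DICTIONARY, key=lambda ch: max(viterbi_score(compose(d), observed)
                                               for d in DICTIONARY[ch]))
```

Adding a character here only needs a new dictionary entry built from existing substroke names, with no new model parameters, which mirrors advantages 1), 3) and 4) in the abstract on a toy scale.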