104 research outputs found

    Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips

    This paper discusses real-time alignment of audio signals of music performance to the corresponding score (a.k.a. score following) which can handle tempo changes, errors, and arbitrary repeats and/or skips (repeats/skips) in performances. This type of score following is particularly useful in automatic accompaniment for practices and rehearsals, where errors and repeats/skips are often made. Simple extensions of the algorithms previously proposed in the literature are not applicable in these situations for scores of practical length because of their large computational complexity. To cope with this problem, we present two hidden Markov models of monophonic performance with errors and arbitrary repeats/skips, and derive efficient score-following algorithms under the assumption that the prior probability distributions of score positions before and after repeats/skips are independent of each other. We confirmed real-time operation of the algorithms with music scores of practical length (around 10000 notes) on a modern laptop, and their ability to track the input performance within 0.7 s on average after repeats/skips in clarinet performance data. Further improvements and an extension to polyphonic signals are also discussed. Comment: 12 pages, 8 figures; version accepted in IEEE/ACM Transactions on Audio, Speech, and Language Processing.
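
    To illustrate the kind of factorization the abstract describes, below is a minimal sketch (not the authors' algorithm) of one online HMM filtering step over score positions. The transition model mixes local progression with a repeat/skip jump whose destination prior is assumed independent of the origin, which is what keeps each update O(N) rather than O(N^2); the probabilities and the helper name `follow_step` are illustrative assumptions.

```python
import numpy as np

def follow_step(belief, obs_lik, p_stay=0.6, p_advance=0.35, p_jump=0.05,
                jump_prior=None):
    """One filtering update over N score positions.

    belief     : current posterior over score positions, shape (N,)
    obs_lik    : likelihood of the new audio frame at each position, shape (N,)
    jump_prior : prior over repeat/skip destinations (uniform if None); assumed
                 independent of the position the jump starts from.
    """
    n = belief.size
    if jump_prior is None:
        jump_prior = np.full(n, 1.0 / n)

    # Local motion: stay on the current score position or advance by one.
    predicted = p_stay * belief
    predicted[1:] += p_advance * belief[:-1]

    # Jump term: the total jump mass is spread over destinations using a prior
    # that does not depend on where the jump started (the independence
    # assumption), so this step costs O(N) instead of O(N^2).
    predicted += p_jump * belief.sum() * jump_prior

    posterior = predicted * obs_lik
    return posterior / posterior.sum()
```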

    Autoregressive hidden semi-Markov model of symbolic music performance for score following

    A stochastic model of symbolic (MIDI) performance of polyphonic scores is presented and applied to score following. Stochastic modelling has been one of the most successful strategies in this field. We describe the performance as a hierarchical process of the performer's progression through the score and the production of performed notes, and represent this process as an extension of the hidden semi-Markov model. The model is compared with a previously studied model based on the hidden Markov model (HMM), and reasons are given why the present model is advantageous for score following, especially for scores with trills, tremolos, and arpeggios. This is also confirmed empirically by comparing score-following accuracy and analysing the errors. We also provide a hybrid of this model and the HMM-based model which is computationally more efficient and retains the advantages of the former model. The present model yields one of the state-of-the-art score-following algorithms for symbolic performance and may also be applicable to other music recognition problems.
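
    As a rough illustration of the hierarchical structure described above (progression through score events plus production of performed notes), here is a minimal generative sketch under assumed distributions; it is not the paper's model or its inference procedure. The gamma-distributed dwell time stands in for the explicit duration distribution that distinguishes a semi-Markov model from a plain HMM.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_performance(score_events, mean_ioi=0.5, pitch_error_rate=0.05):
    """score_events: list of chords, each a list of MIDI pitches, in score order."""
    t = 0.0
    performed = []
    for chord in score_events:
        # Explicit (semi-Markovian) duration model: gamma-distributed
        # inter-onset interval instead of a geometric dwell time.
        ioi = rng.gamma(shape=4.0, scale=mean_ioi / 4.0)
        for pitch in chord:
            p = int(pitch)
            if rng.random() < pitch_error_rate:       # occasional wrong note
                p += int(rng.choice([-2, -1, 1, 2]))
            onset = t + rng.normal(0.0, 0.01)         # chord-spread / timing jitter
            performed.append((onset, p))
        t += ioi
    return sorted(performed)                          # (onset time, MIDI pitch) pairs

# Example: a C major arpeggio followed by a G major chord.
print(sample_performance([[60], [64], [67], [55, 59, 62, 67]]))
```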

    A Novel Interface for the Graphical Analysis of Music Practice Behaviors

    Practice is an essential part of music training, but critical content-based analyses of practice behaviors still lack tools that convey informative representations of practice sessions. To bridge this gap, we present a novel visualization system, the Music Practice Browser, for representing, identifying, and analysing music practice behaviors. The Music Practice Browser provides a graphical interface for reviewing recorded practice sessions, which allows musicians, teachers, and researchers to examine aspects and features of music practice behaviors. The system takes beat and practice segment information together with a musical score in XML format as input, and produces a number of different visualizations: Practice Session Work Maps give an overview of contiguous practice segments; Practice Segment Arcs make transitions and repeated segments evident; Practice Session Precision Maps facilitate the identification of errors; Tempo-Loudness Evolution Graphs track expressive variations over the course of a practice session. We then test the new system on practice sessions of pianists of varying levels of expertise, ranging from novice to expert. The practice patterns found include Drill-Correct, Drill-Smooth, Memorization Strategy, Review and Explore, and Expressive Evolution. The analysis reveals practice patterns and behavior differences between beginners and experts, such as a higher proportion of Drill-Smooth patterns in expert practice.
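
    A minimal sketch of how a Practice Segment Arc view could be drawn from segment data, assuming each practice segment is given as a (start_beat, end_beat) pair; the plotting details are illustrative and not the Music Practice Browser's actual implementation.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_segment_arcs(segments):
    """segments: practice segments as (start_beat, end_beat) pairs, in played order."""
    fig, ax = plt.subplots(figsize=(8, 3))
    for i, (start, end) in enumerate(segments):
        ax.hlines(0, start, end, lw=4)                # the practiced passage itself
        if i + 1 < len(segments):
            nxt_start = segments[i + 1][0]
            mid = (end + nxt_start) / 2.0
            r = abs(nxt_start - end) / 2.0
            theta = np.linspace(0.0, np.pi, 50)
            # Arc linking the end of this segment to the start of the next,
            # so jumps back (repeats) stand out visually.
            ax.plot(mid + r * np.cos(theta), 0.25 * r * np.sin(theta))
    ax.set_xlabel("score position (beats)")
    ax.set_yticks([])
    return fig

# Example: the player drills beats 8-16 twice before continuing.
plot_segment_arcs([(0, 16), (8, 16), (8, 16), (16, 32)])
plt.show()
```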

    Piano Score Following with Hidden Timbre or Tempo Using Switching Kalman Filters

    Thesis (Ph.D.) - Indiana University, University Graduate School/Luddy School of Informatics, Computing, and Engineering, 2020. Score following is an AI technique that enables computer programs to “listen to” music: to track a live musical performance in relation to its written score, even through variations in tempo and amplitude. This ability can be transformative for musical practice, performance, education, and composition. Although score following has been successful on monophonic music (one note at a time), it has difficulty with polyphonic music. One of the greatest challenges is piano music, which is highly polyphonic. This dissertation investigates ways to overcome the challenges of polyphonic music and casts light on the nature of the problem through empirical experiments. I propose two new approaches inspired by two important aspects of music that humans perceive during a performance: the pitch profile of the sound, and the timing. In the first approach, I account for changing timbre within a chord by tracking harmonic amplitudes to improve matching between the score and the sound. In the second approach, I model tempo in music, allowing it to deviate from the default tempo value within reasonable statistical constraints. For both methods, I develop switching Kalman filter models that are interesting in their own right. I have conducted experiments on 50 excerpts of real piano performances and analyzed the results both case-by-case and statistically. The results indicate that modeling tempo is essential for piano score following, and the second method significantly outperformed the state-of-the-art baseline. The first method, although it did not show improvement over the baseline, still represents a promising new direction for future research. Taken together, the results contribute to a more nuanced and multifaceted understanding of the score-following problem.
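
    The following is a minimal sketch of a two-mode (switching) Kalman filter over the state [score position in beats, tempo in beats per second], with a "steady" mode and a "changing" mode mixed IMM-style; the mode set, noise values, and class name are illustrative assumptions, not the models developed in the thesis.

```python
import numpy as np

class TempoSwitchingKF:
    def __init__(self, pos0=0.0, bps0=2.0):
        self.x = np.array([pos0, bps0])             # [position (beats), tempo (beats/s)]
        self.P = np.diag([0.1, 0.5])
        self.R = 0.05 ** 2                          # variance of observed position (beats^2)
        self.Q = {"steady": np.diag([1e-4, 1e-4]),  # tempo barely drifts
                  "changing": np.diag([1e-4, 0.2])} # tempo may jump
        self.mode_prior = {"steady": 0.95, "changing": 0.05}

    def update(self, dt, z):
        """dt: seconds since the last matched note; z: its score position in beats."""
        F = np.array([[1.0, dt], [0.0, 1.0]])
        posts, weights = [], []
        for mode, Q in self.Q.items():
            x_pred = F @ self.x
            P_pred = F @ self.P @ F.T + Q
            y = z - x_pred[0]                       # innovation (position is observed)
            S = P_pred[0, 0] + self.R
            K = P_pred[:, 0] / S                    # Kalman gain for the scalar observation
            x_post = x_pred + K * y
            P_post = P_pred - np.outer(K, P_pred[0, :])
            posts.append((x_post, P_post))
            lik = np.exp(-0.5 * y * y / S) / np.sqrt(2.0 * np.pi * S)
            weights.append(self.mode_prior[mode] * lik)
        w = np.array(weights) / np.sum(weights)
        # Collapse the per-mode posteriors into one Gaussian (moment matching).
        self.x = sum(wi * xi for wi, (xi, _) in zip(w, posts))
        self.P = sum(wi * (Pi + np.outer(xi - self.x, xi - self.x))
                     for wi, (xi, Pi) in zip(w, posts))
        return self.x
```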

    Computational Models of Expressive Music Performance: A Comprehensive and Critical Review

    Expressive performance is an indispensable part of music making. When playing a piece, expert performers shape various parameters (tempo, timing, dynamics, intonation, articulation, etc.) in ways that are not prescribed by the notated score, in this way producing an expressive rendition that brings out dramatic, affective, and emotional qualities that may engage and affect the listeners. Given the central importance of this skill for many kinds of music, expressive performance has become an important research topic for disciplines like musicology and music psychology. This paper focuses on a specific thread of research: work on computational music performance models. Computational models are attempts at codifying hypotheses about expressive performance in terms of mathematical formulas or computer programs, so that they can be evaluated in systematic and quantitative ways. Such models can serve at least two purposes: they permit us to systematically study certain hypotheses regarding performance, and they can be used as tools to generate automated or semi-automated performances, in artistic or educational contexts. The present article presents an up-to-date overview of the state of the art in this domain. We explore recent trends in the field, such as a strong focus on data-driven (machine learning) approaches; a growing interest in interactive expressive systems, such as conductor simulators and automatic accompaniment systems; and an increased interest in exploring cognitively plausible features and models. We provide an in-depth discussion of several important design choices in such computer models, and discuss a crucial (and still largely unsolved) problem that is hindering systematic progress: the question of how to evaluate such models in scientifically and musically meaningful ways. From all this, we finally derive some research directions that should be pursued with priority, in order to advance the field and our understanding of expressive music performance.

    Modelling Professional Singers: A Bayesian Machine Learning Approach with Enhanced Real-time Pitch Contour Extraction and Onset Processing from an Extended Dataset

    Singing signals are one kind of input data that computer systems need to analyse, and singing is part of every culture in the world. However, although there have been several studies on audio signal processing during the last three decades, it is still an active research area because most of the available algorithms in the literature require improvement due to the complexity of audio/music signals. More effort is needed for analysing sounds/music in a real-time environment, since the algorithms must work only on past data, whereas in an offline system all the required data are available. In addition, the complexity of the data increases if the audio signals come from singing, due to the unique features of singing signals (such as vocal system, vibration, pitch drift, and tuning approach) that make the signals different from, and more complicated than, those from an instrument. This thesis is mainly focused on analysing singing signals and on better understanding how trained professional singers sing the pitch frequency and duration of notes according to their position in a piece of music and the singing technique applied. To do this, it is shown that by incorporating singing features such as gender and BPM, a real-time pitch detection algorithm can be selected that estimates fundamental frequencies with fewer errors. In addition, two novel algorithms were proposed, one for smoothing pitch contours and another for estimating onsets, offsets, and the transitions between notes. These two algorithms showed better results compared to several other state-of-the-art algorithms. Moreover, a new vocal dataset that includes several annotations for 2688 singing files was published. Finally, this thesis presents two models for calculating the pitches and durations of notes according to their positions in a piece of music. In conclusion, optimizing results for pitch-oriented Music Information Retrieval (MIR) algorithms necessitates adapting/selecting them based on the unique characteristics of the signals. Achieving a universal algorithm that performs exceptionally well on all data types remains a formidable challenge given the current state of technology.
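
    As a hedged illustration of the pitch-contour smoothing and note-transition ideas mentioned above (not the thesis algorithms), the sketch below median-filters a frame-wise f0 contour in cents and marks onsets/offsets at voicing changes or large pitch jumps; thresholds and parameter names are assumptions.

```python
import numpy as np
from scipy.signal import medfilt

def smooth_and_segment(f0_hz, hop_s=0.01, kernel=9, jump_cents=50.0):
    """f0_hz: frame-wise pitch estimates in Hz, 0 for unvoiced frames."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = f0_hz > 0
    cents = np.zeros(len(f0_hz))
    cents[voiced] = 1200.0 * np.log2(f0_hz[voiced] / 55.0)

    # Median filtering suppresses isolated octave errors and spikes.
    smooth = medfilt(cents, kernel_size=kernel)
    smooth[~voiced] = 0.0

    onsets, offsets = [], []
    for i in range(1, len(f0_hz)):
        if voiced[i] and not voiced[i - 1]:
            onsets.append(i * hop_s)                  # voicing turns on
        elif voiced[i - 1] and not voiced[i]:
            offsets.append(i * hop_s)                 # voicing turns off
        elif voiced[i] and abs(smooth[i] - smooth[i - 1]) > jump_cents:
            offsets.append(i * hop_s)                 # note transition: close the
            onsets.append(i * hop_s)                  # old note, open a new one
    return smooth, onsets, offsets
```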

    Computational Methods for the Alignment and Score-Informed Transcription of Piano Music

    PhD thesis. This thesis is concerned with computational methods for the alignment and score-informed transcription of piano music. Firstly, several methods are proposed to improve alignment robustness and accuracy when various versions of one piece of music show complex differences with respect to acoustic conditions or musical interpretation. Secondly, score-to-performance alignment is applied to enable score-informed transcription. Although music alignment methods have considerably improved in accuracy in recent years, the task remains challenging. The research in this thesis aims to improve robustness in cases where there are substantial differences between versions and state-of-the-art methods may fail to identify a correct alignment. This thesis first exploits the availability of multiple versions of the piece to be aligned. By processing these jointly, the alignment process can be stabilised by exploiting additional examples of how a section might be interpreted or which acoustic conditions may arise. Two methods are proposed, progressive alignment and profile HMM, both adapted from the multiple biological sequence alignment task. Experiments demonstrate that these methods can indeed improve alignment accuracy and robustness over comparable pairwise methods. Secondly, this thesis presents a score-to-performance alignment method that improves robustness in cases where some musical voices, such as the melody, are played asynchronously to others – a stylistic device used in musical expression. The asynchronies between the melody and the accompaniment are handled by treating the voices as separate timelines in a multi-dimensional variant of dynamic time warping (DTW). The method measurably improves alignment accuracy for pieces with asynchronous voices and preserves the accuracy otherwise. Once an accurate alignment between a score and an audio recording is available, the score information can be exploited as prior knowledge in automatic music transcription (AMT) for scenarios where a score is available, such as music tutoring. Score-informed dictionary learning is used to learn, for each pitch, a spectral pattern that describes the energy distribution of the associated notes in the recording. More precisely, the dictionary learning process in non-negative matrix factorization (NMF) is constrained using the aligned score. By adapting the dictionary to a given recording in this way, the proposed method improves accuracy over the state-of-the-art. China Scholarship Council.
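
    To make the score-informed dictionary learning idea concrete, here is a minimal sketch of NMF with multiplicative (KL) updates in which the activation matrix is masked by the aligned score, so each pitch template in the dictionary adapts to the recording; the setup and parameter choices are assumptions, not the thesis implementation.

```python
import numpy as np

def score_informed_nmf(V, score_mask, n_iter=100, eps=1e-10):
    """V: nonnegative magnitude spectrogram, shape (n_freq, n_frames).
    score_mask: binary matrix, shape (n_pitches, n_frames), 1 where the aligned
    score allows a pitch to be active (with some tolerance around note spans)."""
    n_freq, n_frames = V.shape
    n_pitches = score_mask.shape[0]
    rng = np.random.default_rng(0)

    W = rng.random((n_freq, n_pitches)) + eps           # per-pitch spectral templates
    H = (rng.random((n_pitches, n_frames)) + eps) * score_mask

    ones = np.ones_like(V)
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ ones + eps)      # KL multiplicative update
        H *= score_mask                                 # keep the score constraint
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (ones @ H.T + eps)      # adapt templates to the recording
        W /= W.sum(axis=0, keepdims=True) + eps         # normalize each template
    return W, H
```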

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010