
    Computational Models of Expressive Music Performance: A Comprehensive and Critical Review

    Expressive performance is an indispensable part of music making. When playing a piece, expert performers shape various parameters (tempo, timing, dynamics, intonation, articulation, etc.) in ways that are not prescribed by the notated score, thereby producing an expressive rendition that brings out dramatic, affective, and emotional qualities that may engage and affect listeners. Given the central importance of this skill for many kinds of music, expressive performance has become an important research topic for disciplines such as musicology and music psychology. This paper focuses on a specific thread of research: work on computational music performance models. Computational models are attempts at codifying hypotheses about expressive performance in terms of mathematical formulas or computer programs, so that they can be evaluated in systematic and quantitative ways. Such models can serve at least two purposes: they permit us to systematically study certain hypotheses regarding performance, and they can be used as tools to generate automated or semi-automated performances in artistic or educational contexts. The present article presents an up-to-date overview of the state of the art in this domain. We explore recent trends in the field, such as a strong focus on data-driven (machine learning) approaches; a growing interest in interactive expressive systems, such as conductor simulators and automatic accompaniment systems; and an increased interest in exploring cognitively plausible features and models. We provide an in-depth discussion of several important design choices in such computer models, and discuss a crucial (and still largely unsolved) problem that is hindering systematic progress: the question of how to evaluate such models in scientifically and musically meaningful ways. From all this, we finally derive some research directions that should be pursued with priority in order to advance the field and our understanding of expressive music performance.
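    To make the notion of such a model concrete, here is a minimal sketch of a rule-based timing model in the spirit of phrase-arch hypotheses about expressive tempo; the function names, the parabolic curve, and all parameter values are illustrative assumptions, not any published system's formulation.

    # A minimal sketch of a rule-based expressive-timing model: a parabolic
    # phrase arch that pushes the tempo forward mid-phrase and relaxes it at
    # the phrase boundaries. All names and constants are assumptions.
    def phrase_arch_tempo(position, arch_depth=0.15):
        """Map a normalized phrase position in [0, 1] to a tempo factor."""
        return 1.0 + arch_depth * (1.0 - 4.0 * (position - 0.5) ** 2)

    def render_onsets(onsets_beats, phrase_len_beats=8.0, base_tempo_bpm=100.0):
        """Turn nominal score onsets (in beats) into performed onsets (seconds)."""
        performed, clock = [0.0], 0.0
        for prev, cur in zip(onsets_beats, onsets_beats[1:]):
            pos = (prev % phrase_len_beats) / phrase_len_beats
            beat_sec = 60.0 / (base_tempo_bpm * phrase_arch_tempo(pos))
            clock += (cur - prev) * beat_sec
            performed.append(clock)
        return performed

    print(render_onsets(list(range(9))))  # 9 onsets spanning one 8-beat phrase

    A data-driven model, by contrast, would learn the mapping from score features to tempo and dynamics deviations from corpora of measured performances.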

    Perspectives and approaches to determine measures of similarity for musical performances using data analysis algorithms

    The automatic characterization of musical works and their performers is an active line of research, driven mainly by the importance of the topic, but also by technological advances and the availability of computational tools capable of detecting voice and sound. Recognizing the performer of a composition when listening to it is simple for a human, but not as simple for machines; this has kept a community of researchers searching for measures that allow accurate comparison and inference about the characteristics of a composition and its performer. However, a general measure has yet to be discovered, although various statistical-computational techniques have been developed that deserve to be evaluated and perhaps combined to strengthen any research on this topic. This work, the product of a comprehensive literature review, collects the main techniques and tools that have been used and proposed by researchers over the last two decades. The document will help researchers who decide to undertake studies, evaluations, and implementations of these tools, as well as those who wish to work on the automatic recognition of musical works, their characteristics and performers, or music information retrieval.
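    As a concrete illustration of the kind of similarity measures surveyed here, the sketch below compares two performances reduced to per-beat tempo curves, using Pearson correlation and a textbook dynamic-time-warping (DTW) cost; the reduction to tempo curves, the names, and the toy data are assumptions for illustration only.

    # Two simple similarity measures over performance tempo curves.
    import numpy as np

    def pearson_similarity(a, b):
        """Correlation of two equal-length tempo curves (shape similarity)."""
        return float(np.corrcoef(a, b)[0, 1])

    def dtw_distance(a, b):
        """Classic dynamic-time-warping cost, tolerant of local timing shifts."""
        n, m = len(a), len(b)
        d = np.full((n + 1, m + 1), np.inf)
        d[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
        return float(d[n, m])

    perf_a = np.array([100, 102, 105, 103, 98, 95])   # tempo in BPM per beat
    perf_b = np.array([ 99, 101, 106, 104, 97, 93])
    print(pearson_similarity(perf_a, perf_b), dtw_distance(perf_a, perf_b))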

    Sounding Out Reconstruction Error-Based Evaluation of Generative Models of Expressive Performance

    Generative models of expressive piano performance are usually assessed by comparing their predictions to a reference human performance. A generative algorithm is taken to be better than competing ones if it produces performances that are closer to a human reference performance. However, expert human performers can (and do) interpret music in different ways, making for different possible references, and quantitative closeness is not necessarily aligned with perceptual similarity, raising concerns about the validity of this evaluation approach. In this work, we present a number of experiments that shed light on this problem. Using precisely measured high-quality performances of classical piano music, we carry out a listening test indicating that listeners can sometimes perceive subtle performance differences that go unnoticed under quantitative evaluation. We further present tests indicating that such evaluation frameworks show considerable variability in reliability and validity across different reference performances and pieces. We discuss these results and their implications for quantitative evaluation, and hope to foster a critical appreciation of the uncertainties involved in quantitative assessments of such performances within the wider music information retrieval (MIR) community.
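    A minimal sketch of the evaluation setup the paper questions, under the assumption that performances are reduced to per-note MIDI velocities: two model outputs are scored by mean squared error against a single human reference, and the ranking flips depending on which (equally valid) expert reference is chosen. All numbers are toy illustrations.

    # Reference-dependence of reconstruction-error evaluation (toy example).
    import numpy as np

    def mse(pred, ref):
        return float(np.mean((np.asarray(pred) - np.asarray(ref)) ** 2))

    # Per-note MIDI velocities: two model outputs, two expert references.
    model_a = [60, 64, 70, 66, 58]
    model_b = [58, 66, 72, 62, 60]
    ref_1   = [59, 65, 71, 63, 59]   # expert 1's interpretation
    ref_2   = [62, 63, 69, 67, 57]   # expert 2's, equally valid

    for name, ref in [("ref_1", ref_1), ("ref_2", ref_2)]:
        print(name, "A:", mse(model_a, ref), "B:", mse(model_b, ref))
    # Model B wins against ref_1 (1.0 vs 2.6) but loses against ref_2
    # (13.6 vs 1.6), illustrating the reference-dependence the paper probes.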

    Musical complexity and ‘Embodied notation’: a study of the Opus Clavicembalisticum (K. S. Sorabji)

    Scores of complex 20th-century solo piano pieces can be difficult to perform and may even include elements that are physically impossible to play. This article investigates the role of music notation in the Opus Clavicembalisticum of Sorabji, a rather extreme case in terms of virtuosity and length. To analyze the effect of score notation on learning and performing, 9 pianists were asked to practice music fragments in 3 different score editions: the original Urtext edition (a 4-staff score), a Performance edition (the same notes but organized according to an “embodied” performance viewpoint), and a Study edition (further simplified and with added analytical reading aids). The hypothesis was that the “embodied” notation would shorten study time and reduce errors. Objective features of the study process and performance, such as study time, error ratio, and markings on the score (fingerings, hand distribution, synchronization), were compared, and subjective remarks the performers made about the scores were also analyzed. Findings indicate a significant positive influence of the score type on study time. These results suggest that players draw on ideomotor principles, which include processes based on learned and “embodied” associations between perceived images of the scores and the motor activity directly associated with them.
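    A minimal sketch of the repeated-measures comparison such a design implies (the same 9 pianists under 3 editions), using a Friedman test as one plausible choice of analysis; the study-time values below are fabricated placeholders, not the article's data.

    # Comparing study times across three score editions for the same pianists.
    from scipy.stats import friedmanchisquare

    # Minutes of study time per pianist (one value per pianist, per edition).
    urtext      = [34, 41, 29, 38, 45, 33, 40, 36, 31]
    performance = [28, 35, 25, 30, 39, 27, 33, 29, 26]
    study_ed    = [26, 33, 24, 29, 36, 25, 31, 28, 24]

    stat, p = friedmanchisquare(urtext, performance, study_ed)
    print(f"Friedman chi2={stat:.2f}, p={p:.4f}")  # small p -> edition matters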

    Interactive real-time musical systems

    This thesis focuses on the development of automatic accompaniment systems. We investigate previous systems and look at a range of approaches that have been attempted for the problem of beat tracking. Most beat trackers are intended for the purposes of music information retrieval, where a ‘black box’ approach is tested on a wide variety of music genres. We highlight some of the difficulties facing offline beat trackers and design a new approach for the problem of real-time drum tracking, developing a system, B-Keeper, which makes reasonable assumptions about the nature of the signal and is provided with useful prior knowledge. Having developed the system with offline studio recordings, we look to test the system with human players. Existing offline evaluation methods seem less suitable for a performance system, since we also wish to evaluate the interaction between musician and machine. Although statistical data may reveal quantifiable measurements of the system's predictions and behaviour, we also want to test how well it functions within the context of a live performance. To do so, we devise an evaluation strategy that contrasts a machine-controlled accompaniment with one controlled by a human. We also present recent work on real-time multiple pitch tracking, which is then extended to provide automatic accompaniment for harmonic instruments such as guitar. By aligning salient notes in the output from a dual pitch tracking process, we make changes to the tempo of the accompaniment in order to align it with a live stream. By demonstrating the system's ability to align offline tracks, we show that, under restricted initial conditions, the algorithm works well as an alignment tool.
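    As an illustration of the kind of online adaptation such an accompaniment system performs, the sketch below nudges a beat-period estimate toward the intervals observed in a live onset stream; this is a generic toy written for this summary, not B-Keeper's actual algorithm.

    # A toy online tempo follower: adapt the accompaniment's beat period
    # toward observed inter-onset intervals, gated by prior knowledge about
    # plausible tempo changes (as the thesis argues real systems should be).
    class TempoFollower:
        def __init__(self, period=0.5, alpha=0.2):
            self.period = period      # current beat period in seconds
            self.alpha = alpha        # how strongly to trust new evidence
            self.last_onset = None

        def on_beat(self, onset_time):
            """Update the period estimate from a detected beat onset."""
            if self.last_onset is not None:
                observed = onset_time - self.last_onset
                # Reject intervals far from the current period (octave errors,
                # dropped beats) instead of blindly following them.
                if 0.5 * self.period < observed < 2.0 * self.period:
                    self.period += self.alpha * (observed - self.period)
            self.last_onset = onset_time
            return 60.0 / self.period  # current tempo estimate in BPM

    follower = TempoFollower(period=0.5)
    for t in [0.0, 0.52, 1.06, 1.57, 2.11]:   # live drum onsets, slowing slightly
        print(round(follower.on_beat(t), 1))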

    Statistical distribution of common audio features: encounters in a heavy-tailed universe

    In the last few years, some Music Information Retrieval (MIR) researchers have spotted important drawbacks in applying standard algorithms, successful on monophonic music, to polyphonic music classification and similarity assessment. Noticeably, these so-called “Bag-of-Frames” (BoF) algorithms share a common set of assumptions: that the numerical descriptions extracted from short-time audio excerpts (or frames) are enough to capture the relevant information for the task at hand, that these frame-based audio descriptors are time independent, and that descriptor frames are well described by Gaussian statistics. Thus, if we want to improve current BoF algorithms we could: i) improve current audio descriptors, ii) include temporal information within algorithms working with polyphonic music, and iii) study and characterize the real statistical properties of these frame-based audio descriptors. From a literature review, we have detected that many works focus on the first two improvements, but surprisingly, there is a lack of research on the third one. Therefore, in this thesis we analyze and characterize the statistical distribution of common audio descriptors of timbre, tonal, and loudness information. Contrary to what is usually assumed, our work shows that the studied descriptors are heavy-tailed and thus do not belong to a Gaussian universe. This new knowledge led us to propose new algorithms that show improvements over the BoF approach in current MIR tasks such as genre classification, instrument detection, and automatic tagging of music. Furthermore, we also address new MIR tasks such as measuring the temporal evolution of Western popular music. Finally, we highlight some promising paths for future audio-content MIR research that will inhabit a heavy-tailed universe.
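    A minimal sketch of the thesis's core statistical check, under the assumption that descriptors arrive as per-frame scalar values: excess kurtosis well above zero is a simple indicator of heavy tails, and synthetic Gaussian and Student-t data stand in here for real MFCC, chroma, or loudness frames.

    # Heavy-tail check via excess kurtosis (0 for a Gaussian, large for fat tails).
    import numpy as np
    from scipy.stats import kurtosis, t as student_t

    rng = np.random.default_rng(0)
    gaussian_frames = rng.normal(size=10_000)
    heavy_frames = student_t.rvs(df=3, size=10_000, random_state=0)  # fat tails

    for name, x in [("gaussian", gaussian_frames), ("student-t", heavy_frames)]:
        print(name, "excess kurtosis:", round(float(kurtosis(x)), 2))
    # A real experiment would run the same statistic over per-frame descriptor
    # values extracted from a music collection.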

    On the analysis of musical performance by computer

    Existing automatic methods of analysing musical performance can generally be described as music-oriented DSP analysis. However, this merely identifies attributes or artefacts which can be found within the performance. This information, though invaluable, is not an analysis of the performance process. The process of performance first involves an analysis of the score (whether from a printed sheet or from memory), and through this analysis the performer decides how to perform the piece. Thus, an analysis of the performance process requires an analysis of the performance attributes and artefacts in the context of the musical score. With this type of analysis it is possible to ask profound questions such as “why or when does a performer use this technique?”. The work presented in this thesis provides the tools required to investigate these performance issues. A new computer representation, Performance Markup Language (PML), is presented which combines the domains of the musical score, performance information, and analytical structures. This representation provides the framework with which information within these domains can be cross-referenced internally, and with which information in external files can be marked up. Most importantly, the representation defines the relationship between performance events and the corresponding objects within the score, thus facilitating analysis of performance information in the context of the score and of analyses of the score. To evaluate the correspondences between performance notes and notes within the score, the performance must be analysed using a score-performance matching algorithm. A new score-performance matching algorithm based on Dynamic Programming (DP) is presented in this document. In score-performance matching there are situations where dynamic programming alone is not sufficient to accurately identify correspondences. The algorithm presented here makes use of analyses of both the score and the performance to overcome the inherent shortcomings of the DP method and to improve the accuracy and robustness of DP matching in the presence of performance errors and expressive timing. Together with the musical score and performance markup, the correspondences identified by the matching algorithm provide the minimum information required to investigate musical performance, and form the foundation of a PML representation. The Microtonalism project investigated the issues surrounding the performance of microtonal music on conventional (i.e. non-microtonal-specific) instruments, namely voice. This included the automatic analysis of vocal performances to extract information regarding pitch accuracy, which was possible using tools developed with the performance representation and the matching algorithm.
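    A minimal sketch of the generic dynamic-programming core of score-performance matching: aligning a sequence of score pitches to performed pitches while allowing wrong, extra, and missing notes. The thesis layers score and performance analyses on top of this core to resolve its ambiguities; the code below is only the textbook baseline.

    # Edit-distance-style DP matching of score pitches to performed pitches.
    def match(score, performance, sub=1, ins=1, dele=1):
        n, m = len(score), len(performance)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i * dele
        for j in range(1, m + 1):
            d[0][j] = j * ins
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if score[i - 1] == performance[j - 1] else sub
                d[i][j] = min(d[i - 1][j - 1] + cost,  # match / wrong note
                              d[i - 1][j] + dele,      # missed score note
                              d[i][j - 1] + ins)       # extra played note
        return d[n][m]

    # MIDI pitches: the performer adds one note (62) and misplays 67 as 68.
    print(match([60, 64, 67, 72], [60, 62, 64, 68, 72]))  # -> 2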

    Group-wise automatic music transcription

    Background: Music transcription is the conversion of musical audio into notation such that a musician can recreate the piece. Automatic music transcription (AMT) is the automation of this process. Current AMT algorithms produce a less musically meaningful transcription than human transcribers, although AMT performs better at predicting the notes present in a short time frame. Group-wise automatic music transcription (GWAMT) uses several renditions of a piece to produce a single transcription. Aims: The main aim was to perform investigations into GWAMT. Secondary aims included comparing methods for GWAMT on the frame level and considering the impact of GWAMT on the broader field of AMT. Methods/Procedures: GWAMT is split into three stages: transcription, alignment, and combination. Transcription is performed by splitting the piece into frames and using a classifier to identify the notes present; Convolutional Neural Networks (CNNs) are used with a novel training methodology and architecture. Different renditions of the same piece have corresponding notes occurring at different times, so methods for the alignment of multiple renditions are used. Several methods were compared: pairwise alignment, progressive alignment, and a new method, iterative alignment. The effect of when the aligned features are combined (early/late), and how (majority vote, linear opinion pool, logarithmic opinion pool, max, median), is investigated. Results: The developed method for frame-level transcription achieves state-of-the-art accuracy on the MAPS database with an F1-score of 76.67%. Experiments on GWAMT show that the F1-score can be improved by between 0.005 and 0.01 using the majority vote and logarithmic opinion pool combination methods. Conclusions/Implications: These experiments show that group-wise frame-level transcription can improve the transcription when there are different tempos, noise levels, dynamic ranges, and reverbs between the clips. They also demonstrate a future application of GWAMT to individual pieces with repeated segments.
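    A minimal sketch of the combination stage, assuming renditions have already been transcribed and aligned to a common frame grid: per-frame note probabilities from several renditions are fused by majority vote and by a logarithmic opinion pool. The array shapes, threshold, and data are assumptions for illustration.

    # Fusing aligned frame-level note probabilities from multiple renditions.
    import numpy as np

    def majority_vote(probs, threshold=0.5):
        """probs: (renditions, notes) probabilities -> binary note decisions."""
        votes = (probs >= threshold).astype(int)
        return (votes.sum(axis=0) * 2 > probs.shape[0]).astype(int)

    def log_opinion_pool(probs, threshold=0.5, eps=1e-9):
        """Geometric mean of the probabilities, then threshold."""
        pooled = np.exp(np.mean(np.log(np.clip(probs, eps, 1.0)), axis=0))
        return (pooled >= threshold).astype(int)

    # Three renditions, one frame, four candidate pitches.
    p = np.array([[0.9, 0.4, 0.6, 0.1],
                  [0.8, 0.6, 0.2, 0.2],
                  [0.7, 0.3, 0.7, 0.1]])
    print(majority_vote(p))      # [1 0 1 0]
    print(log_opinion_pool(p))   # [1 0 0 0] -- the pool is more conservative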