399 research outputs found
Sheet Music Unbound: A fluid approach to sheet music display and annotation on a multi-touch screen
In this thesis we present the design and prototype implementation of a Digital Music Stand that focuses on fluid music layout management and free-form digital ink annotation. An analysis of user constraints and available technology lead us to select a 21.5” multi-touch monitor as the preferred input and display device. This comfortably displays two A4 pages of music side by side with space for a control panel. The analysis also identified single handed input as a viable choice for musicians. Finger input was chosen to avoid the need for any additional input equipment. To support layout reflow and zooming we develop a vector based music representation, based around the bar structure. This representation supports animation of transitions, in such a way as to give responsive dynamic interaction with multi-touch gesture input. In developing the prototype, particular attention was paid to the problem of drawing small, intricate annotation accurately located on the music using a fingertip. The zoomable nature of the music structure was leveraged to accomplish this, and an evaluation carried out to establish the best level of magnification. The thesis demonstrates, in the context of music, that annotation and layout management (typically treated as two distinct tasks) can be integrated into a single task yielding fluid and natural interaction
From heuristics-based to data-driven audio melody extraction
The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications
Music Synchronization, Audio Matching, Pattern Detection, and User Interfaces for a Digital Music Library System
Over the last two decades, growing efforts to digitize our cultural heritage could be observed. Most of these digitization initiatives pursuit either one or both of the following goals: to conserve the documents - especially those threatened by decay - and to provide remote access on a grand scale. For music documents these trends are observable as well, and by now several digital music libraries are in existence. An important characteristic of these music libraries is an inherent multimodality resulting from the large variety of available digital music representations, such as scanned score, symbolic score, audio recordings, and videos. In addition, for each piece of music there exists not only one document of each type, but many. Considering and exploiting this multimodality and multiplicity, the DFG-funded digital library initiative PROBADO MUSIC aimed at developing a novel user-friendly interface for content-based retrieval, document access, navigation, and browsing in large music collections. The implementation of such a front end requires the multimodal linking and indexing of the music documents during preprocessing. As the considered music collections can be very large, the automated or at least semi-automated calculation of these structures would be recommendable. The field of music information retrieval (MIR) is particularly concerned with the development of suitable procedures, and it was the goal of PROBADO MUSIC to include existing and newly developed MIR techniques to realize the envisioned digital music library system. In this context, the present thesis discusses the following three MIR tasks: music synchronization, audio matching, and pattern detection. We are going to identify particular issues in these fields and provide algorithmic solutions as well as prototypical implementations. In Music synchronization, for each position in one representation of a piece of music the corresponding position in another representation is calculated. This thesis focuses on the task of aligning scanned score pages of orchestral music with audio recordings. Here, a previously unconsidered piece of information is the textual specification of transposing instruments provided in the score. Our evaluations show that the neglect of such information can result in a measurable loss of synchronization accuracy. Therefore, we propose an OCR-based approach for detecting and interpreting the transposition information in orchestral scores. For a given audio snippet, audio matching methods automatically calculate all musically similar excerpts within a collection of audio recordings. In this context, subsequence dynamic time warping (SSDTW) is a well-established approach as it allows for local and global tempo variations between the query and the retrieved matches. Moving to real-life digital music libraries with larger audio collections, however, the quadratic runtime of SSDTW results in untenable response times. To improve on the response time, this thesis introduces a novel index-based approach to SSDTW-based audio matching. We combine the idea of inverted file lists introduced by Kurth and Müller (Efficient index-based audio matching, 2008) with the shingling techniques often used in the audio identification scenario. In pattern detection, all repeating patterns within one piece of music are determined. Usually, pattern detection operates on symbolic score documents and is often used in the context of computer-aided motivic analysis. Envisioned as a new feature of the PROBADO MUSIC system, this thesis proposes a string-based approach to pattern detection and a novel interactive front end for result visualization and analysis
Performance Following: Real-Time Prediction of Musical Sequences Without a Score
(c)2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works
Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music
The extraction of pitch information is arguably one of the most important
tasks in automatic music description systems. However, previous
research and evaluation datasets dealing with pitch estimation focused
on relatively limited kinds of musical data. This work aims to broaden
this scope by addressing symphonic western classical music recordings,
focusing on pitch estimation for melody extraction. This material is characterised
by a high number of overlapping sources, and by the fact that the
melody may be played by different instrumental sections, often alternating
within an excerpt. We evaluate the performance of eleven state-of-the-art
pitch salience functions, multipitch estimation and melody extraction algorithms
when determining the sequence of pitches corresponding to the
main melody in a varied set of pieces. An important contribution of the
present study is the proposed evaluation framework, including the annotation
methodology, generated dataset and evaluation metrics. The results
show that the assumptions made by certain methods hold better than
others when dealing with this type of music signals, leading to a better
performance. Additionally, we propose a simple method for combining
the output of several algorithms, with promising results
Wagner Ring Dataset: A Complex Opera Scenario for Music Processing and Computational Musicology
This paper introduces the Wagner Ring Dataset (WRD), a multi-modal and multi-version resource on the large-scale opera cycle Der Ring des Nibelungen by Richard Wagner. The Ring comprises four music dramas organized into eleven acts and 21 939 measures in total. Concerning sheet music, we processed a publicly available piano reduction (822 pages) of the full score with optical music recognition followed by extensive manual corrections to create a high-quality, machine-readable symbolic score. Concerning audio data, our corpus covers 16 recorded performances of the full Ring (three of which are publicly available thanks to copyright expiry), each lasting about 14–15 hours. To musically synchronize these versions among each other, we manually annotated all measure positions for three performances, which we transferred to the remaining performances via automated synchronization techniques. The dataset further comprises annotations of key and time signatures, scenes, and singing voice regions (libretto). Moreover, we provide note event annotations for all performances derived from the piano score. The WRD thus constitutes a comprehensive resource for developing algorithms for various music information retrieval tasks, complementing existing datasets with a complex opera scenario. For computational musicology, the WRD serves as a structured dataset that allows for studying the composition and performances of the Ring
Alignment for Honesty
Recent research has made significant strides in applying alignment techniques
to enhance the helpfulness and harmlessness of large language models (LLMs) in
accordance with human intentions. In this paper, we argue for the importance of
alignment for honesty, ensuring that LLMs proactively refuse to answer
questions when they lack knowledge, while still not being overly conservative.
However, a pivotal aspect of alignment for honesty involves discerning the
limits of an LLM's knowledge, which is far from straightforward. This challenge
demands comprehensive solutions in terms of metric development, benchmark
creation, and training methodologies. In this paper, we address these
challenges by first establishing a precise problem definition and defining
``honesty'' inspired by the Analects of Confucius. This serves as a cornerstone
for developing metrics that effectively measure an LLM's honesty by quantifying
its progress post-alignment. Furthermore, we introduce a flexible training
framework which is further instantiated by several efficient fine-tuning
techniques that emphasize honesty without sacrificing performance on other
tasks. Our extensive experiments reveal that these aligned models show a marked
increase in honesty, as indicated by our proposed metrics. We open-source a
wealth of resources to facilitate future research at
https://github.com/GAIR-NLP/alignment-for-honesty, including honesty-aligned
models, training and evaluation datasets for honesty alignment, concept
glossary, as well as all relevant source code
- …