399 research outputs found

    Sheet Music Unbound: A fluid approach to sheet music display and annotation on a multi-touch screen

    Get PDF
    In this thesis we present the design and prototype implementation of a Digital Music Stand that focuses on fluid music layout management and free-form digital ink annotation. An analysis of user constraints and available technology lead us to select a 21.5” multi-touch monitor as the preferred input and display device. This comfortably displays two A4 pages of music side by side with space for a control panel. The analysis also identified single handed input as a viable choice for musicians. Finger input was chosen to avoid the need for any additional input equipment. To support layout reflow and zooming we develop a vector based music representation, based around the bar structure. This representation supports animation of transitions, in such a way as to give responsive dynamic interaction with multi-touch gesture input. In developing the prototype, particular attention was paid to the problem of drawing small, intricate annotation accurately located on the music using a fingertip. The zoomable nature of the music structure was leveraged to accomplish this, and an evaluation carried out to establish the best level of magnification. The thesis demonstrates, in the context of music, that annotation and layout management (typically treated as two distinct tasks) can be integrated into a single task yielding fluid and natural interaction

    From heuristics-based to data-driven audio melody extraction

    Get PDF
    The identification of the melody from a music recording is a relatively easy task for humans, but very challenging for computational systems. This task is known as "audio melody extraction", more formally defined as the automatic estimation of the pitch sequence of the melody directly from the audio signal of a polyphonic music recording. This thesis investigates the benefits of exploiting knowledge automatically derived from data for audio melody extraction, by combining digital signal processing and machine learning methods. We extend the scope of melody extraction research by working with a varied dataset and multiple definitions of melody. We first present an overview of the state of the art, and perform an evaluation focused on a novel symphonic music dataset. We then propose melody extraction methods based on a source-filter model and pitch contour characterisation and evaluate them on a wide range of music genres. Finally, we explore novel timbre, tonal and spatial features for contour characterisation, and propose a method for estimating multiple melodic lines. The combination of supervised and unsupervised approaches leads to advancements on melody extraction and shows a promising path for future research and applications

    Music Synchronization, Audio Matching, Pattern Detection, and User Interfaces for a Digital Music Library System

    Get PDF
    Over the last two decades, growing efforts to digitize our cultural heritage could be observed. Most of these digitization initiatives pursuit either one or both of the following goals: to conserve the documents - especially those threatened by decay - and to provide remote access on a grand scale. For music documents these trends are observable as well, and by now several digital music libraries are in existence. An important characteristic of these music libraries is an inherent multimodality resulting from the large variety of available digital music representations, such as scanned score, symbolic score, audio recordings, and videos. In addition, for each piece of music there exists not only one document of each type, but many. Considering and exploiting this multimodality and multiplicity, the DFG-funded digital library initiative PROBADO MUSIC aimed at developing a novel user-friendly interface for content-based retrieval, document access, navigation, and browsing in large music collections. The implementation of such a front end requires the multimodal linking and indexing of the music documents during preprocessing. As the considered music collections can be very large, the automated or at least semi-automated calculation of these structures would be recommendable. The field of music information retrieval (MIR) is particularly concerned with the development of suitable procedures, and it was the goal of PROBADO MUSIC to include existing and newly developed MIR techniques to realize the envisioned digital music library system. In this context, the present thesis discusses the following three MIR tasks: music synchronization, audio matching, and pattern detection. We are going to identify particular issues in these fields and provide algorithmic solutions as well as prototypical implementations. In Music synchronization, for each position in one representation of a piece of music the corresponding position in another representation is calculated. This thesis focuses on the task of aligning scanned score pages of orchestral music with audio recordings. Here, a previously unconsidered piece of information is the textual specification of transposing instruments provided in the score. Our evaluations show that the neglect of such information can result in a measurable loss of synchronization accuracy. Therefore, we propose an OCR-based approach for detecting and interpreting the transposition information in orchestral scores. For a given audio snippet, audio matching methods automatically calculate all musically similar excerpts within a collection of audio recordings. In this context, subsequence dynamic time warping (SSDTW) is a well-established approach as it allows for local and global tempo variations between the query and the retrieved matches. Moving to real-life digital music libraries with larger audio collections, however, the quadratic runtime of SSDTW results in untenable response times. To improve on the response time, this thesis introduces a novel index-based approach to SSDTW-based audio matching. We combine the idea of inverted file lists introduced by Kurth and Müller (Efficient index-based audio matching, 2008) with the shingling techniques often used in the audio identification scenario. In pattern detection, all repeating patterns within one piece of music are determined. Usually, pattern detection operates on symbolic score documents and is often used in the context of computer-aided motivic analysis. Envisioned as a new feature of the PROBADO MUSIC system, this thesis proposes a string-based approach to pattern detection and a novel interactive front end for result visualization and analysis

    Performance Following: Real-Time Prediction of Musical Sequences Without a Score

    Get PDF
    (c)2012 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other users, including reprinting/ republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works

    Evaluation and combination of pitch estimation methods for melody extraction in symphonic classical music

    Get PDF
    The extraction of pitch information is arguably one of the most important tasks in automatic music description systems. However, previous research and evaluation datasets dealing with pitch estimation focused on relatively limited kinds of musical data. This work aims to broaden this scope by addressing symphonic western classical music recordings, focusing on pitch estimation for melody extraction. This material is characterised by a high number of overlapping sources, and by the fact that the melody may be played by different instrumental sections, often alternating within an excerpt. We evaluate the performance of eleven state-of-the-art pitch salience functions, multipitch estimation and melody extraction algorithms when determining the sequence of pitches corresponding to the main melody in a varied set of pieces. An important contribution of the present study is the proposed evaluation framework, including the annotation methodology, generated dataset and evaluation metrics. The results show that the assumptions made by certain methods hold better than others when dealing with this type of music signals, leading to a better performance. Additionally, we propose a simple method for combining the output of several algorithms, with promising results

    Wagner Ring Dataset: A Complex Opera Scenario for Music Processing and Computational Musicology

    Get PDF
    This paper introduces the Wagner Ring Dataset (WRD), a multi-modal and multi-version resource on the large-scale opera cycle Der Ring des Nibelungen by Richard Wagner. The Ring comprises four music dramas organized into eleven acts and 21 939 measures in total. Concerning sheet music, we processed a publicly available piano reduction (822 pages) of the full score with optical music recognition followed by extensive manual corrections to create a high-quality, machine-readable symbolic score. Concerning audio data, our corpus covers 16 recorded performances of the full Ring (three of which are publicly available thanks to copyright expiry), each lasting about 14–15 hours. To musically synchronize these versions among each other, we manually annotated all measure positions for three performances, which we transferred to the remaining performances via automated synchronization techniques. The dataset further comprises annotations of key and time signatures, scenes, and singing voice regions (libretto). Moreover, we provide note event annotations for all performances derived from the piano score. The WRD thus constitutes a comprehensive resource for developing algorithms for various music information retrieval tasks, complementing existing datasets with a complex opera scenario. For computational musicology, the WRD serves as a structured dataset that allows for studying the composition and performances of the Ring

    Alignment for Honesty

    Full text link
    Recent research has made significant strides in applying alignment techniques to enhance the helpfulness and harmlessness of large language models (LLMs) in accordance with human intentions. In this paper, we argue for the importance of alignment for honesty, ensuring that LLMs proactively refuse to answer questions when they lack knowledge, while still not being overly conservative. However, a pivotal aspect of alignment for honesty involves discerning the limits of an LLM's knowledge, which is far from straightforward. This challenge demands comprehensive solutions in terms of metric development, benchmark creation, and training methodologies. In this paper, we address these challenges by first establishing a precise problem definition and defining ``honesty'' inspired by the Analects of Confucius. This serves as a cornerstone for developing metrics that effectively measure an LLM's honesty by quantifying its progress post-alignment. Furthermore, we introduce a flexible training framework which is further instantiated by several efficient fine-tuning techniques that emphasize honesty without sacrificing performance on other tasks. Our extensive experiments reveal that these aligned models show a marked increase in honesty, as indicated by our proposed metrics. We open-source a wealth of resources to facilitate future research at https://github.com/GAIR-NLP/alignment-for-honesty, including honesty-aligned models, training and evaluation datasets for honesty alignment, concept glossary, as well as all relevant source code
    corecore