Statistical Piano Reduction Controlling Performance Difficulty
We present a statistical-modelling method for piano reduction, i.e.,
converting an ensemble score into a piano score, that can control performance
difficulty. While previous studies have focused on describing the conditions
under which a piano score is playable, playability depends on the player's
skill and can change continuously with the tempo. We thus computationally
quantify performance difficulty as well
as musical fidelity to the original score, and formulate the problem as
optimization of musical fidelity under constraints on difficulty values. First,
performance difficulty measures are developed by means of probabilistic
generative models for piano scores and the relation to the rate of performance
errors is studied. Second, to describe musical fidelity, we construct a
probabilistic model integrating a prior piano-score model and a model
representing how ensemble scores are likely to be edited. An iterative
optimization algorithm for piano reduction is developed based on statistical
inference of the model. We confirm the effect of the iterative procedure; we
find that subjective difficulty and musical fidelity monotonically increase
with controlled difficulty values; and we show that incorporating sequential
dependence of pitches and fingering motion in the piano-score model improves
the quality of reduction scores in high-difficulty cases.
Comment: 12 pages, 7 figures; version accepted to APSIPA Transactions on
Signal and Information Processing.
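To make the constrained-optimization idea concrete, here is a minimal sketch, not the authors' implementation: notes are greedily dropped from the ensemble score, always removing the note whose deletion costs the least fidelity, until the estimated difficulty falls below a target. The `difficulty` and `fidelity` functions are crude placeholders standing in for the paper's probabilistic generative models.

```python
# Hypothetical sketch of difficulty-constrained piano reduction.
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    onset: float   # seconds
    pitch: int     # MIDI note number

def difficulty(notes: list[Note]) -> float:
    # Placeholder: note density as a crude proxy for performance difficulty.
    if not notes:
        return 0.0
    span = max(n.onset for n in notes) - min(n.onset for n in notes) + 1e-9
    return len(notes) / span

def fidelity(notes: list[Note], original: list[Note]) -> float:
    # Placeholder: fraction of the original notes that are retained.
    return len(notes) / len(original)

def reduce_score(original: list[Note], max_difficulty: float) -> list[Note]:
    current = list(original)
    while current and difficulty(current) > max_difficulty:
        # Remove the note whose deletion keeps fidelity highest.
        best = max(range(len(current)),
                   key=lambda i: fidelity(current[:i] + current[i+1:], original))
        current.pop(best)
    return current
```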
Automatic Chord Estimation Based on a Frame-wise Convolutional Recurrent Neural Network with Non-Aligned Annotations
This paper describes a weakly-supervised approach to the Automatic Chord Estimation (ACE) task, which aims to estimate a sequence of chords from a given music audio signal at the frame level, under the realistic condition that only non-aligned chord annotations are available. In conventional studies assuming the availability of time-aligned chord annotations, Deep Neural Networks (DNNs) that learn frame-wise mappings from acoustic features to chords have attained excellent performance. The major drawback of such frame-wise models is that they cannot be trained without the time-alignment information. Inspired by a common approach in automatic speech recognition based on non-aligned speech transcriptions, we propose a two-step method that trains a Hidden Markov Model (HMM) for the forced alignment between chord annotations and music signals, and then trains a powerful frame-wise DNN model for ACE. Experimental results show that although the frame-level accuracy of the forced alignment was just under 90%, the performance of the proposed method degraded only slightly from that of the DNN model trained using the ground-truth alignment data. Furthermore, with a sufficient amount of easily collected non-aligned data, the proposed method can reach or even outperform the conventional methods based on ground-truth time-aligned annotations.
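The forced-alignment step can be sketched as a left-to-right Viterbi pass that assigns each frame to one chord in the annotated sequence. This is a simplified stand-in for the paper's HMM, assuming per-frame chord posteriors from some acoustic model and a single self-transition probability `stay_prob` (both illustrative, not from the paper):

```python
import numpy as np

def forced_align(posteriors: np.ndarray, chord_seq: list[int],
                 stay_prob: float = 0.9) -> list[int]:
    """posteriors: (n_frames, n_chords) probabilities; chord_seq: chord ids."""
    n_frames, n_states = posteriors.shape[0], len(chord_seq)
    log_obs = np.log(posteriors[:, chord_seq] + 1e-12)   # (n_frames, n_states)
    log_stay, log_move = np.log(stay_prob), np.log(1.0 - stay_prob)

    delta = np.full((n_frames, n_states), -np.inf)       # best log-prob per state
    back = np.zeros((n_frames, n_states), dtype=int)     # 1 if we advanced a chord
    delta[0, 0] = log_obs[0, 0]                          # must start at first chord
    for t in range(1, n_frames):
        stay = delta[t - 1] + log_stay
        move = np.concatenate(([-np.inf], delta[t - 1, :-1] + log_move))
        back[t] = (move > stay).astype(int)
        delta[t] = np.maximum(stay, move) + log_obs[t]

    # Backtrack from the last chord so every annotated chord is consumed.
    states = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        states.append(states[-1] - back[t, states[-1]])
    states.reverse()
    return [chord_seq[s] for s in states]
```

The returned frame-level labels can then serve as targets for training the frame-wise DNN in the second step.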
Tracking the Evolution of a Band's Live Performances over Decades
International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022.
Evolutionary studies have become a dominant thread in the analysis of large audio collections. Such corpora usually consist of musical pieces by various composers or bands, and the studies typically focus on identifying general historical trends in harmonic content or music production techniques. In this paper we present a comparable study that examines the music of a single band whose publicly available live recordings span three decades. We first discuss the opportunities and challenges faced when working with single-artist and live-music datasets and introduce solutions for audio feature validation and outlier detection. We then investigate how individual songs vary over time and identify general performance trends using a new approach based on relative feature values, which improves accuracy for features with a large variance. Finally, we validate our findings by juxtaposing them with descriptions posted in online forums by experienced listeners of the band's large following.
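A minimal sketch of the relative-feature idea (column names and data are illustrative, not from the paper): each performance's feature value is expressed relative to that song's own mean and standard deviation, so songs with very different absolute levels and variances can be pooled when estimating a band-wide trend over the years.

```python
import pandas as pd

def yearly_trend(df: pd.DataFrame, feature: str) -> pd.Series:
    """df columns assumed: 'song', 'year', and the feature of interest."""
    per_song = df.groupby("song")[feature]
    rel = (df[feature] - per_song.transform("mean")) / per_song.transform("std")
    return rel.groupby(df["year"]).mean()   # mean relative value per year

# Toy example: a rising series suggests the feature (here, tempo) drifted
# upward across the band's career, independent of which songs were played.
perf = pd.DataFrame({
    "song":  ["A", "A", "B", "B", "A", "B"],
    "year":  [1990, 2000, 1990, 2000, 2010, 2010],
    "tempo": [120, 124, 90, 93, 128, 96],
})
print(yearly_trend(perf, "tempo"))
```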
End-to-End Lyrics Transcription Informed by Pitch and Onset Estimation
International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India, December 4-8, 2022.
This paper presents an automatic lyrics transcription (ALT) method for music recordings that leverages framewise semitone-level sung pitches estimated in a multi-task learning framework. Compared with automatic speech recognition (ASR), ALT is challenging due to the insufficiency of training data and the variation and contamination of acoustic features caused by singing expressions and accompaniment sounds. A domain adaptation approach has thus recently been taken, updating an ASR model pre-trained on sufficient speech data. In a naive application of the end-to-end approach to ALT, however, the internal audio-to-lyrics alignment often fails due to the time-stretching nature of singing. To stabilize the alignment, we make use of the semi-synchronous relationship between notes and characters. Specifically, a convolutional recurrent neural network (CRNN) is used to estimate the semitone-level pitches with note onset times while eliminating the intra- and inter-note pitch variations. This estimate helps an end-to-end ALT model based on connectionist temporal classification (CTC) learn the correct audio-to-character alignment and mapping, where the ALT model is trained jointly with the pitch and onset estimation model. The experimental results show the usefulness of the pitch and onset information in ALT.
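A minimal sketch of such joint training, with a hypothetical architecture rather than the paper's exact model: a shared CRNN encoder feeds a CTC head for characters plus framewise heads for semitone-level pitch and note onsets, and the three losses are summed so the alignment learned through CTC is informed by the pitch/onset supervision. All layer sizes and the equal loss weights are placeholder assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskALT(nn.Module):
    def __init__(self, n_mels=80, n_chars=40, n_pitches=129, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=1), nn.ReLU())
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.char_head = nn.Linear(2 * hidden, n_chars)      # CTC (incl. blank)
        self.pitch_head = nn.Linear(2 * hidden, n_pitches)   # semitone classes
        self.onset_head = nn.Linear(2 * hidden, 1)           # onset logit

    def forward(self, mel):                    # mel: (batch, n_mels, frames)
        h, _ = self.rnn(self.conv(mel).transpose(1, 2))
        return self.char_head(h), self.pitch_head(h), self.onset_head(h)

def joint_loss(model, mel, text, text_lens, frame_lens, pitch_tgt, onset_tgt):
    # text: (batch, max_text_len) char targets; pitch_tgt: (batch, frames);
    # onset_tgt: (batch, frames) in {0, 1}; lens are per-example lengths.
    chars, pitch, onset = model(mel)
    ctc = nn.functional.ctc_loss(
        chars.log_softmax(-1).transpose(0, 1), text, frame_lens, text_lens)
    pitch_ce = nn.functional.cross_entropy(pitch.transpose(1, 2), pitch_tgt)
    onset_bce = nn.functional.binary_cross_entropy_with_logits(
        onset.squeeze(-1), onset_tgt)
    return ctc + pitch_ce + onset_bce          # equal weights as a placeholder
```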
Rhythm Transcription of Polyphonic MIDI Performances Based on a Merged-output HMM for Multiple Voices
(Abstract to follow)