84 research outputs found
Non-Negative Group Sparsity with Subspace Note Modelling for Polyphonic Transcription
This work was supported by EPSRC Platform Grant EP/K009559/1, and EPSRC Grants EP/L027119/1 and EP/J010375/1.
Deep Polyphonic ADSR Piano Note Transcription
We investigate a late-fusion approach to piano transcription, combined with a
strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM).
The network architecture under consideration is compact in terms of its number
of parameters and easy to train with gradient descent. The network outputs are
fused over time in the final stage to obtain note segmentations, with an HMM
whose transition probabilities are chosen based on a model of attack, decay,
sustain, release (ADSR) envelopes, commonly used for sound synthesis. The note
segments are then subject to a final binary decision rule that rejects note
segment hypotheses that are too weak. We obtain state-of-the-art results on the
MAPS dataset, and outperform other approaches by a large margin when predicting
complete note regions from onsets to offsets.
Comment: 5 pages, 2 figures, published as ICASSP'1
Investigating the Perceptual Validity of Evaluation Metrics for Automatic Piano Music Transcription
Automatic Music Transcription (AMT) is usually evaluated using low-level criteria, typically by counting the numbers of errors, with equal weighting. Yet, some errors (e.g. out-of-key notes) are more salient than others. In this study, we design an online listening test to gather judgements about AMT quality. These judgements take the form of pairwise comparisons of transcriptions of the same music by pairs of different AMT systems. We investigate how these judgements correlate with benchmark metrics, and find that although they match in many cases, agreement drops when comparing pairs with similar scores, or pairs of poor transcriptions. We show that onset-only notewise F-measure is the benchmark metric that correlates best with human judgement, all the more so with higher onset tolerance thresholds. We define a set of features related to various musical attributes, and use them to design a new metric that correlates significantly better with listeners' quality judgements. We examine which musical aspects were important to raters by conducting an ablation study on the defined metric, highlighting the importance of the rhythmic dimension (tempo, meter). We make the collected data fully available for further study, in particular to evaluate the perceptual relevance of new AMT metrics.
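Onset-only notewise F-measure, the metric the study finds correlates best with listeners, can be sketched as follows. Notes are (onset, pitch) pairs; the matching here is a simple greedy scheme for illustration, whereas benchmark toolkits such as mir_eval use a more careful matching, so treat this as a sketch rather than the reference implementation:

```python
def onset_f_measure(ref, est, tol=0.05):
    """Match each estimated note to an unused reference note of the same
    pitch whose onset lies within +/- tol seconds; report P, R, F."""
    used = set()
    tp = 0
    for e_on, e_pitch in est:
        for i, (r_on, r_pitch) in enumerate(ref):
            if i in used:
                continue
            if r_pitch == e_pitch and abs(r_on - e_on) <= tol:
                used.add(i)
                tp += 1
                break
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

ref = [(0.00, 60), (0.50, 64), (1.00, 67)]            # reference notes
est = [(0.02, 60), (0.60, 64), (1.01, 68)]            # 2nd late, 3rd wrong pitch
print(onset_f_measure(ref, est, tol=0.05))            # strict tolerance
print(onset_f_measure(ref, est, tol=0.15))            # looser tolerance
```

Raising the tolerance from 50 ms to 150 ms lets the slightly late second note count as correct, which is exactly the effect of the higher onset tolerance thresholds the abstract discusses.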
Learning and Evaluation Methodologies for Polyphonic Music Sequence Prediction with LSTMs
Music language models (MLMs) play an important role for various music signal and symbolic music processing tasks, such as music generation, symbolic music classification, or automatic music transcription (AMT). In this paper, we investigate Long Short-Term Memory (LSTM) networks for polyphonic music prediction, in the form of binary piano rolls. A preliminary experiment, assessing the influence of the timestep of piano rolls on system performance, highlights the need for more musical evaluation metrics. We introduce a range of metrics, focusing on temporal and harmonic aspects. We propose to combine them into a parametrisable loss to train our network. We then conduct a range of experiments with this new loss, both for polyphonic music prediction (intrinsic evaluation) and using our predictive model as a language model for AMT (extrinsic evaluation). Intrinsic evaluation shows that tuning the behaviour of a model is possible by adjusting loss parameters, with consistent results across timesteps. Extrinsic evaluation shows consistent behaviour across timesteps in terms of precision and recall with respect to the loss parameters, leading to an improvement in AMT performance without changing the complexity of the model. In particular, we show that intrinsic performance (in terms of cross entropy) is not related to extrinsic performance, highlighting the importance of using custom training losses for each specific application. Our model also compares favourably with previously proposed MLMs.
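The idea of a parametrisable loss over binary piano rolls can be sketched as frame-wise cross entropy plus a musically motivated term whose weight is tunable. The paper combines several metric-derived terms; the single temporal-smoothness term and the weights below are illustrative assumptions, not the authors' actual loss:

```python
import numpy as np

def piano_roll_loss(pred, target, alpha=1.0, beta=0.1, eps=1e-7):
    """pred, target: arrays of shape (time, pitch), pred in (0, 1).
    alpha weights frame-wise cross entropy; beta weights a hypothetical
    smoothness term discouraging spurious note on/off flips."""
    pred = np.clip(pred, eps, 1 - eps)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
    # Penalise rapid changes in predicted activity between adjacent frames.
    smooth = np.abs(np.diff(pred, axis=0)).mean()
    return alpha * bce + beta * smooth

rng = np.random.default_rng(0)
target = (rng.random((8, 4)) > 0.7).astype(float)     # toy binary piano roll
noisy = np.clip(target + 0.2 * rng.standard_normal(target.shape), 0.01, 0.99)
print(piano_roll_loss(noisy, target, beta=0.0))       # pure cross entropy
print(piano_roll_loss(noisy, target, beta=0.5))       # smoothness penalised
```

Adjusting `beta` trades raw frame accuracy against temporal stability, which mirrors the abstract's point that loss parameters let one tune the model's behaviour without changing its complexity.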
Monophonic Automatic Music Transcription With Convolutional Neural Networks
This thesis utilizes convolutional neural networks for monophonic automatic music transcription of piano music. We present three different systems utilizing CNNs to perform onset, pitch, and offset detection, producing a final output of sheet music. Our TCN system, based on Bai et al.'s TCN architecture, achieved the best results owing to its superior offset detection, and produced fairly accurate sheet music.
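The core building block of the TCN architecture referenced above is a causal, dilated 1-D convolution. This numpy sketch shows that operation on a single feature channel; the real model stacks many such layers with learned weights and residual connections, so this is only a minimal illustration:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation=1):
    """y[t] = sum_k w[k] * x[t - k * dilation], using only past samples,
    so the receptive field grows with the dilation factor."""
    k = len(w)
    y = np.zeros_like(x, dtype=float)
    for t in range(len(x)):
        for i in range(k):
            j = t - i * dilation
            if j >= 0:
                y[t] += w[i] * x[j]
    return y

x = np.arange(6, dtype=float)                  # toy input signal
w = np.array([0.5, 0.5])                       # 2-tap averaging kernel
print(causal_dilated_conv(x, w, dilation=1))   # [0.  0.5 1.5 2.5 3.5 4.5]
print(causal_dilated_conv(x, w, dilation=2))   # [0.  0.5 1.  2.  3.  4. ]
```

Stacking such layers with exponentially increasing dilations is what lets a TCN cover long temporal contexts, useful for detecting note offsets that depend on slowly decaying piano energy.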