3,032 research outputs found
A Hybrid Recurrent Neural Network For Music Transcription
We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance. We use recurrent neural networks (RNNs) and their variants as music language models (MLMs) and present a generative architecture for combining these models with predictions from a frame level acoustic classifier. We also compare different neural network architectures for acoustic modeling. The proposed model computes a distribution over possible output sequences given the acoustic input signal and we present an algorithm for performing a global search for good candidate transcriptions. The performance of the proposed model is evaluated on piano music from the MAPS dataset and we observe that the proposed model consistently outperforms existing transcription methods
Recommended from our members
An RNN-based Music Language Model for Improving Automatic Music Transcription
In this paper, we investigate the use of Music Language Models (MLMs) for improving Automatic Music Transcription performance. The MLMs are trained on sequences of symbolic polyphonic music from the Nottingham dataset. We train Recurrent Neural Network (RNN)-based models, as they are capable of capturing complex temporal structure present in symbolic music data. Similar to the function of language models in automatic speech recognition, we use the MLMs to generate a prior probability for the occurrence of a sequence. The acoustic AMT model is based on probabilistic latent component analysis, and prior information from the MLM is incorporated into the transcription framework using Dirichlet priors. We test our hybrid models on a dataset of multiple-instrument polyphonic music and report a significant 3% improvement in terms of F-measure, when compared to using an acoustic-only model
High-dimensional sequence transduction
We investigate the problem of transforming an input sequence into a
high-dimensional output sequence in order to transcribe polyphonic audio music
into symbolic notation. We introduce a probabilistic model based on a recurrent
neural network that is able to learn realistic output distributions given the
input and we devise an efficient algorithm to search for the global mode of
that distribution. The resulting method produces musically plausible
transcriptions even under high levels of noise and drastically outperforms
previous state-of-the-art approaches on five datasets of synthesized sounds and
real recordings, approximately halving the test error rate
Deep Complex Networks
At present, the vast majority of building blocks, techniques, and
architectures for deep learning are based on real-valued operations and
representations. However, recent work on recurrent neural networks and older
fundamental theoretical analysis suggests that complex numbers could have a
richer representational capacity and could also facilitate noise-robust memory
retrieval mechanisms. Despite their attractive properties and potential for
opening up entirely new neural architectures, complex-valued deep neural
networks have been marginalized due to the absence of the building blocks
required to design such models. In this work, we provide the key atomic
components for complex-valued deep neural networks and apply them to
convolutional feed-forward networks and convolutional LSTMs. More precisely, we
rely on complex convolutions and present algorithms for complex
batch-normalization, complex weight initialization strategies for
complex-valued neural nets and we use them in experiments with end-to-end
training schemes. We demonstrate that such complex-valued models are
competitive with their real-valued counterparts. We test deep complex models on
several computer vision tasks, on music transcription using the MusicNet
dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve
state-of-the-art performance on these audio-related tasks
- …