Transcription of piano music with deep learning
Music transcription is the complex process of converting an audio recording into symbolic notation. The goal of this thesis was to examine the transcription of piano music with deep learning, for which three deep neural network models were implemented: a multilayer perceptron, a convolutional neural network and a deep belief network. Through the deep belief network, unsupervised pretraining for automatic extraction of musical features from audio signals was also tested. Training of these models and evaluation of the transcriptions were performed on the MAPS database for piano music transcription. A comparison between the Fast Fourier Transform and the Constant Q Transform for data pre-processing was also carried out. The final results show that deep learning with an appropriate learning schedule is potentially a powerful tool for automatic music transcription.
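As an illustration of the two front ends this thesis compares, here is a minimal sketch (assuming librosa; the file name and frame parameters are placeholders, not the thesis's settings) of FFT-based and Constant-Q pre-processing:

```python
import numpy as np
import librosa

# Placeholder input: a mono piano recording (file name is illustrative).
y, sr = librosa.load("piano.wav", sr=44100)

# FFT front end: short-time Fourier transform with linearly spaced bins.
stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

# Constant-Q front end: log-spaced bins aligned with musical pitch,
# here one bin per semitone over the 88 piano keys.
cqt = np.abs(librosa.cqt(y, sr=sr, hop_length=512,
                         n_bins=88, bins_per_octave=12))

# Either magnitude spectrogram can then be fed, frame by frame,
# to an MLP, CNN, or DBN for note classification.
```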
Audio-based music classification with a pretrained convolutional network
Recently, the ‘Million Song Dataset’, containing audio features and metadata for one million songs, was made available. In this paper, we build a convolutional network that is then trained to perform artist recognition, genre recognition and key detection. The network is tailored to summarize the audio features over musically significant timescales. It is infeasible to train the network on all available data in a supervised fashion, so we use unsupervised pretraining to harness the entire dataset: we train a convolutional deep belief network on all the data, and then use the learnt parameters to initialize a convolutional multilayer perceptron with the same architecture. The MLP is then trained on a labeled subset of the data for each task. We also train the same MLP with randomly initialized weights. We find that our convolutional approach improves accuracy for the genre recognition and artist recognition tasks. Unsupervised pretraining improves convergence speed in all cases; for artist recognition, it improves accuracy as well.
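The pretrain-then-initialize scheme described above can be sketched as follows. This is a hedged PyTorch illustration: the layer shapes, class names and 1-D convolution are assumptions, not the paper's architecture, and the unsupervised stage (a convolutional deep belief network in the paper) is elided:

```python
import torch
import torch.nn as nn

# Input is assumed to be (batch, 128 feature dims, time frames)
# of Million Song Dataset audio features.
class ConvNet(nn.Module):
    def __init__(self, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(128, 256, kernel_size=4),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(256, n_classes)

    def forward(self, x):
        h = self.features(x).mean(dim=-1)  # summarize over time
        return self.classifier(h)

pretrained = ConvNet(n_classes=10)
# ... unsupervised pretraining of `pretrained.features` would go here ...

# Initialize a supervised network with the pretrained feature weights,
# then fine-tune it on the labeled subset for each task.
model = ConvNet(n_classes=10)
model.features.load_state_dict(pretrained.features.state_dict())

# The comparison baseline is the same architecture from random weights:
baseline = ConvNet(n_classes=10)
```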
Multiscale approaches to music audio feature learning
Content-based music information retrieval tasks are typically solved with a two-stage approach: features are extracted from music audio signals, and are then used as input to a regressor or classifier. These features can be engineered or learned from data. Although the former approach was dominant in the past, feature learning has started to receive more attention from the MIR community in recent years. Recent results in feature learning indicate that simple algorithms such as K-means can be very effective, sometimes surpassing more complicated approaches based on restricted Boltzmann machines, autoencoders or sparse coding. Furthermore, there has recently been increased interest in multiscale representations of music audio. Such representations are more versatile because music audio exhibits structure on multiple timescales, which are relevant to different MIR tasks to varying degrees. We develop and compare three approaches to multiscale audio feature learning using the spherical K-means algorithm, and evaluate them on an automatic tagging task and a similarity metric learning task on the Magnatagatune dataset.
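A minimal NumPy sketch of the spherical K-means step mentioned above (initialization scheme and iteration count are illustrative): centroids are kept on the unit sphere, and assignment uses dot products rather than Euclidean distance.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Spherical K-means: centroids are unit vectors and assignment
    uses cosine similarity (dot products). X has shape (n, d)."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = X[rng.choice(len(X), size=k, replace=False)]  # init from data
    for _ in range(n_iter):
        assign = np.argmax(X @ D.T, axis=1)   # nearest centroid per row
        for j in range(k):
            members = X[assign == j]
            if len(members):
                c = members.sum(axis=0)
                D[j] = c / np.linalg.norm(c)  # project back to the sphere
    return D

# Features at one timescale: project frames onto the learned dictionary.
# dictionary = spherical_kmeans(frames, k=512)
# features = frames @ dictionary.T
```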
A Deep Representation for Invariance And Music Classification
Representations in the auditory cortex might be based on mechanisms similar to those of the visual ventral stream: modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical architectures, we propose a novel, mid-level representation for acoustical signals, using the empirical distributions of projections on a set of templates and their transformations. Under the assumption that, by construction, this dictionary of templates is composed of similar classes, and samples the orbit of variance-inducing signal transformations (such as shift and scale), the resulting signature is theoretically guaranteed to be unique, invariant to transformations and stable to deformations. Modules of projection and pooling can then constitute layers of deep networks, for learning composite representations. We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification.
Comment: 5 pages, CBMM Memo No. 002, to appear in the IEEE 2014 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2014).
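To make the projection-and-pooling idea concrete, here is a hedged NumPy sketch using shift as the pooled transformation, as in the abstract. The function name, template set, bin count and normalization are assumptions; the histogram over shifted projections stands in for the empirical distribution of the signature.

```python
import numpy as np

def invariant_signature(x, templates, shifts, n_bins=16):
    """Hypothetical pooling step: project x onto shifted copies of each
    template and histogram the projections. Assumes x and the templates
    are unit-normalized, so projections lie in [-1, 1]."""
    sig = []
    for t in templates:
        projs = [np.dot(x, np.roll(t, s)) for s in shifts]
        hist, _ = np.histogram(projs, bins=n_bins, range=(-1.0, 1.0))
        sig.append(hist / len(projs))  # empirical distribution estimate
    return np.concatenate(sig)  # one pooled histogram per template
```

Because the histogram only records the distribution of projections over the orbit of shifts, shifting x leaves the signature unchanged, which is the invariance property the abstract refers to.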
Deep Learning and Music Adversaries
An ‘adversary’ is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, exploiting the parameters of the system to find the minimal perturbation of the input image such that the network misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the input to the systems consists of magnitude spectral frames, which requires special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different deep architectures, we find that this adversary is very effective in defeating the resulting systems. We find, however, that the convolutional networks are more robust than systems based on a majority vote over individually classified audio frames. Furthermore, we integrate the adversary into the training of new deep systems, but do not find that this improves their resilience against the same adversary.
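As a concrete, hedged illustration of perturbing magnitude spectral frames, the sketch below uses a single FGSM-style gradient step. This is a stand-in, not the paper's method: the paper's adversary searches for a minimal perturbation and must additionally map the result back to a valid time-domain audio signal.

```python
import torch
import torch.nn.functional as F

def perturb(model, x, y, eps=0.01):
    """One FGSM-style step on a magnitude-spectrum input x of shape
    (batch, bins), pushing the model away from the true labels y."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(min=0).detach()  # magnitudes stay non-negative
```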