Sequence Classification Restricted Boltzmann Machines With Gated Units
For the classification of sequential data, dynamic Bayesian networks and recurrent neural networks (RNNs) are the preferred models: the former can explicitly model the temporal dependencies between variables, while the latter are capable of learning representations. The recurrent temporal restricted Boltzmann machine (RTRBM) is a model that combines these two features. However, learning and inference in RTRBMs can be difficult because of the exponential nature of its gradient computations when maximizing log likelihoods. In this article, first, we address this intractability by optimizing a conditional rather than a joint probability distribution when performing sequence classification. This results in the ``sequence classification restricted Boltzmann machine'' (SCRBM). Second, we introduce gated SCRBMs (gSCRBMs), which use an information processing gate, as an integration of SCRBMs with long short-term memory (LSTM) models. In the experiments reported in this article, we evaluate the proposed models on optical character recognition, chunking, and multiresident activity recognition in smart homes. The experimental results show that gSCRBMs achieve performance comparable to that of the state of the art in all three tasks. gSCRBMs require far fewer parameters than other recurrent networks with memory gates, in particular LSTMs and gated recurrent units (GRUs).
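The key idea of optimizing a conditional rather than a joint distribution can be illustrated with the discriminative RBM family this model builds on: each class label is scored by a free-energy term and the scores are normalised directly, so no intractable partition function over inputs is needed. The following is a minimal sketch under that assumption; all parameter names and shapes are hypothetical, not the paper's.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # numerically stable log(1 + e^z)

def drbm_class_posterior(x, W, U, b_class):
    """Class posterior p(y|x) of a discriminative RBM (sketch).

    x       : (n_visible,) input vector
    W       : (n_hidden, n_visible) visible-to-hidden weights
    U       : (n_hidden, n_classes) class-to-hidden weights
    b_class : (n_classes,) class biases

    Each class y is scored by b_class[y] + sum_j softplus(W_j . x + U[j, y]),
    and the scores are normalised with a softmax over the classes.
    """
    pre = W @ x                                            # (n_hidden,)
    scores = b_class + softplus(pre[:, None] + U).sum(axis=0)
    scores -= scores.max()                                 # softmax stabilisation
    p = np.exp(scores)
    return p / p.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=16)
W = rng.normal(scale=0.1, size=(8, 16))
U = rng.normal(scale=0.1, size=(8, 3))
b = np.zeros(3)
posterior = drbm_class_posterior(x, W, U, b)
```

Because the normalisation runs only over the (small) set of class labels, evaluating this posterior is cheap, which is exactly what makes the conditional objective tractable where the joint one is not.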
Emotion-Guided Music Accompaniment Generation Based on Variational Autoencoder
Music accompaniment generation is a crucial aspect of the composition process. Deep neural networks have made significant strides in this field, but it remains a challenge for AI to effectively incorporate human emotions to create beautiful accompaniments. Existing models struggle to characterize human emotions within neural network models while composing music. To address this issue, we propose the use of an easy-to-represent emotion flow model, the Valence/Arousal Curve, which makes emotional information compatible with the model through data transformation and enhances the interpretability of emotional factors by using a Variational Autoencoder as the model structure. Further, we use relative self-attention to maintain the structure of the music at the music-phrase level and to generate a richer accompaniment when combined with the rules of music theory.
Comment: Accepted by the International Joint Conference on Neural Networks 2023 (IJCNN 2023)
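A conditional VAE of this kind can be sketched as a standard encoder/reparameterise/decode pass in which the valence/arousal pair is concatenated onto the latent code before decoding. This is a toy forward pass with random placeholder weights, not the paper's trained architecture; the feature encoding and layer sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_cvae(x, emotion, We, Wmu, Wlv, Wd):
    """One forward pass of a toy emotion-conditioned VAE (sketch).

    x       : (d_in,) a bar of music as a feature vector (hypothetical encoding)
    emotion : (2,) valence/arousal pair conditioning the decoder
    """
    h = np.tanh(We @ x)                       # encoder hidden layer
    mu, logvar = Wmu @ h, Wlv @ h             # Gaussian posterior parameters
    eps = rng.normal(size=mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps       # reparameterisation trick
    z_cond = np.concatenate([z, emotion])     # condition on the emotion curve
    x_hat = 1 / (1 + np.exp(-(Wd @ z_cond)))  # decoder with sigmoid output
    return x_hat, mu, logvar

d_in, d_h, d_z = 32, 16, 4
We  = rng.normal(scale=0.1, size=(d_h, d_in))
Wmu = rng.normal(scale=0.1, size=(d_z, d_h))
Wlv = rng.normal(scale=0.1, size=(d_z, d_h))
Wd  = rng.normal(scale=0.1, size=(d_in, d_z + 2))

x = rng.random(d_in)
x_hat, mu, logvar = forward_cvae(x, np.array([0.8, -0.2]), We, Wmu, Wlv, Wd)
```

Feeding the same latent code with a different valence/arousal pair changes only the conditioning input, which is what makes the emotional factor interpretable and controllable at generation time.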
An RNN-based Music Language Model for Improving Automatic Music Transcription
In this paper, we investigate the use of Music Language Models (MLMs) for improving Automatic Music Transcription performance. The MLMs are trained on sequences of symbolic polyphonic music from the Nottingham dataset. We train Recurrent Neural Network (RNN)-based models, as they are capable of capturing the complex temporal structure present in symbolic music data. Similar to the function of language models in automatic speech recognition, we use the MLMs to generate a prior probability for the occurrence of a sequence. The acoustic AMT model is based on probabilistic latent component analysis, and prior information from the MLM is incorporated into the transcription framework using Dirichlet priors. We test our hybrid models on a dataset of multiple-instrument polyphonic music and report a significant 3% improvement in F-measure compared to using an acoustic-only model.
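The general pattern of blending frame-wise acoustic probabilities with a symbolic prior can be illustrated with a simple weighted combination. Note this is only a generic illustration: the paper folds the MLM prior into the PLCA framework via Dirichlet hyperparameters rather than the geometric mixture shown here, and the weight `alpha` is an assumed knob.

```python
import numpy as np

def combine_with_prior(acoustic, prior, alpha=0.3):
    """Blend frame-wise acoustic note probabilities with an MLM prior.

    acoustic : (n_notes,) probabilities from the acoustic model
    prior    : (n_notes,) prior probabilities from the music language model
    alpha    : prior weight (hypothetical; set to 0 to ignore the prior)
    """
    combined = acoustic ** (1 - alpha) * prior ** alpha   # geometric mixture
    return combined / combined.sum()

# The prior strongly favours note 1; the blend pulls mass towards it.
acoustic = np.array([0.70, 0.20, 0.10])
prior    = np.array([0.10, 0.80, 0.10])
post = combine_with_prior(acoustic, prior)
```

The effect mirrors the role of language models in speech recognition: notes that the acoustic model finds ambiguous are disambiguated by how plausible they are as a symbolic continuation.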
Imposing Higher-Level Structure in Polyphonic Music Generation Using Convolutional Restricted Boltzmann Machines and Constraints
We introduce a method for imposing higher-level structure on generated polyphonic music. A Convolutional Restricted Boltzmann Machine (C-RBM) as a generative model is combined with gradient descent constraint optimisation to provide further control over the generation process. Among other things, this allows for the use of a "template" piece, from which some structural properties can be extracted and transferred as constraints to the newly generated material. The sampling process is guided with Simulated Annealing to avoid local optima, and to find solutions that both satisfy the constraints and are relatively stable with respect to the C-RBM. Results show that with this approach it is possible to control the higher-level self-similarity structure, the meter, and the tonal properties of the resulting musical piece, while preserving its local musical coherence.
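The annealed constrained sampling described above follows the standard simulated-annealing template: propose a local change, always accept improvements, and accept worsening moves with a probability that shrinks as the temperature decays. A minimal generic loop is sketched below; in the paper the cost would combine C-RBM stability with constraint penalties derived from the template piece, whereas here the cost function and the toy density constraint are placeholders.

```python
import math
import random

def simulated_annealing(cost, state, steps=2000, t0=1.0, seed=0):
    """Generic simulated-annealing loop over a binary state vector (sketch).

    Flips one cell at a time; worse moves are accepted with probability
    exp(-delta / T), where T decays geometrically over the run.
    """
    rng = random.Random(seed)
    cur, c_cur = list(state), cost(state)
    best, c_best = list(cur), c_cur
    for step in range(steps):
        temp = t0 * (0.995 ** step)
        i = rng.randrange(len(cur))
        cand = list(cur)
        cand[i] ^= 1                      # flip one piano-roll cell
        delta = cost(cand) - c_cur
        if delta <= 0 or rng.random() < math.exp(-delta / max(temp, 1e-12)):
            cur, c_cur = cand, c_cur + delta
        if c_cur < c_best:                # remember the best state seen
            best, c_best = list(cur), c_cur
    return best, c_best

# Toy constraint: match the note density of a hypothetical "template" piece.
TEMPLATE_DENSITY = 6
cost = lambda s: abs(sum(s) - TEMPLATE_DENSITY)
solution, final_cost = simulated_annealing(cost, [0] * 16)
```

Early high-temperature steps let the sampler escape local optima; the late low-temperature steps lock in a state that satisfies the constraint.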
Neural Probabilistic Models for Melody Prediction, Sequence Labelling and Classification
Data-driven sequence models have long played a role in the analysis and generation of musical information. Such models are of interest in computational musicology, computer-aided music composition, and tools for music education, among other applications. This dissertation begins with an experiment to model sequences of musical pitch in melodies with a class of purely data-driven predictive models collectively known as Connectionist models. It was demonstrated that a set of six such models could perform on par with, or better than, state-of-the-art n-gram models previously evaluated in an identical setting. A new model, known as the Recurrent Temporal Discriminative Restricted Boltzmann Machine (RTDRBM), was introduced in the process and found to outperform the rest of the models. A generalisation of this modelling task was also explored, which involved extending the set of musical features used as input by the models while still predicting pitch as before. The improvement in predictive performance which resulted from adding these new input features is encouraging for future work in this direction.
Based on the above success of the RTDRBM, its application was extended to a non-musical sequence labelling task, namely Optical Character Recognition. This extension involved a modification to the model's original prediction algorithm as a result of relaxing an assumption specific to the melody modelling task. The generalised model was evaluated on a benchmark dataset and compared against a set of 8 baseline models, where it fared better than all of them. Furthermore, a theoretical extension to an existing model which was also employed in the above pitch prediction task - the Discriminative Restricted Boltzmann Machine (DRBM) - was proposed. This led to three new variants of the DRBM (which originally contained Logistic Sigmoid hidden layer activations), with Hyperbolic Tangent, Binomial and Rectified Linear hidden layer activations respectively. The first two of these have been evaluated here on the benchmark MNIST dataset and shown to perform on par with the original DRBM.
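The activation functions behind three of the variants mentioned above differ only in the nonlinearity applied to the hidden pre-activations; in the actual DRBM variants the change propagates into the model's free-energy terms, which is not shown here. The Binomial variant is omitted from this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# The three nonlinearities underlying the DRBM variants discussed above.
activations = {
    "sigmoid": sigmoid,                        # original DRBM hidden units
    "tanh":    np.tanh,                        # Hyperbolic Tangent variant
    "relu":    lambda z: np.maximum(0.0, z),   # Rectified Linear variant
}

z = np.linspace(-3, 3, 7)                      # [-3, -2, -1, 0, 1, 2, 3]
outputs = {name: f(z) for name, f in activations.items()}
```

The choice of hidden nonlinearity changes the range and saturation behaviour of the hidden units (sigmoid in (0, 1), tanh in (-1, 1), ReLU unbounded above), which is what motivates evaluating the variants side by side.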
NBP 2.0: Updated Next Bar Predictor, an Improved Algorithmic Music Generator
Deep neural network advancements have enabled machines to produce melodies emulating human-composed music. However, the implementation of such machines is costly in terms of resources. In this paper, we present NBP 2.0, a refinement of the previous model, the next bar predictor (NBP), with two notable improvements: first, transforming each training instance to anchor all the notes to its musical scale, and second, changing the model architecture itself. NBP 2.0 maintains its straightforward and lightweight implementation, which is an advantage over the baseline models. Improvements were assessed using quantitative and qualitative metrics and, based on the results, the improvements from these changes are notable.
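One plausible reading of "anchoring notes to the musical scale" is key normalisation: every training instance is transposed to a common tonic so the model never has to learn the same pattern in twelve keys. The sketch below illustrates that reading with MIDI pitches; NBP 2.0's exact transform may differ, and the tonic-detection step is not shown.

```python
def anchor_to_c(midi_pitches, tonic_pitch_class):
    """Transpose a melody so its tonic lands on C (pitch class 0).

    midi_pitches      : list of MIDI note numbers for one training instance
    tonic_pitch_class : detected tonic, 0=C .. 11=B (detection not shown)

    Shifts by the smaller of the up/down transpositions to keep the
    melody near its original register.
    """
    if tonic_pitch_class <= 6:
        shift = -tonic_pitch_class          # transpose down
    else:
        shift = 12 - tonic_pitch_class      # transpose up
    return [p + shift for p in midi_pitches]

# A melody in D major (tonic pitch class 2) moved down to C major.
melody_d = [62, 66, 69, 74]          # D4, F#4, A4, D5
melody_c = anchor_to_c(melody_d, 2)
```

After this normalisation, intervals and scale degrees line up across the whole training set, which is a cheap way to reduce what a lightweight model has to learn.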