250 research outputs found

    Rethinking Recurrent Latent Variable Model for Music Composition

    Full text link
    We present a model for capturing musical features and creating novel sequences of music, called the Convolutional Variational Recurrent Neural Network. To generate sequential data, the model uses an encoder-decoder architecture with latent probabilistic connections to capture the hidden structure of music. Using the sequence-to-sequence model, our generative model can exploit samples from a prior distribution to generate longer sequences of music. We compare the performance of the proposed model with other types of neural networks using the criterion of Information Rate, implemented with the Variable Markov Oracle, a method that allows statistical characterization of musical information dynamics and detection of motifs in a song. Our results suggest that the proposed model bears a closer statistical resemblance to the musical structure of the training data, which improves the creation of new sequences of music in the style of the originals. (Published as a conference paper at IEEE MMSP 2018.)
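    The abstract does not spell out the architecture, but the core recipe it describes (encode a sequence into a latent distribution, then decode samples drawn from the prior into new sequences) can be sketched. Below is a minimal recurrent-VAE sketch in PyTorch; the layer choices, sizes, and the `RecurrentVAE` name are illustrative assumptions, not the paper's actual Convolutional Variational Recurrent Neural Network.

```python
# Minimal recurrent-VAE sketch for sequence generation (illustrative only;
# the layer types and sizes are assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class RecurrentVAE(nn.Module):
    def __init__(self, n_pitches=128, hidden=256, latent=64):
        super().__init__()
        self.encoder = nn.GRU(n_pitches, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)      # posterior mean
        self.to_logvar = nn.Linear(hidden, latent)  # posterior log-variance
        self.latent_to_h = nn.Linear(latent, hidden)
        self.decoder = nn.GRU(n_pitches, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_pitches)
        self.latent = latent

    def encode(self, x):
        _, h = self.encoder(x)                 # h: (1, batch, hidden)
        return self.to_mu(h[-1]), self.to_logvar(h[-1])

    def decode(self, z, steps, x0):
        h = self.latent_to_h(z).unsqueeze(0)   # init decoder state from z
        inp, outputs = x0, []
        for _ in range(steps):
            o, h = self.decoder(inp, h)
            logits = self.out(o)
            outputs.append(logits)
            inp = torch.sigmoid(logits)        # feed prediction back in
        return torch.cat(outputs, dim=1)       # (1, steps, n_pitches)

    def generate(self, steps, n_pitches=128):
        z = torch.randn(1, self.latent)        # sample from the prior N(0, I)
        x0 = torch.zeros(1, 1, n_pitches)      # empty first frame
        return self.decode(z, steps, x0)
```

    Because the decoder is seeded only by a latent sample and its own previous output, it can be rolled forward for an arbitrary number of steps, which is what allows sampling from the prior to yield sequences longer than the training excerpts.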

    Automatic transcription of polyphonic music exploiting temporal evolution

    Get PDF
    PhD thesis. Automatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without restrictions on the degree of polyphony or the instrument type remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem with signal processing techniques, proposing audio features that exploit temporal characteristics. Techniques for note onset and offset detection are also used to improve transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in the multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research were also created during this work. The proposed systems have been evaluated both privately and publicly within the Music Information Retrieval Evaluation eXchange (MIREX) framework, and have been shown to outperform several state-of-the-art transcription approaches. The developed techniques have also been employed for other tasks in music technology, such as key modulation detection, temperament estimation, and automatic piano tutoring. Finally, the proposed music transcription models have been utilized in a wider context, namely for modeling acoustic scenes.
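    As one concrete example of a feature built on temporal evolution, a standard spectral-flux onset detector can be sketched in a few lines. This is a common baseline technique, not necessarily the thesis's own method; the function names and the fixed-threshold peak picking are illustrative simplifications.

```python
# Illustrative spectral-flux onset feature (a standard baseline; the thesis's
# actual features and SI-PLCA-based models are more involved).
import numpy as np

def spectral_flux(frames_fft_mag):
    """Half-wave-rectified frame-to-frame magnitude increase.

    frames_fft_mag: array of shape (n_frames, n_bins).
    """
    diff = np.diff(frames_fft_mag, axis=0)
    return np.maximum(diff, 0.0).sum(axis=1)

def pick_onsets(flux, threshold):
    """Frame indices where the flux is a local peak above a fixed threshold."""
    peaks = []
    for i in range(1, len(flux) - 1):
        if flux[i] > threshold and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]:
            peaks.append(i + 1)  # +1 because np.diff shifts frames by one
    return peaks
```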

    Polyphonic music generation using neural networks

    Get PDF
    In this project, the application of generative models to polyphonic music generation is investigated. Polyphonic music generation falls within algorithmic composition, a field that aims to develop models that automate, partially or completely, the composition of musical pieces. This process is challenging both in generating musical pieces that are enjoyable and in evaluating the models robustly enough to guide improvements. An extensive survey of the development of the field and the state of the art is carried out. From this, two distinct generative models were chosen for the problem of polyphonic music generation: the Restricted Boltzmann Machine and the Generative Adversarial Network. For the GAN, two architectures were used, the Deep Convolutional GAN and the Wasserstein GAN with gradient penalty. To train these models, a dataset containing over 9000 samples of classical musical pieces was used. Using a piano-roll representation, the pieces were converted into binary 2D arrays in which the vertical dimension corresponds to pitch, the horizontal dimension represents time, and note events are marked by active units. The first 16 seconds of each piece were extracted and used for training after data cleansing and preprocessing. Using implementations of these models, samples of musical pieces were generated. Based on listening tests performed by participants, the Deep Convolutional GAN achieved the best scores, with its compositions rated on average 4.80 on a 1-to-5 scale of how enjoyable the pieces were. For a more objective evaluation, musical features describing rhythmic and melodic characteristics were extracted from the generated pieces and compared against the training dataset; these included an implementation of the Krumhansl-Schmuckler algorithm for musical key detection and the average information rate as an estimator of long-term musical structure. Within each set of generated samples, pairwise Euclidean distances between feature values were computed (the intra-set distances), and likewise between each generated set and the training data (the inter-set distances). Kernel density estimation was used to obtain the probability density functions of both, and the Kullback-Leibler divergence between the intra-set and inter-set distance distributions of each feature was calculated for each generative model; a lower divergence indicates that the distributions are more similar. On average, the Restricted Boltzmann Machine obtained the lowest Kullback-Leibler divergences.
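    The evaluation protocol described above is straightforward to reproduce in outline. The sketch below assumes each piece has already been reduced to a scalar feature value (e.g. its information rate); it computes intra-set and inter-set Euclidean distances, smooths each with a Gaussian KDE, and integrates the Kullback-Leibler divergence numerically. Function names and the random placeholder data are illustrative, not the project's code.

```python
# Sketch of the intra-set / inter-set KL evaluation, assuming one scalar
# feature value per piece (the project uses several such features).
import numpy as np
from scipy.stats import gaussian_kde

def pairwise_dists(a, b=None):
    """Euclidean distances within one set (intra) or between two sets (inter)."""
    if b is None:
        return np.abs(a[:, None] - a[None, :])[np.triu_indices(len(a), k=1)]
    return np.abs(a[:, None] - b[None, :]).ravel()

def kl_divergence(p_samples, q_samples, grid_size=512):
    """KL(p || q) between KDE-smoothed distance distributions on a shared grid."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    xs = np.linspace(lo, hi, grid_size)
    p = gaussian_kde(p_samples)(xs) + 1e-12   # avoid log(0)
    q = gaussian_kde(q_samples)(xs) + 1e-12
    p /= np.trapz(p, xs)                      # renormalize on the grid
    q /= np.trapz(q, xs)
    return np.trapz(p * np.log(p / q), xs)

# Placeholder feature values standing in for real extracted features.
gen = np.random.rand(100)    # one feature value per generated piece
train = np.random.rand(100)  # one feature value per training piece
intra = pairwise_dists(gen)
inter = pairwise_dists(gen, train)
print(kl_divergence(intra, inter))  # lower = generated set resembles training data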

    AI Methods in Algorithmic Composition: A Comprehensive Survey

    Get PDF
    Algorithmic composition is the partial or total automation of the process of music composition by using computers. Since the 1950s, different computational techniques related to Artificial Intelligence have been used for algorithmic composition, including grammatical representations, probabilistic methods, neural networks, symbolic rule-based systems, constraint programming and evolutionary algorithms. This survey aims to be a comprehensive account of research on algorithmic composition, presenting a thorough view of the field for researchers in Artificial Intelligence. This study was partially supported by a grant for the MELOMICS project (IPT-300000-2010-010) from the Spanish Ministerio de Ciencia e Innovación, and a grant for the CAUCE project (TSI-090302-2011-8) from the Spanish Ministerio de Industria, Turismo y Comercio. The first author was supported by a grant for the GENEX project (P09-TIC-5123) from the Consejería de Innovación y Ciencia de Andalucía.

    An End-to-End Neural Network for Polyphonic Piano Music Transcription

    Get PDF
    We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model, and inference over the output variables is performed with the beam search algorithm. We perform two sets of experiments: we investigate various neural network architectures for the acoustic models, and we investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We also compare the performance of the neural-network-based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yield the best performance across all evaluation metrics, and we observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.
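    A deliberately simplified version of the decoding step conveys the idea: each beam extends its hypothesis with a candidate pitch set for the next frame, scored by the acoustic model's per-pitch probabilities plus a language-model term. In the sketch below, `lm_score` is a hypothetical stand-in for the recurrent music language model, and enumerating subsets of only the top-k pitches per frame is a simplification of the paper's graphical-model inference, not its actual algorithm.

```python
# Simplified beam search over per-frame pitch sets (illustrative sketch; the
# paper's hashed beam search and graphical model are more sophisticated).
import numpy as np
from itertools import combinations

def frame_log_prob(frame_probs, binding):
    """Log-prob of a binary pitch vector under independent per-pitch probs."""
    p = np.where(binding == 1, frame_probs, 1.0 - frame_probs)
    return np.log(np.clip(p, 1e-12, None)).sum()

def beam_search(acoustic, lm_score, beam_width=8, top_k=4):
    """acoustic: (n_frames, n_pitches) probabilities from the acoustic model.
    lm_score(history, binding): hypothetical language-model log-prob of the
    next pitch set given the previously decoded frames."""
    beams = [([], 0.0)]  # (sequence of binary pitch vectors, log-score)
    for frame in acoustic:
        top = np.argsort(frame)[::-1][:top_k]  # only vary the k likeliest pitches
        candidates = []
        for size in range(top_k + 1):
            for subset in combinations(top, size):
                binding = np.zeros(len(frame), dtype=int)
                binding[list(subset)] = 1
                candidates.append(binding)
        new_beams = []
        for seq, score in beams:
            for binding in candidates:
                s = score + frame_log_prob(frame, binding) + lm_score(seq, binding)
                new_beams.append((seq + [binding], s))
        beams = sorted(new_beams, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]  # highest-scoring pitch sequence

# e.g. with a trivial (uniform) language model:
#   transcript = beam_search(probs, lambda hist, b: 0.0)
```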