250 research outputs found
Rethinking Recurrent Latent Variable Model for Music Composition
We present a model for capturing musical features and creating novel
sequences of music, called the Convolutional Variational Recurrent Neural
Network. To generate sequential data, the model uses an encoder-decoder
architecture with latent probabilistic connections to capture the hidden
structure of music. Using the sequence-to-sequence model, our generative model
can exploit samples from a prior distribution and generate a longer sequence of
music. We compare the performance of our proposed model with other types of
Neural Networks using the criteria of Information Rate that is implemented by
Variable Markov Oracle, a method that allows statistical characterization of
musical information dynamics and detection of motifs in a song. Our results
suggest that the proposed model has a better statistical resemblance to the
musical structure of the training data, which improves the creation of new
sequences of music in the style of the originals.Comment: Published as a conference paper at IEEE MMSP 201
Recommended from our members
Detection and classification of acoustic scenes and events: an IEEE AASP challenge
Recommended from our members
Detection and Classification of Acoustic Scenes and Events
For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening
Automatic transcription of polyphonic music exploiting temporal evolution
PhDAutomatic music transcription is the process of converting an audio recording
into a symbolic representation using musical notation. It has numerous applications
in music information retrieval, computational musicology, and the
creation of interactive systems. Even for expert musicians, transcribing polyphonic
pieces of music is not a trivial task, and while the problem of automatic
pitch estimation for monophonic signals is considered to be solved, the creation
of an automated system able to transcribe polyphonic music without setting
restrictions on the degree of polyphony and the instrument type still remains
open.
In this thesis, research on automatic transcription is performed by explicitly
incorporating information on the temporal evolution of sounds. First efforts address
the problem by focusing on signal processing techniques and by proposing
audio features utilising temporal characteristics. Techniques for note onset and
offset detection are also utilised for improving transcription performance. Subsequent
approaches propose transcription models based on shift-invariant probabilistic
latent component analysis (SI-PLCA), modeling the temporal evolution
of notes in a multiple-instrument case and supporting frequency modulations in
produced notes. Datasets and annotations for transcription research have also
been created during this work. Proposed systems have been privately as well as
publicly evaluated within the Music Information Retrieval Evaluation eXchange
(MIREX) framework. Proposed systems have been shown to outperform several
state-of-the-art transcription approaches.
Developed techniques have also been employed for other tasks related to music
technology, such as for key modulation detection, temperament estimation,
and automatic piano tutoring. Finally, proposed music transcription models
have also been utilized in a wider context, namely for modeling acoustic scenes
Polyphonic music generation using neural networks
In this project, the application of generative models for polyphonic music generation is investigated. Polyphonic music generation falls into the field of algorithmic composition, which is a field that aims to develop models to automate, partially or completely, the composition of musical pieces. This process has many challenges both in terms of how to achieve the generation of musical pieces that are enjoyable and also how to perform a robust evaluation of the model to guide improvements. An extensive survey of the development of the field and the state-of-the-art is carried out. From this, two distinct generative models were chosen to apply to the problem of polyphonic music generation. The models chosen were the Restricted Boltzmann Machine and the Generative Adversarial Network. In particular, for the GAN, two architectures were used, the Deep Convolutional GAN and the Wasserstein GAN with gradient penalty. To train these models, a dataset containing over 9000 samples of classical musical pieces was used. Using a piano-roll representation of the musical pieces, these were converted into binary 2D arrays in which the vertical dimensions related to the pitch while the horizontal dimension represented the time, and note events were represented by active units. The first 16 seconds of each piece was extracted and used for training the model after applying data cleansing and preprocessing. Using implementations of these models, samples of musical pieces were generated. Based on listening tests performed by participants, the Deep Convolutional GAN achieved the best scores, with its compositions being ranked on average 4.80 on a scale from 1-5 of how enjoyable the pieces were. To perform a more objective evaluation, different musical features that describe rhythmic and melodic characteristics were extracted from the generated pieces and compared against the training dataset. These features included the implementation of the Krumhansl-Schmuckler algorithm for musical key detection and the average information rate used as an estimator of long-term musical structure. Within each set of the generated musical samples, the pairwise cross-validation using the Euclidean distance between each feature was performed. This was also performed between each set of generated samples and the features extracted from the training data, resulting in two sets of distances, the intra-set and inter-set distances. Using kernel density estimation, the probability density functions of these are obtained. Finally, the Kullback-Liebler divergence between the intra-set and inter-set distance of each feature for each generative model was calculated. The lower divergence indicates that the distributions are more similar. On average, the Restricted Boltzmann Machine obtained the lowest Kullback-Liebler divergences
AI Methods in Algorithmic Composition: A Comprehensive Survey
Algorithmic composition is the partial or total automation of the process of music composition
by using computers. Since the 1950s, different computational techniques related to
Artificial Intelligence have been used for algorithmic composition, including grammatical
representations, probabilistic methods, neural networks, symbolic rule-based systems, constraint
programming and evolutionary algorithms. This survey aims to be a comprehensive
account of research on algorithmic composition, presenting a thorough view of the field for
researchers in Artificial Intelligence.This study was partially supported by a grant for the MELOMICS project
(IPT-300000-2010-010) from the Spanish Ministerio de Ciencia e InnovaciĂłn, and a grant for
the CAUCE project (TSI-090302-2011-8) from the Spanish Ministerio de Industria, Turismo
y Comercio. The first author was supported by a grant for the GENEX project (P09-TIC-
5123) from the ConsejerĂa de InnovaciĂłn y Ciencia de AndalucĂa
An End-to-End Neural Network for Polyphonic Piano Music Transcription
We present a supervised neural network model for polyphonic piano music
transcription. The architecture of the proposed model is analogous to speech
recognition systems and comprises an acoustic model and a music language model.
The acoustic model is a neural network used for estimating the probabilities of
pitches in a frame of audio. The language model is a recurrent neural network
that models the correlations between pitch combinations over time. The proposed
model is general and can be used to transcribe polyphonic music without
imposing any constraints on the polyphony. The acoustic and language model
predictions are combined using a probabilistic graphical model. Inference over
the output variables is performed using the beam search algorithm. We perform
two sets of experiments. We investigate various neural network architectures
for the acoustic models and also investigate the effect of combining acoustic
and music language model predictions using the proposed architecture. We
compare performance of the neural network based acoustic models with two
popular unsupervised acoustic models. Results show that convolutional neural
network acoustic models yields the best performance across all evaluation
metrics. We also observe improved performance with the application of the music
language models. Finally, we present an efficient variant of beam search that
improves performance and reduces run-times by an order of magnitude, making the
model suitable for real-time applications
- …