The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation
With recent breakthroughs in artificial neural networks, deep generative
models have become one of the leading techniques for computational creativity.
Despite very promising progress on image and short-sequence generation,
symbolic music generation remains a challenging problem because the structures
of compositions are usually complicated. In this study, we attempt to solve the
melody generation problem constrained by a given chord progression. This
music meta-creation problem can also be incorporated into a plan recognition
system with user inputs and predictive structural outputs. In particular, we
explore the effect of explicitly encoding musical structure in the
architecture by comparing two sequential generative models: LSTM (a type of
RNN) and WaveNet
(dilated temporal-CNN). As far as we know, this is the first study of applying
WaveNet to symbolic music generation, as well as the first systematic
comparison between temporal-CNN and RNN for music generation. We conduct a
survey to evaluate the generated music and apply the Variable Markov Oracle
for music pattern discovery. Experimental results show that encoding structure
more explicitly with a stack of dilated convolution layers significantly
improves performance, and that globally encoding the underlying chord
progression into the generation procedure yields a further gain.
Comment: 8 pages, 13 figures
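To make the architectural idea concrete, here is a minimal, hedged sketch of a
WaveNet-style stack of dilated 1-D convolutions for symbolic music, written in
Python with PyTorch. It illustrates the general technique, not the paper's
model; the layer sizes, class name, and residual wiring are assumptions.

    # Illustrative WaveNet-style dilated temporal-CNN for symbolic music
    # (a sketch of the technique, not the paper's architecture).
    import torch
    import torch.nn as nn

    class DilatedMelodyNet(nn.Module):  # hypothetical name
        def __init__(self, n_pitches=128, channels=64, n_layers=4):
            super().__init__()
            self.input_proj = nn.Conv1d(n_pitches, channels, kernel_size=1)
            # Doubling the dilation (1, 2, 4, 8) grows the receptive field
            # exponentially: this is the explicit encoding of long-range
            # temporal structure.
            self.layers = nn.ModuleList(
                nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
                for i in range(n_layers)
            )
            self.output_proj = nn.Conv1d(channels, n_pitches, kernel_size=1)

        def forward(self, x):  # x: (batch, n_pitches, time), piano-roll-like
            h = self.input_proj(x)
            for conv in self.layers:
                # Left-pad so each step only sees past timesteps (causal).
                pad = conv.dilation[0] * (conv.kernel_size[0] - 1)
                h = torch.relu(conv(nn.functional.pad(h, (pad, 0)))) + h
            return self.output_proj(h)  # per-step pitch logits

A global chord-progression condition of the kind the abstract describes could,
for instance, be concatenated to x as extra input channels.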
Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective - What musical content is to be generated? Examples are: melody,
polyphony, accompaniment or counterpoint. - For what destination and for what
use? To be performed by human(s) (in the case of a musical score), or by a
machine (in the case of an audio file).
Representation - What are the concepts to be manipulated? Examples are:
waveform, spectrogram, note, chord, meter and beat. - What format is to be
used? Examples are: MIDI, piano roll or text. - How will the representation be
encoded? Examples are: scalar, one-hot or many-hot (a minimal encoding sketch
follows this list).
Architecture - What type(s) of deep neural network is (are) to be used?
Examples are: feedforward network, recurrent network, autoencoder or generative
adversarial networks.
Challenge - What are the limitations and open challenges? Examples are:
variability, interactivity and creativity.
Strategy - How do we model and control the process of generation? Examples
are: single-step feedforward, iterative feedforward, sampling or input
manipulation (an iterative-sampling sketch follows this abstract).
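As a concrete illustration of the Representation dimension's encoding
question, the following minimal Python/NumPy sketch contrasts one-hot (a
single melody note) with many-hot (a chord in one piano-roll step); the
128-pitch MIDI range and the function names are illustrative assumptions.

    import numpy as np

    N_PITCHES = 128  # MIDI pitch range

    def one_hot(pitch):
        """Encode a single melody note, e.g. 60 = middle C."""
        v = np.zeros(N_PITCHES, dtype=np.float32)
        v[pitch] = 1.0
        return v

    def many_hot(pitches):
        """Encode a chord: several simultaneous pitches in one vector."""
        v = np.zeros(N_PITCHES, dtype=np.float32)
        v[list(pitches)] = 1.0
        return v

    melody_step = one_hot(60)            # exactly one active entry
    chord_step = many_hot([60, 64, 67])  # C major triad: three active entries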
For each dimension, we conduct a comparative analysis of various models and
techniques and propose a tentative multidimensional typology. This typology is
bottom-up, based on the analysis of many existing deep-learning-based systems
for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P.
Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music
Generation, Computational Synthesis and Creative Systems, Springer, 201
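The iterative-sampling sketch referenced under the Strategy dimension above: a
generation loop that repeatedly samples the next note from a model's predicted
distribution and feeds it back as input. The model.predict interface and the
temperature parameter are hypothetical placeholders, not an API from the
survey.

    import numpy as np

    def iterative_generate(model, seed, n_steps, temperature=1.0):
        """Iterative feedforward generation with sampling (sketch)."""
        sequence = list(seed)
        for _ in range(n_steps):
            logits = model.predict(sequence)  # hypothetical next-step predictor
            probs = np.exp(logits / temperature)
            probs /= probs.sum()
            # Sampling rather than argmax trades determinism for the
            # variability the survey lists among the open challenges.
            sequence.append(np.random.choice(len(probs), p=probs))
        return sequence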
Rethinking Recurrent Latent Variable Model for Music Composition
We present a model for capturing musical features and creating novel
sequences of music, called the Convolutional Variational Recurrent Neural
Network. To generate sequential data, the model uses an encoder-decoder
architecture with latent probabilistic connections to capture the hidden
structure of music. Using this sequence-to-sequence model, our generative
model can draw samples from a prior distribution and generate longer sequences
of music. We compare the performance of our proposed model with other types of
neural networks using the Information Rate criterion, computed with the
Variable Markov Oracle, a method that allows statistical characterization of
musical information dynamics and detection of motifs in a song. Our results
suggest that the proposed model has a better statistical resemblance to the
musical structure of the training data, which improves the creation of new
sequences of music in the style of the originals.
Comment: Published as a conference paper at IEEE MMSP 201
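For intuition about the evaluation criterion: Information Rate measures how
much a sequence's past reduces uncertainty about its present, informally
IR_n = H(x_n) - H(x_n | x_1, ..., x_{n-1}). The Variable Markov Oracle
computes it from a suffix-oracle structure; the Python sketch below is only a
crude first-order Markov approximation of the same quantity, offered for
intuition and not as the authors' method.

    import math
    from collections import Counter

    def first_order_ir(seq):
        """Crude first-order approximation of average Information Rate."""
        unigrams = Counter(seq)
        bigrams = Counter(zip(seq, seq[1:]))
        n = len(seq)
        # Marginal entropy H(x).
        h_x = -sum(c / n * math.log2(c / n) for c in unigrams.values())
        # Conditional entropy H(x_t | x_{t-1}) from bigram statistics
        # (unigram counts over the full sequence: a small approximation).
        h_cond = -sum(
            c / (n - 1) * math.log2(c / unigrams[a])
            for (a, b), c in bigrams.items()
        )
        return h_x - h_cond  # higher: the past says more about the present

    print(first_order_ir([60, 62, 64, 60, 62, 64, 60, 62]))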
Linear Memory Networks
Recurrent neural networks can learn complex transduction problems that
require maintaining and actively exploiting a memory of their inputs. Such
models traditionally treat memory and input-output functionalities as
indissolubly entangled. We introduce a novel recurrent architecture based on
the conceptual separation between the functional input-output transformation
and the memory mechanism, showing how they can be implemented through different
neural components. By building on such conceptualization, we introduce the
Linear Memory Network, a recurrent model comprising a feedforward neural
network, realizing the non-linear functional transformation, and a linear
autoencoder for sequences, implementing the memory component. The resulting
architecture can be efficiently trained by building on closed-form solutions to
linear optimization problems. Further, by exploiting equivalence results
between feedforward and recurrent neural networks we devise a pretraining
schema for the proposed architecture. Experiments on polyphonic music datasets
show competitive results against gated recurrent networks and other
state-of-the-art models.
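A minimal PyTorch sketch of the separation the abstract describes, with
illustrative sizes and ignoring the closed-form training and the pretraining
schema: a nonlinear feedforward component computes the hidden state, while a
purely linear recurrence maintains the memory.

    import torch
    import torch.nn as nn

    class LinearMemoryNetwork(nn.Module):  # sketch, not the authors' code
        def __init__(self, input_size, hidden_size, memory_size):
            super().__init__()
            # Functional component: nonlinear map of input + previous memory.
            self.func = nn.Sequential(
                nn.Linear(input_size + memory_size, hidden_size), nn.Tanh()
            )
            # Memory component: a linear (autoencoder-style) recurrence;
            # no nonlinearity, which is what admits closed-form solutions.
            self.W_hm = nn.Linear(hidden_size, memory_size, bias=False)
            self.W_mm = nn.Linear(memory_size, memory_size, bias=False)

        def forward(self, inputs):  # inputs: (time, batch, input_size)
            m = inputs.new_zeros(inputs.size(1), self.W_mm.in_features)
            outputs = []
            for x_t in inputs:
                h_t = self.func(torch.cat([x_t, m], dim=-1))
                m = self.W_hm(h_t) + self.W_mm(m)  # linear memory update
                outputs.append(h_t)
            return torch.stack(outputs), m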
Continuous Melody Generation via Disentangled Short-Term Representations and Structural Conditions
Automatic music generation is an interdisciplinary research topic that
combines computational creativity and semantic analysis of music to create
automatic machine improvisations. An important property of such a system is
allowing the user to specify conditions and desired properties of the generated
music. In this paper we design a model for composing melodies given a
user-specified symbolic scenario combined with a previous music context. We add
manually labeled vectors denoting external music quality in terms of chord
function, which provide a low-dimensional representation of harmonic tension
and resolution. Our model is capable of generating long melodies by treating
8-beat note sequences as basic units, sharing a consistent rhythm-pattern
structure with another specific song. The model consists of two separately
trained stages: the first adopts a Conditional Variational Autoencoder (C-VAE)
to build a bijection between note sequences and their latent representations,
and the second adopts long short-term memory networks (LSTM) with structural
conditions to continue writing future
melodies. We further exploit the disentanglement technique via C-VAE to allow
melody generation based on pitch contour information separately from
conditioning on rhythm patterns. Finally, we evaluate the proposed model using
a quantitative analysis of rhythm and a subjective listening study. Results
show that the music generated by our model tends to have salient repetition
structures, rich motifs, and stable rhythm patterns. The ability to generate
longer and more structured phrases from disentangled representations, combined
with semantic scenario-specification conditions, suggests broad applicability
of our model.
Comment: 9 pages, 12 figures, 4 tables. In the 14th International Conference
on Semantic Computing, ICSC 202
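The two-stage pipeline can be sketched as follows; all module names, sizes,
and interfaces are illustrative assumptions, not the paper's implementation.
Stage one (a C-VAE, not shown in full) maps each 8-beat unit to a latent code;
stage two runs an LSTM over latent codes concatenated with structural
condition vectors to predict the next unit's code, which the stage-one decoder
then turns back into notes.

    import torch
    import torch.nn as nn

    class Stage2LSTM(nn.Module):  # sketch of the second stage only
        def __init__(self, latent_size=64, cond_size=16, hidden_size=256):
            super().__init__()
            self.lstm = nn.LSTM(latent_size + cond_size, hidden_size,
                                batch_first=True)
            self.head = nn.Linear(hidden_size, latent_size)

        def forward(self, z_seq, cond_seq):
            # z_seq: (batch, units, latent) codes from the stage-one C-VAE;
            # cond_seq: per-unit conditions (e.g. chord function, rhythm).
            out, _ = self.lstm(torch.cat([z_seq, cond_seq], dim=-1))
            return self.head(out)  # predicted latent code of the next unit

    # Usage sketch: decode each predicted code with the stage-one C-VAE
    # decoder (not shown) and concatenate the 8-beat units into a melody.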