Coupled Recurrent Models for Polyphonic Music Composition
This paper introduces a novel recurrent model for music composition that is
tailored to the structure of polyphonic music. We propose an efficient new
conditional probabilistic factorization of musical scores, viewing a score as a
collection of concurrent, coupled sequences: i.e. voices. To model the
conditional distributions, we borrow ideas from both convolutional and
recurrent neural models; we argue that these ideas are natural for capturing
music's pitch invariances, temporal structure, and polyphony. We train models
for single-voice and multi-voice composition on 2,300 scores from the
KernScores dataset.
Comment: 13 pages; long version of the paper appearing in ISMIR 201
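The voice-coupled factorization the abstract describes can be illustrated as a chain rule over time steps and voices. The pitch alphabet, voice names, and the uniform conditional below are illustrative assumptions for a minimal sketch, not the paper's actual model:

```python
import math

def score_log_prob(voices, cond_prob):
    """Log-probability of a score factorized over time steps and voices:
    log p(x) = sum over t, v of log p(x[v][t] | all notes generated before (t, v))."""
    total = 0.0
    history = []
    num_steps = len(next(iter(voices.values())))
    for t in range(num_steps):
        for v in sorted(voices):          # fixed voice order within each time step
            note = voices[v][t]
            total += math.log(cond_prob(note, history))
            history.append((v, t, note))
    return total

# Toy conditional: uniform over a small pitch alphabet, ignoring history.
ALPHABET = [60, 62, 64, 65, 67]           # MIDI numbers for C, D, E, F, G
uniform = lambda note, history: 1.0 / len(ALPHABET)

score = {"soprano": [67, 65, 64], "bass": [60, 62, 60]}
lp = score_log_prob(score, uniform)
```

A learned model would replace `uniform` with a network conditioned on `history`; the factorization itself is unchanged.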
Modelling Symbolic Music: Beyond the Piano Roll
In this paper, we consider the problem of probabilistically modelling
symbolic music data. We introduce a representation which reduces polyphonic
music to a univariate categorical sequence. In this way, we are able to apply
state-of-the-art natural language processing techniques, namely the long
short-term memory sequence model. The representation we employ permits
arbitrary rhythmic structure, which we assume to be given. We show that our
model is effective on four out of four piano roll based benchmark datasets. We
further improve our model by augmenting our training data set with
transpositions of the original pieces through all musical keys, thereby
convincingly advancing the state of the art on these benchmark problems. We
also fit models to music which is unconstrained in its rhythmic structure,
discuss the properties of this model, and provide musical samples which are
more sophisticated than previously possible with this class of recurrent neural
network sequence models. We also provide our newly preprocessed data set of
non-piano-roll music data.
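The transposition augmentation the abstract mentions amounts to shifting every pitch by each possible key offset. A minimal sketch (the function names and the choice of offsets 0–11 semitones are assumptions for illustration):

```python
def transpose(pitches, semitones):
    """Shift a sequence of MIDI pitch numbers by a number of semitones."""
    return [p + semitones for p in pitches]

def augment_all_keys(piece):
    """Return the piece transposed into all 12 keys (offset 0 is the original)."""
    return [transpose(piece, s) for s in range(12)]

melody = [60, 62, 64]            # C, D, E
augmented = augment_all_keys(melody)
```

Equivalent schemes shift by -5..+6 semitones instead to stay near the original register; either way each training piece yields 12 key-distinct copies.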
The NES Music Database: A multi-instrumental dataset with expressive performance attributes
Existing research on music generation focuses on composition, but often
ignores the expressive performance characteristics required for plausible
renditions of resultant pieces. In this paper, we introduce the Nintendo
Entertainment System Music Database (NES-MDB), a large corpus allowing for
separate examination of the tasks of composition and performance. NES-MDB
contains thousands of multi-instrumental songs composed for playback by the
compositionally-constrained NES audio synthesizer. For each song, the dataset
contains a musical score for four instrument voices as well as expressive
attributes for the dynamics and timbre of each voice. Unlike datasets composed
of General MIDI files, NES-MDB includes all of the information needed to render
exact acoustic performances of the original compositions. Alongside the
dataset, we provide a tool that renders generated compositions as NES-style
audio by emulating the device's audio processor. Additionally, we establish
baselines for the tasks of composition, which consists of learning the
semantics of composing for the NES synthesizer, and performance, which involves
finding a mapping between a composition and realistic expressive attributes.
Comment: Published as a conference paper at ISMIR 201
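The separation of composition from performance that NES-MDB provides can be pictured as a score per voice plus per-voice expressive attributes. The field names and values below are hypothetical, chosen only to mirror the structure the abstract describes:

```python
from dataclasses import dataclass, field

@dataclass
class VoicePart:
    """One instrument voice: a score plus its expressive attributes."""
    pitches: list       # composition: pitch per time step (None = rest)
    velocities: list    # performance: dynamics per time step
    timbres: list       # performance: timbre setting per time step

@dataclass
class NesSong:
    """Hypothetical container separating composition (pitches) from
    performance (velocities, timbres), per voice."""
    voices: dict = field(default_factory=dict)

song = NesSong(voices={
    "pulse1": VoicePart(pitches=[60, 62], velocities=[8, 10], timbres=[0, 1]),
    "triangle": VoicePart(pitches=[48, 48], velocities=[15, 15], timbres=[0, 0]),
})
```

The paper's two baseline tasks map onto this structure: composition models predict `pitches`; performance models predict `velocities` and `timbres` given `pitches`.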
Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN
The recurrent network architecture is widely used in sequence modeling, but
its serial dependency hinders parallel computation, making it inefficient.
The same problem was encountered in the serial adder in the early days of
digital electronics. In this paper, we discuss the similarities between the
recurrent neural network (RNN) and the serial adder. Inspired by the
carry-lookahead adder, we introduce a carry-lookahead module into the RNN,
making it possible for the RNN to run in parallel. We then design a method
for parallel RNN computation and propose the Carry-lookahead RNN (CL-RNN).
CL-RNN offers advantages in parallelism and a flexible receptive field.
Through a comprehensive set of tests, we verify that CL-RNN outperforms
existing typical RNNs on sequence modeling tasks specially designed for
RNNs.
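The carry-lookahead analogy can be made concrete on a linear recurrence, where the sequential loop and an associative-scan formulation give identical results. This is a sketch of the general prefix-scan idea on an affine recurrence h_t = a_t * h_{t-1} + b_t, not the CL-RNN itself, whose module the abstract does not detail:

```python
def serial_recurrence(coeffs, h0=0.0):
    """h_t = a_t * h_{t-1} + b_t, one step at a time (the serial-adder analogue)."""
    h, out = h0, []
    for a, b in coeffs:
        h = a * h + b
        out.append(h)
    return out

def combine(f, g):
    """Compose two affine steps: applying f then g is itself affine, much as
    carry generate/propagate signals combine in a carry-lookahead adder."""
    (a1, b1), (a2, b2) = f, g
    return (a2 * a1, a2 * b1 + b2)

def parallel_recurrence(coeffs, h0=0.0):
    """Same recurrence via an associative scan; because `combine` is
    associative, the pairwise tree can run in O(log n) parallel depth
    (written here as a simple prefix loop for clarity)."""
    prefix, acc = [], (1.0, 0.0)      # identity affine map
    for step in coeffs:
        acc = combine(acc, step)
        prefix.append(acc)
    return [a * h0 + b for a, b in prefix]

steps = [(0.5, 1.0), (2.0, -1.0), (1.0, 0.5)]
```

Nonlinear RNN cells do not compose associatively, which is why breaking the serial dependency requires an architectural change like the paper's carry-lookahead module.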
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
For most deep learning practitioners, sequence modeling is synonymous with
recurrent networks. Yet recent results indicate that convolutional
architectures can outperform recurrent networks on tasks such as audio
synthesis and machine translation. Given a new sequence modeling task or
dataset, which architecture should one use? We conduct a systematic evaluation
of generic convolutional and recurrent architectures for sequence modeling. The
models are evaluated across a broad range of standard tasks that are commonly
used to benchmark recurrent networks. Our results indicate that a simple
convolutional architecture outperforms canonical recurrent networks such as
LSTMs across a diverse range of tasks and datasets, while demonstrating longer
effective memory. We conclude that the common association between sequence
modeling and recurrent networks should be reconsidered, and convolutional
networks should be regarded as a natural starting point for sequence modeling
tasks. To assist related work, we have made code available at
http://github.com/locuslab/TCN.
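The building block of the convolutional architecture evaluated here is a causal dilated convolution. A minimal pure-Python sketch (residual blocks, weight normalization, and other details of the full TCN are omitted):

```python
def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: output at t depends only on x[t], x[t-d],
    x[t-2d], ... Left-pads with zeros so output length equals input length."""
    k = len(weights)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    # weights[k-1] multiplies the current sample, weights[0] the oldest one
    return [sum(w * padded[t + i * dilation] for i, w in enumerate(weights))
            for t in range(len(x))]

# Stacking layers with dilations 1, 2, 4, ... grows the receptive field
# exponentially with depth, one source of the "longer effective memory"
# the paper reports for convolutional models.
x = [1.0, 0.0, 0.0, 0.0, 0.0]                       # unit impulse at t = 0
y = causal_dilated_conv(x, [0.5, 0.5], dilation=2)  # taps at t and t-2
```

With dilation 2 and kernel size 2, the impulse at t = 0 appears in the output at t = 0 (current tap) and t = 2 (the t-2 tap), and nowhere earlier, confirming causality.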