Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
- Objective: What musical content is to be generated? Examples: melody, polyphony, accompaniment, or counterpoint. For what destination and use? To be performed by a human (in the case of a musical score) or by a machine (in the case of an audio file).
- Representation: What concepts are to be manipulated? Examples: waveform, spectrogram, note, chord, meter, and beat. What format is to be used? Examples: MIDI, piano roll, or text. How will the representation be encoded? Examples: scalar, one-hot, or many-hot.
- Architecture: What type(s) of deep neural network are to be used? Examples: feedforward network, recurrent network, autoencoder, or generative adversarial network.
- Challenge: What are the limitations and open challenges? Examples: variability, interactivity, and creativity.
- Strategy: How do we model and control the process of generation? Examples: single-step feedforward, iterative feedforward, sampling, or input manipulation.
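The encoding choices above can be made concrete with a small sketch. Assuming a toy 12-dimensional pitch-class space (an illustrative simplification, not a representation taken from the survey), a one-hot vector encodes a single melody note per time step, while a many-hot vector encodes a chord:

```python
NUM_PITCH_CLASSES = 12  # C, C#, ..., B (toy pitch space for illustration)

def one_hot(pitch_class: int) -> list[float]:
    """Encode a single note as a one-hot vector (melody: one pitch per step)."""
    vec = [0.0] * NUM_PITCH_CLASSES
    vec[pitch_class] = 1.0
    return vec

def many_hot(pitch_classes: list[int]) -> list[float]:
    """Encode simultaneous notes as a many-hot vector (polyphony: a chord per step)."""
    vec = [0.0] * NUM_PITCH_CLASSES
    for pc in pitch_classes:
        vec[pc] = 1.0
    return vec

melody_step = one_hot(0)          # a lone C
chord_step = many_hot([0, 4, 7])  # a C major triad: C, E, G
```

A scalar encoding, by contrast, would store the pitch as a single number; the one-hot/many-hot forms trade compactness for a representation that neural networks handle more easily.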
For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy. The last section offers some discussion and prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
AccoMontage-3: Full-Band Accompaniment Arrangement via Sequential Style Transfer and Multi-Track Function Prior
We propose AccoMontage-3, a symbolic music automation system capable of
generating multi-track, full-band accompaniment based on the input of a lead
melody with chords (i.e., a lead sheet). The system contains three modular
components, each modelling a vital aspect of full-band composition. The first
component is a piano arranger that generates piano accompaniment for the lead
sheet by transferring texture styles to the chords using latent chord-texture
disentanglement and heuristic retrieval of texture donors. The second component
orchestrates the piano accompaniment score into full-band arrangement according
to the orchestration style encoded by individual track functions. The third
component, which connects the previous two, is a prior model characterizing the
global structure of orchestration style over the whole piece of music. From end
to end, the system learns to generate full-band accompaniment in a
self-supervised fashion, applying style transfer at two levels of polyphonic
composition: texture and orchestration. Experiments show that our system
outperforms the baselines significantly, and the modular design offers
effective controls in a musically meaningful way.
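The three-stage pipeline described above can be sketched as a simple function composition. All function names and data shapes here are illustrative placeholders, not the system's actual API:

```python
# Toy sketch of a three-stage lead-sheet-to-full-band pipeline, modeled on
# the modular design described above. Strings stand in for musical data.

def arrange_piano(lead_sheet: str) -> str:
    """Stage 1 (sketch): lead sheet -> piano accompaniment via texture style transfer."""
    return f"piano({lead_sheet})"

def infer_orchestration_style(piano_score: str) -> str:
    """Stage 3 (sketch): prior model proposes a whole-piece orchestration style."""
    return f"style({piano_score})"

def orchestrate(piano_score: str, style: str) -> str:
    """Stage 2 (sketch): piano score + track-function style -> full-band arrangement."""
    return f"band({piano_score}, {style})"

def arrange_full_band(lead_sheet: str) -> str:
    piano = arrange_piano(lead_sheet)
    style = infer_orchestration_style(piano)
    return orchestrate(piano, style)

result = arrange_full_band("lead_sheet")
```

The point of the modular decomposition is that each stage can be swapped or controlled independently, which is what gives the system its musically meaningful controls.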
Self-Supervised Disentanglement of Harmonic and Rhythmic Features in Music Audio Signals
Latent variable disentanglement aims to infer the multiple informative
latent representations that lie behind a data generation process, and it is
a key factor in controllable data generation. In this paper, we propose
a deep neural network-based self-supervised learning method to infer the
disentangled rhythmic and harmonic representations behind music audio
generation. We train a variational autoencoder that generates an audio
mel-spectrogram from two latent features representing the rhythmic and harmonic
content. In the training phase, the variational autoencoder is trained to
reconstruct the input mel-spectrogram given its pitch-shifted version. At each
forward computation in the training phase, a vector rotation operation is
applied to one of the latent features, assuming that the dimensions of the
feature vectors are related to pitch intervals. Therefore, in the trained
variational autoencoder, the rotated latent feature represents the
pitch-related information of the mel-spectrogram, and the unrotated latent
feature represents the pitch-invariant information, i.e., the rhythmic content.
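The rotation trick can be illustrated with a toy sketch. The assumption, as in the paper, is that each dimension of the harmonic latent corresponds to a one-semitone interval, so a pitch shift of k semitones becomes a circular shift of that vector; note that the paper applies this to learned VAE latents, not to hand-made lists:

```python
# Illustrative sketch only: circular shift of a "harmonic" latent vector,
# assuming latent dimension <-> semitone interval.

def rotate_latent(z_harmonic: list[float], semitones: int) -> list[float]:
    """Circularly shift the harmonic latent by a pitch interval."""
    k = semitones % len(z_harmonic)
    return z_harmonic[-k:] + z_harmonic[:-k] if k else list(z_harmonic)

z_harmonic = [float(i) for i in range(12)]  # toy latent, one dim per semitone
z_shifted = rotate_latent(z_harmonic, 2)    # simulate a +2-semitone pitch shift
# The rhythmic latent is deliberately left unrotated during training, so it
# is pushed to capture only pitch-invariant (rhythmic) content.
```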
The proposed method was evaluated using a predictor-based disentanglement
metric on the learned features. Furthermore, we demonstrate its application to
the automatic generation of music remixes.
Comment: Accepted to DAFx 202
Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey
Several adaptations of the Transformer model have been developed in various
domains since its breakthrough in Natural Language Processing (NLP). This trend
has spread into the field of Music Information Retrieval (MIR), including
studies processing music data. However, the practice of leveraging NLP tools
for symbolic music data is not novel in MIR. Music has frequently been compared
to language, as the two share several similarities, including the sequential
representation of text and music. These analogies are also reflected in
similar tasks across MIR and NLP. This survey reviews NLP methods applied to
symbolic music generation and information retrieval along two axes.
We first propose an overview of representations of symbolic music adapted from
sequential representations of natural language. Such representations are
designed with the specificities of symbolic music in mind. These
representations are then processed by models, possibly originally developed
for text and adapted for symbolic music, which are trained on various tasks.
We describe these models, in particular deep learning models, through
different prisms, highlighting music-specialized mechanisms. We finally
discuss the effective use of NLP tools for symbolic music data, including
technical issues regarding NLP methods and fundamental differences between
text and music, which may open several doors for further research into more
effectively adapting NLP tools to symbolic MIR.
Comment: 36 pages, 5 figures, 4 tables
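As a concrete illustration of such text-like representations, here is a minimal sketch of an event-token serialization for a single note. The token names are illustrative placeholders in the general style of event-based vocabularies, not a specific published scheme:

```python
# Sketch: serializing one symbolic-music note as a sequence of text-like
# tokens, analogous to the word/subword sequences consumed by NLP models.

def note_to_tokens(bar: int, position: int, pitch: int, duration: int) -> list[str]:
    """Serialize one note event as a short token sequence (illustrative vocabulary)."""
    return [
        f"Bar_{bar}",
        f"Position_{position}",   # metrical position within the bar
        f"Pitch_{pitch}",         # e.g. a MIDI pitch number
        f"Duration_{duration}",   # duration in time-grid steps
    ]

tokens = note_to_tokens(bar=0, position=0, pitch=60, duration=4)
```

Once music is flattened into such token sequences, sequence models built for text (e.g. Transformers) can be trained on it with little architectural change, which is the core analogy the survey examines.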
ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production
Recent work in the field of symbolic music generation has shown value in
using a tokenization based on the GuitarPro format, a symbolic representation
supporting guitar expressive attributes, as an input and output representation.
We extend this work by fine-tuning a pre-trained Transformer model on ProgGP, a
custom dataset of 173 progressive metal songs, for the purposes of creating
compositions from that genre through a human-AI partnership. Our model is able
to generate multiple guitar, bass guitar, drums, piano and orchestral parts. We
examine the validity of the generated music using a mixed methods approach by
combining quantitative analyses following a computational musicology paradigm
and qualitative analyses following a practice-based research paradigm. Finally,
we demonstrate the value of the model by using it as a tool to create a
progressive metal song, fully produced and mixed by a human metal producer
based on AI-generated music.
Comment: Pre-print accepted for publication at CMMR202
Automatic characterization and generation of music loops and instrument samples for electronic music production
Repurposing audio material to create new music, also known as sampling, was a foundation of electronic music and remains a fundamental component of the practice. Currently, large-scale audio databases offer vast collections of material for users to work with. Navigation of these databases relies heavily on hierarchical tree directories. Consequently, sound retrieval is tiresome and is often identified as an undesired interruption of the creative process.
We address two fundamental methods for navigating sounds: characterization and generation. Characterizing loops and one-shots in terms of instruments or instrumentation allows for organizing unstructured collections and a faster retrieval for music-making. The generation of loops and one-shot sounds enables the creation of new sounds not present in an audio collection through interpolation or modification of the existing material. To achieve this, we employ deep-learning-based data-driven methodologies for classification and generation.