Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective - What musical content is to be generated? Examples are: melody,
polyphony, accompaniment or counterpoint. - For what destination and for what
use? To be performed by humans (in the case of a musical score) or by a
machine (in the case of an audio file).
Representation - What are the concepts to be manipulated? Examples are:
waveform, spectrogram, note, chord, meter and beat. - What format is to be
used? Examples are: MIDI, piano roll or text. - How will the representation be
encoded? Examples are: scalar, one-hot or many-hot.
Architecture - What type(s) of deep neural network is (are) to be used?
Examples are: feedforward network, recurrent network, autoencoder or generative
adversarial networks.
Challenge - What are the limitations and open challenges? Examples are:
variability, interactivity and creativity.
Strategy - How do we model and control the process of generation? Examples
are: single-step feedforward, iterative feedforward, sampling or input
manipulation.
For each dimension, we conduct a comparative analysis of various models and
techniques and propose a tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning
based systems for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
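The representation questions above (format and encoding) are concrete enough to illustrate directly. Below is a minimal, hypothetical Python/NumPy sketch, not taken from the survey, of a one-hot encoding (a single melody note) versus a many-hot encoding (a chord, i.e. one time slice of a piano roll) over the standard 128-pitch MIDI range; the function names are invented for illustration.

```python
# Hypothetical illustration of the encoding choices discussed in the survey's
# "Representation" dimension: one-hot for a single note, many-hot for a chord.
import numpy as np

NUM_PITCHES = 128  # standard MIDI pitch range 0-127

def one_hot(pitch):
    """One-hot encoding: exactly one active entry, e.g. a single melody note."""
    vec = np.zeros(NUM_PITCHES, dtype=np.float32)
    vec[pitch] = 1.0
    return vec

def many_hot(pitches):
    """Many-hot encoding: several active entries, e.g. a chord in a piano-roll slice."""
    vec = np.zeros(NUM_PITCHES, dtype=np.float32)
    vec[pitches] = 1.0
    return vec

melody_step = one_hot(60)             # middle C as a single note
chord_step = many_hot([60, 64, 67])   # C major triad
piano_roll = np.stack([melody_step, chord_step])  # time steps x pitches
print(piano_roll.shape)               # (2, 128)
```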
Usability of Musical Digital Libraries: a Multimodal Analysis.
There has been substantial research on technical aspects of musical digital libraries, but comparatively little on usability aspects. We have evaluated four web-accessible music libraries, focusing particularly on features specific to music libraries, such as music retrieval mechanisms. Although the original focus of the work was on how modalities are combined within the interactions with such libraries, that was not where the main difficulties were found. The libraries were generally well designed for the use of different modalities. The main challenges identified relate to the details of melody matching and to simplifying the choices of file format. These issues are discussed in detail.
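The melody-matching difficulty identified above typically comes down to how a query (for example, a hummed or typed-in tune) is compared against catalogue entries. The sketch below is purely illustrative and reflects my own assumptions rather than the mechanism of any of the four evaluated libraries: it reduces melodies to a Parsons-style up/down/repeat contour and ranks matches by edit distance.

```python
# Toy melody matching: compare pitch contours with edit distance.
def contour(pitches):
    """Reduce a pitch sequence to up/down/repeat symbols (Parsons-style contour)."""
    return ["U" if b > a else "D" if b < a else "R"
            for a, b in zip(pitches, pitches[1:])]

def edit_distance(a, b):
    """Classic dynamic-programming edit distance between two symbol sequences."""
    dp = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
          for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return dp[len(a)][len(b)]

query = contour([60, 62, 64, 62])       # user query, e.g. from a hummed melody
stored = contour([60, 62, 64, 65, 62])  # a catalogue entry
print(edit_distance(query, stored))     # lower score = closer match
```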
From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation
Subword tokenization has been widely successful in text-based natural
language processing (NLP) tasks with Transformer-based models. As Transformer
models become increasingly popular in symbolic music-related studies, it is
imperative to investigate the efficacy of subword tokenization in the symbolic
music domain. In this paper, we explore subword tokenization techniques, such
as byte-pair encoding (BPE), in symbolic music generation and their impact on
the overall structure of generated songs. Our experiments are based on three
types of MIDI datasets: single-track melody only, multi-track with a single
instrument, and multi-track with multiple instruments. We apply subword
tokenization on top of musical tokenization schemes and find that it enables
the generation of longer songs in the same amount of time and improves the
overall structure of the generated music in terms of objective metrics such as
the structure indicator (SI) and pitch class entropy. We also compare two
subword tokenization methods, BPE and Unigram, and observe that both methods
lead to consistent improvements. Our study suggests that subword tokenization
is a promising technique for symbolic music generation and may have broader
implications for music composition, particularly in cases involving complex
data such as multi-track songs.
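To make the technique concrete, here is a minimal byte-pair-encoding sketch over a toy stream of symbolic-music tokens. The token names and the tiny corpus are invented, and the paper applies BPE on top of its actual MIDI tokenization schemes rather than this simplified merge loop.

```python
# Minimal BPE sketch: repeatedly merge the most frequent adjacent token pair.
from collections import Counter

def most_frequent_pair(seq):
    """Count adjacent token pairs and return the most frequent one."""
    pairs = Counter(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "+" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

tokens = ["Pitch_60", "Dur_4", "Pitch_62", "Dur_4", "Pitch_60", "Dur_4"]
for _ in range(2):  # learn two merges on this toy corpus
    tokens = merge_pair(tokens, most_frequent_pair(tokens))
print(tokens)  # frequently co-occurring note/duration tokens become single subwords
```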
GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework
Symbolic music generation aims to create musical notes, which can help users
compose music, such as generating target instrument tracks based on provided
source tracks. In practical scenarios where there's a predefined ensemble of
tracks and various composition needs, an efficient and effective generative
model that can generate any target tracks based on the other tracks becomes
crucial. However, previous efforts have fallen short in addressing this
necessity due to limitations in their music representations and models. In this
paper, we introduce a framework known as GETMusic, with "GET" standing for
"GEnerate music Tracks." This framework encompasses a novel music
representation "GETScore" and a diffusion model "GETDiff." GETScore
represents musical notes as tokens and organizes tokens in a 2D structure, with
tracks stacked vertically and progressing horizontally over time. At a training
step, each track of a music piece is randomly selected as either the target or
source. The training involves two processes: In the forward process, target
tracks are corrupted by masking their tokens, while source tracks remain as the
ground truth; in the denoising process, GETDiff is trained to predict the
masked target tokens conditioning on the source tracks. Our proposed
representation, coupled with the non-autoregressive generative model, empowers
GETMusic to generate music with arbitrary source-target track combinations.
Our experiments demonstrate that the versatile GETMusic outperforms prior works
proposed for specific composition tasks.
Comment: 13 pages, 4 figures
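A rough sketch of the masking-based corruption step described in the abstract, under assumed shapes and token ids (the actual GETScore layout and the GETDiff model are more elaborate than this):

```python
# Assumed setup: token ids arranged as a (tracks x time) grid; one track is
# picked as the target and masked, the others stay intact as the source.
import numpy as np

MASK_ID = 0  # hypothetical reserved token id for masked positions
rng = np.random.default_rng(0)

score = rng.integers(1, 500, size=(4, 32))       # 4 tracks x 32 time steps of token ids
target_track = rng.integers(0, score.shape[0])   # randomly choose the track to generate

corrupted = score.copy()
corrupted[target_track, :] = MASK_ID             # forward process: mask the target track

# A denoising model (GETDiff in the paper) would then be trained to predict the
# original tokens at the masked positions, conditioned on the unmasked source tracks.
targets = score[target_track]                    # ground-truth tokens to recover
print(corrupted.shape, targets.shape)            # (4, 32) (32,)
```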
Data-based melody generation through multi-objective evolutionary computation
Genetic-based composition algorithms are able to explore an immense space of possibilities, but the main difficulty has always been the implementation of the selection process. In this work, sets of melodies are utilized for training a machine learning approach to compute fitness, based on different metrics. The fitness of a candidate is provided by combining the metrics, but their values can range through different orders of magnitude and evolve in different ways, which makes it hard to combine these criteria. In order to solve this problem, a multi-objective fitness approach is proposed, in which the best individuals are those in the Pareto front of the multi-dimensional fitness space. Melodic trees are also proposed as a data structure for the chromosomal representation of melodies, and the genetic operators are adapted to them. Some experiments have been carried out using a graphical interface prototype that allows one to explore the creative capabilities of the proposed system. An Online Supplement is provided and can be accessed at http://dx.doi.org/10.1080/17459737.2016.1188171, where the reader can find some technical details, information about the data used, generated melodies, and additional information about the developed prototype and its performance.
This work was supported by the Spanish Ministerio de Educación, Cultura y Deporte [FPU fellowship AP2012-0939]; and the Spanish Ministerio de Economía y Competitividad project TIMuL supported by UE FEDER funds [No. TIN2013-48152-C2-1-R].
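The Pareto-front selection idea is straightforward to illustrate. The sketch below is my own illustration under simplified assumptions (two invented metrics, higher is better), not the authors' implementation: a candidate melody survives selection if no other candidate is at least as good on every metric and strictly better on at least one.

```python
# Keep the candidates on the Pareto front of a multi-dimensional fitness space.
def dominates(a, b):
    """a dominates b if it is at least as good on every metric and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population):
    """Return the candidates not dominated by any other candidate."""
    return [p for p in population
            if not any(dominates(q["fitness"], p["fitness"])
                       for q in population if q is not p)]

# Hypothetical melodies scored on two metrics living on different scales.
population = [
    {"melody": "A", "fitness": (0.9, 120.0)},
    {"melody": "B", "fitness": (0.7, 300.0)},
    {"melody": "C", "fitness": (0.6, 110.0)},  # dominated by A
]
print([p["melody"] for p in pareto_front(population)])  # ['A', 'B']
```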