
    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology for our analysis based on five dimensions:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment, or counterpoint)? For what destination and what use: to be performed by humans (a musical score) or by a machine (an audio file)?
    - Representation: What concepts are to be manipulated (e.g., waveform, spectrogram, note, chord, meter, or beat)? What format is to be used (e.g., MIDI, piano roll, or text)? How will the representation be encoded (e.g., scalar, one-hot, or many-hot)?
    - Architecture: What type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder, or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity, and creativity)?
    - Strategy: How do we model and control the generation process (e.g., single-step feedforward, iterative feedforward, sampling, or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy. The last section offers some discussion and prospects.
    Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
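    To make the Representation dimension concrete, here is a minimal sketch (not taken from the survey itself) of the one-hot versus many-hot encoding choice it lists, using pitch classes as an illustrative concept:

        import numpy as np

        # Map each pitch class to an index 0-11 (C=0, C#=1, ..., B=11).
        PITCH_CLASSES = 12

        def one_hot(pitch_class: int) -> np.ndarray:
            """One-hot: a single note is a vector with exactly one 1."""
            v = np.zeros(PITCH_CLASSES)
            v[pitch_class] = 1.0
            return v

        def many_hot(pitch_classes: list[int]) -> np.ndarray:
            """Many-hot: a chord sets one 1 per sounding pitch class."""
            v = np.zeros(PITCH_CLASSES)
            v[pitch_classes] = 1.0
            return v

        print(one_hot(0))           # a lone C
        print(many_hot([0, 4, 7]))  # a C major triad (C, E, G)

    A scalar encoding, by contrast, would store the pitch class as the single number itself; one-hot and many-hot trade compactness for representations that neural networks handle more easily.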

    Usability of Musical Digital Libraries: a Multimodal Analysis.

    There has been substantial research on technical aspects of musical digital libraries, but comparatively little on usability. We evaluated four web-accessible music libraries, focusing on features specific to music libraries, such as music retrieval mechanisms. Although the original focus of the work was on how modalities are combined within interactions with such libraries, that was not where the main difficulties were found: the libraries were generally well designed for the use of different modalities. The main challenges identified relate to the details of melody matching and to simplifying the choice of file format. These issues are discussed in detail.
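    The abstract does not detail the melody-matching mechanisms at issue, but a common approach in music retrieval, sketched hypothetically below, compares pitch intervals rather than absolute pitches, so a query matches regardless of the key in which a user hums it:

        def intervals(midi_pitches: list[int]) -> list[int]:
            """Successive pitch differences; transposition-invariant."""
            return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

        def matches(query: list[int], candidate: list[int]) -> bool:
            """True if the query's interval pattern occurs in the candidate."""
            q, c = intervals(query), intervals(candidate)
            return any(c[i:i + len(q)] == q for i in range(len(c) - len(q) + 1))

        # The "Frere Jacques" opening hummed a tone higher still matches.
        tune  = [60, 62, 64, 60, 60, 62, 64, 60]  # C D E C ...
        query = [62, 64, 66, 62]                  # same contour, transposed
        print(matches(query, tune))  # True

    The usability details the study points to (tolerance to wrong notes, rhythm, tempo) are exactly what such a simple exact-interval matcher glosses over.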

    From Words to Music: A Study of Subword Tokenization Techniques in Symbolic Music Generation

    Subword tokenization has been widely successful in text-based natural language processing (NLP) tasks with Transformer-based models. As Transformer models become increasingly popular in symbolic music-related studies, it is imperative to investigate the efficacy of subword tokenization in the symbolic music domain. In this paper, we explore subword tokenization techniques, such as byte-pair encoding (BPE), in symbolic music generation and their impact on the overall structure of the generated songs. Our experiments are based on three types of MIDI datasets: single-track melody only, multi-track with a single instrument, and multi-track with multiple instruments. We apply subword tokenization on top of existing music tokenization schemes and find that it enables the generation of longer songs in the same amount of time and improves the overall structure of the generated music in terms of objective metrics such as the structure indicator (SI) and Pitch Class Entropy. We also compare two subword tokenization methods, BPE and Unigram, and observe that both lead to consistent improvements. Our study suggests that subword tokenization is a promising technique for symbolic music generation and may have broader implications for music composition, particularly in cases involving complex data such as multi-track songs.
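    As an illustration of the core technique, the following is a minimal sketch of BPE merge learning applied on top of a music tokenization; the token names are invented for the example, and this is not the paper's exact pipeline:

        from collections import Counter

        def merge_pair(seq: list[str], a: str, b: str) -> list[str]:
            """Replace every adjacent occurrence of (a, b) with one fused token."""
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(a + "+" + b)  # fused super-token
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            return out

        def learn_bpe(seqs: list[list[str]], num_merges: int) -> list[tuple[str, str]]:
            """Learn BPE merges over already-tokenized music sequences, exactly as
            in text BPE: repeatedly fuse the most frequent adjacent token pair."""
            merges = []
            for _ in range(num_merges):
                pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
                if not pairs:
                    break
                (a, b), _ = pairs.most_common(1)[0]
                merges.append((a, b))
                seqs = [merge_pair(s, a, b) for s in seqs]
            return merges

        # A recurring Pitch->Duration pattern is fused into one token, shortening
        # sequences so a fixed-length Transformer context spans more music.
        song = [["Pitch_60", "Dur_4", "Pitch_62", "Dur_4", "Pitch_60", "Dur_4"]]
        print(learn_bpe(song, 2))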

    GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

    Symbolic music generation aims to create musical notes, which can help users compose music, for example by generating target instrument tracks based on provided source tracks. In practical scenarios where there is a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However, previous efforts have fallen short of addressing this need due to limitations in their music representations and models. In this paper, we introduce a framework known as GETMusic, with ``GET'' standing for ``GEnerate music Tracks.'' This framework encompasses a novel music representation, ``GETScore,'' and a diffusion model, ``GETDiff.'' GETScore represents musical notes as tokens and organizes them in a 2D structure, with tracks stacked vertically and progressing horizontally over time. At each training step, each track of a music piece is randomly selected as either target or source. Training involves two processes: in the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as ground truth; in the denoising process, GETDiff is trained to predict the masked target tokens conditioned on the source tracks. The proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with arbitrary source-target track combinations. Our experiments demonstrate that the versatile GETMusic outperforms prior works designed for specific composition tasks.
    Comment: 13 pages, 4 figures
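    A toy sketch of the forward (masking) process described above, assuming integer token ids in a tracks-by-time grid; the names and mask id are illustrative, not GETMusic's actual implementation:

        import numpy as np

        MASK = -1  # illustrative mask id; the denoiser is trained to recover these

        def corrupt(score: np.ndarray, rng: np.random.Generator):
            """One training step on a GETScore-like grid: rows are tracks,
            columns are time steps. A random nonempty subset of tracks becomes
            the target and has its tokens masked; the remaining tracks are
            kept intact as the source condition."""
            num_tracks = score.shape[0]
            target = rng.random(num_tracks) < 0.5
            if not target.any():
                target[rng.integers(num_tracks)] = True
            corrupted = score.copy()
            corrupted[target] = MASK
            return corrupted, target

        rng = np.random.default_rng(0)
        score = np.arange(12).reshape(3, 4)  # 3 tracks x 4 time steps of token ids
        corrupted, target = corrupt(score, rng)
        print(target)     # which tracks the model must denoise
        print(corrupted)  # masked targets, untouched sources

    Because any subset of tracks can be masked during training, a single model can serve any source-target combination at inference, which is the flexibility the abstract claims.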

    Data-based melody generation through multi-objective evolutionary computation

    Genetic-based composition algorithms can explore an immense space of possibilities, but the main difficulty has always been the implementation of the selection process. In this work, sets of melodies are used to train a machine learning approach that computes fitness from several metrics. The fitness of a candidate is obtained by combining these metrics, but their values can range over different orders of magnitude and evolve in different ways, which makes the criteria hard to combine. To solve this problem, a multi-objective fitness approach is proposed, in which the best individuals are those on the Pareto front of the multi-dimensional fitness space. Melodic trees are also proposed as a data structure for the chromosomal representation of melodies, and the genetic operators are adapted to them. Experiments have been carried out using a graphical interface prototype that allows one to explore the creative capabilities of the proposed system. An Online Supplement is available at http://dx.doi.org/10.1080/17459737.2016.1188171, where the reader can find technical details, information about the data used, generated melodies, and additional information about the developed prototype and its performance.
    This work was supported by the Spanish Ministerio de Educación, Cultura y Deporte [FPU fellowship AP2012-0939] and by the Spanish Ministerio de Economía y Competitividad project TIMuL, supported by EU FEDER funds [No. TIN2013–48152–C2–1–R].
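    As a concrete illustration of the selection criterion, here is a minimal sketch of Pareto-front filtering, assuming all fitness metrics are to be maximized; the paper's actual metrics and genetic operators are not reproduced here:

        def dominates(a: tuple, b: tuple) -> bool:
            """a dominates b if it is no worse on every metric and strictly
            better on at least one (all metrics maximized)."""
            return (all(x >= y for x, y in zip(a, b))
                    and any(x > y for x, y in zip(a, b)))

        def pareto_front(population: list[tuple]) -> list[tuple]:
            """Keep the non-dominated individuals: the Pareto front used for
            selection instead of collapsing metrics into one weighted score."""
            return [p for p in population
                    if not any(dominates(q, p) for q in population)]

        # Three fitness metrics per melody, possibly on different scales.
        melodies = [(0.9, 0.2, 0.5), (0.6, 0.6, 0.6), (0.5, 0.1, 0.4)]
        print(pareto_front(melodies))  # the third is dominated by the second

    Selecting by dominance sidesteps the scaling problem the abstract describes: no weights are needed, so metrics with different orders of magnitude never have to be summed.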