Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective
- What musical content is to be generated? Examples are: melody, polyphony,
  accompaniment or counterpoint.
- For what destination and for what use? To be performed by one or more humans
  (in the case of a musical score) or by a machine (in the case of an audio
  file).
Representation
- What are the concepts to be manipulated? Examples are: waveform, spectrogram,
  note, chord, meter and beat.
- What format is to be used? Examples are: MIDI, piano roll or text.
- How will the representation be encoded? Examples are: scalar, one-hot or
  many-hot.
Architecture
- What type(s) of deep neural network are to be used? Examples are:
  feedforward network, recurrent network, autoencoder or generative adversarial
  network.
Challenge
- What are the limitations and open challenges? Examples are: variability,
  interactivity and creativity.
Strategy
- How do we model and control the process of generation? Examples are:
  single-step feedforward, iterative feedforward, sampling or input
  manipulation.
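To make the Representation dimension concrete, here is a minimal sketch of the "one-hot" and "many-hot" encodings it names, plus a piano roll built by stacking many-hot vectors over time. The one-octave pitch range (MIDI 60 to 71) and all function names are assumptions chosen for illustration; they are not taken from the survey.

```python
# Hypothetical illustration of one-hot vs. many-hot note encodings.
# The pitch vocabulary (MIDI 60-71, one octave from middle C) is an assumption.
from typing import Iterable, List

LOW_PITCH = 60   # assumed lowest MIDI pitch in the vocabulary (C4)
HIGH_PITCH = 71  # assumed highest MIDI pitch in the vocabulary (B4)
SIZE = HIGH_PITCH - LOW_PITCH + 1  # 12 positions


def one_hot(pitch: int) -> List[int]:
    """Encode a single note (a monophonic time step) as a one-hot vector."""
    if not LOW_PITCH <= pitch <= HIGH_PITCH:
        raise ValueError(f"pitch {pitch} outside assumed range")
    vec = [0] * SIZE
    vec[pitch - LOW_PITCH] = 1
    return vec


def many_hot(pitches: Iterable[int]) -> List[int]:
    """Encode simultaneous notes (e.g. a chord) as a many-hot vector."""
    vec = [0] * SIZE
    for p in pitches:
        if not LOW_PITCH <= p <= HIGH_PITCH:
            raise ValueError(f"pitch {p} outside assumed range")
        vec[p - LOW_PITCH] = 1
    return vec


def piano_roll(steps: List[Iterable[int]]) -> List[List[int]]:
    """Stack many-hot vectors over time: one row per time step."""
    return [many_hot(step) for step in steps]


if __name__ == "__main__":
    print(one_hot(64))             # E4 alone -> [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(many_hot([60, 64, 67]))  # C major triad -> [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

A one-hot vector can only express one active note per step, which suits monophonic melody; polyphony needs the many-hot form, where several positions may be set at once.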
For each dimension, we conduct a comparative analysis of various models and
techniques and we propose a tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing
deep-learning-based systems for music generation selected from the relevant
literature. These systems are described and used to exemplify the various
choices of objective, representation, architecture, challenge and strategy.
The last section includes a discussion and some prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P.
Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music
Generation, Computational Synthesis and Creative Systems, Springer, 201
MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer
We introduce MIDI-VAE, a neural network model based on Variational
Autoencoders that is capable of handling polyphonic music with multiple
instrument tracks, as well as modeling the dynamics of music by incorporating
note durations and velocities. We show that MIDI-VAE can perform style transfer
on symbolic music by automatically changing pitches, dynamics and instruments
of a music piece from, e.g., a Classical to a Jazz style. We evaluate the
efficacy of the style transfer by training separate style validation
classifiers. Our model can also interpolate between short pieces of music,
produce medleys and create mixtures of entire songs. The interpolations
smoothly change pitches, dynamics and instrumentation to create a harmonic
bridge between two music pieces. To the best of our knowledge, this work
represents the first successful attempt at applying neural style transfer to
complete musical compositions.
Comment: Paper accepted at the 19th International Society for Music
Information Retrieval Conference, ISMIR 2018, Paris, France
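Interpolating between pieces with a VAE generally means encoding both pieces to latent vectors, walking between those vectors, and decoding each intermediate point. The sketch below shows only the generic linear-interpolation step under that assumption; the encoder and decoder are omitted, `z_a` and `z_b` stand in for hypothetical encoder outputs, and nothing here reproduces MIDI-VAE's actual code or its exact interpolation scheme.

```python
# Generic linear interpolation between two latent vectors (a sketch, not MIDI-VAE).
# z_a and z_b are hypothetical encoder outputs; decoding each point is not shown.
from typing import List, Sequence

Vector = List[float]


def lerp(z_a: Sequence[float], z_b: Sequence[float], t: float) -> Vector:
    """Return (1 - t) * z_a + t * z_b, element-wise."""
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a: Sequence[float], z_b: Sequence[float],
                       steps: int) -> List[Vector]:
    """Evenly spaced latent points from z_a (t = 0) to z_b (t = 1), inclusive."""
    if steps < 2:
        raise ValueError("need at least two steps to include both endpoints")
    return [lerp(z_a, z_b, i / (steps - 1)) for i in range(steps)]
```

For example, `interpolation_path([0.0, 2.0], [1.0, 0.0], 3)` yields `[[0.0, 2.0], [0.5, 1.0], [1.0, 0.0]]`; in a full system each of these points would be passed to the decoder to produce one bar or piece of the "bridge".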
Toward Interactive Music Generation: A Position Paper
Music generation using deep learning has received considerable attention in recent years. Researchers have developed various generative models capable of imitating musical conventions, learning from musical corpora, and generating new samples based on what they have learned. Although the samples generated by these models are persuasive, they often lack musical structure and creativity. For instance, a vanilla end-to-end approach, which deals with all levels of music representation at once, does not allow human control and interaction during the learning process, leading to constrained results. Music creation, by contrast, is an iterative process in which a musician follows certain principles, reusing or adapting various musical features. Moreover, a musical piece adheres to a musical style, which breaks down into distinct concepts of timbre style, performance style and composition style, together with the coherence between these aspects. Here, we study and analyze the current advances in music generation using deep learning models according to different criteria. We discuss the shortcomings and limitations of these models regarding interactivity and adaptability. Finally, we outline potential future research directions, involving multi-agent systems and reinforcement learning algorithms, to alleviate these shortcomings and limitations.
VGM-RNN: Recurrent Neural Networks for Video Game Music Generation
The recent explosion of interest in deep neural networks has affected, and in some cases reinvigorated, work in fields as diverse as natural language processing, image recognition, speech recognition and many others. For sequence learning tasks, recurrent neural networks, and in particular LSTM-based networks, have shown promising results. Recently there has been interest, for example in the research by Google’s Magenta team, in applying so-called “language modeling” recurrent neural networks to musical tasks, including the automatic generation of original music. In this work we demonstrate our own LSTM-based music language modeling recurrent network. We show that it is able to learn musical features from a MIDI dataset and generate output that is musically interesting while demonstrating features of melody, harmony and rhythm. We source our dataset from VGMusic.com, a collection of user-submitted MIDI transcriptions of video game songs, and attempt to generate output that emulates this kind of music.
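A language-modeling network such as the LSTM described above outputs a score (logit) for each possible next token, and generation proceeds by repeatedly sampling from that distribution. The sketch below shows temperature-controlled sampling, one common way to do this; the logits and function names are made up for illustration, and the abstract does not say which sampling scheme VGM-RNN uses.

```python
# Temperature-controlled sampling of the next token (e.g. a note) from logits.
# Illustrative only: in practice the logits would come from the trained LSTM.
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(logits: Sequence[float], temperature: float = 1.0,
                rng: Optional[random.Random] = None) -> int:
    """Draw the index of the next token from the tempered distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Temperatures below 1 make the output more conservative (closer to always picking the most likely note), while temperatures above 1 flatten the distribution and increase variety at the risk of less coherent music. Feeding each sampled token back as the next input gives the iterative generation loop.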
A Review of Intelligent Music Generation Systems
With the introduction of ChatGPT, the public's perception of AI-generated
content (AIGC) has begun to change. Artificial intelligence has significantly
lowered the barrier to entry for non-professionals in creative endeavors and
increased the efficiency of content creation. Recent advances have brought
significant improvements in the quality of symbolic music generation, enabled
by modern generative algorithms that extract the patterns implicit in music
from rule constraints or a musical corpus. Nevertheless, existing literature
reviews tend to present a conventional and conservative perspective on future
development trajectories, and notably lack thorough benchmarking of
generative models. This paper provides a survey and analysis of recent
intelligent music generation techniques, outlining their respective
characteristics and discussing existing evaluation methods. Additionally, the
paper compares the characteristics of music generation techniques in the East
and the West and analyses the field's development prospects.
Music Generation by Deep Learning - Challenges and Directions
In addition to traditional tasks such as prediction, classification and
translation, deep learning is receiving growing attention as an approach for
music generation, as witnessed by recent research groups such as Magenta at
Google and CTRL (Creator Technology Research Lab) at Spotify. The motivation is
to use the capacity of deep learning architectures and training techniques to
automatically learn musical styles from arbitrary musical corpora and then to
generate samples from the estimated distribution. However, a direct application
of deep learning to content generation quickly reaches its limits, as the
generated content tends to mimic the training set without exhibiting true
creativity. Moreover, deep learning architectures do not offer direct ways of
controlling generation (e.g., imposing a tonality or other arbitrary
constraints). Furthermore, deep learning architectures alone are purely
autonomous automata that generate music without human interaction, far from the
objective of interactively assisting musicians to compose and refine music.
Issues such as control, structure, creativity and interactivity are the focus
of our analysis. In this paper, we select some limitations of a direct
application of deep learning to music generation, analyze why these issues are
not resolved, and discuss possible approaches to address them. Various recent
systems are cited as examples of promising directions.
Comment: 17 pages. arXiv admin note: substantial text overlap with
arXiv:1709.01620. Accepted for publication in Special Issue on Deep learning
for music and audio, Neural Computing & Applications, Springer Nature, 201
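The abstract notes that deep learning architectures offer no direct way to impose a tonality. One simple workaround, shown here as an illustrative sketch rather than a method the paper prescribes, is to zero out the probability of out-of-key pitches in the model's next-note distribution and renormalize before sampling. The major-scale mask, the pitch-class convention (0 = C) and all names are assumptions.

```python
# Imposing a tonality constraint at sampling time by masking out-of-key pitches.
# Illustrative technique only; names and the major-scale choice are assumptions.
from typing import List, Sequence

MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale


def in_major_key(midi_pitch: int, tonic_pitch_class: int) -> bool:
    """True if the pitch belongs to the major scale on the given tonic (0 = C)."""
    return (midi_pitch - tonic_pitch_class) % 12 in MAJOR_SCALE_STEPS


def constrain_to_key(probs: Sequence[float], pitches: Sequence[int],
                     tonic_pitch_class: int) -> List[float]:
    """Mask out-of-key pitches and renormalize the remaining probability mass."""
    if len(probs) != len(pitches):
        raise ValueError("probs and pitches must align")
    masked = [p if in_major_key(pitch, tonic_pitch_class) else 0.0
              for p, pitch in zip(probs, pitches)]
    total = sum(masked)
    if total == 0.0:
        raise ValueError("no in-key pitch has nonzero probability")
    return [m / total for m in masked]
```

With a uniform distribution over MIDI pitches 60 to 63 and a C major tonic, `constrain_to_key([0.25] * 4, [60, 61, 62, 63], 0)` returns `[0.5, 0.0, 0.5, 0.0]`. Because the mask is applied after the model, it enforces the constraint without changing what the network has learned, which is why such post-hoc tricks only partly address the control issue raised above.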