ReLyMe: Improving Lyric-to-Melody Generation by Incorporating Lyric-Melody Relationships
Lyric-to-melody generation, which composes a melody for given lyrics,
is one of the most important automatic music composition tasks. With the rapid
development of deep learning, previous works have addressed this task with
end-to-end neural network models. However, such models struggle to capture the
strict but subtle relationships between lyrics and melodies, which compromises
the harmony between the lyrics and the generated melodies. In this paper, we propose
ReLyMe, a method that incorporates Relationships between Lyrics and Melodies
from music theory to ensure harmony between lyrics and melodies.
Specifically, we first introduce several principles that lyrics and melodies
should follow in terms of tone, rhythm, and structure relationships. These
principles are then integrated into neural network lyric-to-melody models by
adding corresponding constraints during the decoding process to improve the
harmony between lyrics and melodies. We use a series of objective and
subjective metrics to evaluate the generated melodies. Experiments on both
English and Chinese song datasets show the effectiveness of ReLyMe,
demonstrating the superiority of incorporating lyric-melody relationships from
the music domain into neural lyric-to-melody generation.
Comment: Accepted by ACMMM 2022, ora
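As a rough illustration of such decoding-time constraints, the sketch below re-ranks a model's candidate notes with a hand-written music-theory penalty. The penalty terms, weight, and tone convention are illustrative assumptions, not ReLyMe's actual scoring rules.

```python
import math

def constrained_decode_step(candidates, prev_pitch, tone_direction, weight=0.5):
    """Pick the next note by combining the model's log-probability with a
    music-theory penalty (terms and weight are illustrative assumptions).

    candidates: list of (midi_pitch, log_prob) pairs from the neural model.
    prev_pitch: MIDI pitch of the previously generated note.
    tone_direction: +1 if the lyric tone suggests rising pitch, -1 falling, 0 neutral.
    """
    def theory_penalty(pitch):
        penalty = 0.0
        # Tone relationship: penalize melodic motion against the lyric tone.
        if tone_direction != 0 and (pitch - prev_pitch) * tone_direction < 0:
            penalty += 1.0
        # Structure relationship: discourage leaps larger than an octave.
        if abs(pitch - prev_pitch) > 12:
            penalty += 1.0
        return penalty

    scored = [(pitch, lp - weight * theory_penalty(pitch)) for pitch, lp in candidates]
    return max(scored, key=lambda s: s[1])[0]

# The model slightly prefers the falling note (58), but a rising lyric
# tone flips the choice to 62.
best = constrained_decode_step(
    [(62, math.log(0.45)), (58, math.log(0.55))], prev_pitch=60, tone_direction=+1
)
```

In a real system the same re-scoring would be applied at every step of beam search rather than to a single greedy choice.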
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
We present SongComposer, an innovative LLM designed for song composition. It
can understand and generate melodies and lyrics in symbolic song
representations by leveraging the capabilities of LLMs. Existing music-related
LLMs treat music as quantized audio signals; such implicit encoding is
inefficient and offers poor flexibility. In contrast, we resort to
symbolic song representation, the mature and efficient format humans designed for
music, and enable the LLM to explicitly compose songs like humans. In practice, we
design a novel tuple format for the lyric and three note attributes (pitch,
duration, and rest duration) in the melody, which guarantees that the LLM
correctly understands musical symbols and realizes precise alignment between lyrics
and melody. To impart basic music understanding to the LLM, we carefully collected
SongCompose-PT, a large-scale song pretraining dataset that includes lyrics,
melodies, and paired lyrics-melodies in either Chinese or English. After
adequate pre-training, 10K carefully crafted QA pairs are used to equip the
LLM with instruction-following capability and to solve diverse tasks. In
extensive experiments, SongComposer demonstrates superior performance in
lyric-to-melody generation, melody-to-lyric generation, song continuation, and
text-to-song creation, outperforming advanced LLMs such as GPT-4.
Comment: project page: https://pjlab-songcomposer.github.io/ code: https://github.com/pjlab-songcomposer/songcompose
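A lyric-note tuple serialization of the kind described above might look something like the sketch below; the delimiters and field order are invented for illustration, and the paper's exact format may differ.

```python
def format_song_line(syllables, notes):
    """Pair each syllable with a (pitch, duration, rest-duration) tuple in a
    flat text format an LLM could consume. The "<...>" delimiters and field
    order are illustrative assumptions, not the paper's actual serialization.
    """
    if len(syllables) != len(notes):
        raise ValueError("each syllable needs exactly one note tuple")
    return " | ".join(
        f"{syl} <{pitch},{dur},{rest}>"
        for syl, (pitch, dur, rest) in zip(syllables, notes)
    )

line = format_song_line(
    ["twin", "kle"],
    [("C4", 0.5, 0.0), ("C4", 0.5, 0.0)],
)
# → "twin <C4,0.5,0.0> | kle <C4,0.5,0.0>"
```

The point of the explicit tuple is that every syllable carries its own pitch and timing fields, so lyric-melody alignment is encoded in the token stream itself rather than inferred.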
Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective:
- What musical content is to be generated? Examples are: melody, polyphony, accompaniment, or counterpoint.
- For what destination and for what use? To be performed by human(s) (in the case of a musical score) or by a machine (in the case of an audio file).
Representation:
- What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter, and beat.
- What format is to be used? Examples are: MIDI, piano roll, or text.
- How will the representation be encoded? Examples are: scalar, one-hot, or many-hot.
Architecture:
- What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder, or generative adversarial network.
Challenge:
- What are the limitations and open challenges? Examples are: variability, interactivity, and creativity.
Strategy:
- How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling, or input manipulation.
For each dimension, we conduct a comparative analysis of various models and
techniques, and we propose a tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning-based
systems for music generation selected from the relevant literature. These
systems are described and used to exemplify the various choices of
objective, representation, architecture, challenge, and strategy. The last
section includes some discussion and prospects.
Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
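The encoding question above (scalar, one-hot, or many-hot) can be made concrete with a small sketch over MIDI pitch numbers; the 128-dimension vector size follows the usual MIDI pitch range and is assumed here for illustration.

```python
def one_hot(pitch, size=128):
    """One-hot: a single melody note as a 128-dim MIDI-pitch indicator vector."""
    v = [0] * size
    v[pitch] = 1
    return v

def many_hot(pitches, size=128):
    """Many-hot: several simultaneous pitches (a chord) in one vector."""
    v = [0] * size
    for p in pitches:
        v[p] = 1
    return v

melody_step = one_hot(60)            # middle C
chord_step = many_hot([60, 64, 67])  # C major triad
```

A scalar encoding would instead store the pitch number 60 directly; one-hot and many-hot trade compactness for a representation that is easier for classification-style network outputs.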
A Study of Music: Music Psychology, Music Therapy, and Worship Music
There are three specific fields related to music: the psychology of music and how it affects the human brain and its functions; the methodology of music therapy and how it affects individuals undergoing treatment; and the psychological effects of worship music and how it can be used in music therapy. Music therapy is a growing field in which the therapeutic outcomes greatly benefit patients. The overall purpose is to create a greater understanding of music and music therapy in order to provide a system for introducing group worship services into music therapy and ultimately bring spiritual healing to individuals.
SongMASS: Automatic Song Writing with Pre-training and Alignment Constraint
Automatic song writing aims to compose a song (lyric and/or melody) by
machine, which is an interesting topic in both academia and industry. In
automatic song writing, lyric-to-melody generation and melody-to-lyric
generation are two important tasks, both of which usually suffer from the
following challenges: 1) paired lyric and melody data are limited, which
hurts the generation quality of both tasks, since a large amount of paired
training data is needed due to the weak correlation between lyric and melody;
2) strict alignments are required between lyric and melody, which rely on
specific alignment modeling. In this paper, we propose SongMASS to address the
above challenges, which leverages masked sequence to sequence (MASS)
pre-training and attention based alignment modeling for lyric-to-melody and
melody-to-lyric generation. Specifically, 1) we extend the original
sentence-level MASS pre-training to song level to better capture long
contextual information in music, and use a separate encoder and decoder for
each modality (lyric or melody); 2) we leverage sentence-level attention mask
and token-level attention constraint during training to enhance the alignment
between lyric and melody. During inference, we use a dynamic programming
strategy to obtain the alignment between each word/syllable in lyric and note
in melody. We pre-train SongMASS on unpaired lyric and melody datasets, and
both objective and subjective evaluations demonstrate that SongMASS generates
lyric and melody with significantly better quality than the baseline method
without pre-training or alignment constraint.
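A dynamic-programming alignment of the kind described above can be sketched as follows. The cost matrix stands in for whatever attention-derived scores the trained model would supply, and the monotone, one-note-minimum segmentation rule is an assumption for illustration, not SongMASS's exact algorithm.

```python
def align(cost):
    """Monotonic DP alignment: assign each syllable (row) a run of one or
    more consecutive notes (columns), minimizing the total cost.

    cost[i][j] is the (illustrative) mismatch cost of pairing syllable i
    with note j; in practice it would come from the model's attention.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    # dp[i][j]: min cost of aligning the first i syllables to the first j notes.
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(i, m + 1):  # at least one note per syllable
            for k in range(i - 1, j):  # syllable i-1 takes notes k..j-1
                c = dp[i - 1][k] + sum(cost[i - 1][k:j])
                if c < dp[i][j]:
                    dp[i][j] = c
                    back[i][j] = k
    # Recover the segmentation by backtracking.
    spans, j = [], m
    for i in range(n, 0, -1):
        k = back[i][j]
        spans.append((i - 1, list(range(k, j))))
        j = k
    return list(reversed(spans))

# Two syllables, three notes: the second syllable absorbs the melisma.
spans = align([[0, 5, 5],
               [5, 0, 0]])
# → [(0, [0]), (1, [1, 2])]
```

The cubic loop is fine for sentence-length inputs; longer songs would typically restrict the inner search to a band around the diagonal.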