Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task
Benefiting from large-scale datasets and pre-trained models, the field of
generative models has recently gained significant momentum. However, most
datasets for symbolic music are very small, which potentially limits the
performance of data-driven multimodal models. An intuitive solution to this
problem is to leverage pre-trained models from other modalities (e.g., natural
language) to improve the performance of symbolic music-related multimodal
tasks. In this paper, we carry out the first study of generating complete and
semantically consistent symbolic music scores from text descriptions, and
explore the efficacy of using publicly available checkpoints (i.e., BERT,
GPT-2, and BART) for natural language processing in the task of text-to-music
generation. Our experimental results show that the improvement from using
pre-trained checkpoints is statistically significant in terms of BLEU score and
edit distance similarity. We analyse the capabilities and limitations of our
model to better understand the potential of language-music models.
Comment: 5 pages, 2 figures, 2 tables
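The abstract reports gains in BLEU score and edit distance similarity but does not spell out the latter's exact formulation. A common normalization, used here as an assumed sketch, is one minus the Levenshtein distance divided by the longer sequence's length:

```python
def edit_distance_similarity(a: str, b: str) -> float:
    """Levenshtein-based similarity in [0, 1]: 1 - distance / max(len)."""
    m, n = len(a), len(b)
    if max(m, n) == 0:
        return 1.0
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return 1.0 - prev[n] / max(m, n)
```

Applied to token sequences instead of characters, the same formula measures how closely a generated score matches a reference.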
Gamma Sampling: Fine-grained Controlling Language Models without Training
The dominant approaches to controlling language models excel at steering
high-level attributes (e.g., topic and sentiment), but they often require
condition-specific data or are computationally expensive.
We propose a new simple guided decoding method, Gamma Sampling, which does not
require any training data to achieve fine-grained controllable text generation
while maintaining a fast generation speed. Gamma Sampling introduces
attribute-related information (provided by humans or language models
themselves) into the sampling process to guide language models to generate
texts with desired attributes. Since no training is involved, Gamma Sampling
can be easily applied to any language model for controllable text generation.
Through experiments, we show that Gamma Sampling-steered GPT2-small (117M)
outperforms baselines such as PPLM (345M) and CTRL (1.6B) in diversity,
attribute relevance, and overall quality of generated samples.
Comment: 20 pages, 5 figures
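The abstract describes injecting attribute-related information into the sampling step itself. The paper's exact update rule is not given here, so the following is a minimal sketch of the general idea (the function name and the renormalization scheme are assumptions): rescale the probability mass of a set of attribute-related tokens by an exponent gamma, then renormalize the remaining mass.

```python
import numpy as np

def gamma_sample_probs(probs: np.ndarray, attr_ids: list[int],
                       gamma: float) -> np.ndarray:
    """Rescale next-token probabilities so the attribute-token mass p becomes
    p ** gamma (gamma < 1 boosts the attribute, gamma > 1 suppresses it),
    then renormalize the non-attribute tokens. Assumes 0 < p < 1."""
    p = probs[attr_ids].sum()
    new_p = p ** gamma
    out = probs.copy()
    out[attr_ids] *= new_p / p               # scale attribute tokens uniformly
    mask = np.ones_like(probs, dtype=bool)
    mask[attr_ids] = False
    out[mask] *= (1.0 - new_p) / (1.0 - p)   # renormalize the rest
    return out
```

Because the adjustment touches only the output distribution, it composes with any autoregressive model without retraining, which matches the training-free claim in the abstract.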
Chord-Conditioned Melody Choralization with Controllable Harmonicity and Polyphonicity
Melody choralization, i.e. generating a four-part chorale based on a
user-given melody, has long been closely associated with J.S. Bach chorales.
Previous neural network-based systems rarely focus on chorale generation
conditioned on a chord progression, and none of them realised controllable
melody choralization. To enable neural networks to learn the general principles
of counterpoint from Bach's chorales, we first design a music representation
that encodes chord symbols for chord conditioning. We then propose DeepChoir, a
melody choralization system, which can generate a four-part chorale for a given
melody conditioned on a chord progression. Furthermore, with the improved
density sampling, a user can control the extent of harmonicity and
polyphonicity for the chorale generated by DeepChoir. Experimental results
reveal the effectiveness of our data representation and the controllability of
DeepChoir over harmonicity and polyphonicity. The code, the generated samples
(chorales, folk songs, and a symphony), and the dataset we use are now
available at https://github.com/sander-wood/deepchoir.
Comment: 7 pages, 4 figures, 2 tables
TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching
This paper introduces TunesFormer, an efficient Transformer-based
dual-decoder model specifically designed for the generation of melodies that
adhere to user-defined musical forms. Trained on 214,122 Irish tunes,
TunesFormer utilizes techniques including bar patching and control codes. Bar
patching reduces sequence length and generation time, while control codes guide
TunesFormer in producing melodies that conform to desired musical forms. Our
evaluation demonstrates TunesFormer's superior efficiency, being 3.22 times
faster than GPT-2 and 1.79 times faster than a model with linear complexity of
equal scale while offering comparable performance in controllability and other
metrics. TunesFormer provides a novel tool for musicians, composers, and music
enthusiasts alike to explore the vast landscape of Irish music. Our model and
code are available at https://github.com/sander-wood/tunesformer.
Comment: 5 pages, 3 figures, 1 table
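Bar patching shortens the sequence by letting the model attend over bars rather than individual characters. The abstract does not give the exact scheme, so this is a minimal sketch under assumed details (fixed patch length, space padding): split the ABC tune body on barlines and pad or truncate each bar to a fixed-length patch.

```python
def bar_patches(abc_body: str, patch_len: int = 16, pad: str = " ") -> list[str]:
    """Split an ABC melody on barlines ('|') and pad/truncate each bar to a
    fixed patch length, so sequence length scales with the number of bars
    rather than the number of characters."""
    bars = [b.strip() for b in abc_body.split("|") if b.strip()]
    return [b[:patch_len].ljust(patch_len, pad) for b in bars]
```

For a typical tune, the number of bars is an order of magnitude smaller than the character count, which is where the generation-time savings come from.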
WikiMT++ Dataset Card
WikiMT++ is an expanded and refined version of WikiMusicText (WikiMT),
featuring 1010 curated lead sheets in ABC notation. To expand application
scenarios of WikiMT, we add objective attributes (album, lyrics, video) and
subjective emotion attributes (12 emotion adjectives and emo_4q, based on the
Russell 4Q model), enhancing its usability for music information retrieval,
conditional music generation, automatic composition, emotion classification,
and more. Additionally, CLaMP is applied to correct the attributes inherited
from WikiMT, reducing errors introduced during the original data collection
and enhancing the accuracy and completeness of the dataset.
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
We introduce CLaMP: Contrastive Language-Music Pre-training, which learns
cross-modal representations between natural language and symbolic music using a
music encoder and a text encoder trained jointly with a contrastive loss. To
pre-train CLaMP, we collected a large dataset of 1.4 million music-text pairs.
CLaMP employs text dropout as a data augmentation technique and bar patching
to efficiently represent music data, reducing sequence length to less than
10%. In addition, we developed a masked music model pre-training objective to
enhance the music encoder's comprehension of musical context and structure.
CLaMP integrates textual information to enable semantic search and zero-shot
classification for symbolic music, surpassing the capabilities of previous
models. To support the evaluation of semantic search and music classification,
we publicly release WikiMusicText (WikiMT), a dataset of 1010 lead sheets in
ABC notation, each accompanied by a title, artist, genre, and description. In
comparison to state-of-the-art models that require fine-tuning, zero-shot CLaMP
demonstrated comparable or superior performance on score-oriented datasets. Our
models and code are available at
https://github.com/microsoft/muzic/tree/main/clamp.
Comment: 11 pages, 5 figures, 5 tables, accepted by ISMIR 202
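The abstract describes a music encoder and a text encoder trained jointly with a contrastive loss. A CLIP-style symmetric InfoNCE objective, assumed here as a common choice rather than CLaMP's verified implementation, pulls matched music-text pairs together and pushes mismatched pairs apart:

```python
import numpy as np

def contrastive_loss(music_emb: np.ndarray, text_emb: np.ndarray,
                     temp: float = 0.07) -> float:
    """Symmetric InfoNCE over a batch of paired (music, text) embeddings:
    matched pairs sit on the diagonal of the similarity matrix."""
    m = music_emb / np.linalg.norm(music_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = m @ t.T / temp                # scaled cosine similarities
    labels = np.arange(len(logits))

    def xent(z):
        # cross-entropy with diagonal (matched-pair) targets
        z = z - z.max(axis=1, keepdims=True)
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of music-to-text and text-to-music directions
    return (xent(logits) + xent(logits.T)) / 2
```

Once trained, the shared embedding space is what enables the semantic search and zero-shot classification the abstract describes: a text query is embedded and matched against music embeddings by cosine similarity.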
Generating Chord Progression from Melody with Flexible Harmonic Rhythm and Controllable Harmonic Density
Melody harmonization, which involves generating a chord progression that
complements a user-provided melody, continues to pose a significant challenge.
A chord progression must not only be in harmony with the melody, but also
interlock with its rhythmic pattern. While previous neural network-based
systems have been successful in producing chord progressions for given
melodies, they have not adequately addressed controllable melody harmonization,
nor have they focused on generating harmonic rhythms with flexibility in the
rates or patterns of chord changes. This paper presents AutoHarmonizer, a novel
system for harmonic density-controllable melody harmonization with such a
flexible harmonic rhythm. AutoHarmonizer is equipped with an extensive
vocabulary of 1,462 chord types and can generate chord progressions that vary
in harmonic density for a given melody. Experimental results indicate that the
AutoHarmonizer-generated chord progressions exhibit a diverse range of harmonic
rhythms and that the system's controllable harmonic density is effective.
Comment: 12 pages, 6 figures, 1 table, accepted by EURASIP JASM
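The abstract does not define harmonic density precisely; one simple proxy, assumed here for illustration, is the fraction of beats on which the chord changes:

```python
def harmonic_density(chords_per_beat: list[str]) -> float:
    """Fraction of beat transitions where the chord changes: a simple proxy
    for harmonic density (AutoHarmonizer's exact definition may differ)."""
    if len(chords_per_beat) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(chords_per_beat, chords_per_beat[1:])
                  if a != b)
    return changes / (len(chords_per_beat) - 1)
```

Under a measure like this, a chorale-style harmonization with a chord on every beat scores near 1, while a pop-style progression holding each chord for a bar scores much lower, which is the kind of range a density control would span.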
Visible-Light-Active Titanium Sulfonate Framework for Photocatalytic Organic Synthesis
In this work, the first visible-light-active titanium sulfonate
metal–organic framework (denoted as FIR-138) with 2-fold interpenetrated
srs topology was synthesized by employing 2,5-dihydroxy-1,4-benzenedisulfonic
acid (H4DOBSC) as the ligand. The strong chelating coordination
ability of the hydroxyl and sulfonate O atoms from H4DOBSC
endows the framework of FIR-138 with good stability, while the formation
of the Ti-phenolic motif ensures excellent visible light absorption
with a bandgap (Eg) of 1.74 eV. More importantly,
the abundant titanium active sites within the structure can trap
photogenerated electrons and promote charge separation effectively,
which accounts for the excellent visible-light photocatalytic performance
in organic reactions. FIR-138’s capability to harness visible
light for photocatalytic reactions presents a promising advancement
in the field of Ti-MOF photocatalysts. These results provide valuable
insights and open up new avenues for the rational design and synthesis
of visible-light-active Ti-MOF photocatalysts.
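The reported bandgap of 1.74 eV can be translated into an absorption edge via the standard Planck relation λ(nm) ≈ 1239.84 / Eg(eV), which places the edge near the red end of the visible spectrum and is consistent with the claimed visible-light activity:

```python
def absorption_edge_nm(band_gap_ev: float) -> float:
    """Absorption edge from E = hc / lambda: lambda(nm) ≈ 1239.84 / Eg(eV)."""
    return 1239.84 / band_gap_ev
```

For Eg = 1.74 eV this gives roughly 713 nm, i.e. FIR-138 can in principle absorb across essentially the whole visible range.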