Content-based Controls For Music Large Language Modeling
Recent years have witnessed a rapid growth of large-scale language models in
the domain of music audio. Such models enable end-to-end generation of
higher-quality music, and some allow conditioned generation using text
descriptions. However, the control power of text controls on music is
intrinsically limited, as they can only describe music indirectly through
meta-data (such as singers and instruments) or high-level representations (such
as genre and emotion). We aim to further equip the models with direct and
content-based controls on innate music languages such as pitch, chords and drum
track. To this end, we contribute Coco-Mulla, a content-based control method
for music large language modeling. It uses a parameter-efficient fine-tuning
(PEFT) method tailored for Transformer-based audio models. Experiments show
that our approach achieves high-quality music generation with low-resource
semi-supervised learning, tuning fewer than 4% of the parameters of the
original model and training on a small dataset of fewer than 300 songs.
Moreover, our approach enables effective content-based controls, and we
illustrate the control power via chords and rhythms, two of the most salient
features of music audio. Furthermore, we show that by combining content-based
controls and text descriptions, our system achieves flexible music variation
generation and style transfer. Our source code and demos are available online.
Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls
We propose Polyffusion, a diffusion model that generates polyphonic music
scores by regarding music as image-like piano roll representations. The model
is capable of controllable music generation with two paradigms: internal
control and external control. Internal control refers to the process in which
users pre-define a part of the music and then let the model infill the rest,
similar to the task of masked music generation (or music inpainting). External
control conditions the model with external yet related information, such as
chord, texture, or other features, via the cross-attention mechanism. We show
that by using internal and external controls, Polyffusion unifies a wide range
of music creation tasks, including melody generation given accompaniment,
accompaniment generation given melody, arbitrary music segment inpainting, and
music arrangement given chords or textures. Experimental results show that our
model significantly outperforms existing Transformer and sampling-based
baselines, and using pre-trained disentangled representations as external
conditions yields more effective controls.
Comment: In Proceedings of the 24th Conference of the International Society
for Music Information Retrieval (ISMIR 2023), Milan, Italy
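The internal control described above (the user fixes part of the music and the model fills in the rest) can be illustrated with a toy piano roll. This is only a sketch of the masking idea, not the Polyffusion implementation: the function `infill`, the `known_mask` layout, and the random stand-in generator are all hypothetical, and a real diffusion model would apply such a mask during its denoising steps rather than sample cells independently.

```python
import random


def infill(piano_roll, known_mask, generate_cell):
    """Keep user-defined cells and fill the remaining cells.

    piano_roll: list of rows (one per pitch), each a list of 0/1 per time step.
    known_mask: same shape; True where the user pre-defined the content.
    generate_cell: hypothetical stand-in for a generative model; it is called
        as generate_cell(pitch_index, time_index) for every unknown cell.
    """
    return [
        [
            cell if known else generate_cell(p, t)
            for t, (cell, known) in enumerate(zip(row, mask_row))
        ]
        for p, (row, mask_row) in enumerate(zip(piano_roll, known_mask))
    ]


# Toy example: 3 pitches x 4 time steps, 1 means "note on".
roll = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
# The first two time steps are fixed by the user; the last two are infilled.
mask = [[True, True, False, False] for _ in roll]

rng = random.Random(0)  # seeded so the sketch is repeatable
filled = infill(roll, mask, lambda p, t: rng.randint(0, 1))
```

Whatever the generator produces, the cells marked `True` in `mask` are copied through unchanged, which is the property that makes inpainting a form of control.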
Adenosine deaminase acting on RNA 1 (ADAR1) as crucial regulators in cardiovascular diseases: structures, pathogenesis, and potential therapeutic approach
Cardiovascular diseases (CVDs) are a group of diseases that have a major impact on global health and are the leading cause of death. A large number of chemical base modifications in ribonucleic acid (RNA) are associated with CVDs. A variety of RNA modifications exist in cells, among which adenosine deaminase-dependent modification is one of the most common. Adenosine deaminase acting on RNA 1 (ADAR1) is a widely expressed double-stranded RNA adenosine deaminase that converts adenosine to inosine (A-to-I) by catalyzing the deamination of adenosine at specific sites of the target RNA. In this review, we provide a comprehensive overview of the structure of ADAR1 and summarize the regulatory mechanisms of ADAR1-mediated RNA editing in CVDs, indicating ADAR1 as a promising therapeutic target in CVDs.
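As a purely notational illustration of A-to-I editing, the substitution at specific sites can be written as a string operation. The function name `edit_a_to_i`, the 0-based site indices, and the example sequence are assumptions made for this sketch; it says nothing about how ADAR1 selects its sites in vivo.

```python
def edit_a_to_i(rna, sites):
    """Return a copy of `rna` with adenosine (A) replaced by inosine (I)
    at the given 0-based positions. Raises ValueError if a listed site
    does not hold an adenosine."""
    bases = list(rna)
    for i in sites:
        if bases[i] != "A":
            raise ValueError(f"position {i} holds {bases[i]!r}, not 'A'")
        bases[i] = "I"
    return "".join(bases)


# Hypothetical sequence: edit the adenosines at positions 1 and 5 only.
edited = edit_a_to_i("GAUACA", [1, 5])
print(edited)  # GIUACI
```

The adenosine at position 3 is left untouched, reflecting that editing is site-specific rather than applied to every adenosine.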
Deconfounding Causal Inference for Zero-shot Action Recognition
Zero-shot action recognition (ZSAR) aims to recognize unseen action categories in the test set without corresponding training examples. Most existing zero-shot methods follow the feature generation framework to transfer knowledge from seen action categories to model the feature distribution of unseen categories. However, due to the complexity and diversity of actions, it remains challenging to generate unseen feature distributions, especially in the cross-dataset scenario, where there is a potentially larger domain shift. This paper proposes a Deconfounding Causal GAN (DeCalGAN) for generating unseen action video features with the following technical contributions: 1) Our model unifies compositional ZSAR with traditional visual-semantic models to incorporate local object information with global semantic information for feature generation. 2) A GAN-based architecture is proposed for causal inference and unseen distribution discovery. 3) A deconfounding module is proposed to refine representations of local object and global semantic information confounder in the training data. Action descriptions and random object features after causal inference are then used to discover unseen distributions of novel actions in different datasets. Our extensive experiments on Cross-Dataset Zero-Shot Action Recognition (CD-ZSAR) demonstrate substantial improvement over the UCF101 and HMDB51 standard benchmarks for this problem.