206 research outputs found

    Content-based Controls For Music Large Language Modeling

    Recent years have witnessed a rapid growth of large-scale language models in the domain of music audio. Such models enable end-to-end generation of higher-quality music, and some allow conditioned generation using text descriptions. However, the control power of text controls on music is intrinsically limited, as they can only describe music indirectly through meta-data (such as singers and instruments) or high-level representations (such as genre and emotion). We aim to further equip the models with direct and content-based controls on innate music languages such as pitch, chords and drum track. To this end, we contribute Coco-Mulla, a content-based control method for music large language modeling. It uses a parameter-efficient fine-tuning (PEFT) method tailored for Transformer-based audio models. Experiments show that our approach achieves high-quality music generation with low-resource semi-supervised learning, tuning fewer than 4% of the original model's parameters and training on a small dataset of fewer than 300 songs. Moreover, our approach enables effective content-based controls, and we illustrate the control power via chords and rhythms, two of the most salient features of music audio. Furthermore, we show that by combining content-based controls and text descriptions, our system achieves flexible music variation generation and style transfer. Our source code and demos are available online.

    Polyffusion: A Diffusion Model for Polyphonic Score Generation with Internal and External Controls

    We propose Polyffusion, a diffusion model that generates polyphonic music scores by regarding music as image-like piano roll representations. The model is capable of controllable music generation with two paradigms: internal control and external control. Internal control refers to the process in which users pre-define a part of the music and then let the model infill the rest, similar to the task of masked music generation (or music inpainting). External control conditions the model with external yet related information, such as chord, texture, or other features, via the cross-attention mechanism. We show that by using internal and external controls, Polyffusion unifies a wide range of music creation tasks, including melody generation given accompaniment, accompaniment generation given melody, arbitrary music segment inpainting, and music arrangement given chords or textures. Experimental results show that our model significantly outperforms existing Transformer and sampling-based baselines, and using pre-trained disentangled representations as external conditions yields more effective controls.
    Comment: In Proceedings of the 24th Conference of the International Society for Music Information Retrieval (ISMIR 2023), Milan, Italy

    Adenosine deaminase acting on RNA 1 (ADAR1) as crucial regulators in cardiovascular diseases: structures, pathogenesis, and potential therapeutic approach

    Cardiovascular diseases (CVDs) are a group of diseases that have a major impact on global health and are the leading cause of death. A large number of chemical base modifications in ribonucleic acid (RNA) are associated with CVDs. A variety of RNA modifications exist in cells, among which adenosine deaminase-dependent modification is one of the most common. Adenosine deaminase acting on RNA 1 (ADAR1) is a widely expressed double-stranded RNA adenosine deaminase that forms inosine (A-to-I editing) by catalyzing the deamination of adenosine at specific sites of the target RNA. In this review, we provide a comprehensive overview of the structure of ADAR1 and summarize the regulatory mechanisms of ADAR1-mediated RNA editing in CVDs, indicating ADAR1 as a promising therapeutic target in CVDs.

    Deconfounding Causal Inference for Zero-shot Action Recognition

    Zero-shot action recognition (ZSAR) aims to recognize unseen action categories in the test set without corresponding training examples. Most existing zero-shot methods follow the feature generation framework to transfer knowledge from seen action categories to model the feature distribution of unseen categories. However, due to the complexity and diversity of actions, it remains challenging to generate the unseen feature distribution, especially in the cross-dataset scenario, where the domain shift is potentially larger. This paper proposes a Deconfounding Causal GAN (DeCalGAN) for generating unseen action video features with the following technical contributions: 1) Our model unifies compositional ZSAR with traditional visual-semantic models to incorporate local object information with global semantic information for feature generation. 2) A GAN-based architecture is proposed for causal inference and unseen distribution discovery. 3) A deconfounding module is proposed to refine representations of local object and global semantic information confounder in the training data. Action descriptions and random object features after causal inference are then used to discover unseen distributions of novel actions in different datasets. Our extensive experiments on Cross-Dataset Zero-Shot Action Recognition (CD-ZSAR) demonstrate substantial improvement on the UCF101 and HMDB51 standard benchmarks for this problem.