15 research outputs found

    Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning

    Full text link
    Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity of large-scale publicly available music datasets with natural language captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA), capable of answering music-related questions and generating captions for music files. Our model utilizes audio representations from a pretrained MERT model to extract music features. However, obtaining a suitable dataset for training the MU-LLaMA model remains challenging, as existing publicly accessible audio question answering datasets lack the necessary depth for open-ended music question answering. To fill this gap, we present a methodology for generating question-answer pairs from existing audio captioning datasets and introduce the MusicQA dataset, designed for answering open-ended music-related questions. The experiments demonstrate that the proposed MU-LLaMA model, trained on our designed MusicQA dataset, achieves outstanding performance in both music question answering and music caption generation across various metrics, outperforming current state-of-the-art (SOTA) models in both fields and offering a promising advancement in T2M-Gen research.
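    The pipeline this abstract describes (a frozen MERT audio encoder supplying music features to a LLaMA-based answerer) can be illustrated with a short sketch of the feature-extraction stage only. The checkpoint ID m-a-p/MERT-v1-95M, the 24 kHz input rate, and the layer/time pooling below are assumptions based on the public MERT release, not details taken from the MU-LLaMA paper itself.

    import torch
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    # Load the pretrained MERT music encoder (assumed checkpoint; the MERT
    # release requires trust_remote_code for its custom model class).
    model = AutoModel.from_pretrained("m-a-p/MERT-v1-95M", trust_remote_code=True)
    processor = Wav2Vec2FeatureExtractor.from_pretrained(
        "m-a-p/MERT-v1-95M", trust_remote_code=True)

    waveform = torch.randn(24000 * 10)  # stand-in for 10 s of 24 kHz audio
    inputs = processor(waveform.numpy(), sampling_rate=24000, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)

    # Stack every transformer layer, then pool over layers and time to get one
    # fixed-size music embedding that a LLaMA-side adapter could consume.
    hidden = torch.stack(outputs.hidden_states)           # (layers+1, 1, time, dim)
    music_embedding = hidden.mean(dim=(0, 2)).squeeze(0)  # (dim,)
    print(music_embedding.shape)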

    Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information

    Full text link
    Background music (BGM) can enhance a video's emotional impact, but selecting an appropriate BGM often requires domain knowledge, which has motivated the development of video-music retrieval techniques. Most existing approaches use video/music feature extractors pretrained on different target sets to obtain average video/music-level embeddings. The drawbacks are two-fold. First, pretraining the video and music extractors on different target sets can make the resulting embeddings difficult to match. Second, the underlying temporal correlation between video and music is ignored. In this paper, our proposed approach leverages a unified target set to perform video/music pretraining and produces clip-level embeddings that preserve temporal information. The downstream cross-modal matching is based on these clip-level features with embedded music rhythm and optical flow information. Experiments demonstrate that our proposed method outperforms state-of-the-art methods by a significant margin.
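    To make the clip-level matching idea concrete, here is a minimal sketch of scoring one video-music pair from per-clip embeddings. The embedding size, the best-match cosine scoring in both directions, and the random placeholder inputs are illustrative assumptions, not the paper's actual matching head.

    import torch
    import torch.nn.functional as F

    def pair_score(video_clips, music_clips):
        # video_clips: (Tv, d) per-clip video embeddings (e.g. carrying optical flow)
        # music_clips: (Tm, d) per-clip music embeddings (e.g. carrying rhythm)
        v = F.normalize(video_clips, dim=-1)
        m = F.normalize(music_clips, dim=-1)
        sim = v @ m.T  # (Tv, Tm) clip-to-clip cosine similarities
        # Average each clip's best match, in both directions, so the score
        # rewards temporally consistent correspondences rather than one
        # video-level average.
        return 0.5 * (sim.max(dim=1).values.mean() + sim.max(dim=0).values.mean())

    # Retrieval: rank candidate tracks for one query video.
    video = torch.randn(8, 256)                        # 8 video clips
    tracks = [torch.randn(12, 256) for _ in range(5)]  # 5 candidate tracks
    scores = torch.tensor([pair_score(video, m) for m in tracks])
    print(scores.argsort(descending=True))             # best-to-worst track indices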

    Reading the Underlying Information From Massive Metagenomic Sequencing Data

    No full text

    An Analytical Method for Coaxial Helicopter Ground Resonance

    No full text
    A time-frequency analytical method is presented to analyze the physical mechanism of coaxial helicopter ground resonance. Eigenvalue calculation and numerical integration of the disturbance equations of motion are used to obtain the modal characteristics and time-domain response characteristics of coaxial helicopter ground resonance, and the interaction between the rotors and the body is revealed from the responses of the various DOFs. The analysis shows that the regressive lag mode dominated by the upper rotor is the most unstable mode. In the dynamic instability region, coaxial helicopter ground resonance is driven mainly by energy transfer between the periodic lag motion of the upper rotor and the roll rotation of the body. For this instability mode, energy transfer between the periodic lag motion of the lower rotor and the body roll rotation also occurs and can intensify the ground resonance instability of the coaxial helicopter.
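    The eigenvalue half of this kind of analysis is easy to sketch. The snippet below builds a generic two-DOF linearized model (a rotor regressive-lag coordinate coupled to body roll), converts it to state-space form, and flags modes whose eigenvalues have a positive real part. The matrices are placeholder values chosen to exhibit an instability, not the paper's coaxial rotor-body model.

    import numpy as np

    # Linearized disturbance equations: M x'' + C x' + K x = 0,
    # x = [rotor regressive-lag coordinate, body roll angle].
    # Placeholder values; the skew (circulatory) lag-roll coupling in K is
    # what lets this pair of modes go unstable.
    M = np.eye(2)
    C = np.diag([0.02, 0.01])        # light lag and roll damping
    K = np.array([[1.00, -0.60],
                  [0.60,  0.25]])

    # First-order (state-space) form z' = A z with z = [x, x'].
    Minv = np.linalg.inv(M)
    A = np.block([[np.zeros((2, 2)), np.eye(2)],
                  [-Minv @ K,        -Minv @ C]])

    # A mode is unstable (ground resonance) when Re(lambda) > 0; Im(lambda)
    # gives the modal frequency, i.e. the "modal characteristics" above.
    for lam in np.linalg.eigvals(A):
        tag = "UNSTABLE" if lam.real > 0 else "stable"
        print(f"lambda = {lam.real:+.4f} {lam.imag:+.4f}j  ({tag})")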
