Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
Text-to-music generation (T2M-Gen) faces a major obstacle due to the scarcity
of large-scale publicly available music datasets with natural language
captions. To address this, we propose the Music Understanding LLaMA (MU-LLaMA),
capable of answering music-related questions and generating captions for music
files. Our model utilizes audio representations from a pretrained MERT model to
extract music features. However, obtaining a suitable dataset for training the
MU-LLaMA model remains challenging, as existing publicly accessible audio
question answering datasets lack the necessary depth for open-ended music
question answering. To fill this gap, we present a methodology for generating
question-answer pairs from existing audio captioning datasets and introduce the
MusicQA Dataset designed for answering open-ended music-related questions. The
experiments demonstrate that the proposed MU-LLaMA model, trained on our
designed MusicQA dataset, achieves outstanding performance in both music
question answering and music caption generation across various metrics,
outperforming current state-of-the-art (SOTA) models in both fields and
offering a promising advancement in the T2M-Gen research field.
Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information
Background music (BGM) can enhance the video's emotion. However, selecting an
appropriate BGM often requires domain knowledge. This has led to the
development of video-music retrieval techniques. Most existing approaches
utilize pretrained video/music feature extractors trained with different target
sets to obtain average video/music-level embeddings. The drawbacks are
two-fold. First, different target sets for video/music pretraining may make
the generated embeddings difficult to match. Second, the
underlying temporal correlation between video and music is ignored. In this
paper, our proposed approach leverages a unified target set to perform
video/music pretraining and produces clip-level embeddings to preserve temporal
information. The downstream cross-modal matching is based on the clip-level
features with embedded music rhythm and optical flow information. Experiments
demonstrate that our proposed method can achieve superior performance over the
state-of-the-art methods by a significant margin.
An Analytical Method for Coaxial Helicopter Ground Resonance
A time-frequency analytical method is presented to analyze the physical mechanism of coaxial helicopter ground resonance. Eigenvalue calculation and numerical integration of the disturbance equations of motion are used to obtain the modal characteristics and time-domain response characteristics of coaxial helicopter ground resonance, and the interaction between the rotors and the body is revealed from the responses of the various degrees of freedom (DOFs). The results show that the regressive lag mode with upper-rotor character is the most unstable mode. In the dynamic instability region, coaxial helicopter ground resonance is mainly due to energy transfer between the periodic lag motion of the upper rotor and body roll rotation. For this unstable mode, energy transfer between the periodic lag motion of the lower rotor and body roll rotation also exists, and it can aggravate the ground resonance instability of the coaxial helicopter.