7,482 research outputs found

    Pop Music Highlighter: Marking the Emotion Keypoints

    The goal of music highlight extraction is to get a short consecutive segment of a piece of music that provides an effective representation of the whole piece. In a previous work, we introduced an attention-based convolutional recurrent neural network that uses music emotion classification as a surrogate task for music highlight extraction, for Pop songs. The rationale behind that approach is that the highlight of a song is usually the most emotional part. This paper extends our previous work in the following two aspects. First, methodology-wise we experiment with a new architecture that does not need any recurrent layers, making the training process faster. Moreover, we compare a late-fusion variant and an early-fusion variant to study which one better exploits the attention mechanism. Second, we conduct and report an extensive set of experiments comparing the proposed attention-based methods against a heuristic energy-based method, a structural repetition-based method, and a few other simple feature-based methods for this task. Due to the lack of public-domain labeled data for highlight extraction, following our previous work we use the RWC POP 100-song data set to evaluate how the detected highlights overlap with any chorus sections of the songs. The experiments demonstrate the effectiveness of our methods over competing methods. For reproducibility, we open source the code and pre-trained model at https://github.com/remyhuang/pop-music-highlighter/.
    Comment: Transactions of the ISMIR vol. 1, no.
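
The evaluation idea mentioned in this abstract (how much a detected highlight overlaps with chorus sections) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function name `overlap_ratio`, the (start, end) interval format in seconds, and the example timestamps are all assumptions made for illustration.

```python
def overlap_ratio(highlight, choruses):
    """Fraction of the highlight segment that falls inside any chorus section.

    highlight: (start, end) in seconds.
    choruses: list of (start, end) in seconds, assumed not to overlap each other.
    """
    h_start, h_end = highlight
    if h_end <= h_start:
        raise ValueError("highlight must have positive duration")
    covered = 0.0
    for c_start, c_end in choruses:
        # Length of the intersection between the highlight and this chorus.
        covered += max(0.0, min(h_end, c_end) - max(h_start, c_start))
    return covered / (h_end - h_start)


# Hypothetical 30-second highlight and two hypothetical chorus annotations.
ratio = overlap_ratio((60.0, 90.0), [(45.0, 75.0), (120.0, 150.0)])
```

Here half of the hypothetical highlight (60 s to 75 s) lies inside the first chorus, so `ratio` is 0.5.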

    Revisiting the problem of audio-based hit song prediction using convolutional neural networks

    Being able to predict whether a song can be a hit has important applications in the music industry. Although it is true that the popularity of a song can be greatly affected by external factors such as social and commercial influences, the degree to which audio features computed from musical signals (which we regard as internal factors) can predict song popularity is an interesting research question on its own. Motivated by the recent success of deep learning techniques, we attempt to extend previous work on hit song prediction by jointly learning the audio features and prediction models using deep learning. Specifically, we experiment with a convolutional neural network model that takes the primitive mel-spectrogram as the input for feature learning, a more advanced JYnet model that uses an external song dataset for supervised pre-training and auto-tagging, and the combination of these two models. We also consider the inception model to characterize audio information in different scales. Our experiments suggest that deep structures are indeed more accurate than shallow structures in predicting the popularity of either Chinese or Western Pop songs in Taiwan. We also use the tags predicted by JYnet to gain insights into the result of different models.
    Comment: To appear in the proceedings of 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP

    How to Backdoor Diffusion Models?

    Diffusion models are state-of-the-art deep learning empowered generative models that are trained based on the principle of learning forward and reverse diffusion processes via progressive noise-addition and denoising. To gain a better understanding of the limitations and potential risks, this paper presents the first study on the robustness of diffusion models against backdoor attacks. Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely generating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model. Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models
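
The "progressive noise-addition" that this abstract refers to can be sketched as follows. The sketch assumes the standard DDPM-style closed form x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps with a linear beta schedule. The abstract does not state which formulation or schedule is used, so the schedule values and all function names here are illustrative assumptions. The code shows only clean forward noising, not the BadDiffusion attack.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for b in betas[: t + 1]:
        prod *= 1.0 - b
    return prod


def noise_sample(x0, betas, t, rng):
    """Draw x_t from the forward process in one step, given clean data x0."""
    a_bar = alpha_bar(betas, t)
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


betas = linear_betas(1000)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]                        # toy "data" vector
x_early = noise_sample(x0, betas, 0, rng)    # almost identical to x0
x_late = noise_sample(x0, betas, 999, rng)   # close to pure Gaussian noise
```

At the first step nearly all of the signal survives, while at the last step the signal coefficient is close to zero. This is the process a denoising model is trained to reverse.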

    Decay Constants of Pseudoscalar $D$-mesons in Lattice QCD with Domain-Wall Fermion

    We present the first study of the masses and decay constants of the pseudoscalar $D$ mesons in two-flavor lattice QCD with domain-wall fermions. The gauge ensembles are generated on the $24^3 \times 48$ lattice with the extent $N_s = 16$ in the fifth dimension, and the plaquette gauge action at $\beta = 6.10$, for three sea-quark masses with corresponding pion masses in the range $260-475$ MeV. We compute the point-to-point quark propagators, and measure the time-correlation functions of the pseudoscalar and vector mesons. The inverse lattice spacing is determined by the Wilson flow, while the strange and the charm quark masses are determined by the masses of the vector mesons $\phi(1020)$ and $J/\psi(3097)$ respectively. Using heavy meson chiral perturbation theory (HMChPT) to extrapolate to the physical pion mass, we obtain $f_D = 202.3(2.2)(2.6)$ MeV and $f_{D_s} = 258.7(1.1)(2.9)$ MeV.
    Comment: 15 pages, 3 figures. v2: the statistics of ensemble (A) with m_sea = 0.005 has been increased, more details on the systematic error, to appear in Phys. Lett.