7,482 research outputs found
Pop Music Highlighter: Marking the Emotion Keypoints
The goal of music highlight extraction is to get a short consecutive segment
of a piece of music that provides an effective representation of the whole
piece. In a previous work, we introduced an attention-based convolutional
recurrent neural network that uses music emotion classification as a surrogate
task for music highlight extraction, for Pop songs. The rationale behind that
approach is that the highlight of a song is usually the most emotional part.
This paper extends our previous work in the following two aspects. First,
methodology-wise we experiment with a new architecture that does not need any
recurrent layers, making the training process faster. Moreover, we compare a
late-fusion variant and an early-fusion variant to study which one better
exploits the attention mechanism. Second, we conduct and report an extensive
set of experiments comparing the proposed attention-based methods against a
heuristic energy-based method, a structural repetition-based method, and a few
other simple feature-based methods for this task. Due to the lack of
public-domain labeled data for highlight extraction, following our previous
work we use the RWC POP 100-song data set to evaluate how the detected
highlights overlap with any chorus sections of the songs. The experiments
demonstrate the effectiveness of our methods over competing methods. For
reproducibility, we open source the code and pre-trained model at
https://github.com/remyhuang/pop-music-highlighter/.Comment: Transactions of the ISMIR vol. 1, no.
Revisiting the problem of audio-based hit song prediction using convolutional neural networks
Being able to predict whether a song can be a hit has impor- tant
applications in the music industry. Although it is true that the popularity of
a song can be greatly affected by exter- nal factors such as social and
commercial influences, to which degree audio features computed from musical
signals (whom we regard as internal factors) can predict song popularity is an
interesting research question on its own. Motivated by the recent success of
deep learning techniques, we attempt to ex- tend previous work on hit song
prediction by jointly learning the audio features and prediction models using
deep learning. Specifically, we experiment with a convolutional neural net-
work model that takes the primitive mel-spectrogram as the input for feature
learning, a more advanced JYnet model that uses an external song dataset for
supervised pre-training and auto-tagging, and the combination of these two
models. We also consider the inception model to characterize audio infor-
mation in different scales. Our experiments suggest that deep structures are
indeed more accurate than shallow structures in predicting the popularity of
either Chinese or Western Pop songs in Taiwan. We also use the tags predicted
by JYnet to gain insights into the result of different models.Comment: To appear in the proceedings of 2017 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP
How to Backdoor Diffusion Models?
Diffusion models are state-of-the-art deep learning empowered generative
models that are trained based on the principle of learning forward and reverse
diffusion processes via progressive noise-addition and denoising. To gain a
better understanding of the limitations and potential risks, this paper
presents the first study on the robustness of diffusion models against backdoor
attacks. Specifically, we propose BadDiffusion, a novel attack framework that
engineers compromised diffusion processes during model training for backdoor
implantation. At the inference stage, the backdoored diffusion model will
behave just like an untampered generator for regular data inputs, while falsely
generating some targeted outcome designed by the bad actor upon receiving the
implanted trigger signal. Such a critical risk can be dreadful for downstream
tasks and applications built upon the problematic model. Our extensive
experiments on various backdoor attack settings show that BadDiffusion can
consistently lead to compromised diffusion models with high utility and target
specificity. Even worse, BadDiffusion can be made cost-effective by simply
finetuning a clean pre-trained diffusion model to implant backdoors. We also
explore some possible countermeasures for risk mitigation. Our results call
attention to potential risks and possible misuse of diffusion models
Decay Constants of Pseudoscalar -mesons in Lattice QCD with Domain-Wall Fermion
We present the first study of the masses and decay constants of the
pseudoscalar mesons in two flavors lattice QCD with domain-wall fermion.
The gauge ensembles are generated on the lattice with the
extent in the fifth dimension, and the plaquette gauge action at , for three sea-quark masses with corresponding pion masses in
the range MeV. We compute the point-to-point quark propagators, and
measure the time-correlation functions of the pseudoscalar and vector mesons.
The inverse lattice spacing is determined by the Wilson flow, while the strange
and the charm quark masses by the masses of the vector mesons
and respectively. Using heavy meson chiral perturbation theory
(HMChPT) to extrapolate to the physical pion mass, we obtain MeV and MeV.Comment: 15 pages, 3 figures. v2: the statistics of ensemble (A) with m_sea =
0.005 has been increased, more details on the systematic error, to appear in
Phys. Lett.
- …