Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder
Existing melody harmonization models have made great progress in improving
the quality of generated harmonies, but most of them ignore the emotions
underlying the music. Moreover, the harmonies generated by previous methods
show insufficient variability. To address these problems, we propose a novel
LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the
influence of emotional conditions on melody harmonization, while improving the
quality of generated harmonies and capturing the abundant variability of chord
progressions. Specifically, LHVAE incorporates latent variables and emotional
conditions at different levels (piece- and bar-level) to model the global and
local music properties. Additionally, we introduce an attention-based melody
context vector at each step to better learn the correspondence between melodies
and harmonies. Experimental results of the objective evaluation show that our
proposed model outperforms other LSTM-based models. Through subjective
evaluation, we conclude that altering only the chords hardly changes the
overall emotion of the music. The qualitative analysis demonstrates the ability
of our model to generate variable harmonies.
Comment: Accepted by IEEE SMC 202
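The attention-based melody context vector mentioned above can be illustrated with a minimal dot-product attention computation: score each melody encoder state against the current harmony decoder state, normalize the scores, and take a weighted sum. This is a hedged NumPy sketch, not the authors' implementation; the function name, scoring rule, and toy dimensions are assumptions.

```python
import numpy as np

def melody_context_vector(decoder_state, melody_states):
    """Dot-product attention: weight each melody encoder state by its
    relevance to the current decoder state and return the weighted sum."""
    scores = melody_states @ decoder_state        # one score per melody step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax attention weights
    return weights @ melody_states                # context vector, shape (d,)

# toy example: 4 melody steps, hidden size 3
melody = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.],
                   [1., 1., 0.]])
state = np.array([1., 0., 0.])
ctx = melody_context_vector(state, melody)
```

At each decoding step the resulting context vector is fed to the harmony decoder alongside the latent variables, which is how the correspondence between melody and harmony would be learned.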
MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion
Generating music with emotion is an important task in automatic music
generation, in which emotion is evoked through a variety of musical elements
(such as pitch and duration) that change over time and collaborate with each
other. However, prior research on deep learning-based emotional music
generation has rarely explored the contribution of different musical elements
to emotions, let alone deliberately manipulated these elements to alter
the emotion of music; this hinders fine-grained element-level
control over emotions. To address this gap, we present a novel approach
employing musical element-based regularization in the latent space to
disentangle distinct elements, investigate their roles in distinguishing
emotions, and further manipulate elements to alter musical emotions.
Specifically, we propose a novel VQ-VAE-based model named MusER. MusER
incorporates a regularization loss to enforce the correspondence between the
musical element sequences and the specific dimensions of latent variable
sequences, providing a new solution for disentangling discrete sequences.
Taking advantage of the disentangled latent vectors, a two-level decoding
strategy that includes multiple decoders attending to latent vectors with
different semantics is devised to better predict the elements. By visualizing
the latent space, we conclude that MusER yields a disentangled and
interpretable latent space, and we gain insights into the contribution of
distinct elements to the emotional dimensions (i.e., arousal and valence).
Experimental results demonstrate that MusER outperforms state-of-the-art
models for generating emotional music in both objective and subjective
evaluation. In addition, we rearrange music through element transfer and
attempt to alter the emotion of music by transferring
emotion-distinguishable elements.
Comment: Accepted by AAAI 202
Pseudo-Bag Mixup Augmentation for Multiple Instance Learning Based Whole Slide Image Classification
Because whole slide images are gigapixel-scale, multiple instance
learning (MIL) has become one of the most important frameworks for Whole Slide
Image (WSI) classification. In current practice, most MIL networks face
two unavoidable problems in training: i) insufficient WSI data, and ii) the
data memorization nature inherent in neural networks. These problems may hinder
MIL models from adequate and efficient training, limiting further
performance gains of classification models on WSIs. Inspired by the basic
idea of Mixup, this paper proposes a Pseudo-bag Mixup (PseMix) data
augmentation scheme to improve the training of MIL models. This scheme
generalizes the Mixup strategy for general images to special WSIs via
pseudo-bags so that it can be applied in MIL-based WSI classification. With the
help of pseudo-bags, PseMix fulfills the critical size alignment and semantic
alignment of the Mixup strategy. Moreover, it is designed as an efficient and
decoupled method adaptive to MIL, neither involving time-consuming operations
nor relying on MIL model predictions. Comparative experiments and ablation
studies are specially designed to evaluate the effectiveness and advantages of
our PseMix. Test results show that PseMix often improves the performance
of MIL networks in WSI classification. It also boosts the
generalization capacity of MIL models and improves their robustness to patch
occlusion and noisy labels. Our source code is available at
https://github.com/liupei101/PseMix.
Comment: 10 pages, 6 figures, 8 table
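The pseudo-bag idea behind this scheme can be sketched as follows: split each bag of patch features into pseudo-bags, take a Beta-sampled fraction of pseudo-bags from one WSI and the rest from another, and mix the labels with the same fraction. This is a simplified illustration under stated assumptions (the random-order split, function names, and parameters are not the paper's exact procedure, which the authors design more carefully).

```python
import numpy as np

rng = np.random.default_rng(0)

def pse_mix(bag_a, bag_b, y_a, y_b, n_pseudo=4, alpha=1.0):
    """Mixup generalized to MIL bags via pseudo-bags: concatenate
    round(lambda * n_pseudo) pseudo-bags from bag_a with the remaining
    pseudo-bags from bag_b, and mix the labels accordingly."""
    lam = rng.beta(alpha, alpha)                   # Mixup mixing coefficient
    k = int(round(lam * n_pseudo))
    pseudo_a = np.array_split(bag_a, n_pseudo)     # pseudo-bags of WSI A
    pseudo_b = np.array_split(bag_b, n_pseudo)     # pseudo-bags of WSI B
    mixed = np.concatenate(pseudo_a[:k] + pseudo_b[k:], axis=0)
    label = (k / n_pseudo) * y_a + (1 - k / n_pseudo) * y_b
    return mixed, label

bag_a = rng.normal(size=(100, 8))   # 100 patch features from WSI A
bag_b = rng.normal(size=(60, 8))    # 60 patch features from WSI B
mixed, label = pse_mix(bag_a, bag_b, y_a=1.0, y_b=0.0)
```

Splitting at the pseudo-bag level is what provides the size alignment (bags of very different instance counts can still be mixed) and semantic alignment that plain instance-level Mixup lacks.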
Rethinking the Evaluation for Conversational Recommendation in the Era of Large Language Models
The recent success of large language models (LLMs) has shown great potential
to develop more powerful conversational recommender systems (CRSs), which rely
on natural language conversations to satisfy user needs. In this paper, we
embark on an investigation into the utilization of ChatGPT for conversational
recommendation, revealing the inadequacy of the existing evaluation protocol.
This protocol may over-emphasize matching against the ground-truth items or
utterances generated by human annotators, while neglecting the interactive
nature of a capable CRS. To overcome this limitation, we further propose an interactive
Evaluation approach based on LLMs named iEvaLM that harnesses LLM-based user
simulators. Our evaluation approach can simulate various interaction scenarios
between users and systems. Through the experiments on two publicly available
CRS datasets, we demonstrate notable improvements compared to the prevailing
evaluation protocol. Furthermore, we emphasize the evaluation of
explainability, and find that ChatGPT generates persuasive explanations for its
recommendations. Our study contributes to a deeper comprehension of the
untapped potential of LLMs for CRSs and provides a more flexible and
easy-to-use evaluation framework for future research endeavors. The codes and
data are publicly available at https://github.com/RUCAIBox/iEvaLM-CRS.
Comment: Accepted by EMNLP 202
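An LLM-based user-simulator evaluation loop of the kind described above can be sketched as follows. This is a minimal illustration in the spirit of iEvaLM, not the paper's protocol: `query_llm` is a stand-in for a real LLM API call, and the prompts, turn limit, and success criterion are all assumptions.

```python
def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion API)."""
    return "I would accept a sci-fi movie like Interstellar."

def simulate_dialogue(crs_recommend, target_item: str, max_turns: int = 5):
    """Let an LLM play the user: it reacts to each recommendation, and the
    episode succeeds if the target item is recommended within max_turns."""
    history = []
    for turn in range(1, max_turns + 1):
        rec = crs_recommend(history)              # the system recommends
        if rec == target_item:
            return True, turn                     # interactive success
        prompt = (f"You want to watch {target_item}. "
                  f"The system recommended {rec}. Reply as the user.")
        history.append((rec, query_llm(prompt)))  # simulated user feedback
    return False, max_turns

# toy CRS that cycles through a fixed candidate list
candidates = ["Inception", "Interstellar", "Alien"]
crs = lambda history: candidates[len(history) % len(candidates)]
ok, turns = simulate_dialogue(crs, "Interstellar")
```

Measuring success over simulated multi-turn interactions, rather than single-shot matching against annotated ground truth, is what makes this style of evaluation sensitive to a system's interactive ability.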
Microbial-feeding interactions reveal the effects of feeding blood on the gut microbiota of the aquaculture leech (Hirudo nipponica)
Leeches (Hirudo nipponica) are aquatic animals that feed mainly on fresh blood. Because their intestinal digestive enzyme content is low, digestion after feeding takes a long time and depends on the gut microbiota. Here, we examined the intestinal microbiota of captive Hirudo nipponica at different periods after blood feeding using high-throughput sequencing. The results showed that gut microbial diversity was lower before feeding than after. At the phylum level, the core gut microbiota of Hirudo nipponica consisted of Proteobacteria, Bacteroidetes, and Firmicutes. After blood feeding, the relative abundance of Proteobacteria decreased, while that of Bacteroidetes and Firmicutes increased. The core genera were Aeromonas and Mucinivorans. These results show that the structure and function of the gut microbiota are closely associated with blood feeding. This study aims to lay a theoretical foundation for the blood-digestion mechanism of Hirudo nipponica.
Improving Conversational Recommendation Systems via Counterfactual Data Simulation
Conversational recommender systems (CRSs) aim to provide recommendation
services via natural language conversations. Although a number of approaches
have been proposed for developing capable CRSs, they typically require
sufficient training data. Since it is difficult to annotate
recommendation-oriented dialogue datasets, existing CRS approaches often suffer
from the issue of insufficient training due to the scarcity of training data.
To address this issue, in this paper, we propose a CounterFactual data
simulation approach for CRS, named CFCRS, to alleviate the issue of data
scarcity in CRSs. Our approach is developed based on the framework of
counterfactual data augmentation, which gradually incorporates rewriting of
the user preferences from a real dialogue without interfering with the entire
conversation flow. To develop our approach, we characterize user preference and
organize the conversation flow by the entities involved in the dialogue, and
design a multi-stage recommendation dialogue simulator based on a conversation
flow language model. Under the guidance of the learned user preference and
dialogue schema, the flow language model can produce reasonable, coherent
conversation flows, which can be further realized into complete dialogues.
Based on the simulator, we perform the intervention at the representations of
the interacted entities of target users, and design an adversarial training
method with a curriculum schedule that can gradually optimize the data
augmentation strategy. Extensive experiments show that our approach can
consistently boost the performance of several competitive CRSs, and outperform
other data augmentation methods, especially when the training data is limited.
Our code is publicly available at https://github.com/RUCAIBox/CFCRS.
Comment: Accepted by KDD 2023. Code: https://github.com/RUCAIBox/CFCR
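The core counterfactual intervention can be pictured as substituting the preference-bearing entities in a conversation flow while keeping the flow structure intact. The toy sketch below shows only that substitution idea; the flow representation and rewriting rule are illustrative assumptions (CFCRS intervenes on learned entity representations and realizes flows into full dialogues with a flow language model, which is far beyond this snippet).

```python
def counterfactual_flow(flow, entity_map):
    """Rewrite user-preference entities in a conversation flow while
    leaving all other tokens (the conversation structure) unchanged."""
    return [entity_map.get(tok, tok) for tok in flow]

# a flow interleaving dialogue acts with interacted entities
flow = ["greet", "<The Matrix>", "ask_genre", "<sci-fi>", "recommend"]
cf = counterfactual_flow(flow, {"<The Matrix>": "<Inception>"})
```

Because only the entities change, the counterfactual dialogue stays coherent with the original conversation flow, which is what lets the augmented data train the CRS without distorting dialogue structure.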