Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Conversational Speech Synthesis (CSS) aims to accurately express an utterance
with the appropriate prosody and emotional inflection within a conversational
setting. While recognising the significance of the CSS task, prior studies have
not thoroughly investigated the emotional expressiveness problems due to the
scarcity of emotional conversational datasets and the difficulty of stateful
emotion modeling. In this paper, we propose a novel emotional CSS model, termed
ECSS, that includes two main components: 1) to enhance emotion understanding,
we introduce a heterogeneous graph-based emotional context modeling mechanism,
which takes the multi-source dialogue history as input to model the dialogue
context and learn the emotion cues from the context; 2) to achieve emotion
rendering, we employ a contrastive learning-based emotion renderer module to
infer the accurate emotion style for the target utterance. To address the issue
of data scarcity, we meticulously create emotional labels in terms of category
and intensity, and annotate additional emotional information on the existing
conversational dataset (DailyTalk). Both objective and subjective evaluations
suggest that our model outperforms the baseline models in understanding and
rendering emotions. These evaluations also underscore the importance of
comprehensive emotional annotations. Code and audio samples can be found at:
https://github.com/walker-hyf/ECSS.
Comment: 9 pages, 4 figures, Accepted by AAAI'2024, Code and audio samples:
https://github.com/walker-hyf/ECS
Swap Action in a Solid-State Controllable Anisotropic Heisenberg Model
Correct swap action can be realized via the control of the anisotropic
Heisenberg interaction in solid-state quantum systems. The conditions of
performing a swap are derived from the dynamics of an arbitrary bipartite pure
state. It is found that swap errors can be eliminated in the presence of
symmetric anisotropy. In realistic quantum computers with unavoidable
fluctuations, the gate fidelity of the swap action is estimated. The scheme of
quantum computation via the anisotropic Heisenberg interaction is implemented
in one-dimensional quantum dots. The slanting and static magnetic field can
be used to adjust the anisotropy.
Comment: 10 pages, 1 figure
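As a point of reference for the swap conditions mentioned above, the following is a minimal numerical sketch, not the paper's derivation. It assumes a two-spin XXZ-type Hamiltonian H = (J/4)(XX + YY + delta·ZZ) with hbar = 1; the parameter `delta` and the function names are illustrative choices, and the paper's own anisotropy model may differ. It checks the standard result that the isotropic case (delta = 1) gives a SWAP up to a global phase at J·t = π, and shows how an uncorrected anisotropy lowers the overlap with SWAP.

```python
import numpy as np

# Pauli matrices
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Two-qubit SWAP gate in the basis |00>, |01>, |10>, |11>
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)


def xxz_hamiltonian(J, delta):
    """Assumed model: H = (J/4)(XX + YY + delta*ZZ); delta = 1 is isotropic."""
    return (J / 4.0) * (np.kron(X, X) + np.kron(Y, Y) + delta * np.kron(Z, Z))


def evolve(H, t):
    """U = exp(-i H t), computed from the eigendecomposition of Hermitian H."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * w * t)) @ v.conj().T


def swap_fidelity(U):
    """|Tr(SWAP^dagger U)| / 4; equals 1 iff U is SWAP up to a global phase."""
    return abs(np.trace(SWAP.conj().T @ U)) / 4.0


if __name__ == "__main__":
    for delta in (1.0, 0.5):
        U = evolve(xxz_hamiltonian(J=1.0, delta=delta), t=np.pi)
        print(f"delta = {delta}: overlap with SWAP = {swap_fidelity(U):.4f}")
```

Under these assumptions the overlap is 1 for delta = 1 and cos(π/8) ≈ 0.924 for delta = 0.5 at the same gate time, which illustrates why the gate conditions have to be re-derived once the interaction is anisotropic.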
PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System
Speaker-attributed automatic speech recognition (SA-ASR) improves the
accuracy and applicability of multi-speaker ASR systems in real-world scenarios
by assigning speaker labels to transcribed texts. However, SA-ASR poses unique
challenges due to factors such as speaker overlap, speaker variability,
background noise, and reverberation. In this study, we propose the PP-MeT system, a
real-world personalized prompt based meeting transcription system, which
consists of a clustering system, target-speaker voice activity detection
(TS-VAD), and TS-ASR. Specifically, we utilize target-speaker embedding as a
prompt in the TS-VAD and TS-ASR modules of our proposed system. In contrast with
previous systems, we fully leverage pre-trained models for system
initialization, thereby giving our approach greater generalizability
and precision. Experiments on the M2MeT2.0 Challenge dataset show that our system
achieves a cp-CER of 11.27% on the test set, ranking first in both fixed and
open training conditions.
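For readers unfamiliar with the metric, cp-CER is commonly described as a concatenated minimum-permutation character error rate: each speaker's utterances are concatenated, and the error is taken under the speaker assignment that minimizes the total edit distance. The sketch below illustrates that idea only; it is not the challenge's official scoring script. The function names are hypothetical, and padding with empty strings when speaker counts differ is a simplifying assumption.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (r != h)))    # substitution or match
        prev = curr
    return prev[-1]


def cp_cer(ref_by_speaker, hyp_by_speaker):
    """Concatenated minimum-permutation CER (illustrative sketch).

    Each argument is a list with one entry per speaker; each entry is that
    speaker's list of utterance strings in time order.
    """
    refs = ["".join(utts) for utts in ref_by_speaker]
    hyps = ["".join(utts) for utts in hyp_by_speaker]
    # Assumption: pad the shorter side with empty speakers so every
    # reference speaker is matched against exactly one hypothesis speaker.
    n = max(len(refs), len(hyps))
    refs += [""] * (n - len(refs))
    hyps += [""] * (n - len(hyps))
    total_ref_chars = sum(len(r) for r in refs)
    if total_ref_chars == 0:
        raise ValueError("reference transcript is empty")
    # Try every speaker assignment and keep the lowest total error count.
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total_ref_chars


if __name__ == "__main__":
    ref = [["abcd", "ef"], ["ghij"]]
    hyp = [["ghij"], ["abxd", "ef"]]   # speakers swapped, one wrong character
    print(cp_cer(ref, hyp))
```

The exhaustive search over permutations is factorial in the number of speakers, which is acceptable for a handful of meeting participants; larger cases would call for an assignment solver instead.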
LGBMDF: A cascade forest framework with LightGBM for predicting drug-target interactions
Prediction of drug-target interactions (DTIs) plays an important role in drug development. However, traditional laboratory methods for determining DTIs are costly in both time and capital. In recent years, many studies have shown that using machine learning methods to predict DTIs can speed up the drug development process and reduce capital costs. An excellent DTI prediction method should have both high prediction accuracy and low computational cost. In this study, we noticed that previous research based on deep forests used XGBoost as the estimator in the cascade. We instead applied LightGBM as the estimator in the cascade forest, and the estimator group was determined experimentally to be three LightGBMs and three ExtraTrees. This new model is called LGBMDF. We conducted 5-fold cross-validation on LGBMDF and other state-of-the-art methods using the same dataset, and compared their Sn, Sp, MCC, AUC and AUPR. We found that our method achieves better performance and faster computation.
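Three of the metrics named above (Sn, Sp and MCC) follow directly from a binary confusion matrix. The sketch below uses their standard definitions; the function name and the example counts are hypothetical and are not results from the paper.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity (Sn), specificity (Sp) and Matthews correlation (MCC)."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # true positive rate
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(confusion_metrics(tp=40, fp=10, tn=40, fn=10))
```

AUC and AUPR are threshold-free and need the ranked prediction scores rather than a single confusion matrix, so they are not covered by this sketch.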