Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

Author: Hu Yifan
Li Haizhou
Liu Rui
Ren Yi
Yin Xiang
Publication date: 19/12/2023

Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of the CSS task, prior studies have not thoroughly investigated the problem of emotional expressiveness, owing to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS. Comment: 9 pages, 4 figures, Accepted by AAAI'2024

arXiv.org e-Print Archive

Swap Action in a Solid-State Controllable Anisotropic Heisenberg Model

Author: Bandyopadhyay
Bennett
Bonadeo
Burkard
Burkard
Cirac
Cory
Ekert
Gershenfeld
Hu
Hu
Imamoglu
Kane
Kavokin
Loss
Monroe
Nielsen
Petta
Poyatos
Privman
Shiqun Zhu
Shnirman
Sleator
Steane
Tokura
Turchette
van der Wiel
Vidal
Xiang Hao
Yin
Publication venue: 'Elsevier BV'
Publication date: 05/03/2008

Correct swap action can be realized via the control of the anisotropic Heisenberg interaction in solid-state quantum systems. The conditions for performing a swap are derived from the dynamics of an arbitrary bipartite pure state. It is found that swap errors can be eliminated in the presence of symmetric anisotropy. In realistic quantum computers with unavoidable fluctuations, the gate fidelity of swap action is estimated. The scheme of quantum computation via the anisotropic Heisenberg interaction is implemented in one-dimensional quantum dots. The slanting and static magnetic field can be used to adjust the anisotropy. Comment: 10 pages, 1 figure
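As a point of reference for the swap conditions mentioned in this abstract, the sketch below works through the textbook isotropic limit only, not the anisotropic model of the paper. It assumes H = (J/4) σ1·σ2 with ħ = 1 and uses the identity σ1·σ2 = 2·SWAP − I, so that exp(−iHt) = e^{iθ/4}[cos(θ/2)·I − i·sin(θ/2)·SWAP] with θ = Jt. The function names `exchange_evolution` and `apply_gate` are hypothetical helpers introduced for illustration.

```python
import cmath
import math

# SWAP gate in the two-qubit basis |00>, |01>, |10>, |11>.
SWAP = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
IDENTITY = [[1 if i == j else 0 for j in range(4)] for i in range(4)]


def exchange_evolution(theta):
    """Return U = exp(-i H t) for the isotropic exchange H = (J/4) sigma1.sigma2.

    theta = J * t (hbar = 1). Because sigma1.sigma2 = 2*SWAP - I and
    SWAP squared is the identity, the exponential has a closed form.
    """
    phase = cmath.exp(1j * theta / 4)
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [
        [phase * (c * IDENTITY[i][j] - 1j * s * SWAP[i][j]) for j in range(4)]
        for i in range(4)
    ]


def apply_gate(gate, state):
    """Multiply a 4x4 gate by a length-4 state vector."""
    return [sum(gate[i][j] * state[j] for j in range(4)) for i in range(4)]


# A pulse with J*t = pi maps |01> to |10> up to a global phase.
swapped = apply_gate(exchange_evolution(math.pi), [0, 1, 0, 0])
print([round(abs(amplitude), 6) for amplitude in swapped])
```

Under these assumptions, a pulse with θ = π acts as a swap up to a global phase, and θ = π/2 gives the square-root-of-swap gate. The anisotropic case studied in the paper adds further conditions that this sketch does not capture.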

arXiv.org e-Print Archive

Crossref

PP-MeT: a Real-world Personalized Prompt based Meeting Transcription System

Author: Cao Yuhang
Hu Yanni
Lu Heng
Lyu Xiang
Wang Qing
Yang Yuguang
Yin Jingjing
Zou Pengpeng
Publication date: 28/09/2023

Speaker-attributed automatic speech recognition (SA-ASR) improves the accuracy and applicability of multi-speaker ASR systems in real-world scenarios by assigning speaker labels to transcribed texts. However, SA-ASR poses unique challenges due to factors such as speaker overlap, speaker variability, background noise, and reverberation. In this study, we propose the PP-MeT system, a real-world personalized prompt based meeting transcription system, which consists of a clustering system, target-speaker voice activity detection (TS-VAD), and TS-ASR. Specifically, we utilize target-speaker embedding as a prompt in the TS-VAD and TS-ASR modules of our proposed system. In contrast with previous systems, we fully leverage pre-trained models for system initialization, thereby bestowing our approach with heightened generalizability and precision. Experiments on the M2MeT2.0 Challenge dataset show that our system achieves a cp-CER of 11.27% on the test set, ranking first in both fixed and open training conditions.
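The abstract reports cp-CER without defining it. The sketch below assumes the usual reading, concatenated minimum-permutation character error rate: each speaker's utterances are concatenated, every assignment of hypothesis speakers to reference speakers is scored, and the assignment with the fewest character errors is kept. It is a simplified illustration rather than the challenge's scoring tool. The function names and the requirement of equal speaker counts are assumptions made here.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, ref_char in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def cp_cer(refs, hyps):
    """Minimum-permutation CER over per-speaker concatenated transcripts.

    refs and hyps are lists with one concatenated transcript per speaker.
    Simplifying assumption: both lists contain the same number of speakers.
    """
    if len(refs) != len(hyps):
        raise ValueError("this sketch assumes equal speaker counts")
    total_ref_chars = sum(len(ref) for ref in refs)
    best_errors = min(
        sum(edit_distance(ref, hyp) for ref, hyp in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_chars


# Hypothetical toy transcripts: the hypothesis speakers are in swapped order.
print(cp_cer(["abcd", "efgh"], ["efgx", "abcd"]))
```

Trying every permutation costs factorial time in the number of speakers, which is acceptable for a handful of meeting participants but would call for an assignment solver beyond that.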

arXiv.org e-Print Archive

LGBMDF: A cascade forest framework with LightGBM for predicting drug-target interactions

Author: Shouwei Zhao
Xiang Hu
Yu Peng
Zhiliang Zeng
Zhixiang Yin
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2023

Prediction of drug-target interactions (DTIs) plays an important role in drug development. However, traditional laboratory methods to determine DTIs require a lot of time and capital costs. In recent years, many studies have shown that using machine learning methods to predict DTIs can speed up the drug development process and reduce capital costs. An excellent DTI prediction method should have both high prediction accuracy and low computational cost. In this study, we noticed that previous research based on deep forests used XGBoost as the estimator in the cascade. We instead applied LightGBM as the estimator in the cascade forest, and the estimator group was then determined experimentally as three LightGBMs and three ExtraTrees. This new model is called LGBMDF. We conducted 5-fold cross-validation on LGBMDF and other state-of-the-art methods using the same dataset, and compared their Sn, Sp, MCC, AUC and AUPR. Finally, we found that our method has better performance and faster calculation speed.
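Three of the metrics named in this abstract can be computed directly from a binary confusion matrix. The sketch below assumes the standard definitions: Sn as sensitivity (recall), Sp as specificity, and MCC as the Matthews correlation coefficient. The abstract does not spell these out, and the counts used here are hypothetical, not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute Sn, Sp and MCC from confusion-matrix counts.

    Assumes at least one positive and one negative example, so the
    Sn and Sp denominators are non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"Sn": sensitivity, "Sp": specificity, "MCC": mcc}


# Hypothetical counts for one cross-validation fold.
print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

AUC and AUPR need ranked prediction scores rather than counts, so they are left out of this sketch.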

Directory of Open Access Journals