
    AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

    Direct speech-to-speech translation (S2ST) aims to convert speech from one language into another and has demonstrated significant progress to date. Despite this recent success, current S2ST models still suffer from distinct degradation in noisy environments and fail to translate visual speech (i.e., the movement of lips and teeth). In this work, we present AV-TranSpeech, the first audio-visual speech-to-speech translation (AV-S2ST) model that does not rely on intermediate text. AV-TranSpeech complements the audio stream with visual information to promote system robustness and opens up a host of practical applications, such as dictation or dubbing archival films. To mitigate the scarcity of parallel AV-S2ST data, we 1) explore self-supervised pre-training with unlabeled audio-visual data to learn contextual representations, and 2) introduce cross-modal distillation with S2ST models trained on audio-only corpora to further reduce the requirements on visual data. Experimental results on two language pairs demonstrate that AV-TranSpeech outperforms audio-only models under all settings, regardless of the type of noise. With low-resource audio-visual data (10h, 30h), cross-modal distillation yields an improvement of 7.6 BLEU on average compared with baselines. Audio samples are available at https://AV-TranSpeech.github.io. Comment: Accepted to ACL 2023.
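
    A minimal sketch of what cross-modal distillation of this kind could look like, assuming the frozen audio-only teacher and the audio-visual student both predict per-step distributions over the same discrete unit vocabulary; the function names and tensor sizes below are illustrative, not the paper's implementation.

```python
# Hedged sketch: the audio-visual student matches the unit distributions of a
# frozen audio-only S2ST teacher via KL divergence (a standard distillation
# objective; the paper's exact loss may differ).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between teacher and student unit distributions.

    Both tensors have shape (batch, time, vocab); the teacher is frozen.
    """
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean KL, scaled by T^2 as is conventional for distillation
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2

# Usage: the teacher consumes audio only; the student additionally sees lip video.
batch, time, vocab = 4, 50, 1000
teacher_logits = torch.randn(batch, time, vocab)  # from the frozen audio-only model
student_logits = torch.randn(batch, time, vocab, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits, temperature=2.0)
loss.backward()
```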

    Mega-TTS 2: Zero-Shot Text-to-Speech with Arbitrary Length Speech Prompts

    Zero-shot text-to-speech aims to synthesize voices for unseen speech prompts. Previous large-scale multispeaker TTS models have successfully achieved this goal with an enrolled recording of under 10 seconds. However, most of them are designed to utilize only short speech prompts, and the limited information in a short prompt significantly hinders fine-grained identity imitation. In this paper, we introduce Mega-TTS 2, a generic zero-shot multispeaker TTS model capable of synthesizing speech for unseen speakers with prompts of arbitrary length. Specifically, we 1) design a multi-reference timbre encoder to extract timbre information from multiple reference speeches, and 2) train a prosody language model with arbitrary-length speech prompts. With these designs, our model is suitable for prompts of different lengths, which extends the upper bound of speech quality for zero-shot text-to-speech. Beyond arbitrary-length prompts, we introduce arbitrary-source prompts, which leverage the probabilities derived from multiple P-LLM outputs to produce expressive and controlled prosody. Furthermore, we propose a phoneme-level autoregressive duration model to bring in-context learning capabilities to duration modeling. Experiments demonstrate that our method can not only synthesize identity-preserving speech with a short prompt from an unseen speaker but also achieve improved performance with longer speech prompts. Audio samples can be found at https://mega-tts.github.io/mega2_demo/
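
    One way to read the "arbitrary-source prompts" idea is as a per-step mixture of the next-code probabilities produced by several prosody-LM (P-LLM) passes, each conditioned on a different reference prompt. The sketch below follows that reading; the weighting scheme and function names are assumptions, not the paper's exact formulation.

```python
# Hedged sketch: fuse next-token probabilities from multiple P-LLM passes,
# then sample the next prosody code from the weighted mixture.
import torch

def fuse_prosody_probs(per_prompt_logits: list[torch.Tensor],
                       weights: list[float]) -> torch.Tensor:
    """Weighted mixture of softmax distributions over prosody codes.

    Each tensor in `per_prompt_logits` has shape (vocab,) for the current step.
    """
    assert len(per_prompt_logits) == len(weights)
    w = torch.tensor(weights)
    w = w / w.sum()                                   # normalize mixture weights
    probs = torch.stack([logits.softmax(-1) for logits in per_prompt_logits])
    return (w.unsqueeze(-1) * probs).sum(0)           # (vocab,)

# Usage: two reference prompts, biased toward the first prompt's prosody.
logits_a = torch.randn(256)   # P-LLM step conditioned on prompt A
logits_b = torch.randn(256)   # P-LLM step conditioned on prompt B
mixed = fuse_prosody_probs([logits_a, logits_b], weights=[0.7, 0.3])
next_code = torch.multinomial(mixed, num_samples=1)
```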

    TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation

    Direct speech-to-speech translation (S2ST) systems leverage recent progress in speech representation learning: a sequence of discrete representations (units), derived in a self-supervised manner, is predicted by the model and passed to a vocoder for speech synthesis. Such systems still face two challenges: 1) acoustic multimodality: the discrete units derived from speech with the same content can be indeterministic due to acoustic properties (e.g., rhythm, pitch, and energy), which degrades translation accuracy; 2) high latency: current S2ST systems use autoregressive models that predict each unit conditioned on the previously generated sequence, failing to take full advantage of parallelism. In this work, we propose TranSpeech, a speech-to-speech translation model with bilateral perturbation. To alleviate the acoustic multimodality problem, bilateral perturbation consists of style normalization and information enhancement stages that learn only the linguistic information from speech samples and generate more deterministic representations. With reduced multimodality, we take a step further and become the first to establish a non-autoregressive S2ST technique, which repeatedly masks and predicts unit choices and produces high-accuracy results in just a few cycles. Experimental results on three language pairs demonstrate state-of-the-art results of up to 2.5 BLEU points over the best publicly available textless S2ST baseline. Moreover, TranSpeech shows a significant improvement in inference latency, enabling a speedup of up to 21.4x over the autoregressive technique. Audio samples are available at https://TranSpeech.github.io/
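
    The "repeatedly masks and predicts unit choices" step refers to mask-predict style iterative decoding. A minimal sketch of that general technique follows; `model` is a hypothetical stand-in that maps a partially masked unit sequence to per-position logits, and the re-masking schedule is the common linear-decay choice, not necessarily TranSpeech's.

```python
# Hedged sketch of mask-predict non-autoregressive decoding: start fully
# masked, fill in all positions greedily, then re-mask the least confident
# ones and repeat for a few cycles.
import torch

MASK_ID = 0  # assumed id of the [MASK] unit

@torch.no_grad()
def mask_predict(model, length: int, iterations: int = 4) -> torch.Tensor:
    units = torch.full((length,), MASK_ID, dtype=torch.long)  # fully masked start
    for it in range(iterations):
        logits = model(units)                                 # (length, vocab)
        scores, units = logits.softmax(-1).max(-1)            # greedy fill-in
        # linearly decay the number of re-masked positions per iteration
        n_mask = int(length * (iterations - 1 - it) / iterations)
        if n_mask > 0:
            remask = scores.topk(n_mask, largest=False).indices  # least confident
            units[remask] = MASK_ID
    return units

# Usage with a dummy "model" that returns random logits over 1000 units.
dummy = lambda u: torch.randn(u.numel(), 1000)
print(mask_predict(dummy, length=20))
```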

    Generative Zero-Shot Prompt Learning for Cross-Domain Slot Filling with Inverse Prompting

    Zero-shot cross-domain slot filling aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Existing models either encode slot descriptions and examples or design handcrafted question templates using heuristic rules, and consequently suffer from poor generalization or robustness. In this paper, we propose a generative zero-shot prompt learning framework for cross-domain slot filling that improves both generalization and robustness over previous work. In addition, we introduce a novel inverse prompting strategy to distinguish different slot types and avoid the multiple-prediction problem, as well as an efficient prompt-tuning strategy that boosts performance while training only a small number of prompt parameters. Experiments and analysis demonstrate the effectiveness of our proposed framework, with particularly large improvements (+13.44% F1) on unseen slots. Comment: Accepted by the Findings of ACL 2023.
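
    One plausible reading of the inverse prompting strategy: after a forward prompt extracts a candidate value for each slot type, an inverse prompt asks the model which slot type the value belongs to, and candidates whose predicted type disagrees are dropped. The sketch below follows that reading; the templates and the `generate` callable are hypothetical, not the paper's exact prompts.

```python
# Hedged sketch: filter forward-prompt predictions by checking agreement with
# an inverse prompt, mitigating the problem of one value being assigned to
# multiple slot types.
from typing import Callable, Dict

def inverse_filter(utterance: str,
                   candidates: Dict[str, str],
                   generate: Callable[[str], str]) -> Dict[str, str]:
    """Keep only (slot, value) pairs that the inverse prompt confirms."""
    kept = {}
    for slot, value in candidates.items():
        inverse_prompt = f'In "{utterance}", what slot type does "{value}" fill?'
        predicted_slot = generate(inverse_prompt).strip()
        if predicted_slot == slot:        # both directions agree
            kept[slot] = value
    return kept

# Usage with a toy generator that always answers "artist".
candidates = {"artist": "Taylor Swift", "playlist": "Taylor Swift"}
toy = lambda prompt: "artist"
print(inverse_filter("play Taylor Swift", candidates, toy))
# {'artist': 'Taylor Swift'}  (the duplicate 'playlist' prediction is dropped)
```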

    Tetrahydrofolate Modulates Floral Transition through Epigenetic Silencing.

    Folates, the collective term for tetrahydrofolate (THF) and its derivatives, function as coenzymes in one-carbon transfer reactions and play a central role in the synthesis of nucleotides and amino acids. Dysfunction of cellular folate metabolism leads to serious defects in plant development; however, the molecular mechanisms of folate-mediated cellular modifications and physiological responses in plants remain largely unclear. Here, we report that THF controls flowering time by adjusting DNA methylation-regulated gene expression in Arabidopsis (Arabidopsis thaliana). Wild-type seedlings supplied with THF, as well as the dihydrofolate synthetase folylpoly-Glu synthetase homolog B mutant with high endogenous THF content, exhibited significant up-regulation of the flowering repressor FLOWERING WAGENINGEN and thereby delayed floral transition in a dose-dependent manner. Genome-wide transcript and DNA methylation profiling revealed that THF reduces DNA methylation and thereby alters gene expression activity. Moreover, accompanying the elevated cellular ratio of monoglutamylated to polyglutamylated folates under increased THF levels, the content of S-adenosylhomo-Cys, a competitive inhibitor of methyltransferases, was markedly higher, indicating that enhanced THF accumulation may disturb the cellular homeostasis of the concerted reactions between folate polyglutamylation and folate-dependent DNA methylation. In addition, we found that a loss-of-function mutant of the CG DNA methyltransferase MET1 was much less responsive to THF-associated flowering time alteration. Taken together, our studies reveal a novel regulatory role of THF in epigenetic silencing, which sheds light on the interrelations among folate homeostasis, epigenetic variation, and flowering control in plants.