
374 research outputs found

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

Author: Dai Li-Rong
Ling Zhen-Hua
Zhang Jing-Xuan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 17/07/2018

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis. This method is motivated by the nature of the monotonic alignment from phone sequences to acoustic sequences. Only the alignment paths that satisfy the monotonic condition are taken into consideration at each decoder timestep. The modified attention probabilities at each timestep are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method achieves faster convergence and higher stability than the baseline attention method. In addition, forward attention with the transition agent can help improve the naturalness of synthetic speech and control its speed effectively.

Comment: 5 pages, 3 figures, 2 tables. Published in IEEE International Conference on Acoustics, Speech and Signal Processing 2018 (ICASSP2018)

arXiv.org e-Print Archive

Crossref
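
The recursive computation mentioned in the forward attention abstract above can be illustrated with a minimal sketch. It assumes the usual form of a monotonic forward recursion: the weight at each input position is the sum of the previous weights for "stay" and "move one position forward", multiplied by the ordinary attention probability and renormalized. The function names, the list-based representation, and the zero-sum fallback are illustrative assumptions, not the authors' implementation, and the transition agent is not modeled.

```python
def forward_attention_step(prev_alpha, y):
    """One decoder step of a monotonic forward recursion (illustrative sketch).

    prev_alpha: normalized forward weights from the previous step (length N).
    y: ordinary attention probabilities at the current step (length N).
    """
    unnormalized = []
    for i in range(len(prev_alpha)):
        stay = prev_alpha[i]                        # alignment stays at position i
        move = prev_alpha[i - 1] if i > 0 else 0.0  # alignment advances from i - 1
        unnormalized.append((stay + move) * y[i])
    total = sum(unnormalized)
    if total == 0.0:
        # Assumption for this sketch: keep the previous weights if all mass vanishes.
        return list(prev_alpha)
    return [u / total for u in unnormalized]


def forward_attention(ys):
    """Apply the recursion over all decoder steps.

    ys: list of per-step attention probability vectors, all of length N.
    The alignment is assumed to start entirely at the first input position.
    """
    n = len(ys[0])
    alpha = [1.0] + [0.0] * (n - 1)
    history = []
    for y in ys:
        alpha = forward_attention_step(alpha, y)
        history.append(alpha)
    return history
```

Because each step only admits "stay" or "advance by one", an input position that is k steps away from the start cannot receive weight before decoder step k, which is the monotonic restriction the abstract describes.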

Whisper-to-speech conversion using restricted Boltzmann machine arrays

Author: Chen L.‐H.
Ian V. McLoughlin
Jing‐jie Li
Kawahara H.
Li‐Rong Dai
Sharifzadeh H.R.
Toda T.
Tran V.‐A.
Zhen‐hua Ling
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 20/11/2014

Whispers are a natural vocal communication mechanism, in which vocal cords do not vibrate normally. Lack of glottal-induced pitch leads to low energy, and an inherent noise-like spectral distribution reduces intelligibility. Much research has been devoted to processing of whispers, including conversion of whispers to speech. Unfortunately, among several approaches, the best reconstructed speech to date still contains obviously artificial muffles and suffers from an unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of estimated pitch is improved using machine learning techniques for pitch estimation within only voiced (V) regions. Both objective and subjective evaluations show that this new method improves the quality of whisper-reconstructed speech compared with the state-of-the-art approaches

Crossref

Kent Academic Repository

Adsorption of phenylacetylene on Si(100)-2×1: Reaction mechanism and formation of a styrene-like π-conjugation system

Author: Dai Yu Jing
Huang Hai Gou
Li Zhen Hua
Qiao Ming Hua
Tao Franklin Feng
Xu Guo Qin
Yang Lei
Publication venue: 'American Physical Society (APS)'
Publication date: 20/11/2015

This is the published version. Copyright 2003 American Physical Society

The interactions of phenylacetylene and phenylacetylene−α−d1 with Si(100)−2×1 have been studied as a model system to mechanistically understand the adsorption of conjugated π-electron aromatic substitutions on Si(100)−2×1. Vibrational signatures show that phenylacetylene covalently binds to the surface through a [2+2]-like cycloaddition pathway between the external C≡C and the Si=Si dimer, forming a styrene-like conjugation structure, which was further supported by the chemical shift of the C 1s core level. These experimental results are consistent with the density-functional theory [B3LYP/6−311//+G(d)] calculations. The resulting styrene-like conjugation structures may be employed as an intermediate for further organic syntheses and fabrication of molecular architecture for modification and functionalization of Si surfaces, or as a monomer for polymerization on Si surfaces

KU ScholarWorks

Numerical Simulation on the Gas Explosion Propagation Related to Roadway

Author: Aihua Yan
Baisheng Nie
Hua Yang
Linchao Dai
Qinqin Zhang
Tiezhu Hu
Xinna Liu
Zhen Liu
Publication venue: Published by Elsevier Ltd.
Publication date: 31/12/2011

Based on combustion, explosion, and aerodynamic theory, this paper describes the mathematical model of gas explosion in detail. Combined with the gas explosion transmission mechanism, it studies the two-wave, three-area structure of a gas explosion and the energy change rule at the array face of the precursor wave and the array face of the flame wave. Using the Fluent fluid dynamics software, it then numerically simulates and analyzes the overpressure transmission rule when a gas explosion takes place in different types of roadway. The results show that Fluent can accurately simulate gas explosion conditions; that when an explosion wave spreads through a roadway turn, the larger the overpressure at the corner, the stronger the destructive power; and that when a tunnel has a bifurcation, the overpressure is released at the bifurcation, but the explosion wave together with the flame wave produces a more powerful destructive effect. The research results can serve as a reference for preventing and controlling gas explosions and for reducing their power

Elsevier - Publisher Connector

Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

Author: Dai Li-Rong
Jiang Yuan
Liang Chen
Ling Zhen-Hua
Liu Li-Juan
Zhang Jing-Xuan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 20/11/2018

This paper presents methods of making use of text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic modeling method proposed in our previous work achieved higher naturalness and similarity. In this paper, we further improve its performance by utilizing the text transcriptions of parallel training data. First, a multi-task learning structure is designed which adds auxiliary classifiers to the middle layers of the seq2seq model and predicts linguistic labels as a secondary task. Second, a data-augmentation method is proposed which utilizes text alignment to produce extra parallel sequences for model training. Experiments are conducted to evaluate our proposed method with training sets of different sizes. Experimental results show that multi-task learning with linguistic labels is effective at reducing the errors of seq2seq voice conversion. The data-augmentation method can further improve the performance of seq2seq voice conversion when only 50 or 100 training utterances are available.

Comment: 5 pages, 4 figures, 2 tables. Submitted to IEEE ICASSP 201

arXiv.org e-Print Archive

Crossref

5′-Adenosine Monophosphate-Induced Hypothermia Attenuates Brain Ischemia/Reperfusion Injury in a Rat Model by Inhibiting the Inflammatory Response

Author: Hui Wu
Jiong Dai
Shao-Feng Yang
Xiao-Hua Zhang
Yi-Feng Miao
Yong-Ming Qiu
Zhen-Yi Tao
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2015

Crossref

Formant-controlled HMM-based speech synthesis

Author: Dai Li-Rong
King Simon
Lei Ming
Ling Zhen-Hua
Richmond Korin
Yamagishi Junichi
Publication date: 01/08/2011

Edinburgh Research Explorer