
    Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

    This paper proposes a forward attention method for sequence-to-sequence acoustic modeling in speech synthesis. The method is motivated by the monotonic nature of the alignment from phone sequences to acoustic sequences. Only alignment paths that satisfy the monotonic condition are considered at each decoder timestep. The modified attention probabilities at each timestep are computed recursively using a forward algorithm. A transition agent for forward attention is further proposed, which helps the attention mechanism decide whether to move forward or stay at each decoder timestep. Experimental results show that the proposed forward attention method converges faster and is more stable than the baseline attention method. In addition, forward attention with the transition agent can help improve the naturalness of synthetic speech and control its speed effectively. Comment: 5 pages, 3 figures, 2 tables. Published in IEEE International Conference on Acoustics, Speech and Signal Processing 2018 (ICASSP 2018)
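
    The recursive update mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each phone position collects probability mass from two monotonic paths (staying at the same position, or advancing by one position from the left neighbour), scales that mass by the ordinary attention probability at the current step, and renormalizes. The abstract does not state the formula, so this recursion, the function name `forward_attention_step`, and the argument names are illustrative assumptions rather than the authors' implementation. The transition agent is omitted.

```python
def forward_attention_step(prev_alpha, y):
    """One decoder step of a forward-algorithm style attention update (sketch).

    prev_alpha: forward attention weights from the previous decoder step.
    y: ordinary attention probabilities (e.g. softmax output) at this step.
    Both are lists of floats with one entry per encoder (phone) position.
    """
    if len(prev_alpha) != len(y):
        raise ValueError("prev_alpha and y must have the same length")

    unnormalized = []
    for n in range(len(y)):
        stay = prev_alpha[n]                          # path that stays at position n
        move = prev_alpha[n - 1] if n > 0 else 0.0    # path that advances from n - 1
        unnormalized.append((stay + move) * y[n])

    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("all monotonic paths have zero probability")
    # Renormalize so the weights form a distribution over positions.
    return [u / total for u in unnormalized]


# Assumed initialization: all mass on the first phone position.
alpha0 = [1.0, 0.0, 0.0]
uniform = [1.0 / 3.0] * 3
alpha1 = forward_attention_step(alpha0, uniform)
alpha2 = forward_attention_step(alpha1, uniform)
```

    With this assumed initialization, mass can reach position n only after at least n decoder steps, which is how the sketch excludes non-monotonic alignments.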

    Whisper-to-speech conversion using restricted Boltzmann machine arrays

    Whispers are a natural vocal communication mechanism in which the vocal cords do not vibrate normally. The lack of glottal-induced pitch leads to low energy, and an inherent noise-like spectral distribution reduces intelligibility. Much research has been devoted to the processing of whispers, including the conversion of whispers to speech. Unfortunately, among several approaches, the best reconstructed speech to date still contains obviously artificial muffles and suffers from unnatural prosody. To address these issues, the novel use of multiple restricted Boltzmann machines (RBMs) is reported as a statistical conversion model between whisper and speech spectral envelopes. Moreover, the accuracy of the estimated pitch is improved by applying machine learning techniques for pitch estimation only within voiced (V) regions. Both objective and subjective evaluations show that this new method improves the quality of whisper-reconstructed speech compared with state-of-the-art approaches.

    Adsorption of phenylacetylene on Si(100)-2×1: Reaction mechanism and formation of a styrene-like π-conjugation system

    This is the published version. Copyright 2003 American Physical Society. The interactions of phenylacetylene and phenylacetylene−α−d1 with Si(100)−2×1 have been studied as a model system to mechanistically understand the adsorption of conjugated π-electron aromatic substitutions on Si(100)−2×1. Vibrational signatures show that phenylacetylene covalently binds to the surface through a [2+2]-like cycloaddition pathway between the external C≡C and the Si=Si dimer, forming a styrene-like conjugation structure, which is further supported by the chemical shift of the C 1s core level. These experimental results are consistent with the density-functional theory [B3LYP/6−311//+G(d)] calculations. The resulting styrene-like conjugation structures may be employed as an intermediate for further organic syntheses and the fabrication of molecular architectures for the modification and functionalization of Si surfaces, or as a monomer for polymerization on Si surfaces.

    Numerical Simulation on the Gas Explosion Propagation Related to Roadway

    Based on the theory of combustion, explosion, and aerodynamics, this paper describes a mathematical model of gas explosion in detail. Building on the gas explosion propagation mechanism, it studies the two-wave, three-zone structure of a gas explosion and how energy changes at the precursor wave front and the flame wave front. Using the fluid dynamics software Fluent, it then numerically simulates and analyzes how overpressure propagates when a gas explosion occurs in different types of roadway. The results show that Fluent can be used to accurately simulate gas explosion conditions; that when the explosion wave passes through a roadway turn, the larger the overpressure at the corner, the stronger the destructive power; and that when the roadway bifurcates, overpressure is released at the bifurcation, but the explosion wave together with the flame wave produces a more destructive effect. The results can serve as a reference for preventing and controlling gas explosions and for reducing their power.

    Improving Sequence-to-Sequence Acoustic Modeling by Adding Text-Supervision

    This paper presents methods that make use of text supervision to improve the performance of sequence-to-sequence (seq2seq) voice conversion. Compared with conventional frame-to-frame voice conversion approaches, the seq2seq acoustic modeling method proposed in our previous work achieved higher naturalness and similarity. In this paper, we further improve its performance by utilizing the text transcriptions of parallel training data. First, a multi-task learning structure is designed that adds auxiliary classifiers to the middle layers of the seq2seq model and predicts linguistic labels as a secondary task. Second, a data-augmentation method is proposed that uses text alignment to produce extra parallel sequences for model training. Experiments are conducted to evaluate the proposed methods with training sets of different sizes. Experimental results show that multi-task learning with linguistic labels is effective at reducing the errors of seq2seq voice conversion. The data-augmentation method can further improve the performance of seq2seq voice conversion when only 50 or 100 training utterances are available. Comment: 5 pages, 4 figures, 2 tables. Submitted to IEEE ICASSP 201