
131 research outputs found

Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis

Author: Wang Xin
Yamagishi Junichi
Yasuda Yusuke
Publication venue: 'International Speech Communication Association'
Publication date: 27/08/2019

Neural source-filter (NSF) models are deep neural networks that produce waveforms given input acoustic features. They use dilated-convolution-based neural filter modules to filter sine-based excitation for waveform generation, which is different from WaveNet and flow-based models. One NSF model, the harmonic-plus-noise NSF (h-NSF) model, uses separate pairs of source and neural filters to generate harmonic and noise waveform components. It is close to WaveNet in terms of speech quality while being superior in generation speed. The h-NSF model can be improved even further. While h-NSF merges the harmonic and noise components using pre-defined digital low- and high-pass filters, it is well known that the maximum voice frequency (MVF) that separates the periodic and aperiodic spectral bands is time-variant. Therefore, we propose a new h-NSF model with time-variant and trainable MVF. We parameterize the digital low- and high-pass filters as windowed-sinc filters and predict their cut-off frequency (i.e., MVF) from the input acoustic features. Our experiments demonstrated that the new model can predict a good trajectory of the MVF and produce high-quality speech for a text-to-speech synthesis system.

Comment: Accepted by Speech Synthesis Workshop 201
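The windowed-sinc parameterization mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the Hamming window, the 31-tap length, the 16 kHz sample rate, and the fixed 4 kHz cut-off are all assumptions made for illustration. In the paper the cut-off (MVF) is predicted from acoustic features and varies over time, whereas here it is a single constant.

```python
import numpy as np


def windowed_sinc_lowpass(cutoff_hz, sample_rate, num_taps=31):
    """Hamming-windowed sinc low-pass FIR filter, normalized to unit DC gain."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / sample_rate  # cut-off in cycles per sample
    h = 2.0 * fc * np.sinc(2.0 * fc * n)  # ideal low-pass impulse response
    h *= np.hamming(num_taps)  # taper to a finite length
    return h / h.sum()


def windowed_sinc_highpass(cutoff_hz, sample_rate, num_taps=31):
    """Complementary high-pass filter obtained by spectral inversion."""
    h = -windowed_sinc_lowpass(cutoff_hz, sample_rate, num_taps)
    h[(num_taps - 1) // 2] += 1.0  # unit impulse minus the low-pass filter
    return h


# Hypothetical settings, chosen only for this illustration.
sample_rate = 16000
mvf_hz = 4000.0  # stands in for a predicted maximum voice frequency

lp = windowed_sinc_lowpass(mvf_hz, sample_rate)
hp = windowed_sinc_highpass(mvf_hz, sample_rate)

# Synthetic stand-ins for the harmonic and noise components.
rng = np.random.default_rng(0)
t = np.arange(sample_rate // 10) / sample_rate
harmonic = np.sin(2.0 * np.pi * 200.0 * t)
noise = 0.1 * rng.standard_normal(t.size)

# Keep the harmonic part below the cut-off and the noise part above it.
merged = np.convolve(harmonic, lp, mode="same") + np.convolve(noise, hp, mode="same")
```

Because the high-pass filter is built as a unit impulse minus the low-pass filter, the two are complementary: their coefficients sum to a unit impulse. A time-variant version would recompute both filters for each frame from that frame's cut-off.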

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Continued growth of locally aggressive fibrous dysplasia of 22 years duration after reaching adulthood: a case report

Author: Aoki Kaoru
Kito Munehisa
Okamoto Masanori
Takahashi Jun
Yamagishi Yusuke
Yoshimura Yasuo
Publication venue: 'Oxford University Press (OUP)'
Publication date: 11/02/2020

Fibrous dysplasia generally stops growing when patients reach adulthood. Locally aggressive fibrous dysplasia is an extremely rare subtype of fibrous dysplasia that is characterized by progressive enlargement after bone maturation, cortical bone destruction and soft tissue invasion but without malignant transformation. A tumor was found in the rib of a patient at 50 years of age. The tumor gradually enlarged over time and imaging findings suggested a malignant tumor. The case was further complicated by restrictive lung disorder. Biopsies from multiple sites showed no malignant findings, and marginal resection with partial curettage was performed. The final diagnosis was locally aggressive fibrous dysplasia, and the restrictive lung disorder improved postoperatively. The natural history of the disease is also unknown. This is the first report in the literature to describe a case in which a lesion exhibited long-term growth over a period of 22 years after reaching adulthood.

Journal of Surgical Case Reports 2020(2): rjz406 (2020)

Shinshu University Institutional Repository

Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language

Author: Takaki Shinji
Wang Xin
Yamagishi Junichi
Yasuda Yusuke
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/02/2019

End-to-end speech synthesis is a promising approach that directly converts raw text to speech. Although Tacotron2 has been shown to outperform classical pipeline systems with regard to naturalness in English, its applicability to other languages is still unknown. Japanese could be one of the most difficult languages for which to achieve end-to-end speech synthesis, largely due to its character diversity and pitch accents. Therefore, state-of-the-art systems are still based on a traditional pipeline framework that requires a separate text analyzer and duration model. Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons. In a large-scale listening test, we investigated the impacts of the presence of accentual-type labels, the use of forced or predicted alignments, and acoustic features used as local condition parameters of the WaveNet vocoder. Our results reveal that although the proposed systems still do not match the quality of a top-line pipeline system for Japanese, they provide important stepping stones towards end-to-end Japanese speech synthesis.

Comment: to appear at ICASSP 201

arXiv.org e-Print Archive

Edinburgh Research Explorer

Drug-induced hypersensitivity syndrome by liposomal amphotericin-B: a case report

Author: Hideo Kato
Hiroshige Mikamo
Jun Hirai
Katsuhiko Matsuura
Mao Hagihara
Yuka Yamagishi
Yukihiro Hamada
Yusuke Koizumi
Publication venue: Springer Nature
Publication date: 01/01/2015

Springer - Publisher Connector

Fabrication and characterization of amorphous polyethylene terephthalate optical waveguides

Author: Iiyama Koichi
Ishida Terumasa
Maruyama Takeo
Ono Yusuke
Yamagishi Tadaaki
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

Faculty of Electrical and Computer Engineering, Institute of Science and Engineering, Kanazawa University

Amorphous polyethylene terephthalate (PET) optical waveguides are fabricated by the spin coating method and their optical properties are characterized. The refractive index measured by a spectroscopic ellipsometer is 1.5656, 1.5560, 1.5489, and 1.5477 at wavelengths of 633, 830, 1310, and 1550 nm, respectively. Multimode optical waveguides with 12-μm thickness and 46-μm width are fabricated by mechanical grinding using a dicing saw to form the core ridge, and the propagation loss is measured by the cut-back method to be 0.30, 0.12, 0.35, and 0.70 dB/cm at wavelengths of 660, 830, 1310, and 1550 nm, respectively. © 2011 IEEE

Kanazawa University Repository for Academic Resources