131 research outputs found

    Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis

    Neural source-filter (NSF) models are deep neural networks that produce waveforms given input acoustic features. Unlike WaveNet and flow-based models, they use dilated-convolution-based neural filter modules to filter a sine-based excitation for waveform generation. One of the NSF models, the harmonic-plus-noise NSF (h-NSF) model, uses separate pairs of source and neural filters to generate harmonic and noise waveform components. It is close to WaveNet in terms of speech quality while being superior in generation speed. The h-NSF model can be improved further. While h-NSF merges the harmonic and noise components using pre-defined digital low- and high-pass filters, it is well known that the maximum voice frequency (MVF) that separates the periodic and aperiodic spectral bands is time-variant. Therefore, we propose a new h-NSF model with a time-variant and trainable MVF. We parameterize the digital low- and high-pass filters as windowed-sinc filters and predict their cut-off frequency (i.e., the MVF) from the input acoustic features. Our experiments demonstrated that the new model can predict a good trajectory of the MVF and produce high-quality speech for a text-to-speech synthesis system. Comment: Accepted by Speech Synthesis Workshop 201
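    The idea of merging a harmonic and a noise component with windowed-sinc low- and high-pass filters that share one cut-off frequency can be illustrated with a minimal sketch. It is not the authors' implementation: the Hamming window, the tap count, the spectral-inversion high-pass, and all function names are assumptions made for illustration, and the cut-off is held fixed per call rather than predicted frame by frame by a network.

```python
import math


def windowed_sinc_lowpass(cutoff, num_taps=31):
    """Low-pass FIR taps. cutoff is in cycles/sample (0 < cutoff < 0.5); num_taps must be odd."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response, truncated to num_taps samples.
        if k == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        # Hamming window (an assumed choice) to reduce truncation ripple.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC


def complementary_highpass(lowpass_taps):
    """High-pass taps by spectral inversion: delayed unit impulse minus the low-pass."""
    mid = (len(lowpass_taps) - 1) // 2
    highpass = [-t for t in lowpass_taps]
    highpass[mid] += 1.0
    return highpass


def fir_filter(signal, taps):
    """Causal FIR filtering; output has the same length as the input."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if n - k >= 0:
                acc += t * signal[n - k]
        out.append(acc)
    return out


def merge_components(harmonic, noise, mvf_hz, sample_rate, num_taps=31):
    """Keep the harmonic part below the MVF and the noise part above it, then sum."""
    cutoff = mvf_hz / sample_rate  # Hz -> cycles/sample
    lowpass = windowed_sinc_lowpass(cutoff, num_taps)
    highpass = complementary_highpass(lowpass)
    low_band = fir_filter(harmonic, lowpass)
    high_band = fir_filter(noise, highpass)
    return [a + b for a, b in zip(low_band, high_band)]
```

    Because the two filters are complementary, feeding the same signal into both inputs returns that signal delayed by (num_taps - 1) / 2 samples, which is a convenient sanity check. A time-variant MVF would recompute the taps for each frame from the predicted cut-off.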

    Continued growth of locally aggressive fibrous dysplasia of 22 years duration after reaching adulthood: a case report

    Fibrous dysplasia generally stops growing when patients reach adulthood. Locally aggressive fibrous dysplasia is an extremely rare subtype of fibrous dysplasia that is characterized by progressive enlargement after bone maturation, cortical bone destruction and soft tissue invasion but without malignant transformation. At 50 years of age, a tumor was found in the rib of a patient. The tumor gradually enlarged over time and imaging findings suggested a malignant tumor. The case was further complicated by restrictive lung disorder. Biopsies from multiple sites showed no malignant findings, and marginal resection with partial curettage was performed. The final diagnosis was locally aggressive fibrous dysplasia, and the restrictive lung disorder improved postoperatively. The natural history of the disease is also unknown. This is the first report in the literature to describe a case in which a lesion exhibited long-term growth over a period of 22 years after reaching adulthood. Journal of Surgical Case Reports 2020(2): rjz406 (2020).

    Investigation of Enhanced Tacotron Text-to-speech Synthesis Systems with Self-attention for Pitch Accent Language

    End-to-end speech synthesis is a promising approach that directly converts raw text to speech. Although it was shown that Tacotron2 outperforms classical pipeline systems with regard to naturalness in English, its applicability to other languages is still unknown. Japanese could be one of the most difficult languages for which to achieve end-to-end speech synthesis, largely due to its character diversity and pitch accents. Therefore, state-of-the-art systems are still based on a traditional pipeline framework that requires a separate text analyzer and duration model. Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents, and we compare their audio quality with that of classical pipeline systems under various conditions to show their pros and cons. In a large-scale listening test, we investigated the impacts of the presence of accentual-type labels, the use of forced or predicted alignments, and the acoustic features used as local condition parameters of the WaveNet vocoder. Our results reveal that although the proposed systems still do not match the quality of a top-line pipeline system for Japanese, they are important stepping stones towards end-to-end Japanese speech synthesis. Comment: to appear at ICASSP 201

    Fabrication and characterization of amorphous polyethylene terephthalate optical waveguides

    Kanazawa University, Institute of Science and Engineering, Faculty of Electrical and Computer Engineering. Amorphous polyethylene terephthalate (PET) optical waveguides are fabricated by the spin coating method and their optical properties are characterized. The refractive index measured by a spectroscopic ellipsometer is 1.5656, 1.5560, 1.5489, and 1.5477 at wavelengths of 633, 830, 1310, and 1550 nm, respectively. Multimode optical waveguides with 12-μm thickness and 46-μm width are fabricated by mechanical grinding using a dicing saw to form the core ridge, and the propagation loss measured by the cut-back method is 0.30, 0.12, 0.35, and 0.70 dB/cm at wavelengths of 660, 830, 1310, and 1550 nm, respectively. © 2011 IEEE
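    For readers unfamiliar with the cut-back method, the sketch below shows the usual data reduction under stated assumptions: insertion loss is measured for the same waveguide at several lengths, and a straight-line fit gives the propagation loss as the slope (dB/cm) and the length-independent coupling loss as the intercept (dB). The function name is hypothetical, and the numbers in the usage note are synthetic, not measurements from the paper.

```python
def cutback_fit(lengths_cm, insertion_loss_db):
    """Least-squares line through (length, insertion loss) points.

    Returns (propagation_loss_db_per_cm, coupling_loss_db).
    """
    if len(lengths_cm) != len(insertion_loss_db) or len(lengths_cm) < 2:
        raise ValueError("need at least two (length, loss) pairs of equal count")
    n = len(lengths_cm)
    mean_x = sum(lengths_cm) / n
    mean_y = sum(insertion_loss_db) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths_cm)
    if sxx == 0.0:
        raise ValueError("lengths must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(lengths_cm, insertion_loss_db))
    slope = sxy / sxx                     # propagation loss in dB/cm
    intercept = mean_y - slope * mean_x   # coupling loss in dB
    return slope, intercept


# Synthetic example: losses generated from an assumed 0.30 dB/cm propagation
# loss and an assumed 1.0 dB coupling loss, then recovered by the fit.
lengths = [1.0, 2.0, 3.0, 4.0]
losses = [1.0 + 0.30 * length for length in lengths]
propagation_loss, coupling_loss = cutback_fit(lengths, losses)
```

    With real measurements the points scatter around the line, so more cut lengths give a more reliable slope.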