11 research outputs found

    A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

    This paper introduces a deep neural network model for a subband-based speech synthesizer. The model exploits the narrow bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employ multi-level wavelet analysis/synthesis to decompose the signal into subbands and reconstruct it in the time domain. Inspired by WaveNet, a convolutional neural network (CNN) model predicts the subband speech signals fully in the time domain. Because the subbands are narrow, a simple network architecture is enough to learn their simple patterns accurately. In ground-truth experiments with teacher forcing, the subband synthesizer significantly outperforms the fullband model on both subjective and objective measures. In addition, by conditioning the model on the phoneme sequence using a pronunciation dictionary, we obtain a fully time-domain neural model for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end. The speech generated by the subband TTS is comparable in quality to that of the fullband one, with a lighter network architecture for each subband. Comment: 5 pages, 3 figures

    Encoder-decoder multimodal speaker change detection

    The task of speaker change detection (SCD), which detects the points where speakers change in an input, is essential for several applications. Several studies have addressed the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise the text modality in addition to audio, have shown improved performance. In this study, the proposed model is built upon two main proposals: a novel mechanism for modality fusion and the adoption of an encoder-decoder architecture. Unlike previous MMSCD works, which extract speaker embeddings from extremely short audio segments aligned to a single word, we use a speaker embedding extracted from 1.5 s of audio. A transformer decoder layer further improves the performance of an encoder-only MMSCD model. The proposed model achieves state-of-the-art results among studies that report SCD performance and is also on par with recent work that combines SCD with automatic speech recognition via human transcription. Comment: 5 pages, accepted for presentation at INTERSPEECH 202

    Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition


    Deep CNNs Along the Time Axis With Intermap Pooling for Robustness to Spectral Variations


    Nutrition Composition and Single, 14-Day and 13-Week Repeated Oral Dose Toxicity Studies of the Leaves and Stems of Rubus coreanus Miquel

    The leaves and stems of the plant Rubus coreanus Miquel (RCMLS) are rich in vitamins, minerals and phytochemicals, which have antioxidant, anti-hemolytic, anti-inflammatory, anti-fatigue and anti-cancer effects. However, RCMLS is not included in the Korean Food Standards Codex because its safety has not been established. We evaluated the single and repeated oral dose toxicity of RCMLS in Sprague-Dawley rats. RCMLS did not induce any significant toxicological changes in either male or female rats at a single dose of 2500 mg/kg/day. Repeated oral dose toxicity studies showed no adverse effects on clinical signs, body weight, food consumption, ophthalmic examination, urinalysis, hematology, serum biochemistry, necropsy findings, organ weight, or histopathology at doses of 625, 1250, and 2500 mg/kg/day. The LD50 and LOAEL of RCMLS might be over 2500 mg/kg body weight/day, and no target organs were identified. Therefore, this study revealed that single and repeated oral doses of RCMLS are safe.