11 research outputs found
A Fully Time-domain Neural Model for Subband-based Speech Synthesizer
This paper introduces a deep neural network model for a subband-based speech
synthesizer. The model exploits the narrow bandwidth of the subband signals
to reduce the complexity of the time-domain speech generator. We employ
multi-level wavelet analysis/synthesis to decompose the signal into subbands
and reconstruct it in the time domain. Inspired by WaveNet, a convolutional
neural network (CNN) model predicts the subband speech signals fully in the
time domain. Because the subbands have a narrow bandwidth, a simple network
architecture is enough to learn their simple patterns accurately. In the
ground-truth experiments with teacher forcing, the subband synthesizer
significantly outperforms the fullband model on both subjective and objective
measures. In addition, by conditioning the model on the phoneme sequence using
a pronunciation dictionary, we achieve a fully time-domain neural model
for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end.
The speech generated by the subband TTS is comparable in quality to that of the
fullband one, with a lighter network architecture for each subband.
Comment: 5 pages, 3 figures
Encoder-decoder multimodal speaker change detection
The task of speaker change detection (SCD), which detects points where
speakers change in an input, is essential for several applications. Several
studies solved the SCD task using audio inputs only and have shown limited
performance. Recently, multimodal SCD (MMSCD) models, which utilise text
modality in addition to audio, have shown improved performance. In this study,
the proposed model is built upon two main proposals: a novel mechanism for
modality fusion and the adoption of an encoder-decoder architecture. Unlike
previous MMSCD works that extract speaker embeddings from extremely short
audio segments aligned to a single word, we use a speaker embedding extracted
from 1.5 s of audio. A transformer decoder layer further improves the performance of an
encoder-only MMSCD model. The proposed model achieves state-of-the-art results
among studies that report SCD performance and is also on par with recent work
that combines SCD with automatic speech recognition via human transcription.
Comment: 5 pages, accepted for presentation at INTERSPEECH 202
Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition
Deep CNNs Along the Time Axis With Intermap Pooling for Robustness to Spectral Variations
Nutrition Composition and Single, 14-Day and 13-Week Repeated Oral Dose Toxicity Studies of the Leaves and Stems of Rubus coreanus Miquel
The leaves and stems of the plant Rubus coreanus Miquel (RCMLS) are rich in vitamins, minerals and phytochemicals, which have antioxidant, anti-hemolytic, anti-inflammatory, anti-fatigue and anti-cancer effects. However, RCMLS is not included in the Korean Food Standards Codex because its safety has not been established. We evaluated the single and repeated oral dose toxicity of RCMLS in Sprague-Dawley rats. RCMLS did not induce any significant toxicological changes in either male or female rats at a single dose of 2500 mg/kg/day. Repeated oral dose toxicity studies showed no adverse effects on clinical signs, body weight, food consumption, ophthalmic examination, urinalysis, hematology, serum biochemistry, necropsy findings, organ weight, or histopathology at doses of 625, 1250, and 2500 mg/kg/day. The LD50 and LOAEL of RCMLS might be over 2500 mg/kg body weight/day, and no target organs were identified. Therefore, this study revealed that single and repeated oral doses of RCMLS are safe.