
11 research outputs found

A Fully Time-domain Neural Model for Subband-based Speech Synthesizer

Authors: Kim Geonmin, Kim Tae-Ho, Lee Soo-Young, Rabiee Azam
Publication date: 01/07/2019

This paper introduces a deep neural network model for a subband-based speech synthesizer. The model exploits the narrow bandwidth of the subband signals to reduce the complexity of the time-domain speech generator. We employ multi-level wavelet analysis/synthesis to decompose the signal into subbands and reconstruct it in the time domain. Inspired by WaveNet, a convolutional neural network (CNN) model predicts the subband speech signals fully in the time domain. Because the subbands have narrow bandwidth, a simple network architecture is enough to learn their simple patterns accurately. In ground-truth experiments with teacher forcing, the subband synthesizer significantly outperforms the fullband model on both subjective and objective measures. In addition, by conditioning the model on the phoneme sequence obtained from a pronunciation dictionary, we achieve a fully time-domain neural model for a subband-based text-to-speech (TTS) synthesizer, which is nearly end-to-end. The speech generated by the subband TTS is comparable in quality to that of the fullband one, with a lighter network architecture for each subband.
Comment: 5 pages, 3 figur

arXiv.org e-Print Archive

Encoder-decoder multimodal speaker change detection

Authors: Heo Hee-Soo, Jung Jee-weon, Kim Geonmin, Kim You Jin, Kwon Young-ki, Lee Bong-Jin, Lee Minjae, Seo Soonshin
Publication date: 01/06/2023

The task of speaker change detection (SCD), which detects points where speakers change in an input, is essential for several applications. Several studies have addressed the SCD task using audio inputs only and have shown limited performance. Recently, multimodal SCD (MMSCD) models, which utilise the text modality in addition to audio, have shown improved performance. In this study, the proposed model is built upon two main proposals: a novel mechanism for modality fusion and the adoption of an encoder-decoder architecture. Unlike previous MMSCD works, which extract speaker embeddings from extremely short audio segments aligned to a single word, we use a speaker embedding extracted from 1.5 s of audio. A transformer decoder layer further improves the performance of an encoder-only MMSCD model. The proposed model achieves state-of-the-art results among studies that report SCD performance and is also on par with recent work that combines SCD with automatic speech recognition via human transcription.
Comment: 5 pages, accepted for presentation at INTERSPEECH 202

arXiv.org e-Print Archive

Unpaired Speech Enhancement by Acoustic and Adversarial Supervision for Speech Recognition

Authors: Bo-Kyeong Kim, Geonmin Kim, Hwaran Lee, Sang-Hoon Oh, Soo-Young Lee
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Deep CNNs Along the Time Axis With Intermap Pooling for Robustness to Spectral Variations

Authors: Geonmin Kim, Ho-Gyeong Kim, Hwaran Lee, Sang-Hoon Oh, Soo-Young Lee
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Nutrition Composition and Single, 14-Day and 13-Week Repeated Oral Dose Toxicity Studies of the Leaves and Stems of Rubus coreanus Miquel

Authors: Ae-Son Om, GeonMin Noh, HaengRan Kim, JeongSook Choe, Yu-Na Song
Publication venue: MDPI AG
Publication date: 01/01/2016

The leaves and stems of the plant Rubus coreanus Miquel (RCMLS) are rich in vitamins, minerals and phytochemicals, which have antioxidant, anti-hemolytic, anti-inflammatory, anti-fatigue and anti-cancer effects. However, RCMLS is not included in the Korean Food Standards Codex because of the lack of safety assurance concerning it. We evaluated the single and repeated oral dose toxicity of RCMLS in Sprague-Dawley rats. RCMLS did not induce any significant toxicological changes in either male or female rats at a single dose of 2500 mg/kg/day. Repeated oral dose toxicity studies showed no adverse effects in clinical signs, body weight, food consumption, ophthalmic examination, urinalysis, hematology, serum biochemistry, necropsy findings, organ weight, or histopathology at doses of 625, 1250, and 2500 mg/kg/day. The LD50 and LOAEL of RCMLS might be over 2500 mg/kg body weight/day, and no target organs were identified. Therefore, this study revealed that single and repeated oral doses of RCMLS are safe

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Nutrition Composition and Single, 14-Day and 13-Week Repeated Oral Dose Toxicity Studies of the Leaves and Stems of Rubus coreanus Miquel

Authors: Ae-Son Om, GeonMin Noh, HaengRan Kim, JeongSook Choe, Yu-Na Song
Publication venue: MDPI AG

Crossref

Discrete-dipole approximation for the optical properties with morphological changes of silver nanoprism and nanosphere via galvanic reaction

Authors: Geonmin Ro, Hyejin Lee, Ju Min Kim, Younghun Kim
Publication venue: Elsevier BV

Crossref