28 research outputs found

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling

Author: Choi Shukjae
Cornell Samuele
Kim Byeong-Yeol
Lee Younglo
Wang Zhong-Qiu
Watanabe Shinji
Publication date: 17/04/2023

We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture that integrates full- and sub-band (FSB) modeling, for single- and multi-channel speech enhancement in the short-time Fourier transform (STFT) domain. The model maintains an information highway to flow an over-complete input representation through multiple FSB-LSTM modules. Each FSB-LSTM module consists of a full-band block to model spectro-temporal patterns at all frequencies and a sub-band block to model patterns within each sub-band, where each of the two blocks takes a down-sampled representation as input and returns an up-sampled discriminative representation to be added to the block input via a residual connection. The model is designed to have a low algorithmic complexity, a small run-time buffer and a very low algorithmic latency, at the same time producing a strong enhancement performance on a noisy-reverberant speech enhancement task even if the hop size is as low as 2 ms. Comment: in ICASSP 202

arXiv.org e-Print Archive

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

Author: Choi Shukjae
Cornell Samuele
Kim Byeong-Yeol
Lee Younglo
Wang Zhong-Qiu
Watanabe Shinji
Publication date: 22/11/2022

We propose TF-GridNet for speech separation. The model is a novel multi-path deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several multi-path blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI) components of input signals are stacked as features to predict target RI components. We first evaluate it on monaural anechoic speaker separation. Without using data augmentation and dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a standard dataset for two-speaker separation. To show its robustness to noise and reverberation, we evaluate it on monaural reverberant speaker separation using the SMS-WSJ dataset and on noisy-reverberant speaker separation using WHAMR!, and obtain state-of-the-art performance on both datasets. We then extend TF-GridNet to multi-microphone conditions through multi-microphone complex spectral mapping, and integrate it into a two-DNN system with a beamformer in between (named as MISO-BF-MISO in earlier studies), where the beamformer proposed in this paper is a novel multi-frame Wiener filter computed based on the outputs of the first DNN. State-of-the-art performance is obtained on the multi-channel tasks of SMS-WSJ and WHAMR!. Besides speaker separation, we apply the proposed algorithms to speech dereverberation and noisy-reverberant speech enhancement. State-of-the-art performance is obtained on a dereverberation dataset and on the dataset of the recent L3DAS22 multi-channel speech enhancement challenge. Comment: In submission. A sound demo is available at https://zqwang7.github.io/demos/TF-GridNet-demo/index.htm

arXiv.org e-Print Archive

Joint unsupervised and supervised learning for context-aware language identification

Author: Choi Shukjae
Kim Byeong-Yeol
Kim Hyung Yong
Lim Yunkyu
Park Jihwan
Park Jinseok
Publication date: 29/03/2023

Language identification (LID) recognizes the language of a spoken utterance automatically. According to recent studies, LID models trained with an automatic speech recognition (ASR) task perform better than those trained with an LID task only. However, training the model to recognize speech requires additional text labels, and acquiring these labels is costly. To overcome this problem, we propose context-aware language identification using a combination of unsupervised and supervised learning without any text labels. The proposed method learns the context of speech through a masked language modeling (MLM) loss and is simultaneously trained to determine the language of the utterance with a supervised learning loss. The proposed joint learning was found to reduce the error rate by 15.6% compared to a model of the same structure trained by supervised-only learning on a subset of the VoxLingua107 dataset consisting of sub-three-second utterances in 11 languages. Comment: Accepted by ICASSP 202

arXiv.org e-Print Archive

Efficacy of PEEK Cages and Plate Augmentation in Three-Level Anterior Cervical Fusion of Elderly Patients

Author: Barsa
Bohler
Bohlman
Byeong Yeol Choi
Celik
Cho
Cho
Demircan
Emery
Emery
Farey
Gercek
Gyu Hyung Kim
Hilibrand
Hwang
Kandziora
Kim
Koller
Kulkarni
Kyung Jin Song
Lee
Liao
Malloy
Mastronardi
Meier
Natarajan
Papadopoulos
Robinson
Schmieder
Song
van Jonbergen
Wang
Xie
Publication venue: The Korean Orthopaedic Association
Publication date: 01/01/2011

Crossref

PubMed Central

Intramedullary Osteosclerosis Mimicking Lower Leg Radiating Pain

Author: Abdul-Karim
Balkissoon
Beals
Byeong-Yeol Choi
Byung-Wan Choi
Chanchairujira
Dion
Greenspan
Grey
Hurt
Skiadas
Ziran
Publication venue: The Korean Orthopaedic Association
Publication date: 01/01/2014

Crossref

Influence of Warm Isostatic Press Process on Mechanical Properties of a Part Fabricated by Metal Material Extrusion Process

Author: Byeong-Yeol Choi
Hyung-Giun Kim
Il-Hyuk Ahn
Seong-Je Park
Seung-Jun Han
Woo-Chun Choi
Yong Son
Publication venue: MDPI AG
Publication date: 29/11/2022

Material extrusion (ME) using a filament including metal powders has recently attracted considerable attention because it allows the production of metal parts at low cost. However, like other additive manufacturing processes, metal ME suffers from the problem of internal pores. In this study, warm isostatic pressure (WIP), a post-process used to downsize or remove the pores in polymer ME, was employed in metal ME to improve the mechanical properties of the finished part. It was confirmed experimentally that the tensile strength and the strain at the ultimate tensile strength were increased by WIP. However, from hardness tests, two different results were obtained. On a microscopic scale, there was no change in hardness because the temperature of the WIP process was not high enough to change the microstructure, while on a macroscopic scale, the hardness changed owing to the collapse of the pores within the material under the indenter load. In specimens with relatively large pores, the hardness sensitivity increases with a larger indenter. Finally, factors affecting the WIP process parameters in metal ME were discussed.

Multidisciplinary Digital Publishing Institute

Influence of Warm Isostatic Press Process on Mechanical Properties of a Part Fabricated by Metal Material Extrusion Process

Author: Byeong-Yeol Choi
Hyung-Giun Kim
Il-Hyuk Ahn
Seong-Je Park
Seung-Jun Han
Woo-Chun Choi
Yong Son
Publication venue: MDPI AG
Publication date: 01/11/2022

Directory of Open Access Journals

Detection of a CO and NH3 gas mixture using carboxylic acid-functionalized single-walled carbon nanotubes

Author: Byeong-Kwon Ju
Byung Hyun Kang
Hyang Hee Choi
Jinnil Choi
Ki-Young Dong
Yang Doo Lee
Youn-Yeol Yu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

Crossref

Springer - Publisher Connector

Deformation of Amorphous GeSe2 Film under Uniaxial Pressure Applied at Elevated Temperatures

Author: Byeong Kyou Jin
Jeong Han Yi
Jun Ho Lee
Sang Yeol Shin
Woo Hyung Lee
Yong Gyu Choi
Publication venue: Korean Ceramic Society

Crossref