A Fuzzy-Based Multimedia Content Retrieval Method Using Mood Tags and Their Synonyms in Social Networks
The preferences of Web information purchasers are rapidly evolving. Cost-effectiveness is now regarded as less important than cost-satisfaction, which emphasizes the purchaser's psychological satisfaction. One way to improve a user's cost-satisfaction in multimedia content retrieval is to exploit the mood inherent in multimedia items. An example of applications using this approach is SNS (Social Network Services), which is based on folksonomy, but such applications encounter problems with synonyms. To address the synonym problem, our previous study represented the mood of multimedia content with arousal and valence (AV) values in Thayer's two-dimensional model as an internal tag. Although some synonym problems could thereby be solved, the retrieval performance of the previous study fell short of a keyword-based method. In this paper, a new method is proposed that solves the synonym problem while maintaining the same performance as the keyword-based approach. In the proposed method, the mood of multimedia content is represented as a fuzzy set over the 12 moods of the Thayer model. For the analysis, the proposed method is compared with two methods, one based on AV values and the other based on keywords. The analysis results demonstrate that the proposed method is superior to both.
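To make the fuzzy representation concrete, the sketch below ranks multimedia items by the similarity of their fuzzy mood sets to a query mood set. It is an illustrative reading of the abstract, not the paper's exact formulation: the 12 mood labels, the min/max (Jaccard-style) similarity measure, and all function names are assumptions.

```python
# Hypothetical sketch of fuzzy mood-set retrieval; labels and similarity are assumptions.
import numpy as np

# Placeholder labels for the 12 moods of the Thayer model (not the paper's exact list).
MOODS = ["excited", "happy", "pleased", "relaxed", "peaceful", "calm",
         "sleepy", "bored", "sad", "nervous", "angry", "annoyed"]

def fuzzy_similarity(query, item):
    """Jaccard-style similarity between two fuzzy mood sets (memberships in [0, 1])."""
    return float(np.minimum(query, item).sum() / np.maximum(query, item).sum())

def rank_items(query, items):
    """Return item ids sorted by decreasing similarity to the query mood set."""
    scores = {name: fuzzy_similarity(query, vec) for name, vec in items.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example: a query mood set and two catalogue items, each a 12-dim membership vector.
query = np.zeros(12)
query[MOODS.index("happy")] = 1.0
query[MOODS.index("excited")] = 0.6
items = {"clip_a": np.random.rand(12), "clip_b": np.random.rand(12)}
print(rank_items(query, items))
```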
Joint unsupervised and supervised learning for context-aware language identification
Language identification (LID) recognizes the language of a spoken utterance
automatically. According to recent studies, LID models trained with an
automatic speech recognition (ASR) task perform better than those trained with
a LID task only. However, we need additional text labels to train the model to
recognize speech, and acquiring these text labels is costly. In order to
overcome this problem, we propose context-aware language identification using a
combination of unsupervised and supervised learning without any text labels.
The proposed method learns the context of speech through a masked language
modeling (MLM) loss and is simultaneously trained to determine the language of
the utterance with a supervised learning loss. The proposed joint learning was found
to reduce the error rate by 15.6% compared to the same structure model trained
by supervised-only learning on a subset of the VoxLingua107 dataset consisting
of sub-three-second utterances in 11 languages.
Comment: Accepted by ICASSP 202
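A minimal sketch of how such a joint objective could be set up is shown below, assuming a Transformer encoder over speech features, a masked-frame reconstruction term standing in for the MLM loss, and a cross-entropy LID term. The encoder, masking scheme, and loss weight are assumptions rather than the paper's exact recipe.

```python
# Hedged sketch: joint masked-prediction ("MLM"-style) and supervised LID training.
import torch
import torch.nn as nn

class JointLID(nn.Module):
    def __init__(self, feat_dim=80, num_langs=11):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=4)
        self.reconstruct = nn.Linear(feat_dim, feat_dim)   # predicts masked frames
        self.classifier = nn.Linear(feat_dim, num_langs)   # utterance-level LID head

    def forward(self, feats, mask):
        # feats: (B, T, F) speech features; mask: (B, T) bool, True where frames are masked
        masked = feats.masked_fill(mask.unsqueeze(-1), 0.0)
        enc = self.encoder(masked)
        mlm_loss = (self.reconstruct(enc)[mask] - feats[mask]).abs().mean()
        logits = self.classifier(enc.mean(dim=1))          # mean-pool over time
        return logits, mlm_loss

model = JointLID()
feats = torch.randn(4, 300, 80)
mask = torch.rand(4, 300) < 0.15                           # assumed masking ratio
labels = torch.randint(0, 11, (4,))
logits, mlm_loss = model(feats, mask)
loss = nn.functional.cross_entropy(logits, labels) + 0.5 * mlm_loss  # joint objective
loss.backward()
```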
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor
We propose a novel speech separation model designed to separate mixtures with
an unknown number of speakers. The proposed model stacks 1) a dual-path
processing block that can model spectro-temporal patterns, 2) a transformer
decoder-based attractor (TDA) calculation module that can deal with an unknown
number of speakers, and 3) triple-path processing blocks that can model
inter-speaker relations. Given a fixed, small set of learned speaker queries
and the mixture embedding produced by the dual-path blocks, TDA infers the
relations of these queries and generates an attractor vector for each speaker.
The estimated attractors are then combined with the mixture embedding by
feature-wise linear modulation conditioning, creating a speaker dimension. The
mixture embedding, conditioned with speaker information produced by TDA, is fed
to the final triple-path blocks, which augment the dual-path blocks with an
additional pathway dedicated to inter-speaker processing. The proposed approach
outperforms the previous best reported in the literature, achieving 24.0 and
23.7 dB SI-SDR improvement (SI-SDRi) on WSJ0-2mix and WSJ0-3mix respectively, with a
single model trained to separate 2- and 3-speaker mixtures. The proposed model
also exhibits strong performance and generalizability at counting sources and
separating mixtures with up to 5 speakers.
Comment: 5 pages, 4 figures, accepted by ICASSP 202
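The sketch below illustrates the TDA-plus-FiLM idea described above: a fixed set of learned speaker queries attends to the mixture embedding via a transformer decoder, and each resulting attractor produces a scale and shift that condition the embedding, creating a speaker dimension. Module sizes, the existence head, and all names are assumptions, not the authors' implementation.

```python
# Illustrative sketch of a transformer-decoder-based attractor with FiLM conditioning.
import torch
import torch.nn as nn

class TDAFiLM(nn.Module):
    def __init__(self, dim=128, max_spks=5, num_layers=2):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(max_spks, dim))  # learned speaker queries
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=num_layers)
        self.exists = nn.Linear(dim, 1)        # predicts whether each attractor is active
        self.film = nn.Linear(dim, 2 * dim)    # produces per-speaker scale and shift

    def forward(self, mix_emb):
        # mix_emb: (B, T, D) mixture embedding produced by the dual-path blocks
        B, T, D = mix_emb.shape
        q = self.queries.unsqueeze(0).expand(B, -1, -1)          # (B, S, D)
        attractors = self.decoder(q, mix_emb)                    # (B, S, D)
        exist_logits = self.exists(attractors).squeeze(-1)       # (B, S) for source counting
        scale, shift = self.film(attractors).chunk(2, dim=-1)    # (B, S, D) each
        # FiLM: broadcast over time, yielding a per-speaker conditioned embedding
        cond = scale.unsqueeze(2) * mix_emb.unsqueeze(1) + shift.unsqueeze(2)  # (B, S, T, D)
        return cond, exist_logits

cond, exist_logits = TDAFiLM()(torch.randn(2, 200, 128))
print(cond.shape, exist_logits.shape)  # (2, 5, 200, 128) and (2, 5)
```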
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling
We propose FSB-LSTM, a novel long short-term memory (LSTM) based architecture
that integrates full- and sub-band (FSB) modeling, for single- and
multi-channel speech enhancement in the short-time Fourier transform (STFT)
domain. The model maintains an information highway that flows an over-complete
input representation through multiple FSB-LSTM modules. Each FSB-LSTM module
consists of a full-band block to model spectro-temporal patterns at all
frequencies and a sub-band block to model patterns within each sub-band, where
each of the two blocks takes a down-sampled representation as input and returns
an up-sampled discriminative representation to be added to the block input via
a residual connection. The model is designed to have a low algorithmic
complexity, a small run-time buffer and a very low algorithmic latency, at the
same time producing a strong enhancement performance on a noisy-reverberant
speech enhancement task even if the hop size is as low as ms.
Comment: in ICASSP 202
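The sketch below gives one possible reading of a single FSB-LSTM-style module: a full-band path operates on a frequency-downsampled view of the whole spectrum, a shared sub-band path operates within groups of adjacent frequency bins, and each path's up-sampled output is added back residually. The tensor layout, downsampling scheme, and sizes are assumptions for illustration only, not the paper's architecture.

```python
# Rough sketch of one full-/sub-band module with residual connections (assumed layout).
import torch
import torch.nn as nn

class FSBModule(nn.Module):
    def __init__(self, channels=24, freq=64, down=4, hidden=64):
        super().__init__()
        self.down = down
        # full-band path: fold the frequency-downsampled spectrum into the feature dim
        self.full_lstm = nn.LSTM(channels * freq // down, hidden, batch_first=True)
        self.full_out = nn.Linear(hidden, channels * freq)
        # sub-band path: one shared LSTM applied to each sub-band independently
        self.sub_lstm = nn.LSTM(channels * down, hidden, batch_first=True)
        self.sub_out = nn.Linear(hidden, channels * down)

    def forward(self, x):
        B, C, T, F = x.shape  # (batch, channels, frames, frequency bins)
        d = self.down
        # full-band: downsample frequency by averaging groups of `d` bins
        fb = x.view(B, C, T, F // d, d).mean(-1)                         # (B, C, T, F/d)
        fb = fb.permute(0, 2, 1, 3).reshape(B, T, -1)                    # (B, T, C*F/d)
        fb = self.full_out(self.full_lstm(fb)[0]).view(B, T, C, F)       # upsample
        x = x + fb.permute(0, 2, 1, 3)                                   # residual add
        # sub-band: group every `d` adjacent bins into one sub-band
        sb = x.view(B, C, T, F // d, d)
        sb = sb.permute(0, 3, 2, 1, 4).reshape(B * (F // d), T, C * d)
        sb = self.sub_out(self.sub_lstm(sb)[0])
        sb = sb.view(B, F // d, T, C, d).permute(0, 3, 2, 1, 4).reshape(B, C, T, F)
        return x + sb                                                    # residual add

print(FSBModule()(torch.randn(2, 24, 100, 64)).shape)  # torch.Size([2, 24, 100, 64])
```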
TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
We propose TF-GridNet for speech separation. The model is a novel multi-path
deep neural network (DNN) integrating full- and sub-band modeling in the
time-frequency (T-F) domain. It stacks several multi-path blocks, each
consisting of an intra-frame full-band module, a sub-band temporal module, and
a cross-frame self-attention module. It is trained to perform complex spectral
mapping, where the real and imaginary (RI) components of input signals are
stacked as features to predict target RI components. We first evaluate it on
monaural anechoic speaker separation. Without using data augmentation and
dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in
scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a standard
dataset for two-speaker separation. To show its robustness to noise and
reverberation, we evaluate it on monaural reverberant speaker separation using
the SMS-WSJ dataset and on noisy-reverberant speaker separation using WHAMR!,
and obtain state-of-the-art performance on both datasets. We then extend
TF-GridNet to multi-microphone conditions through multi-microphone complex
spectral mapping, and integrate it into a two-DNN system with a beamformer in
between (named MISO-BF-MISO in earlier studies), where the beamformer
proposed in this paper is a novel multi-frame Wiener filter computed based on
the outputs of the first DNN. State-of-the-art performance is obtained on the
multi-channel tasks of SMS-WSJ and WHAMR!. Besides speaker separation, we apply
the proposed algorithms to speech dereverberation and noisy-reverberant speech
enhancement. State-of-the-art performance is obtained on a dereverberation
dataset and on the dataset of the recent L3DAS22 multi-channel speech
enhancement challenge.
Comment: In submission. A sound demo is available at https://zqwang7.github.io/demos/TF-GridNet-demo/index.htm
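As a small illustration of the complex spectral mapping setup the abstract refers to, the sketch below stacks the real and imaginary (RI) STFT components as input features, runs them through a placeholder network, and re-synthesises a waveform from the predicted RI components with the inverse STFT. The tiny convolutional network stands in for TF-GridNet's multi-path blocks and is not the actual model; the STFT parameters are assumptions.

```python
# Hedged sketch of complex spectral mapping: predict target RI components from stacked RI input.
import torch
import torch.nn as nn

n_fft, hop = 512, 128
win = torch.hann_window(n_fft)

def to_ri(wave):
    spec = torch.stft(wave, n_fft, hop, window=win, return_complex=True)  # (B, F, T)
    return torch.stack([spec.real, spec.imag], dim=1)                     # (B, 2, F, T)

def to_wave(ri, length):
    spec = torch.complex(ri[:, 0], ri[:, 1])
    return torch.istft(spec, n_fft, hop, window=win, length=length)

# Placeholder mapping network; in TF-GridNet this would be the stack of intra-frame
# full-band, sub-band temporal, and cross-frame self-attention modules.
net = nn.Sequential(nn.Conv2d(2, 32, 3, padding=1), nn.PReLU(),
                    nn.Conv2d(32, 2, 3, padding=1))

mixture = torch.randn(4, 16000)            # 1 s of 16 kHz audio
est_ri = net(to_ri(mixture))               # predicted target RI components
est_wave = to_wave(est_ri, mixture.shape[-1])
print(est_wave.shape)                      # torch.Size([4, 16000])
```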
Effect of Wavelength and Intensity of Light on a-InGaZnO TFTs under Negative Bias Illumination Stress
We investigated the degradation mechanism of a-IGZO TFTs under NBIS with different wavelengths λ and intensities I_L of light. A negative gate bias was applied for 4000 s while the drain and source were grounded, and illumination at λ = 450, 530, or 700 nm was applied. Illumination with photon energy exceeding ~2.3 eV (530 nm) induced a noticeable threshold voltage shift ΔV_th, which can be interpreted in terms of ionization of oxygen vacancies V_O. In addition, I_L of the blue illumination (450 nm) was varied from 6 to 200 lux, and saturation in ΔV_th was observed above a certain I_L. We suggest that the saturation occurs because the V_O ionization rate is saturated by outward relaxation of metal atoms in the a-IGZO film. © The Author(s) 2016. Published by ECS.
That's What I Said: Fully-Controllable Talking Face Generation
The goal of this paper is to synthesise talking faces with controllable
facial motions. To achieve this goal, we propose two key ideas. The first is to
establish a canonical space where every face has the same motion patterns but
different identities. The second is to navigate a multimodal motion space that
only represents motion-related features while eliminating identity information.
To disentangle identity and motion, we introduce an orthogonality constraint
between the two different latent spaces. From this, our method can generate
natural-looking talking faces with fully controllable facial attributes and
accurate lip synchronisation. Extensive experiments demonstrate that our method
achieves state-of-the-art results in terms of both visual quality and lip-sync
score. To the best of our knowledge, we are the first to develop a talking face
generation framework that can accurately manifest full target facial motions
including lip, head pose, and eye movements in the generated video without any
additional supervision beyond RGB video with audio.
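One common way to impose such an orthogonality constraint is to penalise the cosine similarity between the identity latent and the motion latent, as in the hedged sketch below; the exact constraint used in the paper may differ.

```python
# Hedged sketch: orthogonality loss between identity and motion latent codes.
import torch

def orthogonality_loss(identity_z, motion_z):
    """identity_z, motion_z: (B, D) latent codes from the two encoders."""
    id_n = torch.nn.functional.normalize(identity_z, dim=-1)
    mo_n = torch.nn.functional.normalize(motion_z, dim=-1)
    # squared cosine similarity per sample, averaged over the batch
    return (id_n * mo_n).sum(dim=-1).pow(2).mean()

loss = orthogonality_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```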
Inspection System for Vehicle Headlight Defects Based on Convolutional Neural Network
This paper proposes a method to detect defects in the region of interest (ROI) using a convolutional neural network (CNN) after alignment (position and rotation calibration) of a manufacturer's headlights, in order to determine whether vehicle headlights are defective. The results were compared with an existing defect-discrimination method among the previously proposed methods. One hundred original headlight images were acquired for each of two vehicle types for this experiment, and 20,000 good-quality images and 20,000 defective images were obtained by applying position and rotation transformations to the original images. The proposed method demonstrated a performance improvement of more than 0.1569 (15.69% on average) compared to the existing method. An illustrative classifier sketch follows below.
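For illustration only, the sketch below shows a small CNN that classifies an aligned headlight ROI as good or defective; the input size, depth, and channel counts are assumptions and do not reflect the paper's actual architecture.

```python
# Hypothetical sketch: binary defect classification of an aligned headlight ROI.
import torch
import torch.nn as nn

class DefectCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(64, 2)  # good vs. defective

    def forward(self, roi):
        # roi: (B, 3, H, W) aligned region-of-interest crop
        return self.classifier(self.features(roi).flatten(1))

logits = DefectCNN()(torch.randn(4, 3, 128, 128))
print(logits.shape)  # torch.Size([4, 2])
```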