Efficient Parallel Audio Generation using Group Masked Language Modeling

Author: Jeong Myeonghun
Kim Minchan
Kim Nam Soo
Lee Joun Yeop
Publication date: 02/01/2024

We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, accelerates inference speed compared to autoregressive models, it still suffers from slow inference due to iterative sampling. To resolve this problem, we propose Group-Masked Language Modeling (G-MLM) and Group Iterative Parallel Decoding (G-IPD) for efficient parallel audio generation. Both the training and sampling schemes enable the model to synthesize high-quality audio with a small number of iterations by effectively modeling the group-wise conditional dependencies. In addition, our model employs a cross-attention-based architecture to capture the speaker style of the prompt voice and improves computational efficiency. Experimental results demonstrate that our proposed model outperforms the baselines in prompt-based audio generation. Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

arXiv.org e-Print Archive

SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech

Author: Choi Byoung Jin
Jeong Myeonghun
Kim Nam Soo
Lee Joun Yeop
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/11/2022

Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristic of an unseen speaker. The main challenge of ZSM-TTS is to increase the overall speaker similarity for unseen speakers. One of the most successful speaker conditioning methods for flow-based multi-speaker text-to-speech (TTS) models is to utilize the functions which predict the scale and bias parameters of the affine coupling layers according to the given speaker embedding vector. In this letter, we improve on the previous speaker conditioning method by introducing a speaker-normalized affine coupling (SNAC) layer which allows for unseen speaker speech synthesis in a zero-shot manner leveraging a normalization-based conditioning technique. The newly designed coupling layer explicitly normalizes the input by the parameters predicted from a speaker embedding vector while training, enabling an inverse process of denormalizing for a new speaker embedding at inference. The proposed conditioning scheme yields the state-of-the-art performance in terms of the speech quality and speaker similarity in a ZSM-TTS setting. Comment: Accepted to IEEE Signal Processing Letters.

arXiv.org e-Print Archive

Impact of sensory modality and tempo in motor timing

Author: Hyejin Seo
Jaeuk Jeong
Soo Mi Nam
Publication venue: Frontiers Media S.A.
Publication date: 01/08/2024

Background: Accurate motor timing requires the coordinated control of actions in response to external stimuli. Over the past few years, several studies have investigated the effect of sensory input on motor timing; however, the evidence remains conflicting. The purpose of this study was to examine the impact of sensory modality and tempo on the accuracy of timed movements and explore strategies for enhancing motor timing. Methods: Participants (n = 30) performed synchronization and adaptation circle drawing tasks in virtual reality. In Experiment 1, participants synchronized circle drawing with repeated stimuli based on sensory modalities (auditory, visual, tactile, audio-visual, audio-tactile, and visual-tactile) and tempos (20, 30, and 60 bpm). In Experiment 2, we examined timing adaptation in circle drawing tasks under conditions of unexpected tempo changes, whether increased or decreased. Results: A significant interaction effect between modality and tempo was observed in the comparison of timing accuracy. Tactile stimuli exhibited significantly higher timing accuracy at 60 bpm, whereas auditory stimuli demonstrated a peak accuracy at 30 bpm. The analysis revealed a significantly larger timing error when adapting to changes in the tempo-down condition compared with the tempo-up condition. Discussion: Through Experiment 1, we found that sensory modality impacts motor timing differently depending on the tempo, with tactile modality being effective at a faster tempo and auditory modality being beneficial at a moderate tempo. Additionally, Experiment 2 revealed that adapting to changes by correcting timing errors is more challenging with decreasing tempo than with increasing tempo. Our findings suggest that motor timing is intricately influenced by sensory modality and tempo variation. Therefore, to enhance the motor timing, a comprehensive understanding of these factors and their applications is imperative.

Directory of Open Access Journals

High-Performance PVC Gel for Adaptive Micro-Lenses with Variable Focal Length.

Author: Bae Jin
Choi Dong-Soo
Jeong Jaeu
Kim Sang-Youn
Lee Jong
LIN Liwei
Nam Byeong
Shin Eun-Jae
Publication venue: eScholarship, University of California
Publication date: 01/05/2017

This paper presents a bio-inspired adaptive micro-lens with electrically tunable focus made of non-ionic high-molecular-weight polyvinyl chloride (PVC) gel. The optical device mimics the design of the crystalline lens and ciliary muscle of the human eye. It consists of a plano-convex PVC gel micro-lens on Indium Tin Oxide (ITO) glass, confined with an annular electrode operating as an artificial ciliary muscle. Upon electrical activation, the electroactive adhesive force of the PVC gel is exerted on the annular anode electrode, which reduces the sagittal height of the plano-convex PVC gel lens, resulting in focal length variation of the micro-lens. The focal length increases from 3.8 mm to 22.3 mm as the applied field is varied from 200 V/mm to 800 V/mm, comparable to that of the human lens. The device combines excellent optical characteristics with structural simplicity, fast response speed, silent operation, and low power consumption. The results show the PVC gel micro-lens is expected to open up new perspectives on practical tunable optics.

Crossref

eScholarship - University of California

Two years since the effectuation of the Korea-US FTA

Author: Jeong Min-kook
Ji Seong-tae
Lee Hyun-keun
Moon Han-pil
Nam Kyung-soo
Publication venue: [Seoul]
Publication date: 01/01/2014

K-Developedia(KDI School) Repository

Feature Re-calibration based Multiple Instance Learning for Whole Slide Image Classification

Author: Chikontwe Philip
Go Heounjeong
Kim Meejeong
Nam Soo Jeong
Park Sang Hyun
Sung Hyun Jung
Publication date: 21/07/2022

Whole slide image (WSI) classification is a fundamental task for the diagnosis and treatment of diseases; however, curation of accurate labels is time-consuming and limits the application of fully-supervised methods. To address this, multiple instance learning (MIL) is a popular method that poses classification as a weakly supervised learning task with slide-level labels only. While current MIL methods apply variants of the attention mechanism to re-weight instance features with stronger models, scant attention is paid to the properties of the data distribution. In this work, we propose to re-calibrate the distribution of a WSI bag (instances) by using the statistics of the max-instance (critical) feature. We assume that in binary MIL, positive bags have larger feature magnitudes than negatives, thus we can enforce the model to maximize the discrepancy between bags with a metric feature loss that models positive bags as out-of-distribution. To achieve this, unlike existing MIL methods that use single-batch training modes, we propose balanced-batch sampling to effectively use the feature loss i.e., (+/-) bags simultaneously. Further, we employ a position encoding module (PEM) to model spatial/morphological information, and perform pooling by multi-head self-attention (PSMA) with a Transformer encoder. Experimental results on existing benchmark datasets show our approach is effective and improves over state-of-the-art MIL methods. Comment: MICCAI 202

arXiv.org e-Print Archive

DGIST Library Institutional Repository

Clinical Efficacy of Primary Tumor Volume Measurements: Comparison of Different Primary Sites

Author: Baek Seung-Kuk
Chung Eun-Jae
Jung Kwang-Yoon
Kwon Soon-Young
Lee Nam-Joon
Woo Jeong-Soo
Publication venue: Korean Society of Otorhinolaryngology-Head and Neck Surgery
Publication date: 01/06/2009

Objectives: The purpose of this study was to determine the clinical efficacy of primary tumor volume measurements of different primary sites in the oropharynx compared to the oral cavity. Methods: A retrospective analysis was performed of 85 patients with oral cavity or oropharynx cancer. The tumor area was manually outlined from axial magnetic resonance (MR) series. The software automatically calculated the tumor volumes. The values of the primary tumor volumes were then subdivided into separate groups (≤3,500 mm³, >3,500 mm³). Results: On the univariate analysis, the prognostic indicators were cT and cN for the oral cavity, and age, primary site, cT, cN, and primary tumor volume for the oropharynx. There was no significant prognostic factor for oral cavity cancer on the multivariate analysis. Primary site, cN, and primary tumor volume were independent prognostic indicators for oropharynx cancer by multivariate analysis. Conclusion: Primary tumor volume measurement is a reliable way to stratify outcome and to make up for the weak points of the American Joint Committee on Cancer staging system in oropharynx cancer.

Directory of Open Access Journals

PubMed Central