
    DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding

    Recent research has demonstrated impressive results in video-to-speech synthesis, which reconstructs speech solely from visual input. However, previous works have struggled to synthesize speech accurately because the model lacks sufficient guidance to infer the correct content with the appropriate sound. To resolve this, they adopted an extra speaker embedding, derived from reference audio, as speaking-style guidance. Nevertheless, the corresponding audio is not always available for a given video input, especially at inference time. In this paper, we present a novel vision-guided speaker embedding extractor using a self-supervised pre-trained model and a prompt tuning technique. In doing so, rich speaker embedding information can be produced solely from the input visual information, and no extra audio is necessary at inference time. Using the extracted vision-guided speaker embeddings, we further develop a diffusion-based video-to-speech synthesis model, called DiffV2S, conditioned on those speaker embeddings and the visual representation extracted from the input video. The proposed DiffV2S not only maintains the phoneme details contained in the input video frames but also creates a highly intelligible mel-spectrogram in which the identities of multiple speakers are all preserved. Our experimental results show that DiffV2S achieves state-of-the-art performance compared to previous video-to-speech synthesis techniques.
    Comment: ICCV 202
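The core idea above, conditioning a diffusion denoiser on a speaker embedding extracted purely from visual features, can be illustrated with a toy sketch. This is not the DiffV2S implementation: the pooling-based `extract_speaker_embedding`, the linear "denoiser", and the noise schedule are all hypothetical stand-ins for the paper's trained networks, kept only to show where the vision-derived condition enters the reverse diffusion loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_speaker_embedding(visual_feats: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the vision-guided speaker-embedding
    extractor: pool frame-level visual features into a single vector."""
    return visual_feats.mean(axis=0)

def denoise_step(x_t, t, cond, weights):
    """Toy conditional 'denoiser': a linear map of [x_t, cond, t].
    A real model would be a neural network trained with a DDPM objective."""
    inp = np.concatenate([x_t, cond, [t]])
    eps_hat = weights @ inp          # predicted noise, given the condition
    alpha = 0.99                     # toy per-step noise-schedule value
    return (x_t - (1 - alpha) * eps_hat) / np.sqrt(alpha)

# Shapes (illustrative): 10 video frames of 64-d features; 80 mel bins.
visual = rng.normal(size=(10, 64))
spk = extract_speaker_embedding(visual)            # vision-guided speaker embedding
cond = np.concatenate([spk, visual[-1]])           # speaker + visual condition

W = rng.normal(size=(80, 80 + cond.size + 1)) * 0.01

x = rng.normal(size=80)              # start the reverse process from Gaussian noise
for t in range(50, 0, -1):
    x = denoise_step(x, t / 50.0, cond, W)

print(x.shape)  # (80,) — one generated mel-spectrogram frame
```

The point of the sketch is structural: because `cond` is built only from `visual`, no reference audio is needed at inference time, which is the constraint the abstract highlights.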

    SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory

    The challenge of talking face generation from speech lies in aligning information from two different modalities, audio and video, such that the mouth region corresponds to the input audio. Previous methods either exploit audio-visual representation learning or leverage intermediate structural information such as landmarks and 3D models. However, they struggle to synthesize the fine details of the lips, which vary at the phoneme level, because they do not provide sufficient visual information about the lips at the video synthesis step. To overcome this limitation, our work proposes Audio-Lip Memory, which brings in visual information of the mouth region corresponding to the input audio and enforces fine-grained audio-visual coherence. It stores lip motion features from sequential ground-truth images in a value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time. Using the retrieved lip motion features as visual hints, the model can then easily correlate audio with visual dynamics in the synthesis step. By analyzing the memory, we demonstrate that unique lip features are stored in each memory slot at the phoneme level, capturing subtle lip motion through memory addressing. In addition, we introduce a visual-visual synchronization loss that enhances lip-syncing performance when used alongside the audio-visual synchronization loss in our model. Extensive experiments verify that our method generates high-quality video with mouth shapes that best align with the input audio, outperforming previous state-of-the-art methods.
    Comment: Accepted at AAAI 2022 (Oral)
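The key–value retrieval described above, audio features addressing a memory whose slots hold lip-motion features, follows the standard soft memory-addressing pattern. The sketch below is a minimal illustration under assumed shapes, not the paper's trained memory: slot count, feature dimensions, and the random memory contents are all placeholders; in the actual model the key and value memories are learned jointly with alignment losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical shapes: 8 memory slots, 16-d audio keys, 32-d lip-motion values.
num_slots, key_dim, val_dim = 8, 16, 32
key_memory = rng.normal(size=(num_slots, key_dim))    # addressed by audio features
value_memory = rng.normal(size=(num_slots, val_dim))  # stores lip-motion features

def retrieve_lip_features(audio_feat: np.ndarray) -> np.ndarray:
    """Soft memory addressing: score each key slot against the audio
    feature, then read out a weighted sum of the stored lip-motion values."""
    scores = key_memory @ audio_feat / np.sqrt(key_dim)  # scaled similarity
    attn = softmax(scores)           # addressing weights over the slots
    return attn @ value_memory       # retrieved visual hint for synthesis

audio_feat = rng.normal(size=key_dim)
lip_hint = retrieve_lip_features(audio_feat)
print(lip_hint.shape)  # (32,)
```

At inference time only audio is needed to address the memory, which is how the method supplies ground-truth-like visual hints without access to target lip images.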

    The Properties of Microwave-Assisted Synthesis of Metal–Organic Frameworks and Their Applications

    Metal–organic frameworks (MOF) are a class of porous materials with various functions based on their host-guest chemistry. Their selectivity, diffusion kinetics, and catalytic activity are influenced by their design and synthetic procedure. The synthesis of different MOFs has been of considerable interest during the past decade thanks to their various applications in the arena of sensors, catalysts, adsorption, and electronic devices. Among the different techniques for the synthesis of MOFs, such as the solvothermal, sonochemical, ionothermal, and mechanochemical processes, microwave-assisted synthesis has clinched a significant place in MOF synthesis. The main assets of microwave-assisted synthesis are the short reaction time, the fast rate of nucleation, and the modified properties of MOFs. The review encompasses the development of the microwave-assisted synthesis of MOFs, their properties, and their applications in various fields