1,995 research outputs found

    Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model

    We present a novel approach to multilingual audio-visual speech recognition by training a single model on a multilingual dataset. Motivated by the human cognitive system, in which people intuitively distinguish different languages without conscious effort or guidance, we propose a model that can identify the language of the input speech by capturing the inherent similarities and differences between languages. To do so, we apply a prompt fine-tuning technique to a large pre-trained audio-visual representation model so that the network can recognize the language class as well as the speech in the corresponding language. Our work contributes to developing robust and efficient multilingual audio-visual speech recognition systems, reducing the need for language-specific models. Comment: EMNLP 2023 Findings.
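The prompt fine-tuning idea above can be sketched as follows: a small set of learnable prompt vectors is prepended to the input of a frozen pre-trained encoder, and only the prompts plus a lightweight language head are trained. Everything here (names such as `FrozenAVEncoder`, the dimensions, and the random projections) is an illustrative assumption, not the paper's actual architecture.

```python
# Minimal sketch of prompt fine-tuning on a frozen encoder.
import numpy as np

rng = np.random.default_rng(0)
D, T, P, N_LANG = 16, 8, 4, 3   # feature dim, frames, prompt length, languages

class FrozenAVEncoder:
    """Stand-in for a large pre-trained audio-visual encoder (weights frozen)."""
    def __init__(self):
        self.W = rng.standard_normal((D, D)) / np.sqrt(D)

    def __call__(self, x):            # x: (seq_len, D) input features
        return np.tanh(x @ self.W)    # frozen transform, never updated

# Trainable parts: a short sequence of prompt vectors prepended to the input,
# plus a lightweight language-classification head.
prompts = rng.standard_normal((P, D)) * 0.02        # learnable prompts
W_lang  = rng.standard_normal((D, N_LANG)) * 0.02   # language head

encoder = FrozenAVEncoder()
speech_feats = rng.standard_normal((T, D))          # toy audio-visual features

h = encoder(np.concatenate([prompts, speech_feats], axis=0))  # (P+T, D)
lang_logits = h.mean(axis=0) @ W_lang               # pooled -> language logits
print(lang_logits.shape)
```

Only `prompts` and `W_lang` would receive gradients in training, which is what makes the approach cheap relative to fine-tuning the whole encoder.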

    Reprogramming Audio-driven Talking Face Synthesis into Text-driven

    In this paper, we propose a method to reprogram pre-trained audio-driven talking face synthesis models to operate with text inputs. Because an audio-driven talking face synthesis model takes speech audio as input, generating a talking avatar with the desired speech content requires recording the speech in advance, which is burdensome for every video to be generated. To alleviate this problem, we propose a novel method that embeds input text into the learned audio latent space of the pre-trained audio-driven model. To this end, we design a Text-to-Audio Embedding Module (TAEM) that learns to map a given text input to the audio latent features. Moreover, to model the speaker characteristics lying in the audio features, we inject a visual speaker embedding, obtained from a single face image, into the TAEM. After training, we can synthesize talking face videos from either text or speech audio.
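The TAEM idea reduces to learning a mapping from text embeddings into the frozen model's audio latent space, with speaker information injected alongside. The sketch below illustrates only the data flow; the projection matrices, dimensions, and the `taem` function are hypothetical stand-ins for what in the paper is a trained network.

```python
# Hedged sketch of a text-to-audio-latent mapping with speaker injection.
import numpy as np

rng = np.random.default_rng(1)
D_TXT, D_AUD, D_SPK = 32, 64, 16   # toy embedding dimensions

W_txt = rng.standard_normal((D_TXT, D_AUD)) * 0.05  # text  -> audio latent
W_spk = rng.standard_normal((D_SPK, D_AUD)) * 0.05  # speaker -> audio latent

def taem(text_emb, speaker_emb):
    """Map text embeddings into the audio latent space of a pre-trained
    audio-driven model, injecting speaker traits from a single face image."""
    return np.tanh(text_emb @ W_txt + speaker_emb @ W_spk)

text_emb    = rng.standard_normal((10, D_TXT))  # 10 text tokens
speaker_emb = rng.standard_normal((D_SPK,))     # from one face image

audio_latent = taem(text_emb, speaker_emb)      # feeds the frozen synthesizer
print(audio_latent.shape)
```

Because the output lives in the same latent space the audio encoder produces, the frozen audio-driven synthesizer downstream needs no modification.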

    DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion

    Speech-driven 3D facial animation has gained significant attention for its ability to create realistic and expressive facial animations in 3D space from speech. Learning-based methods have shown promising progress toward accurate facial motion synchronized with speech. However, the one-to-many nature of speech-to-3D facial synthesis has not been fully explored: while the lips accurately synchronize with the speech content, other facial attributes beyond speech-related motions vary with respect to the same speech. To account for this potential variance in facial attributes within a single speech input, we propose DF-3DFace, a diffusion-driven speech-to-3D face mesh synthesis method. DF-3DFace captures the complex one-to-many relationships between speech and 3D faces via diffusion, while achieving aligned lip motion through audio-mesh synchronization and masked conditioning. Furthermore, the proposed method jointly models identity and pose in addition to facial motions, so it can generate 3D face animation without requiring a reference identity mesh and can produce natural head poses. We also contribute a new large-scale 3D facial mesh dataset, 3D-HDTF, to enable synthesis of variations in the identities, poses, and facial motions of 3D face meshes. Extensive experiments demonstrate that our method generates highly variable facial shapes and motions from speech while achieving more realistic facial animation than state-of-the-art methods.
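Masked conditioning, as described above, can be pictured as a denoising update in which lip-region vertices are tied to audio-synchronized features while the remaining vertices stay free to vary across samples. The toy step below is an assumption-laden illustration (the vertex count, mask, and update rules are invented; the actual DF-3DFace denoiser is a learned network).

```python
# Illustrative reverse-diffusion step with masked audio conditioning.
import numpy as np

rng = np.random.default_rng(2)
V = 12                                            # toy number of mesh vertices
lip_mask = np.zeros((V, 1))
lip_mask[:4] = 1.0                                # first 4 vertices = lip region

def denoise_step(x_t, audio_feat):
    """One toy denoising step: lips pulled toward audio-derived targets,
    non-lip vertices updated unconditionally (free to vary)."""
    x_free = 0.9 * x_t                            # unconditional shrink step
    x_lip  = 0.5 * x_t + 0.5 * audio_feat         # audio-conditioned step
    return lip_mask * x_lip + (1 - lip_mask) * x_free

x_t   = rng.standard_normal((V, 3))               # noisy 3D vertex offsets
audio = rng.standard_normal((V, 3))               # audio-synchronized targets
x_next = denoise_step(x_t, audio)
print(x_next.shape)
```

Iterating such masked steps lets the lip region converge toward the speech-synchronized motion while the rest of the face retains sample-to-sample diversity, which is the one-to-many behavior the abstract emphasizes.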

    Accounting Conservatism, Changes in Real Investment, and Analysts' Earnings Forecasts

    This study examines whether sell-side analysts fully incorporate into their earnings forecasts the joint effects of accounting conservatism and changes in real investment on the quality of current earnings. Our results indicate that sell-side analysts do not fully incorporate these effects when forecasting future earnings, so they overestimate (underestimate) future earnings when current earnings are inflated (depressed) by those effects. We therefore conclude that sell-side analysts do not fully recognize the joint effects of accounting conservatism and real activity on earnings quality, and that mitigating this bias would enhance market efficiency by providing investors with a better benchmark for their earnings expectations.