14 research outputs found
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
The objective of this work is to extract a target speaker's voice from a
mixture of voices using visual cues. Existing works on audio-visual speech
separation have demonstrated their performance with promising intelligibility,
but maintaining naturalness remains a challenge. To address this issue, we
propose AVDiffuSS, an audio-visual speech separation model based on a diffusion
mechanism known for its capability in generating natural samples. For an
effective fusion of the two modalities for diffusion, we also propose a
cross-attention-based feature fusion mechanism. This mechanism is specifically
tailored for the speech domain to integrate the phonetic information from
audio-visual correspondence in speech generation. In this way, the fusion
process maintains the high temporal resolution of the features, without
excessive computational requirements. We demonstrate that the proposed
framework achieves state-of-the-art results on two benchmarks, including
VoxCeleb2 and LRS3, producing speech with notably better naturalness.
Comment: Project page with demo: https://mm.kaist.ac.kr/projects/avdiffuss
That's What I Said: Fully-Controllable Talking Face Generation
The goal of this paper is to synthesise talking faces with controllable
facial motions. To achieve this goal, we propose two key ideas. The first is to
establish a canonical space where every face has the same motion patterns but
different identities. The second is to navigate a multimodal motion space that
only represents motion-related features while eliminating identity information.
To disentangle identity and motion, we introduce an orthogonality constraint
between the two different latent spaces. From this, our method can generate
natural-looking talking faces with fully controllable facial attributes and
accurate lip synchronisation. Extensive experiments demonstrate that our method
achieves state-of-the-art results in terms of both visual quality and lip-sync
score. To the best of our knowledge, we are the first to develop a talking face
generation framework that can accurately manifest full target facial motions
including lip, head pose, and eye movements in the generated video without any
additional supervision beyond RGB video with audio.
Convective initiation detection using Himawari-8 Advanced Himawari Imager data and random forest
Estimating Ground-level Particulate Matter Concentrations Using Satellite-Derived Aerosol Optical Depth
Estimation of ground-level nitrogen dioxide and ozone concentrations using satellite data and numerical model output
Surface Temperature in Twentieth Century at the Styx Glacier, Northern Victoria Land, Antarctica, From Borehole Thermometry
Search prospects for axionlike particles at rare nuclear isotope accelerator facilities
We propose a novel experimental scheme, called DAMSA (Dump-produced Aboriginal Matter Searches at an Accelerator), for searching for dark-sector particles, using rare nuclear isotope accelerator facilities that provide high-flux proton beams to produce a large number of rare nuclear isotopes. The high-intensity nature of their beams enables the investigation of dark-sector particles, including axionlike particles (ALPs) and dark photons. By contrast, their typical beam energies are not large enough to produce backgrounds such as neutrinos resulting from secondary charged particles. The detector of DAMSA is then placed immediately downstream of the proton beam dump to maximize the prompt decay signals of dark-sector particles, which are often challenging to probe in other beam-dump-type experiments featuring a longer baseline, at the expense of an enormous amount of the beam-related neutron background. We demonstrate that beam-related neutrons can be significantly suppressed if the signal accompanies multiple, correlated visible particles in the final state. We show that the close proximity of the detector to the ALP production dump makes it possible to probe a high-mass region of ALP parameter space that the existing experiments have never explored.