Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss
We devise a cascade GAN approach to generate talking face video, which is
robust to different face shapes, view angles, facial characteristics, and noisy
audio conditions. Instead of learning a direct mapping from audio to video
frames, we propose first to transfer audio to high-level structure, i.e., the
facial landmarks, and then to generate video frames conditioned on the
landmarks. Compared to a direct audio-to-image approach, our cascade approach
avoids fitting spurious correlations between audiovisual signals that are
irrelevant to the speech content. Humans are sensitive to temporal
discontinuities and subtle artifacts in video. To avoid such pixel jitter
and to force the network to focus on audiovisual-correlated regions,
we propose a novel dynamically adjustable pixel-wise loss with an attention
mechanism. Furthermore, to generate a sharper image with well-synchronized
facial movements, we propose a novel regression-based discriminator structure,
which considers sequence-level information along with frame-level information.
Experiments on several datasets and real-world samples show that our method
achieves significantly better results than state-of-the-art methods in both
quantitative and qualitative comparisons.
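The idea of a pixel-wise loss modulated by attention can be illustrated with a minimal sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: the function name `attention_weighted_l1`, the flat list representation of an image, and the choice to scale each pixel's absolute error by `base_weight + attention`. It is not the authors' implementation.

```python
def attention_weighted_l1(pred, target, attention, base_weight=1.0):
    """Mean absolute error with a per-pixel weight of (base_weight + attention).

    pred, target, attention: flat lists of floats of equal length.
    The weighting scheme is a hypothetical choice, not taken from the paper.
    """
    if not (len(pred) == len(target) == len(attention)):
        raise ValueError("pred, target and attention must have the same length")
    if not pred:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for p, t, a in zip(pred, target, attention):
        # Pixels with high attention contribute more to the loss.
        total += (base_weight + a) * abs(p - t)
    return total / len(pred)


# Two pixels: the first is wrong and highly attended, the second is correct.
loss = attention_weighted_l1([0.0, 1.0], [1.0, 1.0], [1.0, 0.0])
```

With zero attention everywhere the function reduces to a plain mean absolute error scaled by `base_weight`, so the attention term only redistributes emphasis toward the attended regions.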
Beyond Voxel Prediction Uncertainty: Identifying brain lesions you can trust
Deep neural networks have become the gold-standard approach for the automated
segmentation of 3D medical images. Their full acceptance by clinicians remains,
however, hampered by the lack of intelligible uncertainty assessment of the
provided results. Most approaches to quantifying their uncertainty, such as the
popular Monte Carlo dropout, are restricted to some measure of uncertainty in
prediction at the voxel level. Besides not being clearly related to genuine
medical uncertainty, this is not clinically satisfying, as most objects of
interest (e.g. brain lesions) are made of groups of voxels whose overall
relevance may not simply reduce to the sum or mean of their individual
uncertainties. In this work, we propose to go beyond voxel-wise assessment
using an innovative Graph Neural Network approach, trained from the outputs of
a Monte Carlo dropout model. This network allows the fusion of three estimators
of voxel uncertainty: entropy, variance, and the model's confidence. It can be
applied to any lesion, regardless of its shape or size. We demonstrate the
superiority of our approach for uncertainty estimation on a Multiple
Sclerosis lesion segmentation task.
Comment: Accepted for presentation at the Workshop on Interpretability of
Machine Intelligence in Medical Image Computing (iMIMIC) at MICCAI 202
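The three voxel-level estimators named in the abstract (entropy, variance, and confidence) can be computed from repeated stochastic forward passes, as in the sketch below. It assumes a binary foreground/background setting and assumes that "confidence" means the probability of the most likely class; the abstract specifies neither. The function name `voxel_uncertainty` is hypothetical, and the Graph Neural Network that fuses these estimators at the lesion level is not shown.

```python
import math


def voxel_uncertainty(samples):
    """Summarise Monte Carlo dropout samples for a single voxel.

    samples: foreground probabilities in [0, 1], one per stochastic
    forward pass. The binary setting and the definition of confidence
    are assumptions made for this illustration.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)
    mean = sum(samples) / n
    # Spread of the predictions across the stochastic passes.
    variance = sum((s - mean) ** 2 for s in samples) / n
    # Binary entropy of the mean prediction; eps avoids log(0).
    eps = 1e-12
    entropy = -(mean * math.log(mean + eps)
                + (1.0 - mean) * math.log(1.0 - mean + eps))
    # Probability assigned to the most likely class.
    confidence = max(mean, 1.0 - mean)
    return {"entropy": entropy, "variance": variance, "confidence": confidence}


stats = voxel_uncertainty([0.0, 1.0])
```

Averaging these values over the voxels of a lesion is the simple reduction the abstract argues against, so the sketch only covers the per-voxel inputs to the proposed method.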