504 research outputs found

    Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

    We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead of learning a direct mapping from audio to video frames, we propose first to transfer audio to high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned on the landmarks. Compared to a direct audio-to-image approach, our cascade approach avoids fitting spurious correlations between audiovisual signals that are irrelevant to the speech content. Humans are sensitive to temporal discontinuities and subtle artifacts in video. To avoid such pixel-jittering problems and to force the network to focus on audiovisual-correlated regions, we propose a novel dynamically adjustable pixel-wise loss with an attention mechanism. Furthermore, to generate a sharper image with well-synchronized facial movements, we propose a novel regression-based discriminator structure, which considers sequence-level information along with frame-level information. Thoughtful experiments on several datasets and real-world samples demonstrate that our method obtains significantly better results than state-of-the-art methods in both quantitative and qualitative comparisons.
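The abstract above does not give the formula for the attention-based pixel-wise loss. As a rough illustration of the general idea only, the sketch below weights a per-pixel L1 error by an attention map so that errors in attended regions cost more. The function name `attention_weighted_l1`, the `base_weight` parameter, and the additive weighting are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def attention_weighted_l1(pred, target, attention, base_weight=0.5):
    """Hypothetical attention-weighted L1 loss (illustrative, not the paper's exact loss).

    pred, target: arrays of shape (T, H, W, C), generated and ground-truth frames.
    attention:    array of shape (T, H, W) with values in [0, 1].
    base_weight:  assumed constant so that unattended pixels still contribute.
    """
    # Broadcast the attention map over the channel axis and add the base weight.
    weights = attention[..., None] + base_weight
    # Weighted mean absolute error over all frames, pixels, and channels.
    return float(np.mean(weights * np.abs(pred - target)))

# Example call with random frames and a random attention map.
rng = np.random.default_rng(0)
frames_pred = rng.random((4, 8, 8, 3))
frames_true = rng.random((4, 8, 8, 3))
attn = rng.random((4, 8, 8))
loss = attention_weighted_l1(frames_pred, frames_true, attn)
```

With this weighting, the same pixel error contributes more to the loss where the attention map is high, which is one simple way to make a network focus on selected regions.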

    Normalized Feature Distillation for Semantic Segmentation

    As a promising approach in model compression, knowledge distillation improves the performance of a compact model by transferring the knowledge from a cumbersome one. The kind of knowledge used to guide the training of the student is important. Previous distillation methods in semantic segmentation strive to extract various forms of knowledge from the features, which involve elaborate manual design relying on prior information and have limited performance gains. In this paper, we propose a simple yet effective feature distillation method called normalized feature distillation (NFD), aiming to enable effective distillation with the original features without the need to manually design new forms of knowledge. The key idea is to prevent the student from focusing on imitating the magnitude of the teacher's feature response by normalization. Our method achieves state-of-the-art distillation results for semantic segmentation on Cityscapes, VOC 2012, and ADE20K datasets. Code will be available.
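The key idea stated in the abstract, normalizing features so the student does not imitate the teacher's response magnitude, can be sketched as follows. The abstract does not say which normalization is used, so this example assumes per-channel standardization over spatial positions followed by a mean squared error; the function names and the `eps` parameter are hypothetical.

```python
import numpy as np

def normalize_features(feat, eps=1e-6):
    """Standardize each channel of a (C, H, W) feature map over its spatial positions.

    The choice of per-channel statistics is an assumption for this sketch.
    """
    mean = feat.mean(axis=(1, 2), keepdims=True)
    std = feat.std(axis=(1, 2), keepdims=True)
    return (feat - mean) / (std + eps)

def normalized_feature_distillation_loss(student_feat, teacher_feat):
    """Mean squared error between normalized student and teacher features."""
    s = normalize_features(student_feat)
    t = normalize_features(teacher_feat)
    return float(np.mean((s - t) ** 2))

# Example: a student whose features are a scaled and shifted copy of the teacher's.
rng = np.random.default_rng(0)
teacher = rng.random((8, 16, 16))
student_scaled = 3.0 * teacher + 5.0
loss_scaled = normalized_feature_distillation_loss(student_scaled, teacher)
```

Because both feature maps are standardized first, a student that matches the teacher's spatial pattern but differs in scale or offset incurs almost no loss, so the training signal concerns the pattern of the response and not its magnitude.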