Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

Authors: Chen Xiaokang, He Dongliang, Hu Tianshu, Liu Jingtuo, Tang Jiaxiang, Wang Jingdong, Wang Kaisiyuan, Zeng Gang, Zhou Hang
Publication date: 22/11/2022

While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, their slow training and inference speed severely obstructs their potential usage. In this paper, we propose an efficient NeRF-based framework that enables real-time synthesis of talking portraits and faster convergence by leveraging the recent success of grid-based NeRF. Our key insight is to decompose the inherently high-dimensional talking portrait representation into three low-dimensional feature grids. Specifically, a Decomposed Audio-spatial Encoding Module models the dynamic head with a 3D spatial grid and a 2D audio grid. The torso is handled with another 2D grid in a lightweight Pseudo-3D Deformable Module. Both modules focus on efficiency under the premise of good rendering quality. Extensive experiments demonstrate that our method can generate realistic, audio-lip-synchronized talking portrait videos, while also being highly efficient compared to previous methods.

Comment: Project page: https://me.kiui.moe/radnerf
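To give a rough sense of why splitting one high-dimensional representation into three low-dimensional grids can help, the sketch below counts cells under a simplifying assumption: dense grids with the same resolution on every axis, and a joint grid spanning the 3 spatial plus 2 audio dimensions. The resolution `R` and the helper `dense_grid_cells` are hypothetical and not taken from the paper, which builds on grid-based NeRF and may not use dense grids at all.

```python
def dense_grid_cells(resolution, dims):
    """Number of cells in a dense grid with `resolution` cells on each of `dims` axes."""
    return resolution ** dims


R = 64  # hypothetical per-axis resolution (an assumption, not a value from the paper)

# One joint grid over 3 spatial + 2 audio dimensions.
joint_5d = dense_grid_cells(R, 5)

# Three separate grids: 3D spatial (head), 2D audio (head), 2D (torso).
decomposed = (
    dense_grid_cells(R, 3)
    + dense_grid_cells(R, 2)
    + dense_grid_cells(R, 2)
)

print(f"joint 5D grid cells:     {joint_5d}")
print(f"decomposed grids, total: {decomposed}")
```

Under these assumptions the joint grid grows as R^5 while the decomposed grids grow as R^3 + 2·R^2, which is the kind of gap the abstract's efficiency argument relies on.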

arXiv.org e-Print Archive

Personalized Cinemagraphs using Semantic Understanding and Collaborative Learning

Authors: Joo Kyungdon, Joshi Neel, Kang Sing Bing, Kweon In So, Oh Tae-Hyun, Wang Baoyuan
Publication date: 09/08/2017

Cinemagraphs are a compelling way to convey dynamic aspects of a scene. In these media, dynamic and still elements are juxtaposed to create an artistic and narrative experience. Creating a high-quality, aesthetically pleasing cinemagraph requires isolating objects in a semantically meaningful way and then selecting good start times and looping periods for those objects to minimize visual artifacts (such as tearing). To achieve this, we present a new technique that uses object recognition and semantic segmentation as part of an optimization method to automatically create cinemagraphs from videos that are both visually appealing and semantically meaningful. Given a scene with multiple objects, there are many cinemagraphs one could create. Our method evaluates these multiple candidates and presents the best one, as determined by a model trained to predict human preferences in a collaborative way. We demonstrate the effectiveness of our approach with multiple results and a user study.

Comment: To appear in ICCV 2017. Total 17 pages including the supplementary material

arXiv.org e-Print Archive

포항공과대학교

End-to-End Photo-Sketch Generation via Fully Convolutional Representation Learning

Authors: Ding Shengyong, Lin Liang, Wu Xian, Zhang Lei, Zhang Liliang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/04/2015

Sketch-based face recognition is an interesting task in vision and multimedia research, yet it is quite challenging due to the great difference between face photos and sketches. In this paper, we propose a novel approach for photo-sketch generation, aiming to automatically transform face photos into detail-preserving personal sketches. Unlike traditional models that synthesize sketches from a dictionary of exemplars, we develop a fully convolutional network to learn the end-to-end photo-sketch mapping. Our approach takes whole face photos as inputs and directly generates the corresponding sketch images with efficient inference and learning; the architecture is built by stacking only convolutional kernels of very small sizes. To capture person identity well during the photo-sketch transformation, we define our optimization objective in the form of joint generative-discriminative minimization. In particular, a discriminative regularization term is incorporated into the photo-sketch generation, enhancing the discriminability of the generated person sketches against other individuals. Extensive experiments on several standard benchmarks suggest that our approach outperforms other state-of-the-art methods in both photo-sketch generation and face sketch verification.

Comment: 8 pages, 6 figures. Proceedings of the ACM International Conference on Multimedia Retrieval (ICMR), 201

arXiv.org e-Print Archive

Crossref

Bitplane image coding with parallel coefficient processing

Authors: Auli-Llinas Francesc, Enfedaque Pablo, Moure Juan C., Sanchez Silva Victor
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Image coding systems have traditionally been tailored for multiple instruction, multiple data (MIMD) computing. In general, they partition the (transformed) image into codeblocks that can be coded in the cores of MIMD-based processors. Each core executes a sequential flow of instructions to process the coefficients in the codeblock, independently and asynchronously from the other cores. Bitplane coding is a common strategy to code such data. Most of its mechanisms require sequential processing of the coefficients. Recent years have seen the rise of processing accelerators with enhanced computational performance and power efficiency whose architecture is mainly based on the single instruction, multiple data (SIMD) principle. SIMD computing refers to the execution of the same instruction on multiple data in a lockstep, synchronous way. Unfortunately, current bitplane coding strategies cannot fully profit from such processors because their coding tasks are inherently sequential. This paper presents bitplane image coding with parallel coefficient (BPC-PaCo) processing, a coding method that can process many coefficients within a codeblock in parallel and synchronously. To this end, the scanning order, the context formation, the probability model, and the arithmetic coder of the coding engine have been reformulated. The experimental results suggest that the penalty in coding performance of BPC-PaCo with respect to the traditional strategies is almost negligible.
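As a minimal illustration of what a bitplane is (not of BPC-PaCo itself, whose scanning order, context formation, probability model, and arithmetic coder are not described in this abstract), the sketch below splits a few coefficient magnitudes into bitplanes and rebuilds them. The function names and sample values are hypothetical. Extracting one plane applies the same shift-and-mask to every coefficient, which is the data-parallel part; the sequential dependencies the abstract refers to arise in the later coding stages, which are omitted here.

```python
def bitplanes(magnitudes, num_planes):
    """Split non-negative integer magnitudes into bitplanes, most significant first."""
    planes = []
    for p in range(num_planes - 1, -1, -1):
        # The same shift-and-mask is applied to every coefficient,
        # so this step has no dependency between coefficients.
        planes.append([(m >> p) & 1 for m in magnitudes])
    return planes


def reconstruct(planes):
    """Rebuild the magnitudes from bitplanes listed most significant first."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    return values


magnitudes = [5, 3, 0, 7]  # hypothetical coefficient magnitudes in one codeblock
planes = bitplanes(magnitudes, num_planes=3)
restored = reconstruct(planes)
print(planes)
print(restored)
```

A real coder would also handle coefficient signs and would entropy-code each plane with context modeling rather than store raw bits.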

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Warwick Research Archives Portal Repository

Diposit Digital de Documents de la UAB

The Many Moods of Emotion

Authors: Jurie Frédéric, Kervadec Corentin, Pateux Stéphane, Vielzeuf Valentin
Publication date: 31/10/2018

This paper presents a novel approach to the facial expression generation problem. Building upon the assumption of the psychological community that emotion is intrinsically continuous, we first design our own continuous emotion representation with a 3-dimensional latent space derived from a neural network trained on discrete emotion classification. The resulting representation can be used to annotate large in-the-wild datasets, which are then used to train a Generative Adversarial Network. We first show that our model is able to map back to discrete emotion classes with objectively and subjectively better image quality than usual discrete approaches. We also show that we are able to cover the larger space of possible facial expressions, generating the many moods of emotion. Moreover, two axes in this space may be found that generate expression changes similar to those in traditional continuous representations such as arousal-valence. Finally, we show through visual interpretation that the third remaining dimension is highly related to the well-known dominance dimension from psychology.

arXiv.org e-Print Archive

HAL - Normandie Université

Entropy in Image Analysis II

Publication venue: 'MDPI AG'
Publication date: 01/05/2021

Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This is evident from the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.

Directory of Open Access Books (DOAB)

MCPNS: A Macropixel Collocated Position and Its Neighbors Search for Plenoptic 2.0 Video Coding

Authors: Huu Thuc Nguyen, Jeon Byeungwoo, Van Duong Vinh, Yim Jonghoon
Publication date: 27/11/2023

Recently, it was demonstrated that a newly focused plenoptic 2.0 camera can capture much higher spatial resolution owing to its effective light field sampling, as compared to a traditional unfocused plenoptic 1.0 camera. However, due to the inherent difference in optical structure between the plenoptic 1.0 and 2.0 cameras, the existing fast motion estimation (ME) method for plenoptic 1.0 videos is expected to be sub-optimal for encoding plenoptic 2.0 videos. In this paper, we point out the main differences in motion characteristics between plenoptic 1.0 and 2.0 videos and then propose a new fast ME method, called macropixel collocated position and its neighbors search (MCPNS), for plenoptic 2.0 videos. In detail, we propose to reduce the number of macropixel collocated position (MCP) search candidates based on the new observation of a center-biased motion vector distribution at macropixel resolution. After that, because motion deviates widely around each MCP location in plenoptic 2.0 videos, we propose to select a certain number of key MCP locations with the lowest matching cost and perform a search over their neighboring MCPs to improve the motion search accuracy. Unlike existing methods, our method can achieve better performance without requiring prior knowledge of microlens array orientations. Our simulation results confirmed the effectiveness of the proposed algorithm in terms of both bitrate savings and computational costs compared to existing methods.

Comment: Under review

arXiv.org e-Print Archive