HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show that it enables much greater flexibility in creating realistic reenacted output videos.

Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at SIGGRAPH 2018
PIE: Portrait Image Embedding for Semantic Control
Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, but only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.
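To illustrate the structure of the energy-based latent optimization the abstract describes, here is a minimal NumPy sketch. Everything in it is a stand-in: a random linear map plays the role of the pretrained generator, and a simple latent prior plays the role of the identity-preservation term; this is not PIE's actual StyleGAN pipeline.

```python
import numpy as np

# Toy stand-in for a pretrained generator G(w): latent -> image.
# In PIE this is StyleGAN; a fixed random linear map suffices here
# to show the shape of the optimization.
rng = np.random.default_rng(0)
G = rng.standard_normal((64, 16))      # "generator" weights (made up)
target = rng.standard_normal(64)       # the real portrait to embed (made up)

def energy(w, lam=0.1):
    """Reconstruction energy plus a stand-in identity-preservation
    term (a quadratic prior pulling w toward the latent mean)."""
    recon = G @ w - target
    return recon @ recon + lam * (w @ w)

def grad(w, lam=0.1):
    return 2.0 * G.T @ (G @ w - target) + 2.0 * lam * w

# Plain gradient descent on the latent code w.
w = np.zeros(16)
for _ in range(500):
    w -= 0.003 * grad(w)

print(energy(w) < energy(np.zeros(16)))  # the embedding lowered the energy
```

In the real method the optimization is hierarchical and non-linear; this sketch only shows the common core: minimizing a reconstruction-plus-regularization energy over the latent code.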
High quality dynamic reflectance and surface reconstruction from video
The creation of high quality animations of real-world human actors has long been a challenging problem in computer graphics. It involves the modeling of the shape of the virtual actors, creating their motion, and the reproduction of very fine dynamic details. In order to render the actor under arbitrary lighting, it is required that reflectance properties are modeled for each point on the surface. These steps, that are usually performed manually by professional modelers, are time consuming and cumbersome.
In this thesis, we show that algorithmic solutions for some of the problems that arise in the creation of high quality animation of real-world people are possible using multi-view video data. First, we present a novel spatio-temporal approach to create a personalized avatar from multi-view video data of a moving person. Thereafter, we propose two enhancements to a method that captures the shape, motion, and reflectance properties of a moving human using eight multi-view video streams. Afterwards we extend this work and, in order to add very fine dynamic details to the geometric models, such as wrinkles and folds in the clothing, we make use of the multi-view video recordings and present a statistical method that can passively capture the fine-grain details of time-varying scene geometry. Finally, in order to reconstruct structured shape and animation of the subject from video, we present a dense 3D correspondence finding method that enables spatiotemporally coherent reconstruction of surface animations directly from multi-view video data.
These algorithmic solutions can be combined to constitute a complete animation pipeline for acquisition, reconstruction and rendering of high quality virtual actors from multi-view video data. They can also be used individually in a system that requires the solution of a specific algorithmic sub-problem. The results demonstrate that using multi-view video data it is possible to find a model description that enables realistic appearance of animated virtual actors under different lighting conditions and exhibits high quality dynamic details in the geometry.
Investigating 3D Visual Speech Animation Using 2D Videos
Lip motion accuracy is of paramount importance for speech intelligibility, especially for users who are hard of hearing or foreign language learners. Furthermore, a high level of realism in lip movements is required by the game and film production industries. This thesis focuses on mapping the tracked lip motions in front-view 2D videos of a real speaker onto a synthetic 3D head. A data-driven approach is used based on a 3D morphable model (3DMM) built from 3D synthetic head poses. 3DMMs have been widely used for tasks such as face recognition and detecting facial expressions and lip motions in 2D videos. However, factors such as the facial landmarks required for the mapping process, the amount of data needed to construct the 3DMM, and differences in facial features between real faces and 3D faces that may influence the resulting animation have not yet been investigated. Therefore, this research centers on investigating the impact of these factors on the final 3D lip motions.
The thesis explores how different sets of facial features used in the mapping process
influence the resulting 3D motions. Five sets of facial features are used for mapping the real faces to the corresponding 3D faces. The results show that the inclusion of eyebrows, eyes, nose, and lips improves the 3D lip motions, while face contour features (i.e. the outside boundary of the front view of the face) restrict the face’s mesh, distorting the resulting animation.
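The mapping step the thesis builds on, fitting morphable-model coefficients to a chosen set of tracked landmarks, can be sketched as a least-squares problem. The sketch below is a hypothetical illustration: the basis, landmark count, and "tracked" positions are fabricated, not data from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear 3DMM over L landmark points:
#   shape = mean_shape + basis @ coeffs
L, K = 20, 5                                  # landmarks, basis size (made up)
mean_shape = rng.standard_normal(L * 3)
basis = rng.standard_normal((L * 3, K))

# Fabricated "tracked" landmarks: a known coefficient vector plus a
# little noise, standing in for positions tracked in a 2D video.
true_c = rng.standard_normal(K)
observed = mean_shape + basis @ true_c + 0.01 * rng.standard_normal(L * 3)

# Least-squares fit of model coefficients to the observed landmarks --
# the core operation when driving the 3D head from tracked features.
coeffs, *_ = np.linalg.lstsq(basis, observed - mean_shape, rcond=None)

print(np.allclose(coeffs, true_c, atol=0.1))  # recovered up to noise
```

Changing which landmark rows enter `basis` and `observed` (e.g. dropping the face-contour points, as the results above suggest) changes the fitted coefficients and hence the resulting animation.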
This thesis investigates how using different amounts of data when constructing the 3DMM affects the 3D lip motions. The results show that using a wider range of synthetic head poses for different phoneme intensities to create a 3DMM, as well as a combination of front- and side-view photographs of real speakers to produce initial neutral 3D synthetic head poses, provides better animation results compared to ground truth data consisting of front- and side-view 2D videos of real speakers.
The thesis also investigates the impact of differences and similarities in facial features between real speakers and the 3DMMs on the resulting 3D lip motions by mapping between non-similar faces based on differences and similarities in vertical mouth height and mouth width. The objective and user test results show that mapping 2D videos of real speakers with low vertical mouth heights to 3D heads that correspond to real speakers with high vertical mouth heights, or vice versa, produces poorer 3D lip motions. It is thus important that this is considered when using a 2D recording of a real actor’s lip movements to control a 3D synthetic character.
Fashioning Seoul: Everyday Practices of Dress in the Korean Wave
Senior Project submitted to The Division of Social Studies of Bard College
Virtual Heritage
Virtual heritage has been explained as virtual reality applied to cultural heritage, but this definition only scratches the surface of the fascinating applications, tools and challenges of this fast-changing interdisciplinary field. This book provides an accessible but concise edited coverage of the main topics, tools and issues in virtual heritage. Leading international scholars have provided chapters that explain current issues in accuracy and precision; examine the challenges of adopting advanced animation techniques; show how archaeological learning can be developed in Minecraft; propose that mixed reality is conceptual rather than merely technical; explore how useful Linked Open Data can be for art history; explain how accessible photogrammetry can be, as well as the ethical and practical issues of applying it at scale; provide insight into enabling interaction in museums involving the wider public; and describe issues in evaluating virtual heritage projects that are seldom addressed even in scholarly papers. The book will be of particular interest to students and scholars in museum studies, digital archaeology, heritage studies, architectural history and modelling, and virtual environments.
Deep Nostalgia: remediated memory, algorithmic nostalgia, and technological ambivalence
Digital recreations of the past, and of the deceased, are part of the Internet’s present. They circulate within social networks where logics of connection and connectivity underpin increasingly performative memory work. In this article we explore these developments through a case study of the MyHeritage deep learning feature, Deep Nostalgia. Our analysis is informed by a close critical study of Deep Nostalgia creations, and the discourses circulating around them, shared on Twitter during the two-week period following the feature’s launch in February 2021 (n = 6935). We examine how memory is evoked, framed, re-worked and distorted through algorithmic processes, and within social networks in particular, and explore what this tells us about people’s need to connect with their pasts. First, we analyse how the shift from photo to video ‘revives’ the dead via a process that we have termed ‘remediated memory’. Second, we explore the affective dimensions and resonances of Deep Nostalgia creations. In doing so, we introduce the concept of ‘algorithmic nostalgia’ to describe the ways nostalgia is generated, organised and exploited through Deep Nostalgia’s automated and recursive algorithmic mechanisms. Third, we interrogate the ways social media logics shape the use and influence of these outputs. Our study’s scholarly contribution is at the intersection of memory, automation, and algorithms. We highlight the importance of studying the ambivalence of emerging media at their nexus with memory studies and, critically, of attending to the ways corporate interests increasingly shape, and assimilate, these activities.