
    HeadOn: Real-time Reenactment of Human Portrait Videos

    We propose HeadOn, the first real-time source-to-target reenactment approach for complete human portrait videos that enables transfer of torso and head motion, face expression, and eye gaze. Given a short RGB-D video of the target actor, we automatically construct a personalized geometry proxy that embeds a parametric head, eye, and kinematic torso model. A novel real-time reenactment algorithm employs this proxy to photo-realistically map the captured motion from the source actor to the target actor. On top of the coarse geometric proxy, we propose a video-based rendering technique that composites the modified target portrait video via view- and pose-dependent texturing, and creates photo-realistic imagery of the target actor under novel torso and head poses, facial expressions, and gaze directions. To this end, we propose a robust tracking of the face and torso of the source actor. We extensively evaluate our approach and show significant improvements in enabling much greater flexibility in creating realistic reenacted output videos.
    Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'1

    Tex2Shape: Detailed Full Human Body Geometry From a Single Image

    We present a simple yet effective method to infer detailed full human body shape from only a single photograph. Our model can infer full-body shape including face, hair, and clothing including wrinkles at interactive frame-rates. Results feature details even on parts that are occluded in the input image. Our main idea is to turn shape regression into an aligned image-to-image translation problem. The input to our method is a partial texture map of the visible region obtained from off-the-shelf methods. From a partial texture, we estimate detailed normal and vector displacement maps, which can be applied to a low-resolution smooth body model to add detail and clothing. Despite being trained purely with synthetic data, our model generalizes well to real-world photographs. Numerous results demonstrate the versatility and robustness of our method
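    The step of applying a vector displacement map to a smooth body model can be illustrated with a minimal sketch. This is not the Tex2Shape implementation: the function name `apply_vector_displacement`, the nearest-neighbour UV lookup, and the toy two-vertex mesh are all assumptions made for illustration only.

```python
def apply_vector_displacement(vertices, uvs, disp_map):
    """Offset each vertex by the displacement texel nearest its UV coordinate.

    vertices: list of (x, y, z) positions on the smooth base mesh (hypothetical data).
    uvs:      list of (u, v) texture coordinates in [0, 1], one per vertex.
    disp_map: 2D grid (rows of texels), each texel an (dx, dy, dz) offset.
    """
    height = len(disp_map)
    width = len(disp_map[0])
    displaced = []
    for (x, y, z), (u, v) in zip(vertices, uvs):
        # Nearest-neighbour lookup; clamp so u == 1.0 or v == 1.0 stays in range.
        col = min(int(u * width), width - 1)
        row = min(int(v * height), height - 1)
        dx, dy, dz = disp_map[row][col]
        displaced.append((x + dx, y + dy, z + dz))
    return displaced


# Toy example: two vertices and a 2x2 displacement map (illustrative values).
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
uvs = [(0.0, 0.0), (1.0, 1.0)]
disp_map = [
    [(0.0, 0.0, 0.1), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
]
result = apply_vector_displacement(vertices, uvs, disp_map)
print(result)
```

    A practical pipeline would more likely interpolate bilinearly between texels and express displacements in a per-vertex tangent frame; the sketch keeps object-space offsets for brevity.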


    BareSkinNet: De-makeup and De-lighting via 3D Face Reconstruction

    We propose BareSkinNet, a novel method that simultaneously removes makeup and lighting influences from the face image. Our method leverages a 3D morphable model and does not require a reference clean face image or a specified light condition. By combining the process of 3D face reconstruction, we can easily obtain 3D geometry and coarse 3D textures. Using this information, we can infer normalized 3D face texture maps (diffuse, normal, roughness, and specular) by an image-translation network. Consequently, reconstructed 3D face textures without undesirable information will significantly benefit subsequent processes, such as re-lighting or re-makeup. In experiments, we show that BareSkinNet outperforms state-of-the-art makeup removal methods. In addition, our method is remarkably helpful in removing makeup to generate consistent high-fidelity texture maps, which makes it extendable to many realistic face generation applications. It can also automatically build graphic assets of face makeup images before and after with corresponding 3D data. This will assist artists in accelerating their work, such as 3D makeup avatar creation.
    Comment: accepted at PG202

    To Affinity and Beyond: Interactive Digital Humans as a Human Computer Interface

    The field of human computer interaction is increasingly exploring the use of more natural, human-like user interfaces to build intelligent agents to aid in everyday life. This is coupled with a move to people using ever more realistic avatars to represent themselves in their digital lives. As the ability to produce emotionally engaging digital human representations is only just now becoming technically possible, there is little research into how to approach such tasks. This is due to both technical complexity and operational implementation cost. This is now changing as we are at a nexus point with new approaches, faster graphics processing and enabling new technologies in machine learning and computer vision becoming available. I articulate the issues required for such digital humans to be considered successfully located on the other side of the phenomenon known as the Uncanny Valley. My results show that a complex mix of perceived and contextual aspects affect the sense making on digital humans and highlights previously undocumented effects of interactivity on the affinity. Users are willing to accept digital humans as a new form of user interface and they react to them emotionally in previously unanticipated ways. My research shows that it is possible to build an effective interactive digital human that crosses the Uncanny Valley. I directly explore what is required to build a visually realistic digital human as a primary research question and I explore if such a realistic face provides sufficient benefit to justify the challenges involved in building it. I conducted a Delphi study to inform the research approaches and then produced a complex digital human character based on these insights. This interactive and realistic digital human avatar represents a major technical undertaking involving multiple teams around the world. Finally, I explored a framework for examining the ethical implications and signpost future research areas

    FlightGoggles: A Modular Framework for Photorealistic Camera, Exteroceptive Sensor, and Dynamics Simulation

    FlightGoggles is a photorealistic sensor simulator for perception-driven robotic vehicles. The key contributions of FlightGoggles are twofold. First, FlightGoggles provides photorealistic exteroceptive sensor simulation using graphics assets generated with photogrammetry. Second, it provides the ability to combine (i) synthetic exteroceptive measurements generated in silico in real time and (ii) vehicle dynamics and proprioceptive measurements generated in motio by vehicle(s) in a motion-capture facility. FlightGoggles is capable of simulating a virtual-reality environment around autonomous vehicle(s). While a vehicle is in flight in the FlightGoggles virtual reality environment, exteroceptive sensors are rendered synthetically in real time while all complex extrinsic dynamics are generated organically through the natural interactions of the vehicle. The FlightGoggles framework allows for researchers to accelerate development by circumventing the need to estimate complex and hard-to-model interactions such as aerodynamics, motor mechanics, battery electrochemistry, and behavior of other agents. The ability to perform vehicle-in-the-loop experiments with photorealistic exteroceptive sensor simulation facilitates novel research directions involving, e.g., fast and agile autonomous flight in obstacle-rich environments, safe human interaction, and flexible sensor selection. FlightGoggles has been utilized as the main test for selecting nine teams that will advance in the AlphaPilot autonomous drone racing challenge. We survey approaches and results from the top AlphaPilot teams, which may be of independent interest.
    Comment: Initial version appeared at IROS 2019. Supplementary material can be found at https://flightgoggles.mit.edu. Revision includes description of new FlightGoggles features, such as a photogrammetric model of the MIT Stata Center, new rendering settings, and a Python AP

    Beyond deep fakes: Conceptual framework, applications, and research agenda for neural rendering of realistic digital faces

    Neural rendering (NR) has emerged as a novel technology for the generation and animation of realistic digital human faces. NR is based on machine learning techniques such as generative adversarial networks and is used to infer human face features and their animation from large amounts of (video) training data. NR shot to prominence with the deep fake phenomenon, the malicious and unwanted use of someone’s face for deception or satire. In this paper we demonstrate that the potential uses of NR far outstrip its use for deep fakes. We contrast NR approaches with traditional computer graphics approaches, discuss typical types of NR applications in digital face generation, and derive a conceptual framework for both guiding the design of digital characters, and for classifying existing NR use cases. We conclude with research ideas for studying the potential applications and implications of NR-based digital characters

    High quality dynamic reflectance and surface reconstruction from video

    The creation of high quality animations of real-world human actors has long been a challenging problem in computer graphics. It involves the modeling of the shape of the virtual actors, creating their motion, and the reproduction of very fine dynamic details. In order to render the actor under arbitrary lighting, it is required that reflectance properties are modeled for each point on the surface. These steps, which are usually performed manually by professional modelers, are time consuming and cumbersome. In this thesis, we show that algorithmic solutions for some of the problems that arise in the creation of high quality animation of real-world people are possible using multi-view video data. First, we present a novel spatio-temporal approach to create a personalized avatar from multi-view video data of a moving person. Thereafter, we propose two enhancements to a method that captures human shape, motion and reflectance properties of a moving human using eight multi-view video streams. Afterwards we extend this work, and in order to add very fine dynamic details to the geometric models, such as wrinkles and folds in the clothing, we make use of the multi-view video recordings and present a statistical method that can passively capture the fine-grain details of time-varying scene geometry. Finally, in order to reconstruct structured shape and animation of the subject from video, we present a dense 3D correspondence finding method that enables spatiotemporally coherent reconstruction of surface animations directly from multi-view video data. These algorithmic solutions can be combined to constitute a complete animation pipeline for acquisition, reconstruction and rendering of high quality virtual actors from multi-view video data. They can also be used individually in a system that requires the solution of a specific algorithmic sub-problem.
The results demonstrate that using multi-view video data it is possible to find the model description that enables realistic appearance of animated virtual actors under different lighting conditions and exhibits high quality dynamic details in the geometry.
The development of high-quality animations of human actors has long been a difficult problem in computer graphics. It involves modeling a three-dimensional representation of the actor and their motion, and reproducing very fine dynamic details. To render the actor under arbitrary lighting, the reflectance properties of every individual point must also be modeled. These steps, which are usually performed manually by professional modelers, are time-consuming and laborious. In this thesis we propose algorithmic solutions for some of the problems that arise in the development of such high-quality animations. First, we present a novel spatio-temporal approach for creating an avatar from multi-view video data of a moving person. Next, we describe a video-based modeling approach that uses an animated template of a human body. With the help of a handful of synchronized video recordings, we compute the three-dimensional shape, its motion, and the reflectance properties of the surface. To add very fine dynamic details, such as wrinkles and folds in clothing, to the geometric models, we present a statistical method that can passively capture the fine details of time-varying scene geometry. Finally, we present a method that finds dense 3D correspondences in order to extract the structured shape and its associated motion from a video. This enables spatio-temporally coherent reconstruction of surface animations directly from multi-view video data.
These algorithmic solutions can be used in combination to provide an animation pipeline for the acquisition, reconstruction, and rendering of high-quality animations from multi-view video data. They can also be used individually in a system that requires a solution to a specific algorithmic sub-problem. The result is a model description that enables a realistic appearance of animated virtual actors with high-quality dynamic details under different lighting conditions.