117 research outputs found

    Panorama View With Spatiotemporal Occlusion Compensation for 3D Video Coding

    Get PDF

    Data-driven 3D Reconstruction and View Synthesis of Dynamic Scene Elements

    Get PDF
    Our world is filled with living beings and other dynamic elements. It is important to record dynamic things and events for the sake of education, archeology, and culture inheritance. From vintage to modern times, people have recorded dynamic scene elements in different ways, from sequences of cave paintings to frames of motion pictures. This thesis focuses on two key computer vision techniques by which dynamic element representation moves beyond video capture: towards 3D reconstruction and view synthesis. Although previous methods on these two aspects have been adopted to model and represent static scene elements, dynamic scene elements present unique and difficult challenges for the tasks. This thesis focuses on three types of dynamic scene elements, namely 1) dynamic texture with static shape, 2) dynamic shapes with static texture, and 3) dynamic illumination of static scenes. Two research aspects will be explored to represent and visualize them: dynamic 3D reconstruction and dynamic view synthesis. Dynamic 3D reconstruction aims to recover the 3D geometry of dynamic objects and, by modeling the objects’ movements, bring 3D reconstructions to life. Dynamic view synthesis, on the other hand, summarizes or predicts the dynamic appearance change of dynamic objects – for example, the daytime-to-nighttime illumination of a building or the future movements of a rigid body. We first target the problem of reconstructing dynamic textures of objects that have (approximately) fixed 3D shape but time-varying appearance. Examples of such objects include waterfalls, fountains, and electronic billboards. Since the appearance of dynamic-textured objects can be random and complicated, estimating the 3D geometry of these objects from 2D images/video requires novel tools beyond the appearance-based point correspondence methods of traditional 3D computer vision. To perform this 3D reconstruction, we introduce a method that simultaneously 1) segments dynamically textured scene objects in the input images and 2) reconstructs the 3D geometry of the entire scene, assuming a static 3D shape for the dynamically textured objects. Compared to dynamic textures, the appearance change of dynamic shapes is due to physically defined motions like rigid body movements. In these cases, assumptions can be made about the object’s motion constraints in order to identify corresponding points on the object at different timepoints. For example, two points on a rigid object have constant distance between them in the 3D space, no matter how the object moves. Based on this assumption of local rigidity, we propose a robust method to correctly identify point correspondences of two images viewing the same moving object from different viewpoints and at different times. Dense 3D geometry could be obtained from the computed point correspondences. We apply this method on unsynchronized video streams, and observe that the number of inlier correspondences found by this method can be used as indicator for frame alignment among the different streams. To model dynamic scene appearance caused by illumination changes, we propose a framework to find a sequence of images that have similar geometric composition as a single reference image and also show a smooth transition in illumination throughout the day. These images could be registered to visualize patterns of illumination change from a single viewpoint. The final topic of this thesis involves predicting the movements of dynamic shapes in the image domain. Towards this end, we propose deep neural network architectures to predict future views of dynamic motions, such as rigid body movements and flowers blooming. Instead of predicting image pixels from the network, my methods predict pixel offsets and iteratively synthesize future views.Doctor of Philosoph

    High-quality face capture, animation and editing from monocular video

    Get PDF
    Digitization of virtual faces in movies requires complex capture setups and extensive manual work to produce superb animations and video-realistic editing. This thesis pushes the boundaries of the digitization pipeline by proposing automatic algorithms for high-quality 3D face capture and animation, as well as photo-realistic face editing. These algorithms reconstruct and modify faces in 2D videos recorded in uncontrolled scenarios and illumination. In particular, advances in three main areas offer solutions for the lack of depth and overall uncertainty in video recordings. First, contributions in capture include model-based reconstruction of detailed, dynamic 3D geometry that exploits optical and shading cues, multilayer parametric reconstruction of accurate 3D models in unconstrained setups based on inverse rendering, and regression-based 3D lip shape enhancement from high-quality data. Second, advances in animation are video-based face reenactment based on robust appearance metrics and temporal clustering, performance-driven retargeting of detailed facial models in sync with audio, and the automatic creation of personalized controllable 3D rigs. Finally, advances in plausible photo-realistic editing are dense face albedo capture and mouth interior synthesis using image warping and 3D teeth proxies. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of photo-realistic 3D faces.Die Digitalisierung von Gesichtern zum Einsatz in der Filmindustrie erfordert komplizierte Aufnahmevorrichtungen und die manuelle Nachbearbeitung von Rekonstruktionen, um perfekte Animationen und realistische Videobearbeitung zu erzielen. Diese Dissertation erweitert vorhandene Digitalisierungsverfahren durch die Erforschung von automatischen Verfahren zur qualitativ hochwertigen 3D Rekonstruktion, Animation und Modifikation von Gesichtern. Diese Algorithmen erlauben es, Gesichter in 2D Videos, die unter allgemeinen Bedingungen und unbekannten Beleuchtungsverhältnissen aufgenommen wurden, zu rekonstruieren und zu modifizieren. Vor allem Fortschritte in den folgenden drei Hauptbereichen tragen zur Kompensation von fehlender Tiefeninformation und der allgemeinen Mehrdeutigkeit von 2D Videoaufnahmen bei. Erstens, Beiträge zur modellbasierten Rekonstruktion von detaillierter und dynamischer 3D Geometrie durch optische Merkmale und die Shading-Eigenschaften des Gesichts, mehrschichtige parametrische Rekonstruktion von exakten 3D Modellen mittels inversen Renderings in allgemeinen Szenen und regressionsbasierter 3D Lippenformverfeinerung mittels qualitativ hochwertigen Daten. Zweitens, Fortschritte im Bereich der Computeranimation durch videobasierte Gesichtsausdrucksübertragung und temporaler Clusterbildung, Übertragung von detaillierten Gesichtsmodellen, deren Mundbewegung mit Ton synchronisiert ist, und die automatische Erstellung von personalisierten "3D Face Rigs". Schließlich werden Fortschritte im Bereich der realistischen Videobearbeitung vorgestellt, welche auf der dichten Rekonstruktion von Hautreflektionseigenschaften und der Mundinnenraumsynthese mittels bildbasierten und geometriebasierten Verfahren aufbauen. Qualitativ hochwertige Ergebnisse in anspruchsvollen Anwendungen untermauern die Wichtigkeit der geleisteten Beiträgen und zeigen das große Potential der automatischen Erstellung von realistischen digitalen 3D Gesichtern auf

    3D computational modeling and perceptual analysis of kinetic depth effects

    Get PDF
    Humans have the ability to perceive kinetic depth effects, i.e., to perceived 3D shapes from 2D projections of rotating 3D objects. This process is based on a variety of visual cues such as lighting and shading effects. However, when such cues are weak or missing, perception can become faulty, as demonstrated by the famous silhouette illusion example of the spinning dancer. Inspired by this, we establish objective and subjective evaluation models of rotated 3D objects by taking their projected 2D images as input. We investigate five different cues: ambient luminance, shading, rotation speed, perspective, and color difference between the objects and background. In the objective evaluation model, we first apply 3D reconstruction algorithms to obtain an objective reconstruction quality metric, and then use quadratic stepwise regression analysis to determine weights of depth cues to represent the reconstruction quality. In the subjective evaluation model, we use a comprehensive user study to reveal correlations with reaction time and accuracy, rotation speed, and perspective. The two evaluation models are generally consistent, and potentially of benefit to inter-disciplinary research into visual perception and 3D reconstruction

    High quality dynamic reflectance and surface reconstruction from video

    Get PDF
    The creation of high quality animations of real-world human actors has long been a challenging problem in computer graphics. It involves the modeling of the shape of the virtual actors, creating their motion, and the reproduction of very fine dynamic details. In order to render the actor under arbitrary lighting, it is required that reflectance properties are modeled for each point on the surface. These steps, that are usually performed manually by professional modelers, are time consuming and cumbersome. In this thesis, we show that algorithmic solutions for some of the problems that arise in the creation of high quality animation of real-world people are possible using multi-view video data. First, we present a novel spatio-temporal approach to create a personalized avatar from multi-view video data of a moving person. Thereafter, we propose two enhancements to a method that captures human shape, motion and reflectance properties of amoving human using eightmulti-view video streams. Afterwards we extend this work, and in order to add very fine dynamic details to the geometric models, such as wrinkles and folds in the clothing, we make use of the multi-view video recordings and present a statistical method that can passively capture the fine-grain details of time-varying scene geometry. Finally, in order to reconstruct structured shape and animation of the subject from video, we present a dense 3D correspondence finding method that enables spatiotemporally coherent reconstruction of surface animations directly frommulti-view video data. These algorithmic solutions can be combined to constitute a complete animation pipeline for acquisition, reconstruction and rendering of high quality virtual actors from multi-view video data. They can also be used individually in a system that require the solution of a specific algorithmic sub-problem. The results demonstrate that using multi-view video data it is possible to find the model description that enables realistic appearance of animated virtual actors under different lighting conditions and exhibits high quality dynamic details in the geometry.Die Entwicklung hochqualitativer Animationen von menschlichen Schauspielern ist seit langem ein schwieriges Problem in der Computergrafik. Es beinhaltet das Modellieren einer dreidimensionaler Abbildung des Akteurs, seiner Bewegung und die Wiedergabe sehr feiner dynamischer Details. Um den Schauspieler unter einer beliebigen Beleuchtung zu rendern, müssen auch die Reflektionseigenschaften jedes einzelnen Punktes modelliert werden. Diese Schritte, die gewöhnlich manuell von Berufsmodellierern durchgeführt werden, sind zeitaufwendig und beschwerlich. In dieser These schlagen wir algorithmische Lösungen für einige der Probleme vor, die in der Entwicklung solch hochqualitativen Animationen entstehen. Erstens präsentieren wir einen neuartigen, räumlich-zeitlichen Ansatz um einen Avatar von Mehransicht-Videodaten einer bewegenden Person zu schaffen. Danach beschreiben wir einen videobasierten Modelierungsansatz mit Hilfe einer animierten Schablone eines menschlichen Körpers. Unter Zuhilfenahme einer handvoll synchronisierter Videoaufnahmen berechnen wir die dreidimensionale Abbildung, seine Bewegung und Reflektionseigenschaften der Oberfläche. Um sehr feine dynamische Details, wie Runzeln und Falten in der Kleidung zu den geometrischen Modellen hinzuzufügen, zeigen wir eine statistische Methode, die feinen Details der zeitlich variierenden Szenegeometrie passiv erfassen kann. Und schließlich zeigen wir eine Methode, die dichte 3D Korrespondenzen findet, um die strukturierte Abbildung und die zugehörige Bewegung aus einem Video zu extrahieren. Dies ermöglicht eine räumlich-zeitlich zusammenhängende Rekonstruktion von Oberflächenanimationen direkt aus Mehransicht-Videodaten. Diese algorithmischen Lösungen können kombiniert eingesetzt werden, um eine Animationspipeline für die Erfassung, die Rekonstruktion und das Rendering von Animationen hoher Qualität aus Mehransicht-Videodaten zu ermöglichen. Sie können auch einzeln in einem System verwendet werden, das nach einer Lösung eines spezifischen algorithmischen Teilproblems verlangt. Das Ergebnis ist eine Modelbeschreibung, das realistisches Erscheinen von animierten virtuellen Schauspielern mit dynamischen Details von hoher Qualität unter verschiedenen Lichtverhältnissen ermöglicht
    • …
    corecore