
    Doctor of Philosophy

    Interactive editing and manipulation of digital media is a fundamental component of digital content creation. One medium in particular, digital imagery, has seen a recent increase in the popularity of large or even massive image formats. Unfortunately, current systems and techniques are rarely concerned with scalability or usability for these large images. Moreover, processing massive (or even large) imagery is assumed to be an off-line, automatic process, although many problems associated with these datasets require human intervention for high-quality results. This dissertation details how to design interactive image techniques that scale. In particular, massive imagery is typically constructed as a seamless mosaic of many smaller images. The focus of this work is the creation of new technologies to enable user interaction in the formation of these large mosaics. While an interactive system for all stages of the mosaic creation pipeline is a long-term research goal, this dissertation concentrates on the last phase of that pipeline: the composition of registered images into a seamless composite. The work detailed in this dissertation provides the technologies to fully realize interactive editing in mosaic composition on image collections ranging from the very small to the massive in scale.

    3D Human Face Reconstruction and 2D Appearance Synthesis

    3D human face reconstruction has been researched extensively for decades because of its wide range of applications, such as animation, recognition, and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains valuable because images are much easier to access and store. In this dissertation, we first propose three image-based face reconstruction approaches that make different assumptions about the input. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses; the camera must be calibrated under this assumption. Because the first approach is limited to videos, the second approach focuses on a single image. It also improves the geometry by adding fine-grained detail using shading cues, and it introduces a novel albedo estimation and linear optimization algorithm. In the third approach, we further loosen the constraint on the input to allow arbitrary in-the-wild images. This approach can robustly reconstruct high-quality models even with extreme expressions and large poses. We then explore the applicability of our face reconstructions to four applications: video face beautification, generation of personalized facial blendshapes from image sequences, face video stylization, and video face replacement. We demonstrate the potential of our reconstruction approaches in these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays (HMDs). However, the large occlusion of the face is a major obstacle to face-to-face communication. In a further application, we explore hardware and software solutions for synthesizing the face image in the presence of HMDs. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem. With our algorithm and prototype, we achieve photo-realistic results. We further propose a deep neural network that solves the HMD removal problem by treating it as a face inpainting problem. This approach does not need special hardware and runs in real time with satisfying results.

    MONOCULAR POSE ESTIMATION AND SHAPE RECONSTRUCTION OF QUASI-ARTICULATED OBJECTS WITH CONSUMER DEPTH CAMERA

    Quasi-articulated objects, such as human beings, are among the most commonly seen objects in our daily lives. Extensive research has been dedicated to 3D shape reconstruction and motion analysis for this type of object for decades. A major motivation is their wide range of applications, such as entertainment, surveillance, and health care. Most existing studies rely on one or more regular video cameras. In recent years, commodity depth sensors have become more and more widely available, and the geometric measurements they deliver provide valuable information for these tasks. In this dissertation, we propose three algorithms for monocular pose estimation and shape reconstruction of quasi-articulated objects using a single commodity depth sensor. These three algorithms achieve shape reconstruction with increasing levels of granularity and personalization. We then further develop a method for highly detailed shape reconstruction based on our pose estimation techniques. Our first algorithm takes advantage of a motion database acquired with an active marker-based motion capture system. This method combines pose detection through nearest neighbor search with pose refinement via non-rigid point cloud registration. It can accommodate different body sizes and achieves more than twice the accuracy of a previous state-of-the-art method on a publicly available dataset. This algorithm performs frame-by-frame estimation and is therefore less prone to tracking failure. Nonetheless, it does not guarantee temporal consistency of either the skeletal structure or the shape, which could be problematic for some applications. To address this problem, we develop a real-time model-based approach for quasi-articulated pose and 3D shape estimation based on the Iterative Closest Point (ICP) principle, with several novel constraints that are critical for the monocular scenario. In this algorithm, we further propose a novel method for automatic body size estimation that enables it to accommodate different subjects. Because of its local search nature, the ICP-based method can become trapped in local minima for some complex and fast motions. To address this issue, we explore the potential of using a statistical model for soft point correspondence association. To this end, we propose a unified framework based on a Gaussian Mixture Model for joint pose and shape estimation of quasi-articulated objects. This method achieves state-of-the-art performance on various publicly available datasets. Based on our pose estimation techniques, we then develop a novel framework that achieves highly detailed shape reconstruction while only requiring the user to move naturally in front of a single depth sensor. Our experiments demonstrate reconstructed shapes with rich geometric details for various subjects wearing different apparel. Last but not least, we explore the applicability of our method to two real-world applications. First, we combine our ICP-based method with cloth simulation techniques for virtual try-on. Our system delivers the first promising 3D-based virtual clothing system. Second, we explore the possibility of extending our pose estimation algorithms to assist physical therapists in identifying their patients’ movement dysfunctions that are related to injuries. Our preliminary experiments demonstrate promising results in comparison with a gold-standard active marker-based commercial system. Throughout the dissertation, we develop various state-of-the-art algorithms for pose estimation and shape reconstruction of quasi-articulated objects by leveraging the geometric information from depth sensors. We also demonstrate their potential for different real-world applications.

    Real-time simulation and visualisation of cloth using edge-based adaptive meshes

    Real-time rendering and animation of realistic virtual environments and characters have progressed at a great pace, following advances in computer graphics hardware in the last decade. The role of cloth simulation is becoming ever more important in the quest to improve the realism of virtual environments. The real-time simulation of cloth and clothing is important for many applications such as virtual reality, crowd simulation, games, and software for online clothes shopping. A large number of polygons is necessary to depict the highly flexible nature of cloth, with its wrinkling and frequent changes in curvature. Combined with the physical calculations that model the deformations, simulating cloth in detail is computationally very expensive, which makes realistic simulation at interactive frame rates difficult. Real-time cloth simulations can lack quality and realism compared to their offline counterparts, since coarse meshes must often be employed for performance reasons. The focus of this thesis is to develop techniques that allow the real-time simulation of realistic cloth and clothing. Adaptive meshes have previously been developed to act as a bridge between low and high polygon meshes, aiming to adaptively exploit variations in the shape of the cloth. The mesh complexity is dynamically increased or refined to balance quality against computational cost during a simulation. A limitation of many approaches is that they often do not consider the decimation or coarsening of previously refined areas, or are otherwise not fast enough for real-time applications. A novel edge-based adaptive mesh is developed for the fast incremental refinement and coarsening of a triangular mesh. A mass-spring network is integrated into the mesh, permitting the real-time adaptive simulation of cloth, and techniques are developed for the simulation of clothing on an animated character.

    Vocaodoru - Rhythm Gaming and Artificial Cinematography in Virtual Reality

    Vocaodoru is a virtual reality rhythm game centered around two novel components. The gameplay of Vocaodoru is built on a never-before-seen pose-based system that uses a player’s measurements to adapt gameplay to their needs. Tied to the gameplay is a human-in-the-loop utility AI that controls a cinematographic camera to allow streamers to broadcast a more interesting, dynamic view of the player. We discuss our efforts to develop and connect these components and how we plan to continue development after the conclusion of the MQP.

    Fusing Multimedia Data Into Dynamic Virtual Environments

    In spite of the dramatic growth of virtual and augmented reality (VR and AR) technology, content creation for immersive and dynamic virtual environments remains a significant challenge. In this dissertation, we present our research in fusing multimedia data, including text, photos, panoramas, and multi-view videos, to create rich and compelling virtual environments. First, we present Social Street View, which renders geo-tagged social media in its natural geo-spatial context provided by 360° panoramas. Our system takes into account visual saliency and uses maximal Poisson-disc placement with spatiotemporal filters to render social multimedia in an immersive setting. We also present a novel GPU-driven pipeline for saliency computation in 360° panoramas using spherical harmonics (SH). Our spherical residual model can be applied to virtual cinematography in 360° videos. We further present Geollery, a mixed-reality platform to render an interactive mirrored world in real time with three-dimensional (3D) buildings, user-generated content, and geo-tagged social media. Our user study has identified several use cases for these systems, including immersive social storytelling, experiencing culture, and crowd-sourced tourism. We next present Video Fields, a web-based interactive system to create, calibrate, and render dynamic videos overlaid on 3D scenes. Our system renders dynamic entities from multiple videos, using early and deferred texture sampling. Video Fields can be used for immersive surveillance in virtual environments. Furthermore, we present the VRSurus and ARCrypt projects to explore the applications of gesture recognition, haptic feedback, and visual cryptography for virtual and augmented reality. Finally, we present our work on Montage4D, a real-time system for seamlessly fusing multi-view video textures with dynamic meshes. We use geodesics on meshes with view-dependent rendering to mitigate spatial occlusion seams while maintaining temporal consistency. Our experiments show significant enhancement in rendering quality, especially for salient regions such as faces. We believe that Social Street View, Geollery, Video Fields, and Montage4D will greatly facilitate several applications such as virtual tourism, immersive telepresence, and remote education.

    Synthesizing and Editing Photo-realistic Visual Objects

    In this thesis we investigate novel methods of synthesizing new images of a deformable visual object using a collection of images of the object. We investigate both parametric and non-parametric methods, as well as a combination of the two, for the problem of image synthesis. Our main focus is complex visual objects, specifically deformable objects and objects with varying numbers of visible parts. We first introduce a sketch-driven image synthesis system, which allows the user to draw ellipses and outlines in order to sketch a rough shape of an animal as a constraint on the synthesized image. This system interactively provides feedback in the form of ellipse and contour suggestions for the user's partial sketch. The user's sketch guides the non-parametric synthesis algorithm, which blends patches from two exemplar images in a coarse-to-fine fashion to create a final image. We evaluate the method and synthesized images through two user studies. Compared with non-parametric blending of patches, a parametric model of appearance is more desirable because its appearance representation is shared between all images of the dataset. Hence, we propose Context-Conditioned Component Analysis (C-CCA), a probabilistic generative parametric model that describes images with a linear combination of basis functions. The basis functions are evaluated for each pixel using a context vector computed from the local shape information. We evaluate C-CCA qualitatively and quantitatively on inpainting, appearance transfer, and reconstruction tasks. Drawing samples from C-CCA generates novel, globally coherent images, which, unfortunately, lack high-frequency details due to dimensionality reduction and misalignment. We therefore develop a non-parametric model that enhances the samples of C-CCA with locally coherent, high-frequency details. The non-parametric model efficiently finds patches from the dataset that match the C-CCA sample and blends the patches together. We analyze the results of the combined method on datasets of horse and elephant images.

    Image synthesis based on a model of human vision

    Modern computer graphics systems are able to construct renderings of such high quality that viewers are deceived into regarding the images as coming from a photographic source. Large amounts of computing resources are expended in this rendering process, using complex mathematical models of lighting and shading. However, psychophysical experiments have revealed that viewers only regard certain informative regions within a presented image. Furthermore, it has been shown that these visually important regions contain low-level visual feature differences that attract the attention of the viewer. This thesis presents a new approach to image synthesis that exploits these experimental findings by modulating the spatial quality of image regions according to their visual importance. Efficiency gains are therefore reaped without sacrificing much of the perceived quality of the image. Two tasks must be undertaken to achieve this goal: first, the design of an appropriate region-based model of visual importance, and second, the modification of progressive rendering techniques to effect an importance-based rendering approach. A rule-based fuzzy logic model is presented that computes, using spatial feature differences, the relative visual importance of regions in an image. This model improves upon previous work by incorporating threshold effects induced by global feature difference distributions and by using texture concentration measures. A modified approach to progressive ray-tracing is also presented. This new approach uses the visual importance model to guide the progressive refinement of an image. In addition, this concept of visual importance has been incorporated into supersampling, texture mapping, and computer animation techniques. Experimental results are presented, illustrating the efficiency gains reaped from using this method of progressive rendering. This visual importance-based rendering approach is expected to have applications in the entertainment industry, where image fidelity may be sacrificed for efficiency purposes as long as the overall visual impression of the scene is maintained. Different aspects of the approach should find many other applications in image compression, image retrieval, progressive data transmission, and active robotic vision.

    Hierarchical processing, editing and rendering of acquired geometry

    Digital representations of real-world surfaces can now be obtained automatically using various acquisition devices such as 3D scanners and stereo camera systems. These new fast and accurate data sources increase 3D surface resolution by several orders of magnitude, bringing higher precision to applications that require digital surface models. All major computer graphics applications can benefit from this automatic modeling process, including computer-aided design, physical simulation, virtual reality, medical imaging, architecture, archaeological study, special effects, computer animation, and video games. Unfortunately, the richness of the geometry produced by these devices comes at the price of a large, possibly gigantic, amount of data, which requires new efficient data structures and algorithms that scale to objects of up to a billion samples. This thesis proposes time- and space-efficient solutions for modeling, geometry processing, interactive editing, and rendering of such complex 3D surfaces. The methodology followed in developing these new algorithms is built around 4 key elements: a systematic hierarchical approach, a local reduction of problem dimension, a sampling-reconstruction principle, and independence from the explicit enumeration of topological relations, also known as a point-based approach. In practice, this manuscript proposes several contributions, including: a new hybrid hierarchical space partitioning structure, the Volume-Surface Tree (VS-Tree), together with new simplification and reconstruction algorithms; a streaming system featuring new algorithms for interactive editing of large objects; a generic kernel for real-time geometry synthesis by refinement; and an appearance-preserving multiresolution structure for efficient rendering of large point-based surfaces. These structures, algorithms, and systems form a pipeline able to process objects coming from the acquisition pipeline, whether they are represented by point clouds or by meshes, possibly non-manifold. The resulting solutions have been applied successfully to data from the various application domains mentioned above.