295 research outputs found

    Photorealistic retrieval of occluded facial information using a performance-driven face model

    Get PDF
    Facial occlusions can cause both human observers and computer algorithms to fail in a variety of important tasks such as facial action analysis and expression classification. This is because the missing information is not reconstructed accurately enough for the purpose of the task in hand. Most current computer methods that are used to tackle this problem implement complex three-dimensional polygonal face models that are generally timeconsuming to produce and unsuitable for photorealistic reconstruction of missing facial features and behaviour. In this thesis, an image-based approach is adopted to solve the occlusion problem. A dynamic computer model of the face is used to retrieve the occluded facial information from the driver faces. The model consists of a set of orthogonal basis actions obtained by application of principal component analysis (PCA) on image changes and motion fields extracted from a sequence of natural facial motion (Cowe 2003). Examples of occlusion affected facial behaviour can then be projected onto the model to compute coefficients of the basis actions and thus produce photorealistic performance-driven animations. Visual inspection shows that the PCA face model recovers aspects of expressions in those areas occluded in the driver sequence, but the expression is generally muted. To further investigate this finding, a database of test sequences affected by a considerable set of artificial and natural occlusions is created. A number of suitable metrics is developed to measure the accuracy of the reconstructions. Regions of the face that are most important for performance-driven mimicry and that seem to carry the best information about global facial configurations are revealed using Bubbles, thus in effect identifying facial areas that are most sensitive to occlusions. Recovery of occluded facial information is enhanced by applying an appropriate scaling factor to the respective coefficients of the basis actions obtained by PCA. This method improves the reconstruction of the facial actions emanating from the occluded areas of the face. However, due to the fact that PCA produces bases that encode composite, correlated actions, such an enhancement also tends to affect actions in non-occluded areas of the face. To avoid this, more localised controls for facial actions are produced using independent component analysis (ICA). Simple projection of the data onto an ICA model is not viable due to the non-orthogonality of the extracted bases. Thus occlusion-affected mimicry is first generated using the PCA model and then enhanced by accordingly manipulating the independent components that are subsequently extracted from the mimicry. This combination of methods yields significant improvements and results in photorealistic reconstructions of occluded facial actions

    Image-based rendering and synthesis

    Get PDF
    Multiview imaging (MVI) is currently the focus of some research as it has a wide range of applications and opens up research in other topics and applications, including virtual view synthesis for three-dimensional (3D) television (3DTV) and entertainment. However, a large amount of storage is needed by multiview systems and are difficult to construct. The concept behind allowing 3D scenes and objects to be visualized in a realistic way without full 3D model reconstruction is image-based rendering (IBR). Using images as the primary substrate, IBR has many potential applications including for video games, virtual travel and others. The technique creates new views of scenes which are reconstructed from a collection of densely sampled images or videos. The IBR concept has different classification such as knowing 3D models and the lighting conditions and be rendered using conventional graphic techniques. Another is lightfield or lumigraph rendering which depends on dense sampling with no or very little geometry for rendering without recovering the exact 3D-models.published_or_final_versio

    Analyzing interfaces and workflows for light field editing

    Get PDF
    With the increasing number of available consumer light field cameras, such as Lytro, Raytrix, or Pelican Imaging, this new form of photography is progressively becoming more common. However, there are still very few tools for light field editing, and the interfaces to create those edits remain largely unexplored. Given the extended dimensionality of light field data, it is not clear what the most intuitive interfaces and optimal workflows are, in contrast with well-studied two-dimensional (2-D) image manipulation software. In this work, we provide a detailed description of subjects' performance and preferences for a number of simple editing tasks, which form the basis for more complex operations. We perform a detailed state sequence analysis and hidden Markov chain analysis based on the sequence of tools and interaction paradigms users employ while editing light fields. These insights can aid researchers and designers in creating new light field editing tools and interfaces, thus helping to close the gap between 4-D and 2-D image editing

    Depth-Assisted Semantic Segmentation, Image Enhancement and Parametric Modeling

    Get PDF
    This dissertation addresses the problem of employing 3D depth information on solving a number of traditional challenging computer vision/graphics problems. Humans have the abilities of perceiving the depth information in 3D world, which enable humans to reconstruct layouts, recognize objects and understand the geometric space and semantic meanings of the visual world. Therefore it is significant to explore how the 3D depth information can be utilized by computer vision systems to mimic such abilities of humans. This dissertation aims at employing 3D depth information to solve vision/graphics problems in the following aspects: scene understanding, image enhancements and 3D reconstruction and modeling. In addressing scene understanding problem, we present a framework for semantic segmentation and object recognition on urban video sequence only using dense depth maps recovered from the video. Five view-independent 3D features that vary with object class are extracted from dense depth maps and used for segmenting and recognizing different object classes in street scene images. We demonstrate a scene parsing algorithm that uses only dense 3D depth information to outperform using sparse 3D or 2D appearance features. In addressing image enhancement problem, we present a framework to overcome the imperfections of personal photographs of tourist sites using the rich information provided by large-scale internet photo collections (IPCs). By augmenting personal 2D images with 3D information reconstructed from IPCs, we address a number of traditionally challenging image enhancement techniques and achieve high-quality results using simple and robust algorithms. In addressing 3D reconstruction and modeling problem, we focus on parametric modeling of flower petals, the most distinctive part of a plant. The complex structure, severe occlusions and wide variations make the reconstruction of their 3D models a challenging task. We overcome these challenges by combining data driven modeling techniques with domain knowledge from botany. Taking a 3D point cloud of an input flower scanned from a single view, each segmented petal is fitted with a scale-invariant morphable petal shape model, which is constructed from individually scanned 3D exemplar petals. Novel constraints based on botany studies are incorporated into the fitting process for realistically reconstructing occluded regions and maintaining correct 3D spatial relations. The main contribution of the dissertation is in the intelligent usage of 3D depth information on solving traditional challenging vision/graphics problems. By developing some advanced algorithms either automatically or with minimum user interaction, the goal of this dissertation is to demonstrate that computed 3D depth behind the multiple images contains rich information of the visual world and therefore can be intelligently utilized to recognize/ understand semantic meanings of scenes, efficiently enhance and augment single 2D images, and reconstruct high-quality 3D models

    3D Human Face Reconstruction and 2D Appearance Synthesis

    Get PDF
    3D human face reconstruction has been an extensive research for decades due to its wide applications, such as animation, recognition and 3D-driven appearance synthesis. Although commodity depth sensors are widely available in recent years, image based face reconstruction are significantly valuable as images are much easier to access and store. In this dissertation, we first propose three image-based face reconstruction approaches according to different assumption of inputs. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses. The camera should be calibrated under this assumption. As the first approach is limited to videos, we propose the second approach then focus on single image. This approach also improves the geometry by adding fine grains using shading cue. We proposed a novel albedo estimation and linear optimization algorithm in this approach. In the third approach, we further loose the constraint of the input image to arbitrary in the wild images. Our proposed approach can robustly reconstruct high quality model even with extreme expressions and large poses. We then explore the applicability of our face reconstructions on four interesting applications: video face beautification, generating personalized facial blendshape from image sequences, face video stylizing and video face replacement. We demonstrate great potentials of our reconstruction approaches on these real-world applications. In particular, with the recent surge of interests in VR/AR, it is increasingly common to see people wearing head-mounted displays. However, the large occlusion on face is a big obstacle for people to communicate in a face-to-face manner. Our another application is that we explore hardware/software solutions for synthesizing the face image with presence of HMDs. We design two setups (experimental and mobile) which integrate two near IR cameras and one color camera to solve this problem. With our algorithm and prototype, we can achieve photo-realistic results. We further propose a deep neutral network to solve the HMD removal problem considering it as a face inpainting problem. This approach doesn\u27t need special hardware and run in real-time with satisfying results

    Realistic reconstruction and rendering of detailed 3D scenarios from multiple data sources

    Get PDF
    During the last years, we have witnessed significant improvements in digital terrain modeling, mainly through photogrammetric techniques based on satellite and aerial photography, as well as laser scanning. These techniques allow the creation of Digital Elevation Models (DEM) and Digital Surface Models (DSM) that can be streamed over the network and explored through virtual globe applications like Google Earth or NASA WorldWind. The resolution of these 3D scenes has improved noticeably in the last years, reaching in some urban areas resolutions up to 1m or less for DEM and buildings, and less than 10 cm per pixel in the associated aerial imagery. However, in rural, forest or mountainous areas, the typical resolution for elevation datasets ranges between 5 and 30 meters, and typical resolution of corresponding aerial photographs ranges between 25 cm to 1 m. This current level of detail is only sufficient for aerial points of view, but as the viewpoint approaches the surface the terrain loses its realistic appearance. One approach to augment the detail on top of currently available datasets is adding synthetic details in a plausible manner, i.e. including elements that match the features perceived in the aerial view. By combining the real dataset with the instancing of models on the terrain and other procedural detail techniques, the effective resolution can potentially become arbitrary. There are several applications that do not need an exact reproduction of the real elements but would greatly benefit from plausibly enhanced terrain models: videogames and entertainment applications, visual impact assessment (e.g. how a new ski resort would look), virtual tourism, simulations, etc. In this thesis we propose new methods and tools to help the reconstruction and synthesis of high-resolution terrain scenes from currently available data sources, in order to achieve realistically looking ground-level views. In particular, we decided to focus on rural scenarios, mountains and forest areas. Our main goal is the combination of plausible synthetic elements and procedural detail with publicly available real data to create detailed 3D scenes from existing locations. Our research has focused on the following contributions: - An efficient pipeline for aerial imagery segmentation - Plausible terrain enhancement from high-resolution examples - Super-resolution of DEM by transferring details from the aerial photograph - Synthesis of arbitrary tree picture variations from a reduced set of photographs - Reconstruction of 3D tree models from a single image - A compact and efficient tree representation for real-time rendering of forest landscapesDurant els darrers anys, hem presenciat avenços significatius en el modelat digital de terrenys, principalment gràcies a tècniques fotogramètriques, basades en fotografia aèria o satèl·lit, i a escàners làser. Aquestes tècniques permeten crear Models Digitals d'Elevacions (DEM) i Models Digitals de Superfícies (DSM) que es poden retransmetre per la xarxa i ser explorats mitjançant aplicacions de globus virtuals com ara Google Earth o NASA WorldWind. La resolució d'aquestes escenes 3D ha millorat considerablement durant els darrers anys, arribant a algunes àrees urbanes a resolucions d'un metre o menys per al DEM i edificis, i fins a menys de 10 cm per píxel a les fotografies aèries associades. No obstant, en entorns rurals, boscos i zones muntanyoses, la resolució típica per a dades d'elevació es troba entre 5 i 30 metres, i per a les corresponents fotografies aèries varia entre 25 cm i 1m. Aquest nivell de detall només és suficient per a punts de vista aeris, però a mesura que ens apropem a la superfície el terreny perd tot el realisme. Una manera d'augmentar el detall dels conjunts de dades actuals és afegint a l'escena detalls sintètics de manera plausible, és a dir, incloure elements que encaixin amb les característiques que es perceben a la vista aèria. Així, combinant les dades reals amb instàncies de models sobre el terreny i altres tècniques de detall procedural, la resolució efectiva del model pot arribar a ser arbitrària. Hi ha diverses aplicacions per a les quals no cal una reproducció exacta dels elements reals, però que es beneficiarien de models de terreny augmentats de manera plausible: videojocs i aplicacions d'entreteniment, avaluació de l'impacte visual (per exemple, com es veuria una nova estació d'esquí), turisme virtual, simulacions, etc. En aquesta tesi, proposem nous mètodes i eines per ajudar a la reconstrucció i síntesi de terrenys en alta resolució partint de conjunts de dades disponibles públicament, per tal d'aconseguir vistes a nivell de terra realistes. En particular, hem decidit centrar-nos en escenes rurals, muntanyes i àrees boscoses. El nostre principal objectiu és la combinació d'elements sintètics plausibles i detall procedural amb dades reals disponibles públicament per tal de generar escenes 3D d'ubicacions existents. La nostra recerca s'ha centrat en les següents contribucions: - Un pipeline eficient per a segmentació d'imatges aèries - Millora plausible de models de terreny a partir d'exemples d’alta resolució - Super-resolució de models d'elevacions transferint-hi detalls de la fotografia aèria - Síntesis d'un nombre arbitrari de variacions d’imatges d’arbres a partir d'un conjunt reduït de fotografies - Reconstrucció de models 3D d'arbres a partir d'una única fotografia - Una representació compacta i eficient d'arbres per a navegació en temps real d'escenesPostprint (published version

    Efficient data structures for piecewise-smooth video processing

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 95-102).A number of useful image and video processing techniques, ranging from low level operations such as denoising and detail enhancement to higher level methods such as object manipulation and special effects, rely on piecewise-smooth functions computed from the input data. In this thesis, we present two computationally efficient data structures for representing piecewise-smooth visual information and demonstrate how they can dramatically simplify and accelerate a variety of video processing algorithms. We start by introducing the bilateral grid, an image representation that explicitly accounts for intensity edges. By interpreting brightness values as Euclidean coordinates, the bilateral grid enables simple expressions for edge-aware filters. Smooth functions defined on the bilateral grid are piecewise-smooth in image space. Within this framework, we derive efficient reinterpretations of a number of edge-aware filters commonly used in computational photography as operations on the bilateral grid, including the bilateral filter, edgeaware scattered data interpolation, and local histogram equalization. We also show how these techniques can be easily parallelized onto modern graphics hardware for real-time processing of high definition video. The second data structure we introduce is the video mesh, designed as a flexible central data structure for general-purpose video editing. It represents objects in a video sequence as 2.5D "paper cutouts" and allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. In our representation, we assume that motion and depth are piecewise-smooth, and encode them sparsely as a set of points tracked over time. The video mesh is a triangulation over this point set and per-pixel information is obtained by interpolation. To handle occlusions and detailed object boundaries, we rely on the user to rotoscope the scene at a sparse set of frames using spline curves. We introduce an algorithm to robustly and automatically cut the mesh into local layers with proper occlusion topology, and propagate the splines to the remaining frames. Object boundaries are refined with per-pixel alpha mattes. At its core, the video mesh is a collection of texture-mapped triangles, which we can edit and render interactively using graphics hardware. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depthof- field manipulation, and 2D to 3D video conversion.by Jiawen Chen.Ph.D

    A robust patch-based synthesis framework for combining inconsistent images

    Get PDF
    Current methods for combining different images produce visible artifacts when the sources have very different textures and structures, come from far view points, or capture dynamic scenes with motions. In this thesis, we propose a patch-based synthesis algorithm to plausibly combine different images that have color, texture, structural, and geometric inconsistencies. For some applications such as cloning and stitching where a gradual blend is required, we present a new method for synthesizing a transition region between two source images, such that inconsistent properties change gradually from one source to the other. We call this process image melding. For gradual blending, we generalized patch-based optimization foundation with three key generalizations: First, we enrich the patch search space with additional geometric and photometric transformations. Second, we integrate image gradients into the patch representation and replace the usual color averaging with a screened Poisson equation solver. Third, we propose a new energy based on mixed L2/L0 norms for colors and gradients that produces a gradual transition between sources without sacrificing texture sharpness. Together, all three generalizations enable patch-based solutions to a broad class of image melding problems involving inconsistent sources: object cloning, stitching challenging panoramas, hole filling from multiple photos, and image harmonization. We also demonstrate another application which requires us to address inconsistencies across the images: high dynamic range (HDR) reconstruction using sequential exposures. In this application, the results will suffer from objectionable artifacts for dynamic scenes if the inconsistencies caused by significant scene motions are not handled properly. In this thesis, we propose a new approach to HDR reconstruction that uses information in all exposures while being more robust to motion than previous techniques. Our algorithm is based on a novel patch-based energy-minimization formulation that integrates alignment and reconstruction in a joint optimization through an equation we call the HDR image synthesis equation. This allows us to produce an HDR result that is aligned to one of the exposures yet contains information from all of them. These two applications (image melding and high dynamic range reconstruction) show that patch based methods like the one proposed in this dissertation can address inconsistent images and could open the door to many new image editing applications in the future

    Algorithmen zur Korrespondenzschätzung und Bildinterpolation für die photorealistische Bildsynthese

    Get PDF
    Free-viewpoint video is a new form of visual medium that has received considerable attention in the last 10 years. Most systems reconstruct the geometry of the scene, thus restricting themselves to synchronized multi-view footage and Lambertian scenes. In this thesis we follow a different approach and describe contributions to a purely image-based end-to-end system operating on sparse, unsynchronized multi-view footage. In particular, we focus on dense correspondence estimation and synthesis of in-between views. In contrast to previous approaches, our correspondence estimation is specifically tailored to the needs of image interpolation; our multi-image interpolation technique advances the state-of-the-art by disposing the conventional blending step. Both algorithms are put to work in an image-based free-viewpoint video system and we demonstrate their applicability to space-time visual effects production as well as to stereoscopic content creation.3D-Video mit Blickpunktnavigation ist eine neues digitales Medium welchem die Forschung in den letzten 10 Jahren viel Aufmerksamkeit gewidmet hat. Die meisten Verfahren rekonstruieren dabei die Szenengeometrie und schränken sich somit auf Lambertsche Szenen und synchron aufgenommene Eingabedaten ein. In dieser Dissertation beschreiben wir Beiträge zu einem rein bild-basierten System welches auf unsynchronisierten Eingabevideos arbeitet. Unser Fokus liegt dabei auf der Schätzung dichter Korrespondenzkarten und auf der Synthese von Zwischenbildern. Im Gegensatz zu bisherigen Verfahren ist unser Ansatz der Korrespondenzschätzung auf die Bedürfnisse der Bilderinterpolation ausgerichtet; unsere Zwischenbildsynthese verzichtet auf das Überblenden der Eingabebilder zu Gunsten der Lösung eines Labelingproblems. Das resultierende System eignet sich sowohl zur Produktion räumlich-zeitlicher Spezialeffekte als auch zur Erzeugung stereoskopischer Videosequenzen
    • …
    corecore