10 research outputs found

    Shadow Removal and Interpolation for Computer Vision and Graphics (コンピュータビジョン・グラフィックスのための影の消去と補間)

    University of Tokyo (東京大学)

    Computer Vision

    Die Virtuelle Videokamera: ein System zur Blickpunktsynthese in beliebigen, dynamischen Szenen (The Virtual Video Camera: A System for Viewpoint Synthesis in Arbitrary, Dynamic Scenes)

    The Virtual Video Camera project strives to create free-viewpoint video from casually captured multi-view data. Multiple video streams of a dynamic scene are captured with off-the-shelf camcorders, and the user can re-render the scene from novel perspectives. In this thesis the algorithmic core of the Virtual Video Camera is presented. This includes the algorithm for image correspondence estimation as well as the image-based renderer. Furthermore, its application in the context of an actual video production is showcased, and the rendering and image processing pipeline is extended to incorporate depth information.

    Image Based View Synthesis

    This dissertation deals with the image-based approach to synthesize a virtual scene using sparse images or a video sequence without the use of 3D models. In our scenario, a real dynamic or static scene is captured by a set of un-calibrated images from different viewpoints. After automatically recovering the geometric transformations between these images, a series of photo-realistic virtual views can be rendered and a virtual environment covered by these several static cameras can be synthesized. This image-based approach has applications in object recognition, object transfer, video synthesis and video compression. In this dissertation, I have contributed to several sub-problems related to image based view synthesis. Before image-based view synthesis can be performed, images need to be segmented into individual objects. Assuming that a scene can approximately be described by multiple planar regions, I have developed a robust and novel approach to automatically extract a set of affine or projective transformations induced by these regions, correctly detect the occlusion pixels over multiple consecutive frames, and accurately segment the scene into several motion layers. First, a number of seed regions using correspondences in two frames are determined, and the seed regions are expanded and outliers are rejected employing the graph cuts method integrated with level set representation. Next, these initial regions are merged into several initial layers according to the motion similarity. Third, the occlusion order constraints on multiple frames are explored, which guarantee that the occlusion area increases with the temporal order in a short period and effectively maintains segmentation consistency over multiple consecutive frames. Then the correct layer segmentation is obtained by using a graph cuts algorithm, and the occlusions between the overlapping layers are explicitly determined. 
Several experimental results show that our approach is effective and robust. Recovering the geometrical transformations among images of a scene is a prerequisite step for image-based view synthesis. I have developed a wide baseline matching algorithm to identify the correspondences between two un-calibrated images, and to further determine the geometric relationship between images, such as epipolar geometry or projective transformation. In our approach, a set of salient features, edge-corners, are detected to provide robust and consistent matching primitives. Then, based on the Singular Value Decomposition (SVD) of an affine matrix, we quantize the search space into two independent subspaces for rotation angle and scaling factor, and then we use a two-stage affine matching algorithm to obtain robust matches between these two frames. The experimental results on a number of wide baseline images demonstrate that our matching method outperforms the state-of-the-art algorithms even under significant camera motion, illumination variation, occlusion, and self-similarity. Given the wide baseline matches among images, I have developed a novel method for dynamic view morphing. Dynamic view morphing deals with scenes containing moving objects in the presence of camera motion. The objects can be rigid or non-rigid, and each of them can move in any orientation or direction. The proposed method can generate a series of continuous and physically accurate intermediate views from only two reference images without any knowledge of the 3D structure. The procedure consists of three steps: segmentation, morphing and post-warping. Given a boundary connection constraint, the source and target scenes are segmented into several layers for morphing. Based on the decomposition of the affine transformation between corresponding points, we uniquely determine a physically correct path for post-warping by the least distortion method.
I have successfully generalized the dynamic scene synthesis problem from the simple scene with only rotation to the dynamic scene containing non-rigid objects. My method can handle dynamic rigid or non-rigid objects, including complicated objects such as humans. Finally, I have also developed a novel algorithm for tri-view morphing. This is an efficient image-based method to navigate a scene based on only three wide-baseline un-calibrated images without the explicit use of a 3D model. After automatically recovering corresponding points between each pair of images using our wide baseline matching method, an accurate trifocal plane is extracted from the trifocal tensor implied in these three images. Next, employing a trinocular-stereo algorithm and barycentric blending technique, we generate an arbitrary novel view to navigate the scene in a 2D space. Furthermore, after self-calibration of the cameras, a 3D model can also be correctly augmented into this virtual environment synthesized by the tri-view morphing algorithm. We have applied our view morphing framework to several interesting applications: 4D video synthesis, automatic target recognition, multi-view morphing
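
The barycentric blending step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the three reference images have already been warped into pixel-wise alignment, and that the novel viewpoint is given as a 2D point inside a triangle whose corners stand for the three cameras. The function names and the 2D viewpoint parameterisation are hypothetical and are not taken from the dissertation.

```python
def barycentric_weights(p, a, b, c):
    """Return the barycentric coordinates of 2D point p in triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle: the three views are collinear")
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb


def blend_views(pixels_a, pixels_b, pixels_c, weights):
    """Blend three pre-aligned pixel lists (an assumption) with the given weights."""
    wa, wb, wc = weights
    return [wa * pa + wb * pb + wc * pc
            for pa, pb, pc in zip(pixels_a, pixels_b, pixels_c)]


# Hypothetical layout: the three cameras at the corners of a unit right triangle.
cam_a, cam_b, cam_c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
weights = barycentric_weights((1 / 3, 1 / 3), cam_a, cam_b, cam_c)
novel = blend_views([0.0, 30.0], [3.0, 60.0], [6.0, 90.0], weights)
```

At a triangle corner the weights reduce to (1, 0, 0), so the blend reproduces that reference view exactly; inside the triangle each view contributes in proportion to how close the virtual viewpoint is to it.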

    Advanced methods for relightable scene representations in image space

    The realistic reproduction of the visual appearance of real-world objects requires accurate computer graphics models that describe the optical interaction of a scene with its surroundings. Data-driven approaches that model the scene globally as a reflectance field function in eight parameters deliver high quality and work for most material combinations, but are costly to acquire and store. Image-space relighting, which restricts the application to rendering images from a fixed virtual camera under freely chosen illumination, requires only a 4D data structure to provide full fidelity. This thesis contributes to image-space relighting on four accounts: (1) We investigate the acquisition of 4D reflectance fields in the context of sampling theory, propose a practical setup for pre-filtering reflectance data during recording, and apply it in an adaptive sampling scheme. (2) We introduce a feature-driven image synthesis algorithm for the interpolation of coarsely sampled reflectance data in software to achieve highly realistic images. (3) We propose an implicit reflectance data representation, which uses a Bayesian approach to relight complex scenes from the example of much simpler reference objects. (4) Finally, we construct novel, passive devices out of optical components that render reflectance field data in real time, shaping the incident illumination into the desired image.
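
As a rough illustration of image-space relighting with a fixed camera, the sketch below relies on the linearity of light transport: an image under a new illumination is a weighted sum of images recorded under individual basis lights. It assumes the 4D structure is indexed by pixel position and basis light, which the abstract does not spell out. The function name and data layout are hypothetical and do not reproduce the thesis's acquisition or interpolation methods.

```python
def relight(basis_images, light_weights):
    """Combine per-light basis images into one relit image.

    basis_images: list of images, one per basis light; each image is a list
                  of rows of float intensities (hypothetical layout).
    light_weights: intensity of each basis light in the target illumination.
    """
    if len(basis_images) != len(light_weights):
        raise ValueError("need exactly one weight per basis image")
    height = len(basis_images[0])
    width = len(basis_images[0][0])
    relit = [[0.0] * width for _ in range(height)]
    for image, weight in zip(basis_images, light_weights):
        for y in range(height):
            for x in range(width):
                # Superposition: contributions of individual lights add up.
                relit[y][x] += weight * image[y][x]
    return relit


# Two 1x2 basis images, each lit by a single light.
lit_by_first = [[1.0, 0.0]]
lit_by_second = [[0.0, 2.0]]
result = relight([lit_by_first, lit_by_second], [0.5, 2.0])
```

Denser sampling of the light directions gives a more faithful result, which is why the acquisition and interpolation of coarsely sampled reflectance data matter in practice.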

    A system for image-based modeling and photo editing

    Thesis (Ph.D.) by Byong Mok Oh, Massachusetts Institute of Technology, Dept. of Architecture, 2002. Includes bibliographical references (p. 169-178). Traditionally in computer graphics, a scene is represented by geometric primitives composed of various materials and a collection of lights. Recently, techniques for modeling and rendering scenes from a set of pre-acquired images have emerged as an alternative approach, known as image-based modeling and rendering. Much of the research in this field has focused on reconstructing and rerendering from a set of photographs, while little work has been done to address the problem of editing and modifying these scenes. On the other hand, photo-editing systems, such as Adobe Photoshop, provide a powerful, intuitive, and practical means to edit images. However, these systems are limited by their two-dimensional nature. In this thesis, we present a system that extends photo editing to 3D. Starting from a single input image, the system enables the user to reconstruct a 3D representation of the captured scene, and edit it with the ease and versatility of 2D photo editing. The scene is represented as layers of images with depth, where each layer is an image that encodes both color and depth. A suite of user-assisted tools, based on a painting metaphor, is employed to extract layers and assign depths. The system enables editing from different viewpoints, extracting and grouping of image-based objects, and modifying the shape, color, and illumination of these objects. As part of the system, we introduce three powerful new editing tools. These include two new clone brushing tools: the non-distorted clone brush and the structure-preserving clone brush. They permit copying of parts of an image to another via a brush interface, but alleviate distortions due to perspective foreshortening and object geometry. The non-distorted clone brush works on arbitrary 3D geometry, while the structure-preserving clone brush, a 2D version, assumes a planar surface, but has the added advantage of working directly in 2D photo-editing systems that lack depth information. The third tool, a texture-illuminance decoupling filter, discounts the effect of illumination on uniformly textured areas by decoupling large- and small-scale features via bilateral filtering. This tool is crucial for relighting and changing the materials of the scene. There are many applications for such a system, for example architectural, lighting and landscape design, entertainment and special effects, games, and virtual TV sets. The system allows the user to superimpose scaled architectural models into real environments, or to quickly paint a desired lighting scheme of an interior, while being able to navigate within the scene for a fully immersive 3D experience. We present examples and results of complex architectural scenes, 360-degree panoramas, and even paintings, where the user can change viewpoints, edit the geometry and materials, and relight the environment.
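
The idea of separating large- and small-scale features with a bilateral filter can be sketched in one dimension. This is a simplified, assumed illustration and not the thesis's filter: it works on a 1D list of intensities, and the function names and the parameters `sigma_space`, `sigma_range` and `radius` are hypothetical. The smoothed signal is treated as the large-scale layer and the residual as the small-scale layer.

```python
import math


def bilateral_1d(signal, sigma_space=2.0, sigma_range=0.1, radius=4):
    """Edge-preserving smoothing of a 1D list of floats."""
    smoothed = []
    n = len(signal)
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: nearby samples count more.
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            # Range weight: samples with similar intensity count more,
            # so averaging does not leak across strong edges.
            diff = signal[i] - signal[j]
            w_range = math.exp(-(diff ** 2) / (2 * sigma_range ** 2))
            total += w_space * w_range * signal[j]
            weight_sum += w_space * w_range
        smoothed.append(total / weight_sum)
    return smoothed


def decouple(signal, **filter_args):
    """Split a signal into a large-scale layer and a small-scale residual."""
    large_scale = bilateral_1d(signal, **filter_args)
    small_scale = [s - l for s, l in zip(signal, large_scale)]
    return large_scale, small_scale


# A step edge with a fine ripple on top of it.
signal = [0.0 + 0.01 * (-1) ** k for k in range(8)]
signal += [1.0 + 0.01 * (-1) ** k for k in range(8)]
large, small = decouple(signal)
```

With these settings the ripple mostly ends up in `small`, while the step stays sharp in `large` because the range weight suppresses averaging across it.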

    Surface Appearance Estimation from Video Sequences

    The realistic virtual reproduction of real-world objects using Computer Graphics techniques requires the accurate acquisition and reconstruction of both 3D geometry and surface appearance. Unfortunately, in several application contexts, such as Cultural Heritage (CH), reflectance acquisition can be very challenging due to the type of object to be acquired and the digitization conditions. Although several methods have been proposed for the acquisition of object reflectance, some intrinsic limitations still make its acquisition a complex task for CH artworks: the use of specialized instruments (dome, special setup for camera and light source, etc.); the need for highly controlled acquisition environments, such as a dark room; the difficulty of extending to objects of arbitrary shape and size; and the high level of expertise required to assess the quality of the acquisition. The Ph.D. thesis proposes novel solutions for the acquisition and estimation of surface appearance in fixed and uncontrolled lighting conditions with several degrees of approximation (from a perceived near-diffuse color to an SVBRDF), taking advantage of the main features that differentiate a video sequence from an unordered photo collection: the temporal coherence; the data redundancy; and the ease of acquisition, which allows many views of the object to be captured in a short time. Finally, Reflectance Transformation Imaging (RTI) is an example of a widely used technology for the acquisition of surface appearance in the CH field, even if limited to single-view Reflectance Fields of nearly flat objects. In this context, the thesis also addresses two important issues in RTI usage: how to provide better and more flexible virtual inspection capabilities with a set of operators that improve the perception of details, features and overall shape of the artwork; and how to make this data easier to disseminate and to support remote visual inspection by both scholars and the general public.

    Multi-camera reconstruction and rendering for free-viewpoint video

    While virtual environments in interactive entertainment become more and more lifelike and sophisticated, traditional media like television and video have not yet embraced the new possibilities provided by rapidly advancing processing power. In particular, they remain as non-interactive as ever, and do not allow viewers to change the camera perspective to their liking. The goal of this work is to advance in this direction, and provide essential ingredients for a free-viewpoint video system, where the viewpoint can be chosen interactively during playback. Knowledge of scene geometry is required to synthesize novel views. Therefore, we describe 3D reconstruction methods for two distinct kinds of camera setups. The first is depth reconstruction for camera arrays with parallel optical axes; the second is surface reconstruction for cameras distributed around the scene. Another vital part of a 3D video system is the interactive rendering from different viewpoints, which has to perform in real time. We cover this topic in the last part of this thesis.
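
For cameras with parallel optical axes, depth is commonly recovered from the horizontal shift (disparity) of a point between two views. The sketch below shows only the standard rectified-stereo relation Z = f * B / d; it is not the reconstruction algorithm of the thesis, and the function and parameter names are hypothetical.

```python
import math


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point seen by two parallel, rectified cameras.

    disparity_px: horizontal pixel shift of the point between the two views.
    focal_px:     focal length expressed in pixels (assumed equal for both).
    baseline_m:   distance between the two camera centres in metres.
    """
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return math.inf
    return focal_px * baseline_m / disparity_px


# Example values chosen for illustration only.
near = depth_from_disparity(disparity_px=20.0, focal_px=1000.0, baseline_m=0.1)
far = depth_from_disparity(disparity_px=2.0, focal_px=1000.0, baseline_m=0.1)
```

Because depth is inversely proportional to disparity, a closely spaced array resolves nearby geometry well but loses depth resolution for distant points.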