6 research outputs found

    The Video Mesh: A Data Structure for Image-based Three-dimensional Video Editing

    Get PDF
    This paper introduces the video mesh, a data structure for representing video as 2.5D “paper cutouts.” The video mesh allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. The video mesh sparsely encodes optical flow as well as depth, and handles occlusion using local layering and alpha mattes. Motion is described by a sparse set of points tracked over time. Each point also stores a depth value. The video mesh is a triangulation over this point set and per-pixel information is obtained by interpolation. The user rotoscopes occluding contours and we introduce an algorithm to cut the video mesh along them. Object boundaries are refined with per-pixel alpha values. The video mesh is at its core a set of texture mapped triangles, we leverage graphics hardware to enable interactive editing and rendering of a variety of effects. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depth-of-field manipulation, and 2D to 3D video conversion

    A MATCH-MOVING METHOD COMBINING AI AND SFM ALGORITHMS IN HISTORICAL FILM FOOTAGE

    Get PDF
    Searching for suitable material for photogrammetry is a key part in the documentation of Cultural Heritage. Photogrammetry can be used to produce a metrically certified 3D model. Material contained in historical film footage archives is especially useful for documentation when the heritage has been lost. In this research an innovative match-moving method is proposed that aims to exploit Artificial Intelligence and SfM algorithms to identify the frames extracted from a film footage in which the lost monument appears and that are suitable to be processed with photogrammetry for its 3D reconstruction. First of all the identification and tracking of the heritage in the videos was performed training an object detection Neural Network. Then the frames detected were automatically extracted with the coordinates of the bounding boxes that contain the monument. The camera motions were identified by selecting only the shots taken from multiple points of view of the same scene and analysing the evolution of the bounding boxes position over time. A further check of the material was necessary to select only sequences and to eliminate single frames and images from different historic periods. After this process, only the correct frames were automatically selected and processed with photogrammetry and the quality of the obtained 3D model was assessed. The method experimented in this research represents a powerful tool in the field of Cultural Heritage because it makes the selection of suitable material for photogrammetry automatic. Moreover it offers important insights that could be extended to other sectors

    The Video Mesh: A Data Structure for Image-based Video Editing

    Get PDF
    This paper introduces the video mesh, a data structure for representing video as 2.5D "paper cutouts." The video mesh allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. The video mesh sparsely encodes optical flow as well as depth, and handles occlusion using local layering and alpha mattes. Motion is described by a sparse set of points tracked over time. Each point also stores a depth value. The video mesh is a triangulation over this point set and per-pixel information is obtained by interpolation. The user rotoscopes occluding contours and we introduce an algorithm to cut the video mesh along them. Object boundaries are refined with perpixel alpha values. The video mesh is at its core a set of texture mapped triangles, we leverage graphics hardware to enable interactive editing and rendering of a variety of effects. We demonstrate the effectiveness of our representation with a number of special effects including 3D viewpoint changes, object insertion, and depth-of-field manipulation

    Single View Modeling and View Synthesis

    Get PDF
    This thesis develops new algorithms to produce 3D content from a single camera. Today, amateurs can use hand-held camcorders to capture and display the 3D world in 2D, using mature technologies. However, there is always a strong desire to record and re-explore the 3D world in 3D. To achieve this goal, current approaches usually make use of a camera array, which suffers from tedious setup and calibration processes, as well as lack of portability, limiting its application to lab experiments. In this thesis, I try to produce the 3D contents using a single camera, making it as simple as shooting pictures. It requires a new front end capturing device rather than a regular camcorder, as well as more sophisticated algorithms. First, in order to capture the highly detailed object surfaces, I designed and developed a depth camera based on a novel technique called light fall-off stereo (LFS). The LFS depth camera outputs color+depth image sequences and achieves 30 fps, which is necessary for capturing dynamic scenes. Based on the output color+depth images, I developed a new approach that builds 3D models of dynamic and deformable objects. While the camera can only capture part of a whole object at any instance, partial surfaces are assembled together to form a complete 3D model by a novel warping algorithm. Inspired by the success of single view 3D modeling, I extended my exploration into 2D-3D video conversion that does not utilize a depth camera. I developed a semi-automatic system that converts monocular videos into stereoscopic videos, via view synthesis. It combines motion analysis with user interaction, aiming to transfer as much depth inferring work from the user to the computer. I developed two new methods that analyze the optical flow in order to provide additional qualitative depth constraints. The automatically extracted depth information is presented in the user interface to assist with user labeling work. In this thesis, I developed new algorithms to produce 3D contents from a single camera. Depending on the input data, my algorithm can build high fidelity 3D models for dynamic and deformable objects if depth maps are provided. Otherwise, it can turn the video clips into stereoscopic video

    Efficient data structures for piecewise-smooth video processing

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 95-102).A number of useful image and video processing techniques, ranging from low level operations such as denoising and detail enhancement to higher level methods such as object manipulation and special effects, rely on piecewise-smooth functions computed from the input data. In this thesis, we present two computationally efficient data structures for representing piecewise-smooth visual information and demonstrate how they can dramatically simplify and accelerate a variety of video processing algorithms. We start by introducing the bilateral grid, an image representation that explicitly accounts for intensity edges. By interpreting brightness values as Euclidean coordinates, the bilateral grid enables simple expressions for edge-aware filters. Smooth functions defined on the bilateral grid are piecewise-smooth in image space. Within this framework, we derive efficient reinterpretations of a number of edge-aware filters commonly used in computational photography as operations on the bilateral grid, including the bilateral filter, edgeaware scattered data interpolation, and local histogram equalization. We also show how these techniques can be easily parallelized onto modern graphics hardware for real-time processing of high definition video. The second data structure we introduce is the video mesh, designed as a flexible central data structure for general-purpose video editing. It represents objects in a video sequence as 2.5D "paper cutouts" and allows interactive editing of moving objects and modeling of depth, which enables 3D effects and post-exposure camera control. In our representation, we assume that motion and depth are piecewise-smooth, and encode them sparsely as a set of points tracked over time. The video mesh is a triangulation over this point set and per-pixel information is obtained by interpolation. To handle occlusions and detailed object boundaries, we rely on the user to rotoscope the scene at a sparse set of frames using spline curves. We introduce an algorithm to robustly and automatically cut the mesh into local layers with proper occlusion topology, and propagate the splines to the remaining frames. Object boundaries are refined with per-pixel alpha mattes. At its core, the video mesh is a collection of texture-mapped triangles, which we can edit and render interactively using graphics hardware. We demonstrate the effectiveness of our representation with special effects such as 3D viewpoint changes, object insertion, depthof- field manipulation, and 2D to 3D video conversion.by Jiawen Chen.Ph.D
    corecore