212 research outputs found

    Holistic Video Stitching for Street Panorama

    In this paper, we address how to automatically generate a panorama for a street view from a long video sequence. We model the panorama as a low-rank matrix and formulate the problem as one of robust recovery of the low-rank matrix from highly incomplete, corrupted, deformed measurements (the video frames). We leverage powerful high-dimensional convex optimization tools from compressive sensing of sparse signals and low-rank matrices to solve this problem. In particular, we show how the new method can effectively remove severe occlusions or corruptions (caused by trees, cars, or reflections, etc.), and obtain clean, intrinsic street panoramas that are consistent with all frames. We also show how our method can automatically and robustly establish pixel-wise accurate registration among all the video frames. We demonstrate the effectiveness of our method by conducting extensive experimental comparison with other popular video stitching methods such as AutoStitch and Adobe Photoshop.
    Coordinated Science Laboratory was formerly known as Control Systems Laboratory.
    National Science Foundation / NSF IIS 11-1601
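
The abstract above describes recovering a low-rank matrix from corrupted measurements. As a rough illustration only, the sketch below splits a matrix into a low-rank part plus a sparse part with principal component pursuit, solved by a basic ADMM loop. It is not the paper's method: it assumes fully observed, already aligned data and ignores the incomplete and deformed measurements the paper handles. The function names and the choices of `lam` and `mu` are assumptions taken from common practice for this kind of solver.

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_shrink(x, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    u, sv, vt = np.linalg.svd(x, full_matrices=False)
    return (u * shrink(sv, tau)) @ vt

def robust_pca(d, n_iter=1000, tol=1e-7):
    """Split d into a low-rank part l and a sparse part s with d ~ l + s."""
    m, n = d.shape
    lam = 1.0 / np.sqrt(max(m, n))       # weight on the sparse term (assumed default)
    mu = 0.25 * m * n / np.abs(d).sum()  # ADMM penalty parameter (assumed default)
    l = np.zeros_like(d)
    s = np.zeros_like(d)
    y = np.zeros_like(d)                 # Lagrange multiplier
    for _ in range(n_iter):
        l = svd_shrink(d - s + y / mu, 1.0 / mu)
        s = shrink(d - l + y / mu, lam / mu)
        resid = d - l - s
        y = y + mu * resid
        if np.linalg.norm(resid) <= tol * np.linalg.norm(d):
            break
    return l, s
```

In a stitching setting, each column of `d` could hold one registered frame, `l` would play the role of the clean panorama content, and `s` would absorb occluders such as trees or cars; that mapping is an interpretation of the abstract, not a detail it states.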

    Structure-from-motion in Spherical Video using the von Mises-Fisher Distribution

    In this paper, we present a complete pipeline for computing structure-from-motion from sequences of spherical images. We revisit problems from multiview geometry in the context of spherical images. In particular, we propose methods suited to spherical camera geometry for the spherical-n-point problem (estimating camera pose for a spherical image) and calibrated spherical reconstruction (estimating the position of a 3-D point from multiple spherical images). We introduce a new probabilistic interpretation of spherical structure-from-motion which uses the von Mises-Fisher distribution to model noise in spherical feature point positions. This model provides an alternate objective function that we use in bundle adjustment. We evaluate our methods quantitatively and qualitatively on both synthetic and real-world data and show that our methods developed for spherical images outperform straightforward adaptations of methods developed for perspective images. As an application of our method, we use the structure-from-motion output to stabilise the viewing direction in fully spherical video.
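
The abstract above models noise in spherical feature positions with the von Mises-Fisher distribution. A minimal sketch of that density on the unit sphere in 3-D, and of a negative log-likelihood that could serve as a residual cost, is given below. The function names, the single shared concentration `kappa`, and the way the cost is summed are assumptions for illustration; the paper's actual objective may differ.

```python
import numpy as np

def vmf_log_pdf(x, mu, kappa):
    """Log-density of the von Mises-Fisher distribution on the unit sphere S^2.

    x: unit vectors, shape (..., 3); mu: unit mean direction(s), shape (3,)
    or broadcastable to x; kappa: concentration, must be > 0.
    The normaliser is kappa / (4 * pi * sinh(kappa)), written in a form
    that stays finite for large kappa.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_c = (np.log(kappa) - np.log(2.0 * np.pi) - kappa
             - np.log1p(-np.exp(-2.0 * kappa)))
    return log_c + kappa * np.sum(x * mu, axis=-1)

def vmf_nll(observed, predicted, kappa):
    """Negative log-likelihood of observed bearings given predicted bearings."""
    return -np.sum(vmf_log_pdf(observed, predicted, kappa))
```

Because the data term is `kappa` times the dot product of the observed and predicted bearings, minimising `vmf_nll` rewards small angular error directly on the sphere, with no projection onto an image plane.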

    Applying image processing techniques to pose estimation and view synthesis.

    Fung Yiu-fai Phineas.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 1999.
    Includes bibliographical references (leaves 142-148).
    Abstracts in English and Chinese.

    Chapter 1. Introduction
        1.1 Model-based Pose Estimation
            1.1.1 Application - 3D Motion Tracking
        1.2 Image-based View Synthesis
        1.3 Thesis Contribution
        1.4 Thesis Outline
    Chapter 2. General Background
        2.1 Notations
        2.2 Camera Models
            2.2.1 Generic Camera Model
            2.2.2 Full-perspective Camera Model
            2.2.3 Affine Camera Model
            2.2.4 Weak-perspective Camera Model
            2.2.5 Paraperspective Camera Model
        2.3 Model-based Motion Analysis
            2.3.1 Point Correspondences
            2.3.2 Line Correspondences
            2.3.3 Angle Correspondences
        2.4 Panoramic Representation
            2.4.1 Static Mosaic
            2.4.2 Dynamic Mosaic
            2.4.3 Temporal Pyramid
            2.4.4 Spatial Pyramid
        2.5 Image Pre-processing
            2.5.1 Feature Extraction
            2.5.2 Spatial Filtering
            2.5.3 Local Enhancement
            2.5.4 Dynamic Range Stretching or Compression
            2.5.5 YIQ Color Model
    Chapter 3. Model-based Pose Estimation
        3.1 Previous Work
            3.1.1 Estimation from Established Correspondences
            3.1.2 Direct Estimation from Image Intensities
            3.1.3 Perspective-3-Point Problem
        3.2 Our Iterative P3P Algorithm
            3.2.1 Gauss-Newton Method
            3.2.2 Dealing with Ambiguity
            3.2.3 3D-to-3D Motion Estimation
        3.3 Experimental Results
            3.3.1 Synthetic Data
            3.3.2 Real Images
        3.4 Discussions
    Chapter 4. Panoramic View Analysis
        4.1 Advanced Mosaic Representation
            4.1.1 Frame Alignment Policy
            4.1.2 Multi-resolution Representation
            4.1.3 Parallax-based Representation
            4.1.4 Multiple Moving Objects
            4.1.5 Layers and Tiles
        4.2 Panorama Construction
            4.2.1 Image Acquisition
            4.2.2 Image Alignment
            4.2.3 Image Integration
            4.2.4 Significant Residual Estimation
        4.3 Advanced Alignment Algorithms
            4.3.1 Patch-based Alignment
            4.3.2 Global Alignment (Block Adjustment)
            4.3.3 Local Alignment (Deghosting)
        4.4 Mosaic Application
            4.4.1 Visualization Tool
            4.4.2 Video Manipulation
        4.5 Experimental Results
    Chapter 5. Panoramic Walkthrough
        5.1 Problem Statement and Notations
        5.2 Previous Work
            5.2.1 3D Modeling and Rendering
            5.2.2 Branching Movies
            5.2.3 Texture Window Scaling
            5.2.4 Problems with Simple Texture Window Scaling
        5.3 Our Walkthrough Approach
            5.3.1 Cylindrical Projection onto Image Plane
            5.3.2 Generating Intermediate Frames
            5.3.3 Occlusion Handling
        5.4 Experimental Results
        5.5 Discussions
    Chapter 6. Conclusion
    Appendix A. Formulation of Fischler and Bolles' Method for P3P Problems
    Appendix B. Derivation of z1 and z3 in terms of z2
    Appendix C. Derivation of e1 and e2
    Appendix D. Derivation of the Update Rule for Gauss-Newton Method
    Appendix E. Proof of (λ1λ2-λ4) > 0
    Appendix F. Derivation of φ and hi
    Appendix G. Derivation of w1j to w4j
    Appendix H. More Experimental Results on Panoramic Stitching Algorithms
    Bibliography

    Reflected Object Removal in 360 Panoramic Images

    Department of Electrical Engineering
    When a reflected scene is captured in a 360 panoramic image, the actual scene that causes the reflection is also captured in the same image. Based on this observation, we propose an algorithm that distinguishes the reflection and transmission layers in the glass image using the actual scene of the reflection, and removes the reflected objects from the panoramic image. We first separate the glass and background images in a user-assisted manner and then extract feature points to warp the background image. However, it is challenging to match glass and background keypoints, since the two images differ in characteristics such as color and transparency. Therefore, we extract initial pairs of matched points using the edge pixels and use the Dense-SIFT descriptor to match the corresponding points. We then transform the background image through APAP to generate a reference image, which we use to discriminate between the reflection and transmission edges in the glass image. We then determine the reflection and transmission edges based on the gradient angle and magnitude. Finally, based on the two separated sets of edges, we solve the layer separation problem by minimizing the gradients of the transmission image under the gradient sparsity prior.

    Doctor of Philosophy

    Dissertation
    Interactive editing and manipulation of digital media is a fundamental component in digital content creation. One medium in particular, digital imagery, has seen a recent increase in the popularity of its large or even massive image formats. Unfortunately, current systems and techniques are rarely concerned with scalability or usability with these large images. Moreover, processing massive (or even large) imagery is assumed to be an off-line, automatic process, although many problems associated with these datasets require human intervention for high-quality results. This dissertation details how to design interactive image techniques that scale. In particular, massive imagery is typically constructed as a seamless mosaic of many smaller images. The focus of this work is the creation of new technologies to enable user interaction in the formation of these large mosaics. While an interactive system for all stages of the mosaic creation pipeline is a long-term research goal, this dissertation concentrates on the last phase of the mosaic creation pipeline: the composition of registered images into a seamless composite. The work detailed in this dissertation provides the technologies to fully realize interactive editing in mosaic composition on image collections ranging from the very small to massive in scale.

    Low-rank Based Algorithms for Rectification, Repetition Detection and De-noising in Urban Images

    In this thesis, we aim to solve the problems of automatic image rectification and repeated pattern detection in 2D urban images, using novel low-rank based techniques. Repeated patterns (such as windows, tiles, balconies and doors) are prominent and significant features in urban scenes. Detection of these periodic structures is useful in many applications such as photorealistic 3D reconstruction, 2D-to-3D alignment, facade parsing, city modeling, classification, navigation, visualization in 3D map environments, shape completion, cinematography and 3D games. However, both the image rectification and the repeated pattern detection problems are challenging due to scene occlusions, varying illumination, pose variation and sensor noise. Detection of these repeated patterns is therefore very important for city scene analysis. Given a 2D image of an urban scene, we first automatically rectify a facade image and extract facade textures. Based on the rectified facade texture, we exploit novel algorithms that extract repeated patterns using Kronecker product based modeling, which rests on a solid theoretical foundation. We have tested our algorithms on a large set of images, which includes building facades from Paris, Hong Kong and New York.
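
The abstract above mentions Kronecker product based modeling of repeated patterns. The toy sketch below shows why such a model fits a rectified facade: a grid of identical tiles is the Kronecker product of a small layout matrix and the tile, so its rank is bounded by the rank of the tile. This is an illustration under strong assumptions (perfect rectification, an all-ones layout, known grid size), and `synth_facade` and `estimate_tile` are hypothetical helper names, not the thesis's algorithm.

```python
import numpy as np

def synth_facade(layout, tile):
    """Build a facade texture as kron(layout, tile): one scaled tile per layout cell."""
    return np.kron(layout, tile)

def estimate_tile(facade, rows, cols):
    """Estimate the repeated tile by averaging all grid cells.

    Assumes the facade is an exact rows x cols grid of one tile with an
    all-ones layout; averaging the cells then suppresses independent noise.
    """
    th, tw = facade.shape[0] // rows, facade.shape[1] // cols
    # Index [i, a, j, b] of the reshaped array is facade[i*th + a, j*tw + b].
    return facade.reshape(rows, th, cols, tw).mean(axis=(0, 2))
```

Since the rank of a Kronecker product is the product of the ranks of its factors, a rank-1 layout keeps the facade's rank equal to the tile's rank however many repetitions there are, which is the link to the low-rank techniques the abstract refers to.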

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing visual representations of locations to be used in VEs is usually a tedious process that requires either manual modelling of environments or the employment of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed through specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between, as they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context.
    These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first experiment, on telecommunication, compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more difficult and often expensive solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.