24 research outputs found

    Multiperspective mosaics and layered representation for scene visualization

    This thesis documents the implementation of multiperspective mosaicking for undervehicle and roadside video sequences. For the undervehicle sequences, the goal is to create a large, high-resolution mosaic that may be used to quickly inspect the entire scene shot by a camera making a single pass underneath the vehicle. Several constraints are placed on the video data in order to support the assumption that the entire scene in the sequence lies on a single plane. A single mosaic is therefore used to represent a single video sequence. Phase correlation is used to perform motion analysis in this case. For roadside video sequences, it is assumed that the scene is composed of several planar layers rather than a single plane. Layer extraction techniques are implemented to perform this decomposition. Instead of phase correlation, the Lucas-Kanade motion tracking algorithm is used to create dense motion maps. Using these motion maps, spatial support for each layer is determined based on a pre-initialized layer model. By separating the pixels in the scene into motion-specific layers, it is possible to sample each element in the scene correctly while performing multiperspective mosaicking. It is also possible to fill in many gaps in the mosaics caused by occlusions, creating more complete representations of the objects of interest. The result is a set of mosaics, each representing a single planar layer of the scene.
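As an illustration of the phase correlation step mentioned in the abstract above, here is a minimal sketch in Python with NumPy. It is not the thesis's implementation: the function name `phase_correlation`, the synthetic random frame, and the restriction to integer, circular shifts are all assumptions made for the example.

```python
import numpy as np

def phase_correlation(ref, moved):
    """Estimate the integer (dy, dx) circular shift that maps `ref` onto `moved`."""
    f_ref = np.fft.fft2(ref)
    f_mov = np.fft.fft2(moved)
    # Normalised cross-power spectrum: keep only the phase difference.
    cross = f_mov * np.conj(f_ref)
    cross /= np.maximum(np.abs(cross), 1e-12)
    # The inverse transform peaks at the translation between the frames.
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(int(p - n) if p > n // 2 else int(p)
                 for p, n in zip(peak, corr.shape))

# Hypothetical test data: a random frame and a circularly shifted copy.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
shifted = np.roll(frame, shift=(5, -3), axis=(0, 1))
estimated_shift = phase_correlation(frame, shifted)
```

With real video frames the shift is not circular, so a window function and sub-pixel peak interpolation would normally be added; both are omitted here.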

    MegaParallax: Casual 360° Panoramas with Motion Parallax

    Dynamic 3D Urban Scene Modeling Using Multiple Pushbroom Mosaics

    In this paper, a unified, segmentation-based approach is proposed to deal with both stereo reconstruction and moving object detection using multiple stereo mosaics. Each set of parallel-perspective (pushbroom) stereo mosaics is generated from a video sequence captured by a single video camera. First, a color-segmentation approach is used to extract the so-called natural matching primitives from a reference view of a pair of stereo mosaics to facilitate both 3D reconstruction of textureless urban scenes and man-made moving targets (e.g. vehicles). Multiple pairs of stereo mosaics are used to improve the accuracy and robustness of 3D recovery and occlusion handling. Moving targets are detected by inspecting their 3D anomalies, either violating the epipolar geometry of the pushbroom stereo or exhibiting abnormal 3D structure. Experimental results on both simulated and real video sequences are provided to show the effectiveness of our approach.

    Localisation and tracking of stationary users for extended reality

    In this thesis, we investigate the topics of localisation and tracking in the context of Extended Reality. In many on-site or outdoor Augmented Reality (AR) applications, users are standing or sitting in one place and performing mostly rotational movements, i.e. stationary. This type of stationary motion also occurs in Virtual Reality (VR) applications such as panorama capture by moving a camera in a circle. Both applications require us to track the motion of a camera in potentially very large and open environments. State-of-the-art methods such as Structure-from-Motion (SfM), and Simultaneous Localisation and Mapping (SLAM), tend to rely on scene reconstruction from significant translational motion in order to compute camera positions. This can often lead to failure in application scenarios such as tracking for seated sport spectators, or stereo panorama capture where the translational movement is small compared to the scale of the environment. To begin with, we investigate the topic of localisation as it is key to providing global context for many stationary applications. To achieve this, we capture our own datasets in a variety of large open spaces including two sports stadia. We then develop and investigate these techniques in the context of these sports stadia using a variety of state-of-the-art localisation approaches. We cover geometry-based methods to handle dynamic aspects of a stadium environment, as well as appearance-based methods, and compare them to a state-of-the-art SfM system to identify the most applicable methods for server-based and on-device localisation. Recent work in SfM has shown that the type of stationary motion that we target can be reliably estimated by applying spherical constraints to the pose estimation. In this thesis, we extend these concepts into a real-time keyframe-based SLAM system for the purposes of AR, and develop a unique data structure for simplifying keyframe selection. We show that our constrained approach can track more robustly in these challenging stationary scenarios compared to state-of-the-art SLAM through both synthetic and real-data tests. In the application of capturing stereo panoramas for VR, this thesis demonstrates the unsuitability of standard SfM techniques for reconstructing these circular videos. We apply and extend recent research in spherically constrained SfM to creating stereo panoramas and compare this with state-of-the-art general SfM in a technical evaluation. With a user study, we show that the motion requirements of our SfM approach are similar to the natural motion of users, and that a constrained SfM approach is sufficient for providing stereoscopic effects when viewing the panoramas in VR.

    Computer Vision and Image Understanding xxx

    A compact visual representation, called the 3D layered, adaptive-resolution, and multi-perspective panorama (LAMP), is proposed for representing large-scale 3D scenes with large variations of depths and obvious occlusions. Two kinds of 3D LAMP representations are proposed: the relief-like LAMP and the image-based LAMP. Both types of LAMPs concisely represent almost all the information from a long image sequence. Methods to construct LAMP representations from video sequences with dominant translation are provided. The relief-like LAMP is basically a single extended multi-perspective panoramic view image. Each pixel has a pair of texture and depth values, but each pixel may also have multiple pairs of texture-depth values to represent occlusion in layers, in addition to adaptive resolution changing with depth. The image-based LAMP, on the other hand, consists of a set of multi-perspective layers, each of which has a pair of 2D texture and depth maps, but with adaptive time-sampling scales depending on depths of scene points. Several examples of 3D LAMP construction for real image sequences are given. The 3D LAMP is a concise and powerful representation for image-based rendering.

    Progressive Refinement Imaging

    This paper presents a novel technique for progressive online integration of uncalibrated image sequences with substantial geometric and/or photometric discrepancies into a single, geometrically and photometrically consistent image. Our approach can handle large sets of images, acquired from a nearly planar or infinitely distant scene at different resolutions in object domain and under variable local or global illumination conditions. It allows for efficient user guidance as its progressive nature provides a valid and consistent reconstruction at any moment during the online refinement process. Our approach avoids global optimization techniques, as commonly used in the field of image refinement, and progressively incorporates new imagery into a dynamically extendable and memory‐efficient Laplacian pyramid. Our image registration process includes a coarse homography and a local refinement stage using optical flow. Photometric consistency is achieved by retaining the photometric intensities given in a reference image, while it is being refined. Globally blurred imagery and local geometric inconsistencies due to, e.g. motion are detected and removed prior to image fusion. We demonstrate the quality and robustness of our approach using several image and video sequences, including handheld acquisition with mobile phones and zooming sequences with consumer cameras.
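For readers unfamiliar with the Laplacian pyramid mentioned in the abstract above, the following is a minimal sketch of the basic data structure in Python with NumPy. It is a plain, static pyramid, not the paper's dynamically extendable variant. The box-filter downsampling, the nearest-neighbour upsampling, and all function names are simplifying assumptions; a Gaussian filter is the more common choice.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (box filter, an assumption)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour pixel repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return `levels` detail images followed by the coarsest residual."""
    pyramid = []
    current = img.astype(float)
    for _ in range(levels):
        smaller = downsample(current)
        # Each level stores the detail lost by going one step coarser.
        pyramid.append(current - upsample(smaller))
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: upsample and add detail, coarse to fine."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

# Hypothetical input: side lengths must be divisible by 2**levels.
rng = np.random.default_rng(1)
image = rng.random((32, 32))
pyr = build_laplacian_pyramid(image, levels=3)
restored = reconstruct(pyr)
```

Because each detail level is defined as the difference from the upsampled coarser level, reconstruction recovers the input up to floating-point error regardless of the filter used.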

    Low-rank Based Algorithms for Rectification, Repetition Detection and De-noising in Urban Images

    In this thesis, we aim to solve the problems of automatic image rectification and repeated pattern detection in 2D urban images, using novel low-rank based techniques. Repeated patterns (such as windows, tiles, balconies and doors) are prominent and significant features in urban scenes. Detection of these periodic structures is useful in many applications such as photorealistic 3D reconstruction, 2D-to-3D alignment, facade parsing, city modeling, classification, navigation, visualization in 3D map environments, shape completion, cinematography and 3D games. However, both image rectification and repeated pattern detection are challenging due to scene occlusions, varying illumination, pose variation and sensor noise. Therefore, detection of these repeated patterns becomes very important for city scene analysis. Given a 2D image of an urban scene, we first automatically rectify the facade image and extract facade textures. Based on the rectified facade texture, we exploit novel algorithms that extract repeated patterns using Kronecker product based modeling, which rests on a solid theoretical foundation. We have tested our algorithms on a large set of images, which includes building facades from Paris, Hong Kong and New York.
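To give a feel for why Kronecker product modeling connects repeated patterns to low rank, here is a small sketch in Python with NumPy. It is not the thesis's algorithm. It uses the standard rearrangement trick, in which a matrix of the form `kron(layout, tile)` becomes rank one after its blocks are flattened into rows. The function name `nearest_kronecker`, the synthetic facade, and the assumption that the tile size is known in advance are all illustrative.

```python
import numpy as np

def nearest_kronecker(A, tile_shape):
    """Best (least-squares) factorisation A ~ kron(layout, tile) for a known tile size."""
    m2, n2 = tile_shape
    m1, n1 = A.shape[0] // m2, A.shape[1] // n2
    # Rearrange so that each row holds one flattened (m2 x n2) block of A.
    R = A.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    # If A is an exact Kronecker product, R has rank one.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    layout = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    tile = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return layout, tile, s

# Hypothetical facade: one random 4x6 "window" tile repeated on a 3x5 grid.
rng = np.random.default_rng(2)
true_tile = rng.random((4, 6))
true_layout = np.ones((3, 5))
facade = np.kron(true_layout, true_tile)
layout, tile, singular_values = nearest_kronecker(facade, true_tile.shape)
```

The recovered `layout` and `tile` are only determined up to a scale factor and sign, but their Kronecker product reproduces the facade. On real images, occlusion and noise make the rearranged matrix only approximately rank one, which is where low-rank methods come in.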