    Fast, High-Quality Hierarchical Depth-Map Super-Resolution

    Image-guided ToF depth upsampling: a survey

    Recently, there has been remarkable growth of interest in the development and applications of time-of-flight (ToF) depth cameras. Despite continual improvements in their characteristics, the practical applicability of ToF cameras is still limited by the low resolution and quality of their depth measurements. This has motivated many researchers to combine ToF cameras with other sensors in order to enhance and upsample depth images. In this paper, we review the approaches that couple ToF depth images with high-resolution optical images. Other classes of upsampling methods are also briefly discussed. Finally, we provide an overview of the performance evaluation tests presented in the related studies.
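The survey concerns coupling low-resolution ToF depth with a high-resolution optical image. As a minimal sketch of that idea, here is the classic joint bilateral upsampling of Kopf et al. (2007). It assumes a grayscale guidance image already registered to the depth map and an integer upsampling factor. It is one representative of the image-guided family, not a method taken from the survey, and the function name and parameter values (`sigma_s`, `sigma_r`, `radius`) are illustrative.

```python
import math


def joint_bilateral_upsample(depth_lr, guide_hr, scale, sigma_s=1.0, sigma_r=0.1, radius=1):
    """Upsample a low-resolution depth map using a high-resolution guidance image.

    depth_lr: h x w nested list of depth values.
    guide_hr: (h*scale) x (w*scale) nested list of grayscale intensities in [0, 1].
    scale: integer upsampling factor (assumed).
    """
    h_lr, w_lr = len(depth_lr), len(depth_lr[0])
    H, W = len(guide_hr), len(guide_hr[0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            cy, cx = y / scale, x / scale  # position of pixel (y, x) on the low-res grid
            num = den = 0.0
            for qy in range(int(cy) - radius, int(cy) + radius + 1):
                for qx in range(int(cx) - radius, int(cx) + radius + 1):
                    if not (0 <= qy < h_lr and 0 <= qx < w_lr):
                        continue
                    # Spatial weight, measured on the low-res grid.
                    ws = math.exp(-((qy - cy) ** 2 + (qx - cx) ** 2) / (2 * sigma_s ** 2))
                    # Range weight from the guidance image: compare pixel (y, x) with the
                    # high-res pixel co-located with the low-res sample (qy, qx).
                    g = guide_hr[min(qy * scale, H - 1)][min(qx * scale, W - 1)]
                    wr = math.exp(-((guide_hr[y][x] - g) ** 2) / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lr[qy][qx]
                    den += ws * wr
            out[y][x] = num / den
    return out


# A depth edge between two rows, and a guidance image with an edge in the same place.
depth_lr = [[1.0, 1.0], [5.0, 5.0]]
guide_hr = [[0.0] * 4, [0.0] * 4, [1.0] * 4, [1.0] * 4]
up = joint_bilateral_upsample(depth_lr, guide_hr, scale=2)
```

On this toy input, a purely spatial interpolator would blend depths 1 and 5 near the boundary rows. The range weight instead keeps each high-resolution pixel attached to the side of the intensity edge it lies on.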

    Graph Spectral Image Processing

    The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Although a digital image contains pixels that reside on a regularly sampled 2D grid, if one designs an appropriate underlying graph connecting pixels with weights that reflect the image structure, one can interpret the image (or image patch) as a signal on a graph and apply GSP tools to process and analyze the signal in the graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.
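To make "an appropriate underlying graph connecting pixels with weights that reflect the image structure" concrete, here is a minimal sketch of one restoration tool in this spirit: denoising with a graph Laplacian regularizer. It minimizes ||x - y||^2 + lam * x^T L x, whose minimizer solves (I + lam * L) x = y, where L = D - W. It assumes a 4-connected pixel graph with Gaussian weights computed from the noisy image itself. The function names and parameter values are hypothetical, and the linear system is solved with plain Jacobi iterations for brevity.

```python
import math


def neighbours(i, j, h, w):
    """4-connected neighbours of pixel (i, j) inside an h x w grid."""
    cand = ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
    return [(a, b) for a, b in cand if 0 <= a < h and 0 <= b < w]


def graph_laplacian_denoise(y, lam=2.0, sigma=0.1, iters=200):
    """Solve (I + lam * L) x = y, where L = D - W is the Laplacian of a pixel graph.

    Edge weights w_ij = exp(-(y_i - y_j)^2 / (2 sigma^2)) come from the noisy image
    itself (an assumption), so pixels across a strong edge are only weakly connected.
    Jacobi iterations converge because I + lam * L is strictly diagonally dominant.
    """
    h, w = len(y), len(y[0])
    weights = {}
    for i in range(h):
        for j in range(w):
            for a, b in neighbours(i, j, h, w):
                weights[(i, j, a, b)] = math.exp(-((y[i][j] - y[a][b]) ** 2) / (2 * sigma ** 2))
    x = [row[:] for row in y]
    for _ in range(iters):
        nxt = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                nb = neighbours(i, j, h, w)
                s = sum(weights[(i, j, a, b)] * x[a][b] for a, b in nb)
                d = sum(weights[(i, j, a, b)] for a, b in nb)
                # Row (i, j) of (I + lam * (D - W)) x = y, rearranged for x_ij.
                nxt[i][j] = (y[i][j] + lam * s) / (1 + lam * d)
        x = nxt
    return x


# A noisy two-level image: the flat regions are smoothed, the vertical edge survives.
noisy = [[0.00, 0.05, 1.00, 0.95],
         [0.05, 0.00, 0.95, 1.00],
         [0.00, 0.05, 1.00, 0.95],
         [0.05, 0.00, 0.95, 1.00]]
smoothed = graph_laplacian_denoise(noisy)
```

Because the edge weights across the step are close to zero, the regularizer penalizes variation inside each region but barely couples the two sides. In graph spectral terms, this acts as a low-pass filter on the graph rather than on the pixel grid.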

    A Review of Remote Sensing Image Dehazing

    Remote sensing (RS) is one of the data collection technologies that help us obtain more information about the Earth's surface. However, RS data captured by satellites are susceptible to particles suspended in the atmosphere during the imaging process, especially data in the visible light band. To make up for this deficiency, numerous dehazing efforts have been made recently, whose strategy is to restore a single hazy image directly, without using any extra information. In this paper, we first classify currently available algorithms into three categories: image enhancement, physical dehazing, and data-driven methods. The advantages and disadvantages of each type of algorithm are then summarized in detail. Finally, we discuss the evaluation indicators used to rank recovery performance and the application scenarios of RS haze removal techniques. In addition, we elaborate on common deficiencies of currently available methods and on future research directions.
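Physical dehazing methods typically invert the atmospheric scattering model I = J * t + A * (1 - t). Here I is the observed hazy image, J the haze-free radiance, t the transmission, and A the atmospheric light. The sketch below illustrates that category with one widely cited single-image method, the dark channel prior of He et al. (2009), which is chosen here as an example rather than taken from the review. It assumes A is already known and uses a tiny window. The function name and parameter values are illustrative.

```python
def dehaze_dark_channel(img, A, patch=1, omega=0.95, t0=0.1):
    """Single-image dehazing with the dark channel prior.

    img: H x W nested list of (r, g, b) tuples in [0, 1].
    A: atmospheric light (r, g, b), assumed known here.
    Returns the estimated radiance J obtained by inverting I = J * t + A * (1 - t).
    """
    h, w = len(img), len(img[0])
    # Per-pixel minimum over colour channels, normalised by the atmospheric light.
    cmin = [[min(img[y][x][c] / A[c] for c in range(3)) for x in range(w)] for y in range(h)]
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Dark channel: minimum of cmin over a (2*patch+1)^2 window around (y, x).
            dark = min(cmin[j][i]
                       for j in range(max(0, y - patch), min(h, y + patch + 1))
                       for i in range(max(0, x - patch), min(w, x + patch + 1)))
            # Transmission estimate; omega < 1 keeps a little haze, t0 bounds the division.
            t = max(1.0 - omega * dark, t0)
            row.append(tuple((img[y][x][c] - A[c]) / t + A[c] for c in range(3)))
        out.append(row)
    return out


# Synthesize haze on a scene whose dark channel is 0, then recover it (omega=1 for exactness).
J = [[(0.8, 0.3, 0.0)] * 3 for _ in range(3)]
t_true, A = 0.6, (1.0, 1.0, 1.0)
hazy = [[tuple(j * t_true + a * (1 - t_true) for j, a in zip(px, A)) for px in row] for row in J]
recovered = dehaze_dark_channel(hazy, A, omega=1.0)
```

The prior holds here by construction, because every scene pixel has a zero channel. On real RS imagery, bright ground surfaces can violate it, which is one of the weaknesses that reviews of physical methods commonly point to.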

    Enhancing Multi-View 3D-Reconstruction Using Multi-Frame Super Resolution

    Multi-view stereo is a popular method for 3D-reconstruction. Super resolution is a technique that produces high-resolution output from low-resolution input. Since the quality of 3D-reconstruction depends directly on the input, a simple way to improve it is to increase the resolution of the input. In this dissertation, we explore using super resolution to improve 3D-reconstruction at the input stage of the multi-view stereo framework. In particular, we show that multi-view stereo combined with multi-frame super resolution produces a more accurate 3D-reconstruction. The proposed method uses images with sub-pixel camera movements to produce high-resolution output. This enhanced output is fed through the multi-view stereo pipeline to produce an improved 3D-model. As a performance test, the improved 3D-model is compared with 3D-reconstructions generated in the same way but using bicubic and single image super resolution at the input stage of the multi-view stereo framework. The comparison measures how far the point cloud of each generated model lies from a reference model using three metrics: average, median, and max distance. The model with the smallest distances to the reference model is considered the better model. Overall, the experimental results show that the models generated using our technique have point clouds whose average, median, and max distances are 4.3%, 8.8%, and 6% closer to the reference model, respectively. This indicates an improvement in 3D-reconstruction using our technique. In addition, our technique has a significant speed advantage over the single image super resolution analogs, being at least 6.8x faster. The use of multi-frame super resolution in conjunction with the multi-view stereo framework is a practical solution for enhancing the quality of 3D-reconstruction and shows promising results over single image up-sampling techniques.
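The abstract does not say exactly how point-to-reference distances are computed. A common choice, assumed here, is the Euclidean distance from each generated point to its nearest neighbour in the reference cloud. The three metrics are then summarized as in the sketch below. The function names are hypothetical, and the brute-force search is only suitable for small clouds.

```python
import math
import statistics


def nearest_distances(points, reference):
    """Distance from each generated point to its nearest neighbour in the reference cloud.

    Brute force, O(len(points) * len(reference)); a k-d tree would suit real clouds.
    """
    return [min(math.dist(p, q) for q in reference) for p in points]


def cloud_metrics(points, reference):
    """Average, median, and max nearest-neighbour distance (smaller is better)."""
    d = nearest_distances(points, reference)
    return {"average": statistics.fmean(d), "median": statistics.median(d), "max": max(d)}


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
generated = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
metrics = cloud_metrics(generated, reference)
```

Computing the same dictionary for each input variant (bicubic, single image super resolution, multi-frame super resolution) gives the side-by-side comparison the abstract describes.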

    Learning to Enhance RGB and Depth Images with Guidance

    Image enhancement improves the visual quality of an input image so that key features can be identified more easily and the image becomes more suitable for other vision applications. Structure degradation, which refers to blurry edges or discontinuous structures caused by unbalanced or inconsistent intensity transitions in structural regions, remains a challenging problem in image enhancement. To overcome this issue, a popular approach is to use a guidance image that provides additional structural cues. In this thesis, we focus on two image enhancement tasks: RGB image smoothing and depth image completion. Through these two research problems, we aim to better understand what constitutes suitable guidance and how its proper use can help reduce structure degradation in image enhancement. Image smoothing retains salient structures and removes insignificant textures in an image. Here, structure degradation results from the difficulty of distinguishing structures from textures using low-level cues. Structures may inevitably be blurred if the filter tries to remove strong, high-contrast textures. Moreover, these strong textures may be mistakenly retained as structures. We address this issue by applying two forms of guidance, one for structures and one for textures. We first design a kernel-based double-guided filter (DGF), in which we adopt semantic edge detection as structure guidance and texture decomposition as texture guidance. The DGF is the first kernel filter that simultaneously leverages structure guidance and texture guidance to be both "structure-aware" and "texture-aware". Because textures exhibit high randomness and variation in spatial distribution and intensity, localizing and identifying them with hand-crafted features is not robust. Hence, we take advantage of deep learning for richer feature extraction and better generalization.
Specifically, we generate synthetic data by blending natural textures with clean structure-only images. With these data, we build a texture prediction network (TPN) that estimates the location and magnitude of textures. We then combine the texture prediction results from the TPN with a semantic structure prediction network so that the final texture and structure aware filtering network (TSAFN) can distinguish structures from textures more effectively. Our model achieves better smoothing results than existing filters. Depth completion recovers dense depth from sparse measurements, e.g., LiDAR. Existing depth-only methods use sparse depth as the only input and suffer from structure degradation, i.e., they fail to recover semantically consistent boundaries or small/thin objects due to (1) the sparse nature of depth points and (2) the lack of images to provide structural cues. In this thesis, we deal with the structure degradation issue by using RGB image guidance in both supervised and unsupervised depth-only settings. For the supervised model, the unique design is that it simultaneously outputs a reconstructed image and a dense depth map. Specifically, we treat image reconstruction from sparse depth as an auxiliary task during training that is supervised by the image. For the unsupervised model, we regard dense depth as a reconstructed result of the sparse input and formulate our model as an auto-encoder. To reduce structure degradation, we use the image to guide latent features by penalizing their difference during training. The image guidance loss in both models enables them to acquire denser, more structural cues that help produce more accurate and consistent depth values. At inference, the two models take only sparse depth as input; no image is required.
On the KITTI Depth Completion Benchmark, we validate the effectiveness of the proposed image guidance through extensive experiments and achieve competitive performance against state-of-the-art supervised and unsupervised methods. Our approach is also applicable to indoor scenes.
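The thesis does not spell out the DGF's kernel. For readers unfamiliar with guidance-based kernel filtering in general, the guided filter of He et al. is a well-known example, shown here instead of the DGF. It fits a local linear model q = a * I + b of the guidance I inside each window, so edges in the guidance survive while flat regions of the input p are smoothed. The sketch uses plain nested lists. The function names, window radius `r`, and regularizer `eps` are illustrative.

```python
def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, truncated at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out


def guided_filter(I, p, r=1, eps=1e-2):
    """Filter input p using guidance I via per-window linear models q = a * I + b."""
    h, w = len(I), len(I[0])

    def mul(a, b):
        return [[a[y][x] * b[y][x] for x in range(w)] for y in range(h)]

    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    corr_Ip, corr_II = box_mean(mul(I, p), r), box_mean(mul(I, I), r)
    # Least-squares slope per window: cov(I, p) / (var(I) + eps).
    a = [[(corr_Ip[y][x] - mean_I[y][x] * mean_p[y][x])
          / (corr_II[y][x] - mean_I[y][x] ** 2 + eps) for x in range(w)] for y in range(h)]
    b = [[mean_p[y][x] - a[y][x] * mean_I[y][x] for x in range(w)] for y in range(h)]
    # Average the coefficients of all windows covering each pixel.
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [[mean_a[y][x] * I[y][x] + mean_b[y][x] for x in range(w)] for y in range(h)]


# A step edge used as its own guidance is preserved; a flat input stays flat.
I = [[0.0, 0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(3)]
edge_out = guided_filter(I, [row[:] for row in I], r=1, eps=1e-4)
flat_out = guided_filter(I, [[0.5] * 6 for _ in range(3)], r=1, eps=1e-4)
```

A larger `eps` pushes a toward 0 in low-variance windows, so weak (texture-like) variation is smoothed away while strong edges are kept. This trade-off is the same structure-versus-texture tension the thesis targets with separate guidance signals.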

    Novel Motion Anchoring Strategies for Wavelet-based Highly Scalable Video Compression

    This thesis investigates new motion anchoring strategies targeted at wavelet-based highly scalable video compression (WSVC). We depart from two practices that are deeply ingrained in existing video compression systems. Instead of the commonly used block motion, which has poor scalability attributes, we employ piecewise-smooth motion together with a highly scalable motion boundary description. Combining this more “physical” motion description with motion discontinuity information allows us to replace the conventional strategy of anchoring motion at target frames with anchoring motion at reference frames, which improves motion inference across time. In the proposed reference-based motion anchoring strategies, motion fields are mapped from reference to target frames, where they serve as prediction references; during this mapping process, disoccluded regions are readily discovered. Observing that motion discontinuities move with foreground objects, we propose motion-discontinuity-driven motion mapping operations that handle traditionally challenging regions around moving objects. Reference-based motion anchoring exposes an intricate connection between temporal frame interpolation (TFI) and video compression: when employed in a compression system, all anchoring strategies explored in this thesis perform TFI once all residual information at a given temporal level is quantized to zero. The interpolation performance is evaluated on both natural and synthetic sequences, where we show favourable comparisons with state-of-the-art TFI schemes. We explore three reference-based motion anchoring strategies. In the first, the motion anchoring is “flipped” with respect to a hierarchical B-frame structure. We develop an analytical model to determine the weights of the different spatio-temporal subbands, and assess the suitability and benefits of this reference-based WSVC for (highly scalable) video compression.
Reduced motion coding cost and improved frame prediction, especially around moving objects, result in improved rate-distortion performance compared to a target-based WSVC. As the thesis evolves, the motion anchoring is progressively simplified to one where all motion is anchored at one base frame; this central motion organization facilitates the incorporation of higher-order motion models, which improve prediction performance in regions undergoing motion with non-constant velocity.
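The claim that mapping motion from reference to target frames "readily discovers" disoccluded regions can be seen in a small sketch. Each reference pixel is pushed forward by its motion vector, and any target pixel that nothing lands on is a disocclusion hole. The sketch assumes integer displacements. When two vectors land on the same target pixel, it keeps the larger one, treating faster-moving content as foreground. That collision rule is a hypothetical stand-in for the thesis's motion-discontinuity-driven operations, and the function name is illustrative.

```python
def map_motion_to_target(motion, h, w):
    """Map a motion field anchored at the reference frame onto the target frame grid.

    motion[y][x] = (dy, dx): integer displacement of reference pixel (y, x) into the target.
    Returns the mapped field (None where no reference pixel lands) and the list of
    disoccluded target positions.
    """
    mapped = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = motion[y][x]
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:
                cur = mapped[ty][tx]
                # Hypothetical collision rule: the larger motion is assumed to be foreground.
                if cur is None or abs(dy) + abs(dx) > abs(cur[0]) + abs(cur[1]):
                    mapped[ty][tx] = (dy, dx)
    holes = [(y, x) for y in range(h) for x in range(w) if mapped[y][x] is None]
    return mapped, holes


# One row: a two-pixel object at x = 1..2 moves right by 2 over a static background.
motion = [[(0, 0), (0, 2), (0, 2), (0, 0), (0, 0), (0, 0)]]
mapped, holes = map_motion_to_target(motion, 1, 6)
```

In this toy example, the holes are exactly the background positions uncovered behind the moving object, and the object's vectors win where it occludes the background.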