    A Joint Intensity and Depth Co-Sparse Analysis Model for Depth Map Super-Resolution

    High-resolution depth maps can be inferred from low-resolution depth measurements and an additional high-resolution intensity image of the same scene. To that end, we introduce a bimodal co-sparse analysis model, which is able to capture the interdependency of registered intensity and depth information. This model is based on the assumption that the co-supports of corresponding bimodal image structures are aligned when computed by a suitable pair of analysis operators. No analytic form of such operators exists, so we propose a method for learning them from a set of registered training signals. This learning process is done offline and returns a bimodal analysis operator that is universally applicable to natural scenes. We use this operator to exploit the bimodal co-sparse analysis model as a prior for solving inverse problems, which leads to an efficient algorithm for depth map super-resolution.
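
    As a rough illustration of the co-support alignment assumption, the following minimal sketch uses a first-order finite-difference operator as a hand-picked stand-in for the learned pair of analysis operators (the learned operators themselves are not reproduced here). The toy signals, the tolerance `tol`, and all function names are assumptions made for illustration only.

```python
def finite_difference(signal):
    """Apply a first-order difference operator (a stand-in analysis operator)."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def co_support(coeffs, tol=1e-8):
    """Return the indices where the analysis coefficients are numerically zero."""
    return {i for i, c in enumerate(coeffs) if abs(c) <= tol}


# Toy registered signals: both modalities have one edge at the same position.
intensity = [0.2, 0.2, 0.2, 0.9, 0.9, 0.9]
depth = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]

cs_intensity = co_support(finite_difference(intensity))
cs_depth = co_support(finite_difference(depth))

# Jaccard overlap of the two co-supports (1.0 means perfectly aligned).
overlap = len(cs_intensity & cs_depth) / len(cs_intensity | cs_depth)
```

    For this toy pair the edge sits at the same position in both modalities, so the two co-supports coincide; misaligned edges would lower the overlap score.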

    Patch-based graphical models for image restoration

    Deep panoramic depth prediction and completion for indoor scenes

    We introduce a novel end-to-end deep-learning solution for rapidly estimating a dense spherical depth map of an indoor environment. Our input is a single equirectangular image registered with a sparse depth map, as provided by a variety of common capture setups. Depth is inferred by an efficient and lightweight single-branch network, which employs a dynamic gating system to jointly process dense visual data and sparse geometric data. We exploit the characteristics of typical man-made environments to efficiently compress multi-resolution features and find short- and long-range relations among scene parts. Furthermore, we introduce a new augmentation strategy to make the model robust to different types of sparsity, including those generated by various structured light sensors and LiDAR setups. The experimental results demonstrate that our method provides interactive performance and outperforms state-of-the-art solutions in computational efficiency, adaptivity to variable depth sparsity patterns, and prediction accuracy for challenging indoor data, even when trained solely on synthetic data without any fine-tuning.
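
    To make the idea of sparsity augmentation more concrete, the sketch below derives two kinds of sparse depth input from a dense depth map. It is only an assumed illustration: the two patterns (uniform random dropout and regularly spaced scan lines), the use of `0.0` to mark missing depth, and the function names are hypothetical and are not taken from the work described above.

```python
import random


def random_sparsify(depth, keep_ratio, rng):
    """Keep each depth value with probability keep_ratio; 0.0 marks 'missing'."""
    return [[d if rng.random() < keep_ratio else 0.0 for d in row] for row in depth]


def scanline_sparsify(depth, every):
    """Keep only every `every`-th row, loosely mimicking horizontal scan lines."""
    return [
        list(row) if r % every == 0 else [0.0] * len(row)
        for r, row in enumerate(depth)
    ]


# Toy dense depth map (4 rows x 3 columns, values in metres).
dense = [
    [1.0, 1.1, 1.2],
    [2.0, 2.1, 2.2],
    [3.0, 3.1, 3.2],
    [4.0, 4.1, 4.2],
]

rng = random.Random(0)  # fixed seed so the augmentation is reproducible
sparse_random = random_sparsify(dense, keep_ratio=0.3, rng=rng)
sparse_lines = scanline_sparsify(dense, every=2)
```

    During training, one of such patterns could be drawn at random per sample so that the network sees several sparsity types.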

    Filter-Based Probabilistic Markov Random Field Image Priors: Learning, Evaluation, and Image Analysis

    Markov random fields (MRFs) based on linear filter responses are one of the most popular forms for modeling image priors due to their rigorous probabilistic interpretations and versatility in various applications. In this dissertation, we propose an application-independent method to quantitatively evaluate MRF image priors using model samples. To this end, we developed efficient auxiliary-variable Gibbs samplers for a general class of MRFs with flexible potentials. We found that the popular pairwise and high-order MRF priors capture image statistics quite roughly and exhibit poor generative properties. We further developed new learning strategies and obtained high-order MRFs that capture well the statistics of the inbuilt features, thus being real maximum-entropy models, as well as other important statistical properties of natural images, outlining the capabilities of MRFs. We suggest a multi-modal extension of MRF potentials, which not only allows training more expressive priors but also reveals more insights into MRF variants, based on which we are able to train compact, fully-convolutional restricted Boltzmann machines (RBMs) that can model visual repetitive textures even better than more complex and deep models. The learned high-order MRFs allow us to develop new methods for various real-world image analysis problems. For denoising of natural images and deconvolution of microscopy images, the MRF priors are employed in a purely generative setting. We propose efficient sampling-based methods to infer Bayesian minimum mean squared error (MMSE) estimates, which substantially outperform maximum a-posteriori (MAP) estimates and can compete with state-of-the-art discriminative methods. For non-rigid registration of live cell nuclei in time-lapse microscopy images, we propose a global optical flow-based method. The statistics of noise in fluorescence microscopy images are studied to derive an adaptive weighting scheme for increasing model robustness. High-order MRFs are also employed to train image filters for extracting important features of cell nuclei, and the deformations of nuclei are then estimated in the learned feature spaces. The developed method outperforms previous approaches in terms of both registration accuracy and computational efficiency.
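
    As a small, assumed illustration of what a prior "based on linear filter responses" computes, the sketch below evaluates the energy of a 1D filter-based MRF: every filter is slid over the signal and a potential is applied to each response. The Student-t-style potential `alpha * log(1 + z**2 / 2)`, the derivative filter, and all names are choices made here for illustration; the dissertation's flexible potentials and learned filters are not reproduced.

```python
import math


def filter_responses(signal, filt):
    """Correlate a 1D signal with a filter at every valid position (one clique each)."""
    k = len(filt)
    return [
        sum(filt[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def mrf_energy(signal, filters, alpha=1.0):
    """Sum a Student-t-style potential over all filter responses (assumed form)."""
    energy = 0.0
    for filt in filters:
        for z in filter_responses(signal, filt):
            energy += alpha * math.log(1.0 + 0.5 * z * z)
    return energy


filters = [[-1.0, 1.0]]  # a single first-derivative filter (pairwise MRF)
flat_energy = mrf_energy([2.0, 2.0, 2.0, 2.0], filters)
step_energy = mrf_energy([0.0, 0.0, 1.0, 1.0], filters)
```

    Under this prior the unnormalized density is proportional to `exp(-energy)`, so the flat signal is rated as more probable than the step signal.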

    Gradient Based MRF Learning For Image Restoration And Segmentation

    The undirected graphical model or Markov Random Field (MRF) is one of the more popular models used in computer vision and is the type of model with which this work is concerned. Models based on these methods have proven to be particularly useful in low-level vision systems and have led to state-of-the-art results for MRF-based systems. The research presented will describe a new discriminative training algorithm and its implementation. The MRF model will be trained by optimizing its parameters so that the minimum energy solution of the model is as similar as possible to the ground-truth. While previous work has relied on time-consuming iterative approximations or stochastic approximations, this work will demonstrate how implicit differentiation can be used to analytically differentiate the overall training loss with respect to the MRF parameters. This framework leads to an efficient, flexible learning algorithm that can be applied to a number of different models. The effectiveness of the proposed learning method will then be demonstrated by learning the parameters of two related models applied to the task of denoising images. The experimental results will demonstrate that the proposed learning algorithm is comparable to and, at times, better than previous training methods applied to the same tasks. A new segmentation model will also be introduced and trained using the proposed learning method. The proposed segmentation model is based on an energy minimization framework that is novel in how it incorporates priors on the size of the segments in a way that is straightforward to implement. While other methods, such as normalized cuts, tend to produce segmentations of similar sizes, this method is able to overcome that problem and produce more realistic segmentations.
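
    The role of implicit differentiation can be sketched on a deliberately tiny problem. The example below assumes a scalar quadratic energy E(x; w) = 0.5 (x - y)^2 + 0.5 w (x - m)^2 and a squared-error training loss against a ground-truth value g; this toy energy, the variable names, and the finite-difference check are illustrative assumptions, not the models trained in this work. Since the gradient of E with respect to x vanishes at the minimizer x*, the implicit function theorem gives dx*/dw = -(d²E/dx dw) / (d²E/dx²).

```python
def minimizer(w, y, m):
    """Closed-form argmin over x of E(x; w) = 0.5*(x - y)**2 + 0.5*w*(x - m)**2."""
    return (y + w * m) / (1.0 + w)


def loss(w, y, m, g):
    """Training loss: squared error between the minimum-energy solution and g."""
    return 0.5 * (minimizer(w, y, m) - g) ** 2


def loss_gradient_implicit(w, y, m, g):
    """dL/dw via implicit differentiation of the stationarity condition dE/dx = 0."""
    x_star = minimizer(w, y, m)
    d2e_dx2 = 1.0 + w              # second derivative of E with respect to x
    d2e_dxdw = x_star - m          # mixed derivative of E with respect to x and w
    dx_dw = -d2e_dxdw / d2e_dx2    # implicit function theorem
    return (x_star - g) * dx_dw    # chain rule through the loss


def loss_gradient_numeric(w, y, m, g, eps=1e-6):
    """Central finite difference, used only to sanity-check the analytic gradient."""
    return (loss(w + eps, y, m, g) - loss(w - eps, y, m, g)) / (2.0 * eps)


w, y, m, g = 0.5, 1.0, 0.0, 0.8  # weight, noisy value, prior mean, ground truth
grad_implicit = loss_gradient_implicit(w, y, m, g)
grad_numeric = loss_gradient_numeric(w, y, m, g)
```

    In a realistic model the scalar second derivative becomes a Hessian matrix and the division becomes a linear solve, but the structure of the computation stays the same.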

    A PatchMatch-based Dense-field Algorithm for Video Copy-Move Detection and Localization

    We propose a new algorithm for the reliable detection and localization of video copy-move forgeries. Discovering well-crafted video copy-moves may be very difficult, especially when some uniform background is copied to occlude foreground objects. To reliably detect both additive and occlusive copy-moves, we use a dense-field approach, with invariant features that guarantee robustness to several post-processing operations. To limit complexity, a suitable video-oriented version of PatchMatch is used, with a multiresolution search strategy and a focus on volumes of interest. Performance assessment relies on a new dataset, designed ad hoc, with realistic copy-moves and a wide variety of challenging situations. Experimental results show that the proposed method detects and localizes video copy-moves with good accuracy even in adverse conditions.
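
    For readers unfamiliar with PatchMatch, the sketch below shows its three generic ingredients (random initialization, propagation from an already visited neighbour, and random search with a shrinking radius) on a 1D integer signal. It is a simplified, assumed illustration only: the video-oriented variant, the invariant features, and the multiresolution strategy of the proposed method are not reproduced, and `min_shift` (which excludes trivial self-matches), the toy signal, and all names are hypothetical.

```python
import random


def patch_cost(sig, i, j, size):
    """Sum of squared differences between the patches starting at i and j."""
    return sum((sig[i + k] - sig[j + k]) ** 2 for k in range(size))


def patchmatch_1d(sig, size=3, iters=4, min_shift=3, seed=0):
    """Approximate nearest-neighbour field that excludes near-identity matches."""
    rng = random.Random(seed)
    n = len(sig) - size + 1  # number of valid patch start positions

    def valid(i, j):
        return 0 <= j < n and abs(i - j) >= min_shift

    def random_match(i):
        while True:  # terminates as long as n >= 2 * min_shift
            j = rng.randrange(n)
            if valid(i, j):
                return j

    nnf = [random_match(i) for i in range(n)]  # random initialization
    cost = [patch_cost(sig, i, nnf[i], size) for i in range(n)]

    def try_candidate(i, j):
        if valid(i, j):
            c = patch_cost(sig, i, j, size)
            if c < cost[i]:
                nnf[i], cost[i] = j, c

    for it in range(iters):
        forward = it % 2 == 0
        order = range(n) if forward else range(n - 1, -1, -1)
        step = -1 if forward else 1  # offset to the neighbour visited just before
        for i in order:
            prev = i + step
            if 0 <= prev < n:  # propagation: shift the neighbour's match by one
                try_candidate(i, nnf[prev] - step)
            radius = n
            while radius >= 1:  # random search around the current best match
                lo = max(0, nnf[i] - radius)
                hi = min(n - 1, nnf[i] + radius)
                try_candidate(i, rng.randint(lo, hi))
                radius //= 2
    return nnf, cost


# Toy signal in which the segment [1, 4, 9, 2, 7, 3] appears twice (a "copy-move").
signal = [5, 1, 4, 9, 2, 7, 3, 8, 6, 0, 1, 4, 9, 2, 7, 3, 5, 2]
nnf, cost = patchmatch_1d(signal)
```

    Positions whose final cost is zero and whose offsets agree with their neighbours are candidates for a copied region; a real detector would add post-processing on top of this field.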

    Data-driven depth and 3D architectural layout estimation of an interior environment from monocular panoramic input

    Recent years have seen significant interest in the automatic 3D reconstruction of indoor scenes, leading to a distinct and very active sub-field within 3D reconstruction. The main objective is to convert rapidly measured data representing real-world indoor environments into models encompassing geometric, structural, and visual abstractions. This thesis focuses on the particular subject of extracting geometric information from single panoramic images, using either visual data alone or sparse registered depth information. The appeal of this setup lies in the efficiency and cost-effectiveness of data acquisition using 360° images. The challenge, however, is that creating a comprehensive model from mostly visual input is extremely difficult, due to noise, missing data, and clutter. My research has concentrated on leveraging prior information, in the form of architectural and data-driven priors derived from large annotated datasets, to develop end-to-end deep learning solutions for specific tasks in the structured reconstruction pipeline. My first contribution consists of a deep neural network architecture for estimating a depth map from a single monocular indoor panorama, operating directly on the equirectangular projection. Leveraging the characteristics of indoor 360-degree images and recognizing the impact of gravity on indoor scene design, the network efficiently encodes the scene into vertical spherical slices. By exploiting long- and short-term relationships among these slices, it recovers an equirectangular depth map directly from the corresponding RGB image. My second contribution generalizes the approach to handle multimodal input, also covering the situation in which the equirectangular input image is paired with a sparse depth map, as provided by common capture setups. Depth is inferred using an efficient single-branch network with a dynamic gating system, processing both dense visual data and sparse geometric data. Additionally, a new augmentation strategy enhances the model's robustness to various types of sparsity, including those from structured light sensors and LiDAR setups. While the first two contributions focus on per-pixel geometric information, my third contribution addresses the recovery of the 3D shape of permanent room surfaces from a single panoramic image. Unlike previous methods, this approach tackles the problem in 3D, expanding the reconstruction space. It employs a graph convolutional network to directly infer the room structure as a 3D mesh, deforming a graph-encoded tessellated sphere mapped to the spherical panorama. Gravity-aligned features are actively incorporated using a projection layer with multi-head self-attention, and specialized losses guide plausible solutions in the presence of clutter and occlusions. The benchmarks on publicly available data show that all three methods provide significant improvements over the state of the art.
