54 research outputs found

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, automatically selecting a simple model from a large collection of candidates. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. The corresponding tools have since been widely adopted by several scientific communities, such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
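    The core sparse-coding operation the abstract describes, representing a signal as a linear combination of a few dictionary atoms, can be illustrated with a greedy pursuit. The following is a minimal sketch (not code from the monograph) using orthogonal matching pursuit on a small random dictionary; all sizes are illustrative assumptions:

    ```python
    import numpy as np

    def omp(D, y, k):
        """Orthogonal matching pursuit: greedily select k dictionary atoms,
        refitting y on the selected atoms by least squares each round."""
        residual, support = y.copy(), []
        for _ in range(k):
            j = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
            support.append(j)
            coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
            residual = y - D[:, support] @ coef         # orthogonal to chosen atoms
        a = np.zeros(D.shape[1])
        a[support] = coef
        return a

    # Toy example: a signal built from 2 of 8 unit-norm random atoms.
    rng = np.random.default_rng(0)
    D = rng.standard_normal((32, 8))
    D /= np.linalg.norm(D, axis=0)
    a_true = np.zeros(8); a_true[[1, 5]] = [1.0, -0.7]
    a = omp(D, D @ a_true, k=2)                         # recovers the sparse code
    ```

    With a noiseless signal and an incoherent dictionary, the greedy selection recovers the true two-atom support and the least-squares refit reproduces the signal exactly.
    
    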

    InSPECtor: an end-to-end design framework for compressive pixelated hyperspectral instruments

    Classic designs of hyperspectral instrumentation densely sample the spatial and spectral information of the scene of interest; the data may be compressed only after acquisition. In this paper we introduce a framework for the design of an optimized, micro-patterned snapshot hyperspectral imager that acquires an optimized subset of the spatial and spectral information in the scene. The data are thereby compressed already at the sensor level, yet can be restored to the full hyperspectral data cube by a jointly optimized reconstructor. The framework is implemented in TensorFlow and uses its automatic differentiation to jointly optimize the layout of the micro-patterned filter array and the reconstructor. We explore the achievable compression ratio for different numbers of filter passbands, numbers of scanning frames, and filter layouts using data collected by the Hyperscout instrument. We show resulting instrument designs that take snapshot measurements without losing significant information while reducing the data volume, acquisition time, or detector space by a factor of 40 compared to classic, dense sampling. The joint optimization of a compressive hyperspectral imager design and the accompanying reconstructor provides an avenue to substantially reduce the data volume from hyperspectral imagers. Comment: 23 pages, 12 figures, published in Applied Optics
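    A toy analogue of the design problem (not the paper's TensorFlow implementation) can illustrate the idea: if scenes lie near a low-dimensional spectral subspace, the sensor can keep only a few spectral samples and a reconstructor trained on example data can restore the full spectrum. The dimensions, the random band selection, and the purely linear least-squares reconstructor below are all illustrative assumptions:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Scenes are spectra lying in a hidden 3-dimensional subspace of 16 bands;
    # the sensor keeps only k = 4 bands (a stand-in for the filter layout).
    n, k, n_train = 16, 4, 500
    basis = rng.standard_normal((n, 3))                   # hidden spectral subspace
    train = basis @ rng.standard_normal((3, n_train))     # training spectra (n, N)

    keep = np.sort(rng.choice(n, size=k, replace=False))  # sampled band indexes
    meas = train[keep]                                    # compressed data (k, N)

    # The reconstructor here is plain least squares on training pairs; the real
    # framework also differentiates through the filter layout itself.
    W, *_ = np.linalg.lstsq(meas.T, train.T, rcond=None)  # shape (k, n)

    test = basis @ rng.standard_normal((3, 50))           # unseen spectra
    rec = W.T @ test[keep]                                # restore all 16 bands
    err = np.linalg.norm(rec - test) / np.linalg.norm(test)
    ```

    Because k exceeds the subspace dimension, the 4-band measurements determine the full spectrum and the relative reconstruction error is negligible, a 4x compression at the sensor with no information loss in this idealized setting.
    
    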

    Deep Structured Layers for Instance-Level Optimization in 2D and 3D Vision

    The approach we present in this thesis is to integrate optimization problems as layers in deep neural networks. Optimization-based modeling provides an additional set of tools for designing powerful neural networks for a broad range of computer vision tasks. This thesis presents formulations and experiments for vision tasks ranging from image reconstruction to 3D reconstruction. We first propose an unrolled optimization method with implicit regularization properties for reconstructing images from noisy camera readings. The method resembles an unrolled majorization-minimization framework with convolutional neural networks acting as regularizers. We report state-of-the-art performance in image reconstruction on both noisy and noise-free evaluation setups across many datasets. We then focus on monocular 3D reconstruction of articulated objects using video self-supervision. The proposed method uses a structured layer for accurate object deformation that controls a 3D surface by displacing a small number of learnable handles. While relying on a small set of training data per category for self-supervision, the method obtains state-of-the-art reconstruction accuracy across diverse shapes and viewpoints for multiple articulated objects. We finally address a shortcoming of the previous method: regressing the camera pose using multiple hypotheses. We propose a method that recovers a 3D shape from a 2D image by relying solely on 3D-2D correspondences regressed from a convolutional neural network. These correspondences are used in conjunction with an optimization problem to estimate the camera pose and deformation for each sample. We quantitatively demonstrate the effectiveness of the proposed method on self-supervised 3D reconstruction across multiple categories without the need for multiple hypotheses.
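    The unrolled optimization idea behind the first contribution can be sketched generically: each network "layer" takes a gradient step on the data-fidelity term and then applies a regularization step. In the thesis the regularizer is a learned CNN; here a simple soft-threshold stands in for it, and the operator, sizes, step size, and iteration count are illustrative assumptions:

    ```python
    import numpy as np

    def unrolled_recon(y, H, denoise, n_layers=200, step=0.3):
        """Unrolled proximal-gradient reconstruction: each layer performs a
        gradient step on 0.5*||H x - y||^2 followed by a denoising step
        (a learned CNN in the thesis; a soft-threshold placeholder here)."""
        x = H.T @ y
        for _ in range(n_layers):
            x = x - step * (H.T @ (H @ x - y))   # data-fidelity gradient step
            x = denoise(x)                       # regularization step
        return x

    rng = np.random.default_rng(2)
    x_true = np.zeros(20); x_true[[3, 12]] = [2.0, -1.5]   # sparse ground truth
    H = rng.standard_normal((40, 20)) / np.sqrt(40)        # forward operator
    y = H @ x_true + 0.01 * rng.standard_normal(40)        # noisy readings

    soft = lambda z, t=0.02: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    x_hat = unrolled_recon(y, H, soft)
    ```

    With a fixed, simple denoiser this is exactly ISTA; unrolling makes the iteration count finite and the denoiser's parameters trainable end to end.
    
    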

    Joint Demosaicking / Rectification of Fisheye Camera Images using Multi-color Graph Laplacian Regularization

    To compose one 360-degree image from multiple viewpoint images taken by different fisheye cameras on a rig for viewing on a head-mounted display (HMD), a conventional processing pipeline first performs demosaicking on each fisheye camera's Bayer-patterned grid, then translates demosaicked pixels from the camera grid to a rectified image grid. By performing two image interpolation steps in sequence, interpolation errors can accumulate, and acquisition noise in each captured pixel can pollute its neighbors, resulting in correlated noise. In this paper, a joint processing framework is proposed that performs demosaicking and grid-to-grid mapping simultaneously, thus limiting noise pollution to one interpolation. Specifically, a reverse mapping function is first obtained from a regular on-grid location in the rectified image to an irregular off-grid location in the camera's Bayer-patterned image. For each pair of adjacent pixels in the rectified grid, its gradient is estimated using the pair's neighboring pixel gradients in three colors in the Bayer-patterned grid. A similarity graph is constructed based on the estimated gradients, and pixels are interpolated in the rectified grid directly via graph Laplacian regularization (GLR). To establish ground truth for objective testing, a large dataset containing pairs of simulated images in both the fisheye camera grid and the rectified image grid is built. Experiments show that the proposed joint demosaicking / rectification method outperforms competing schemes that execute demosaicking and rectification in sequence, in both objective and subjective measures.
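    The GLR interpolation at the heart of the method reduces, for a quadratic data term, to a sparse linear system: minimizing ||Hx - y||^2 + mu * x^T L x gives (H^T H + mu L) x = H^T y. A minimal 1-D sketch, with a simple path graph standing in for the gradient-based similarity graph the paper constructs:

    ```python
    import numpy as np

    # Observed samples y at indexes `obs` are interpolated to all n nodes by
    # solving (H^T H + mu L) x = H^T y, the optimality condition of
    # min_x ||H x - y||^2 + mu * x^T L x.
    n = 8
    obs = [0, 3, 7]                        # sampled node indexes
    y = np.array([0.0, 3.0, 7.0])          # samples of the linear signal x_i = i

    H = np.zeros((len(obs), n))
    H[range(len(obs)), obs] = 1.0          # sampling (pick-out) matrix

    # Path-graph Laplacian L = D - W, unit weights between neighboring nodes.
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    L = np.diag(W.sum(axis=1)) - W

    mu = 1e-3
    x = np.linalg.solve(H.T @ H + mu * L, H.T @ y)   # GLR interpolation
    ```

    On a path graph the smoothness term fills unobserved nodes harmonically, so the three samples of a linear signal are interpolated back to (approximately) the full linear ramp; in the paper the graph weights come from gradients estimated across the three Bayer colors, so edges are preserved rather than smoothed over.
    
    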


    Joint Reconstruction of Multi-channel, Spectral CT Data via Constrained Total Nuclear Variation Minimization

    We explore the use of the recently proposed "total nuclear variation" (TNV) as a regularizer for reconstructing multi-channel, spectral CT images. This convex penalty is a natural extension of the total variation (TV) to vector-valued images and has the advantage of encouraging common edge locations and a shared gradient direction among image channels. We show how it can be incorporated into a general, data-constrained reconstruction framework and derive update equations based on the first-order, primal-dual algorithm of Chambolle and Pock. Early simulation studies based on the numerical XCAT phantom indicate that the inter-channel coupling introduced by the TNV leads to better preservation of image features at high levels of regularization, compared to independent, channel-by-channel TV reconstructions. Comment: Submitted to Physics in Medicine and Biology
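    The TNV penalty itself is straightforward to compute: stack the per-channel spatial gradients at each pixel into a small Jacobian and sum its singular values (the nuclear norm) over the image. A minimal sketch, where the forward-difference discretization and image shapes are illustrative choices:

    ```python
    import numpy as np

    def total_nuclear_variation(img):
        """TNV of a multi-channel image of shape (H, W, C): at each pixel,
        stack the forward-difference gradients of all channels into a C x 2
        Jacobian and sum its nuclear norm over pixels. Aligned channel
        gradients give rank-1 Jacobians, which this penalty favors."""
        gx = np.diff(img, axis=1, append=img[:, -1:, :])  # horizontal gradients
        gy = np.diff(img, axis=0, append=img[-1:, :, :])  # vertical gradients
        J = np.stack([gx, gy], axis=-1)                   # (H, W, C, 2) Jacobians
        s = np.linalg.svd(J, compute_uv=False)            # per-pixel singular values
        return float(s.sum())

    # Two channels sharing one vertical edge (aligned gradients) versus two
    # channels with the same per-channel edge strength but perpendicular edges.
    base = np.zeros((8, 8)); base[:, 4:] = 1.0
    aligned = np.stack([base, 2 * base], axis=-1)
    misaligned = np.stack([base, 2 * base.T], axis=-1)
    ```

    For the aligned pair every edge pixel's Jacobian is rank one (single singular value sqrt(1 + 4)), so TNV is strictly smaller than for the misaligned pair, which is exactly the coupling that encourages common edge locations across channels.
    
    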

    Vector-Valued Image Processing by Parallel Level Sets

    Vector-valued images such as RGB color images or multimodal medical images show a strong interchannel correlation, which most image processing tools do not exploit. We propose a new notion of treating vector-valued images which is based on the angle between the spatial gradients of their channels. By minimizing a cost functional that penalizes large angles, images with parallel level sets can be obtained. After formally introducing this idea and the corresponding cost functionals, we discuss their Gâteaux derivatives, which lead to a diffusion-like gradient descent scheme. We illustrate the properties of this cost functional with several examples in denoising and demosaicking of RGB color images. They show that parallel level sets are a suitable concept for color image enhancement. Demosaicking with parallel level sets gives visually perfect results for low noise levels. Furthermore, the proposed functional yields sharper images than the competing approaches.
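    One member of this family of cost functionals penalizes the angle between channel gradients via the pointwise quantity |grad u||grad v| - |<grad u, grad v>|, which vanishes exactly when the level sets are parallel and is largest when the gradients are orthogonal. A minimal sketch with forward differences (an illustrative discretization, not the paper's implementation):

    ```python
    import numpy as np

    def parallel_level_set_penalty(u, v):
        """Sum over pixels of |grad u||grad v| - |<grad u, grad v>|: zero when
        the two channels' gradients (hence level sets) are parallel everywhere,
        maximal when they are orthogonal."""
        ux = np.diff(u, axis=1, append=u[:, -1:])   # forward differences
        uy = np.diff(u, axis=0, append=u[-1:, :])
        vx = np.diff(v, axis=1, append=v[:, -1:])
        vy = np.diff(v, axis=0, append=v[-1:, :])
        nu = np.sqrt(ux**2 + uy**2)                 # |grad u| per pixel
        nv = np.sqrt(vx**2 + vy**2)                 # |grad v| per pixel
        dot = ux * vx + uy * vy                     # <grad u, grad v>
        return float(np.sum(nu * nv - np.abs(dot)))

    # Two ramp "channels": aligned gradients give zero penalty, perpendicular
    # gradients give the maximal penalty for this gradient strength.
    xx, yy = np.meshgrid(np.arange(8.0), np.arange(8.0))
    p_parallel = parallel_level_set_penalty(xx, 3 * xx)
    p_orthogonal = parallel_level_set_penalty(xx, yy)
    ```

    Minimizing such a functional pulls the channel gradients into alignment, which is why demosaicked colors stay consistent across edges instead of producing the usual color fringing.
    
    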