Sparse Modeling for Image and Vision Processing
In recent years, a large amount of multi-disciplinary research has been
conducted on sparse models and their applications. In statistics and machine
learning, the sparsity principle is used to perform model selection, that is,
automatically selecting a simple model among a large collection of them. In
signal processing, sparse coding consists of representing data with linear
combinations of a few dictionary elements. Subsequently, the corresponding
tools have been widely adopted by several scientific communities such as
neuroscience, bioinformatics, or computer vision. The goal of this monograph is
to offer a self-contained view of sparse modeling for visual recognition and
image processing. More specifically, we focus on applications where the
dictionary is learned and adapted to data, yielding a compact representation
that has been successful in various contexts.
Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision
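As a rough illustration of the sparse coding idea in the abstract above (representing a signal as a linear combination of a few dictionary elements), the sketch below runs iterative soft-thresholding (ISTA) on an l1-penalized least-squares problem. This is a generic textbook solver, not necessarily the algorithm the monograph emphasizes; the random dictionary `D`, the penalty weight `lam`, and the function names are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise shrinkage toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam, n_iter=200):
    """Minimize 0.5 * ||D a - y||^2 + lam * ||a||_1 by proximal gradient steps."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)         # gradient of the quadratic data term
        a = soft_threshold(a - grad / L, lam / L)
    return a

# Hypothetical data: a random dictionary with unit-norm atoms and a 3-sparse code.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
a_true = np.zeros(50)
a_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = D @ a_true

lam = 0.05
a_hat = ista(D, y, lam, n_iter=500)
print("nonzero coefficients:", np.count_nonzero(a_hat))
```

In dictionary learning, a step like this would alternate with an update of `D` itself; here the dictionary is held fixed to keep the sketch short.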
InSPECtor: an end-to-end design framework for compressive pixelated hyperspectral instruments
Classic designs of hyperspectral instrumentation densely sample the spatial
and spectral information of the scene of interest. Data may be compressed after
the acquisition. In this paper we introduce a framework for the design of an
optimized, micro-patterned snapshot hyperspectral imager that acquires an
optimized subset of the spatial and spectral information in the scene. The data
is thereby compressed already at the sensor level, but can be restored to the
full hyperspectral data cube by the jointly optimized reconstructor. This
framework is implemented with TensorFlow and makes use of its automatic
differentiation for the joint optimization of the layout of the micro-patterned
filter array as well as the reconstructor. We explore the achievable
compression ratio for different numbers of filter passbands, number of scanning
frames, and filter layouts using data collected by the Hyperscout instrument.
We show resulting instrument designs that take snapshot measurements without
losing significant information while reducing the data volume, acquisition
time, or detector space by a factor of 40 as compared to classic, dense
sampling. The joint optimization of a compressive hyperspectral imager design
and the accompanying reconstructor provides an avenue to substantially reduce
the data volume from hyperspectral imagers.
Comment: 23 pages, 12 figures, published in Applied Optics
Deep Structured Layers for Instance-Level Optimization in 2D and 3D Vision
The approach we present in this thesis is that of integrating optimization problems
as layers in deep neural networks. Optimization-based modeling provides an additional set of tools enabling the design of powerful neural networks for a wide
battery of computer vision tasks. This thesis shows formulations and experiments
for vision tasks ranging from image reconstruction to 3D reconstruction.
We first propose an unrolled optimization method with implicit regularization
properties for reconstructing images from noisy camera readings. The method resembles an unrolled majorization-minimization framework with convolutional neural networks acting as regularizers. We report state-of-the-art performance in image
reconstruction on both noisy and noise-free evaluation setups across many datasets.
We further focus on the task of monocular 3D reconstruction of articulated objects using video self-supervision. The proposed method uses a structured layer for
accurate object deformation that controls a 3D surface by displacing a small number
of learnable handles. While relying on a small set of training data per category for
self-supervision, the method obtains state-of-the-art reconstruction accuracy with
diverse shapes and viewpoints for multiple articulated objects.
We finally address the shortcomings of the previous method that revolve
around regressing the camera pose using multiple hypotheses. We propose a method
that recovers a 3D shape from a 2D image by relying solely on 3D-2D correspondences regressed from a convolutional neural network. These correspondences are
used in conjunction with an optimization problem to estimate per sample the camera pose and deformation. We quantitatively show the effectiveness of the proposed
method on self-supervised 3D reconstruction on multiple categories without the need for multiple hypotheses.
Joint Demosaicking / Rectification of Fisheye Camera Images using Multi-color Graph Laplacian Regulation
To compose one 360-degree image from multiple viewpoint images taken from different fisheye cameras on a rig for viewing on a head-mounted display (HMD), a conventional processing pipeline first performs demosaicking on each fisheye camera's Bayer-patterned grid, then translates demosaicked pixels from the camera grid to a rectified image grid. By performing two image interpolation steps in sequence, interpolation errors can accumulate, and acquisition noise in each captured pixel can pollute its neighbors, resulting in correlated noise. In this paper, a joint processing framework is proposed that performs demosaicking and grid-to-grid mapping simultaneously, thus limiting noise pollution to one interpolation. Specifically, a reverse mapping function is first obtained from a regular on-grid location in the rectified image to an irregular off-grid location in the camera's Bayer-patterned image. For each pair of adjacent pixels in the rectified grid, its gradient is estimated using the pair's neighboring pixel gradients in three colors in the Bayer-patterned grid. A similarity graph is constructed based on the estimated gradients, and pixels are interpolated in the rectified grid directly via graph Laplacian regularization (GLR). To establish ground truth for objective testing, a large dataset containing pairs of simulated images both in the fisheye camera grid and the rectified image grid is built. Experiments show that the proposed joint demosaicking / rectification method outperforms competing schemes that execute demosaicking and rectification in sequence in both objective and subjective measures.
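The interpolation step in the abstract above relies on graph Laplacian regularization. A minimal sketch of generic GLR interpolation follows, assuming a 1D path graph with uniform edge weights rather than the paper's gradient-based similarity graph over the rectified grid; `glr_interpolate`, `path_graph`, and the weight `mu` are hypothetical names chosen for the example.

```python
import numpy as np

def glr_interpolate(y, mask, W, mu=0.1):
    """Solve min_x ||H (x - y)||^2 + mu * x^T L x, where H selects observed samples."""
    L = np.diag(W.sum(axis=1)) - W       # combinatorial graph Laplacian
    H = np.diag(mask.astype(float))      # 1 on observed samples, 0 elsewhere
    # Normal equations (H + mu L) x = H y; the matrix is positive definite when
    # the graph is connected and at least one sample is observed.
    return np.linalg.solve(H + mu * L, H @ y)

def path_graph(n, weights=None):
    """Symmetric adjacency matrix of a path graph with n nodes."""
    w = np.ones(n - 1) if weights is None else np.asarray(weights, dtype=float)
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = w
    W[idx + 1, idx] = w
    return W

# Only the two end samples are observed; the three interior ones are filled in.
y = np.array([0.0, 0.0, 0.0, 0.0, 4.0])
mask = np.array([True, False, False, False, True])
x = glr_interpolate(y, mask, path_graph(5))
print(x)
```

With uniform weights the interior samples fall on a straight line between the (slightly smoothed) end values. Lowering an edge weight lets the solution change more sharply across that edge, which is the role a gradient-based similarity graph would play.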
Joint Reconstruction of Multi-channel, Spectral CT Data via Constrained Total Nuclear Variation Minimization
We explore the use of the recently proposed "total nuclear variation" (TNV)
as a regularizer for reconstructing multi-channel, spectral CT images. This
convex penalty is a natural extension of the total variation (TV) to
vector-valued images and has the advantage of encouraging common edge locations
and a shared gradient direction among image channels. We show how it can be
incorporated into a general, data-constrained reconstruction framework and
derive update equations based on the first-order, primal-dual algorithm of
Chambolle and Pock. Early simulation studies based on the numerical XCAT
phantom indicate that the inter-channel coupling introduced by the TNV leads to
better preservation of image features at high levels of regularization,
compared to independent, channel-by-channel TV reconstructions.
Comment: Submitted to Physics in Medicine and Biology
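To make the coupling described above concrete, the following sketch evaluates the TNV penalty as the sum, over pixels, of the nuclear norm of the per-pixel Jacobian, and compares it with channel-by-channel isotropic TV. It only computes the penalty values; it is not the constrained Chambolle-Pock reconstruction from the paper. The forward-difference discretization and the function names are assumptions.

```python
import numpy as np

def jacobian(u):
    """Forward-difference Jacobian of a (C, H, W) image, returned as (H, W, C, 2)."""
    dx = np.zeros_like(u)
    dy = np.zeros_like(u)
    dx[:, :, :-1] = u[:, :, 1:] - u[:, :, :-1]
    dy[:, :-1, :] = u[:, 1:, :] - u[:, :-1, :]
    return np.stack([dx, dy], axis=-1).transpose(1, 2, 0, 3)

def tnv(u):
    """Total nuclear variation: sum of singular values of each pixel's Jacobian."""
    s = np.linalg.svd(jacobian(u), compute_uv=False)
    return float(s.sum())

def channelwise_tv(u):
    """Sum over channels of the isotropic total variation."""
    J = jacobian(u)
    return float(np.sqrt((J ** 2).sum(axis=-1)).sum())

rng = np.random.default_rng(0)
u1 = rng.standard_normal((1, 8, 8))      # hypothetical single-channel image
u2 = np.concatenate([u1, u1], axis=0)    # two channels with identical edges

print("TNV, 2 channels:           ", tnv(u2))
print("channel-wise TV, 2 channels:", channelwise_tv(u2))
```

For two identical channels every Jacobian has rank one, so TNV grows by a factor of sqrt(2) over the single-channel value while channel-wise TV doubles. This is one way to see why the penalty favors shared edge locations and gradient directions.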
Vector-Valued Image Processing by Parallel Level Sets
Vector-valued images such as RGB color images or
multimodal medical images show a strong interchannel correlation,
which is not exploited by most image processing tools. We
propose a new notion of treating vector-valued images which is
based on the angle between the spatial gradients of their channels.
Through minimizing a cost functional that penalizes large angles,
images with parallel level sets can be obtained. After formally
introducing this idea and the corresponding cost functionals, we
discuss their Gâteaux derivatives that lead to a diffusion-like gradient
descent scheme. We illustrate the properties of this cost
functional by several examples in denoising and demosaicking of
RGB color images. They show that parallel level sets are a suitable
concept for color image enhancement. Demosaicking with parallel
level sets gives visually perfect results for low noise levels. Furthermore,
the proposed functional yields sharper images than the other
approaches in comparison.
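One simple way to measure how far two channels are from having parallel level sets is to sum |∇u||∇v| − |⟨∇u, ∇v⟩| over all pixels, which vanishes exactly when the two gradients are parallel or antiparallel everywhere. The sketch below evaluates that quantity. The abstract does not spell out its functionals, so this specific choice, the use of `np.gradient` for discretization, and the function name are assumptions for illustration.

```python
import numpy as np

def parallelism_cost(u, v):
    """Sum over pixels of |grad u| |grad v| - |<grad u, grad v>|.

    Each term is nonnegative (Cauchy-Schwarz) and is zero where the two
    gradients are parallel or antiparallel.
    """
    uy, ux = np.gradient(u.astype(float))    # derivatives along rows, columns
    vy, vx = np.gradient(v.astype(float))
    norm_prod = np.hypot(ux, uy) * np.hypot(vx, vy)
    inner = np.abs(ux * vx + uy * vy)
    return float((norm_prod - inner).sum())

yy, xx = np.mgrid[0:8, 0:8]
horizontal_ramp = xx.astype(float)
vertical_ramp = yy.astype(float)

# Same level sets (v is an affine function of u): the cost is zero.
print(parallelism_cost(horizontal_ramp, 2.0 * horizontal_ramp + 1.0))
# Orthogonal gradients everywhere: the cost is large.
print(parallelism_cost(horizontal_ramp, vertical_ramp))
```

A denoising or demosaicking scheme along the lines of the abstract would descend on a functional of this kind together with a data-fidelity term; only the evaluation of the cost is shown here.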