Non-convex optimization for 3D point source localization using a rotating point spread function
We consider the high-resolution imaging problem of 3D point source image
recovery from 2D data using a method based on point spread function (PSF)
engineering. The method involves a new technique, recently proposed by
S. Prasad, based on the use of a rotating PSF with a single lobe to obtain
depth from defocus. The amount of rotation of the PSF encodes the depth
position of the point source. Applications include high-resolution single
molecule localization microscopy as well as the problem addressed in this paper
on localization of space debris using a space-based telescope. The localization
problem is discretized on a cubical lattice where the coordinates of nonzero
entries represent the 3D locations and the values of these entries the fluxes
of the point sources. Finding the locations and fluxes of the point sources is
a large-scale sparse 3D inverse problem. A new nonconvex regularization method
with a data-fitting term based on Kullback-Leibler (KL) divergence is proposed
for 3D localization for the Poisson noise model. In addition, we propose a new
scheme of estimation of the source fluxes from the KL data-fitting term.
Numerical experiments illustrate the efficiency and stability of the algorithms
that are trained on a random subset of image data before being applied to other
images. Our 3D localization algorithms can be readily applied to other kinds of
depth-encoding PSFs as well.
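As a sketch of the kind of objective this leads to (the operator A, counts b, weight λ, and penalty R below are generic placeholders, not the paper's exact construction), a KL data-fitting term for Poisson data yields a problem of the form

```latex
\min_{x \ge 0} \; \sum_i \Big[ (Ax)_i - b_i + b_i \log \frac{b_i}{(Ax)_i} \Big] \;+\; \lambda\, R(x),
```

where x collects the fluxes on the cubical lattice, A maps fluxes through the rotating PSF to the 2D image, b is the observed photon-count image, and R is a nonconvex sparsity-promoting regularizer.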
Coding depth perception from image defocus
As a result of the spider experiments in Nagata et al. (2012), it was hypothesized that the depth perception mechanisms of these animals should be based on how much images are defocused. In the present paper, assuming that relative chromatic aberrations or blur radii values are known, we develop a formulation relating the values of these cues to the actual depth distance. Taking into account the form of the resulting signals, we propose the use of latency coding from a spiking neuron obeying Izhikevich’s ‘simple model’. If spider jumps can be viewed as approximately parabolic, some estimates allow for a sensory-motor relation between the time to the first spike and the magnitude of the initial velocity of the jump.
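A minimal sketch of the latency-coding idea (the parameter values and the mapping from defocus cue to input current are illustrative assumptions, not the paper's fitted values); Izhikevich's 'simple model' is simulated with Euler steps and the time to the first spike read off directly:

```python
import numpy as np

def time_to_first_spike(I, a=0.02, b=0.2, dt=0.1, t_max=200.0):
    """Euler simulation of Izhikevich's 'simple model' with constant input
    current I; returns the latency of the first spike (illustrative parameters)."""
    v = -70.0          # membrane potential (mV)
    u = b * v          # recovery variable
    t = 0.0
    while t < t_max:
        v += dt * (0.04 * v**2 + 5.0 * v + 140.0 - u + I)
        u += dt * a * (b * v - u)
        if v >= 30.0:  # spike threshold reached: report the latency
            return t
        t += dt
    return None        # no spike within t_max

# a stronger defocus cue -> larger input current -> shorter latency
for I in (5.0, 10.0, 20.0):
    print(I, time_to_first_spike(I))
```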
From small to large baseline multiview stereo : dealing with blur, clutter and occlusions
This thesis addresses the problem of reconstructing the three-dimensional
(3D) digital model of a scene from a collection of two-dimensional (2D)
images taken from it. To address this fundamental computer vision
problem, we propose three algorithms. They are the main contributions
of this thesis.
First, we solve multiview stereo with the off-axis aperture camera.
This system has a very small baseline as images are captured from
viewpoints close to each other. The key idea is to change the size or
the 3D location of the aperture of the camera so as to extract selected
portions of the scene. Our imaging model takes both defocus and
stereo information into account and allows us to solve shape reconstruction
and image restoration in one go. The off-axis aperture camera can
be used in a small-scale space where the camera motion is constrained
by the surrounding environment, such as in 3D endoscopy.
Second, to solve multiview stereo with large baseline, we present a
framework that poses the problem of recovering a 3D surface in the
scene as a regularized minimal partition problem of a visibility function.
The formulation is convex and hence guarantees that the solution
converges to the global minimum. Our formulation is robust
to extensive view-varying occlusions, clutter, and image noise. At
any stage during the estimation process the method does not rely on
the visual hull, 2D silhouettes, approximate depth maps, or knowing
which views are dependent (i.e., overlapping) and which are independent
(i.e., non-overlapping). Furthermore, the degenerate solution, the
null surface, is not included as a global solution in this formulation.
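As a generic illustration of this class of formulations (the weights g and f below stand in for the visibility-derived terms and are not the thesis's exact definitions), a regularized minimal partition problem can be relaxed to

```latex
\min_{u:\, \Omega \to [0,1]} \; \int_\Omega g(x)\,|\nabla u(x)|\,dx \;+\; \int_\Omega f(x)\,u(x)\,dx,
```

which is convex in u, so any local minimizer is global; thresholding the minimizer then yields a binary indicator of the reconstructed surface.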
One limitation of this algorithm is that its computational complexity
grows with the number of views that we combine simultaneously. To
address this limitation, we propose a third formulation. In this formulation,
the visibility functions are integrated within a narrow band
around the estimated surface by assigning weights to each point along
optical rays.
This thesis presents technical descriptions for each algorithm and detailed
analyses to show how these algorithms improve existing reconstruction
techniques.
Let's Enhance: A Deep Learning Approach to Extreme Deblurring of Text Images
This work presents a novel deep-learning-based pipeline for the inverse
problem of image deblurring, leveraging augmentation and pre-training with
synthetic data. Our results build on our winning submission to the recent
Helsinki Deblur Challenge 2021, whose goal was to explore the limits of
state-of-the-art deblurring algorithms in a real-world data setting. The task
of the challenge was to deblur out-of-focus images of random text, thereby
maximizing an optical-character-recognition-based score function in a
downstream task. A key step of our solution is the data-driven estimation of the
physical forward model describing the blur process. This enables a stream of
synthetic data, generating pairs of ground-truth and blurry images on-the-fly,
which is used for an extensive augmentation of the small amount of challenge
data provided. The actual deblurring pipeline consists of an approximate
inversion of the radial lens distortion (determined by the estimated forward
model) and a U-Net architecture, which is trained end-to-end. Our algorithm was
the only one passing the hardest challenge level, achieving over
character recognition accuracy. Our findings are well in line with the paradigm
of data-centric machine learning, and we demonstrate its effectiveness in the
context of inverse problems. Apart from a detailed presentation of our
methodology, we also analyze the importance of several design choices in a
series of ablation studies. The code of our challenge submission is available
under https://github.com/theophil-trippe/HDC_TUBerlin_version_1.
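A rough sketch of this on-the-fly data generation (the text renderer, the PSF estimate, and the Gaussian noise model below are illustrative assumptions, not the authors' estimated forward model):

```python
import numpy as np
from scipy.signal import fftconvolve

def synthetic_pair(render_text, psf, noise_std=0.01, rng=None):
    """Generate one (sharp, blurry) training pair on the fly.

    render_text: callable returning a random sharp text image in [0, 1]
    psf: blur kernel of the (estimated) forward model -- a placeholder here
    """
    rng = rng or np.random.default_rng()
    sharp = render_text()
    blurry = fftconvolve(sharp, psf, mode="same")        # apply forward model
    blurry += rng.normal(0.0, noise_std, blurry.shape)   # assumed Gaussian sensor noise
    return sharp, np.clip(blurry, 0.0, 1.0)
```

Streaming pairs from such a generator lets the network train on effectively unlimited data while the small set of real challenge images is reserved for fine-tuning and validation.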
A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation
Recent work has shown that optical flow estimation can be formulated as a
supervised learning task and can be successfully solved with convolutional
networks. Training of the so-called FlowNet was enabled by a large
synthetically generated dataset. The present paper extends the concept of
optical flow estimation via convolutional networks to disparity and scene flow
estimation. To this end, we propose three synthetic stereo video datasets with
sufficient realism, variation, and size to successfully train large networks.
Our datasets are the first large-scale datasets to enable training and
evaluating scene flow methods. Besides the datasets, we present a convolutional
network for real-time disparity estimation that provides state-of-the-art
results. By combining a flow and disparity estimation network and training it
jointly, we demonstrate the first scene flow estimation with a convolutional
network.
Depth Acquisition from Digital Images
Introduction: Depth acquisition from digital images captured with a conventional camera, by analysing focus/defocus cues which are related to depth via an optical model of the camera, is a popular approach to depth-mapping a 3D scene. The majority of methods analyse the neighbourhood of a point in an image to infer its depth, which has disadvantages. A more elegant, but more difficult, solution is to evaluate only the single pixel displaying a point in order to infer its depth. This thesis investigates whether a per-pixel method can be implemented without compromising accuracy and generality compared to window-based methods, whilst minimising the number of input images.
Method: A geometric optical model of the camera was used to predict the relationship between focus/defocus and intensity at a pixel. Using input images with different focus settings, the relationship was used to identify the focal plane depth (i.e. focus setting) where a point is in best focus, from which the depth of the point can be resolved if camera parameters are known. Two metrics were implemented, one to identify the best focus setting for a point from the discrete input set, and one to fit a model to the input data to estimate the depth of perfect focus of the point on a continuous scale.
Results: The method gave generally accurate results for a simple synthetic test scene, with a relatively low number of input images compared to similar methods. When tested on a more complex scene, the method achieved its objectives of separating complex objects from the background by depth, and resolved a complex 3D surface at a resolution similar to that of a comparable method which used significantly more input data.
Conclusions: The method demonstrates that it is possible to resolve depth on a per-pixel basis without compromising accuracy and generality, and using a similar amount of input data, compared to more traditional window-based methods. In practice, the presented method offers a convenient new option for depth-based image processing applications, as the depth-map is per-pixel, but the process of capturing and preparing images is not overly cumbersome and, unlike the other per-pixel methods reviewed, could easily be automated. However, the method still suffers from the general limitations of the depth acquisition approach using images from a conventional camera, which limits its use as a general depth acquisition solution beyond specifically depth-based image processing applications.
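A minimal sketch of this per-pixel scheme (the peaked per-pixel response and the parabolic model fit below are illustrative assumptions, not the thesis's optical model):

```python
import numpy as np

def per_pixel_depth(stack, focus_settings):
    """stack: (K, H, W) focal stack, one image per focus setting.
    Returns (discrete best-focus index, continuous depth estimate) per pixel,
    assuming the per-pixel response peaks at the best-focus setting."""
    stack = np.asarray(stack, dtype=float)
    fs = np.asarray(focus_settings, dtype=float)
    K, H, W = stack.shape
    best = np.argmax(stack, axis=0)              # metric 1: best discrete setting
    k = np.clip(best, 1, K - 2)                  # keep a 3-sample neighbourhood
    rows, cols = np.indices((H, W))
    y0 = stack[k - 1, rows, cols]
    y1 = stack[k, rows, cols]
    y2 = stack[k + 1, rows, cols]
    denom = y0 - 2.0 * y1 + y2
    offset = np.zeros_like(denom)                # parabola-vertex sub-step offset
    np.divide(0.5 * (y0 - y2), denom, out=offset, where=denom != 0)
    step = fs[1] - fs[0]                         # assumes uniformly spaced settings
    return best, fs[k] + offset * step           # metric 2: continuous estimate
```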
Light field image processing: an overview
Light field imaging has emerged as a technology that allows us to capture richer visual information from our world. As opposed to traditional photography, which captures a 2D projection of the light in the scene by integrating over the angular domain, light fields collect radiance from rays in all directions, demultiplexing the angular information lost in conventional photography. On the one hand, this higher-dimensional representation of visual data offers powerful capabilities for scene understanding, and substantially improves the performance of traditional computer vision problems such as depth sensing, post-capture refocusing, segmentation, video stabilization, material classification, etc. On the other hand, the high dimensionality of light fields also brings up new challenges in terms of data capture, data compression, content editing, and display. Taking these two elements together, research in light field image processing has become increasingly popular in the computer vision, computer graphics, and signal processing communities. In this paper, we present a comprehensive overview and discussion of research in this field over the past 20 years. We focus on all aspects of light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data.
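As one concrete example of such post-capture refocusing, a standard shift-and-add sketch over sub-aperture views (the slope parameterization below is an illustrative assumption):

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def refocus(light_field, slope):
    """Shift-and-add refocusing of a 4D light field L[u, v, s, t].

    light_field: (U, V, H, W) array of sub-aperture views
    slope: per-view disparity per unit aperture offset; picks the depth
           plane that ends up in focus
    """
    U, V, H, W = light_field.shape
    cu, cv = (U - 1) / 2.0, (V - 1) / 2.0
    out = np.zeros((H, W))
    for u in range(U):
        for v in range(V):
            # shift each view in proportion to its aperture offset, then average
            out += nd_shift(light_field[u, v],
                            (slope * (u - cu), slope * (v - cv)), order=1)
    return out / (U * V)
```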
Coded aperture imaging
This thesis studies the coded aperture camera, a device consisting of a conventional
camera with a modified aperture mask, that enables the recovery
of both depth map and all-in-focus image from a single 2D input image.
Key contributions of this work are the modeling of the statistics of natural
images and the design of efficient blur identification methods in a Bayesian
framework. Two cases are distinguished: 1) when the aperture can be decomposed
into a small set of identical holes, and 2) when the aperture has a
more general configuration. In the first case, the formulation of the problem
incorporates priors about the statistical variation of the texture to avoid
ambiguities in the solution. This allows us to bypass the recovery of the sharp
image and concentrate only on estimating depth. In the second case, the
depth reconstruction is addressed via convolutions with a bank of linear
filters. Key advantages over competing methods are the higher numerical
stability and the ability to deal with large blur. The all-in-focus image can
then be recovered by using a deconvolution step with the estimated depth
map. Furthermore, for the purpose of depth estimation alone, the proposed
algorithm does not require information about the mask in use. The
comparison with existing algorithms in the literature shows that the proposed
methods achieve state-of-the-art performance. This solution is also
extended for the first time to images affected by both defocus and motion
blur and, finally, to video sequences with moving and deformable objects.
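A loose sketch of depth estimation via convolutions with a bank of linear filters (the candidate-scale filters and the minimum-local-energy decision rule below are generic placeholders, not the Bayesian filters derived in the thesis):

```python
import numpy as np
from scipy.signal import fftconvolve

def depth_from_filter_bank(image, filters, window=9):
    """Label each pixel with the candidate blur scale (a proxy for depth)
    whose filter response has the smallest local energy."""
    box = np.ones((window, window)) / window**2   # local averaging window
    responses = []
    for f in filters:                             # one filter per candidate scale
        r = fftconvolve(image, f, mode="same") ** 2
        responses.append(fftconvolve(r, box, mode="same"))
    return np.argmin(np.stack(responses), axis=0) # index of best-matching scale
```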
Online Video Deblurring via Dynamic Temporal Blending Network
State-of-the-art video deblurring methods are capable of removing non-uniform
blur caused by unwanted camera shake and/or object motion in dynamic scenes.
However, most existing methods are based on batch processing and thus need
access to all recorded frames, rendering them computationally demanding and
time-consuming, thus limiting their practical use. In contrast, we propose
an online (sequential) video deblurring method based on a spatio-temporal
recurrent network that allows for real-time performance. In particular, we
introduce a novel architecture which extends the receptive field while keeping
the overall size of the network small to enable fast execution. In doing so,
our network is able to remove even large blur caused by strong camera shake
and/or fast moving objects. Furthermore, we propose a novel network layer that
enforces temporal consistency between consecutive frames by dynamic temporal
blending which compares and adaptively (at test time) shares features obtained
at different time steps. We show the superiority of the proposed method in an
extensive experimental evaluation.
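A minimal sketch of what such a dynamic temporal blending layer might look like (the gating design and layer sizes below are assumptions based on this description, not the authors' published architecture):

```python
import torch
import torch.nn as nn

class DynamicTemporalBlending(nn.Module):
    """Compare current and previous feature maps and adaptively blend them,
    enforcing temporal consistency between consecutive frames (a sketch)."""
    def __init__(self, channels):
        super().__init__()
        # gate computed from both time steps; values in (0, 1) per position
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_t, feat_prev):
        w = self.gate(torch.cat([feat_t, feat_prev], dim=1))
        return w * feat_t + (1.0 - w) * feat_prev  # adaptive blending at test time

# usage: blend = DynamicTemporalBlending(64); out = blend(f_t, f_prev)
```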