Raw or Cooked? Object Detection on RAW Images
Images fed to a deep neural network have generally undergone several
handcrafted image signal processing (ISP) operations, all of which have been
optimized to produce visually pleasing images. In this work, we investigate the
hypothesis that the intermediate representation of visually pleasing images is
sub-optimal for downstream computer vision tasks compared to the RAW image
representation. We suggest that the operations of the ISP instead should be
optimized towards the end task, by learning the parameters of the operations
jointly during training. We extend previous works on this topic and propose a
new learnable operation that enables an object detector to achieve superior
performance when compared to both previous works and traditional RGB images. In
experiments on the open PASCALRAW dataset, we empirically confirm our
hypothesis.
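The core idea, tuning an ISP parameter with the downstream task loss rather than fixing it for visual quality, can be illustrated with a toy sketch. This is not the paper's method: the single learnable gamma curve, the MSE "task loss", and the target rendering below are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical sketch: one ISP stage (a gamma curve) made learnable and
# updated by gradient descent on a task loss, instead of being hand-tuned.
rng = np.random.default_rng(0)
raw = rng.uniform(0.01, 1.0, size=1000)    # simulated RAW intensities in (0, 1]
target = raw ** 0.45                       # assumed task-preferred rendering

gamma = 1.0                                # start from an identity-like curve
lr = 0.5
losses = []
for _ in range(200):
    out = raw ** gamma
    loss = np.mean((out - target) ** 2)
    # d/dgamma (x ** gamma) = x ** gamma * ln(x)
    grad = np.mean(2.0 * (out - target) * out * np.log(raw))
    gamma -= lr * grad
    losses.append(loss)
```

In the joint-training setting described in the abstract, the same gradient flow would come from the object-detection loss rather than from a fixed target image.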
A Comparison of Image Denoising Methods
The advancement of imaging devices and the countless images generated every
day pose an increasingly high demand on image denoising, which remains a
challenging task in terms of both effectiveness and efficiency. To improve
denoising quality, numerous denoising techniques and approaches have been
proposed in the past decades, including different transforms, regularization
terms, algebraic representations and especially advanced deep neural network
(DNN) architectures. Despite their sophistication, many methods may fail to
achieve desirable results for simultaneous noise removal and fine detail
preservation. In this paper, to investigate the applicability of existing
denoising techniques, we compare a variety of denoising methods on both
synthetic and real-world datasets for different applications. We also introduce
a new dataset for benchmarking, and the evaluations are performed from four
different perspectives including quantitative metrics, visual effects, human
ratings and computational cost. Our experiments demonstrate: (i) the
effectiveness and efficiency of representative traditional denoisers for
various denoising tasks, (ii) a simple matrix-based algorithm may produce
results comparable to its tensor counterparts, and (iii) the
notable achievements of DNN models, which exhibit impressive generalization
ability and show state-of-the-art performance on various datasets. In spite of
the progress in recent years, we discuss shortcomings and possible extensions
of existing techniques. Datasets, code and results are made publicly available
and will be continuously updated at
https://github.com/ZhaomingKong/Denoising-Comparison.
Comment: In this paper, we intend to collect and compare various denoising
methods to investigate their effectiveness, efficiency, applicability and
generalization ability with both synthetic and real-world experiments.
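As a minimal illustration of the quantitative-metric perspective used in such comparisons, the sketch below scores a simple box-filter denoiser with PSNR on a synthetic 1D signal. The signal, noise level, and filter are assumptions for illustration, not the paper's benchmark.

```python
import numpy as np

def psnr(ref, img, peak=1.0):
    # Peak signal-to-noise ratio in dB, a standard quantitative metric
    # in denoising benchmarks.
    mse = np.mean((ref - img) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 1000)
clean = 0.5 + 0.5 * np.sin(t)                  # smooth "image" in [0, 1]
noisy = clean + rng.normal(0, 0.1, t.size)     # additive Gaussian noise

kernel = np.ones(5) / 5                        # box filter as a toy denoiser
denoised = np.convolve(noisy, kernel, mode="same")
```

A real comparison would run many denoisers over 2D/3D data and combine such metrics with visual inspection, human ratings, and timing, as the abstract describes.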
Learning-based Wavelet-like Transforms For Fully Scalable and Accessible Image Compression
The goal of this thesis is to improve the existing wavelet transform with the aid of machine learning techniques, so as to enhance coding efficiency of wavelet-based image compression frameworks, such as JPEG 2000.
In this thesis, we first propose to augment the conventional base wavelet transform with two additional learned lifting steps -- a high-to-low step followed by a low-to-high step. The high-to-low step suppresses aliasing in the low-pass band by using the detail bands at the same resolution, while the low-to-high step aims to further remove redundancy from detail bands by using the corresponding low-pass band. These two additional steps reduce redundancy (notably aliasing information) amongst the wavelet subbands, and also improve the visual quality of reconstructed images at reduced resolutions.
To train these two networks in an end-to-end fashion, we develop a backward annealing approach to overcome the non-differentiability of the quantization and cost functions during back-propagation. Importantly, the two additional networks share a common architecture, named a proposal-opacity topology, which is inspired and guided by a specific theoretical argument related to geometric flow. This network topology is compact, with limited non-linearities, allowing a fully scalable system; one pair of trained network parameters is applied for all levels of decomposition and for all bit-rates of interest. By employing the additional lifting networks within the JPEG 2000 image coding standard, we can achieve up to 17.4% average BD bit-rate saving over a wide range of bit-rates, while retaining the quality and resolution scalability features of JPEG 2000.
Building upon the success of the high-to-low and low-to-high steps, we then study more broadly the extension of neural networks to all lifting steps of the base wavelet transform. The purpose of this comprehensive study is to understand the most effective way to develop learned wavelet-like transforms for highly scalable and accessible image compression. Specifically, we examine the impact of the number of learned lifting steps, the number of layers and channels in each learned lifting network, and the kernel support in each layer. To facilitate the study, we develop a generic training methodology that is simultaneously appropriate for all lifting structures considered. Experimental results ultimately suggest that, to improve the existing wavelet transform, it is more profitable to augment a larger wavelet transform with more diverse high-to-low and low-to-high steps than to develop deep, fully learned lifting structures.
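A useful property of the lifting structure discussed above is that any predict/update step, fixed or learned, is exactly invertible: the inverse simply subtracts what the forward step added. A minimal 1D sketch with fixed 5/3-style lifting steps (the coefficients and the periodic boundary handling are illustrative assumptions, not the thesis's learned networks):

```python
import numpy as np

def lifting_forward(x):
    # Split into even/odd samples, then apply a predict and an update step.
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    detail = odd - 0.5 * (even + np.roll(even, -1))       # predict step
    approx = even + 0.25 * (detail + np.roll(detail, 1))  # update step
    return approx, detail

def lifting_inverse(approx, detail):
    # Undo each lifting step in reverse order by subtracting what it added.
    even = approx - 0.25 * (detail + np.roll(detail, 1))
    odd = detail + 0.5 * (even + np.roll(even, -1))
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.arange(16, dtype=float)
recon = lifting_inverse(*lifting_forward(x))
```

Because inversion never requires inverting the predict or update operators themselves, the same perfect-reconstruction argument holds when those operators are replaced by neural networks, which is what makes learned lifting attractive for scalable coding.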
Joint demosaicing and fusion of multiresolution coded acquisitions: A unified image formation and reconstruction method
Novel optical imaging devices allow for hybrid acquisition modalities such as
compressed acquisitions with locally different spatial and spectral resolutions
captured by a single focal plane array. In this work, we propose to model the
capturing system of a multiresolution coded acquisition (MRCA) in a unified
framework, which natively includes conventional systems such as those based on
spectral/color filter arrays, compressed coded apertures, and multiresolution
sensing. We also propose a model-based image reconstruction algorithm
performing a joint demosaicing and fusion (JoDeFu) of any acquisition modeled
in the MRCA framework. The JoDeFu reconstruction algorithm solves an inverse
problem with a proximal splitting technique and is able to reconstruct an
uncompressed image datacube at the highest available spatial and spectral
resolution. An implementation of the code is available at
https://github.com/danaroth83/jodefu.
Comment: 15 pages, 7 figures; regular paper.
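To give a flavor of the proximal-splitting machinery that algorithms like JoDeFu build on (the actual inverse problem, operators, and splitting scheme in the paper differ), here is a generic ISTA sketch for an l1-regularized least-squares problem. The matrix, sparsity pattern, and regularization weight are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm scaled by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, y, lam, n_iter=300):
    # Proximal gradient (ISTA) for: min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)       # gradient of the data-fidelity term
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 40))          # underdetermined sensing operator
x_true = np.zeros(40)
x_true[[3, 17, 31]] = [1.0, -1.0, 0.5] # sparse ground truth
y = A @ x_true                         # compressed measurements
x_hat = ista(A, y, lam=0.05)
```

In a JoDeFu-like setting, A would encode the mosaicking/coding of the MRCA system and the regularizer would promote structure in the image datacube rather than plain sparsity.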
Deep Structured Layers for Instance-Level Optimization in 2D and 3D Vision
The approach we present in this thesis is that of integrating optimization problems
as layers in deep neural networks. Optimization-based modeling provides an additional set of tools enabling the design of powerful neural networks for a wide
range of computer vision tasks. This thesis presents formulations and experiments
for vision tasks ranging from image reconstruction to 3D reconstruction.
We first propose an unrolled optimization method with implicit regularization
properties for reconstructing images from noisy camera readings. The method resembles an unrolled majorization minimization framework with convolutional neural networks acting as regularizers. We report state-of-the-art performance in image
reconstruction on both noisy and noise-free evaluation setups across many datasets.
We further focus on the task of monocular 3D reconstruction of articulated objects using video self-supervision. The proposed method uses a structured layer for
accurate object deformation that controls a 3D surface by displacing a small number
of learnable handles. While relying on a small set of training data per category for
self-supervision, the method obtains state-of-the-art reconstruction accuracy with
diverse shapes and viewpoints for multiple articulated objects.
We finally address the shortcomings of the previous method that revolve
around regressing the camera pose using multiple hypotheses. We propose a method
that recovers a 3D shape from a 2D image by relying solely on 3D-2D correspondences regressed from a convolutional neural network. These correspondences are
used in conjunction with an optimization problem to estimate per sample the camera pose and deformation. We quantitatively show the effectiveness of the proposed
method on self-supervised 3D reconstruction on multiple categories without the need for multiple hypotheses.
InSPECtor: an end-to-end design framework for compressive pixelated hyperspectral instruments
Classic designs of hyperspectral instrumentation densely sample the spatial
and spectral information of the scene of interest. Data may be compressed after
the acquisition. In this paper we introduce a framework for the design of an
optimized, micro-patterned snapshot hyperspectral imager that acquires an
optimized subset of the spatial and spectral information in the scene. The data
is thereby compressed already at the sensor level, but can be restored to the
full hyperspectral data cube by the jointly optimized reconstructor. This
framework is implemented with TensorFlow and makes use of its automatic
differentiation for the joint optimization of the layout of the micro-patterned
filter array as well as the reconstructor. We explore the achievable
compression ratio for different numbers of filter passbands, numbers of
scanning frames, and filter layouts using data collected by the Hyperscout instrument.
We show resulting instrument designs that take snapshot measurements without
losing significant information while reducing the data volume, acquisition
time, or detector space by a factor of 40 as compared to classic, dense
sampling. The joint optimization of a compressive hyperspectral imager design
and the accompanying reconstructor provides an avenue to substantially reduce
the data volume from hyperspectral imagers.
Comment: 23 pages, 12 figures, published in Applied Optics.
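The sensor-level compression of a micro-patterned filter array can be illustrated with a toy mosaic model in which each detector pixel records exactly one passband. The layout and cube sizes below are arbitrary assumptions, not the jointly optimized designs from the paper.

```python
import numpy as np

# Toy mosaic sampling: every pixel sees one of B spectral passbands, so the
# recorded snapshot is B times smaller than the dense hyperspectral cube.
H, W, B = 12, 12, 6
rng = np.random.default_rng(0)
cube = rng.uniform(size=(H, W, B))          # dense hyperspectral data cube
layout = rng.integers(0, B, size=(H, W))    # per-pixel filter assignment

rows, cols = np.indices((H, W))
snapshot = cube[rows, cols, layout]         # one spectral sample per pixel

ratio = cube.size / snapshot.size           # data reduced by a factor of B
```

The paper's framework goes further by optimizing `layout` jointly with a learned reconstructor that restores the full cube from the snapshot; this sketch only shows where the compression itself comes from.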
Measuring Visibility in Fog by Combining a High-Resolution Headlamp with a Camera System
Automated driving is one of the megatrends in the automotive industry. The relevance of camera sensors as a core component is reflected in their use in most automated driving systems. Among the environmental influences these sensors are exposed to, fog plays a decisive role. At the same time, high-resolution headlamp systems are increasingly finding their way into vehicles and offer entirely new possibilities: for example, the light distribution can be adapted with high precision, and structures can be projected.
For this reason, this work investigates the use of high-resolution headlamp systems in combination with camera sensors as visibility sensors in fog. To also examine the use of radiation in the infrared wavelength range, two infrared radiation sources are employed together with an infrared camera.
The system exploits the normally disruptive scattering of radiation in fog to generate data. To this end, the headlamp system projects structures into the fog, and their alteration by the fog is analyzed. The approach builds on the theory of Mie scattering and on the meteorological optical range (MOR), the visibility definition followed by the visibility sensors deployed in weather stations. The work distinguishes between the intensity of the scattering, spectral changes due to the wavelength-dependent scattering behavior at the aerosol, the change of the projected structure produced by the high-resolution headlamp projection, and the structure that prevails in the fog due to its inhomogeneities and is amplified by the headlamp light. Properties of the headlamp projection and of the camera, such as mounting position and resolution, are analyzed in order to identify the decisive characteristics for implementing such a system. One camera corresponds to a mounting position behind the windshield, the other to a system integrated into the headlamp.
The properties mentioned were validated in the fog chamber of CEREMA in Clermont-Ferrand, France. The investigations showed that a headlamp angular resolution of 2.50° is optimal for detecting fog. Moreover, projecting several lines proved superior, especially for measurements at higher visibilities. A further improvement of the measurement data can be achieved by distributing light-emitting surfaces across the entire vehicle front. The choice of camera position is not restricted to a fixed location as long as the projection is adapted to it; each position has distinct advantages over the others, which the work discusses. Influences of the road surface on the measurement data can be used as information and taken into account in the evaluation. Negative contrasts appear to be a sensible step to conceal the projections from the human eye and thus avoid irritating the driver and other road users.
Overall, the results showed that such a visibility sensor can be implemented via a headlamp system. A major advantage is that the implementation is usually only a software question: in many vehicles no additional hardware needs to be installed. The system is a sensible way to carry high-resolution headlamp modules, originally designed for human drivers, into the future of automated driving.
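The meteorological optical range (MOR) referenced above is conventionally defined as the path length over which the luminous flux of a collimated beam falls to 5% of its initial value, giving MOR = ln(1/0.05)/beta for an extinction coefficient beta. A minimal sketch of this relation (the example extinction value is an assumption):

```python
import math

def mor_from_extinction(beta):
    # Meteorological optical range: the distance at which a collimated beam's
    # luminous flux drops to the 5% transmission threshold (Koschmieder law).
    # beta is the atmospheric extinction coefficient in 1/m.
    return math.log(1 / 0.05) / beta   # ln(20) / beta, roughly 3 / beta

dense_fog_range = mor_from_extinction(0.03)   # beta = 0.03 per metre
```

This is the quantity that the headlamp-and-camera system described above estimates indirectly, by measuring how the projected structures degrade with scattering.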
Computational Spectral Imaging: A Contemporary Overview
Spectral imaging collects and processes information along spatial and
spectral coordinates quantified in discrete voxels, which can be treated as a
3D spectral data cube. The spectral images (SIs) allow identifying objects,
crops, and materials in the scene through their spectral behavior. Since most
spectral optical systems can only employ 1D or at most 2D sensors, it is
challenging to directly acquire the 3D information from available commercial
sensors. As an alternative, computational spectral imaging (CSI) has emerged as
a sensing tool where the 3D data can be obtained using 2D encoded projections.
Then, a computational recovery process must be employed to retrieve the SI. CSI
enables the development of snapshot optical systems that reduce acquisition
time and provide low computational storage costs compared to conventional
scanning systems. Recent advances in deep learning (DL) have allowed the design
of data-driven CSI to improve the SI reconstruction or, even more, perform
high-level tasks such as classification, unmixing, or anomaly detection
directly from 2D encoded projections. This work summarises the advances in
CSI, starting with SI and its relevance, and continuing with the most relevant
compressive spectral optical systems. Then, CSI with DL is introduced, along
with the recent advances in combining physical optical design with
computational DL algorithms to solve high-level tasks.
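A common way CSI systems obtain 2D encoded projections of the 3D cube is a CASSI-style coded-aperture architecture: each spectral band is modulated by a binary mask, sheared by a dispersive element, and integrated on the detector. A toy sketch of such a forward model (the one-pixel-per-band shift, the binary code, and the sizes are assumptions for illustration):

```python
import numpy as np

# Toy CASSI-style forward model: mask each band with a coded aperture,
# shift it by one pixel per band (dispersion), and sum onto a 2D detector.
rng = np.random.default_rng(1)
H, W, B = 16, 16, 8
cube = rng.uniform(size=(H, W, B))        # 3D spectral data cube (voxels)
code = rng.integers(0, 2, size=(H, W))    # binary coded aperture

meas = np.zeros((H, W + B - 1))           # detector widened by the dispersion
for b in range(B):
    meas[:, b:b + W] += code * cube[:, :, b]

ratio = cube.size / meas.size             # voxels per detector sample
```

The computational recovery step mentioned in the abstract then inverts this many-to-one mapping, classically with regularized optimization and more recently with the data-driven DL reconstructors the survey covers.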