1,193 research outputs found
Multimodal enhancement-fusion technique for natural images.
Masters Degree. University of KwaZulu-Natal, Durban.This dissertation presents a multimodal enhancement-fusion (MEF) technique for natural images. The MEF is expected to contribute value to machine vision applications and personal image collections for the human user. Image enhancement techniques and the metrics that are used to assess their performance are prolific, and each is usually optimised for a specific objective. The MEF proposes a framework that adaptively fuses multiple enhancement objectives into a seamless pipeline. Given a segmented input image and a set of enhancement methods, the MEF applies all the enhancers to the image in parallel. The most appropriate enhancement in each image segment is identified, and finally, the differentially enhanced segments are seamlessly fused. To begin with, this dissertation studies targeted contrast enhancement methods and performance metrics that can be utilised in the proposed MEF. It addresses a selection of objective assessment metrics for contrast-enhanced images and determines their relationship with the subjective assessment of human visual systems. This is to identify which objective metrics best approximate human assessment and may therefore be used as an effective replacement for tedious human assessment surveys. A subsequent human visual assessment survey is conducted on the same dataset to ascertain image quality as perceived by a human observer. The interrelated concepts of naturalness and detail were found to be key motivators of human visual assessment. Findings show that when assessing the quality or accuracy of these methods, no single quantitative metric correlates well with human perception of naturalness and detail, however, a combination of two or more metrics may be used to approximate the complex human visual response.
Thereafter, this dissertation proposes the multimodal enhancer that adaptively selects the optimal enhancer for each image segment. MEF focusses on improving chromatic irregularities such as poor contrast distribution. It deploys a concurrent enhancement pathway that subjects an image to multiple image enhancers in parallel, followed by a fusion algorithm that creates a composite image that combines the strengths of each enhancement path. The study develops a framework for parallel image enhancement, followed by parallel image assessment and selection, leading to final merging of selected regions from the enhanced set. The output combines desirable attributes from each enhancement pathway to produce a result that is superior to each path taken alone. The study showed that the proposed MEF technique performs well for most image types. MEF is subjectively favourable to a human panel and achieves better performance for objective image quality assessment compared to other enhancement methods
Algorithms for the enhancement of dynamic range and colour constancy of digital images & video
One of the main objectives in digital imaging is to mimic the capabilities of the human eye, and perhaps, go beyond in certain aspects. However, the human visual system is so versatile, complex, and only partially understood that no up-to-date imaging technology has been able to accurately reproduce the capabilities of the it. The extraordinary capabilities of the human eye have become a crucial shortcoming in digital imaging, since digital photography, video recording, and computer vision applications have continued to demand more realistic and accurate imaging reproduction and analytic capabilities.
Over decades, researchers have tried to solve the colour constancy problem, as well as extending the dynamic range of digital imaging devices by proposing a number of algorithms and instrumentation approaches. Nevertheless, no unique solution has been identified; this is partially due to the wide range of computer vision applications that require colour constancy and high dynamic range imaging, and the complexity of the human visual system to achieve effective colour constancy and dynamic range capabilities.
The aim of the research presented in this thesis is to enhance the overall image quality within an image signal processor of digital cameras by achieving colour constancy and extending dynamic range capabilities. This is achieved by developing a set of advanced image-processing algorithms that are robust to a number of practical challenges and feasible to be implemented within an image signal processor used in consumer electronics imaging devises.
The experiments conducted in this research show that the proposed algorithms supersede state-of-the-art methods in the fields of dynamic range and colour constancy. Moreover, this unique set of image processing algorithms show that if they are used within an image signal processor, they enable digital camera devices to mimic the human visual system s dynamic range and colour constancy capabilities; the ultimate goal of any state-of-the-art technique, or commercial imaging device
Virtual Rephotography: Novel View Prediction Error for 3D Reconstruction
The ultimate goal of many image-based modeling systems is to render
photo-realistic novel views of a scene without visible artifacts. Existing
evaluation metrics and benchmarks focus mainly on the geometric accuracy of the
reconstructed model, which is, however, a poor predictor of visual accuracy.
Furthermore, using only geometric accuracy by itself does not allow evaluating
systems that either lack a geometric scene representation or utilize coarse
proxy geometry. Examples include light field or image-based rendering systems.
We propose a unified evaluation approach based on novel view prediction error
that is able to analyze the visual quality of any method that can render novel
views from input images. One of the key advantages of this approach is that it
does not require ground truth geometry. This dramatically simplifies the
creation of test datasets and benchmarks. It also allows us to evaluate the
quality of an unknown scene during the acquisition and reconstruction process,
which is useful for acquisition planning. We evaluate our approach on a range
of methods including standard geometry-plus-texture pipelines as well as
image-based rendering techniques, compare it to existing geometry-based
benchmarks, and demonstrate its utility for a range of use cases.Comment: 10 pages, 12 figures, paper was submitted to ACM Transactions on
Graphics for revie
Deep Bilateral Learning for Real-Time Image Enhancement
Performance is a critical challenge in mobile image processing. Given a
reference imaging pipeline, or even human-adjusted pairs of images, we seek to
reproduce the enhancements and enable real-time evaluation. For this, we
introduce a new neural network architecture inspired by bilateral grid
processing and local affine color transforms. Using pairs of input/output
images, we train a convolutional neural network to predict the coefficients of
a locally-affine model in bilateral space. Our architecture learns to make
local, global, and content-dependent decisions to approximate the desired image
transformation. At runtime, the neural network consumes a low-resolution
version of the input image, produces a set of affine transformations in
bilateral space, upsamples those transformations in an edge-preserving fashion
using a new slicing node, and then applies those upsampled transformations to
the full-resolution image. Our algorithm processes high-resolution images on a
smartphone in milliseconds, provides a real-time viewfinder at 1080p
resolution, and matches the quality of state-of-the-art approximation
techniques on a large class of image operators. Unlike previous work, our model
is trained off-line from data and therefore does not require access to the
original operator at runtime. This allows our model to learn complex,
scene-dependent transformations for which no reference implementation is
available, such as the photographic edits of a human retoucher.Comment: 12 pages, 14 figures, Siggraph 201
The Quanta Image Sensor: Every Photon Counts
The Quanta Image Sensor (QIS) was conceived when contemplating shrinking pixel sizes and storage capacities, and the steady increase in digital processing power. In the single-bit QIS, the output of each field is a binary bit plane, where each bit represents the presence or absence of at least one photoelectron in a photodetector. A series of bit planes is generated through high-speed readout, and a kernel or âcubicleâ of bits (x, y, t) is used to create a single output image pixel. The size of the cubicle can be adjusted post-acquisition to optimize image quality. The specialized sub-diffraction-limit photodetectors in the QIS are referred to as âjotsâ and a QIS may have a gigajot or more, read out at 1000 fps, for a data rate exceeding 1 Tb/s. Basically, we are trying to count photons as they arrive at the sensor. This paper reviews the QIS concept and its imaging characteristics. Recent progress towards realizing the QIS for commercial and scientific purposes is discussed. This includes implementation of a pump-gate jot device in a 65 nm CIS BSI process yielding read noise as low as 0.22 eâ r.m.s. and conversion gain as high as 420 ”V/eâ, power efficient readout electronics, currently as low as 0.4 pJ/b in the same process, creating high dynamic range images from jot data, and understanding the imaging characteristics of single-bit and multi-bit QIS devices. The QIS represents a possible major paradigm shift in image capture
An Asynchronous Linear Filter Architecture for Hybrid Event-Frame Cameras
Event cameras are ideally suited to capture High Dynamic Range (HDR) visual
information without blur but provide poor imaging capability for static or
slowly varying scenes. Conversely, conventional image sensors measure absolute
intensity of slowly changing scenes effectively but do poorly on HDR or quickly
changing scenes. In this paper, we present an asynchronous linear filter
architecture, fusing event and frame camera data, for HDR video reconstruction
and spatial convolution that exploits the advantages of both sensor modalities.
The key idea is the introduction of a state that directly encodes the
integrated or convolved image information and that is updated asynchronously as
each event or each frame arrives from the camera. The state can be read-off
as-often-as and whenever required to feed into subsequent vision modules for
real-time robotic systems. Our experimental results are evaluated on both
publicly available datasets with challenging lighting conditions and fast
motions, along with a new dataset with HDR reference that we provide. The
proposed AKF pipeline outperforms other state-of-the-art methods in both
absolute intensity error (69.4% reduction) and image similarity indexes
(average 35.5% improvement). We also demonstrate the integration of image
convolution with linear spatial kernels Gaussian, Sobel, and Laplacian as an
application of our architecture.Comment: 17 pages, 10 figures, Accepted by IEEE Transactions on Pattern
Analysis and Machine Intelligence (TPAMI) in August 202
Model-Based Environmental Visual Perception for Humanoid Robots
The visual perception of a robot should answer two fundamental questions: What? and Where? In order to properly and efficiently reply to these questions, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling
- âŠ