Learning Image-Adaptive Codebooks for Class-Agnostic Image Restoration
Recent work on discrete generative priors, in the form of codebooks, has
shown exciting performance for image reconstruction and restoration, as the
discrete prior space spanned by the codebooks increases the robustness against
diverse image degradations. Nevertheless, these methods require training a
separate codebook for each image category, which restricts them to specific
categories (e.g., faces, architecture) and prevents them from handling
arbitrary natural images. In this paper, we propose AdaCode for learning
image-adaptive codebooks for class-agnostic image restoration. Instead of
learning a single codebook for each image category, we learn a set of basis
codebooks. For a given input image, AdaCode learns a weight map with which we
compute a weighted combination of these basis codebooks for adaptive image
restoration. Intuitively, AdaCode is a more flexible and expressive discrete
generative prior than previous work. Experimental results demonstrate that
AdaCode achieves state-of-the-art performance on image reconstruction and
restoration tasks, including image super-resolution and inpainting.
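The weighted-combination idea in this abstract could be sketched roughly as follows. All sizes, variable names, and the random stand-in weight map are illustrative assumptions, not the paper's actual architecture (which predicts the weight map from the input image with a learned network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's settings):
K = 4           # number of basis codebooks
N = 256         # entries per codebook
D = 8           # code dimension
H, W = 16, 16   # spatial size of the latent feature map

# A set of K basis codebooks, each with N entries of dimension D.
basis_codebooks = rng.standard_normal((K, N, D))

# A per-position weight map over the K codebooks; here just random
# softmax weights as a stand-in for the learned prediction.
logits = rng.standard_normal((H, W, K))
weights = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Weighted combination: at each spatial position, blend the K basis
# codebooks into one image-adaptive codebook of shape (N, D).
adaptive_codebook = np.einsum("hwk,knd->hwnd", weights, basis_codebooks)

# Quantization: snap each latent feature vector to the nearest entry
# of that position's adaptive codebook.
features = rng.standard_normal((H, W, D))
dists = np.linalg.norm(adaptive_codebook - features[:, :, None, :], axis=-1)
indices = dists.argmin(axis=-1)  # (H, W) code indices
quantized = np.take_along_axis(
    adaptive_codebook, indices[:, :, None, None], axis=2
).squeeze(2)                     # (H, W, D) quantized features
```

In this toy form, a per-category codebook would be the special case of a one-hot, spatially constant weight map; letting the weights vary per image and per position is what makes the prior adaptive.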
Towards Joint Super-Resolution and High Dynamic Range Image Reconstruction
The main objective of digital image and video camera systems is to reproduce a real-world scene with high visual quality. A crucial aspect in this regard is, naturally, the quality of the hardware components of the camera device. There are, however, always some undesired limitations imposed by the camera's sensor. To begin with, the dynamic range of light intensities that the sensor can capture in its non-saturated region is much smaller than the dynamic range of most common daylight scenes. Secondly, the achievable spatial resolution of the camera is limited, especially for video capture at a high frame rate. Signal processing algorithms can fuse the information from a sequence of images into one enhanced image; thus, the dynamic range limitation can be overcome, and the spatial resolution can be improved.
This thesis discusses different methods that utilize data from a set of multiple images that exhibit photometric diversity, spatial diversity, or both. For the case where the images are differently exposed, photometric alignment is performed prior to reconstructing an image of a higher dynamic range. For the case where there is spatial diversity, a Super-Resolution reconstruction method is applied, in which an inverse problem is formulated and solved to obtain a high-resolution reconstruction. In either case, as well as for the promising combination of the two methods, the problem formulation should consider how the scene information is perceived by humans. Incorporating the properties of the human visual system into novel mathematical formulations for joint high dynamic range and high resolution image reconstruction is the main contribution of the thesis, in particular of the included published papers. The potential usefulness of high dynamic range image reconstruction on the one hand, and Super-Resolution image reconstruction on the other, is demonstrated. Finally, the combination of the two is discussed and results from simulations are given.
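As a concrete illustration of the fusion step the abstract describes, here is a minimal sketch of merging differently exposed frames into one radiance estimate. It assumes a linear camera response and known exposure times; the function name and the hat-shaped weighting are illustrative choices, not the thesis's formulation, which additionally incorporates photometric alignment and human-vision properties:

```python
import numpy as np

def fuse_exposures(images, exposure_times):
    """Fuse differently exposed LDR frames into one HDR radiance estimate.

    Assumes a linear camera response and known per-frame exposure times;
    real pipelines also need photometric alignment and response-curve
    calibration, which are omitted here.
    """
    images = np.asarray(images, dtype=np.float64)  # (F, H, W), values in [0, 1]
    times = np.asarray(exposure_times, dtype=np.float64)

    # Hat-shaped confidence: trust mid-tone pixels, distrust pixels near
    # the saturated or noisy ends of the sensor range.
    w = 1.0 - np.abs(2.0 * images - 1.0)
    w = np.maximum(w, 1e-6)  # avoid division by zero

    # Each frame's radiance estimate is pixel value / exposure time;
    # combine the frames with the per-pixel confidence weights.
    radiance = images / times[:, None, None]
    return (w * radiance).sum(axis=0) / w.sum(axis=0)

# Two simulated exposures of the same constant-radiance scene: the short
# exposure is well exposed, the long one saturates and is down-weighted.
hdr = fuse_exposures(
    [np.full((4, 4), 0.5), np.full((4, 4), 1.0)],
    exposure_times=[1.0, 2.0],
)
```

The weighting is what lets a frame contribute only where its sensor values are reliable, which is the mechanism by which the fused image exceeds any single frame's dynamic range.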
A Reverse Hierarchy Model for Predicting Eye Fixations
Psychological and physiological evidence suggests that early visual
attention works in a coarse-to-fine way, which lays the basis for the
reverse hierarchy theory (RHT). This theory states that attention propagates
from the top level of the visual hierarchy that processes gist and abstract
information of input, to the bottom level that processes local details.
Inspired by the theory, we develop a computational model for saliency detection
in images. First, the original image is downsampled to different scales to
constitute a pyramid. Then, saliency on each layer is obtained by image
super-resolution reconstruction from the layer above, which is defined as
unpredictability from this coarse-to-fine reconstruction. Finally, saliency on
each layer of the pyramid is fused into stochastic fixations through a
probabilistic model, where attention initiates from the top layer and
propagates downward through the pyramid. Extensive experiments on two standard
eye-tracking datasets show that the proposed method can achieve competitive
results with state-of-the-art models.
Comment: CVPR 2014, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
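The coarse-to-fine pipeline this abstract describes could be sketched minimally as follows. Nearest-neighbor upsampling stands in for the paper's super-resolution reconstruction, the probabilistic fixation model is omitted, and all function names are illustrative assumptions:

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling: one step down the image pyramid."""
    h, w = img.shape
    return img[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2x(img):
    """Nearest-neighbor 2x upsampling: a crude stand-in for the paper's
    super-resolution reconstruction from the coarser layer."""
    return np.kron(img, np.ones((2, 2)))

def saliency_pyramid(image, levels=3):
    """Per-layer saliency as unpredictability: how much each layer
    deviates from what the layer above predicts for it."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    saliency = []
    for fine, coarse in zip(pyramid[:-1], pyramid[1:]):
        predicted = upsample2x(coarse)[: fine.shape[0], : fine.shape[1]]
        saliency.append(np.abs(fine - predicted))
    return saliency

# A flat image with one small bright blob that straddles pyramid blocks:
# the blob is poorly predicted from the coarse layer, so it is salient.
img = np.zeros((16, 16))
img[5:7, 5:7] = 1.0
maps = saliency_pyramid(img)
```

Large uniform regions are perfectly predictable from the coarse layer and so score zero, while structure that only exists at the fine scale survives as "unpredictability", matching the reverse-hierarchy intuition of attention flowing from gist to detail.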