Deep Bilateral Learning for Real-Time Image Enhancement
Performance is a critical challenge in mobile image processing. Given a
reference imaging pipeline, or even human-adjusted pairs of images, we seek to
reproduce the enhancements and enable real-time evaluation. For this, we
introduce a new neural network architecture inspired by bilateral grid
processing and local affine color transforms. Using pairs of input/output
images, we train a convolutional neural network to predict the coefficients of
a locally-affine model in bilateral space. Our architecture learns to make
local, global, and content-dependent decisions to approximate the desired image
transformation. At runtime, the neural network consumes a low-resolution
version of the input image, produces a set of affine transformations in
bilateral space, upsamples those transformations in an edge-preserving fashion
using a new slicing node, and then applies those upsampled transformations to
the full-resolution image. Our algorithm processes high-resolution images on a
smartphone in milliseconds, provides a real-time viewfinder at 1080p
resolution, and matches the quality of state-of-the-art approximation
techniques on a large class of image operators. Unlike previous work, our model
is trained off-line from data and therefore does not require access to the
original operator at runtime. This allows our model to learn complex,
scene-dependent transformations for which no reference implementation is
available, such as the photographic edits of a human retoucher.
Comment: 12 pages, 14 figures, Siggraph 201
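The runtime steps described in this abstract (slice a low-resolution grid of affine coefficients, then apply them to the full-resolution input) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1-D grayscale signal instead of a color image, a grid of hypothetical (scale, offset) pairs instead of full affine color matrices, and the input intensity itself as the guide for the range axis. All function names are illustrative.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def slice_grid(grid, gx, gz):
    """Bilinearly interpolate a (scale, offset) pair from grid[x][z]
    at fractional grid coordinates (gx, gz)."""
    nx, nz = len(grid), len(grid[0])
    gx = min(max(gx, 0.0), nx - 1.0)
    gz = min(max(gz, 0.0), nz - 1.0)
    x0, z0 = int(gx), int(gz)
    x1, z1 = min(x0 + 1, nx - 1), min(z0 + 1, nz - 1)
    tx, tz = gx - x0, gz - z0
    coeffs = []
    for k in range(2):  # k = 0: scale, k = 1: offset
        low = lerp(grid[x0][z0][k], grid[x0][z1][k], tz)
        high = lerp(grid[x1][z0][k], grid[x1][z1][k], tz)
        coeffs.append(lerp(low, high, tx))
    return coeffs

def apply_grid(signal, grid):
    """Apply per-sample affine transforms sliced from a coarse grid.

    signal: intensities in [0, 1] (assumed 1-D stand-in for an image).
    grid:   grid[x][z] = (scale, offset), x = spatial cell, z = intensity bin.
    """
    nx, nz = len(grid), len(grid[0])
    n = len(signal)
    out = []
    for i, v in enumerate(signal):
        gx = i / max(n - 1, 1) * (nx - 1)  # position along the spatial axis
        gz = v * (nz - 1)                  # position along the intensity axis
        scale, offset = slice_grid(grid, gx, gz)
        out.append(scale * v + offset)
    return out
```

Because the lookup depends on intensity as well as position, samples on opposite sides of a strong edge read different coefficients, which is what makes this kind of upsampling edge-preserving.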
Dense semantic labeling of sub-decimeter resolution images with convolutional neural networks
Semantic labeling (or pixel-level land-cover classification) in ultra-high
resolution imagery (< 10cm) requires statistical models able to learn high
level concepts from spatial data, with large appearance variations.
Convolutional Neural Networks (CNNs) achieve this goal by learning
discriminatively a hierarchy of representations of increasing abstraction.
In this paper we present a CNN-based system relying on a
downsample-then-upsample architecture. Specifically, it first learns a rough
spatial map of high-level representations by means of convolutions and then
learns to upsample them back to the original resolution by deconvolutions. By
doing so, the CNN learns to densely label every pixel at the original
resolution of the image. This results in many advantages, including i)
state-of-the-art numerical accuracy, ii) improved geometric accuracy of
predictions and iii) high efficiency at inference time.
We test the proposed system on the Vaihingen and Potsdam sub-decimeter
resolution datasets, involving semantic labeling of aerial images of 9cm and
5cm resolution, respectively. These datasets are composed of many large and
fully annotated tiles allowing an unbiased evaluation of models making use of
spatial information. We do so by comparing two standard CNN architectures to
the proposed one: standard patch classification, prediction of local label
patches by employing only convolutions and full patch labeling by employing
deconvolutions. All the systems compare favorably with or outperform a
state-of-the-art baseline relying on superpixels and powerful appearance
descriptors. The proposed full patch labeling CNN outperforms these models by a
large margin, also showing a very appealing inference time.
Comment: Accepted in IEEE Transactions on Geoscience and Remote Sensing, 201
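To see why a downsample-then-upsample design can label every pixel at the original resolution, it helps to trace spatial sizes through the layers. The sketch below uses the standard output-size formulas for strided convolution and transposed convolution (deconvolution) without dilation or output padding. The layer hyperparameters and the 64-pixel patch size are hypothetical and are not taken from the paper.

```python
def conv_out(n, kernel, stride, pad):
    """Output length of a strided convolution along one axis."""
    return (n + 2 * pad - kernel) // stride + 1

def deconv_out(n, kernel, stride, pad):
    """Output length of a transposed convolution along one axis."""
    return (n - 1) * stride - 2 * pad + kernel

def trace_sizes(n, down_layers, up_layers):
    """Return the spatial size after each layer, starting with the input."""
    sizes = [n]
    for kernel, stride, pad in down_layers:
        n = conv_out(n, kernel, stride, pad)
        sizes.append(n)
    for kernel, stride, pad in up_layers:
        n = deconv_out(n, kernel, stride, pad)
        sizes.append(n)
    return sizes

# Hypothetical stack: two stride-2 convolutions, then two stride-2 deconvolutions.
down = [(3, 2, 1), (3, 2, 1)]
up = [(4, 2, 1), (4, 2, 1)]
sizes = trace_sizes(64, down, up)
print(sizes)
```

With these assumed settings the patch shrinks from 64 to 16 pixels per side and is restored to 64, so the final layer emits one label per input pixel.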
Acceleration of Histogram-Based Contrast Enhancement via Selective Downsampling
In this paper, we propose a general framework to accelerate the universal
histogram-based image contrast enhancement (CE) algorithms. Both spatial and
gray-level selective downsampling of digital images are adopted to decrease
computational cost, while the visual quality of the enhanced images is
preserved without apparent degradation. A mapping function calibration step is
proposed to reconstruct the pixel mapping on the gray levels missed by
downsampling. As two case studies, accelerations of histogram equalization (HE)
and the state-of-the-art global CE algorithm, i.e., spatial mutual information
and PageRank (SMIRANK), are presented in detail. Both quantitative and
qualitative assessment results verify the effectiveness of the proposed
CE acceleration framework. In typical tests, HE and SMIRANK are sped up by
factors of about 3.9 and 13.5, respectively.
Comment: accepted by IET Image Processin
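The spatial half of this idea can be sketched for plain histogram equalization: estimate the gray-level mapping from a subsampled grid of pixels, then apply it to every pixel. This is only an illustration under simplifying assumptions. It uses uniform subsampling with a hypothetical `step` parameter rather than the paper's selective scheme, it omits gray-level downsampling, and it does not implement the paper's mapping function calibration (gray levels absent from the sample simply inherit the cumulative value of the levels below them).

```python
def equalization_map(pixels, levels=256):
    """Build an HE lookup table (gray level -> gray level) from sample pixels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    mapping, cumulative = [], 0
    for count in hist:
        cumulative += count
        mapping.append(round((levels - 1) * cumulative / total))
    return mapping

def fast_equalize(image, step=2, levels=256):
    """Equalize `image` (list of rows of ints) using a mapping estimated
    from every `step`-th pixel in each direction."""
    sample = [p for row in image[::step] for p in row[::step]]
    mapping = equalization_map(sample, levels)
    return [[mapping[p] for p in row] for row in image]

# Usage: a small low-contrast test image with gray levels 0..30.
image = [[(x + y) % 256 for x in range(16)] for y in range(16)]
enhanced = fast_equalize(image, step=2)
```

With `step=2` the histogram is built from a quarter of the pixels, while the table lookup still touches every pixel; the savings therefore apply to the mapping estimation, which is the expensive part for costlier algorithms.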
Apparent sharpness of 3D video when one eye's view is more blurry.
When the images presented to each eye differ in sharpness, the fused percept remains relatively sharp. Here, we measure this effect by showing stereoscopic videos that have been blurred for one eye, or both eyes, and psychophysically determining when they appear equally sharp. For a range of blur magnitudes, the fused percept always appeared significantly sharper than the blurrier view. From these data, we investigate to what extent discarding high spatial frequencies from just one eye's view reduces the bandwidth necessary to transmit perceptually sharp 3D content. We conclude that relatively high-resolution video transmission has the most potential benefit from this method.