Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS
Many modern video processing pipelines rely on edge-aware (EA) filtering
methods. However, recent high-quality methods are challenging to run in
real-time on embedded hardware due to their computational load. To this end, we
propose an area-efficient and real-time capable hardware implementation of a
high-quality EA method. In particular, we focus on the recently proposed
permeability filter (PF) that delivers promising quality and performance in the
domains of HDR tone mapping, disparity and optical flow estimation. We present
an efficient hardware accelerator that implements a tiled variant of the PF
with low on-chip memory requirements and a significantly reduced external
memory bandwidth (6.4x w.r.t. the non-tiled PF). The design has been taped out
in 65 nm CMOS technology, is able to filter 720p grayscale video at 24.8 Hz and
achieves a high compute density of 6.7 GFLOPS/mm2 (12x higher than embedded
GPUs when scaled to the same technology node). The low area and bandwidth
requirements make the accelerator highly suitable for integration into SoCs
where silicon area budget is constrained and external memory is typically a
heavily contended resource.
Acceleration of Histogram-Based Contrast Enhancement via Selective Downsampling
In this paper, we propose a general framework to accelerate widely used
histogram-based image contrast enhancement (CE) algorithms. Both spatial and
gray-level selective downsampling of digital images are adopted to decrease
computational cost, while the visual quality of the enhanced images is
preserved without apparent degradation. Mapping function calibration is
proposed to reconstruct the pixel mapping on the gray levels missed by
downsampling. As two case studies, accelerations of histogram equalization (HE)
and the state-of-the-art global CE algorithm, i.e., spatial mutual information
and PageRank (SMIRANK), are presented in detail. Both quantitative and
qualitative assessment results have verified the effectiveness of our proposed
CE acceleration framework. In typical tests, HE and SMIRANK are sped up by
about 3.9 and 13.5 times, respectively. Comment: accepted by IET Image Processin
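
The spatial half of this idea can be illustrated with a minimal sketch: the histogram-equalization mapping is estimated from a subsampled grid of pixels and then applied to every pixel of the full image. This is an assumption-laden illustration, not the paper's implementation: the function names `he_mapping` and `equalize_downsampled` and the `step` parameter are hypothetical, gray-level downsampling is omitted, and the cumulative sum simply carries over levels absent from the sample rather than performing the paper's mapping function calibration.

```python
def he_mapping(pixels, levels=256):
    """Build a histogram-equalization lookup table from a flat list of gray levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    lut, cum = [], 0
    for count in hist:
        cum += count  # running CDF numerator; unchanged for levels not in the sample
        lut.append(round((levels - 1) * cum / total))
    return lut


def equalize_downsampled(image, step=2, levels=256):
    """Estimate the mapping on every `step`-th row and column, apply it to all pixels.

    `image` is a list of rows of integer gray levels in [0, levels).
    `step` is an illustrative assumption, not a value from the paper.
    """
    sample = [p for row in image[::step] for p in row[::step]]
    lut = he_mapping(sample, levels)
    return [[lut[p] for p in row] for row in image]


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
enhanced = equalize_downsampled(image, step=2, levels=4)
```

With `step=2` the histogram is built from a quarter of the pixels, which is where the saving comes from in this sketch; the lookup itself still touches every pixel.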
Contrast Enhancement of Brightness-Distorted Images by Improved Adaptive Gamma Correction
As an efficient image contrast enhancement (CE) tool, adaptive gamma
correction (AGC) was previously proposed by relating the gamma parameter to the
cumulative distribution function (CDF) of the pixel gray levels within an
image. AGC deals well with most dimmed images, but fails for globally bright
images and for dimmed images with local bright regions. These two categories of
brightness-distorted images are common in real scenarios, arising for example
from improper exposure and white object regions. To address these deficiencies,
here we propose an improved AGC algorithm. The novel strategy of negative
images is used to realize CE of the bright images, and the gamma correction
modulated by truncated CDF is employed to enhance the dimmed ones. As such,
local over-enhancement and structure distortion can be alleviated. Both
qualitative and quantitative experimental results show that our proposed method
yields consistently good CE results.
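
A rough sketch of the two strategies named in this abstract follows. It rests on several assumptions that are not taken from the paper: the gamma for each gray level is taken as one minus the CDF, the truncation is modelled as a lower bound `gamma_floor` on that gamma, bright images are detected with a simple mean-brightness test against `bright_threshold`, and the function names are hypothetical. It operates on a flat list of integer gray levels.

```python
def agc_lut(pixels, levels=256, gamma_floor=0.0):
    """Lookup table for CDF-driven gamma correction (gamma = 1 - CDF, floored)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    top = levels - 1
    lut, cum = [], 0
    for level, count in enumerate(hist):
        cum += count
        gamma = max(gamma_floor, 1.0 - cum / total)
        # Keep black at black; avoids 0 ** 0 evaluating to 1.
        scaled = 0.0 if level == 0 else (level / top) ** gamma
        lut.append(round(top * scaled))
    return lut


def enhance(pixels, levels=256, bright_threshold=0.5, gamma_floor=0.5):
    """Brighten dim images directly; process bright images via their negative.

    `bright_threshold` and `gamma_floor` are illustrative assumptions.
    """
    top = levels - 1
    mean = sum(pixels) / (len(pixels) * top)
    if mean > bright_threshold:
        negative = [top - p for p in pixels]      # bright image becomes dim
        lut = agc_lut(negative, levels)
        return [top - lut[q] for q in negative]   # negate back after correction
    lut = agc_lut(pixels, levels, gamma_floor)    # floored gamma limits over-enhancement
    return [lut[p] for p in pixels]


dim_result = enhance([10, 20, 30, 40])
bright_result = enhance([215, 225, 235, 245])
```

Because every gamma lies in [0, 1], the dim branch can only raise gray levels, and the negative-image branch can only lower them, which matches the intended direction of correction in each case.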
Accelerating local laplacian filters on FPGAs
Processing images with various enhancement techniques often leads to edge
degradation and other unwanted artifacts such as halos. These artifacts pose a
major problem for photographic applications, where they can degrade the quality
of an image. A plethora of edge-aware techniques has been proposed in the field
of image processing. However, these require the application of complex
optimization or post-processing methods. Local Laplacian Filtering is an
edge-aware image processing technique that involves the construction of simple
Gaussian and Laplacian pyramids. This technique can be successfully applied for
detail smoothing, detail enhancement, tone mapping and inverse tone mapping of
an image while keeping it artifact-free. The drawback of this approach is that
it is computationally expensive. Hence, parallelization schemes using
multi-core CPUs and GPUs have been proposed. As is well known, these are not
power-efficient, and a well-designed hardware architecture on an FPGA can do
better on the performance-per-watt metric. In this paper, we propose a hardware
accelerator that fully exploits the available parallelism in the Local
Laplacian Filtering algorithm, while minimizing the utilization of on-chip FPGA
resources. On a Virtex-7 FPGA, we obtain a 7.5x speed-up when processing a 1 MB
image compared to an optimized baseline CPU implementation. To the best of our
knowledge, no other hardware accelerator has been proposed in the research
literature for the Local Laplacian Filtering problem. Comment: 6 pages, 5 figures, 2 table
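
The pyramid construction that Local Laplacian Filtering builds on can be sketched on a one-dimensional signal. This is only the pyramid decomposition and its inverse, not the filter or the FPGA architecture described in the abstract. The `[1, 2, 1] / 4` kernel standing in for a Gaussian, the nearest-neighbour upsampling, and all function names are assumptions chosen for brevity.

```python
def blur(signal):
    """Smooth with a [1, 2, 1] / 4 kernel, clamping indices at the borders."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4
        for i in range(n)
    ]


def downsample(signal):
    """One Gaussian-pyramid step: blur, then keep every second sample."""
    return blur(signal)[::2]


def upsample(signal, length):
    """Nearest-neighbour expansion back to `length` samples."""
    return [signal[min(i // 2, len(signal) - 1)] for i in range(length)]


def laplacian_pyramid(signal, depth):
    """Return `depth` detail levels followed by the coarsest smoothed level."""
    levels, current = [], list(signal)
    for _ in range(depth):
        smaller = downsample(current)
        expanded = upsample(smaller, len(current))
        levels.append([c - e for c, e in zip(current, expanded)])
        current = smaller
    levels.append(current)
    return levels


def reconstruct(levels):
    """Invert the pyramid by adding each detail level to the expanded coarser one."""
    current = levels[-1]
    for detail in reversed(levels[:-1]):
        expanded = upsample(current, len(detail))
        current = [d + e for d, e in zip(detail, expanded)]
    return current


signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
pyramid = laplacian_pyramid(signal, depth=2)
restored = reconstruct(pyramid)
```

Each detail level depends only on a small neighbourhood of the level below it, which hints at why the computation lends itself to the parallel hardware the abstract describes.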