DRCAS: Deep Restoration Network for Hardware Based Compressive
  Acquisition Scheme by Gupta, Pravir Singh et al.
Pravir Singh Gupta1, Xin Yuan2, Gwan Seong Choi1
Texas A&M University, College Station, Texas, USA1
Nokia-Bell Labs, Murray Hill, New Jersey, USA2
pravir@tamu.edu, xyuan@bell-labs.com, gwanchoi@tamu.edu
Abstract
We investigate the power and performance improvement
in image acquisition devices by the use of CAS (Com-
pressed Acquisition Scheme) and DNN (Deep Neural Net-
works). Towards this end, we propose a novel image acqui-
sition scheme HCAS (Hardware based Compressed Acqui-
sition Scheme) using hardware-based binning (downsam-
pling), bit truncation and JPEG compression and develop
a deep learning based reconstruction network for images
acquired using the same. HCAS is motivated by the fact
that in-situ compression of raw data using binning and bit
truncation results in reduction in data traffic and power in
the entire downstream image processing pipeline and addi-
tional compression of processed data using JPEG will help
in storage/transmission of images. The combination of in-
situ compression with JPEG leads to high compression ra-
tios, significant power savings with further advantages of
image acquisition simplification. Bearing these concerns in
mind, we propose DRCAS (Deep Restoration network for
hardware based Compressed Acquisition Scheme), which, to
the best of our knowledge, is the first work proposed in the
literature for restoration of images acquired using an
acquisition scheme like HCAS. Compared with the CAS method
(bicubic downsampling) used for super-resolution tasks in
the literature, the HCAS proposed in this paper is superior
in terms of both compression ratio and hardware
friendliness. The restoration network DRCAS also performs
better than state-of-the-art super-resolution networks while
being much smaller. Thus the HCAS and DRCAS techniques will
enable us to design much simpler and more power-efficient
image acquisition pipelines.
1. Introduction
We are living in a multimedia world. With the advent of
the Internet and mobile devices, the amount of multimedia
content generated by users is increasing at a tremendous rate.
In addition, with improvements in VLSI (Very Large
Scale Integration) technology, the resolution of image sensors
is also increasing. Smartphones with image sensor resolution
greater than 20 Megapixels are commonly used, and
image sensor vendors also offer sensors in the range above 40
Megapixels. However, Moore's law has lately started
to saturate, and hence there is increasing pressure to extract
performance improvements from architectural and algorithmic
innovations rather than device scaling. Videos and images
present a huge burden on processing, storage and transmission
networks. They also present a challenge in terms of
power consumption in image acquisition devices such as
mobile devices.

Figure 1. Proposed Image Acquisition Methodology: It consists
of 6 stages. In stages 1-3 (HCAS), an image gets compressed
using downsampling, bit truncation and JPEG. Stage 4 represents
transmission of the image, which can be over a wireless medium or an
on-chip bus, or even storage. Stage 5 performs JPEG decompression
and stage 6 consists of the proposed DRCAS to restore the desired
image.
On the other hand, advanced machine learning (ML)
techniques, especially deep learning, perform much better
than traditional computer vision techniques for tasks like
super-resolution and object detection. Inspired by this, this
paper exploits the power of deep learning techniques to reduce
the data traffic in the image acquisition pipeline, which
will make it easier to meet power and performance
requirements [5, 21]. Motivated by the observation that deep
neural networks (DNN) can address these problems of imaging
systems if the hardware constraints of the imaging system are
taken into consideration, we propose HCAS (Hardware based
Compressed Acquisition Scheme) and a reconstruction
network for it. The proposed HCAS consists of a
combination of binning (downsampling), bit truncation and
JPEG compression. To reconstruct the original
image, we propose DRCAS, a DNN based reconstruction
network that performs image restoration by super-resolution
(to restore the loss of resolution caused by binning)
and artifact removal (for artifacts caused by bit truncation and
JPEG compression).

arXiv:1909.10136v2 [eess.IV] 15 Nov 2019

Figure 2. Left: 0703 image from the DIV2K dataset [2]. Downsampling
is performed via 2×2 binning, with bit truncation such
that B.W. (bit width) = 7 and JPEG compression at quality
Q = 90. Right: (i) HR patch from the left image, (ii) reconstruction
using our proposed DRCAS, (iii) reconstruction using
Bicubic interpolation, and (iv) reconstruction with EDSR [10].
The proposed acquisition scheme HCAS, which differs from
the setting of existing super-resolution networks, is shown in
Fig. 1, and one exemplar reconstruction is shown in Fig. 2.
While DNNs have been used for tasks like super-resolution
[4, 7, 9, 23, 24] and image denoising [28], this work
is novel in the following respects:
i) We propose a new image acquisition framework,
HCAS, (Fig. 1), which is compression based. It uses
realistic and hardware based compression schemes
from imaging system perspective like binning, bit trun-
cation and JPEG compression. It performs compres-
sion on entire image acquisition pipeline i.e. from raw
data at source (image sensor) to processed data (using
JPEG).
ii) The downsampling operation in our framework is averaging
and rounding instead of bicubic downsampling, which is more
popular for super-resolution tasks [21]. Again, our
method is more realistic, as the averaging operation is easy
to implement in hardware, especially at the image sensor
level using the pixel binning technique available in
commercial image sensors.
iii) By performing in-situ compression on raw data using
binning and bit truncation, HCAS achieves power
savings in downstream power hungry components like
ADCs (Analog to Digital Converters) and DSP units.
Following this, our proposed sensing framework HCAS
in Fig. 1 results in a significant reduction in the raw data rate, i.e.
the data generated by the image sensor, as well as in the final image
size. This has direct potential for power savings, transmission
bandwidth savings and simplified hardware implementation.
One can argue that instead of downsampling one can
use a low resolution image sensor itself. This argument would
hold only if the reconstruction or super-resolution process
were exact, which is not possible. Downsampling using
binning on a high resolution sensor instead gives the user the
option to choose between low resolution and full resolution
acquisition depending upon requirements. There are some other works in
the literature which have proposed pixel bit depth enhancements
[11, 12, 18]. However, these works were targeted at
converting low bit depth images for high bit depth displays.
Another work proposed super-resolution and bit depth
enhancement [22]; however, the authors used the A+ (Adjusted
anchored neighborhood regression) method [20]
for super-resolution instead of neural networks. Some
works have also focused on denoising and compression
artifact removal tasks [14, 27, 28]. To the best of our
knowledge, this is the first work to investigate DNN based
image restoration considering the combination of downsampling,
bit truncation and JPEG for compression and power
savings.
2. Background
In this section, we review the background of the building
blocks of the proposed image acquisition methodology in
Fig. 1. By introducing image sensors in Sec. 2.1, we
explain why binning is preferred in hardware. By analyzing
power consumption in Sec. 2.2, we explain
how super-resolution can be used to save power in the device.
JPEG compression is reviewed in Sec. 2.3, followed by
transmission and storage considerations
in Sec. 2.4. Bicubic interpolation, a widely used interpolation
approach, is reviewed in Sec. 2.5, and the residual
network, which is used in our design, is introduced in
Sec. 2.6.
2.1. Image sensors
An image sensor converts light intensity into electrical
signals. It consists of a rectangular grid of pixels. Typically
pixels are addressed in a row-wise fashion, i.e. all pixels in
the same row are addressed simultaneously. The addressed
row outputs its signals on column lines, which are connected
to an Analog-to-Digital Converter (ADC). The ADC produces
the digitized version of the image. The number of digitized
bits depends upon the resolution of the image sensor. A simple
schematic of an image sensor is shown in Fig. 3.

Figure 3. A simple schematic of image sensor.
Most commercially available image sensors have a feature
called "binning", which simply means combining the
information of a group of neighboring pixels to form a
single pixel in the output image. Binning can be either an
addition operation or an averaging operation on image pixels,
and it results in a loss of spatial resolution by a factor equal
to the number of pixels binned together. Addition
has an advantage in low light situations as it increases
the low light sensitivity of the image sensor [6]; on the other
hand, averaging has an advantage in common lighting
conditions as it prevents saturation of pixels in that
scenario [29]. Binning has been used successfully in various
applications such as low light imaging [6], background noise
suppression [3], power reduction [8], and multi-resolution
imaging [26, 29]. Since binning reduces the number of samples,
it results in a significant reduction in power consumption in
an image sensor, as the ADC is responsible for a major share
of the power consumption [15]. By decreasing the bit resolution
of the ADC one can reduce its power consumption exponentially.
This is because noise and linearity requirements are relaxed
at smaller bit resolutions [25]. In this paper, we use 2 × 1,
2 × 2 and 4 × 4 binning modes with averaging.
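As an illustration of averaging-based binning, the following is a minimal software sketch, not the sensor-level implementation. The function name `bin_average`, the interpretation of "2 × 1" as two rows by one column, and the round-half-up rule are assumptions made for this example.

```python
from math import floor

def bin_average(image, bin_rows, bin_cols):
    """Average non-overlapping bin_rows x bin_cols blocks and round to integers."""
    height, width = len(image), len(image[0])
    if height % bin_rows or width % bin_cols:
        raise ValueError("image size must be a multiple of the bin size")
    binned = []
    for top in range(0, height, bin_rows):
        row = []
        for left in range(0, width, bin_cols):
            block_sum = sum(
                image[r][c]
                for r in range(top, top + bin_rows)
                for c in range(left, left + bin_cols)
            )
            # Round half up (an assumption; avoids Python's round-half-to-even).
            row.append(floor(block_sum / (bin_rows * bin_cols) + 0.5))
        binned.append(row)
    return binned

# Hypothetical 4 x 4 single-channel image with 8-bit values.
image = [
    [10, 12, 200, 202],
    [14, 16, 204, 206],
    [50, 50, 100, 101],
    [50, 50, 102, 103],
]
binned_2x2 = bin_average(image, 2, 2)   # 4 pixels -> 1 pixel
binned_2x1 = bin_average(image, 2, 1)   # 2 pixels -> 1 pixel
```

With 2 × 2 binning the 16 input samples become 4 output samples, which is the reduction in sample count that the ADC and the downstream pipeline benefit from.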
2.2. Power and performance analysis of digital cir-
cuits
Power consumption is one of the major concerns in digital
circuits, especially in mobile devices. There are two main
independent components of power consumption/dissipation
in a digital circuit: static power and dynamic power.
Hence total power consumption can be written as
$$P_{\text{total}} = P_{\text{static}} + P_{\text{dynamic}}. \tag{1}$$
Static power is the power consumed when there is no ac-
tivity in a digital circuit and Dynamic power is the power
consumed due to switching signals in a digital circuit. Dy-
namic power can be further broken down as
$$P_{\text{dynamic}} = P_{\text{switching}} + P_{\text{short circuit}}, \tag{2}$$
where $P_{\text{switching}}$ refers to the power required to charge
or discharge the switching nodes in a digital circuit and
$P_{\text{short circuit}}$ refers to the transient power consumption due to
short circuit current when a gate switches from one state to
another.
$$P_{\text{switching}} = \alpha C V^{2} F, \tag{3}$$
where α is the switching activity factor, i.e., the number of times
a signal switches from 0 to 1 per cycle, F denotes the frequency,
V represents the voltage and C denotes the switched
capacitance at the node. $P_{\text{switching}}$ is one of the major
concerns when it comes to power consumption in a digital circuit.
One can see from Eq. (3) that power consumption is
linearly proportional to the frequency F and quadratically
proportional to the voltage V. In general, the frequency of a digital
system is determined by the data processing requirements, i.e.,
how much data must be processed per second. Thus if one
wants to process half the amount of data in the same time,
one can halve the operating frequency. Voltage and frequency
in Eq. (3) are not independent quantities but follow
a proportional relationship. If the operating frequency is
increased, the operating voltage must be increased, and vice
versa, due to device physics and noise margin requirements.
Therefore, if there is a reduction in the data to be processed,
one can decrease voltage and frequency to achieve a
quadratic reduction in energy consumption. When this
reduction in voltage and frequency is performed on the fly
depending upon the data processing requirement, it is called
DVFS (Dynamic Voltage Frequency Scaling).
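The scaling argument around Eq. (3) can be checked numerically with a small sketch. The operating point below (activity factor, capacitance, voltage, frequency) is hypothetical and chosen only for illustration, and the voltage is assumed to scale exactly in proportion to the frequency.

```python
def switching_power(alpha, capacitance, voltage, frequency):
    """Eq. (3): P_switching = alpha * C * V^2 * F."""
    return alpha * capacitance * voltage ** 2 * frequency

# Hypothetical operating point (illustrative values, not from the paper).
alpha, cap = 0.25, 1e-9        # activity factor, switched capacitance in farads
v_nom, f_nom = 1.0, 200e6      # volts, hertz

p_nominal = switching_power(alpha, cap, v_nom, f_nom)

# Half the data in the same time: halve the frequency only.
p_half_f = switching_power(alpha, cap, v_nom, f_nom / 2)

# DVFS: assume the voltage is lowered in proportion to the frequency.
p_dvfs = switching_power(alpha, cap, v_nom / 2, f_nom / 2)

# Energy per clock cycle is power divided by frequency, i.e. alpha * C * V^2.
e_cycle_nominal = p_nominal / f_nom
e_cycle_dvfs = p_dvfs / (f_nom / 2)

print(p_half_f / p_nominal)            # linear saving from frequency alone
print(e_cycle_dvfs / e_cycle_nominal)  # quadratic saving in energy per cycle
```

Under these assumptions, halving the frequency alone halves the switching power, while halving both voltage and frequency reduces the energy per cycle to a quarter.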
Another important factor when designing a digital system
is the bitwidth of the data. If the datapath is serial, a larger
bitwidth implies a longer time to transfer data, and if it is
parallel, it implies a wider data bus. A larger bitwidth also implies larger
arithmetic circuits such as adders and multipliers, or multiple
clock cycles of operation. For instance, a single 4-bit
adder can add two 8-bit numbers in 2 cycles, or two 4-bit
adders can add two 8-bit numbers in one cycle. Wider arithmetic
increases the delay of the most critical path, i.e. the slowest path in the
circuit, resulting in slower operation. To speed it up, one might
have to use faster and more power hungry circuits.
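The 4-bit adder example can be made concrete with a behavioural sketch. The function names `add4` and `add8_serial` are hypothetical; the model only mirrors the cycle count argument and is not a circuit description.

```python
def add4(a, b, carry_in=0):
    """Behavioural model of a 4-bit adder: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4

def add8_serial(x, y):
    """Add two 8-bit numbers by reusing one 4-bit adder over two cycles."""
    low, carry = add4(x & 0xF, y & 0xF)         # cycle 1: low nibbles
    high, carry = add4(x >> 4, y >> 4, carry)   # cycle 2: high nibbles plus carry
    cycles_used = 2
    return (carry << 8) | (high << 4) | low, cycles_used

result, cycles = add8_serial(0x9C, 0x7B)
print(result == 0x9C + 0x7B, cycles)
```

Truncating pixels from 8 bits to, say, 5 bits shortens such carry chains or reduces the number of cycles needed, which is the effect the paragraph above refers to.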
While there are many other factors impacting power and
performance, a detailed analysis of them is beyond the
scope of this work, and only the relevant issues are discussed
here.
2.3. JPEG
After the sensor captures the image, the image is compressed
by some method. JPEG (Joint Photographic Experts
Group) is one of the most widely used lossy compression
techniques for images. Though JPEG can perform both
lossy and lossless compression, the former is popular
because it causes little loss of perceptual quality. This is because most
of the image information is contained in very few coefficients
in the discrete cosine transform (DCT) domain, and
hence insignificant coefficients can be discarded without
much loss in perceptual quality, producing large compression
ratios. One can also control the amount of loss (and
hence compression) using the 'Quality' parameter of
JPEG. It ranges from 1 to 100, with 100 giving the best image
quality (and the least compression) achievable in lossy mode. JPEG generally performs
the DCT on blocks of 8 × 8 pixels, followed by quantization
of the DCT coefficients, which constitutes the lossy step. After
quantization, entropy coding such as Huffman coding and
run-length coding is performed. The DCT is generally the most energy
consuming part of JPEG compression [16]. Thus a reduction
in image data will lead to a reduction in the energy consumed in
JPEG compression by a similar factor.
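To illustrate why few DCT coefficients carry most of the information, here is a minimal sketch of an 8 × 8 DCT followed by quantization. It uses a direct (slow) orthonormal DCT-II and a single uniform quantization step; real JPEG encoders use a per-frequency quantization table derived from the Quality parameter, so `step=16` and the test block are purely hypothetical.

```python
import math

N = 8  # JPEG block size

def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct O(N^4) evaluation)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = scale(u) * scale(v) * acc
    return coeffs

def quantize(coeffs, step):
    """Uniform quantization (assumption: JPEG uses a table, not one step)."""
    return [[int(round(c / step)) for c in row] for row in coeffs]

# A smooth horizontal ramp, level-shifted by 128 as JPEG does before the DCT.
block = [[(100 + 4 * y) - 128 for y in range(N)] for x in range(N)]
quantized = quantize(dct_2d(block), step=16)
nonzero = sum(1 for row in quantized for q in row if q != 0)
print(nonzero, "of", N * N, "coefficients survive quantization")
```

For smooth content like this ramp, only a handful of low-frequency coefficients remain non-zero, which is what the entropy coder then exploits.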
2.4. Consideration in transmission and storage
After the image is compressed, e.g. via JPEG, into a bit
stream, for wireless transmission the data is encoded using
ECC (Error Correction Coding) schemes to tolerate the
errors caused by channel noise.
ECC schemes add redundancy to the data in order to
detect and correct errors. One of the most popular ECCs
is LDPC (Low Density Parity Check) codes. It is used
in both storage (e.g. Solid State Drives [13]) and wireless
transmission (e.g. 5G specifications [1]). LDPC uses parity
check bits to detect and correct errors. A simple schematic
of a transmitted data packet is shown in Fig. 4. It consists of
the message packet plus ECC bits (Fig. 4(a)). If the channel is
noisy, the transmitter needs to encode the message
more strongly (Fig. 4(b)). Thus, for a fixed transmitted
data packet size, the actual message packet becomes
smaller, resulting in reduced message bandwidth. If the
message data can be compressed, one
can either send more message bits per data packet, increasing
the message transmission bandwidth, or
encode the message bits more aggressively to make them more
resistant to channel noise while keeping the message transmission
bandwidth the same (Fig. 4(b) and (c)). For storage,
the analysis is analogous to that of the wireless channel.
2.5. Bicubic interpolation
After the user receives the data and performs decoding
and JPEG decompression, there is often a desire to obtain a
high-resolution image from the received low-resolution
image captured by the sensor. Bicubic interpolation, an
extension of the cubic interpolation method, is
widely employed as a baseline for super-resolution. It
considers a rectangular grid of 4 × 4 pixels and fits a third
order polynomial surface, p(x, y), as follows:
$$p(x, y) = \sum_{r=0}^{3} \sum_{c=0}^{3} a_{rc} x^{r} y^{c}, \tag{4}$$
where $a_{rc}$ are the coefficients to be determined. Bicubic
interpolation produces smooth images and is a popular benchmark
for evaluating the performance of super-resolution
algorithms [21].

Figure 4. Packet size analysis in wireless transmission.
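As a small illustration of Eq. (4), the sketch below evaluates the polynomial surface for a given set of 16 coefficients. The coefficient values are hypothetical, and the step of fitting them to a 4 × 4 pixel neighbourhood is deliberately left out.

```python
def bicubic_surface(coeffs, x, y):
    """Evaluate p(x, y) = sum_{r=0..3} sum_{c=0..3} a[r][c] * x**r * y**c."""
    return sum(
        coeffs[r][c] * x ** r * y ** c
        for r in range(4)
        for c in range(4)
    )

# Hypothetical coefficients for p(x, y) = 1 + 2x + 3y + 0.5 * x^2 * y.
a = [[0.0] * 4 for _ in range(4)]
a[0][0], a[1][0], a[0][1], a[2][1] = 1.0, 2.0, 3.0, 0.5

value = bicubic_surface(a, 0.5, 2.0)
print(value)
```

In an interpolator, the surface would be evaluated at the fractional positions of the new pixels that fall between the original samples.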
2.6. Deep residual networks
With the recent advances in ML, deep learning based
algorithms have demonstrated better performance than
conventional methods for image super-resolution. Most of
them are based on convolutional neural networks (CNNs),
which apply a convolution operation to the input data
followed by an activation function to produce the output.
To improve the training of CNNs, Residual Neural Networks
(ResNets) were first introduced by He et al. in [5]. To
understand ResNets, let us denote the underlying mapping
between the input (x) and the output of the network as H(x). Then
the residual mapping can be defined as
$$F(x) = H(x) - x. \tag{5}$$
Simply speaking, the residual mapping is the difference
between the expected output and the input of the network. The original
mapping can now be defined in terms of the residual mapping
as
$$H(x) = F(x) + x. \tag{6}$$
A simple graph of a residual network is shown in Fig. 5.
ResNets perform better because it is easier to optimize
the residual mapping than the original one [5]. There is ample
evidence in the literature indicating that network depth is of
crucial importance and that deeper networks in general achieve
better results [17, 19]. With ResNets it becomes easier to
train big networks.
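Eq. (6) can be illustrated with a toy skip connection on plain Python lists. The residual branch `toy_residual` is a hypothetical stand-in for the convolutional layers of a real block; it is not the block used in this paper.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def residual_block(x, residual_fn):
    """Return H(x) = F(x) + x, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]

def toy_residual(x):
    # Hypothetical residual branch: a fixed scaling followed by ReLU.
    return relu([0.1 * v for v in x])

x = [1.0, -2.0, 3.0]
h = residual_block(x, toy_residual)

# If the residual branch outputs zeros, the block reduces to the identity.
identity = residual_block(x, lambda v: [0.0] * len(v))
```

The second call shows why the residual form is convenient to optimize: a block that has nothing useful to add only needs to drive F(x) towards zero.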
3. Proposed acquisition and restoration
methodology under hardware constraints.
Bearing the above hardware constraints in mind, in this
paper we propose a new framework for image sensing and
restoration that uses a deep learning based technique to
reconstruct the desired image.

Figure 5. Residual Network [5]
3.1. HCAS
As mentioned before, a simple schematic of the proposed
sensing/acquisition methodology using HCAS (Hardware
based Compressed Acquisition Scheme) is shown in Fig. 1.
In this work, simulations using clean images are performed
to mimic the proposed sensing methodology and verify the
concept. It is expected that reconstruction performance will
remain approximately the same in a real system.
The entire imaging pipeline consists of 6 stages, of which
HCAS constitutes the first 3. In stages 1-3, the image is
compressed using downsampling, bit truncation and JPEG.
Stages 1-2 can be performed on the image sensor itself.
Stage 3 happens on a JPEG chip or a Digital Signal Processor
(DSP). Stage 4 represents transmission of the image, which
can be over a wireless medium or an on-chip bus, or even storage.
Stage 5 can happen on a DSP in the device itself,
or it can be combined with stage 6 and happen
in the cloud or on the ML processor of the image acquisition
device. The idea of this work is to save energy
during acquisition, i.e., from stage 1 to stage 5, as these
processes often consume a significant amount of power in edge
devices, whereas stage 6 is not required unless a user is viewing
the image (e.g. on smart phones or surveillance cameras)
or a computer program is operating on the images (e.g. object
recognition). For some applications, like a drone transmitting
surveillance footage to a base station, power consumption
in stage 6 is not an issue. For stage 6, we propose DRCAS
for image reconstruction. The subsequent paragraphs
describe each process in more detail.
In stage 1, the image is downsampled using simple binning
(averaging) of pixels. The number of pixels that
are averaged depends on the downscaling factor. As mentioned
before, while this operation happens on the raw image, this
work simulates the process on clean images for the sake
of simplicity. Further lossy compression is performed using
bit truncation in stage 2. We perform bit truncation
in the following way:
$$\text{Truncated Pixel} = \mathrm{round}\left((\text{averaged pixel}) \times 2^{-N}\right),$$
where $N$ is the number of bits to be truncated. In this work
$N$ is in the range [0, 3]. Since the truncated pixel is not
multiplied by $2^N$ after the rounding operation, the image
appears darker after truncation for $N > 0$. Simply
speaking, we right-shift the pixel bits by the amount we
want to truncate, which makes the image appear dark. Thus
the swing of the pixel values is also reduced by a factor of
$2^N$. This makes the image more compressible using JPEG
in stage 3, because JPEG compression loses more LSBs than
MSBs. This also means that there are
more artifacts induced by lossy compression. The JPEG quality
is varied in the range [70, 100] in our experiments. Currently,
we assume perfect transmission of the image in stage 4.
When the JPEG image is decompressed in stage 5, it is
multiplied by a factor of $2^N$ to restore the brightness. Because
of bit truncation and lossy JPEG compression, artifacts are
introduced in the JPEG decompressed image in addition to the
loss of resolution (caused by downsampling via binning).
Following this, our proposed DRCAS is employed to restore
the desired high resolution image.
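The truncation in stage 2 and the brightness restoration in stage 5 can be sketched as follows. The helper names are hypothetical, the rounding rule (round half up) is an assumption since the text only says "round", and clipping to the valid pixel range is not specified in the text and is left out.

```python
from math import floor

def round_half_up(value):
    # Assumption: the text does not state how ties are rounded.
    return floor(value + 0.5)

def truncate_pixel(averaged_pixel, n_bits):
    """Stage 2: round(averaged_pixel * 2**-n_bits)."""
    return round_half_up(averaged_pixel / (1 << n_bits))

def restore_brightness(truncated_pixel, n_bits):
    """Stage 5: multiply by 2**n_bits after JPEG decompression."""
    return truncated_pixel << n_bits

for n in range(4):  # N in [0, 3], as in the text
    t = truncate_pixel(201, n)
    print(n, t, restore_brightness(t, n))
```

The difference between the original value and the restored value is the quantization error that DRCAS has to compensate for, on top of the JPEG artifacts.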
Figure 6. DRCAS network proposed in this work.
3.2. DRCAS
We propose DRCAS (Deep Restoration network for hardware
based Compressed Acquisition Scheme) to perform the task
in stage 6; its architecture is shown in
Fig. 6. DRCAS is inspired by the original ResNet architecture
[5], the EDSR architecture [10] and the SRCNN architecture
[9], with some differences. A comparison of the ResNet blocks
is shown in Fig. 7. To start, the basic ResNet block used
in this work has a ReLU (Rectified Linear Unit) layer at the
end, like the original ResNet. However, it removes the
batch normalization layers, as in the EDSR network. Also,
unlike EDSR, our DRCAS avoids learning a complete image;
it only learns the residual between the bicubic interpolated
image and the actual image, making the model much
smaller in comparison. This is achieved by a bypass
connection between input and output using the bicubic
interpolation function, as shown in Fig. 6. Since the residuals
are mostly close to zero, training is sped up and the
model complexity is significantly reduced.

Figure 7. A comparison of basic ResNet blocks
We train a separate network for each combination of downsampling
factor, bit truncation and JPEG Quality factor to restore
the image quality and resolution. Thus there are 48 different
training tasks (4 cases of JPEG Quality, 4 cases of
bit truncation and 3 cases of downsampling). The
hyperparameters are kept the same for each training task.
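The 48 training tasks can be enumerated with a short sketch. The specific values are taken from the rows and columns of Table 1; the dictionary keys and mode labels are hypothetical names chosen for this example.

```python
from itertools import product

jpeg_qualities = [100, 90, 80, 70]
bit_widths = [8, 7, 6, 5]              # 8 - N for N in [0, 3]
binning_modes = ["2x1", "2x2", "4x4"]

training_tasks = [
    {"quality": q, "bit_width": bw, "binning": mode}
    for q, bw, mode in product(jpeg_qualities, bit_widths, binning_modes)
]
print(len(training_tasks))  # 48
```

One network would be trained per entry of `training_tasks`, all with the same hyperparameters.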
4. Experimental results
We use the DIV2K dataset [2] for training and evaluation.
DIV2K is a high-quality (2K resolution) image
dataset which consists of 800 training images, 100 validation
images and 100 testing images. Since the testing
images are not public, we use the last 100 images
from the training set (i.e. images 0701.png to 0800.png) as the
testing set. The network uses a patch size of 128 × 128 and
70,000 training samples with a batch size of 64. The network
is trained for 24 epochs.
Peak Signal to Noise Ratio (PSNR) is used as the metric to
measure reconstruction performance, with the original high
resolution image as the reference. The reconstruction results
for the full sized testing dataset are shown in Table 1. One can
see from the results that our DRCAS outperforms bicubic
interpolation in every configuration.
For testing purposes, this work also evaluated the output of the
EDSR network on the downsampled images used in our
work. As expected, the performance of the EDSR network
falls sharply as it is not trained to handle the noise due to bit
truncation, averaging and JPEG. One can also see that the
performance of EDSR becomes worse than bicubic as the image
quality degrades. This also serves as evidence that our DRCAS
does learn to handle the degradation induced
by JPEG, bit truncation and binning. Some samples of
reconstructed images, including the best case (Q = 100 and B.W.
= 8) and the worst case (Q = 70 and B.W. = 5), are shown in
Fig. 8. One can see that bicubic interpolated images are
noisier and less sharp than the images generated by
DRCAS and EDSR.
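For reference, PSNR for 8-bit images follows the standard definition 10·log10(peak² / MSE). The sketch below works on nested lists; whether the paper computes PSNR on RGB channels or on luminance is not stated, so treating the input as a single channel is an assumption.

```python
import math

def psnr(reference, restored, peak=255.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    squared_error = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for ref, res in zip(ref_row, res_row):
            squared_error += (ref - res) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical 2 x 2 patches that differ by one grey level everywhere.
hr = [[100, 110], [120, 130]]
sr = [[101, 109], [121, 129]]
value = psnr(hr, sr)
print(round(value, 2))
```

Because the scale is logarithmic, a gain of about 1 dB, such as the typical gap between DRCAS and bicubic in Table 1, corresponds to roughly a 20% reduction in mean squared error.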
While the performance of the network proposed in this
work might seem inferior to the numbers reported in the EDSR
paper [10], the focus of this work is to perform on-sensor
compression to reduce data traffic and save energy. This is
achieved by bit truncation and pixel binning, both of which
can be performed on commercially available image sensors.
Almost all existing super-resolution works use bicubic
downsampling, which yields a better image but cannot
be performed on the image sensor. Thus bicubic downsampling
fails to compress the raw data. Our
proposed DRCAS is also simpler than the EDSR network;
a comparison is shown in Table 2. Pixel binning and bit
truncation lead to a significant reduction in the raw data
generated by the image sensor. An analysis of the compression of raw
data is provided in Table 3. It is measured as follows:
$$\text{Raw Data Compression} = \frac{8 \cdot N - B}{8 \cdot N}, \tag{7}$$
where $N$ represents the number of pixels binned together (not the
number of truncated bits used in Sec. 3.1) and $B$
represents the bitwidth (B.W.) of a pixel of the downsampled image.
One can achieve a 50% to 96% reduction in raw data. Reduction
in raw data means significant energy savings in downstream
processing. One can achieve approximately proportional
savings in energy for the same frequency of operation,
or one can employ DVFS to achieve quadratic scaling in
energy reduction. Apart from savings in power, the system
also becomes faster as there is less data to process. Reduction
in bitwidth can also result in an exponential reduction in
the power consumed at the ADC (Section 2.1). Additionally, it can
lead to more than proportional savings in energy in image
processing circuits, as LSBs switch more from one pixel to
another than MSBs in an image.
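Eq. (7) can be evaluated directly to reproduce the entries of Table 3. The function name is hypothetical, and the factor 8 reflects the 8-bit source pixels assumed in the equation.

```python
def raw_data_compression(pixels_binned, bit_width):
    """Eq. (7): fraction of raw data removed, assuming 8-bit source pixels."""
    return (8 * pixels_binned - bit_width) / (8 * pixels_binned)

for label, n in [("2x1", 2), ("2x2", 4), ("4x4", 16)]:
    row = [100 * raw_data_compression(n, bw) for bw in (8, 7, 6, 5)]
    print(label, ["%.3f%%" % v for v in row])
```

For example, 2 × 2 binning with B.W. = 7 replaces 4 × 8 = 32 raw bits by 7 bits, a reduction of 25/32, or about 78.1%.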
Let us assume that the image is being read out onto an 8-bit
wide data bus in column-wise fashion for each color channel.
The switching activity measured for such a
case is shown in Table 4. Because the LSBs switch far more
often than the MSBs, truncating them will lead to significant
savings in dynamic power consumption, as explained earlier in
Sec. 2.2. Reduction in raw data also leads to a reduction in the
processed image size after JPEG compression. The results
are shown in Table 5, which gives the size of the
resulting image as a percentage of the size of the original
high resolution image stored in lossless JPEG format. One
can see that the compressed image size ranges from 22.7% down to
less than 1% of the size of the lossless image.
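One possible way to measure per-bit switching activity is sketched below, following the definition of α in Sec. 2.2 (0 to 1 transitions per cycle). The exact counting procedure behind Table 4 is not described in the text, so this is an assumption, and the pixel sequence is hypothetical.

```python
def switching_activity(values, bit_width=8):
    """Fraction of consecutive samples in which each bit goes from 0 to 1."""
    counts = [0] * bit_width
    for prev, curr in zip(values, values[1:]):
        rising = ~prev & curr              # bits that were 0 and are now 1
        for bit in range(bit_width):
            counts[bit] += (rising >> bit) & 1
    transitions = max(len(values) - 1, 1)
    return [c / transitions for c in counts]   # index 0 is the LSB

# Hypothetical smooth pixel sequence read out over the bus.
pixels = [100, 101, 103, 102, 104, 107, 105, 106]
alpha = switching_activity(pixels)
print(alpha)
```

Even in this tiny example the low-order bits toggle while the high-order bits stay constant, which matches the trend in Table 4.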
As mentioned before, this work does not take into account
the energy spent in reconstruction of the image, as the
aim is to reduce the energy for acquisition. The images can
be reconstructed either on edge devices or in the cloud. An
example of an edge device is a smart TV: one can reduce the
image/video data transmission bandwidth, and the image/video
can then be reconstructed using the GPU available in smart
TVs. For the case of smartphones, an image can be acquired
in low resolution mode and reconstructed in the
cloud. Since images and photos are generally uploaded
to cloud storage nowadays, reconstruction in the cloud is
technically feasible. For the case of drones transmitting video
surveillance footage, the compression can reduce the power
consumption in the drone. It can also make transmission
feasible in a noisy environment or over a longer distance, because
the small image size offers an opportunity to aggressively
encode the message packet with ECC.

Table 1. Reconstruction results for DIV2K dataset (0701.png-0800.png). PSNR metric in dB. Bicub. refers to bicubic interpolation, EDSR
refers to the EDSR network in paper [10] and B.W. refers to the bitwidth of the image.

                4×4                        2×2                        2×1
Quality  B.W.   This Work  Bicub.  EDSR    This Work  Bicub.  EDSR    This Work  Bicub.
100      8      27.91      26.75   27.28   32.74      30.97   31.61   34.96      33.19
100      7      27.80      26.68   27.12   32.43      30.78   31.14   34.58      32.86
100      6      27.50      26.44   26.59   31.71      30.09   30.04   33.60      31.96
100      5      26.59      25.72   25.26   29.97      28.76   27.87   31.30      29.98
90       8      27.21      26.36   26.29   31.39      30.18   29.88   33.41      32.12
90       7      26.58      25.87   25.51   30.32      29.35   28.69   32.27      31.11
90       6      25.76      25.12   24.63   29.03      28.14   27.38   30.72      29.64
90       5      24.61      24.05   23.48   27.74      26.49   25.75   28.62      27.63
80       8      26.60      25.90   25.57   30.42      29.44   28.82   32.37      31.24
80       7      25.84      25.23   24.80   29.23      28.38   27.68   31.01      29.98
80       6      24.93      24.37   23.92   27.88      27.06   26.43   29.40      28.40
80       5      23.76      23.20   22.69   26.13      25.36   24.76   27.33      26.37
70       8      26.18      25.55   25.15   29.76      28.88   28.20   31.68      30.60
70       7      25.39      24.80   24.38   28.53      27.73   27.09   30.25      29.24
70       6      24.43      23.87   23.44   27.13      26.35   25.78   28.56      27.58
70       5      23.16      22.60   22.11   25.37      24.58   24.03   26.52      25.50

Table 2. Comparison of networks.

Network     Residual Blocks   Trainable Weights
This Work   6                 0.5M
EDSR        32                43M

Table 3. Raw data Compression Results. B.W. refers to the
bitwidth of image. Raw Compression does not depend on JPEG
Quality factor Q.

Binning   B.W. = 8   B.W. = 7   B.W. = 6   B.W. = 5
2×1       50%        56.25%     62.5%      68.75%
2×2       75%        78.12%     81.25%     84.37%
4×4       93.75%     94.53%     95.31%     96.09%

Table 4. Switching activity analysis of images. For DIV2K dataset
(0701.png-0800.png).

Bit Position   Switching Activity (α)
0 (LSB)        0.48
1              0.46
2              0.41
3              0.33
4              0.25
5              0.17
6              0.09
7 (MSB)        0.04

Table 5. Size Comparison. For DIV2K dataset (0701.png-0800.png).
Measured as percentage with respect to original image
in lossless JPEG format.

Quality  Bitwidth  2×1      2×2      4×4
100      8         22.7%    12.27%   3.48%
100      7         17.38%   9.41%    2.66%
100      6         12.88%   6.95%    1.92%
100      5         9.41%    5.11%    1.41%
90       8         7.77%    4.29%    1.21%
90       7         5.32%    3.07%    0.84%
90       6         3.68%    1.92%    0.57%
90       5         2.45%    1.27%    0.39%
80       8         5.32%    2.86%    0.82%
80       7         3.48%    1.88%    0.55%
80       6         2.25%    1.23%    0.37%
80       5         0.78%    0.80%    0.25%
70       8         4.09%    2.25%    0.65%
70       7         2.86%    1.47%    0.43%
70       6         1.76%    0.96%    0.29%
70       5         1.17%    0.61%    0.18%

Figure 8. Selected reconstructed images. Panels: (a) 2×2 S.R., Q = 100, B.W. = 8; (b) 4×4 S.R., Q = 70, B.W. = 5; (c) 2×2 S.R., Q = 80, B.W. = 6;
(d) 2×2 S.R., Q = 90, B.W. = 7; (e) 2×2 S.R., Q = 80, B.W. = 6; (f) 2×1 S.R., Q = 100, B.W. = 8; (g) 4×4 S.R., Q = 90, B.W. = 6.
For panel (f) the EDSR result is not available as it is not designed for 2×1 super resolution.
5. Conclusion and future work
This paper establishes the use of DNNs for energy savings
in the process of image acquisition under hardware
constraints. For a given image quality requirement, one can
acquire a low resolution image to save power and then restore
resolution and quality using a DNN in the fashion
discussed in this paper. This would reduce the effort spent
on the process of image acquisition, resulting in an improvement
in the power and performance parameters of the imaging
device. The proposed methodology makes the system
programmable, i.e., the user can shift from low resolution image
acquisition to traditional high resolution image acquisition
through software control of the binning operation in imaging
systems, which is generally exposed to the system programmer.
Using these techniques, one can achieve more than
50% reduction in raw data and at least a similar reduction in
power while maintaining the PSNR above 30 dB.
In the future, we would like to use a more accurate
simulation of an imaging system. Deeper networks can also
be studied to improve the reconstruction performance. We
would also like to explore a prediction network using
integer operations instead of floating point to make it suitable
for edge devices and real time operation.
References
[1] 3GPP. Technical Specification Group Radio Access Net-
work; NR; Multiplexing and channel coding. Techni-
cal Specification (TS) 38.212, 3rd Generation Partnership
Project (3GPP), 12 2017. Version 15.0.0. 4
[2] Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge
on single image super-resolution: Dataset and study. In The
IEEE Conference on Computer Vision and Pattern Recogni-
tion (CVPR) Workshops, July 2017. 2, 6
[3] Jihyun Cho, Jaehyuk Choi, Seong-Jin Kim, Jungsoon Shin,
Seokjun Park, James DK Kim, and Euisik Yoon. A 5.9
µm-pixel 2d/3d image sensor with background suppression
over 100klx. In VLSI Circuits (VLSIC), 2013 Symposium on,
pages C6–C7. IEEE, 2013. 3
[4] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou
Tang. Image super-resolution using deep convolutional
networks. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 38(2):295–307, 2016. 2
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun.
Deep residual learning for image recognition. In Proceed-
ings of the IEEE conference on computer vision and pattern
recognition, pages 770–778, 2016. 1, 4, 5
[6] Hong-Yi Huang, Patrick Adrian Conge, and Li-Wei Huang.
CMOS image sensor binning circuit for low-light imaging. In
2011 IEEE Symposium on Industrial Electronics and Appli-
cations, pages 586–589. IEEE, 2011. 3
[7] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate
image super-resolution using very deep convolutional net-
works. In Proceedings of the IEEE conference on computer
vision and pattern recognition, pages 1646–1654, 2016. 2
[8] Oichi Kumagai, Atsumi Niwa, Katsuhiko Hanzawa, Hide-
taka Kato, Shinichiro Futami, Toshio Ohyama, Tsutomu
Imoto, Masahiko Nakamizo, Hirotaka Murakami, Tatsuki
Nishino, et al. A 1/4-inch 3.9 Mpixel low-power event-driven
back-illuminated stacked CMOS image sensor. In 2018 IEEE
International Solid-State Circuits Conference (ISSCC),
pages 86–88. IEEE, 2018. 3
[9] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero,
Andrew Cunningham, Alejandro Acosta, Andrew Aitken,
Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-
realistic single image super-resolution using a generative ad-
versarial network. arXiv preprint, 2017. 2, 5
[10] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and
Kyoung Mu Lee. Enhanced deep residual networks for sin-
gle image super-resolution. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition Work-
shops, pages 136–144, 2017. 2, 5, 6, 7
[11] Jing Liu, Wanning Sun, and Yutao Liu. Bit-depth enhance-
ment via convolutional neural network. In International Fo-
rum on Digital TV and Wireless Multimedia Communica-
tions, pages 255–264. Springer, 2017. 2
[12] Jing Liu, Wanning Sun, Yuting Su, Peiguang Jing, and
Xiaokang Yang. BE-CALF: bit-depth enhancement by
concatenating all level features of DNN. IEEE Transactions on Image
Processing, 2019. 2
[13] Yixin Luo, Saugata Ghose, Yu Cai, Erich F Haratsch, and
Onur Mutlu. HeatWatch: improving 3D NAND flash memory
device reliability by exploiting self-recovery and
temperature awareness. In 2018 IEEE International Symposium
on High Performance Computer Architecture (HPCA), pages
504–517. IEEE, 2018. 4
[14] Xiaojiao Mao, Chunhua Shen, and Yu-Bin Yang. Image
restoration using very deep convolutional encoder-decoder
networks with symmetric skip connections. In Advances
in neural information processing systems, pages 2802–2810,
2016. 2
[15] Yusuke Oike and Abbas El Gamal. CMOS image sensor with
per-column ΣΔ ADC and programmable compressed sensing.
IEEE Journal of Solid-State Circuits, 48(1):318–328, 2013.
3
[16] Yu Shichao, Hu Zhizhong, and Chen Xin. A scalable
multi-pipeline JPEG encoding architecture. In 2016 28th
International Conference on Microelectronics (ICM), pages 369–
372. IEEE, 2016. 4
[17] Karen Simonyan and Andrew Zisserman. Very deep convo-
lutional networks for large-scale image recognition. arXiv
preprint arXiv:1409.1556, 2014. 4
[18] Yuting Su, Wanning Sun, Jing Liu, Guangtao Zhai, and
Peiguang Jing. Photo-realistic image bit-depth enhancement
via residual transposed convolutional neural network. Neu-
rocomputing, 2019. 2
[19] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet,
Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent
Vanhoucke, and Andrew Rabinovich. Going deeper with
convolutions. In Proceedings of the IEEE conference on
computer vision and pattern recognition, pages 1–9, 2015.
4
[20] Radu Timofte, Vincent De Smet, and Luc Van Gool. A+:
Adjusted anchored neighborhood regression for fast super-
resolution. In Asian conference on computer vision, pages
111–126. Springer, 2014. 2
[21] Radu Timofte, Shuhang Gu, Jiqing Wu, and Luc Van Gool.
NTIRE 2018 challenge on single image super-resolution:
methods and results. In Proceedings of the IEEE Confer-
ence on Computer Vision and Pattern Recognition Work-
shops, pages 852–863, 2018. 1, 2, 4
[22] Seiya Umeda, Hiroshi Watanabe, Tomohiro Ikai, Tomonori
Hashimoto, Takeshi Chujoh, and Norio Ito. Joint super-resolution
and bit depth extension by DNN. In International
Workshop on Advanced Image Technology (IWAIT) 2019,
volume 11049, page 1104925. International Society for Op-
tics and Photonics, 2019. 2
[23] Zhihao Wang, Jian Chen, and Steven CH Hoi. Deep learn-
ing for image super-resolution: A survey. arXiv preprint
arXiv:1902.06068, 2019. 2
[24] Wenming Yang, Xuechen Zhang, Yapeng Tian, Wei Wang,
and Jing-Hao Xue. Deep learning for single image super-
resolution: A brief review. arXiv preprint arXiv:1808.03344,
2018. 2
[25] Marcus Yip and Anantha P Chandrakasan. A
resolution-reconfigurable 5-to-10-bit 0.4-to-1 V power scalable SAR ADC
for sensor applications. IEEE Journal of Solid-State Circuits,
48(6):1453–1464, 2013. 3
[26] Satoshi Yoshihara, Yoshikazu Nitta, Masaru Kikuchi, Ken
Koseki, Yoshiharu Ito, Yoshiaki Inada, Souichiro Ku-
ramochi, Hayato Wakabayashi, Masafumi Okano, Hiromi
Kuriyama, et al. A 1/1.8-inch 6.4 Mpixel 60 frames/s CMOS
image sensor with seamless mode change. IEEE Journal of
Solid-State Circuits, 41(12):2998–3006, 2006. 3
[27] Ke Yu, Chao Dong, Chen Change Loy, and Xiaoou Tang.
Deep convolution networks for compression artifacts reduc-
tion. arXiv preprint arXiv:1608.02778, 2016. 2
[28] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and
Lei Zhang. Beyond a Gaussian denoiser: Residual learning of
deep CNN for image denoising. IEEE Transactions on Image
Processing, 26(7):3142–3155, 2017. 2
[29] Zhimin Zhou, Bedabrata Pain, and Eric R Fossum.
Frame-transfer CMOS active pixel sensor with pixel binning. IEEE
Transactions on Electron Devices, 44(10):1764–1768, 1997.
3
