Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning by Siddiqi, Umair F. & Sait, Sadiq M.
 
 
 
Parallel Inverse Halftoning by Look-Up Table (LUT) Partitioning 
 
Umair F. Siddiqi and Sadiq M. Sait 
umair@ccse.kfupm.edu.sa,  sadiq@kfupm.edu.sa 
 
KFUPM Box: 673  
Department of Computer Engineering, 
King Fahd University of Petroleum & Minerals, Dhahran 31261 
Saudi Arabia 
Telephone: +966-3-860 1099 
Fax: +966-3-860 3955 
 
 
 
 
 
 
 
 
 
 
 
Abstract 
 
The Look-Up Table (LUT) method for inverse halftoning is computation-free, fast, and yields 
good results. It employs a single LUT that is stored in a ROM and contains pre-computed 
contone (gray level) values for the inverse halftone operation. This paper proposes an algorithm 
that performs the inverse halftone operation in parallel by partitioning the single LUT into N 
smaller Look-Up Tables (s-LUTs). Up to k (k≤N) pixels can then be fetched concurrently 
from the halftone image, and their contone values can be fetched concurrently from 
separate s-LUTs. The parallelization increases the speed of inverse 
halftoning by up to k times, while the total number of entries in all s-LUTs remains equal to the 
number of entries in the single LUT of the serial LUT method. Some degradation in image quality 
is possible due to pixel loss during parallel fetching: a contone value cannot be fetched in a 
given cycle when another contone value is being fetched from the same s-LUT in that cycle. The 
complete implementation of the algorithm requires two CPLD devices for the computational 
portion, and external content addressable memories (CAMs) and static RAMs to store the s-LUTs.  
 
Keywords: (1) Inverse Halftoning, (2) Hardware Implementation, (3) Look-Up Table Inverse 
Halftoning, (4) Complex Programmable Logic Devices (CPLD), (5) Image Processing, (6) 
Parallelizing. 
 
 
 
 
 
 
1. Introduction  
The rendition of continuous tone pictures on media on which only two levels can 
be displayed is called halftoning [1]. The problem has been important since the early days 
of the printing press, when images were printed on paper by adjusting the size of 
dots according to the local print intensity. This process is termed analog halftoning. With 
the availability and adoption of bi-level devices such as fax machines and plasma displays, 
digital halftoning has become important [2]. The input to a digital halftoning system is a gray 
level image in which pixels have more than two levels (e.g., 256 levels), and the result of the 
halftoning process is an image that has only two levels, i.e., 1 or 0. Inverse halftoning, on the 
other hand, is the reconstruction of gray level images from halftone images. The inverse halftone 
operation finds application in areas where processing is required on printed images. The 
images are first scanned and inverse halftoned, and then operations like zooming, rotation and 
transformation are applied. Standard compression techniques cannot process halftones 
directly, and therefore inverse halftoning is required before compression of printed images 
can be performed [1].  
 
Look-Up Table (LUT) inverse halftoning is a fast, low-computation method [3]. LUT 
inverse halftoning was first introduced by Netravali and Bowen [4], but their method requires 
information that is not always available for halftone images. Subsequently, Ting 
and Riskin [5] proposed another LUT method, but it did not aim at good image quality. More 
recently, a computation-free LUT method was proposed by Mese and Vaidyanathan [1, 
3]. It provides fast inverse halftoning with good image quality and can be applied to 
several different halftones. Two more LUT-based inverse halftoning methods [6, 7] were 
suggested by P. C. Chang et al. and Kuo-Liang Chung et al.; they can give better 
image quality, but they are not completely computation-free and require computation in 
addition to LUT access. In the method of Mese and Vaidyanathan, one template, consisting of 
the pixel to be inverse halftoned and the pixels in its neighborhood, is fetched from the halftone 
image as a p-bit vector (p=17, 21, 22) and is used to form the address for the LUT. The 
pre-computed contone value is fetched from this address of the LUT. However, this method is 
serial and can inverse halftone only one template at a time. In this paper, we present an 
algorithm that performs the inverse halftone operation in parallel by partitioning the single LUT 
of the Mese and Vaidyanathan method into N smaller Look-Up Tables (s-LUTs). The N s-LUTs 
together contain as many entries as the single LUT of the serial LUT method. In this way, 
the proposed algorithm provides a significant speed advantage in the inverse halftone 
operation and at the same time avoids the growth in memory requirements that a direct 
parallelization would incur. In the proposed 
algorithm, k templates are concurrently fetched from the halftone image and their contone 
values are obtained through the s-LUTs. The rest of this paper is organized as follows. First, the 
serial LUT method is described. Then the parallelization of the LUT method for inverse halftoning, 
which is based on partitioning the LUT, is described in detail. 
This is followed by the simulation of the proposed algorithm and a discussion of its 
performance. Finally, the integrated circuit design and its implementation using 
CPLD devices are discussed.   
  
2. Look-Up Table (LUT) Method for Inverse Halftoning  
In the LUT method for inverse halftoning, a template, represented by ‘t’, is a group of pixels 
that consists of the pixel to be inverse halftoned and the pixels in its neighborhood. The LUT 
method uses three types of templates, namely 16pels, 19pels and Rect. The 16pels template consists of 
17 pixels, 19pels consists of 20 pixels and Rect consists of 22 pixels. The templates are 
fetched from the halftone image in raster-scan order, i.e., from left to right within a 
row and from top to bottom. One template is 
fetched and inverse halftoned before the next template is fetched. The LUT 
stores pre-computed contone values for a large number of templates. The templates 
stored in the LUT are obtained from a training set of images that comprises halftone images 
and the corresponding continuous tone images. The templates are fetched from the halftone 
images and their contone values are fetched from the corresponding continuous tone images. 
When a template occurs more than once, its contone value is the mean of all contone 
values that correspond to that template in the training set. The inverse halftone operation is 
performed as follows: a template t is fetched from the halftone image and sent to 
the LUT. If the LUT holds a stored contone value for the template, it 
returns it; otherwise the contone value of the template is estimated by one of these methods: (a) low 
pass filtering, or (b) the best linear estimator [1]. When the same halftoning algorithm is used for 
the training set images and for the images going through the inverse halftone operation, all 
templates always find a corresponding contone value in the LUT, and consequently this 
method becomes completely computation-free. The LUT method for inverse halftoning can 
also be applied to color halftones, where a separate LUT exists for each of the color planes R, G, and B.  
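
The serial LUT method described above can be sketched in a few lines of Python. This is only an illustrative sketch: templates are assumed to be packed into integers, the names build_lut and lookup are hypothetical, the training samples are made up, and rounding the mean to an integer gray level is an assumption that the text does not state.

```python
from collections import defaultdict


def build_lut(training_pairs):
    """Build a serial LUT that maps a template (int) to its mean contone value.

    training_pairs is an iterable of (template, contone) samples, one per
    pixel of the training images (an assumed, already-extracted form).
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for template, contone in training_pairs:
        sums[template] += contone
        counts[template] += 1
    # A template that occurs several times stores the mean of its contone
    # values (rounded here, which is an assumption of this sketch).
    return {t: round(sums[t] / counts[t]) for t in sums}


def lookup(lut, template, fallback):
    """Return the stored contone value, or fallback(template) on a miss."""
    if template in lut:
        return lut[template]
    # In the paper the fallback is low pass filtering or the best linear
    # estimator; here it is left to the caller.
    return fallback(template)


# Made-up samples: template 0b1011 occurs twice, so its entry is the mean.
samples = [(0b1011, 200), (0b1011, 210), (0b0001, 30)]
lut = build_lut(samples)
print(lut[0b1011])
print(lookup(lut, 0b1111, lambda t: 128))
```

The dictionary plays the role of the ROM: one access per template and no arithmetic at run time, except on a miss.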
 
3. Parallel Look-Up Table (LUT) Inverse Halftoning 
To perform LUT inverse halftoning in parallel, two or more templates must be fetched 
from the halftone image and the LUT inverse halftone operation must be applied to 
them at the same time. The main problems in parallelizing the LUT method for inverse 
halftoning are the following:  
(a) The Look-Up Table (LUT) is composed of a single memory block that does not allow 
simultaneous access to more than one location. Therefore, templates fetched in parallel cannot 
fetch their contone values at the same time.  
(b) If the LUT method for inverse halftoning is parallelized as it is, the memory 
requirements grow very large because a copy of the LUT must be stored for each 
template that is fetched in parallel.  
The next section presents an algorithm that parallelizes the LUT method for inverse halftoning 
while solving the above problems.  
 
4.  Algorithm to Perform Parallel LUT Inverse Halftoning 
This section presents the algorithms that perform the inverse halftone operation in parallel by 
enhancing the serial LUT method of Mese and Vaidyanathan [3]. In the proposed algorithm, 
N smaller Look-Up Tables (s-LUTs) are used in place of the single LUT of the serial LUT method. 
The proposed algorithm also introduces circuitry that distinguishes the k templates that are 
concurrently fetched from the halftone image through unique numbers. As a result of these 
two modifications, k templates can be fetched concurrently and go through the inverse 
halftone operation in parallel using the N s-LUTs, so that their contone values are obtained 
simultaneously. The proposed parallel inverse halftoning using s-LUTs consists of two parts: 
(1) an algorithm to generate ‘N’ smaller Look-Up Tables (s-LUTs), and (2) an algorithm to send 
‘k’ concurrently fetched templates to distinct s-LUTs. The rest of this section describes these 
algorithms in detail. 
4.1  Idea behind Proposed Algorithm 
The proposed algorithm to perform the inverse halftone operation in parallel is based on the 
idea of partitioning the single LUT into N smaller Look-Up Tables (s-LUTs). The 
partitioning can be done linearly or with a more sophisticated technique. In linear partitioning, 
the contents from the training set are assigned to s-LUTs based on some fixed criterion, such as an 
equal number of entries in all N s-LUTs. This approach has the problem that during the inverse 
halftone operation it becomes difficult to determine which s-LUT holds a given template. 
The algorithm proposed in this paper instead partitions the LUT into N s-LUTs using 
a new approach. It is motivated by the observation, made on some halftone images, 
that adjacent templates (top-bottom or left-right neighbors) differ from each 
other in the number of ones present in them. The paper defines a function that takes the 
XOR of the fetched template and m, where m is the mean of all template values present 
in the training set. The bits of the XOR result are then added to obtain the number of ones. 
The intent is that each of the k concurrently fetched templates obtains a distinct value at 
this point. However, for a p-bit template this count ranges from 0 to p, whereas the number of s-LUTs is 
N. In the next step, a mod N operation is applied, so that a number in the range 0 to N-1 is obtained 
for every concurrently fetched template. The graph in Figure 1 shows the percentage of times 
this approach is unable to distinguish all concurrently fetched templates. Taking mod N is 
computation-free when N is a power of 2 (2, 4, 8, 16, 32, etc.), because the mod N operation 
then reduces to keeping only the least significant log2 N bits of the input.        
[Figure 1 plot. x-axis: number of s-LUTs, i.e., N (0 to 80); y-axis: percentage of times the proposed approach is unable to distinctly represent all concurrently fetched templates (0 to 70); curves: k=4, k=8, k=16.]
Figure 1: Graph showing performance of proposed approach to distinguish concurrently fetched templates. 
The N s-LUTs are stored in N external memories, and the templates fetched from the halftone 
image act as input addresses to the memories. The distribution of templates among the N s-LUTs should 
be uniform so that memories of equal size can be used. When the s-LUTs do not 
have equal sizes, a large s-LUT can be stored in more than one memory, in which case 
the number of memories exceeds N. Two other approaches that can also be used are: (1) adding the 
bits of the template and then taking mod N, and (2) directly taking mod N of the fetched 
template. However, first taking the XOR with m yields the best image quality among the three, 
so the algorithm proposed in this paper uses only this approach.    
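
The three candidate mappings from a template to an s-LUT number can be compared in a minimal Python sketch. The function names are hypothetical, templates and m are assumed to be packed into integers, and the example values are made up. In each case mod N is taken by masking with N-1, which is valid only when N is a power of 2.

```python
def popcount(x):
    """Number of one bits in a non-negative integer."""
    return bin(x).count("1")


def check_power_of_two(n):
    """Masking with n - 1 equals mod n only when n is a power of 2."""
    if n < 1 or n & (n - 1):
        raise ValueError("N must be a power of 2")


def index_xor_mean(t, m, n):
    """Approach used in the paper: XOR with m, count the ones, then mod N."""
    check_power_of_two(n)
    return popcount(t ^ m) & (n - 1)


def index_bit_sum(t, n):
    """Alternative (1): add the bits of the template, then mod N."""
    check_power_of_two(n)
    return popcount(t) & (n - 1)


def index_direct(t, n):
    """Alternative (2): take mod N of the template value directly."""
    check_power_of_two(n)
    return t & (n - 1)


# Made-up 5-bit example with N = 8.
t, m, n = 0b10110, 0b00111, 8
print(index_xor_mean(t, m, n), index_bit_sum(t, n), index_direct(t, n))
```

For these values the XOR result is 0b10001, which contains two ones, so the first mapping selects s-LUT 2.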
4.2  Algorithm to generate N smaller Look-Up Tables (s-LUTs)  
N smaller Look-Up Tables (s-LUTs) must be generated before the inverse halftone 
operation is performed, similar to the procedure of the serial LUT method. The s-LUTs are 
numbered from 0 to N-1, where N must be a power of 2, i.e., 2, 4, 8, 16, etc. The 
algorithm is shown in Figure 2. It starts by building ‘Training_set’, which consists of 
continuous tone images and their halftone versions. In step 2, a template, represented by ‘t’, is 
fetched from a halftone image of the training set. In step 3, the fetched template t is first XORed 
with m, where m is the mean of all template values present in the training set. The bits of the 
result are then added, and finally the mod N operation is applied by keeping only the least 
significant log2 N bits. The value obtained is the result of step 3. In step 4, the 
template t is stored in the s-LUT whose number equals the result of step 3. 
Steps 2 to 4 are then repeated with another template from the training 
set, and this continues until all templates in the training set have been fetched and stored in the s-LUTs.          
 
Figure 2: Algorithm to generate smaller Look-Up Tables (s-LUTs).  
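
Since Figure 2 is not reproduced here, the steps of Section 4.2 can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: templates are integers, generate_sluts and slut_index are hypothetical names, m is taken as the rounded arithmetic mean of the template values, and repeated templates store the rounded mean of their contone values as in the serial method.

```python
from collections import defaultdict


def slut_index(template, m, n):
    """Step 3: XOR with m, add the bits, keep the log2(N) low bits."""
    return bin(template ^ m).count("1") & (n - 1)


def generate_sluts(training_pairs, n):
    """Partition (template, contone) training samples into n s-LUTs."""
    pairs = list(training_pairs)
    if n < 1 or n & (n - 1):
        raise ValueError("N must be a power of 2")
    # m: mean of all template values in the training set (rounded to an
    # integer here, which is an assumption of this sketch).
    m = round(sum(t for t, _ in pairs) / len(pairs))
    collected = [defaultdict(list) for _ in range(n)]
    for template, contone in pairs:            # step 2: fetch a template
        index = slut_index(template, m, n)     # step 3: compute its number
        collected[index][template].append(contone)  # step 4: store it
    sluts = [
        {t: round(sum(v) / len(v)) for t, v in table.items()}
        for table in collected
    ]
    return m, sluts


# Made-up training samples and N = 2.
m, sluts = generate_sluts([(0b0000, 10), (0b0001, 20), (0b0011, 30), (0b0001, 40)], 2)
print(m, sluts)
print(sum(len(s) for s in sluts))  # total entries over all s-LUTs
```

Every distinct template lands in exactly one s-LUT, so the total number of entries equals that of the single LUT of the serial method.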
 
4.3  Algorithm to send ‘k’ concurrently fetched templates to distinct s-LUTs   
This algorithm assigns numbers in the range 0 to N-1 to the k 
concurrently fetched templates so that they can be sent to distinct s-LUTs using these 
numbers. In this way it performs the inverse halftone operation in parallel using the s-LUTs. The 
algorithm is shown in Figure 3. It starts by fetching k templates from the halftone image; 
the templates are numbered from 1 to k in step 2. In step 3, the s-LUT number of each 
template is computed in the same way as during s-LUT generation (Section 4.2), with the mod N 
operation performed by keeping only the least significant log2 N bits. In step 4, if two or more 
templates have the same value from step 3, only the template that has the 
highest number assigned to it in step 2 is kept, and the others with the same step 3 result are 
discarded. In step 5, the templates that are not discarded are sent to the s-LUTs whose 
numbers equal their step 3 values. In step 6, the templates that were 
discarded in step 4, and the templates that do not find their contone values in their s-LUTs, 
are assigned contone values copied from their neighbors. Finally, the contone values of all k 
fetched templates are delivered to the output. This process repeats until all templates present 
in the halftone image are inverse halftoned. The algorithm is pipelined; therefore, each step 
can be performed in parallel on different data inputs, and k new templates can be fetched from 
the halftone image on every clock cycle.  
 
Figure 3: Algorithm to perform parallel inverse halftoning using s-LUTs.  
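
The steps of Section 4.3 for one group of k templates can be sketched as below. The function names and sample s-LUTs are hypothetical. The text only says that missing values are copied "from their neighbors"; this sketch assumes the nearest higher-numbered template is tried first, then the nearest lower-numbered one, and finally a mid-gray default of 128.

```python
def slut_number(template, m, n):
    """Step 3: XOR with m, add the bits, keep the log2(N) low bits."""
    return bin(template ^ m).count("1") & (n - 1)


def inverse_halftone_group(templates, m, sluts, default=128):
    """Inverse halftone k concurrently fetched templates using the s-LUTs."""
    n = len(sluts)
    k = len(templates)
    targets = [slut_number(t, m, n) for t in templates]
    # Step 4: on a collision only the highest-numbered template is kept,
    # so later positions overwrite earlier ones.
    winner = {}
    for pos, target in enumerate(targets):
        winner[target] = pos
    # Step 5: each surviving template reads its own s-LUT.
    values = [None] * k
    for target, pos in winner.items():
        values[pos] = sluts[target].get(templates[pos])
    # Step 6: fill discarded or missing values from neighbors (the order of
    # preference is an assumption of this sketch).
    result = []
    for pos in range(k):
        value = values[pos]
        if value is None:
            higher = [v for v in values[pos + 1:] if v is not None]
            lower = [v for v in reversed(values[:pos]) if v is not None]
            value = (higher or lower or [default])[0]
        result.append(value)
    return result


# Made-up example with k = 4, N = 4 and m = 0.
example_sluts = [{}, {0b0010: 50, 0b0001: 40}, {0b0011: 90}, {0b0111: 200}]
group = inverse_halftone_group([0b0001, 0b0010, 0b0111, 0b0011], 0, example_sluts)
print(group)
```

In the example the first two templates both map to s-LUT 1, so the first one is discarded and receives the contone value of the second.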
 
5.  Simulation 
This section presents a simulation of the proposed algorithm, which was 
implemented in the Java programming language. The simulation starts by building a training set of 17 gray 
level images and the corresponding halftone images. Then a value of N is chosen and the s-LUTs are 
generated. The parallel inverse halftone operation is performed for different values of 
k. In this section, the results of generating the s-LUTs are shown first. Then some halftone 
images that are not present in the training set are inverse halftoned and shown together with their 
image quality in terms of Peak Signal to Noise Ratio (PSNR). 
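
The text does not spell out how PSNR is computed. Assuming the usual definition for 8-bit images, PSNR = 10 log10(255^2 / MSE), a minimal sketch is given below; the function name and the flat pixel lists are illustrative assumptions.

```python
import math


def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equally sized sequences of gray levels."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


# Made-up two-pixel example: each pixel is off by 10 gray levels (MSE = 100).
print(round(psnr([100, 100], [110, 90]), 2))
```

A higher PSNR means the inverse halftoned image is closer to the original continuous tone image.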
5.1  Generation of s-LUTs 
The s-LUTs are generated using the training set, and the partitioning of templates among the N 
s-LUTs is shown in Figures 4 and 5. Figure 4 shows the partitioning when N=8 and Figure 5 
shows the partitioning when N=16. Each s-LUT stores an almost equal number 
of entries when N=8, whereas for N=16 the s-LUT sizes vary widely and some 
s-LUTs remain empty.  
[Figure 4 plot. x-axis: smaller Look-Up Tables (s-LUTs), 0 to 7; y-axis: number of templates present in s-LUTs (0 to 10000).]
Figure 4: Distribution of templates to s-LUTs (when N=8).  
[Figure 5 plot. x-axis: smaller Look-Up Tables (s-LUTs), 0 to 15; y-axis: number of templates present in s-LUTs (0 to 10000).]
Figure 5: Distribution of templates to s-LUTs (when N=16).  
     
5.2  Parallel Inverse Halftoning 
This section shows the simulation of the proposed algorithm performing the inverse 
halftone operation in parallel using s-LUTs, for different values of ‘k’ and ‘N’. The graph 
in Figure 6 shows the average quality, in terms of PSNR, of the images obtained from the 
proposed algorithm. The graph contains one curve for each value of k and shows how the 
image quality (y-axis) varies as N increases. However, N should be a 
power of 2, i.e., 2, 4, 8, 16, etc. The results are averages over the 
images boat, peppers, and clock. Some sample images obtained from the serial LUT method 
and from the proposed algorithm are shown in Figures 7 to 15, along with their original 
continuous tone versions.  
[Figure 6 plot. x-axis: number of smaller Look-Up Tables (s-LUTs), i.e., N (0 to 20); y-axis: image quality in terms of PSNR in dB (0 to 35); curves: k=4, k=8, k=16.]
Figure 6: Performance of the proposed algorithm in terms of image quality for different values of k and N.  
 
 
 
 
 
 
 
 
 
Figure 7: Original continuous tone image named ‘peppers’. 
 
Figure 8: Peppers obtained from serial LUT method, PSNR= 29.4154 dB. 
 
Figure 9: Peppers obtained from inverse halftone operation using proposed algorithm with k=4 and N=8, PSNR= 29.2605 dB. 
 
Figure 10: Original continuous tone image named ‘clock’. 
 
Figure 11: Clock obtained from serial LUT method, PSNR= 30.1681 dB. 
 
Figure 12: Clock obtained from inverse halftone operation using proposed algorithm with k=4 and N=8, PSNR= 30.0846 dB. 
 
Figure 13: Original continuous tone image named ‘boat’. 
 
Figure 14: Boat obtained from serial LUT method, PSNR= 28.7071 dB. 
 
Figure 15: Boat obtained from inverse halftone operation using proposed algorithm with k=4 and N=8, PSNR= 28.5449 dB. 
 
6.  Integrated Circuit Design 
An integrated circuit that implements the proposed algorithm with parameters k=4 and 
N=8 has been designed. The target platform is Field Programmable Gate Array (FPGA) devices. 
The circuit is divided into blocks; each block is represented through Boolean equations 
and can work independently on different inputs. The details of the integrated circuit are 
given below. 
 
Block 1: In this stage four templates I0, I1, I2 and I3 are fetched from the halftone image and 
stored in registers t0, t1, t2 and t3 respectively. The Boolean equations representing logic in 
this block are shown below:   
t0(0…p-1) = I0(0…p-1), t1(0…p-1) = I1(0…p-1), t2(0…p-1) = I2(0…p-1), t3(0…p-1) = I3(0…p-1)           (6.1) 
Block 2: In this block bits in each template are added using Carry Save Adder (CSA) Tree. 
The Boolean equations representing operations in this block are shown below and the CSA 
tree for N= 8 and p=20 is shown in Figure 16: 
Si = CSA_TREE(ti(0…p-1)), Where i= 0, 1, 2, & 3.           (6.2) 
 
Figure 16: Carry Save Adder (CSA) Tree when p=20 and N=8.  
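
The bit addition of Block 2 can be modeled in software with full adders used as 3:2 compressors, which is the building block of a carry save adder tree. The sketch below is a behavioral model only and does not reproduce the exact tree of Figure 16; the function names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry) with a + b + c == sum + 2 * carry."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def count_ones_csa(template, p):
    """Add the p bits of a template using only full adders."""
    # columns[w] holds the one-bit signals that currently have weight 2**w.
    columns = [[(template >> i) & 1 for i in range(p)]]
    weight = 0
    total = 0
    while weight < len(columns):
        bits = columns[weight]
        while len(bits) > 1:
            a, b = bits[0], bits[1]
            c = bits[2] if len(bits) > 2 else 0
            s, carry = full_adder(a, b, c)
            bits = bits[3:] + [s]              # the sum bit keeps its weight
            if weight + 1 == len(columns):
                columns.append([])
            columns[weight + 1].append(carry)  # the carry moves one weight up
        if bits:
            total |= bits[0] << weight
        weight += 1
    return total


# For N = 8 only the three least significant bits of the sum are kept.
print(count_ones_csa(0xFFFFF, 20) & 7)
```

Because every full adder preserves the weighted sum of its inputs, the result equals the number of ones in the template.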
Block 3: In this block templates t0, t1, t2 and t3 are appended with sequence numbers 001, 
010, 011, and 100 respectively. The Boolean expressions representing operations in this 
block are:  
t0’(0…p+2) = t0(0…p-1) & 001, t1’(0…p+2) = t1(0…p-1) & 010,  
t2’(0…p+2) = t2(0…p-1) & 011, t3’(0…p+2) = t3(0…p-1) & 100.    (6.3) 
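Block 4: This block consists of four 1x8 de-multiplexers, one for each template. The Boolean 
expressions that represent the logic in this block are:  
\[
\begin{aligned}
A_i[0](0 \ldots p+2) &\leftarrow \overline{\mathit{slut}_i(2)} \cdot \overline{\mathit{slut}_i(1)} \cdot \overline{\mathit{slut}_i(0)} \cdot t_i'(0 \ldots p+2) \\
A_i[1](0 \ldots p+2) &\leftarrow \overline{\mathit{slut}_i(2)} \cdot \overline{\mathit{slut}_i(1)} \cdot \mathit{slut}_i(0) \cdot t_i'(0 \ldots p+2) \\
A_i[2](0 \ldots p+2) &\leftarrow \overline{\mathit{slut}_i(2)} \cdot \mathit{slut}_i(1) \cdot \overline{\mathit{slut}_i(0)} \cdot t_i'(0 \ldots p+2) \\
A_i[3](0 \ldots p+2) &\leftarrow \overline{\mathit{slut}_i(2)} \cdot \mathit{slut}_i(1) \cdot \mathit{slut}_i(0) \cdot t_i'(0 \ldots p+2) \\
A_i[4](0 \ldots p+2) &\leftarrow \mathit{slut}_i(2) \cdot \overline{\mathit{slut}_i(1)} \cdot \overline{\mathit{slut}_i(0)} \cdot t_i'(0 \ldots p+2) \\
A_i[5](0 \ldots p+2) &\leftarrow \mathit{slut}_i(2) \cdot \overline{\mathit{slut}_i(1)} \cdot \mathit{slut}_i(0) \cdot t_i'(0 \ldots p+2) \\
A_i[6](0 \ldots p+2) &\leftarrow \mathit{slut}_i(2) \cdot \mathit{slut}_i(1) \cdot \overline{\mathit{slut}_i(0)} \cdot t_i'(0 \ldots p+2) \\
A_i[7](0 \ldots p+2) &\leftarrow \mathit{slut}_i(2) \cdot \mathit{slut}_i(1) \cdot \mathit{slut}_i(0) \cdot t_i'(0 \ldots p+2)
\end{aligned}
\tag{6.4}
\]
In the above equations, i= 0 to 3: i=0 for the first de-multiplexer, i=1 for the second 
de-multiplexer and so on. slut_i(0) to slut_i(2) are the select bits of de-multiplexer i, an 
overbar denotes the logical complement, and Ai[0] to Ai[7] are the 8 outputs of the de-multiplexer. 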
Block 5: This block consists of eight 4x1 multiplexers that are connected to s-LUTs. The 
Boolean expressions that represent logic operations in this block are shown below: 
         
\[
\begin{aligned}
g_i(0 \ldots p+2) \leftarrow{} & V_4[i] \cdot A_4[i](0 \ldots p+2) \\
& + \overline{V_4[i]} \cdot V_3[i] \cdot A_3[i](0 \ldots p+2) \\
& + \overline{V_4[i]} \cdot \overline{V_3[i]} \cdot V_2[i] \cdot A_2[i](0 \ldots p+2) \\
& + \overline{V_4[i]} \cdot \overline{V_3[i]} \cdot \overline{V_2[i]} \cdot V_1[i] \cdot A_1[i](0 \ldots p+2), \\
V_j[i] ={} & A_j[i](p) + A_j[i](p+1) + A_j[i](p+2), \quad j = 1, \ldots, 4
\end{aligned}
\tag{6.5}
\]
In the above equation, i= 0 to 7 and i=0 refers to the first multiplexer, i=1 refers to the second 
multiplexer and so on. A_1[i] to A_4[i] are the i-th outputs of the first to fourth de-multiplexers 
of Block 4, and V_j[i] is the OR of the three sequence number bits of A_j[i], written out 
separately here for brevity. The output g0 is from the first multiplexer, g1 is from the second multiplexer 
and so on. 
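
The routing performed by Blocks 3 to 5 can be modeled behaviorally as follows. This is only a sketch: the function name is hypothetical, templates are integers, and the sequence number is assumed to occupy the bits just above bit p-1, which is one possible reading of the concatenation in (6.3). The s-LUT numbers are passed in rather than computed.

```python
def route_templates(templates, slut_numbers, p, n=8):
    """Route tagged templates to the inputs of n s-LUTs; 0 means no template."""
    k = len(templates)
    # Block 3: append sequence numbers 1..k (001, 010, 011, 100 for k = 4).
    tagged = [t | ((j + 1) << p) for j, t in enumerate(templates)]
    # Block 4: de-multiplexer j drives only the output selected by its
    # s-LUT number; all its other outputs stay at zero.
    a = [
        [tagged[j] if slut_numbers[j] == i else 0 for i in range(n)]
        for j in range(k)
    ]
    # Block 5: multiplexer i forwards the input with a non-zero sequence
    # number that comes from the highest-numbered template.
    g = []
    for i in range(n):
        chosen = 0
        for j in range(k):
            if a[j][i] >> p:  # non-zero sequence number field
                chosen = a[j][i]
        g.append(chosen)
    return g


# Made-up example with p = 4 and N = 4: templates 1 and 2 collide on s-LUT 1.
routed = route_templates([0b0001, 0b0010, 0b0111, 0b0011], [1, 1, 3, 2], p=4, n=4)
print(routed)
```

In the example the second template wins the collision, so the input of s-LUT 1 carries template 0b0010 with sequence number 2.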
Block 6: In this block, the templates fetch their gray level values from the s-LUTs. In hardware, it 
contains the implementation of eight smaller Look-Up Tables (s-LUTs) using Content 
Addressable Memory (CAM) and Read Only Memory (ROM) pairs. A CAM-ROM combination 
is used because each s-LUT stores only a very small fraction of the 2^p possible 
values when templates are p bits wide. The block diagram in Figure 17 shows the 
implementation of one s-LUT. The CAM stores the templates that are assigned to the s-LUT 
and the ROM stores the gray level values. The Boolean equations illustrating the operations in 
this block are shown below: 
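\[
\begin{aligned}
&\text{number of entries in a smaller Look-Up Table (s-LUT)} = 2^{d} - 1 \\
&\text{number of grey levels} = 256, \text{ i.e., 8 bits} \\
&x_i(0 \ldots d-1) \leftarrow \mathrm{CAM}_i\big(g_i(0 \ldots p-1)\big) \\
&c_i(0 \ldots 7) \leftarrow \mathrm{ROM}_i\big(x_i(0 \ldots d-1)\big) \\
&f_i(0 \ldots p+2) \leftarrow g_i(0 \ldots p+2)
\end{aligned}
\tag{6.6}
\]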
In the above expressions, i= 0 for s-LUT number 0, i=1 for s-LUT number 1 and so on. i varies from 0 to 7.  
 
When the contents of an s-LUT are large and cannot fit in a single memory module, more 
than one memory module or CAM-ROM pair should be used. Figure 18 shows one s-LUT 
implemented using two CAM-ROM pairs. A CAM returns ROM address zero for entries 
not present in it; as a result, the outputs of all ROMs can be ORed to obtain the one valid 
result of that s-LUT. 
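
A CAM-ROM pair and the OR-combination of several pairs can be modeled as below. The class and function names are hypothetical. The sketch assumes that ROM address 0 is reserved for misses and holds the value 0, which is what allows the ROM outputs to be ORed; a side effect of this assumption is that a stored gray level of 0 cannot be told apart from a miss.

```python
class CamRomSlut:
    """Behavioral model of one CAM-ROM pair."""

    def __init__(self, entries):
        # The CAM maps a template to a ROM address. Address 0 means
        # "not present", so real entries start at address 1.
        self.cam = {}
        self.rom = [0]  # rom[0] is the all-zero word returned on a miss
        for template, gray in entries.items():
            self.cam[template] = len(self.rom)
            self.rom.append(gray)

    def lookup(self, template):
        address = self.cam.get(template, 0)
        return self.rom[address]


def lookup_split(pairs, template):
    """One s-LUT spread over several CAM-ROM pairs: OR all ROM outputs."""
    out = 0
    for pair in pairs:
        out |= pair.lookup(template)
    return out


# Made-up s-LUT contents split over two CAM-ROM pairs.
first = CamRomSlut({0b0101: 120})
second = CamRomSlut({0b1001: 33})
print(lookup_split([first, second], 0b1001))
print(lookup_split([first, second], 0b0111))
```

With d address bits per pair, reserving address 0 leaves 2^d - 1 usable entries, which matches the entry count given in (6.6).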
Block 7: In this block, the gray level values of non-discarded templates are copied to the templates 
that were discarded. A discarded template is assigned the gray level 
value of the template that was not discarded and has the nearest higher number appended to it in 
step 2 of the algorithm. The Boolean expressions representing the circuit that performs the 
operations in this block are shown below:  
 
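\[
\begin{aligned}
a_i &\leftarrow \overline{f_i(p)} \cdot \overline{f_i(p+1)} \cdot f_i(p+2), \quad \text{where } i = 0 \text{ to } 7 \\
a_8(0 \ldots 7) &\leftarrow a_0 \cdot c_0(0 \ldots 7) + a_1 \cdot c_1(0 \ldots 7) + \cdots + a_7 \cdot c_7(0 \ldots 7) \\
a_9 &\leftarrow a_0 + a_1 + a_2 + a_3 + a_4 + a_5 + a_6 + a_7 \\
a_{10}(0 \ldots 7) &\leftarrow a_9 \cdot a_8(0 \ldots 7) + \overline{a_9} \cdot b_8(0 \ldots 7) \\
\mathit{Gray\_level}_{t0}(0 \ldots 7) &\leftarrow a_{10}(0 \ldots 7)
\end{aligned}
\tag{6.7}
\]
where Gray_levelt0 is the gray level value corresponding to template t0.   

\[
\begin{aligned}
b_i &\leftarrow \overline{f_i(p)} \cdot f_i(p+1) \cdot \overline{f_i(p+2)}, \quad \text{where } i = 0 \text{ to } 7 \\
b_8(0 \ldots 7) &\leftarrow b_0 \cdot c_0(0 \ldots 7) + b_1 \cdot c_1(0 \ldots 7) + \cdots + b_7 \cdot c_7(0 \ldots 7) \\
b_9 &\leftarrow b_0 + b_1 + b_2 + b_3 + b_4 + b_5 + b_6 + b_7 \\
b_{10}(0 \ldots 7) &\leftarrow b_9 \cdot b_8(0 \ldots 7) + \overline{b_9} \cdot d_8(0 \ldots 7) \\
\mathit{Gray\_level}_{t1}(0 \ldots 7) &\leftarrow b_{10}(0 \ldots 7)
\end{aligned}
\tag{6.8}
\]
where Gray_levelt1 is the gray level value corresponding to template t1.   

\[
\begin{aligned}
d_i &\leftarrow \overline{f_i(p)} \cdot f_i(p+1) \cdot f_i(p+2), \quad \text{where } i = 0 \text{ to } 7 \\
d_8(0 \ldots 7) &\leftarrow d_0 \cdot c_0(0 \ldots 7) + d_1 \cdot c_1(0 \ldots 7) + \cdots + d_7 \cdot c_7(0 \ldots 7) \\
d_9 &\leftarrow d_0 + d_1 + d_2 + d_3 + d_4 + d_5 + d_6 + d_7 \\
d_{10}(0 \ldots 7) &\leftarrow d_9 \cdot d_8(0 \ldots 7) + \overline{d_9} \cdot e_8(0 \ldots 7) \\
\mathit{Gray\_level}_{t2}(0 \ldots 7) &\leftarrow d_{10}(0 \ldots 7)
\end{aligned}
\tag{6.9}
\]
where Gray_levelt2 is the gray level value corresponding to template t2.   

\[
\begin{aligned}
e_i &\leftarrow f_i(p) \cdot \overline{f_i(p+1)} \cdot \overline{f_i(p+2)}, \quad \text{where } i = 0 \text{ to } 7 \\
e_8(0 \ldots 7) &\leftarrow e_0 \cdot c_0(0 \ldots 7) + e_1 \cdot c_1(0 \ldots 7) + \cdots + e_7 \cdot c_7(0 \ldots 7) \\
\mathit{Gray\_level}_{t3}(0 \ldots 7) &\leftarrow e_8(0 \ldots 7)
\end{aligned}
\tag{6.10}
\]
where Gray_levelt3 is the gray level value corresponding to template t3.   

In (6.7) to (6.10), an overbar denotes the logical complement. The literals are complemented so 
that a_i, b_i, d_i and e_i equal 1 exactly when bits p to p+2 of f_i carry the sequence numbers 
001, 010, 011 and 100 appended to templates t0, t1, t2 and t3 in Block 3, respectively. 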
 
The four gray level values: Gray_levelt0, Gray_levelt1, Gray_levelt2 and Gray_levelt3 are 
stored at correct (row, column) coordinates in the output gray level image. The algorithm is 
pipelined in which each step can work independently on different inputs.  
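
The gray level copying of Block 7 for k=4 can be modeled behaviorally as below, following the structure of equations (6.7) to (6.10) as read from the text: each template first collects the ROM output of the s-LUT that carries its sequence number, and a template that was not routed to any s-LUT takes the collected value of the next higher-numbered template. The function name and sample values are hypothetical, and the sketch assumes the sequence numbers have already been decoded into integers.

```python
def copy_gray_levels(sequence_numbers, rom_outputs, k=4):
    """Return the gray levels of templates t0..t(k-1).

    sequence_numbers[i] is the sequence number (1..k, or 0 for none) seen at
    s-LUT i, and rom_outputs[i] is the gray level read from s-LUT i.
    """
    collected = [0] * k     # a8, b8, d8, e8 in the equations
    routed = [False] * k    # a9, b9, d9 in the equations
    for seq, gray in zip(sequence_numbers, rom_outputs):
        if 1 <= seq <= k:
            collected[seq - 1] |= gray
            routed[seq - 1] = True
    gray_levels = []
    for j in range(k):
        if routed[j] or j == k - 1:
            # The last template has no higher-numbered neighbor, so its own
            # collected value is always used.
            gray_levels.append(collected[j])
        else:
            # A discarded template copies the collected value of the next
            # higher-numbered template.
            gray_levels.append(collected[j + 1])
    return gray_levels


# Made-up example with N = 8: template t0 (sequence number 1) was discarded.
levels = copy_gray_levels([0, 2, 4, 3, 0, 0, 0, 0], [0, 50, 90, 200, 0, 0, 0, 0])
print(levels)
```

In this example t0 receives the gray level 50 of t1, while t1, t2 and t3 keep the values read from their own s-LUTs.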
 
 
Figure 17: smaller Look-Up Table (s-LUT) implemented in terms of CAM and ROM 
 
Figure 18: One s-LUT implemented using two ROMs and CAMs. 
 
7. Hardware Implementation 
The computational part of the proposed algorithm with k=4 and N=8 is implemented in 
VHDL on Altera Complex Programmable Logic Devices (CPLDs). It uses two 
CPLDs, and external CAMs and SRAMs are used to store the s-LUTs. Figure 19 illustrates the 
system block diagram. The CPLDs used are Altera [9] MAX II devices; the CAMs and SRAMs are 
implemented in Altera APEX FPGA devices but can be replaced with discrete devices in 
future designs. CPLD I contains the proposed parallelization algorithm and CPLD II 
contains the pixel compensation circuit. The assignment of template numbers to incoming 
“19pels” templates is performed partly in CPLD I and partly in CPLD II in order to fit the design 
within the MAX II pin count and to reduce the fitting complexity of CPLD I.     
 
Figure 19: Block diagram of the algorithm implementation.  
In Figure 19, CPLD I accepts four “19pels” templates from the halftone image and sends each of them, 
according to the value returned by the XM function, to one of its eight output ports, so that at most 
four of the eight ports are in use at a time. 
The ports of CPLD I are connected to CAMs, which are connected to SRAMs. The grey level 
values from the SRAMs go to CPLD II, where the circuits for gray level value copying are located. 
CPLD II delivers the grey level values in the correct sequence, i.e., G1 corresponds to the contone 
value of P1 and so on. The results of the CPLD implementation, obtained from the Fitter and Timing 
Analyzer tools of Altera Quartus II 5.0, are tabulated in Table I.  
 
Table I: Results of CPLD implementations 
Device                       Area (logic elements)   I/O pins   Clock frequency 
CPLD I  (EPM2210GF324I5)     2049/2210               261/272    33.86 MHz 
CPLD II (EPM2210GF324I5)     262/2210                262/272    164.85 MHz 
 
8. Conclusion 
LUT inverse halftoning has been parallelized, with the following 
advantages: (a) instead of one pixel, up to k pixels can now be fetched and inverse halftoned 
simultaneously; (b) N smaller Look-Up Tables (s-LUTs) are used in place of a 
single LUT, while the total number of entries in all s-LUTs remains equal to the number of 
entries in the single LUT of the serial LUT method; and (c) fast parallel inverse halftoning can be 
performed on fast hardware instead of on embedded hardware-software.    
 
Acknowledgements 
The authors acknowledge King Fahd University of Petroleum & Minerals, Dhahran, Saudi 
Arabia for all support.  
 
 
 
 
 
 
 
 
References 
[1] Murat Mese and P. P. Vaidyanathan, “Recent Advances in Digital Halftoning and Inverse 
Halftoning Method,” IEEE Trans. Circuits and Systems I, June 2002.  
[2] Ping Wah Wong and Nasir D. Memon, “Image Processing for Halftoning,” IEEE Signal 
Processing Magazine, vol. 20, July 2003. 
[3] Murat Mese and P. P. Vaidyanathan, “Lookup Table (LUT) Method for Inverse 
Halftoning,” IEEE Transactions on Image Processing, vol. 10, October 2001.  
[4] A. N. Netravali and E. G. Bowen, “Display of Dithered Images,” Proc. SID, vol. 22, pp. 
185-190, 1981. 
[5] M. Y. Ting and E. A. Riskin, “Error-diffused Image Compression using a binary to gray 
scale decoder and predictive pruned tree structured vector quantization,” IEEE Trans. Image 
Processing, vol. 3, pp. 854-858, 1994.  
[6] P. C. Chang, C. S. Yu and T. H. Lee, “Hybrid LMS-MMSE Inverse Halftoning 
Technique,” IEEE Transactions on Image Processing, vol. 10, January 2001. 
[7] Kuo-Liang Chung and Shih-Tung Wu, “Inverse Halftoning Algorithm using Edge-Based 
Lookup Table Approach,” IEEE Trans. Image Processing, vol. 14, no. 10, pp. 1583-1589, 
October 2005. 
[8] R. Floyd and L. Steinberg, “An Adaptive Algorithm for Spatial Grey-scale,” Proc. SID, 
pp. 75-77, 1976. 
[9] http://www.altera.com 
