A VLSI Array Architecture for Hough Transform by Maharatna, Koushik & Banerjee, Swapna
K. Maharatna*
Systems Design Dept.
Institute for Semiconductor Physics (IHP)
Technology Park 25, D-15236, Frankfurt (Oder), Germany
email: maharatna@ihp-ffo.de
Swapna Banerjee
Dept. of E & ECE
Indian Institute of Technology
Kharagpur – 721302 (INDIA)
email: swapna@ece.iitkgp.ernet.in
(* Author for correspondence)
Abstract:
In this article, an asynchronous array architecture for straight line Hough
Transform (HT) is proposed using a scaling free modified CORDIC (Co-Ordinate
Rotation Digital Computer) unit as a basic Processing Element (PE). It exhibits
four-fold angle parallelism by dividing the Hough space into four subspaces to
reduce the computation burden to 25% of the conventional requirements. A
distributed accumulator arrangement scheme is adopted to ensure conflict free
voting operation. The architecture is then extended to compute circular and
elliptic HT given their centers and orientations. Compared to some other
existing architectures, this one exhibits higher computation speed.
Keywords: Hough transform, CORDIC, Low power, Image processing, Multiplierless
array architecture.
1. Introduction:
Hough Transform (HT) is a well-known technique for efficient shape
recognition (1, 2). High computational complexity and excessive memory
requirements are the major obstacles to monolithic integration of HT (3). The
memory requirement problem may be eased by the current level of memory
integration (4). In this paper we restrict ourselves to speeding up the
transformation part of the HT, i.e., the computation of the vote address in
the parameter space.
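
To make the "vote address" concrete, the following is a minimal software sketch of straight line HT voting with the normal parameterization rho = x cos(theta) + y sin(theta). It is a plain reference model, not the proposed hardware; the function names `hough_lines` and `peak`, the 180-step angle grid and the integer quantisation of rho are assumptions made for illustration only.

```python
import math

def hough_lines(edge_points, n_theta=180):
    """Vote in a (rho, theta) accumulator using rho = x*cos(theta) + y*sin(theta).

    theta is sampled uniformly over [0, pi); rho is quantised to integers.
    Returns the accumulator and the row offset that maps rho to a row index.
    """
    rho_max = int(math.ceil(max(math.hypot(x, y) for x, y in edge_points)))
    offset = rho_max  # shifts negative rho values to non-negative row indices
    acc = [[0] * n_theta for _ in range(2 * rho_max + 1)]
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)  # vote address
            acc[int(round(rho)) + offset][t] += 1
    return acc, offset

def peak(acc, offset):
    """Return (rho, theta_index, votes) of the most voted accumulator cell."""
    votes, r, t = max((v, r, t)
                      for r, row in enumerate(acc)
                      for t, v in enumerate(row))
    return r - offset, t, votes
```

For 100 collinear points on the horizontal line y = 5 (x = 0 to 99), the peak cell is rho = 5 at theta index 90 (theta = pi/2) with all 100 votes. The inner loop over theta is the part that the transformation hardware has to accelerate.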
Different architectures and algorithms have been proposed to speed up the
computation of HT (4, 5, 6, 7, 8, 9). Most Hough-based methods face the
problem of evaluating implicit trigonometric and transcendental functions,
which makes monolithic implementation of the entire algorithm rather
difficult. To overcome this problem, CORDIC based architectures (3, 10) are
used to generate the vote address in parameter space.
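
As background for the CORDIC based approach, the sketch below shows a conventional radix-2 CORDIC rotation in floating point and how its x output yields the vote address. It is the textbook algorithm, not the unit of (3) or (10); the names `cordic_rotate` and `vote_address` and the 16-iteration default are assumptions. It also exposes the accumulated gain that has to be compensated in conventional CORDIC.

```python
import math

def cordic_rotate(x, y, angle, n_iter=16):
    """Conventional radix-2 CORDIC: rotate (x, y) counter-clockwise by `angle`.

    Each step uses only shift-and-add style updates (modelled here with
    multiplications by 2**-i). The result is scaled by `gain`, which is
    returned so the caller can compensate for it.
    """
    z = angle
    gain = 1.0
    for i in range(n_iter):
        d = 1.0 if z >= 0 else -1.0          # rotation direction of this step
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)        # remaining angle
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return x, y, gain

def vote_address(x, y, theta, n_iter=16):
    """rho = x*cos(theta) + y*sin(theta), obtained by rotating by -theta."""
    xr, _, gain = cordic_rotate(x, y, -theta, n_iter)
    return xr / gain                          # scale factor compensation
```

The division by `gain` (about 1.6468 for 16 iterations) is the scale factor compensation step; the basic iteration converges only for angles of magnitude up to roughly 1.74 rad.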
The motivation of this work is to construct HT architectures suitable for VLSI
implementation that exhibit a high throughput rate at reduced computational
complexity. For this purpose CORDIC based asynchronous array architectures are
proposed. The required number of PEs and the angle scan range are reduced by
adopting an angle parallelization scheme. To overcome the scaling problem
inherent to the conventional CORDIC unit, a scaling free modified CORDIC unit
(11) is used, which can be implemented using cross-coupled bus connections and
adders. A high throughput asynchronous array architecture for straight line HT
is proposed first. The proposed architecture is then extended and modified to
compute circular and elliptic HT. While computing circular and elliptic HT, we
focus only on the estimation of the radius (for
[...]
vectorization mode. This problem, however, is not present in the straight line
and circular HT architectures.
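
One way to picture a scaling free micro-rotation is sketched below. It assumes the small-angle approximations sin(a_i) ~ 2^-i and cos(a_i) ~ 1 - 2^-(2i+1), which match the "i bit shifter" and "2i+1 bit shifter" labels of Figure 1 but are not spelled out in the text here. The function name `scaling_free_step` is hypothetical, and the rotation sense follows the Figure 1 equations.

```python
import math

def scaling_free_step(x, y, i):
    """One micro-rotation by a_i = 2**-i using only shifts and adds.

    Assumed approximations (not confirmed by the text):
        sin(a_i) ~ 2**-i            -> i-bit shift
        cos(a_i) ~ 1 - 2**-(2i+1)   -> (2i+1)-bit shift and a subtraction
    Output follows Figure 1: x' = x*cos + y*sin, y' = -x*sin + y*cos.
    """
    s = 2.0 ** -i
    c = 1.0 - 2.0 ** -(2 * i + 1)
    return c * x + s * y, -s * x + c * y
```

Under these assumptions the vector length changes by a factor sqrt(1 + 2^-(4i+2)) per step, compared with sqrt(1 + 2^-2i) for a conventional CORDIC step, which is why no separate scale factor compensation is needed once i is large enough for the target word length.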
The basic CORDIC unit has been designed using TGL in a 1.6 µm sea-of-gates
semicustom environment; it exhibits 62 mW power consumption at a 5 V supply
and 44 MHz operating frequency. With device scaling, this CORDIC unit is
expected to operate at a lower supply voltage, which implies that a quadratic
advantage in power consumption can be achieved.
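
The quadratic advantage can be illustrated with a small calculation. The sketch assumes that dynamic switching power (proportional to C V^2 f) dominates and that capacitance and operating frequency stay unchanged; the 3.3 V target supply and the function name `scaled_dynamic_power` are hypothetical and are not taken from the paper.

```python
def scaled_dynamic_power(p_ref_mw, v_ref, v_new):
    """Scale dynamic power with the square of the supply voltage.

    Assumes P ~ C * V**2 * f with C and f held constant (an idealisation).
    """
    return p_ref_mw * (v_new / v_ref) ** 2

# Reported operating point: 62 mW at 5 V. Hypothetical scaled supply: 3.3 V.
p_scaled = scaled_dynamic_power(62.0, 5.0, 3.3)  # about 27 mW
```

In practice a scaled process also changes capacitance and achievable frequency, so this is only a first-order estimate.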
Considering all these points, it can be conjectured that the proposed
architectures are good candidates for low power, high performance, real time
HT computation.
Table 1
| Mode | m = 1 | m = 0 | m = -1 |
|---|---|---|---|
| Rotation ($z \to 0$) | $x' = x\cos z + y\sin z$; $y' = -x\sin z + y\cos z$ | $x' = x$; $y' = y - zx$ | $x' = x\cosh z - y\sinh z$; $y' = -x\sinh z + y\cosh z$ |
| Vectoring ($y \to 0$) | $x' = \sqrt{x^2 + y^2}$; $z' = z - \tan^{-1}(y/x)$ | $x' = x$; $z' = z - (y/x)$ | $x' = \sqrt{x^2 - y^2}$; $z' = z - \tanh^{-1}(y/x)$ |
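
The rotation-mode entries of Table 1 can be checked numerically. The sketch below evaluates the closed forms with the signs as they appear in the table, taking the m = -1 entries as hyperbolic functions; the function name `cordic_rotation_result` is an assumption. These are the ideal results that a CORDIC unit approximates iteratively, not the iteration itself.

```python
import math

def cordic_rotation_result(x, y, z, m):
    """Ideal rotation-mode outputs (x', y') from Table 1 for m in {1, 0, -1}."""
    if m == 1:    # circular
        return (x * math.cos(z) + y * math.sin(z),
                -x * math.sin(z) + y * math.cos(z))
    if m == 0:    # linear
        return x, y - z * x
    if m == -1:   # hyperbolic
        return (x * math.cosh(z) - y * math.sinh(z),
                -x * math.sinh(z) + y * math.cosh(z))
    raise ValueError("m must be 1, 0 or -1")
```

With these signs the circular case preserves x^2 + y^2 and the hyperbolic case preserves x^2 - y^2, which is a quick consistency check on the table entries.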
Table 2
| Logic family | Average output capacitance (fF) | Average delay (ns) | Power dissipation (mW) | Power-delay product (pJ) | Energy-delay product ($10^{-21}$ J·s) |
|---|---|---|---|---|---|
| Static CMOS | 304.106 | 1.256 | 1.5329 | 1.9253 | 2.4181 |
| Domino CMOS | 192.969 | 1.35 | 2.1867 | 2.9522 | 3.9854 |
| NMOS pass logic | 42.1623 | 0.153 | 0.052 | 0.007956 | 0.001217 |
| TGL | 138.609 | 0.256 | 0.1732 | 0.04433 | 0.01134 |
Table 3
| Architecture | Nature of PE | Scan range of $\theta$ | Time required to generate histogram | Extra requirements |
|---|---|---|---|---|
| Rhodes et al. (8) | Multipliers, architecture is WSI | $[0, \pi]$ | 20 msec. (image size 256 × 256, 1/10 of the image are edge pixels) | Precomputed values of $\sin\theta$, $\cos\theta$ and RAM |
| Hanahara et al. (4) | Array multipliers and off chip components | $[0, \pi]$ | 256 msec. for 1024 feature points. | Precomputed values of $\sin\theta$, $\cos\theta$ and RAM |
| Timmerman et al. (3) | Radix-2 conventional CORDIC unit. | Effective scan range is $[0, \pi/4]$ | $O[2MNn(T_S + T_a)]$ | Scaling factor compensation. |
| Bruguera et al. (10) | Mixed radix pipelined CORDIC | $[0, \pi/2]$ | $O[52T_a + 4(n-1) + T_{conv}]$ | Scaling factor compensation, extra conversion unit and RAM. |
| Proposed | Scaling free CORDIC. The architecture is asynchronous. | $[0, \pi/4 \pm \delta]$ | $O[2\{N + (n-1)\}T_a]$; 149.179 msec for 256 × 256 image and 23.569 msec for 1024 points. | Scaling of $\rho$ by the constant factor $\sqrt{2}$ in B and D subspaces. |
Table Captions
Table 1. The CORDIC arithmetic function.
Table 2. Comparison of different logic families using the XOR structure.
Table 3. Comparison of different architectures for straight line Hough transform.
Figure Captions
Figure 1. The elementary CORDIC arithmetic unit.
Figure 2. Normal description of the straight line.
Figure 3. The basic PE for straight line Hough transform.
Figure 4. The array architecture for straight line Hough transform.
Figure 5. The parametric representation of a circle.
Figure 6 (a). The basic PE for circular Hough transform.
Figure 6 (b). The array architecture for circular Hough transform.
Figure 7 (a). The basic PE for elliptic Hough transform.
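Figure 7 (b). The array architecture for elliptic Hough transform.
[Figure 1: inputs $x$, $y$ and $\alpha_i$; $i$-bit and $(2i+1)$-bit shifters feeding adders/subtractors; outputs $x' = x\cos\alpha_i + y\sin\alpha_i$, $y' = -x\sin\alpha_i + y\cos\alpha_i$.]
[Figure 2: $x$-$y$ axes with a straight line described by $\rho$ and $\theta$.]
[Figure 3: PE labelled HS; inputs $x_{p-1}$, $y_{p-1}$ and $\theta_0$; outputs $x_p$, $y_p$; blocks labelled AA, AB, AC, AD with sign labels + +, + -.]
[Figure 4: chain of HS elements, $p = 1, 2, 3, \ldots, N$; inputs $x$, $y$.]
[Figure 5: $x$-$y$ axes with a circle described by $\rho$ and $\theta$.]
[Figure 6 (a): PE labelled HC; inputs $x_{p-1}$, $y_{p-1}$ and $\theta_0$; outputs $x_p$, $y_p$; sign labels + + + -.]
[Figure 6 (b): chain of HC elements, $p = 1, 2, 3, \ldots, N$; inputs $x$, $y$.]
[Unlabelled diagram following Figure 6 (b): blocks a to h and four "× -1" stages.]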