Image processing applications using a novel parallel computing machine based on reconfigurable logic by Allinson, N. M. et al.
Image Processing Applications Using a Novel Parallel 
Computing Machine Based on Reconfigurable Logic 
N M Allinson, N J Howard, A R Kolcz and A M Tyrrell
University of York, York YO1 5DD
Zelig is a 32 physical node fine-grained computer employing field-programmable gate arrays. Its
application to the high speed implementation of various image pre-processing operations (in particular
binary morphology) is described together with typical speed-up results.
The Zelig Architecture
Zelig was initially designed for the fast processing of a range of cellular automata (CA) algorithms. CA are
relatively large arrays of homogeneous simple processes that are updated in discrete time steps [1, 5].
Individual cell updates are based upon the current state of a cell and those of its local neighbours. From the
application of such simple rules, complex local and global behaviour can result over many updating generations.
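As a minimal software sketch of such an update rule, the Python fragment below performs one generation of a 1-bit totalistic automaton. The 3 x 3 neighbourhood, the wrap-around boundary and the majority rule are illustrative assumptions, not details of any Zelig configuration.

```python
def step(grid, rule):
    """One synchronous update of a 1-bit totalistic cellular automaton.

    grid: list of rows of 0/1 cells (toroidal wrap-around is an assumption).
    rule: set of neighbourhood sums (0..9, centre included) that yield state 1.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # A totalistic rule depends only on the sum over the neighbourhood.
            total = sum(grid[(r + dr) % rows][(c + dc) % cols]
                        for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            new[r][c] = 1 if total in rule else 0
    return new

# Hypothetical majority-vote rule: a cell becomes 1 when at least 5 of the 9 cells are 1.
MAJORITY = {5, 6, 7, 8, 9}
```

Every cell applies the same small rule to purely local data, which is the property that allows many copies of the rule to be placed side by side in hardware.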
1994 The Institution of Electrical Engineers.
Printed and published by the IEE, Savoy Place, London WC2R 0BL, UK.
Such problems conform well with the list of preferred FPGA problems outlined above. A cell update rule which
would require some 40 instructions on a conventional microprocessor can be processed in reconfigurable hardware in
under 100 ns. A hundred copies (say) of this rule may be fitted into the reconfigurable logic and executed in parallel.
A finer-grained task would require less logic; more copies can then be fitted into the hardware and a higher degree
of parallelism would result. In this way, billions of instructions can be executed per second.
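The arithmetic behind this claim can be checked with a short Python sketch. It uses only the figures quoted above (40 instructions, 100 copies, 100 ns taken as the upper bound of "under 100 ns"); the function name is hypothetical.

```python
INSTRUCTIONS_PER_RULE = 40   # "some 40 instructions" per cell update (from the text)
RULE_TIME_S = 100e-9         # upper bound: "under 100 ns" per update in hardware
COPIES = 100                 # "a hundred copies (say)" running in parallel

def equivalent_instructions_per_second(instr=INSTRUCTIONS_PER_RULE,
                                       rule_time_s=RULE_TIME_S,
                                       copies=COPIES):
    """Microprocessor-equivalent instruction rate of the parallel rule copies."""
    updates_per_second = copies / rule_time_s
    return instr * updates_per_second

rate = equivalent_instructions_per_second()
print(f"{rate:.1e} equivalent instructions per second")
```

With these numbers the rate is about 4 x 10^10 equivalent instructions per second, consistent with "billions of instructions per second".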
The processing core of Zelig (Fig. 1) consists of 32 Xilinx XC3090 FPGAs in PC84 packages, each with an associated 64K
by 8 static RAM. This core can be considered as a single logic surface with a 64K by 256 RAM for storing node
data, and is controlled by a state machine. Each node update rule copy is a physical node, and
larger automata are created by time-multiplexing many virtual nodes through one physical node. The logic surface
provides one physical axis for the computation space. Higher dimensions are implemented by time-multiplexing to
create virtual axes. Many combinations of computation space are possible, for example:-
65536-element line of 256-bit vectors
512 x 512 plane of 8-bit nodes (formed from 16 swaths of 32 by 512 nodes)
128 x 65536 plane of 2-bit nodes
256 x 256 x 256 cube of 1-bit nodes.
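A quick consistency check of these configurations against the 64K by 256 node memory can be written in Python. The dictionary keys are descriptive labels chosen for this sketch.

```python
from math import prod

WORDS = 64 * 1024      # RAM depth per FPGA (64K addresses)
WORD_BITS = 32 * 8     # 32 FPGAs x 8-bit RAM = one 256-bit logic-surface word
CAPACITY_BITS = WORDS * WORD_BITS

# (shape of the computation space, bits per node), as listed in the text
CONFIGS = {
    "line of 256-bit vectors": ((65536,), 256),
    "plane of 8-bit nodes": ((512, 512), 8),
    "plane of 2-bit nodes": ((128, 65536), 2),
    "cube of 1-bit nodes": ((256, 256, 256), 1),
}

def bits_required(shape, bits_per_node):
    """Total storage needed for a computation space of the given shape."""
    return prod(shape) * bits_per_node

for name, (shape, bits) in CONFIGS.items():
    need = bits_required(shape, bits)
    print(f"{name}: {need} of {CAPACITY_BITS} bits ({need / CAPACITY_BITS:.1%})")
```

Three of the four examples fill the node memory exactly; the 512 x 512 plane of 8-bit nodes uses one eighth of it.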
Each FPGA can see the node data of the two FPGAs directly above and the two directly below. By suitable
sequencing of the loading of node data into the FPGAs, any virtual node can see any other virtual node that shares its
physical node. The dimensionality and size of the computation space, and the maximum neighbourhood size
of each virtual node, are determined by the sequencing of the addresses for reading from and writing to the FPGAs.
The architecture is therefore capable of accommodating the extremes of small- to large-grain multidimensional
neighbourhoods.
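To illustrate how address sequencing defines the computation space, the sketch below maps a virtual node of the 512 x 512 plane of 8-bit nodes onto a RAM address and a physical node. The swath-major layout and the function names are assumptions made for illustration; the actual Zelig address generation is not described here.

```python
PHYSICAL_NODES = 32   # one 8-bit node per FPGA in this configuration
SWATH_HEIGHT = 512    # rows per swath
SWATHS = 16           # 16 swaths x 32 columns = 512 columns

def locate(x, y):
    """Map virtual node (x, y) to (RAM address, physical node).

    Assumed layout: swaths are stored one after another, and each
    address holds one row of 32 nodes across the logic surface.
    """
    if not (0 <= x < PHYSICAL_NODES * SWATHS and 0 <= y < SWATH_HEIGHT):
        raise ValueError("node outside the 512 x 512 plane")
    swath, physical = divmod(x, PHYSICAL_NODES)
    return swath * SWATH_HEIGHT + y, physical

def address_sequence():
    """Addresses visited in one full update, swath by swath, row by row."""
    for swath in range(SWATHS):
        for y in range(SWATH_HEIGHT):
            yield swath * SWATH_HEIGHT + y
```

Under this assumed layout a full update of the plane visits 8192 addresses, and changing the sequence (rather than the hardware) is what changes the shape of the space.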
All data movements are generated and controlled by an Inmos T800 Transputer or by Data Address Generators (using
further reconfigurable logic). Adjacent virtual nodes will typically possess a large number of common neighbours
whose values are retained within the logic surface between successive node updates. This is achieved by shift registers
internal to the FPGAs. The process logic may, in fact, need to be pipelined to reduce the processing time for
complex tasks. The system's video board is capable of displaying a non-interlaced 512 x 512 image with 8-bit
pseudo-colour resolution. It is based around the Inmos IMS G176 and is directly controlled by the master processor.
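The retention of common neighbours in shift registers can be pictured with a small software analogy. In the sketch below each row of node data is read once and then kept in a fixed-depth register while it is still needed; the depth of 3 (matching a 3 x 3 neighbourhood) and the generator name are assumptions.

```python
from collections import deque

def windows(rows, depth=3):
    """Yield tuples of the last `depth` rows as each new row is shifted in."""
    register = deque(maxlen=depth)   # the oldest row falls out automatically
    for row in rows:
        register.append(row)         # one memory read per row
        if len(register) == depth:
            yield tuple(register)
```

For example, `list(windows([1, 2, 3, 4]))` gives `[(1, 2, 3), (2, 3, 4)]`: each new window reuses two of the three previous rows instead of re-reading them from memory.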
Binary Image Morphology 
As an example application domain, the implementation of binary morphological operations is presented in
some detail [3]. The host-level morphology operations available include all the common low-level operations
(MAX, MIN, TRANSLATE, DILATE, ERODE, OPEN, CLOSE, COPY, COMP, HIT, THIN, THICK), together
with a number of more general commands (e.g., FILL - fills a specified portion of an image; PEEK - reads the
image data at a specified location; VIDEO - displays a specified image). The Zelig node memory can store 64 512 x
512 binary images (the final image plane is used by some operations as an intermediate working register). Each
FPGA has eight copies of the processing logic, so that the 32 FPGAs update 256 pixels in parallel.
The morphological operations require the use of a structuring element (SE), which is defined as a nine-bit integer. The
weightings for each element in the 3 x 3 SE are:-

    [  1    2    4 ]
    [  8   16   32 ]
    [ 64  128  256 ]

For example:-

    [ 0  1  0 ]
    [ 1  1  1 ]  =  186
    [ 0  1  0 ]
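The encoding can be reproduced in a few lines of Python, together with a plain software erosion and dilation driven by the nine-bit code. This is an illustrative sketch only: the function names, the treatment of pixels outside the image as background, and the list-of-lists image format are assumptions, and the code is not a model of the BITMORF hardware.

```python
WEIGHTS = [[1, 2, 4],
           [8, 16, 32],
           [64, 128, 256]]

def encode_se(se):
    """Pack a 3 x 3 binary structuring element into its nine-bit integer."""
    return sum(WEIGHTS[r][c] for r in range(3) for c in range(3) if se[r][c])

def decode_se(code):
    """Unpack a nine-bit integer into a 3 x 3 binary structuring element."""
    return [[1 if code & WEIGHTS[r][c] else 0 for c in range(3)]
            for r in range(3)]

def _pixel(image, r, c):
    """Read a pixel, treating everything outside the image as background (0)."""
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return 0

def _offsets(code):
    """Offsets (dr, dc) of the set SE elements relative to the SE centre."""
    se = decode_se(code)
    return [(r - 1, c - 1) for r in range(3) for c in range(3) if se[r][c]]

def dilate(image, code):
    """Binary dilation: a pixel is set if any SE-aligned source pixel is set."""
    offs = _offsets(code)
    return [[int(any(_pixel(image, r - dr, c - dc) for dr, dc in offs))
             for c in range(len(image[0]))] for r in range(len(image))]

def erode(image, code):
    """Binary erosion: a pixel is set only if the SE fits entirely inside."""
    offs = _offsets(code)
    return [[int(all(_pixel(image, r + dr, c + dc) for dr, dc in offs))
             for c in range(len(image[0]))] for r in range(len(image))]

CROSS = encode_se([[0, 1, 0],
                   [1, 1, 1],
                   [0, 1, 0]])   # 2 + 8 + 16 + 32 + 128 = 186
```

Dilating a single set pixel with `CROSS` produces the cross shape itself, and eroding that result with `CROSS` recovers the single pixel.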
Fig. 2 Top-Level Schematic for the BITMORFB FPGA Configuration
There are three FPGA designs - BITMORFA (for FPGA 0), BITMORFB (for FPGAs 1 to 30) and BITMORFC (for
FPGA 31). The two end configurations differ due to the way data is loaded into the neighbouring blocks. The
hierarchical design for BITMORFx is:-
BITMORFx
mEc 
GmBREG 
ENDNEIGHBOURS 
NEIGHBOURS 
CORE 
MINKOWSKI 
MINKOB 
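The assignment of the three designs to the 32 FPGAs described above can be expressed as a small lookup. The function name is hypothetical and the sketch says nothing about how the configurations are actually downloaded.

```python
NUM_FPGAS = 32

def bitmorf_design(fpga_index):
    """Return the name of the design used for the FPGA at position fpga_index."""
    if not 0 <= fpga_index < NUM_FPGAS:
        raise ValueError("FPGA index must be in 0..31")
    if fpga_index == 0:
        return "BITMORFA"            # first end configuration
    if fpga_index == NUM_FPGAS - 1:
        return "BITMORFC"            # last end configuration
    return "BITMORFB"                # the 30 interior FPGAs
```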
Other IP Tasks and Performance
Other IP operations that have been implemented in Zelig include a set of greyscale morphological operators,
neighbourhood operations (including rank-filtering) and local histogram modification. A meaningful indicator of
performance is to compare the performance of Zelig with a software implementation running
on the T800-25 host processor. All software implementations have been optimised and make extensive use of look-
up tables and minimise external memory accesses. Zelig operates at a modest memory cycle of 100 ns and uses
slow-grade FPGAs. In terms of [...] and cost, Zelig is approximately equivalent to 32 T800 Transputers.
The highly parallel nature of the applications means that speedups of 32 would be expected. Table 1 gives some
examples of speedups actually obtained.
Table 1 Example Speedup Results

Application                                               Speedup (DAG Controlled)
1-bit totalistic automata - 512 x 512                     [illegible]
Binary erode/dilate - 512 x 512                           7,750
Grey-scale erode/dilate - 512 x 512                       630
Median filtering - 512 x 512 - 3 x 3 window               1,870
Local histogram equalisation - 512 x 512 - 5 x 5 window   2,890
Monte Carlo yield modelling                               1,940
Fig. 3 Top-Level Schematic for the CORE FPGA Configuration
The automata figure is given to show the performance level achievable for the application for which Zelig was
specially designed. The Monte Carlo yield modelling is an example of a more complex application where the
operations are repeated many thousands of times until convergence is reached [4].
Conclusions 
The usual rigid architectures of custom hardware mean that applications can either run very fast or not at all.
[...] normally associated with software implementations:-
Number of nodes ∝ [...]
Number of physical nodes ∝ size of reconfigurable logic
Acknowledgement
This work was funded by SERC/DRA project grant GR/F 92152 "The Development of a Fast Probabilistic
Automata Machine".
References
[1] N M Allinson and M L Sales (1992), CART - A cellular automata research tool, Microprocessors and
Microsystems, 16, 403-415
[2] P M Athanas and H F Silverman (1993), Processor reconfiguration through instruction-set metamorphosis,
IEEE Computer, 26(3), 11-18
[3] R M Haralick and L G Shapiro (1992), Computer and Robot Vision, Vol. 1, Addison-Wesley, Mass., Chap. 5
[4] N J Howard, A M Tyrrell and N M Allinson (1994), The yield enhancement of field-programmable gate arrays,
IEEE Trans. on VLSI Systems, 2, 115-123
[5] T Toffoli and N Margolus (1987), Cellular Automata Machines, MIT Press, Mass.
[6] Xilinx Inc. (1991), The Programmable Gate Array Data Book, Xilinx Inc., San Jose, Cal.
