CMOS design of cellular APAPs and FPAPAPs: an overview by Rodríguez Vázquez, Ángel Benito
CMOS DESIGN OF CELLULAR APAPs and FPAPAPs: AN OVERVIEW 
A. RODRÍGUEZ-VÁZQUEZ (a)
Instituto de Microelectrónica de Sevilla-CNM-CSIC. Avda. Reina Mercedes s/n
41012 Sevilla (SPAIN). Tel.: +34 955056666, Fax: +34 955056686.
E-mail: rcarmona@ime.cnm.es 
CNN-based analogic visual microprocessors have similarities with the so-called Single
Instruction Multiple Data systems, although they work directly on analog signal
representations obtained through embedded optical sensors and hence need neither a
front-end sensory plane nor analog-to-digital converters. The architecture of these visual
microprocessors is illustrated in Fig. 1 through two prototype chips, namely ACE4K [2] and
ACE16K [5]. In both cases, as in other related chips, the architecture includes a core
array of interconnected elementary processing units, surrounded by global circuitry.
This latter circuitry is intended for:
* Control and timing.
* Addressing and buffering of the core cells.
* Input/output.
* Storage of user-selectable instructions (programs) to control the sequence of
  operations of the processing core.
* Storage of user-selectable analogic programming parameter configurations
  (templates).
On the other hand, the core of interconnected processing units embeds different
functions on a common silicon substrate (see Fig. 2 for an illustration), namely:
- 2-D sensing.
- 2-D analog/digital array processing concurrent with the signal sensing.
- 2-D spatio-temporal processing determined by local, receptive-field-like
  programmable interconnections.
- 2-D memory banks for concurrent on-line uploading and downloading of short-term
  analog and digital data.
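The spatio-temporal processing governed by programmable local interconnections can be illustrated with the textbook Chua-Yang CNN cell equation, dx/dt = -x + sum(A*y) + sum(B*u) + z, over a 3 x 3 neighbourhood. The following is a minimal software sketch of one forward-Euler step of that model, not a description of the ACE4K or ACE16K circuits; the function names, the zero boundary condition, and the step size `dt` are assumptions made for illustration.

```python
# Textbook Chua-Yang CNN dynamics, discretised with forward Euler.
# Illustrative only: names, boundary handling and dt are assumptions,
# not taken from the chips discussed in the text.

def saturate(x):
    """Piecewise-linear CNN output: clip the cell state to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def cnn_step(state, inp, A, B, z, dt=0.1):
    """One Euler step of dx/dt = -x + sum(A*y) + sum(B*u) + z on a 2-D grid.

    state, inp : lists of rows holding cell states x and inputs u
    A, B       : 3x3 feedback and control templates
    z          : bias term
    Cells outside the grid contribute zero (assumed boundary condition).
    """
    rows, cols = len(state), len(state[0])
    out = [[saturate(v) for v in row] for row in state]
    new_state = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        acc += A[di + 1][dj + 1] * out[ni][nj]
                        acc += B[di + 1][dj + 1] * inp[ni][nj]
            new_state[i][j] = state[i][j] + dt * (-state[i][j] + acc)
    return new_state


# Example: a template that only copies the input into the state.
ZERO = [[0.0] * 3 for _ in range(3)]
COPY_B = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
image = [[1.0, -1.0], [0.5, 0.0]]
blank = [[0.0, 0.0], [0.0, 0.0]]
result = cnn_step(blank, image, ZERO, COPY_B, z=0.0, dt=1.0)
```

In this picture a template is the set of A, B and z coefficients, so changing the template changes the operation performed by every cell of the array at once.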
Several analogic visual microprocessor chips in different CMOS technologies have
been reported during the last few years. Table 1 summarizes the most relevant data
of those implementations with at least 20 x 20 pixels. Some columns correspond to
chips intended for black and white input images, while others are for chips which
accept gray scale input images. As with any other analog processing circuit, performance
figures of merit must consider accuracy and area occupation in addition to speed and
power consumption. Any comparison must refer to the number of operations per second
and to the accuracy. The data in the table highlight the following:
- There is a trade-off between area occupation (cell density) and accuracy, on the one
  hand, and speed of operation and power consumption, on the other. This trade-off is
  typical of analog integrated circuits.
a. This overview is mostly based on the work of Servando Espejo, Rafael Domínguez-Castro, Ricardo Carmona
and Gustavo Liñán, and has been supported by ESPRIT V Project IST-1999-19007, by ONR-NICOP Grant
N00014-00-10429 (POAC), and by the Spanish CICYT Project TIC-1999-0826.
Figure 1. Architectures of Analogic Visual Microprocessor Chips: (a) ACE4K [2]; (b) ACE16K [5].
(Block labels in the diagrams: 16-channel analog data bus, address bus, control, 128 DA-AD bank,
memory control, 32b digital data bus.)
Figure 2. Illustrating the embedding of different functional features at the core processing array of
visual microprocessors. (a) Microphotograph of the ACE4K chip (left figure), and conceptual
representation of the distributed functions embedded in the core array (right figure); (b) Layout of a
processing unit of the ACE16K showing the areas occupied by the different functions realized
concurrently by the core array.
- The evolution towards scaled-down technologies brings advantages in terms of
  speed and cell density. For instance, the ACE16K chip has 128 x 128 resolution and
  is capable of realizing sequences of 64 instructions; using up to 32 different
  templates (each template consisting of 24 8-bit-coded analog programming values)
  during a sequence; loading and downloading full-size gray-scale images to and from the
  cache memory, with 8 full-size images always available for use during the
  flow; with an internal processing time of 160 ns; and providing digitally-coded output
  images (obtained with a bank of internal A/D converters) with a downloading
  time of 0.128 ms.
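The ACE16K figures quoted above imply some simple derived quantities, worked out in the sketch below. The inputs are the numbers stated in the text; the packing of four 8-bit pixels per word on the 32-bit digital data bus of Fig. 1 is an assumption made only to estimate a bus word rate, and the variable names are illustrative.

```python
# Back-of-the-envelope arithmetic on the ACE16K figures quoted in the text.
ROWS, COLS = 128, 128          # array resolution
TEMPLATES = 32                 # templates usable during a sequence
VALUES_PER_TEMPLATE = 24       # analog programming values per template
BITS_PER_VALUE = 8             # each value is 8-bit coded
DOWNLOAD_TIME_S = 0.128e-3     # full-image download time

pixels = ROWS * COLS                                            # 16384 pixels
template_bits = TEMPLATES * VALUES_PER_TEMPLATE * BITS_PER_VALUE  # 6144 bits
template_bytes = template_bits // 8                             # 768 bytes

# Average output rate needed to download one image in the quoted time
# (about 128 Mpixel/s).
pixel_rate = pixels / DOWNLOAD_TIME_S

# ASSUMPTION: 8-bit pixels, packed four per word on the 32-bit bus.
BITS_PER_PIXEL = 8
BUS_WIDTH_BITS = 32
words_per_image = pixels * BITS_PER_PIXEL // BUS_WIDTH_BITS      # 4096 words
word_rate = words_per_image / DOWNLOAD_TIME_S                    # about 32 Mword/s

print(pixels, template_bytes, round(pixel_rate / 1e6), round(word_rate / 1e6))
```

Under these assumptions the template store amounts to less than one kilobyte, while the image download rate is what sets the requirement on the output bus.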
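Table 1. Summary and comparison of chip implementations.

Rows: Tech. (µm); Pixels; Format (a); Weight programmability (e); Memory per cell;
Multipliers per cell; Photosensors; Program memory; Speed; Power; XPS;
XPS/area (GOPS/mm2); XPS/power (OP/J); Electrical I/O.
Entries are listed in the order in which they appear; "..." marks a lost entry.

Columns for the 0.8 µm and 0.7 µm chips:
  Tech. (µm):            0.8; 0.7
  Format:                B
  Weights:               Continuous; 8-b
  Memory per cell:       1 state, 1 input; 4 LLMs; capacitor
  Multipliers per cell:  5, 5, 1 (h)
  Program memory:        8 templates
  XPS:                   15.8 GOPS; 0.53 GOPS
  XPS/area:              0.98 GOPS/mm2
  Power:                 ... µW/cell; 1 W
  XPS/power (OP/J):      1.58 x 10^10; 3.5 x 10^9
  Electrical I/O:        22 lines, binary bus; 20 lines, analog bus
  Unlabelled entry:      375

Column for the 0.5 µm chip [8]:
  Tech. (µm):            0.5
  Pixels:                48 x 48
  Format:                B
  Memory per cell:       2 in / 2 out registers
  Multipliers per cell:  9, 9, 1
  Photosensors:          No
  Program memory:        1 template
  Speed:                 50 ns
  Power:                 30 mW (max.)
  XPS:                   0.5 TOPS
  XPS/area:              64 GOPS/mm2
  XPS/power (OP/J):      1.6 x 10^...
  Electrical I/O:        48-b binary bus
  Unlabelled entry:      295

Remaining columns:
  Pixels:                ...; 128 x 128
  Multipliers per cell:  9, 1, 1, 1
  Photosensors:          Multimode sensor
  Power:                 250 µW/cell, 1.2 W chip; 180 µW/cell, 4 W chip
  XPS:                   40 GOPS; 0.19 TOPS
  XPS/area:              ... GOPS/mm2; ... GOPS/mm2
  Electrical I/O:        16 lines

a. A = Analog, B = Binary, D = Digital.
b. Only B/W results are available.
c. 7.7b equivalent accuracy.
d. 8b equivalent accuracy.
e. It refers to the number of bits used to define weight parameters.
f. A, B, and z multipliers.
g. A and B multipliers are the same. The chip uses a time-multiplexing scheme.
h. Cross-shape neighbourhood.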
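The figures of merit in Table 1 are related by simple ratios, which gives a way to cross-check individual entries. The sketch below does this for the 48 x 48 chip, using its 0.5 TOPS and 64 GOPS/mm2 entries. The power value of 30 mW is an assumed reading of a partly illegible table entry, and the function and key names are hypothetical.

```python
# Cross-check of Table 1 figures of merit for the 48 x 48 chip.
# ASSUMPTION: the power entry reads 30 mW (max.).

def figures_of_merit(ops_per_s, ops_per_s_per_mm2, power_w, n_cells):
    """Derive array area, cell density and energy efficiency from table entries."""
    area_mm2 = ops_per_s / ops_per_s_per_mm2
    return {
        "area_mm2": area_mm2,                    # implied core area
        "cells_per_mm2": n_cells / area_mm2,     # implied cell density
        "ops_per_joule": ops_per_s / power_w,    # XPS / power
    }


fom = figures_of_merit(
    ops_per_s=0.5e12,          # 0.5 TOPS
    ops_per_s_per_mm2=64e9,    # 64 GOPS/mm2
    power_w=30e-3,             # assumed 30 mW
    n_cells=48 * 48,
)
for name, value in fom.items():
    print(f"{name}: {value:.4g}")
```

The implied density comes out close to 295 cells/mm2, which matches the otherwise unlabelled 295 entry of that column. Under the 30 mW assumption, the energy efficiency is of the order of 10^13 operations per joule.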
The capability to design cells with maximum density, speed and accuracy, and minimum
area and power consumption relies basically on the exploitation of all functional
features offered by the MOS transistor. This is very different from digital design, in which
only the switching capability of the MOS transistor is exploited. The design of the entities
which interconnect the cells (synapses) is one of the major issues. Different design
strategies may be chosen a priori, and all of them provide electrical controllability by
default. However, the different strategies exhibit quite different performance in
the presence of systematic and random error sources, as well as a different incidence of
global signal transmission errors. Hence, careful analysis and optimization are needed to
select the best approach and to achieve the cell density and accuracy levels featured by
latest-generation chips. The background for such procedures can be found in the
literature.
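One widely used way to reason about random error sources in such cells is Pelgrom's mismatch model, in which the standard deviation of the threshold-voltage difference between two matched transistors scales as A_VT / sqrt(W*L). The sketch below uses it to show why accuracy costs area. It is a generic first-order estimate, not the analysis used for the chips above: the square-law relation sigma(dI/I) = 2*sigma(dVT)/(VGS - VT), the neglect of current-factor mismatch, and the numerical values of `A_VT_MV_UM` and `V_OV` are all assumptions.

```python
import math

# ASSUMED, illustrative process and bias values (not from the text).
A_VT_MV_UM = 10.0   # threshold mismatch coefficient, mV*um
V_OV = 0.5          # gate overdrive VGS - VT, volts


def sigma_rel_current(width_um, length_um):
    """Relative current mismatch of a saturated MOS pair (threshold term only)."""
    sigma_vt = A_VT_MV_UM * 1e-3 / math.sqrt(width_um * length_um)  # volts
    return 2.0 * sigma_vt / V_OV


def area_for_bits(bits):
    """Gate area (um^2) at which sigma(dI/I) equals one LSB of `bits` resolution."""
    sigma_target = 2.0 ** (-bits)
    return (2.0 * A_VT_MV_UM * 1e-3 / (V_OV * sigma_target)) ** 2


for bits in (6, 7, 8):
    print(bits, "bits ->", round(area_for_bits(bits), 2), "um^2")
```

In this model each additional bit of equivalent accuracy quadruples the required gate area, which is one way to read the trade-off between cell density and accuracy.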
References 
1. T. Roska and A. Rodríguez-Vázquez (Editors), Towards the Visual Microprocessor.
John Wiley & Sons Ltd., 2000.
2. G. Liñán, P. Földesy, S. Espejo, R. Domínguez-Castro and A. Rodríguez-Vázquez, “A
0.5µm CMOS 10^6 transistor analog programmable array processor for real-time image
processing”. Proc. of the 1999 European Solid-State Circuits Conference, pp. 358-361,
September 1999.
3. R. Carmona, P. Garrido, R. Domínguez-Castro, S. Espejo and A. Rodríguez-Vázquez,
“Bio-inspired analog VLSI design realizes programmable complex spatio-temporal
dynamics on a single chip”. Proc. of the 2002 Conference on Design and Test in Europe,
to appear.
4. J.C. Gealow and C.G. Sodini, “A pixel-parallel image processor using logic pitch
matched to dynamic memory”. IEEE Journal of Solid-State Circuits, Vol. 34, pp. 831-839,
June 1999.
5. G. Liñán, R. Domínguez-Castro, S. Espejo and A. Rodríguez-Vázquez, “ACE16K: An
advanced focal-plane analog programmable array processor”. Proc. of the 2001
European Solid-State Circuits Conf., pp. 216-219, Villach (Austria), September 2001.
6. P. Kinget and M. Steyaert, Analog VLSI Integration of Massive Parallel Processing
Systems. Kluwer Academic Publishers, 1997.
7. R. Domínguez-Castro et al., “A 0.8µm CMOS 2-D programmable mixed-signal focal-
plane array processor with on-chip binary imaging and instruction storage”. IEEE J.
Solid-State Circuits, Vol. 32, pp. 1013-1026, 1997.
8. A. Paasio, A. Dawidziuk, K. Halonen and V. Porra, “Minimum size 0.5µm CMOS
programmable 48 x 48 CNN test chip”. Proc. of the 1997 European Conference on
Circuit Theory and Design, pp. 154-156, Budapest, September 1997.
9. A. Rodríguez-Vázquez, E. Roca, M. Delgado-Restituto, S. Espejo and R. Domínguez-
Castro, “MOST-Based Design and Scaling of Synaptic Interconnections in VLSI
Analog Array Processing Chips”. J. of VLSI Signal Proc. Systems for Signal, Image and
Video Technology, Vol. 23, pp. 239-266, Kluwer Academics, November/December
1999.
