Review of CMOS implementations of the CNN universal machine-type visual microprocessors by Roska, Tamás & Rodríguez Vázquez, Ángel Benito
ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland 
Review of CMOS Implementations of the CNN Universal Machine-Type Visual Microprocessors
T. Roska* and A. Rodríguez-Vázquez**
* Computer & Automation Institute - Hungarian Academy of Sciences
Kende-u. 13-17, Budapest, H-1111, HUNGARY
Phone: +36 1 209 5263, Fax: +36 1 209 5264, E-mail: roska@lutra.sztaki.hu
** Instituto de Microelectrónica de Sevilla - CNM-CSIC
Edificio CICA-CNM, C/ Tarfia s/n, 41012 Sevilla, SPAIN
Phone: +34 95 5056679, Fax: +34 95 4231832, E-mail: angel@imse.cnm.es
ABSTRACT
While in most application areas digital processors can solve problems efficiently, in some fields their capabilities are very limited. A typical example is vision: simple animals outperform supercomputers in the realization of basic vision tasks. To overcome the limitations of these conventional systems, a fundamentally different array architecture is needed. This architecture is based on the new paradigm of analogic cellular (CNN) computing, whose most advanced implementation is the so-called CNN universal machine (CNN-UM). Its main components are: a) a parallel architecture consisting of an array of locally connected analog processors; b) a means of storing the intermediate computation results locally, pixel by pixel; and c) stored on-chip programmability. When implemented as a mixed-signal VLSI chip, the CNN-UM can process images at rates of trillions of operations per second with very small size and low power consumption. Furthermore, when an adaptive multi-sensor array is integrated into the CNN-UM, the resulting sensor+computer array offers unprecedented capabilities. This paper reviews the latest results on CNN-UM chips and systems, and outlines the envisaged roadmap for these computers.
1. Introduction
Conventional vision machines use a CCD camera for parallel acquisition of the input image and serial transmission of a digitized version of the input data to a separate computer. This results in huge data rates, which conventional computers are not capable of analyzing in real time. For instance, a 3-color 512 × 512 camera delivers about F × 10^6 bytes/second, where F is the frame rate. Conventional computers and DSPs are able to manage such a rate for auto-focus, image stabilization, luminance/chrominance control, etc. However, executing the spatial-temporal operations of image processing in real time requires much more sophisticated digital processors. Consequently, conventional vision machines with real-time capabilities are bulky, expensive and extremely power-hungry. This is in contrast to living beings, where even very tiny and power-efficient brains can analyze complex time-varying scenes in real time. A prototype of this way of processing is found at the very front end of the human visual system: the retina [9].
1. This work has been supported by the EU (IST-1999-19007) and the Spanish CICYT (TIC99-0826).
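The camera data rate quoted in the Introduction follows from simple arithmetic. The sketch below assumes an uncompressed signal with one byte per color sample (an assumption; the text does not state the sample depth). Under that assumption one 3-color 512 × 512 frame is 786,432 bytes, about 0.8 × 10^6, which the text rounds to 10^6 bytes per frame. The function name camera_data_rate is ours, chosen for illustration.

```python
def camera_data_rate(width: int, height: int, channels: int,
                     bytes_per_sample: int, frame_rate: float) -> float:
    """Raw (uncompressed) camera output in bytes per second."""
    return width * height * channels * bytes_per_sample * frame_rate


# One frame of a 3-color 512 x 512 camera, assuming 1 byte per color sample.
bytes_per_frame = camera_data_rate(512, 512, 3, 1, frame_rate=1.0)
print(bytes_per_frame)  # 786432.0, i.e. about 0.8 x 10^6 bytes per frame

# Typical video frame rates (F in the text).
for fps in (25, 30):
    mb_per_s = camera_data_rate(512, 512, 3, 1, fps) / 1e6
    print(f"{fps} frames/s -> {mb_per_s:.1f} MB/s")  # 19.7 and 23.6 MB/s
```

At video rates this amounts to roughly 20 MB/s of raw data that a separate computer must receive and process before any spatial-temporal analysis can begin.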
This contrast between the performance of artificial and “natural” vision systems is, among other things, due to the inherent parallelism of the processing performed by the latter. Such parallelism is already observed in the retina [8]. It contains two types of photoreceptor cells, cones (about 6 million in the whole retina) and rods (about 120 million), which together perform logarithmic three-color imaging over a light-intensity range of around ten decades. It also contains processing cells (horizontal, bipolar, amacrine and ganglion cells) that perform non-linear spatial-temporal processing on the incoming flow of images through a sequence of layers. Among many other tasks, such processing serves to extract important features from the raw sensory data and, thus, to reduce the amount of information transmitted for subsequent processing [3][9].
Inspired by the efficiency of natural vision systems, universities and companies have focused their efforts on developing new generations of devices that overcome the drawbacks of traditional ones by incorporating distributed parallel processing and by making this processing run concurrently with signal acquisition. One possible strategy is flip-chip bonding of separate sensing and processing devices; another is to integrate the sensory and the processing circuitry on the same semiconductor substrate. “Silicon retinas”, “smart-pixel chips” and “focal-plane array processors” are members of this latter class of vision chips [4][5][6]. Their development is expected to have a significant impact in quite diverse application scenarios. However, industrial applications demand chips capable of flexible operation, with programmable features and standard interfacing to conventional equipment. A powerful methodological framework for the systematic development of such chips is provided by the paradigm of analogic cellular (CNN) computing [1] and the CNN Universal Machine (CNN-UM) processing architecture [7].
0-7803-5482-6/99/$10.00 ©2000 IEEE
This paper reviews recent system-level and chip-level results related to CNN-UMs, and outlines the envisaged roadmap for these computers.
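For readers new to the CNN paradigm [1], the following sketch integrates the standard normalized cell equation, dx_ij/dt = -x_ij + Σ A(k,l) y_(i+k,j+l) + Σ B(k,l) u_(i+k,j+l) + z, with output y = 0.5(|x + 1| - |x - 1|), where time is measured in units of the cell time constant. It uses forward Euler with 10 steps per time constant, the same assumption made in Section 3. The zero boundary, the step size and the example template are our own illustrative assumptions, not the settings of any chip reviewed here; the analog chips themselves evolve in continuous time.

```python
from typing import List

Grid = List[List[float]]


def f(x: float) -> float:
    """Standard CNN output nonlinearity: f(x) = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def template_sum(t: Grid, g: Grid, i: int, j: int) -> float:
    """Apply a 3x3 template t to grid g around cell (i, j).

    Cells outside the array contribute 0 (a fixed zero boundary, assumed here).
    """
    rows, cols = len(g), len(g[0])
    s = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                s += t[di + 1][dj + 1] * g[r][c]
    return s


def run_cnn(A: Grid, B: Grid, z: float, u: Grid, x0: Grid,
            steps_per_tau: int = 10, n_tau: float = 10.0) -> Grid:
    """Forward-Euler integration of the normalized CNN state equation.

    Each Euler step advances the state by dt = 1 / steps_per_tau time constants.
    Returns the output image y = f(x) after n_tau time constants.
    """
    dt = 1.0 / steps_per_tau
    rows, cols = len(u), len(u[0])
    x = [row[:] for row in x0]
    # The input term B*u + z does not change over time, so evaluate it once.
    bu = [[template_sum(B, u, i, j) + z for j in range(cols)] for i in range(rows)]
    for _ in range(round(steps_per_tau * n_tau)):
        y = [[f(v) for v in row] for row in x]
        x = [[x[i][j] + dt * (-x[i][j] + template_sum(A, y, i, j) + bu[i][j])
              for j in range(cols)]
             for i in range(rows)]
    return [[f(v) for v in row] for row in x]


# Hypothetical example template: self-feedback 2, no input coupling, no bias.
# Each state runs away from 0 and settles at +2 or -2, so the output is binary:
# +1 where the initial gray value was positive, -1 where it was negative.
A = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
B = [[0.0] * 3 for _ in range(3)]
u = [[0.1, -0.3], [0.5, -0.05]]
print(run_cnn(A, B, 0.0, u, x0=u))
```

The printed image should contain +1 and -1 (up to floating-point rounding) in the sign pattern of u. Replacing A, B and z with other templates changes the operation while the integration loop stays the same, which is the sense in which a template acts as an instruction.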
2. New Directions in System Implementations 
Right after the first digital microprocessor was made, Intel Corporation started to sell its associated development system, a tool to teach engineers how to use and program this new device. Likewise, a visual microprocessor development system has been devised to help software engineers and product designers learn this new device and start developing new products [10].
The new version of our visual microprocessor 
development system is called ALADDIN: Analogic 
Application Development system for Dynamic Image 
processing and Navigation. 
The main parts of the ALADDIN system are shown in Fig. 1. We consider a PC-based development system with cameras, video sources and multimedia accessories. Using this system, programs for the CNN-UM can be developed and tested in a dynamic visual environment.
The next step is to offer self-contained Analogic Cellular Engine Boards (ACE Boards) with stored programmability. This means that walkman-size units, integrated with or interfaced to sensors, will become available. Once a program has been developed and tested at the ALADDIN site, it can be downloaded and used immediately.
When a new version of an analogic cellular visual microprocessor is developed, only a small part of the ALADDIN system needs to change: the platform hosting the chip and a small part of the interface software. Everything else remains the same; hence, programming effort is preserved and hardware and software components can be reused.
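The reuse argument above amounts to isolating chip-specific code behind a narrow interface. The Python sketch below illustrates that idea under our own assumptions: ChipPlatform, IdealSimulator, QuantizedChipModel and run_program are hypothetical names, not part of ALADDIN. An "instruction" is simplified to a pixel-wise function, and the quantized model merely mimics a chip with limited analog resolution. Only the platform class changes when the chip changes, while the application code in run_program stays untouched.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Image = List[List[float]]
# Simplification (hypothetical): an "instruction" is a pixel-wise function.
Instruction = Callable[[float], float]


class ChipPlatform(ABC):
    """Chip-specific layer: the only part replaced when a new chip arrives."""

    @abstractmethod
    def execute(self, instr: Instruction, image: Image) -> Image:
        ...


class IdealSimulator(ChipPlatform):
    """Exact floating-point execution, standing in for a software simulator."""

    def execute(self, instr: Instruction, image: Image) -> Image:
        return [[instr(p) for p in row] for row in image]


class QuantizedChipModel(ChipPlatform):
    """Stand-in for an analog chip whose signals in [-1, 1] have limited resolution."""

    def __init__(self, bits: int) -> None:
        self.levels = 2 ** bits - 1

    def _quantize(self, v: float) -> float:
        v = max(-1.0, min(1.0, v))  # clip to the assumed signal range
        step = round((v + 1.0) / 2.0 * self.levels)
        return step / self.levels * 2.0 - 1.0

    def execute(self, instr: Instruction, image: Image) -> Image:
        # Quantize both the value read in and the value written back.
        return [[self._quantize(instr(self._quantize(p))) for p in row]
                for row in image]


def run_program(platform: ChipPlatform, program: List[Instruction],
                image: Image) -> Image:
    """Chip-independent application code: unchanged when the platform changes."""
    for instr in program:
        image = platform.execute(instr, image)
    return image


program = [lambda p: -p, lambda p: 0.5 * p]  # invert, then halve
image = [[0.5, -1.0], [0.25, 0.0]]
print(run_program(IdealSimulator(), program, image))
print(run_program(QuantizedChipModel(bits=7), program, image))
```

Both calls run the same program; the two results differ only by the small quantization error introduced by the hypothetical 7-bit model.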
A key element of the know-how is contained in the Analogic CNN Software Library, in which templates (instructions), subroutines and programs for well-defined tasks are stored and distributed. We are witnessing a process similar to that of the 1970s, when the first algorithms for digital microprocessors were developed. Soon we will publish a new version of our Library, called “Recipes in Alpha”, containing hundreds of software modules tested on both simulators and visual microprocessors.
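One way to picture such a library is as a catalog of named templates plus named subroutines that expand into template sequences. The sketch below is a hypothetical data structure of our own: Template, TemplateLibrary and the example "threshold" entry are illustrative, not the Library's actual format. Each template is modeled as the feedback template A, the control template B and the bias z of the standard CNN equation [1].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Matrix3 = Tuple[Tuple[float, float, float], ...]


@dataclass(frozen=True)
class Template:
    """One CNN instruction: feedback template A, control template B and bias z."""
    A: Matrix3
    B: Matrix3
    z: float


@dataclass
class TemplateLibrary:
    templates: Dict[str, Template] = field(default_factory=dict)
    subroutines: Dict[str, List[str]] = field(default_factory=dict)

    def add_template(self, name: str, t: Template) -> None:
        for m in (t.A, t.B):
            if len(m) != 3 or any(len(row) != 3 for row in m):
                raise ValueError(f"{name}: templates must be 3x3")
        self.templates[name] = t

    def add_subroutine(self, name: str, steps: List[str]) -> None:
        # A subroutine may only refer to templates or subroutines already stored.
        missing = [s for s in steps
                   if s not in self.templates and s not in self.subroutines]
        if missing:
            raise KeyError(f"{name}: unknown steps {missing}")
        self.subroutines[name] = list(steps)

    def expand(self, name: str) -> List[Template]:
        """Flatten a (possibly nested) subroutine into the templates to run."""
        if name in self.templates:
            return [self.templates[name]]
        return [t for step in self.subroutines[name] for t in self.expand(step)]


ZERO: Matrix3 = ((0.0, 0.0, 0.0),) * 3
lib = TemplateLibrary()
lib.add_template("threshold",
                 Template(A=((0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 0.0)),
                          B=ZERO, z=0.0))
lib.add_subroutine("twice", ["threshold", "threshold"])
print(len(lib.expand("twice")))  # 2
```

Checking references when a subroutine is stored keeps the library consistent, which matters if modules are distributed and reused across different chips and simulators.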
3. Chip Implementations 
During the last few years several CNN chips have been designed. In particular, those larger than 10 × 10 cells whose operation has actually been demonstrated experimentally are reported in [11]-[16]. The attached table summarizes some features of these chips. Speed is expressed in terms of analog operations per second. The equivalent digital multiply/add operations per second can be calculated by assuming 10 time steps per time constant. This is the default needed when the A template is full and analog input or output values are present. This means 10 × 20 = 200 equivalent multiply/add operations per cell and time constant (20 per time step), so that, with 4096 cell processors and a time constant of about 280 ns [16], the equivalent speed is about 3 TeraOPS.
The data in this table reveal a trade-off between speed and accuracy that is common to any analog integrated circuit. Of these chips, those reported in [14] and [16] have embedded distributed optical sensors; i.e., they are true focal-plane array processors. However, only the latter can operate with gray-scale inputs and produce gray-scale outputs while also having all the functional features of CNN-UMs.
Relevant data pertaining to the chip in [16] are displayed in Fig. 2. Especially relevant are the low power consumption per unit cell and the high operation speed. This chip has also served as a vehicle to demonstrate the concept of true VLSI analog chips with robust, controlled and predictable response. From here, the challenges are basically to increase the size and to improve the I/O [18]. Thus, a major next step will be the design of a QCIF-resolution (176 × 144) chip with embedded optical sensors in a 0.35 µm or 0.18 µm technology, a target scheduled to be reached during 2001.
The integration of multiple sensors per pixel within the array computer probably defines the dominant medium- and long-term scenario for CNN-UM based systems [17]. The multiple sensors should be adaptive and differ in modality, spectral range, sensitivity and dynamics. Their control parameters should be set by underlying programmed calculations made by a CNN-UM. Hence, image acquisition by the multi-sensor array depends, pixel by pixel, on the actual changing scene to be analyzed.
4. References
[1] L.O. Chua and T. Roska, “The CNN Paradigm”, IEEE Trans. Circuits & Systems-I, Vol. 40, pp. 147-156, March 1993.
[2] R. Domínguez-Castro et al., “A 0.8 µm CMOS 2-D Programmable Mixed-Signal Focal-Plane Array Processor with On-Chip Binary Imaging and Instructions Storage”, IEEE J. Solid-State Circuits, Vol. 32, pp. 1013-1026, July 1997.
[3] M.M. Gupta, G.K. Knopf (Eds.), Neuro-Vision Systems, Principles and Applications, IEEE Press, 1994. ISBN: 0-7803-1042-X.
[4] C. Koch, H. Li (Eds.), Vision Chips, Implementing Vision Algorithms with Analog VLSI Circuits, IEEE Press, 1995. ISBN: 0-8186-6492-4.
[5] A. Rodríguez-Vázquez et al., “Current-Mode Techniques for the Implementation of Continuous-Time and Discrete-Time Cellular Neural Networks”, IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, Vol. 40, pp. 132-146, March 1993.
[6] B.J. Sheu, J. Choi, Neural Information Processing and VLSI, Kluwer Academic Publishers, 1995. ISBN: 0-7923-9547-6.
[7] T. Roska and L.O. Chua, “The CNN Universal Machine: An Analogic Array Computer”, IEEE Trans. Circuits & Systems-I, Vol. 40, pp. 163-173, March 1993.
[8] F. Werblin, T. Roska and L.O. Chua, “The Analogic Cellular Neural Network as a Bionic Eye”, Int. J. of Circuit Theory and Applications, Vol. 23, pp. 541-549, 1995.
[9] F. Werblin, A. Jacobs and J. Teeters, “The Computational Eye”, IEEE Spectrum, Vol. 33, pp. 30-37, May 1996.
[10] P. Szolgay et al., “The Computational Infrastructure for Cellular Visual Microprocessors”, Proceedings of the IEEE 7th Int. Conf. on Microelectronics for Neural, Fuzzy, and Bio-Inspired Systems, pp. 54-60, Granada, Spain, April 1999.
[11] S. Espejo, R. Carmona, R. Domínguez-Castro and A. Rodríguez-Vázquez, “A CNN Universal Chip in CMOS Technology”, International Journal of Circuit Theory and Applications, Vol. 24, pp. 93-109, Jan.-Feb. 1996.
[12] A. Paasio, V. Porra, “A CNN Universal Machine with 295 cells/mm²”, Proc. of the 1997 Int. Symposium on Nonlinear Theory and its Applications (NOLTA’97), Honolulu, USA, pp. 221-224, 1997.
[13] P. Kinget and M. Steyaert, Analog VLSI Integration of Massive Parallel Processing Systems, Kluwer Academic Publishers, 1997. ISBN: 0-7923-9823-8.
[14] R. Domínguez-Castro et al., “A 0.8 µm CMOS 2-D Programmable Mixed-Signal Focal-Plane Array Processor with On-Chip Binary Imaging and Instructions Storage”, IEEE J. Solid-State Circuits, Vol. 32, No. 7, pp. 1013-1026, July 1997.
[15] J. Cruz and L. Chua, “A 16x16 Cellular Neural Network Universal Chip”, Analog Integrated Circuits and Signal Processing, Vol. 15, pp. 226-238, March 1998.
[16] G. Liñán, P. Földesy, S. Espejo, R. Domínguez-Castro and A. Rodríguez-Vázquez, “A 0.5 µm CMOS 10^6 Transistors Analog Programmable Array Processor for Real-Time Image Processing”, Proc. of the 1999 European Solid-State Circuits Conference, pp. 358-361, September 1999.
[17] T. Roska, “Computer-Sensors: Spatio-Temporal Computers for Analog Array Signals, Dynamically Integrated with Sensors”, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, Vol. 23, pp. 221-238, Kluwer Academic, November/December 1999.
[18] A. Rodríguez-Vázquez, E. Roca, M. Delgado-Restituto, S. Espejo and R. Domínguez-Castro, “MOST-Based Design and Scaling of Synaptic Interconnections in VLSI Analog Array Processing Chips”, Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology, Vol. 23, pp. 239-266, Kluwer Academic, November/December 1999.
[Figure 1 block labels: Video; Template Library; Electrical Signals; Control Signals (CPI Code); CNN Platform Bus; Output Data; Template and Data; Level Shifters, Sample-and-Holds, Multiplexers, etc.; Chip; Optical Input]
Figure 1. The CNN Chip Prototyping System (CCPS)
Figure 2. Microphotograph and relevant numbers for the CNNUC3
[Table: summary of the CNN chips of [11]-[16]. Columns: Size (#Proc.), Technology, Type (Analog, Mixed-Signal, Basically Digital), Speed (XPS^a), Density (cells/mm² and XPS/mm²), XPS/cell, XPS/mW, Functions. Functions listed: stored-programmable or external programming; analog resolution of 2, 4, 6-7 or 7-8 bits; binary or analog inputs and outputs; embedded optical sensors; embedded analog and digital data RAM; no diagonal interactions.]
a. XPS: Analog Operations Per Second, an equivalent measure of the number of analog arithmetic operations such as addition, subtraction, multiplication and division. IPS: Instructions Per Second, a typical measure of digital processor speed; common instructions are bitwise addition, complement, shifting, etc.
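The equivalent-speed estimate given in Section 3 for the 4096-cell chip of [16] can be reproduced with a few lines of arithmetic. The sketch takes its inputs from the text: 10 time steps per time constant, 20 multiply/add operations per cell and time step (presumably covering full 3 × 3 A and B templates plus the bias, which is our reading), and a time constant of about 280 ns. The function name is ours.

```python
def equivalent_ops_per_second(n_cells: int, steps_per_tau: int,
                              ops_per_step: int, tau_seconds: float) -> float:
    """Equivalent digital multiply/add operations per second of a CNN array.

    Each of n_cells cells performs ops_per_step multiply/add operations in each
    of steps_per_tau Euler steps per time constant tau.
    """
    ops_per_tau_per_cell = steps_per_tau * ops_per_step  # 10 x 20 = 200
    return n_cells * ops_per_tau_per_cell / tau_seconds


# Figures quoted in Section 3: 4096 cells (a 64 x 64 array), tau of about 280 ns.
speed = equivalent_ops_per_second(n_cells=4096, steps_per_tau=10,
                                  ops_per_step=20, tau_seconds=280e-9)
print(f"{speed / 1e12:.2f} TeraOPS")  # prints 2.93 TeraOPS
```

The result, about 2.9 × 10^12 operations per second, is the "about 3 TeraOPS" quoted in the text. Because the estimate scales linearly with the cell count and inversely with the time constant, the same function can be reused for other array sizes once their time constants are known.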
