Toward visual microprocessors by Roska, Tamás & Rodríguez Vázquez, Ángel Benito
Toward Visual Microprocessors
TAMÁS ROSKA, FELLOW, IEEE, AND ÁNGEL RODRÍGUEZ-VÁZQUEZ, FELLOW, IEEE
Invited Paper
This paper outlines motivations and models underlying the
design of visual microprocessors based on the cellular neural
network universal machine. We also review the state of the art
in realizing these microprocessors as very large-scale integration
chips. Measurement results obtained on these chips are included
for illustration.
Keywords—Analogic cellular supercomputing, cellular neural
networks, CNN technology, visual microprocessors.
I. INTRODUCTION
For more than 100 years, the living visual system of mammals has been intensively studied by neuroscientists and biophysicists alike. More recently, computer engineers have been actively creating machine vision systems. Still, although many ideas have been proposed and implemented in silicon [1]–[3], including resistive grid “silicon retinas,” programmable cellular neural/nonlinear network (CNN; see footnote 1) models of the visual pathway, and many “smart optical sensors,” no complete neuromorphic model of the topographic parts of the visual pathway has been made available. The reason is simple: the detailed operation of many key components located at the front end of the visual system, notably the retina and the lateral geniculate nucleus (LGN), is not yet understood. Hence, the representation of the visual scene from the input to the higher layers has been unknown. Of the many exciting partial results related to the visual pathway, some recent findings (see, for instance, [4]) suggest a few sound principles.
• Sensing and processing are interactive processes, and the processing is mainly analog, combined with masks of binary (yes/no) maps.
• The basic structure is composed of several stacks of layers of neurons connected by local receptive field organizations with different spatial distributions and time constants.
• The processing strategy is a kind of “multiscreen theater”; namely, from a given visual scene, several parallel maps are generated and then further processed. This is true even in the mammalian retina [4], where about a dozen parallel channels are organized.
Manuscript received May 31, 2001; revised February 15, 2002. This work was supported by grants from the Hungarian Academy of Sciences, the Spanish MCyT (Project TIC1999-0826), the National Research Fund of Hungary (OTKA), the CEE (Project IST-1999-19 007), and the Office of Naval Research (Projects N00014-00-C-0295, N68171 97-C-9038 and N68171 98-C-9004).
T. Roska is with the Analogic and Neural Computing Laboratory, MTA-SzTaki (Hungarian Academy of Science) and Pázmány University, Budapest H-1111, Hungary (e-mail: roska@sztaki.hu).
Á. Rodríguez-Vázquez is with the Department of Analog and Mixed-Signal Circuit Design, IMSE/CNM, 41012 Sevilla, Spain (e-mail: angel@cnm.us.es).
Publisher Item Identifier 10.1109/JPROC.2002.801453.
1 Cellular neural/nonlinear network (CNN) models were introduced by Chua and Yang in 1988 [5], and then generalized and used as a model for bionic eyes by Chua, Roska, and Werblin [6]–[8]. Their principles and applications for visual processing are covered in [9].
There are two ways to implement neuromorphic visual models in silicon:
• Pick a specific task and its model and implement it in silicon. This is the usual approach, leading to very useful, task-specific smart sensors.
• Make mixed-signal (see footnote 2) visual microprocessors, that is, processors which combine optical sensing with analog cellular spatial-temporal dynamics and some form of logic. They are called analogic processors because they combine analog and logic processing structures. They use receptive-field-like operations as elementary instructions and can store and execute user-selectable sequences of instructions (programs).
Clearly, the second approach is more demanding in terms of
architecture, very large-scale integration (VLSI) chip design,
and computational infrastructure, leading to a new type of
hardware/software system design.
This paper focuses on the second approach. Namely, we will briefly review the analogic cellular computer architecture, some CMOS prototype chips related to that architecture, and the accompanying computational infrastructure. Some examples measured from the so-called ACE4K chip [10] and the CACE1K chip [11] are included for illustration purposes. The former has a one-layer architecture, while the latter has a three-layer architecture inspired by the CNN model of the mammalian retina proposed in [12], which is based on the discoveries about the functionality of the inner part of this retina reported in [4].
2 Mixed-signal means that analog and digital signal representations are combined, and hence analog and digital signal processing.
0018-9219/02$17.00 © 2002 IEEE
1244 PROCEEDINGS OF THE IEEE, VOL. 90, NO. 7, JULY 2002
II. CNN-BASED VISUAL MICROPROCESSORS
Back in the 1960s, the building blocks for logic design were various logic circuits (micromodules) implementing different “smart” logic tasks. These were also used to make digital computers. The digital computer has a key attribute due to J. von Neumann, namely stored programmability: the same core architecture, via algorithms coded in software, can be used for a myriad of tasks. To put it another way, the architecture is open to the human intellect for millions of algorithmic innovations. This is the functional secret behind the success of the digital microprocessor, first made in the early 1970s. Visual microprocessors aim to mimic this functional secret. However, they are mixed-signal devices which realize analog-and-logic spatial/temporal processing tasks (wave processing), and hence require quite different building blocks [3].
The front-end “devices” encountered in natural vision sys-
tems are capable of acquiring and processing images in a con-
current manner. The retina contains photoreceptors and dy-
namically coupled processing cells of different types. Among
many other tasks, the early processing realized at the retina
serves to extract important features from the raw sensory data
and, thus, to reduce the amount of information transmitted
for subsequent processing. In contrast, image acquisition
and processing are usually separated in conventional
artificial vision systems. One key aspect of visual micropro-
cessors is the integration of sensing and stored programmable
processing (SPP) at the analog signal array level—the inte-
grated SPP principle. Among many other things, this allows
us to tune the sensors dynamically, pixel by pixel, depending
on the content and even on the context of the changing scene.
Some of the key architectural aspects have been discussed in
[13].
Some features which make the visual microprocessors ad-
dressed in this paper different from other topographic smart
sensors [1], [2] include the following.
• They use a core analog processing array (a CNN
[5]–[7]) with tunable interaction weight patterns and
embedded pixel-wise data memories.
• This programmable and reconfigurable array is embedded in a computer architecture resulting in the so-called CNN universal machine (CNN-UM).
• The CNN-UM is stored programmable and capable
of implementing analogic spatial–temporal algorithms
through the smart synergy of hardware and software.
All the signal variables are continuous, except for the dis-
creteness in space (pixels or voxels). At the same time, visual
microprocessors retain the extraordinary strength of digital
computers, their unconstrained variability via programming
or software. Obviously, such software and related algorithms
are different from conventional ones.
Below we summarize the main architectural and algorithmic ideas underlying CNN-based visual microprocessors. It is worth mentioning that although most of their present-day applications are related to vision, many other topographic problems (tactile and auditory), including topographic optimization, are among the emerging applications.
Fig. 1. A typical simple CNN structure.
Fig. 2. The standard output nonlinearity.
A. CNN Dynamics
CNNs can be either single-layer or multilayer. Consider first a single layer consisting of a two-dimensional (2-D), regular grid of cells $C(i,j)$, where $i$ and $j$ are the row and column coordinates. The topography of such a structure is shown in Fig. 1.
Assume each cell hosts a processor with its real-valued input, state(s), and output signals, $u_{ij}$, $x_{ij}$, and $y_{ij}$, respectively. In such a 2-D layer, each cell processor is connected to its neighbors (in a 3 × 3 or 5 × 5, etc., neighborhood or sphere of influence), denoted by $N_r(i,j)$. The simplest first-order cell state dynamics is given by (see footnote 3)
$$\frac{dx_{ij}}{dt} = -x_{ij} + \sum_{kl \in N_r(i,j)} A_{ij,kl}\, y_{kl} + \sum_{kl \in N_r(i,j)} B_{ij,kl}\, u_{kl} + z_{ij}$$ (1)
where $z_{ij}$ is called the threshold of the cell and $A$ and $B$ are called the feedback and feedforward synaptic operators or templates; in the case of a 3 × 3 neighborhood of radius 1, they are 3 × 3 matrices.
The state and the output signals of each cell are typically related through the following nonlinear output equation:
$$y_{ij} = f(x_{ij}) = \tfrac{1}{2}\left(|x_{ij} + 1| - |x_{ij} - 1|\right)$$ (2)
depicted in Fig. 2. However, the nonlinearity could be of several types and it could also be included in a simpler dynamic equation form. Namely, the standard nonlinearity in (2) and the cell-state dynamics represented by (1), the so-called Chua–Yang model, could be replaced by the full-range model, which means that $y_{ij} = x_{ij}$ and that the first term in (1) is replaced by a nonlinear function whose shape is the inverse of that used for the standard nonlinearity [14].
3 The time is scaled in the relative time unit $\tau$, which is the time constant of the simple first-order cell dynamics.
Fig. 3. The initial picture and the diffused picture using a diffusion template defined by gene G.
Once the cell dynamics is fixed, the interaction patterns $A$ and $B$ and the offset value $z$ define the functionality of the CNN layer. Given an input signal array $u_{ij}(t)$ for $t \ge 0$, defined as a picture with pixel values $u_{ij}$, the set of values $\{A, B, z\}$ determines the outcome of the CNN dynamic process. This set is called a cloning template or a gene. In the space-invariant case, the templates are 3 × 3 (or 5 × 5 or 7 × 7) matrices. This means that a CNN array can be defined by the cell dynamics and the 19 (or 51 or 99) numbers of the templates $A$ and $B$ and the offset $z$; for the 3 × 3 case, these are 9 + 9 + 1 values. The input image could be either static or dynamic; hence, a CNN layer plays the role of an image processor.
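The pieces defined so far can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard Chua–Yang form of the cell equation and the piecewise-linear output function, treats cells outside the grid as contributing zero (a boundary condition chosen here for simplicity), and all function names are hypothetical.

```python
def output(x):
    """Standard piecewise-linear output: identity on [-1, 1], saturated outside."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def template_parameter_count(radius):
    """Numbers in a space-invariant gene: entries of A and B plus one offset z."""
    side = 2 * radius + 1
    return 2 * side * side + 1


def state_derivative(x, u, A, B, z, i, j):
    """dx_ij/dt for one cell of a single-layer CNN.

    x and u are 2-D lists (state and input), A and B are square templates,
    z is the offset. Assumption: neighbors outside the grid contribute zero.
    """
    r = len(A) // 2
    rows, cols = len(x), len(x[0])
    total = -x[i][j] + z
    for di in range(-r, r + 1):
        for dj in range(-r, r + 1):
            k, l = i + di, j + dj
            if 0 <= k < rows and 0 <= l < cols:
                total += A[di + r][dj + r] * output(x[k][l])
                total += B[di + r][dj + r] * u[k][l]
    return total


# Gene sizes for 3 x 3, 5 x 5 and 7 x 7 neighborhoods
counts = [template_parameter_count(r) for r in (1, 2, 3)]

# One isolated cell with a hypothetical gene: A center 2, B center 1, z = -0.5
A = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]]
B = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
rate = state_derivative([[0.5]], [[1.0]], A, B, -0.5, 0, 0)
```

The count reproduces the 19, 51, and 99 numbers quoted above, which shows how few parameters program the whole array.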
The peculiar property of controlling the functionality of a
whole array of interconnected cells by means of just a few
interconnection weights (e.g., 19 numbers) is very familiar
to neurobiologists. Indeed, the cloning template is no more
than a receptive field organization in the retinotopic part of
the visual pathway [8]. On the other hand, the CNN para-
digm is well suited for representing many topographic sen-
sory modalities via their receptive field organizations. The
first attempts [15] have been followed by many other useful
results.
In a nontrivial case, the CNN dynamics is a wave acting
for a finite time $T$. For example, for a diffusion template or
gene we have
(3)
Fig. 3 shows the initial state and the output image (at T = 2 elapsed time). There exists a very wide catalog of templates covering a myriad of applications. Also, because these templates are programmable by definition, learning can be incorporated to adapt the templates either globally, for example, using a genetic algorithm [16], or locally. Thus, not only can associative memories be constructed, e.g., [17], but the plasticity of the brain might also be directly modeled [13].
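To make the idea of a diffusion wave acting for a finite time concrete, the following sketch integrates a single-layer CNN with forward Euler steps. It is not the gene used for Fig. 3: the template values are hypothetical, chosen so that in the linear region the dynamics reduce to a discrete Laplacian, and the replicated-edge boundary, the step size `dt`, and the coupling `c` are likewise assumptions of this example.

```python
def output(x):
    """Standard piecewise-linear output nonlinearity."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def diffuse(image, c=0.2, T=2.0, dt=0.05):
    """Run a hypothetical diffusion gene on `image` for elapsed time T.

    A has c on the four nearest neighbors and 1 - 4c in the center; B = 0
    and z = 0. The image is loaded as the initial state. Edge cells see a
    copy of themselves outside the grid (zero-flux boundary, an assumption).
    """
    A = [[0.0, c, 0.0], [c, 1.0 - 4.0 * c, c], [0.0, c, 0.0]]
    rows, cols = len(image), len(image[0])
    x = [row[:] for row in image]
    for _ in range(round(T / dt)):
        y = [[output(v) for v in row] for row in x]
        new_x = [row[:] for row in x]
        for i in range(rows):
            for j in range(cols):
                acc = -x[i][j]
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        k = min(max(i + di, 0), rows - 1)
                        l = min(max(j + dj, 0), cols - 1)
                        acc += A[di + 1][dj + 1] * y[k][l]
                new_x[i][j] = x[i][j] + dt * acc
        x = new_x
    return x


# A single bright pixel on a dark 5 x 5 background, values in [-1, 1]
img = [[-1.0] * 5 for _ in range(5)]
img[2][2] = 1.0
blurred = diffuse(img)
```

With these choices the bright pixel spreads into its neighbors while the mean gray level stays essentially constant, which is the behavior expected of a diffusion operator.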
Fig. 4. The extended cell of the CNN-UM.
B. The CNN-Universal Machine (CNN-UM) [7]
If we furnish each CNN cell processor with local memo-
ries [local analog memory (LAM) and local logic memory
(LLM)] and a local communication and control unit (LCCU)
to send/receive information to/from the global analogic pro-
gramming unit (GAPU), we get the extended CNN cell of the
CNN-UM architecture. For practical reasons, in each cell we
add a local logic unit (LLU) and a local analog output unit
(LAOU) which take inputs and send outputs from/to their
local memories, LLM and LAM, respectively. Fig. 4 shows
the extended cell schematically.
The GAPU is the conductor of the extended cell array,
communicating with each cell via the LCCUs of each cell.
The GAPU contains three registers and a global analogic
control unit (GACU), the latter of which is the host of the
stored program and controls the whole array computer. The
three registers store the cloning templates [analog program-
ming-instruction register (APR)], the local logic instructions
[logic program-instruction register (LPR)], and the switch
configuration codes [switch configuration register (SCR)],
respectively.
The CNN-UM can be viewed as an array computer de-
fined on flows [18]. Algorithms can be constructed where
the elementary instruction is the solution of a partial differ-
ential equation (PDE). This correspondence was highlighted
already in the seminal paper [5] for the heat equation; also,
in [19], a mechanical system was modeled by a CNN. Later, systematic methods were devised to convert PDEs defined in continuous space into CNN dynamics [20]. Recent advances in complex image processing show that PDE-based techniques seem to be superior in many respects (e.g., [21]). Their drawback is the high computational complexity when implemented on digital processors. With a CNN, by contrast, the solution of a nonlinear PDE is the basic task.
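The correspondence for the heat equation can be made explicit with a short worked derivation. It assumes the standard single-layer cell equation with B = 0 and z = 0, a five-point finite-difference Laplacian, and operation in the linear region of the output function; the symbols c (diffusion constant) and h (grid spacing) are introduced here for illustration.

```latex
% Heat equation discretized on a grid of spacing h
\frac{\partial x}{\partial t} = c\,\nabla^2 x
\quad\Longrightarrow\quad
\frac{dx_{ij}}{dt} \approx \frac{c}{h^2}
\left( x_{i-1,j} + x_{i+1,j} + x_{i,j-1} + x_{i,j+1} - 4\,x_{ij} \right)

% CNN cell equation in the linear region (y_{kl} = x_{kl}), with B = 0, z = 0
\frac{dx_{ij}}{dt} = -x_{ij} + \sum_{kl \in N_1(i,j)} A_{kl}\, x_{kl}

% Matching the two right-hand sides term by term gives the feedback template
A = \begin{pmatrix}
0 & c/h^2 & 0 \\
c/h^2 & 1 - 4c/h^2 & c/h^2 \\
0 & c/h^2 & 0
\end{pmatrix}
```

The center entry 1 − 4c/h² cancels the −x term of the cell equation, so a single template execution for time T corresponds to evolving the discretized heat equation for the same time, as long as the states stay within [−1, 1].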
The next example shows a complex analogic spatial/temporal algorithm used for the calculation of the inner boundaries of the left ventricle in an echocardiogram [22]. Active waves [23] are used as algorithmic steps. For reference, we also show the execution times of the algorithmic steps on the so-called ACE4K chip [10].
Fig. 5. The bold arrows represent different cloning templates. Some of them perform the solution of complex nonlinear PDEs as elementary instructions; these are written on the left-hand side of the figure with their execution times on the right-hand side. In addition, several simpler instructions and templates are used, for instance, local logic operations.
C. Example 1
Fig. 5 depicts a flow diagram of the analogic CNN
algorithm with some typical intermediate results. Observe
that it can be interpreted as a combination of three image
flows merging and branching during the processing stage of
a single frame. Here the third flow stands for the information
calculated from the current frame, the second one for the in-
termediate results obtained from the previous frame, while
the first one represents the binary masks generated from the
previous result. The core of the three main processing stages
of the algorithm can also be described by PDEs (left): 1)
image filtering and reconstruction derived from nonlinear
diffusion PDEs; 2) motion estimation derived from optical
flow PDEs; and 3) trigger wave-type active contour-based
boundary tracking derived from reaction-diffusion nonlinear
PDEs. These PDE approximations, executed on the ACE4K
chip, can be completed within a millisecond, allowing the
processing system to reach a peak performance of around four
thousand frames/s (right).
D. Multilayer and Complex Cell CNN-UM
The multilayer CNN structure was already introduced in [5]. It is used when several 2-D CNN layers are necessary to describe the spatial-temporal dynamics. In many cases, the layers are just cascaded, and the consecutive instructions of the CNN-UM are adequate to model the same process. However, in those cases where interlayer feedback does exist, we need the multilayer CNN structure. Such a multilayer CNN is useful for modeling the vertebrate retina [12].
Fig. 6.
Fig. 7. Using the CACE1K chip, programming the layer time constants and the A-templates on the two dynamic layers, a double wave propagation can be programmed. The resulting sequence of snapshots shows the different speeds and the different types of waves on the two layers.
Fig. 6 shows the conceptual architecture of a second-order dynamics, three-layer cell which has been prototyped in the chip called CACE1K [11]. The dynamic operation is given according to the following expressions:
(4)
where represents the built-in difference arithmetic.
The operation of this prototype is hence controlled by the
23 parameters involved in (4), given as
(5)
plus the relative values of the time constants of Layers 1 and
2, totaling 25 different parameters. Many types of nonlinear
waves (trigger-, traveling-, auto-, and spiral-waves) can be
obtained by properly controlling these parameters [23].
E. Example 2
This example illustrates the generation of double-wave
propagation using the CACE1K chip [11]. The template ele-
ment values for this operation are
(6)
and the ratio between the time constants of the two layers is
. Using the same chip, very recently we have
been able to implement some of the key inner retinal effects,
impossible to realize on first-order layers. More detailed re-
sults are reported elsewhere [24].
Our quest to make a programmable prototype spatial-temporal computer which could also serve as a visual microprocessor can be justified in two ways. First, we have proven earlier that the CNN-UM is universal: in a sense, it is equivalent to the Turing machine. The proof was realized by implementing the Game of Life. In addition, in each cell, with not more than four layers, we can implement any nonlinear multi-input single-output operator with fading memory. Second, just as the μ-recursive functions are the formal descriptions of the algorithms, with proven capabilities, for digital computers or Turing machines, we have also determined the equivalent formal notion of algorithms for the CNN-UM as the α-recursive functions, which have similar properties [18]. Hence, we have all the theoretical background to establish our new type of computer for topographic operations, in particular for vision. Moreover, it has turned out that the neuromorphic constructs for most of the topographic senses, with their accompanying processing, are quite similar to those of CNN models [9].
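The Game of Life is a natural vehicle for such a proof because each update depends only on a 3 × 3 neighborhood, that is, on a receptive-field-like local rule. The sketch below is a plain software illustration of that locality, not the template sequence used in the universality proof; the function name and the choice of dead cells outside the grid are assumptions of this example.

```python
def life_step(grid):
    """One Game of Life generation on a 2-D list of 0/1 values.

    Each new cell value depends only on the cell and its eight nearest
    neighbors. Cells outside the grid are treated as dead.
    """
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Sum over the 3 x 3 window, then remove the cell itself
            neighbors = sum(
                grid[k][l]
                for k in range(max(i - 1, 0), min(i + 2, rows))
                for l in range(max(j - 1, 0), min(j + 2, cols))
            ) - grid[i][j]
            alive = grid[i][j] == 1
            nxt[i][j] = 1 if neighbors == 3 or (alive and neighbors == 2) else 0
    return nxt


# A "blinker": three cells in a row oscillate with period 2
blinker = [[0] * 5 for _ in range(5)]
for col in (1, 2, 3):
    blinker[2][col] = 1
after_one = life_step(blinker)
after_two = life_step(after_one)
```

The neighbor sum is a weighted local summation and the birth/survival decision is a local logic operation, which mirrors the analog-plus-logic split of the analogic instruction set.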
III. ANALOGIC VISUAL MICROPROCESSOR IN SILICON
CNN-based analogic visual microprocessors have simi-
larities with the so-called single instruction multiple data
(SIMD) systems [25], although they work directly on analog
signal representations obtained through embedded optical
sensors and hence need neither a front-end sensory plane
nor analog-to-digital converters. The architecture of these
visual microprocessors is illustrated in Fig. 8 through two
prototype chips, namely, ACE4K [10] and ACE16K [26].
In both cases, as in other related chips [11], [27]–[29], the
architecture includes a core array of interconnected elemen-
tary processing units, surrounded by a global circuitry. This
latter circuitry is intended for:
• control and timing;
• addressing and buffering of the core cells;
• input/output;
• storage of user-selectable instructions (programs) to
control the sequence of operations of the processing
core;
• storage of user-selectable analogic programming pa-
rameter configurations (templates).
Fig. 8. Architectures of analogic visual microprocessor chips: (a) ACE4K [10] and (b) ACE16K
[26].
On the other hand, the core of interconnected processing
units embeds different functions on a common silicon sub-
strate (see Fig. 9 for illustration purposes), namely:
• 2-D sensing;
• 2-D analog/digital array processing concurrent with the
signal sensing;
• 2-D spatio-temporal processing determined by local,
receptive-field-like programmable interconnections;
• 2-D memory banks for concurrent online uploading
and downloading of short-term analog and digital
data.
Fig. 9. Illustrating the embedding of different functional features at the core processing array of visual microprocessors. (a) Microphotograph of the ACE4K chip (left) and conceptual representation of the distributed functions embedded in the core array (right). (b) Layout of a processing unit of the ACE16K showing the areas occupied by the different functions realized concurrently by the core array.
Several analogic visual microprocessor chips in different CMOS technologies have been reported during the last few years. In particular, [10], [11], and [26]–[29] report implementations with at least 20 × 20 pixels. Table 1 presents a summary of some of their most relevant data. Some columns correspond to chips intended for black-and-white input images, while others are for chips which accept gray-scale input images. As with any other analog processing circuit, figures of merit for performance must account for accuracy and area occupation in addition to speed and power consumption. The speed measure here is proportional to the number of cells, the inverse of the time constant, and a weighted number of multipliers per cell. Any comparison must refer to the number of operations per second and to the accuracy. The data in the table highlight the following.
• There is a tradeoff between area occupation (cell den-
sity) and accuracy, on the one hand, and speed of opera-
tion and power consumption, on the other. This tradeoff
is typical of analog integrated circuits [33].
• The evolution toward scaled-down technologies brings advantages in terms of speed and cell density. For example, the ACE16K chip has 128 × 128 resolution and can execute sequences of 64 instructions using up to 32 different templates per sequence, each template consisting of 24 analog programming values coded with 8 bits. It loads and downloads full-size gray-scale images to and from the cache memory and always has eight full-size images available for use during the flow. Its internal processing time is 160 ns, and it provides digitally coded output images (obtained with a battery of internal converters) with a downloading time of 0.128 ms.
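The ACE16K figures quoted above can be combined into a few derived quantities. The sketch below only performs arithmetic on the numbers stated in the text; the derived names, and in particular the reading of the ratio between download time and internal processing time as a number of "processing intervals per download", are interpretations made for this example.

```python
# Figures quoted in the text for the ACE16K chip
ROWS = COLS = 128
TEMPLATES = 32
VALUES_PER_TEMPLATE = 24
BITS_PER_VALUE = 8
IMAGES_AVAILABLE = 8
PROCESSING_TIME_S = 160e-9
DOWNLOAD_TIME_S = 0.128e-3


def ace16k_summary():
    """Derive simple totals and rates from the quoted figures."""
    cells = ROWS * COLS
    return {
        "cells": cells,
        # Bits needed to hold all templates of one sequence
        "template_store_bits": TEMPLATES * VALUES_PER_TEMPLATE * BITS_PER_VALUE,
        # Pixel values held when eight full-size images are available
        "stored_pixels": IMAGES_AVAILABLE * cells,
        # Output rate implied by the image download time
        "pixels_per_second": cells / DOWNLOAD_TIME_S,
        # How many internal processing times fit in one image download
        "intervals_per_download": DOWNLOAD_TIME_S / PROCESSING_TIME_S,
    }


summary = ace16k_summary()
```

Under this reading, the array holds 16 384 cells, a full template set occupies 6144 bits, and the download time corresponds to 128 million pixels per second, so image output rather than internal processing dominates the per-frame time.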
Table 1
Summary and Comparison of Chip Implementations
The capability to design cells with maximum density, speed, and accuracy, and minimum area and power consumption, relies basically on the exploitation of all functional features offered by the MOS transistor. This is very different from digital design, in which only the switching capability of the MOS transistor is exploited. The design of the entities which interconnect the cells (synapses) is one of the major issues. Different possibilities may be chosen a priori, as illustrated in Fig. 10. In all cases, electrical controllability is provided by default. However, the different strategies perform quite differently in the presence of systematic and random error sources, and are affected differently by global signal transmission errors. Hence, careful analysis and optimization are needed to select the best approach and to achieve the cell density and accuracy levels featured by the latest generation of chips. The background for such procedures can be found in [3], [10], [11], [26], and [28].
IV. ABOUT SCALING DOWN
It is expected that the performance figures featured for these chips can be further enhanced as technology scales down. However, one problem arises due to the necessity of maintaining analog accuracy, and hence the quality of the analog design, as transistor sizes decrease. Below we first identify mismatch as the main limit for the analog accuracy and then explore different tradeoffs associated with the analog design in the presence of mismatch.
Fig. 10. Using a single NMOST for voltage-to-current transformation. Only first-order terms are included in the displayed behavioral equations.
A. Mismatch Versus Noise as a Limiting Factor
Mismatch makes two nominally identical devices behave differently when they are used in a real integrated circuit. Based on the formulation of mismatch as a function of device geometries in [30], the variance of the large-signal transconductance parameter $\beta$, the threshold voltage $V_{T0}$, and the slope factor $n$ (see footnote 4) as a function of the device area and aspect
ratio can be represented as
(7)
where is the transistor channel area and is the transistor
aspect ratio.
Another accuracy limiting factor is noise. The equivalent
noise current for an MOS transistor can be expressed as [31]
(8)
where and vary between 1 and 2,
within the ohmic region and of this quan-
tity in saturation, and is the small-signal
transconductance parameter.
4 In the original model, the variance was formulated for the body effect factor $\gamma$; $\sigma^2(n)$ can be obtained as a function of $\sigma^2(V_{T0})$ and $\sigma^2(\gamma)$.
Let us assume that the only significant mismatch error is that of the large-signal transconductance parameter $\beta$, as actually happens in many practical circuits used for establishing interconnections in analog array processors [32], [33]. In terms of the transistor area and aspect ratio, this error
is expressed as
(9)
Under similar assumptions, the noise contribution can be ap-
proximated by
(10)
Using typical parameters for CMOS 0.5-μm technologies and considering a bandwidth of 1–5 MHz, we conclude that, for devices with channel areas of about 50 μm², mismatch limits the accuracy to slightly above 8 b, while for this same area and a channel aspect ratio of 0.1, noise limits the resolution to 10.48 b, well beyond the limit imposed by mismatch.
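The conversion between a relative error and an equivalent number of bits can be sketched as follows. Two assumptions are made and should be read as such: the mismatch is modeled with the common area law σ(β)/β = A_β/√(WL), and one standard deviation of relative error is equated with one least significant bit. The value A_β = 2 %·μm is a hypothetical, order-of-magnitude figure chosen for illustration, not a parameter taken from this paper.

```python
import math


def relative_mismatch(a_beta_percent_um, area_um2):
    """Area-law estimate of sigma(beta)/beta for a channel area in square microns."""
    return (a_beta_percent_um / 100.0) / math.sqrt(area_um2)


def equivalent_bits(relative_error):
    """Resolution at which one LSB equals the given relative error."""
    return -math.log2(relative_error)


# Hypothetical A_beta of 2 %*um with the 50 um^2 channel area from the text
sigma = relative_mismatch(2.0, 50.0)
mismatch_bits = equivalent_bits(sigma)

# Relative error that corresponds to the quoted 10.48-bit noise limit
noise_relative_error = 2.0 ** (-10.48)
```

With these assumed numbers the mismatch-limited resolution comes out between 8 and 9 bits, in line with the statement that mismatch, not noise, sets the accuracy limit at this device size.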
B. The Effect of the Scaling Process
Let us assume that lateral dimensions scale as
(11)
Thus, the gate oxide thickness, which approximately evolves
in current technologies as , scales as
(12)
Fig. 11. (a) Historical trend of parameter $A_{V_{T0}}$. (b) Historical trend of parameter $A_{\beta}$ [34].
Assume that the synapse size defines the achievable cell
density
(13)
where and are the synapse width and length. As tech-
nologies scale down, Density might hence evolve as
(14)
Another important parameter is the time constant which
can be expressed as
(15)
In the case that one transistor is employed to realize the
synapse [32], [33], the transconductance parameter is ap-
proximately given by
(16)
where is the weight control signal.
On the other hand, assuming that the capacitor is imple-
mented by using the gate capacitance of an MOS transistor,
the capacitance value neglecting border effects is approxi-
mately given by
(17)
From (16) and (17), the time constant becomes
(18)
Hence, it might ideally scale as
(19)
Unfortunately, the density and speed enhancements predicted by (14) and (19) cannot be realized in practice due to the necessity of keeping the analog accuracy. The question is, what happens with the technological parameters related to the accuracy when the technology scales down? Do they also scale down? The answer is that not all of them scale as feature size does. The historical trend shows [34] that scaling down produces a reduction of the main parameter related to mismatching, namely the $A_{V_{T0}}$ parameter [see Fig. 11(a)]. However, as already mentioned, accuracy in the behavior of the one-transistor synapse is mainly affected by random fluctuations of the $\beta$ parameter [32], [33]. Errors of the synapse current are approximately given by
(20)
Fig. 11(b) shows that the $A_{\beta}$ parameter has remained practically unchanged as feature size was scaled down. Hence,
synapse errors evolve as
(21)
Consequently, if transistors are designed such that their
channel areas are scaled down by , then, the relative error
will grow according to
(22)
Accuracy can only be kept by maintaining approximately
the same absolute channel area. Of course this statement is
valid provided that the empirical trend depicted in Fig. 11(b)
remains.
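The argument of this subsection can be summarized in a short worked derivation. It assumes the area law for transconductance mismatch and a matching parameter that does not change with technology, as the empirical trend suggests; the scaling factor λ > 1 and the bit-loss symbol Δb are notation introduced here for illustration.

```latex
% Relative synapse current error under the area law
\sigma\!\left(\frac{\Delta I}{I}\right) \approx \frac{A_\beta}{\sqrt{W L}}

% Scale both channel dimensions by 1/\lambda while A_\beta stays constant
W \to \frac{W}{\lambda}, \quad L \to \frac{L}{\lambda}
\quad\Longrightarrow\quad
\sigma' \approx \frac{A_\beta}{\sqrt{W L / \lambda^2}} = \lambda\,\sigma

% Equivalent resolution lost by the scaled device
\Delta b = \log_2 \frac{\sigma'}{\sigma} = \log_2 \lambda
```

Halving the feature size (λ = 2) would therefore cost about one bit of accuracy if the channel area shrank with it, which is why the channel area has to be held roughly constant.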
C. Design Tradeoffs
The art of analog design consists largely of combining many design equations involving area occupation, power consumption, speed, and accuracy. Typically, the objective is to meet the design requirements by minimizing (or maximizing) a certain figure of merit (FOM), using the channel areas and aspect ratios of the transistors as design variables.
Unfortunately, as already highlighted in the previous section, it is not possible to optimize all figures simultaneously; instead, tradeoffs among the different figures must be considered.
1) Accuracy Versus Density: The dependence of mismatch on the channel aspect ratio is low for moderately large values of the channel areas. Due to this, the channel area is constrained by the required accuracy and it may
therefore be said that the precision satisfies
(23)
where is defined as
(24)
On the other hand, the density of synapses, that is, the
number of synapses per area unit, can be basically expressed
as
(25)
where is a constant which includes the influence of the
routing lines and diffusion regions on the achievable density.
Hence, a first tradeoff can be formulated as
(26)
Accordingly, maximum achievable accuracy and cell density
cannot be optimized separately since the greater the accuracy,
the smaller the density and vice versa.
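The accuracy-versus-density tradeoff can be illustrated numerically. The sketch below assumes the area law σ = A_β/√(area), equates the target relative error with 2^(−bits), and folds routing and diffusion overhead into a single multiplicative constant. The default values `a_beta_um = 0.02` (that is, 2 %·μm) and `overhead = 2.0` are hypothetical, as are the function names.

```python
def min_channel_area_um2(bits, a_beta_um=0.02):
    """Smallest channel area (um^2) for which A_beta / sqrt(area) <= 2**-bits."""
    return (a_beta_um * 2.0 ** bits) ** 2


def synapse_density_per_mm2(bits, a_beta_um=0.02, overhead=2.0):
    """Synapses per mm^2 when each synapse needs `overhead` times its channel area."""
    return 1.0e6 / (overhead * min_channel_area_um2(bits, a_beta_um))


# Sweep the target resolution: each extra bit quadruples the required area
table = [(b, min_channel_area_um2(b), synapse_density_per_mm2(b)) for b in range(6, 11)]

# The product density * 4**bits stays constant across the sweep
products = [density * 4.0 ** bits for bits, _, density in table]
```

Because the required area grows as 4^bits, the product of density and squared precision is fixed by the technology, which is the sense in which accuracy and density cannot be optimized separately.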
2) Speed Versus Power: The maximum power consump-
tion of a synapse is expressed as
(27)
while the minimum time constant, corresponding to the max-




Consequently, it seems that the only way to minimize this
figure, i.e., reduce the power consumption and increase the
speed, is by reducing the synapse area. Nevertheless, this au-
tomatically leads to a reduction of the achievable accuracy.
On the other hand, reducing the signal ranges, or ,
will directly degrade the signal-to-noise ratio (SNR) and thus
the accuracy.
A global FOM involving the speed, accuracy, and power
tradeoff can be formulated in the following way:
(30)
Since does not show any evolution as technology is scaled
down, this FOM only depends on the technology scaling
process as does. Therefore, since , it is ex-
pected that the FOM will worsen in the future.
V. COMPUTATIONAL INFRASTRUCTURE FOR TERAOPS
OPERATION
A. Computational Infrastructure
Practical stored programmability requires a standard
computational infrastructure and a high-level language,
operating system, and software library for the analogic soft-
ware. Moreover, the computational infrastructure should rely
on the existing PC culture and should be transparent to dig-
ital systems. The details of the computational infrastructure
and the chip set architecture have been published elsewhere
[35]. Presently, analogic CNN visual microprocessors
support TeraOPS-equivalent digital computing speed,
and rates of more than 10 000 frames/s have been tested.
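To give a feeling for these figures, the short calculation below converts a TeraOPS-equivalent throughput and a frame rate into a per-pixel, per-frame operation budget. It is a back-of-the-envelope sketch: the 128 x 128 array size is a hypothetical example, and the function name is introduced here for illustration only.

```python
def equivalent_ops_per_pixel_per_frame(ops_per_second, pixels, frames_per_second):
    """Digital-equivalent operations available to each pixel in each frame."""
    return ops_per_second / (pixels * frames_per_second)

# Hypothetical 128 x 128 cell array running at 10 000 frames/s
# with a 1 TeraOPS (1e12 operations/s) equivalent throughput.
budget = equivalent_ops_per_pixel_per_frame(1e12, 128 * 128, 10_000)
print(f"{budget:.0f} equivalent operations per pixel per frame")
```

With these assumed numbers, each pixel has a budget of roughly 6100 equivalent operations per frame.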
B. Programmable Neuromorphic Vision Models
Many parts of the visual pathway, in different animals and
in humans, have been recently studied in detail. As to the
retina, see, e.g., [36] and the recent breakthrough in [4]. As
to the retinotopic neuromorphic vision models, the three
basic structures of the spatial–temporal models are as follows:
• layers with given receptive fields combined in a cascade
structure;
• interlayer feedback (e.g., in the prototype complex cell
structure);
• the combination of an ON and an OFF pathway (or an
excitatory and an inhibitory flow).
Note that in these models there is no discretization in
time.
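The three structures can be sketched together in a toy one-dimensional simulation. This is a minimal illustration under stated assumptions, not the retina model of [4] or of Fig. 12: the two-layer equations, all parameter names and values (`d1`, `d2`, `tau1`, `tau2`, `g_fb`), and the use of the layer difference as the output are hypothetical. The models themselves are continuous in time; the forward-Euler time step here is introduced only for numerical integration.

```python
def laplacian(x):
    """Nearest-neighbor (intralayer) coupling with zero-flux boundaries."""
    n = len(x)
    return [x[max(i - 1, 0)] + x[min(i + 1, n - 1)] - 2.0 * x[i] for i in range(n)]

def simulate(u, steps=2000, dt=0.01, d1=1.0, d2=4.0, tau1=1.0, tau2=3.0, g_fb=0.8):
    """Forward-Euler integration of two cascaded diffusive layers.

    tau1 * dx1/dt = -x1 + d1 * lap(x1) + u - g_fb * x2   (layer 1, driven by input)
    tau2 * dx2/dt = -x2 + d2 * lap(x2) + x1              (layer 2, driven by layer 1)
    The -g_fb * x2 term is the interlayer feedback.
    """
    x1 = [0.0] * len(u)
    x2 = [0.0] * len(u)
    for _ in range(steps):
        l1, l2 = laplacian(x1), laplacian(x2)
        x1_next = [a + dt / tau1 * (-a + d1 * la + ui - g_fb * b)
                   for a, b, la, ui in zip(x1, x2, l1, u)]
        x2_next = [b + dt / tau2 * (-b + d2 * lb + a)
                   for a, b, lb in zip(x1, x2, l2)]
        x1, x2 = x1_next, x2_next
    return x1, x2

def on_off(x1, x2):
    """Split the layer difference into rectified ON and OFF pathways."""
    diff = [a - b for a, b in zip(x1, x2)]
    on = [max(d, 0.0) for d in diff]
    off = [max(-d, 0.0) for d in diff]
    return on, off

# Dark-to-bright step edge between cells 19 and 20.
u = [0.0] * 20 + [1.0] * 20
on, off = on_off(*simulate(u))
print("ON  near edge:", [round(v, 3) for v in on[17:23]])
print("OFF near edge:", [round(v, 3) for v in off[17:23]])
```

In this sketch the ON pathway responds on the bright side of the edge and the OFF pathway on the dark side, while both stay near zero in uniform regions. Each visual effect would need different parameters, which is the practical argument for programmability.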
These structures are implementable on CNN (see, e.g.,
the first results in [15]). On the other hand, it is impractical
to build special chips for each visual effect (e.g., for edge
detection, histogram equalization, motion detection, length
tuning, directional sensitivity, and detecting a typical mor-
phology). Moreover, if we want to make a visual prosthesis,
programmability might be mandatory.
In the next example, we show a typical channel of a mul-
tilayer CNN retina model reflecting the basic new concepts
of mammalian retinal operation [4]. Observe that in the cas-
cade structure there are many interlayer feedback parts. In
addition, the two paths of signals represent the ON and OFF
visual pathway.
C. Example 3
The flow diagram of a typical vertebrate retina model is
shown in Fig. 12. Snapshots of a moving head are also pre-
sented. Based on [4], it is known that in a mammalian retina
there are about a dozen parallel channels embedded in the
inner part of the retina. Here we show one typical and simple
channel. The interested reader can consult [24] and its refer-
ence publications.
Fig. 12. Retina modeling. The left side, showing also a drawing of the interacting general neuron
types in the retina, presents the multilayer CNN structural elements of the ON–OFF retina model.
The neurons in the retina are organized into 2-D layers modeled with CNN layer(s). A neuron in
a given layer interacts with another neuron in another layer through synapses, which have their
own dynamics and temporal characteristics. The layers are depicted by horizontal lines and the
interlayer synapses by vertical arrows. The circle represents the intralayer coupling, which is a
space-constant-dependent diffusion. The dashed lines stand for nonlinear transfer functions. The
right side is a sequence of the sample frames from a processed natural scene video in one particular
(local edge detector) model. The topmost picture is the input and the others are the responses in some
computed layers. The green color indicates the inhibition, the red regions correspond to the excitation
and the white spots stand for the spiking, to the output of the retina.
VI. COMPUTATIONAL COMPLEXITY
Classical computational complexity studies are based on
the digital computer, in particular the Turing Machine.
Recently, a first step in the direction of breaking this powerful
but rigid framework has been made by introducing a still
iterative computational complexity theory based on real values
[37]. The CNN-UM defines a computing platform one step
further: it is a machine based on flows, or real-valued image
flows [18].
Computing is a physical process. While the classical com-
plexity theory was basically good for logic operations and for
dealing with the combinatorial complexity, as well as a part
of the number-crunching tasks (but still missing the semantic
aspects), it cannot even capture the problem of chaotic sig-
nals or nonlinear waves. The latter, as we have seen, is com-
pletely common in visual models. The principal question is
practical: how long does it take to solve a problem on a given
piece of silicon within a given power dissipation? The an-
swer is not only dependent on the size of the problem, but
more importantly on the parameters of the operator. Recent
results show some possible answers in this direction [18]. As
a part of this endeavor, the notion of an analogic cellular
algorithm has been developed via the α-recursive functions. As
the μ-recursive function is the basis for digital algorithms
(they are basic components of the C language as well), the
α-recursive function is the basis for analogic cellular
software and the Alpha language used for it [35]. It has been
proven that the CNN-UM is a minimal implementation for
the α-recursive functions.
VII. CONCLUSION
We have shown some basic notions, architectures, CMOS
implementations, and computational infrastructures, as well
as the biological plausibility, of a visual microprocessor.
Operating focal-plane visual microprocessors and their
accompanying computational infrastructure with analogic visual
software are available. It has been shown that the integrated
sensing and stored programmable processing principle is
crucial in any complex vision-related task, including the
whole process from sensors to visual understanding.
ACKNOWLEDGMENT
The authors deeply appreciate the assistance of D. Bálya,
P. Földesy, I. Petrás, and Cs. Rekeczky related to the ex-
amples and the contributions of S. Espejo, R. Domínguez-
Castro, R. Carmona, and G. Liñán.
REFERENCES
[1] C. Koch and H. Li, Eds., Vision Chips, Implementing Vision Algo-
rithms with Analog VLSI Circuits. New York: IEEE Press, 1995.
[2] A. Moini, Vision Chips. Norwell, MA: Kluwer, 2000.
[3] T. Roska and A. Rodríguez-Vázquez, Eds., Toward the Visual Mi-
croprocessor. New York: Wiley, 2000.
[4] B. Roska and F. S. Werblin, “Vertical interactions across ten parallel,
stacked representations in the mammalian retina,” Nature, vol. 410,
pp. 583–587, Mar. 2001.
[5] L. O. Chua and L. Yang, “Cellular neural networks: Theory and
applications,” IEEE Trans. Circuits Syst., vol. 35, pp. 1257–1290,
1988.
[6] L. O. Chua and T. Roska, “The CNN paradigm,” IEEE Trans. Cir-
cuits Syst. I, vol. 40, pp. 147–156, Mar. 1993.
[7] T. Roska and L. O. Chua, “The CNN universal machine: An analogic
array computer,” IEEE Trans. Circuits Syst. II, vol. 40, pp. 163–173,
Mar. 1993.
[8] F. Werblin, T. Roska, and L. O. Chua, “The analogic cellular neural
network as a bionic eye,” Int. J. Circuit Theory Applicat., vol. 23,
pp. 541–549, 1995.
[9] L. O. Chua and T. Roska, Cellular Neural Networks and Visual Com-
puting. Cambridge, U.K.: Cambridge Univ. Press, 2002.
[10] G. Liñán, P. Földesy, S. Espejo, R. Domínguez-Castro, and A.
Rodríguez-Vázquez, “A 0.5 m CMOS 10 transistor analog
programmable array processor for real-time image processing,” in
Proc. 1999 Eur. Solid-State Circuits Conf., Sept., pp. 358–361.
[11] R. Carmona, P. Garrido, R. Domínguez-Castro, S. Espejo, and
A. Rodríguez-Vázquez, “Bioinspired analog VLSI design realizes
programmable complex spatio-temporal dynamics on a single
chip,” in Proc. 2002 Conf. Design and Test in Europe, to be
published.
[12] D. Bálya, B. Roska, E. Nemeth, T. Roska, and F. S. Werblin, “A
qualitative model framework for spatio-temporal effects in verte-
brate retina,” Proc. 2000 IEEE Conf. Cellular Neural Networks and
Their Applications, pp. 165–170, 2000.
[13] T. Roska, “Computer-sensors: Spatial-temporal computers for
analog array signals, dynamically integrated with sensors,” J. VLSI
Signal Process. Syst., vol. 23, pp. 221–238, 1999.
[14] S. Espejo, R. Carmona, R. Domínguez-Castro, and A. Rodríguez-
Vázquez, “A VLSI-oriented continuous-time CNN model,” Int. J.
Circuit Theory Applicat., vol. 24, pp. 341–356, May–June 1996.
[15] T. Roska, J. Hámori, E. Lábos, K. Lotz, L. Orzó, J. Takács, P. Vene-
tianer, Z. Vidnyánszky, and Á. Zarándy, “The use of CNN models
in the subcortical visual pathway,” IEEE Trans. Circuits Syst. I, vol.
40, pp. 182–195, Mar. 1993.
[16] T. Kozek, T. Roska, and L. O. Chua, “Genetic algorithm for
CNN template learning,” IEEE Trans. Circuits Syst. I, vol. 40, pp.
392–402, June 1993.
[17] P. Szolgay, I. Szatmári, and K. László, “A fast fixed point learning
method to implement associative memory on CNN’s,” IEEE Trans.
Circuits Syst. I, vol. 44, pp. 362–366, 1997.
[18] T. Roska, “Analogic wave computers—Wave-type algorithms:
Canonical description, computer classes, and computational com-
plexity,” Proc. 2001 IEEE Int. Symp. Circuits and Systems, pp.
41–44, 2001.
[19] P. Szolgay, G. Vörös, and Gy. Eröss, “Applications of the cellular
neural network paradigm in mechanical vibrating systems,” IEEE
Trans. Circuits Syst. I, vol. 40, pp. 222–227, Mar. 1993.
[20] T. Roska, L. O. Chua, D. Wolf, T. Kozek, R. Tetzlaff, and F. Puffer,
“Simulating nonlinear waves and partial differential equations via
cnn—Part I: Basic techniques,” IEEE Trans. Circuits Syst. I, vol. 42,
pp. 807–815, Oct. 1995.
[21] L. Alvarez and J. M. Morel, “Morphological approach to multiscale
analysis,” in Geometry-Driven Diffusion in Computer Vision, B. M.
H. Romeny, Ed. Norwell, MA: Kluwer, 1994, pp. 229–249.
[22] C. Rekeczky, Á. Tahy, Z. Végh, and T. Roska, “CNN-based
spatio-temporal nonlinear filtering and endocardial boundary
detection in echocardiography,” Int. J. Circuit Theory Applicat.,
vol. 27, pp. 171–207, 1999.
[23] C. Rekeczky and L. O. Chua, “Computing with front propagation:
Active contour and skeleton models in continuous-time CNN,” J.
VLSI Signal Process. Syst., vol. 23, pp. 373–402, 1999.
[24] D. Bálya, C. Rekeczky, and T. Roska, “Basic mammalian retinal
effects on the prototype complex cell CNN universal machine,” in
Proc. IEEE 7th Int. Workshop Cellular Neural Networks and Their
Applications, 2002, pp. 251–258.
[25] J. C. Gealow and C. G. Sodini, “A pixel-parallel image processor
using logic pitch matched to dynamic memory,” IEEE J. Solid-State
Circuits, vol. 34, pp. 831–839, June 1999.
[26] G. Liñán, R. Domínguez-Castro, S. Espejo, and A. Ro-
dríguez-Vázquez, “ACE16K: An advanced focal-plane analog
programmable array processor,” in Proc. 2001 Eur. Solid-State
Circuits Conf., Villach, Austria, Sept. 2001, pp. 216–219.
[27] P. Kinget and M. Steyaert, Analog VLSI Integration of Massive Par-
allel Processing Systems. Norwell, MA: Kluwer, 1997.
[28] R. Domínguez-Castro et al., “A 0.8 μm CMOS 2-D programmable
mixed-signal focal-plane array processor with on-chip binary
imaging and instruction storage,” IEEE J. Solid-State Circuits, vol.
32, pp. 1013–1026, 1997.
[29] A. Paasio, A. Dawidziuk, K. Halonen, and V. Porra, “Minimum size
0.5 μm CMOS programmable CNN test chip,” in Proc. 1997 Eur.
Conf. Circuit Theory and Design, Budapest, Hungary, Sept. 1997,
pp. 154–156.
[30] M. J. M. Pelgrom et al., “Matching properties of MOS transistors,”
IEEE J. Solid-State Circuits, vol. 24, pp. 1433–1440, Oct. 1989.
[31] E. A. Vittoz, “Future of analog VLSI in the VLSI environment,”
Proc. 1990 IEEE ISCAS, pp. 1372–1390.
[32] R. Domínguez-Castro, S. Espejo, A. Rodríguez-Vázquez, and
R. Carmona, “A one-transistor-synapse strategy for electri-
cally-programmable massively-parallel analog array processors,”
in IEEE-CAS 1997 Region 8 Workshop on Analog and Mixed IC
Design, ISBN 0-7803-4240-2, Sept., pp. 117–122.
[33] A. Rodríguez-Vázquez, E. Roca, M. Delgado-Restituto, S. Espejo,
and R. Domínguez-Castro, “MOST-based design and scaling of
synaptic interconnections in VLSI analog array processing chips,”
J. VLSI Signal Process. Syst. Signal, Image Video Technol., vol. 23,
pp. 239–266, Nov./Dec. 1999.
[34] M. Steyaert et al., “Speed-power-accuracy trade off in high-speed
analog-to-digital converters: Now and in the future,” in Proc. 9th
Workshop in Analog Circuit Design, Apr. 2000.
[35] T. Roska, A. Zarándy, S. Zöld, P. Földesy, and P. Szolgay, “The com-
putational infrastructure of analogic CNN computing—Part I: The
CNN-UM chip prototyping system,” IEEE Trans. Circuits Syst. I,
vol. 46, pp. 261–268, 1999.
[36] F. Werblin, A. Jacobs, and J. Teeters, “The computational eye,” IEEE
Spectrum, vol. 33, pp. 30–37, May 1996.
[37] L. Blum, F. Cucker, M. Shub, and S. Smale, Complexity and Real
Computation. New York: Springer, 1998.
Tamás Roska (Fellow, IEEE) received the Diploma in electrical engineering
from the Technical University of Budapest, Budapest, Hungary, in 1964 and
the Ph.D. and D.Sc. degrees from the National Qualification Committee,
Hungarian Academy of Sciences, Budapest, Hungary, in 1973 and 1982,
respectively.
Since 1964, he has held various research positions. During 1964–1970,
he was with the Measuring Instrument Research Institute, Budapest, be-
tween 1970 and 1982 with the Research Institute for Telecommunication,
Budapest (serving also as the head of Department for Circuits, Systems and
Computers) and since 1982, he has been with the Computer and Automa-
tion Institute of the Hungarian Academy of Sciences where, for 15 years, he
has been the head of the Analogic and Neural Computing Research Labora-
tory. He has taught several courses at various universities, presently, at the
Technical University of Budapest, at the University of California, Berkeley,
and very recently at the Pázmány P. Catholic University in Budapest. He is
teaching courses on “Emergent Computations” and “Cellular Neural Net-
works.” In 1974 and each year since 1989, he has been a Visiting Scholar
at the Department of Electrical Engineering and Computer Sciences and the
Electronics Research Laboratory, and recently a Visiting Research Professor
at the Vision Research Laboratory of the University of California, Berkeley.
He also presently serves as a Dean of the Faculty of Information Technology
at the Pázmány P. Catholic University, Budapest. His main research areas are
cellular neural networks, nonlinear circuit and systems, neural circuits, vi-
sual computing, and analogic spatial-temporal supercomputing. He has pub-
lished more than 200 research papers and four books (some as a coauthor),
and held several guest seminars at various universities and research institu-
tions in Europe, USA, and Japan. He is a co-inventor of the CNN Universal
Machine (with L. O. Chua), a U.S. patent of the University of California with
worldwide protection, and the analogic CNN Bionic Eye (with F. Werblin
and L. O. Chua), another U.S. patent of the University of California. He has
contributed also to the development of various physical implementations of
these inventions making this Cellular Analogic Supercomputer a reality.
Dr. Roska received the IEEE Fellow award for contributions to the qualita-
tive theory of nonlinear circuits and the theory and design of programmable
cellular neural networks. In 1993, he was elected to be a member of the
Academia Europaea (European Academy of Sciences, London, U.K.) and
the Hungarian Academy of Sciences. For technical innovations he received
the D. Gabor Award for establishing a new curriculum in information tech-
nology, and for his scientific achievement he was awarded the A. Szent-
györgyi Award and the Széchenyi Award, respectively. In 1994, he became
the elected active member of the Academia Scientiarum et Artium Europaea
(Salzburg, Austria). In 2002, he received the Bolyai Award in Hungary.
Since 1975, he has been a member of the Technical Committee on Nonlinear
Circuits and Systems of the IEEE Circuits and Systems Society. Between
1987–1989, he was the founding Secretary and later he served as Chairman
of the Hungary Section of the IEEE. Recently, he has served twice as As-
sociate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I,
FUNDAMENTAL THEORY AND APPLICATIONS and for 2002–2003 he has been
appointed as the Editor-in-Chief of this journal. He has served as Guest
Co-Editor of special issues on cellular neural networks of the International
Journal of Circuit Theory and Applications (1992, 1996, 1998, 2000), the
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS (1993 and 1999), and the
Journal of VLSI Signal Processing Systems (1999). He is a member of the
Editorial Board of the International Journal of Circuit Theory and Applica-
tions. He is a member of the Technical Committee on Multimedia and the
Technical Committee on Neural Networks of the IEEE. In 1998, he estab-
lished and became the first Chair of the Technical Committee on Cellular
Neural Networks and Array Computing of the IEEE Circuits and Systems
Society. In 2000 he received the IEEE Millennium Medal and the Golden
Jubilee Award of the IEEE Circuits and Systems Society.
Ángel Rodríguez-Vázquez (Fellow, IEEE) is a Professor of Electronics at
the Department of Electronics and Electromagnetism, University of Seville,
Seville, Spain. He is also a member of the research staff of the Institute of
Microelectronics of Seville—Centro Nacional de Microelectrónica (IMSE-
CNM)—where he is heading a research group on Analog and Mixed-Signal
Integrated Circuits. His research interests are in the design of analog inter-
faces for mixed-signal circuits, CMOS imagers and vision chips, telecom
circuits, neuro-fuzzy controllers, symbolic analysis of analog integrated cir-
cuits, and optimization of analog integrated circuits. In these fields, he has
published 5 books, 23 book chapters in other books, around 100 journal pa-
pers, and more than 250 conference papers.
Dr. Rodríguez-Vázquez served as an Associate Editor of the IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS-I, FUNDAMENTAL THEORY AND
APPLICATIONS (IEEE TCAS-I) from 1993 to 1995, as Guest Editor of the
IEEE TCAS–I special issues on “Low-Voltage and Low-Power Analog and
Mixed-Signal Circuits and Systems” (1995) and “Bio-Inspired Processors
and Cellular Neural Networks for Vision” (1999), as Guest Editor of
the IEEE TCAS-II special issue on “Advances in Nonlinear Electronic
Circuits” (1999), and as chair of the IEEE Circuits and Systems Analog
Signal Processing Committee (1996). Currently, he is an Associate Editor
for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II. He was corecipient
of the 1995 Guillemin-Cauer award of the IEEE Circuits and Systems
Society, the best paper award of the 1995 European Conference on Circuit
Theory and Design, and the 1999 best paper award of the International
Journal on Circuit Theory and Applications. In 1992 he received also the
young scientist award of the Seville Academy of Science.
