A Bio-Inspired Two-Layer Mixed-Signal Flexible Programmable Chip for Early Vision by Carmona Galán, Ricardo et al.
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 14, NO. 5, SEPTEMBER 2003 1313
A Bio-Inspired Two-Layer Mixed-Signal Flexible
Programmable Chip for Early Vision
Ricardo Carmona Galán, Member, IEEE, Francisco Jiménez-Garrido, Rafael Domínguez-Castro,
Servando Espejo, Member, IEEE, Tamás Roska, Fellow, IEEE, Csaba Rekeczky, István Petrás, and
Ángel Rodríguez-Vázquez, Fellow, IEEE
Abstract—A bio-inspired model for an analog programmable array processor (APAP), based on studies of the vertebrate retina, has permitted the realization of complex programmable spatio-temporal dynamics in VLSI. This model mimics the way in which images are processed in the visual pathway, which makes it a feasible alternative for the implementation of early vision tasks in standard technologies. A prototype chip has been designed and fabricated in 0.5 μm CMOS. Its computing power per silicon area and per unit of power consumption is amongst the highest reported for a single chip. The details of the bio-inspired network model, the analog building block design challenges and trade-offs, and some functional test results are presented in this paper.
Index Terms—Cellular neural networks, machine vision, neural
networks hardware, visual systems.
I. INTRODUCTION
THE RETINA is found to be responsible for a rather involved treatment of visual information at early stages in the process of vision [1]–[3]. Through the close interaction of sensory and processing structures, complex spatio-temporal processes are realized in the retina which reduce the enormous amount of information associated with the visual flow into a data set of manageable size. Although retinas are not yet fully understood, and define a challenging basic research area, the construction of vision processing devices with retina-like features shows large potential to overcome the limitations of conventional vision technologies. In that sense, during the last few years, several neuromorphic [4] vision chips have been developed and reported in the literature. Some of these works are listed and examined in [5] and [6].
Recently, the behavior of the outermost strata of the multilayered structure of the vertebrate retina has been successfully modeled by using the Cellular Neural Network (CNN) framework [7]. This model is based on studies and observations of the mammalian retina that were recently published in Nature [3]. In this model, interactions between cells in the
Manuscript received September 15, 2002. This work was supported in part
by ONR Project N-000140210884, CE Project IST-1999-19007 (DICTAM) and
the Spanish MCyT Project TIC1999-0826.
R. Carmona Galán, F. Jiménez-Garrido, R. Domínguez-Castro, S. Espejo,
and A. Rodríguez-Vázquez are with the Instituto de Microelectrónica de
Sevilla-CNM-CSIC, Campus de la Universidad de Sevilla, Sevilla 41012,
Spain (e-mail: rcarmona@imse.cnm.es).
T. Roska, I. Petrás, and C. Rekeczky are with the Analogic and Neural Com-
puting Laboratory, Computer and Automation Institute of Hungarian Academy
of Science, Budapest H-1111, Hungary.
Digital Object Identifier 10.1109/TNN.2003.816377
retinal fabric are realized on a local basis; each cell interacts with its nearest neighbors. Also, every cell belonging to the same layer has the same interconnection pattern: for each retinal layer, the same set of interconnection weights is applied to each and every one of its cells, i.e., layers are spatially invariant. In addition, the signals supporting intra- and inter-layer interactions are continuous in magnitude and time.
The phenomena observed in [3] are modeled in [7] by two coupled sets of two-dimensional (2-D) nonlinear differential equations. Because of the local interactions and the spatial invariance, the behavior of such a model is fully described by some 25 parameters. This set of controlling parameters includes interaction strengths, time constants and bias terms. By properly setting their values, complex interacting waves are generated which emulate the phenomena observed in the mammalian retina. This paper presents a fully programmable mixed-signal¹ implementation of the model in [7] on a silicon chip. The chip, fabricated in a standard 0.5 μm CMOS technology, has a core composed of 32 × 32 elementary processors² to implement the behavioral model, and embeds, in addition to this core circuitry, a set of circuit structures needed to render it a complete retina-like visual microprocessor system on a chip, namely, the following:
• establishing boundary conditions for the network
dynamics;
• storing intermediate images, through 2-D short-term
analog and digital memory banks;
• content-controlled and programmable intra-cell dataflow;
• global control and timing;
• addressing and buffering of the core cells;
• input–output;
• storing user-selectable analog and digital programming
parameter configurations (for the coding of interaction
weights and the setting of reconfiguration conditions);
• storing user-selectable instructions (programs) to control
the sequence of operations of the processing core;
• controlling and timing the sequence of operations for the
whole chip.
¹The circuit embodies both analog and digital circuitry. Analog techniques are used to implement the core behavioral model, including the dynamic operation and the interactions among cells, while digital techniques are employed to control the operation of the chip and to interface it with the external world.
²Each elementary processor is double, due to the layered structure of the chip. Hence, it is more correct to say that the new chip includes 2 × (32 × 32) horizontally and vertically interacting processors.
1045-9227/03$17.00 © 2003 IEEE
The integrated system belongs to the category of the so-called Single-Instruction Multiple-Data (SIMD) processors [8], although it works directly on analog signal representations. It offers significant advantages in terms of area and power efficiency. For instance, leaving aside the resources needed to
obtain the digital image representations used by the chip in
[8], it features 4 (OPS: OPerations per Second)
and 1 ; while the chip in this paper features
6 and 1.56 GOPS/mW. In addition to that,
and to the best of our knowledge, no other SIMD microprocessor-on-a-chip with retina-like behavior has been reported to date, although a number of remarkable, pioneering vision chips have been successfully implemented, for instance those listed in [6].
This paper is organized as follows. Section II is dedicated to the bio-inspired network model and to its foundations in a sketch of the biological retina. Section III describes the architecture of the APAP chip and its main components. Section IV explains how the analog building blocks of the basic processing units have been designed. Section V reviews the peripheral circuitry design. Experimental results obtained from testing a prototype chip are shown in Section VI. Finally, Section VII presents some conclusions.
II. BIO-INSPIRED NETWORK MODEL
A. Sketch of the Vertebrate Retina
Due to the vast amount of information contained in the visual stimuli, nature has developed a specialized part of the nervous system to handle it: the retina. On the one hand, the neuronal impulses conveying information along the nerves do not support such a large data rate. On the other hand, because of the high correlation found between the elements of the image (most of the energy of the signal, in images displaying natural scenes, is concentrated in the lower spatial and temporal frequencies), not every bit of information has to be passed to the brain to accomplish vision. Therefore, the retina, brought to the sensory periphery instead of being integrated in the central nervous system, processes the visual information at the focal plane, realizing what is called early vision. By doing so, the data flow to the visual cortex is greatly reduced, thus solving the problem of intelligent processing of visual information in a tight time frame.
The vertebrate retina has the structure displayed in Fig. 1 [9]. A first layer of photodetectors at the outermost layer of the retina, the cone cells, captures light and converts it to activation signals (a different type of cell, the rods, is specialized in sensing under very dim light conditions and saturates very easily). Bipolar cells carry these signals across the retina layers to the ganglion cells that interface the retina with the optic nerve, in a trip of several micrometers [3]. The ganglion cells convert the continuous activation signals, characteristic of the retina, to spike-coded signals that can be transmitted over longer distances by the nervous system. On the way to the ganglion cells, the information carried by bipolar cells is affected by the operation of the horizontal and amacrine cells. They form layers in which activation signals are weighted and averaged in order
Fig. 1. Schematic diagram of the vertebrate retina [9] showing the layer of
photosensors at the top and the ganglion cells connecting to the optic nerve.
to, first, bias the photodetectors and, second, account for inhibition on the vertical pathway. The four main transformations that take place in this structure are: the photoreceptor gain control, the gain control of the bipolar cells, the generation of transient activity and the transmission of transient inhibition. Briefly, captured stimuli are averaged and the high-gain characteristics of the cones and the bipolar cells are shifted to adapt to the particular light conditions. These operations have a local scope and depend on the recent history of the cells. Once adaptation is achieved, patterns of activity are formed dynamically by the presence or absence of visual stimuli. Inhibition is also generated and transmitted laterally through the layers of horizontal and amacrine cells. As a result of these transformations, the patterns of activity reach the layer of ganglion cells. At this point, the patterns are converted into pulse-coded signals that are sent to the brain to be interpreted. In a sense, the layered structure of the retina translates the visual stimuli into a compressed language that can be understood by the brain in recreating vision.
B. CNN Analogy of the Inner and Outer Plexiform Layers
There are, in this description, some interesting aspects of the retinal layers that markedly resemble the characteristics of a CNN: the 2-D aggregation of continuous signals, the local connectivity between elementary nonlinear processors, and the analog weighted interactions between them. Also, the complete signal pathway in the retina has the topology of a 3-D, or more properly two-and-a-half-dimensional, network: a pile of 2-D layers connected vertically. Motivated by these coincidences, and based on physiological and pharmacological studies [2], a CNN model has been developed that approximates the observed behavior of the vertebrate retina [10].
The outer plexiform layer of the retina (OPL) is responsible for the image capture. It has been characterized by experimental measurements [11], leading to a model with three different layers of cells. The first one, the photosensing layer, consists of an aggregation of cone cells. It is assumed here that the retina is adapted to lighting conditions and so the rods are saturated
(a) (b)
Fig. 2. Conceptual diagram of the (a) OPL of the retina and (b) the wide-field activity in the IPL.
and remain silent. In addition to the layer containing the cones, there is a second layer composed of horizontal cells and a third one composed of bipolar cells. Each of these layers has the structure of a 2-D CNN itself, with its own interaction patterns (CNN templates) and its particular time constant. Cell dynamics are sustained by a first- or second-order core. The structure of the OPL is depicted in Fig. 2(a), where interactions between layers of cells are represented by arrows. The input signal is captured by the cones and fed forward to the layers of horizontal and bipolar cells. From the experiments it has been concluded that no feedforward connection exists between the horizontal cells and the layer of the bipolar cells. Nor has any feedback been observed from the output of the bipolar cells to the previous layers. It has been deduced that the feedback connection of the horizontal cells to the layer of cones acts as a modulator of the feedforward functions rather than directly affecting the state of the cones. This feature, which is not implemented in this chip, is realized in [12].
Regarding the inner plexiform layer (IPL), it is responsible for the generation of the retinal output. A simplified model of the IPL is described in [11]. It has three layers of cells and supports the so-called wide-field activity, observed in certain amacrine cells. Wide-field activity consists of the integration of the action potentials over a widely extended area prior to the ganglion cells. Based on the experimental records, the model consists of two layers of wide-field amacrine cells excited by the input signal, which in this case is the output of the bipolar cells, and a third layer that controls the dynamics of the previous layers by means of feedback signals. As before, the three layers are assumed to be 2-D CNNs with their own internal coupling and their own time constant [see Fig. 2(b)].
Fig. 3. Conceptual diagram of the second-order three-layer CNN.
Because of the relative simplicity of these models, a programmable CNN chip has been proposed [12]. The programmable array processor of the chip consists of two coupled CNN layers, and a third layer, with much faster dynamics ( ), that supports analog arithmetic [see Fig. 3]. Each elementary processor contains the nodes for both CNN layers. The third layer is inherently implemented by these analog cores, with the help of the local facilities for analog signal storage. The evolution of the coupled CNN nodes of a specific cell is described by the coupled differential equations shown in (1), where the nonlinear losses term








Fig. 4 depicts the block diagram of the vertically coupled
CNN nodes. Synaptic connections between cells are linear.
Each CNN layer incorporates feedback connections, by means
(1)
Fig. 4. Block diagram of the two coupled CNN layer nodes.
of which the output of each cell contributes to the state of its neighbors, weighted by the elements ; a feedforward connection, weighted by , that regulates the contribution of the cell's input; a bias term , that can be different for each cell; and finally coupling connections between both layers, weighted by and . Each layer has its own time constant . Programming different dynamics in this CNN model is possible by adjusting the template elements and the time constants of the layers. The total number of synapses to be implemented on each cell is 22, plus the two bias-map multipliers, since each bias map will be treated as a second input image for its layer.
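As an illustration of the dynamics described above, the following Python sketch integrates two vertically coupled CNN layers with a forward-Euler step. It is only a sketch under stated assumptions: the parameter names (A11, A22, b11, b22, a12, a21, tau1, tau2), the bias maps z1 and z2, the zero boundary value, the time step, and the reading of the FSR limiter as a hard clip of the state to [-1, 1] are illustrative choices, not the notation or the values of the chip. The synapse count assumes nine feedback weights, one feedforward weight and one inter-layer coupling weight per layer, which is one decomposition consistent with the total of 22 quoted above.

```python
# Sketch of two vertically coupled CNN layers, forward-Euler integration.
# All names and values are illustrative assumptions, not chip parameters.

def clip(v, lo=-1.0, hi=1.0):
    """Assumed FSR limiter: the state is confined to [lo, hi]."""
    return max(lo, min(hi, v))


def neighborhood_sum(template, y, i, j, boundary=0.0):
    """Weighted sum over the 3x3 neighborhood of cell (i, j).
    Cells outside the array take a fixed boundary value."""
    rows, cols = len(y), len(y[0])
    total = 0.0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            v = y[r][c] if 0 <= r < rows and 0 <= c < cols else boundary
            total += template[di + 1][dj + 1] * v
    return total


def step(x1, x2, u1, u2, z1, z2, p, dt):
    """One Euler step of both layers; returns the new states (x1, x2).
    With the state clipped to [-1, 1], the output is taken equal to the state."""
    rows, cols = len(x1), len(x1[0])
    n1 = [[0.0] * cols for _ in range(rows)]
    n2 = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dx1 = (-x1[i][j]
                   + neighborhood_sum(p["A11"], x1, i, j)  # intra-layer feedback
                   + p["b11"] * u1[i][j]                   # feedforward input
                   + p["a12"] * x2[i][j]                   # coupling from layer 2
                   + z1[i][j]) / p["tau1"]                 # per-cell bias
            dx2 = (-x2[i][j]
                   + neighborhood_sum(p["A22"], x2, i, j)
                   + p["b22"] * u2[i][j]
                   + p["a21"] * x1[i][j]                   # coupling from layer 1
                   + z2[i][j]) / p["tau2"]
            n1[i][j] = clip(x1[i][j] + dt * dx1)
            n2[i][j] = clip(x2[i][j] + dt * dx2)
    return n1, n2


def count_synapses():
    """Assumed decomposition of the per-cell synapse count."""
    feedback = 2 * 9      # one 3x3 feedback template per layer
    feedforward = 2 * 1   # one input weight per layer
    coupling = 2 * 1      # one inter-layer weight per layer
    return feedback + feedforward + coupling
```

Because every cell uses the same templates, changing the dynamics of the whole array amounts to changing the handful of entries in the parameter dictionary.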
III. APAP CHIP ARCHITECTURE
A. Analog Programmable Array Processor Chip
The proposed chip consists of a mixed-signal parallel processing array of 32 × 32 identical cells [see Fig. 5]. It is surrounded by a ring of circuits implementing the boundary conditions for the CNN dynamics. The peripheral circuitry required for the proper operation of the central array processor consists of the timing and control unit, the program memory and the I/O interface.
The timing and control unit is composed of a micro-instruction decoder, which generates the appropriate signals to configure the network, and an internal clock/counter with a set of finite state machines that generate the internal signals that enable program memory accesses and other data transfers. The operation control unit constitutes the interface between the program memory and the processing array. The program memory is composed, on one side, of 16 blocks of SRAM of 64 bytes each, dedicated to the analog weights, and, on the other, of four blocks of 128 bytes each for the logic program, including bits for the network configuration and control signals for the I/O interface. Digital signal buffering can be considered part of the operation control unit. In addition, the analog instructions and reference signals, coded in one section of the program memory, need to be transmitted to every cell in the network in the form of analog voltages. Thus, a bank of D/A converters interfaces these memory blocks with the processing array. Distributing analog references across large distances within a chip is not a trivial task. Apart from the problems derived from electromagnetic interference, voltage drops in long metal lines carrying currents can be quite noticeable. Signal buffering and low-resistance paths must be provided to avoid this, especially in the case of the weights, which enter the synapses through a low-impedance node.
Fig. 5. Floorplan of the prototype chip.
Finally, the image I/O interface consists of a serializing-deserializing analog multiplexer. It adapts the serial analog I/O channel to the 32 I/O lines corresponding to the 32 columns of the array by means of a battery of blocks. The corresponding row and column address decoders, controlled by the timing unit, are part of this block.
B. Basic Cell Structure
The basic cell of the CNN-based array processor has a sim-
ilar architecture to that of the CNN universal machine cells
Fig. 6. Conceptual diagram of the (a) basic cell and the (b) internal structure of each CNN layer node.
[15]. However, in this case, the prototype includes two different continuous-time CNN layers. Therefore, as depicted in Fig. 6(a), together with the local analog and logic memories (4 LAMs and 4 LLMs) for the storage of intermediate results and the local logic unit (LLU), responsible for pixel-level logic operations, two different analog CNN core blocks are found, each one belonging to one of the two CNN layers implemented. The synaptic connections between processing elements of the same layer are built around the cell core, as shown, while interlayer coupling, kept within the pixel scope in this model, is placed inside the cell (represented by arrows between the processing layers in the diagram). All the blocks in the cell communicate via an intracell data bus, which is multiplexed to the array I/O interface. Control and cell configuration bits are passed directly from the control unit.
The internal structure of each of the CNN cores of the cell is depicted in the diagram of Fig. 6(b). Each core receives contributions from the rest of the processing nodes in the neighborhood, which are summed and integrated in the state capacitor. The two layers differ in that the first layer has a scalable time constant, controlled by the appropriate binary code, while the second layer has a fixed time constant. The evolution of the state variable is also driven by self-feedback and by the feedforward action of the stored input and bias patterns. A voltage limiter helps to implement the limitation on the state variable of the full-signal-range (FSR) CNN model. This state variable is transmitted in voltage form to the synaptic blocks, in the periphery of the cell, where weighted contributions to the neighbors are generated. There is also a current memory that is employed for cancellation of the offset of the synaptic blocks. Initialization of the state, input and/or bias voltages is done through a mesh of multiplexing analog switches that connect to the cell's internal data bus.
Running complex spatio-temporal dynamics in this network requires following several initialization and calibration steps. First of all, the input image and auxiliary masks and/or patterns are acquired. For this purpose, the array I/O interface is directed to specific LAM locations on a row-by-row basis. After that, the analog instruction, i.e., the set of synaptic weights required for a specific operation, is selected and transmitted to all the cells in the array. Then, the offset of the critical OPAMPs is cancelled in a calibration step. After that, the time-invariant offsets of the synaptic blocks are computed and stored in the current memories. Now the network is almost ready to operate. The state capacitors and the feedforward synapses are initialized by means of the appropriate switch configuration, and the network evolution is run by closing the feedback loop in each processing element. Before stopping the network evolution, the final state is stored in a LAM register for further operation.
IV. THE BASIC PROCESSING UNIT
A. Single-transistor Synapse
One of the most important blocks in the cell is the synaptic block. The synapse is, simply, a four-quadrant analog multiplier. Its inputs are the cell state, , or input, , variables and the corresponding weight signal, , while the output is the cell's contribution to a specific neighboring cell. The multiplier is required to have voltage inputs and a current output. On the one hand, both the cell state and the weight signal, the multiplier inputs, must be distributed over different points in the circuit. The cell state must drive every synapse in the local scope and the weight signal must be transmitted to every cell in the array. If these signals are represented by voltages, they can be easily conveyed to any high-impedance node by a simple wire. On the other hand, because the contributions of all the neighbors are summed at the input of the processing core, this summation can be readily achieved by wiring all these contributions concurrently to a low-impedance node if they are in current form. In addition, in this particular application, there is no need to have a strictly linear relation between the weight signal, , and the output current, . Moreover, it is common in this type of processing that the weight signal does not change during the evolution of the network. This means that any deviation depending on is not a gain error, but an offset error, i.e., an error which can be cancelled by autozeroing in a preprocessing calibration step.
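The autozeroing argument above can be illustrated with a short Python sketch. The synapse model used here is hypothetical: an ideal product of state and weight plus an arbitrary deviation that depends on the weight only. The polynomial coefficients, the function names and the use of zero as the state reference are assumptions made for the example. Since the weight is frozen during the network evolution, the deviation measured once with a zero state signal can be stored and subtracted afterwards.

```python
# Hypothetical synapse model: ideal product plus a weight-only deviation.
# Coefficients and names are illustrative assumptions.

def synapse_current(state, weight, gain=1.0):
    """Output of a non-ideal synapse for a given state and weight signal."""
    deviation = 0.3 * weight ** 2 + 0.1 * weight + 0.05  # depends on weight only
    return gain * weight * state + deviation


def calibrate(weight):
    """Autozero step: with zero state signal, the output is pure offset."""
    return synapse_current(0.0, weight)


def corrected_current(state, weight, stored_offset):
    """Output during network evolution, with the memorized offset subtracted."""
    return synapse_current(state, weight) - stored_offset
```

After calibration, the corrected output is proportional to the state for any fixed weight, and its sign follows the four quadrants of the product.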
Different CMOS-compatible circuits can be employed to realize the multipliers. For instance, synapses can be implemented by MOS transistors in weak inversion [16], exploiting the exponential law that governs this regime of operation to achieve multiplication. There are multipliers based on MOS transistors in strong inversion, operating in the saturation region, where their large-signal characteristic exhibits a quadratic law, which is the
principle behind the well-known Gilbert cell [17]. Direct multiplication can also be achieved by a MOS transistor operating in the ohmic region. Its low-frequency large-signal characteristic is given, to a first-order approximation, by (for an n-type device)
(4)
where . A multiplication can be realized with this device as long as holds [18]. This alternative has several advantages [19]. First, it requires a reduced amount of area, because four-quadrant behavior is achieved with one single transistor. Second, it has a better relation between bias power and signal power, thus leading to higher accuracy at lower power consumption, while in the saturation region the information is carried by a small fraction of the actual currents flowing through the devices. Third, the ohmic region shows better mismatch figures than any other region [20].
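As a numerical illustration of multiplication in the ohmic region, the Python sketch below uses the textbook first-order triode law for an n-type MOS transistor, I = β[(V_GS − V_T)V_DS − V_DS²/2]. This is assumed here as a stand-in, since the exact form and notation of (4) are not legible in this copy. The threshold voltage, the value of β and the function names are likewise illustrative assumptions. The sketch shows two things: the relative error of the pure-product approximation equals V_DS / (2(V_GS − V_T)), and a signal added on the gate multiplies V_DS exactly, the quadratic term acting as an offset that does not depend on the gate signal.

```python
# Textbook first-order triode model (assumed); parameter values are illustrative.

def triode_current(v_gs, v_ds, v_t=0.7, beta=40e-6):
    """Drain current in the ohmic region.
    Intended for v_gs > v_t and 0 <= v_ds <= v_gs - v_t."""
    return beta * ((v_gs - v_t) * v_ds - 0.5 * v_ds ** 2)


def product_approximation(v_gs, v_ds, v_t=0.7, beta=40e-6):
    """Pure product term, neglecting the quadratic term in v_ds."""
    return beta * (v_gs - v_t) * v_ds


def relative_error(v_gs, v_ds, v_t=0.7):
    """Relative error of the product approximation against the triode law."""
    exact = triode_current(v_gs, v_ds, v_t)
    approx = product_approximation(v_gs, v_ds, v_t)
    return (approx - exact) / approx


def gate_signal_current(delta, v_gs0, v_ds, v_t=0.7, beta=40e-6):
    """Change in current caused by a gate signal delta around the bias v_gs0."""
    return (triode_current(v_gs0 + delta, v_ds, v_t, beta)
            - triode_current(v_gs0, v_ds, v_t, beta))
```

With these assumed values, a 0.1 V drain-source voltage under a 2 V gate overdrive gives a 2.5% deviation from the pure product, while the response to the gate signal alone stays bilinear.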
The one-transistor synapse works as follows. Consider a p-type MOS transistor operating in the ohmic region [see Fig. 7]. A p-type transistor is chosen because the more resistive p-type channel allows smaller currents, and thus lower power consumption, for the same transistor length. Or, equivalently, for the same current levels, the required p-channel MOS is shorter than its n-type counterpart. The source-to-drain current of a PMOS transistor in the ohmic region is given by [21]
(5)
where the threshold adopts one of these two analogue forms:
if
if (6)
must be kept fixed in order to use and as single-ended
input voltages, and to sense as the output of the synapse.
For this purpose, we can employ a current conveyor [22] at the current input node of each cell. The current conveyor permits current sensing while maintaining a virtual reference at node . All the synapses contributing to the same cell can be connected to the same virtual reference. The only caveat is that the impedance seen at this node must be well below the parallel combination of the output impedances of all the synaptic blocks.
Returning to (5), notice that the second term on the right-hand side of the equation does not depend on , therefore node is a strong candidate to hold the cell state variable voltage. But must always be positive for the MOS transistor to operate above threshold, thus let be composed of a reference voltage , sufficiently high, and a superposed cell state signal
(7)
And, in order to achieve four-quadrant multiplication, must be permitted to go above and below . Let us select as the reference for the weight signal, , being
(8)
Fig. 7. Multiplier using one single MOS transistor in ohmic region.
Then (5) can be rewritten as
(9)
which is a four-quadrant multiplier with an offset term that is time-invariant (at least during the evolution of the network) and does not depend on the cell state. Therefore, we have arrived at a four-quadrant multiplier with single-ended voltage inputs and a current output, with an offset that can be eliminated by a calibration step, with the help of a current memory
(10)
The limitations of this behavior are, to first order, the upper and lower boundaries of the ohmic region in strong inversion [21]. From them, it can be concluded that
(11)
Another restriction is found in the degradation of the mobility. The transversal electric field (normal to the surface of the channel) pushes the carriers toward the semiconductor surface, where they suffer scattering, which reduces the speed of the carriers and thus degrades the mobility. This transversal electric field depends on the gate voltage, so the first summand in (10) will no longer be linear with . Using a widely accepted model for this effect in a MOS transistor [21], we arrive at
(12)
where is a maximum ef-
fective gate voltage, beyond which the distortion introduced by
mobility degradation exceeds the linearity requirements. Com-
bining these two equations:
(13)
For moderate linearity requirements, in a typical CMOS
technology, the right hand side of (13) becomes approxi-
mately equal to 1 V. If and are assigned the same
voltage ranges, around their reference values, then
Fig. 8. Current conveyor realization and small-signal equivalent.
. With this, back to (11), substituting
the values of , and
(14)
Thus, must be high enough to leave room for , but not too large because the weight signal will progress up to above . In addition, we have to provide range for the current conveyor circuitry to maintain a virtual reference precisely at , and for the circuits generating the weight voltages, which will have a limited output swing. If we select , then there are 0.75 V above before hitting the power rail at 3.3 V, which means one , approximately. With this value, results in 0.95 V. Finally, once the voltage ranges are fixed,
a maximum current per synapse is selected for meeting power requirements; in our case it will be 1.4 μA. With these values, the synapse is dimensioned. In our chip, it will be 2 μm wide and 2.59 μm long.
B. Current Conveyor
The current conveyor, required for creating a virtual reference node at which the synapse outputs can be sensed, is implemented by the circuit of Fig. 8. Any difference between the voltage at node and the reference is amplified and the negative feedback corrects the deviation. The input impedance of this block is very low, which means that changes in the small-signal input current do not appreciably disturb the virtual reference at node , that is, . The bias current is required to ensure that node is always the source of transistor . At the same time, this circuit permits the injection of a nearly exact copy of the input current at the state node, whose voltage range differs from that of the weight signals. The only drawback of using this circuit is that a voltage offset, , at the input of the differential amplifier (which can be implemented with a simple OTA as it drives a very high impedance node, the gate of ) results in an error of the same amount in the reference voltage implemented at node . Since the main contribution to the offset is random, this error will be distributed all along the array, resulting in mismatched synaptic blocks that can degrade performance, e.g., anisotropic evolution of the network yielded by a symmetrical propagation template. As we are compelled to use small-size devices, in order to achieve the highest cell-packing density possible, the random offset can be quite large. In order to avoid this, an offset calibration mechanism has been implemented at the critical OTAs [see Fig. 9]. The input-referred offset voltage, , has been taken out of the OTA block symbol. Without the offset cancellation circuit (the shaded area), at low frequencies, and considering a negligible output conductance, the output of the OTA is
(15)
Considering the error cancellation mechanism, when is ON,
then the inputs are shorted, , and is connected as
a diode, its source-to-drain is in steady state
(16)
After some time, is turned off and, except for a residual switching error, the current is memorized by means of the voltage stored in . Thus, the total current injected into the load is free of any offset:
C. Current Memory
As mentioned, the offset term of the synapse current must be removed for the output current to precisely represent the result of a four-quadrant multiplication. For this purpose, before the CNN operation, but right after the new weights have been uploaded, all the synapses are reset to . Then the resulting current, which is the sum of the offset currents of all the synapses concurrently connected to the same node, is memorized. This value will be subtracted on-line from the input current during the network evolution, resulting in a one-step cancellation of the errors of all the synapses. The validity of this method relies on the accuracy of the current memory. For instance, in this chip, the sum of all the contributions will range
Fig. 9. Offset calibration mechanism for the critical OTAs.
from 18 to 46 µA. On the other hand, the maximum current
signal of the synapse is:
(18)
which means a total current range of 1 µA. If an equivalent
resolution of 8 bits is intended, then, . In these
conditions, our current memory must be able to distinguish
2 nA out of the 46 µA. This represents an equivalent resolution
of 14.5 bits. In order to achieve such accuracy levels, a so-called
S³I current memory will be employed [23]. As depicted in
Fig. 10, it is composed by three stages, each one containing a
switch, a capacitor and a transistor. At the beginning, while ,
and are ON, the current is divided into , and ,
and
(19)
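As a quick numerical check of the resolution figure quoted above, the following minimal sketch recomputes the equivalent number of bits required from the current memory. It assumes that the worst-case offset sum is 46 µA and that the synapse signal range is 1 µA (units inferred from the 14.5-bit figure); the function name is illustrative only.

```python
import math

def equivalent_bits(full_scale: float, smallest_step: float) -> float:
    """Number of bits needed to resolve smallest_step within full_scale."""
    return math.log2(full_scale / smallest_step)

# Assumed values (see lead-in): 1 uA signal range, 8-bit target resolution,
# 46 uA worst-case sum of synapse offset currents.
signal_range = 1e-6
half_lsb = 0.5 * signal_range / 2**8   # about 2 nA
offset_max = 46e-6

memory_bits = equivalent_bits(offset_max, half_lsb)
print(f"0.5 LSB = {half_lsb * 1e9:.2f} nA, "
      f"memory resolution = {memory_bits:.1f} bits")
```

Under these assumptions the half-LSB step is close to 2 nA and the required memory resolution is close to 14.5 bits.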
Switches controlled by , and are successively turned
off. Each time that one of these switches turns off, the voltage
stored in its associated capacitor changes, e.g., changes from
to , because of charge injection. The other tran-
sistors have to accommodate to absorb the error, as the sum of
currents is still forced to be , and thus and change to
(20)
when turns off. Correspondingly, changes to
(21)
when falls. Finally is turned off, and ends in
. The final current, , is
(22)
and substituting here the values of , and , we find
that
(23)
that is, the only error left is that corresponding to the last stage. The
former stages do not contribute to the error in the memorized
current. If the block is designed so as to store the most
significant bits in the first capacitor, and the least significant bits
in the last one, then the error in the memorized current can be
made quite small. Consider that the total resolution of the
current memory is . Let us assume that is conducting the
most significant bits of the current , then conducts
the next and conducts the rest, thus, for the last stage
an effective resolution can be defined
(24)
If the error in the memorized current has to be kept below
0.5 LSB, and then
(25)
And this is the design equation that relates the geometric aspect
of transistor , through , with the magnitude of the storage
capacitor, via . Once we have , and can be easily
derived
(26)
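The cancellation argument above can be illustrated with a purely behavioral sketch (not a transistor-level model). The initial current split between stages and the charge-injection errors are hypothetical values chosen for illustration; the only property taken from the text is that the stages still conducting absorb the error while the loop forces the sum of currents to equal the input.

```python
def s3i_memorize(i_in, split=(0.90, 0.09, 0.01), errors=(5e-9, 5e-9, 5e-9)):
    """Behavioral model of a three-stage current memory.

    i_in   -- current to be memorized (A)
    split  -- assumed initial fraction of i_in carried by each stage
    errors -- assumed charge-injection current error of each stage (A)
    Returns the three stored currents.
    """
    i1, i2, i3 = (f * i_in for f in split)

    # Stage 1 switch opens: its current jumps by errors[0]. The loop still
    # forces i1 + i2 + i3 == i_in, so stages 2 and 3 absorb the error.
    i1 += errors[0]
    live = i_in - i1
    share2 = split[1] / (split[1] + split[2])
    i2, i3 = live * share2, live * (1.0 - share2)

    # Stage 2 switch opens: stage 3 alone absorbs the accumulated error.
    i2 += errors[1]
    i3 = i_in - i1 - i2

    # Stage 3 switch opens: nothing is left to correct its own error.
    i3 += errors[2]
    return i1, i2, i3

i_in = 46e-6
stored = s3i_memorize(i_in)
residual = sum(stored) - i_in
print(f"residual error = {residual * 1e9:.3f} nA")
```

In this sketch the residual equals the error of the last stage alone, whatever errors are assigned to the first two stages, which is why the last stage is sized for the smallest current.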
One might think that adding more stages to the current
memory will endlessly increase accuracy. However, there is
one factor that has not been addressed yet. As the order of
the memory increases, the currents that have to be
sensed by the last stages become tinier. There comes a point at which the
Fig. 10. S³I current memory schematics and timing.
Fig. 11. Binary programmable current mirror (4b).
leakages from the capacitors of the first stages are comparable in size
to the current to be memorized by the last stages, thus making
it impossible to reach a steady-state current that corrects
the previous errors. This problem worsens as temperature rises.
For instance, at 70 °C leakages can introduce changes in the
memorized current in the order of 0.2 . If the dynamics
of the current memory require several to settle, because of
the use of large capacitors and the tiny currents involved, the
memorized current will display an error that is well above the
initial estimation.
D. Time Constant Scaling Block
The time constant of the CNN layer is defined as ,
the ratio between the state capacitor, and the transconductance
obtained by multiplying the current factor of the synapse,
, times the weight signal voltage . This
time constant depends on the specific set of templates being
implemented in the CNN. The state capacitor is composed of the
gate capacitances of the 11 synapses driven by the cell’s state.
As in this technology, this makes a total of
1.97 pF. In the most favorable case, when every neighbor, even
the cell itself, is contributing the maximum amount of current
to the cell state (a parallel stack of 18 synapses), a
transconductance of 22.5 µA/V is found. This represents a minimum CNN
time constant of 87.4 ns.
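The figure above can be checked with a short calculation. This is a minimal sketch that assumes the transconductance is expressed in µA/V and that the time constant is simply the ratio of state capacitance to total transconductance; the 1:16 scaling factor is the ratio discussed in the following paragraphs.

```python
def cnn_time_constant(c_state: float, g_m: float) -> float:
    """Time constant (s) of a cell with state capacitance c_state (F)
    driven by a total transconductance g_m (A/V)."""
    return c_state / g_m

c_state = 1.97e-12   # F, gate capacitance of the 11 synapses
g_max = 22.5e-6      # A/V, assumed unit for the quoted 22.5

tau_min = cnn_time_constant(c_state, g_max)
# Scaling the summed synapse current down by 16 is equivalent to dividing
# the transconductance by 16, which slows the layer by the same factor.
tau_scaled = cnn_time_constant(c_state, g_max / 16)

print(f"tau_min = {tau_min * 1e9:.1f} ns, tau_scaled = {tau_scaled * 1e6:.2f} us")
```

With these rounded inputs the result is about 87.6 ns, in line with the 87.4 ns quoted; the small difference presumably comes from rounding of the capacitance and transconductance values.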
Scaling the time constant of one of the CNN layers involves
modifying either the value of the state capacitor or the
transconductance of the synapses. For the first alternative, we would
need to implement an adjustable capacitor. A continuously
adjustable capacitor does not seem easy
to realize. If a capacitor with a discrete set of capacitances is
adequate, an area of 16 times will be
required to implement a 1:16 time constant ratio.
The second alternative, scaling the transconductances of
every synapse contributing to the cell, can be achieved with a
current mirror. Scaling up/down the sum of currents entering
the cell is equivalent to scaling up/down the transconductances
of the synapses, and thus to scaling down/up the time constant
of the CNN core. A circuit for continuously adjusting the
gain of a mirror can be designed based on the active-input
regulated-cascode current mirror [24]. The major disadvantage
of this circuit is its strong dependence on the power rail
voltage. As we will see later, the power rail voltage can deviate
by more than 1% in a densely packed 32 × 32-cell parallel
array processor chip. This would cause a large mismatch in the
time constants of the different cells in the layer. An alternative
to this is a binary programmable current mirror (Fig. 11).
The input current, , must be always positive, in the sense
indicated in the figure, and the output current is given by:
(27)
where , , and are the decimal values of the control bits.
In this case, 4 bits will be more than enough to program the
required relations between and . The mismatch between
the time constants of the different cells is now fairly attenuated
by design.
A new problem arises related to the placement of the
scaling block in the signal path. There are several alternatives.
First, the scaling block, the binary weighted current mirror,
can be placed after the offset cancellation memory, as in
Fig. 12. Alternatives for the placement of the scaling block.
Fig. 12(a). The problem is that any offset introduced by the
scaling block is added to the signal path with no possibility of
cancellation. The second alternative [see Fig. 12(b)] is to place
the scaling block before the offset cancellation memory. This
means that the memory has to operate over a wider
range of currents, which complicates its design and surely
degrades its performance. Our choice, depicted in Fig. 12(c),
has been to place the scaling block in the memorization loop.
The current memory will operate on the unscaled version of
the input current, and any offsets associated with the scaling
blocks will be sensed and memorized to be cancelled on-line
during the network evolution.
The resulting CNN core is shown in Fig. 13 [25]. In this pic-
ture, the voltage reference generated with the current conveyor,
the current mirrors and the memory can be easily identi-
fied. The inverter, , driving the gates of the transistors of the
current memory is required for stability. Without it, the output
node, , will diverge from the equilibrium. The operation of
this circuit is as follows. Before running the CNN dynamics, the
current offsets of all the synapses are injected to the virtual ref-
erence at node . This current is scaled down to one -th of
its value by means of the adjustable current mirror formed by
and . The arrow over stands for the binary pro-
grammability of this device. The value of is
(28)
Then, if all the transistors of the memory are conducting,
that is, , and are ON, then the negative feedback loop
makes to conduct the same current as . is also
adjustable so as to make and the current memory work
with the same current ranges as the input stage. The rest of
the operation has already been described. The current memory
successively stores the remaining most significant bits of the
input current, plus the accumulated errors. When it is done, the
CNN loop can be closed and the output current represents
the scaled sum of the contributions, with the state-independent
errors subtracted.
The critical aspects of this circuit are related to the feedback
loop formed by , , , the inverting amplifier
and the transistors , when sensing the offset current. During
this process the output current is zero because the current
path to the state capacitor is open. Once the input current has
been established, can be considered a bias voltage. First
of all, it must be taken into account that during the three dif-
ferent phases in which the loop is closed ( , and ON,
OFF and and ON, and, finally, and OFF and
ON) the values of and change, so the stability condi-
tions must hold for any possible set of values. Considering the
small-signal equivalent circuit for this loop, a three-pole system
is found [see Fig. 14], with pole frequencies: ,
and . The nearest pole,
that is at node , will be employed to compensate the loop for
stability. As and decrease for the latest phases of the
current memorization, the loop will be more stable because this
causes the loop dc gain, , to decrease and to grow, breaking
away from and thus increasing the phase margin. Therefore,
the worst situation will occur when , and are ON, and
thus the circuit is designed to be stable in these conditions. It
is also important that is kept reasonably low, otherwise it
will displace the unity-gain frequency, , toward the value of
the inversion . This means a loss of phase margin, and can
compromise the loop stability.
As noted before, leakage currents can degrade the
memory operation, especially as the operating temperature
rises. Although the negative feedback moves the circuit toward
the correction of the errors, it may be too slow to settle at a value
before leakages modify the position of the equilibrium point.
Fig. 13. Input block with current scaling.
Fig. 14. Simplified schematics of the feedback loop and its small signal equivalent.
Therefore, compensation must be kept under a limit to avoid
slowing down the loop dynamics excessively.
V. PERIPHERAL CIRCUITRY
A. Analog Weight Signals Distribution
Conveying analog voltages from the boundaries of the array
to the inner cells is not a trivial task, especially if the metal lines
supporting the signals have to carry large currents and the width
of these lines must be reduced because of cell area constraints.
Resistive lines carrying current cause the voltage signals
to drop. In the case of the weight signals, which have to be
transmitted to every synapse in the network, entering through
a low-impedance node and thus drawing an appreciable
amount of current, this voltage drop can seriously compromise
the intended resolution. Voltage drops are also a serious problem
for the power supply lines, which carry a significant amount of
current, as they can cause the inner cells’ circuitry
to malfunction. Consequently, it is important to develop a reliable
model for this phenomenon. It is possible to find a closed
expression to compute the maximum error in the propagation of
the reference voltage through a metal line, the one-dimen-
sional model, as a function of the resistance between cells, ,
and the current demanded by each cell, [25]:
(29)
This expression is useful for determining the appropriate width of
the power lines laid across the array. For instance, in the case of
this prototype chip ( ) each cell demands 300 µA under
normal operating conditions; let us allow for a much higher
peak consumption, say 800 µA. If the maximum error allowed
in the power supply voltage, nominally 3.3 V, is 50 mV, and the
power supply distribution coincides with that of the presented
model (the voltage sources are tied to the ends of each row of
the array and no vertical connection between horizontal power lines
is considered), each segment of the lines can have a resistance of
at most
$$R = R_{\square}\,\frac{L}{W} \le 520\ \mathrm{m\Omega} \tag{30}$$
where $R_{\square}$ is the sheet resistance of the metal, and $L$ and $W$
are the length and width of the metal track. For the uppermost
metal layer in the CMOS process employed, the most conductive
of the three, $R_{\square}$ is 35 mΩ/□ at room temperature, but
can go up to 80 mΩ/□ at 100 °C. If the length of the cells
is 190 µm, making a conservative estimation with the
higher value for $R_{\square}$, it is found that the minimum width needed
to distribute the power supply voltage is approximately 30 µm.
A similar approach is employed to derive the width of the metal
lines carrying the weight signals. Now the currents are much lower,
. However, the maximum error permitted is as low as
1.6 mV, that is, 0.5 LSB for an equivalent resolution of 8 bits with a
total signal range of 800 mV. Routing the weight lines in the
same metal employed before, it is found that . The
minimum width required to maintain the accuracy is
approximately 2.3 µm.
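The power-line sizing above can be reproduced with a short sketch of the one-dimensional model. It is an illustration under stated assumptions, not necessarily the closed expression of the paper: the first and last cells of a row are taken to sit directly on the supply, so only the inner cells draw current through the line, and the currents, sheet resistance and lengths are taken in µA, mΩ/□ and µm. These assumptions were chosen because they reproduce the 520 and 30 figures quoted in the text. All function names are hypothetical.

```python
def max_drop_1d(n_cells: int, r_seg: float, i_cell: float) -> float:
    """Voltage drop at the middle of a row of cells fed from both ends.

    The two end cells are assumed to sit directly on the supply, so only
    the n_cells - 2 inner cells draw current through the metal line.
    """
    inner = n_cells - 2
    drop = 0.0
    # By symmetry each supply feeds half of the inner cells. Segment k,
    # counted from the supply, carries the current of every inner cell
    # lying between it and the middle of the row.
    for k in range((inner + 1) // 2):
        drop += r_seg * i_cell * (inner / 2 - k)
    return drop

def max_segment_resistance(n_cells: int, i_cell: float, v_err: float) -> float:
    """Largest per-segment resistance that keeps the drop below v_err."""
    return v_err / max_drop_1d(n_cells, 1.0, i_cell)

def min_track_width(r_sheet: float, length: float, r_seg: float) -> float:
    """Track width for which a segment of given length has resistance r_seg."""
    return r_sheet * length / r_seg

r_max = max_segment_resistance(32, 800e-6, 50e-3)   # ohm
w_min = min_track_width(80e-3, 190e-6, r_max)       # m

print(f"R_max = {r_max * 1e3:.0f} mOhm, W_min = {w_min * 1e6:.1f} um")
```

For a 32-cell row the drop under these assumptions is 120 times the product of segment resistance and cell current, which leads to a maximum segment resistance close to 520 mΩ and a minimum width slightly below 30 µm.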
In the chip, the power supply grid is 2-D. There are vertical
metal lines connecting the nodes of the network too. Although
we have not found a closed form for the maximum error in a 2-D
grid, a good estimation can be made making some assumptions.
Fig. 15. A 2-D model for the voltage decay toward the center of the network.
First, suppose that the network is a square, , and that the
power supply is directly connected to every cell in the border:
(31)
The maximum drop will be observed at the centre of the array.
An equation can be written for the inner nodes [see Fig. 15],
assuming equal resistances of the horizontal and vertical lines:
(32)
These equations constitute a system of linear equations in
variables, whose matrix form can be automatically
computed as shown in (33).
Solving this system for the central term of the array, a max-
imum value for the error is found. The voltage drop from
in the middle of the network as a function of the number of cells
is plotted in Fig. 16. Black stars are the computed minima while
the solid line is the best fitting second-order curve. This approx-
imation yields the following relation:
(34)
It means that providing a second mesh of metal connections
can reduce the voltage drop by nearly one half. If the horizontal
(33)
Fig. 16. Voltage drop from V experienced by the central cell in the grid.
Fig. 17. Weights buffering alternatives and low-frequency small signal models.
and vertical resistances are the same, the error at the centre cell
is divided, approximately, by .
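The comparison between the 1-D and 2-D supply grids can be explored numerically. The following sketch is an assumption-laden illustration rather than the matrix formulation of (33): it solves the node equations of a square grid by Gauss-Seidel relaxation, with every border cell tied to the supply, equal horizontal and vertical resistances, and the same current drawn by every inner cell. Resistance and current are normalized to 1, and the function names are hypothetical.

```python
def max_drop_1d(n_cells: int, r_seg: float, i_cell: float) -> float:
    """Mid-row drop of a 1-D line whose two end cells sit on the supply."""
    inner = n_cells - 2
    return sum(r_seg * i_cell * (inner / 2 - k) for k in range((inner + 1) // 2))

def center_drop_2d(n: int, r_seg: float, i_cell: float, tol: float = 1e-6) -> float:
    """Largest drop in an n x n grid whose border cells sit on the supply."""
    # drop[i][j] is the voltage drop below the supply; it is zero on the border.
    drop = [[0.0] * n for _ in range(n)]
    while True:
        change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                # Kirchhoff's current law at an inner node connected to its
                # four neighbours through equal resistors r_seg.
                new = 0.25 * (drop[i - 1][j] + drop[i + 1][j]
                              + drop[i][j - 1] + drop[i][j + 1]
                              + r_seg * i_cell)
                change = max(change, abs(new - drop[i][j]))
                drop[i][j] = new
        if change < tol * r_seg * i_cell:
            return max(max(row) for row in drop)

n = 32
ratio = center_drop_2d(n, 1.0, 1.0) / max_drop_1d(n, 1.0, 1.0)
print(f"2-D drop / 1-D drop for n = {n}: {ratio:.2f}")
```

Under these assumptions the central drop of the 2-D grid is a little more than half of the 1-D value, consistent with the statement that the second mesh reduces the drop by nearly one half.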
B. D/A Weight Codes Conversion and Buffering
The D/A converters employed in this chip are of an inherently
monotonic type [26]. This circuit consists of a long string
of equally valued resistors running from the higher to the lower
voltage reference. This string is tapped at equally spaced
points. The access to these points is controlled by a tree of
analog switches driven by the outputs of some decoding logic.
Monotonicity is assured by construction, as every tap points to
a higher voltage than the previous one. Differential nonlinearity
in this circuit is introduced by the mismatch between the
resistors; therefore, they must be sized to avoid significant
disparities in the converter steps. Integral nonlinearity in the
weight representation is not a problem because it can be
corrected by software.
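The monotonicity argument can be illustrated with a small behavioral model of a resistor-string converter. This is a sketch only: the 8-bit resolution and 0 to 0.8 V span follow the weight resolution and signal range mentioned earlier, while the 1% resistor mismatch, the random seed and the function names are assumptions made for illustration.

```python
import random

def resistor_string_taps(n_bits, v_low, v_high, sigma=0.01, seed=0):
    """Tap voltages of a string of 2**n_bits nominally equal resistors.

    sigma is the assumed relative standard deviation of each resistor.
    """
    rng = random.Random(seed)
    n = 2 ** n_bits
    # Resistances are kept positive, as any physical resistor is.
    resistors = [max(1e-6, rng.gauss(1.0, sigma)) for _ in range(n)]
    total = sum(resistors)
    taps, acc = [], 0.0
    for r in resistors:
        # Tap k sits below resistor k: its voltage is set by the resistance
        # accumulated so far, so it can only increase with k.
        taps.append(v_low + (v_high - v_low) * acc / total)
        acc += r
    return taps

def dnl_lsb(taps, v_low, v_high):
    """Differential nonlinearity of each step, in LSB."""
    lsb = (v_high - v_low) / len(taps)
    return [(b - a) / lsb - 1.0 for a, b in zip(taps, taps[1:])]

taps = resistor_string_taps(8, 0.0, 0.8)
dnl = dnl_lsb(taps, 0.0, 0.8)
monotonic = all(b > a for a, b in zip(taps, taps[1:]))
print(f"monotonic: {monotonic}, worst |DNL| = {max(abs(d) for d in dnl):.3f} LSB")
```

Whatever the mismatch, the taps remain ordered because each step is proportional to a positive resistance; mismatch only changes the step sizes, which is what the DNL measures.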
The outputs of the D/A converters need buffering to be
transmitted to the array processor. Voltage references driving
high-impedance nodes, such as the gates of MOS transistors, do not
require extra driving other than that afforded by the voltage
followers at the converters’ output. Weights, on the contrary,
have to be transmitted to low-impedance nodes, 1024 sources
of MOS transistors in parallel. The loading impedance, ,
is too demanding for the voltage buffer, especially if we
take into account the resistance of the long tracks of metal
Fig. 18. (a) Serializing-deserializing I/O interface and (b)2-stage circuit for sample and hold.
Fig. 19. Microphotograph of the prototype chip.
conveying the signal from one end of the die to the other.
Basically, the resistance of the metal tracks adds to the buffer
output resistance, thus worsening the voltage division. This is the
situation in Fig. 17(a). The output impedance of the
buffer-plus-metal-tracks circuit adds up to
(35)
hence, despite the fact that the output resistance of the amplifier
is greatly attenuated by feedback, the resistance of the metal
tracks, represented here by , makes to be as large as the
actual . The resistance of the metal tracks can be as high as
1 while the resistance of one of the p-type MOS transistors
operating in the ohmic region employed can be about 1 .
TABLE I
PROTOTYPE CHIP DATA
1024 of them in parallel makes an of , approximately,
thus halving the dynamic range of the signals, and therefore
losing one bit of resolution in the weights. This error cannot be
afforded, so a different scheme must be employed for the buffers
of the weight signals. For instance, in the circuit in Fig. 17(b) it is
the actual output voltage that is fed back to the amplifier,
causing the output impedance seen by the load to be
(36)
which enhances the performance of the buffer. Still, the
propagation of signals through metal tracks carrying large
currents may result in appreciable disparities between the
voltages transmitted to different points along the metal tracks,
unless they are made unrealistically wide. In order to avoid this,
voltage buffers have been placed nearer to the cell array [27].
Working in parallel, they can be considered as a voltage
amplifier with gain , a high input impedance and a rather low output
Fig. 20. (a) Protocol signals handshake and (b) start of image samples acquisition.
impedance [see Fig. 17(c)]. The output impedance at low-fre-
quencies is
(37)
which certainly reduces any possible loading effect. The design
challenge is to avoid the loss of phase margin in the feedback
loop, since the dominant pole, associated with the output node
, and the second pole, which is associated with node ,
are too close to operate without compensation. Extra capacitance
must be added accordingly to node to avoid instability, at
the expense of a reduction in circuit speed.
C. I/O Interface
The last major subsystem of the prototype is the I/O interface.
This circuit is provided for the acquisition and delivery
of image samples, which must be analog voltages ranging from
0.6 V to 1.4 V, nominally. The collection of 32 × 32 image
samples that will be acquired or delivered in the same batch is
transmitted through a serial channel but passed in parallel to the
32 columns of the array. This is achieved by the circuit
structure depicted in Fig. 18(a). It consists of 32 sample-and-hold
circuits, one for each column of the array, connected
concurrently to the serial I/O channel. Each S/H circuit, represented by
the schematic in Fig. 18(b), consists of two S/H stages and several
transmission gates. When acquiring an image, .
The serial I/O channel connects the input pad to the input nodes
of the sample-and-hold stages. Then, at 10 MS/s, 32 samples
of the signal are stored in the first row of S/H circuits by
activating alternately signals and . Once the
first 32 samples are acquired, the following 32 samples of the
input signal are stored in the second row of S/H circuits, by
activating and for the 32 S/H stages alternately.
At the same time, all the signals corresponding
to the first row of S/H’s are activated together, thus uploading
the stored samples to the first row of the array. This is realized
Fig. 21. Read-out of a three stepping stripes image row-by-row.
during , which is enough time for the
driving capabilities of the S/H amplifiers to update the voltage
at the prescribed local memory in the cells of the first row of the
array. By the time the second row of S/H circuits is done with
the acquisition of the second batch of 32 samples of the input,
the first row starts acquiring the following 32 samples, while the
second S/H row passes the stored sample to the second row of
the array of cells. This process continues until the last row of
the array is updated with the information from the new image.
For the delivery of the output image, the process is inverted,
, then the S/H circuits first read in parallel what
is transmitted by the rows of the array, and then deliver the ac-
quired samples of the output one by one, through the I/O channel
that now is connected to an output buffer to send the voltages
off-chip at 10 MS/s.
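The acquisition scheme described above, in which one row of S/H stages samples the serial stream while the other uploads its contents to the array, can be sketched as a simple behavioral model. The function and variable names are hypothetical, and the model serializes in software what the circuit does concurrently.

```python
def deserialize_image(stream, n_cols=32):
    """Yield image rows from a serial sample stream using two S/H banks."""
    banks = [[None] * n_cols, [None] * n_cols]   # the two rows of S/H stages
    active = 0                                    # bank currently sampling
    for index, sample in enumerate(stream):
        col = index % n_cols
        banks[active][col] = sample
        if col == n_cols - 1:
            # The bank is full: pass its samples in parallel to the next
            # array row and let the other bank take the following samples.
            yield list(banks[active])
            active ^= 1

sample_rate = 10e6                      # samples per second
row_time = 32 / sample_rate             # time available to upload one row

image = list(deserialize_image(range(32 * 32)))
print(f"rows: {len(image)}, row upload window: {row_time * 1e6:.1f} us")
```

At 10 MS/s each bank takes 3.2 µs to fill, which in this model is the time available for the other bank to drive the local memories of one array row.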
It must be noted that most of the signals controlling the I/O
processes are generated on-chip; hence, for the user, external
control is reduced to following a simple protocol. First of all,
Fig. 22. Integral nonlinearity of the I/O map.
Fig. 23. Triggered waves across the fastest and slowest layers.
Fig. 24. Wide field erasure effect (only the fastest layer shown).
there is an internal clock, made up of a ring of inverters,
with its frequency controlled by an analog voltage, constituting
a nonharmonic VCO. This is the subpixel clock, generating
signal SubPixCLK. From the program memory, bits 54 and
55 of the logic program instruction indicate the initiation of
an I/O process (bit 54 corresponds to signal ENIO) and
specify whether a 32 × 32-pixel image is to be acquired or
delivered by the chip (which is done by signal ENRW, bit 55
of the logic program instruction). Then, when an instruction
containing is selected, the I/O interface waits for
a rising edge of an external signal named PixCLK, referred to as
the pixel clock. Once it occurs, an internal process controlled
by SubPixCLK generates the appropriate internal signals
( , , , , and ,
and the appropriate row selection) to acquire or deliver one
voltage sample. When this is achieved, the I/O interface
generates a pulse named PixReady, which must be sensed by the
user and which indicates that the system is ready to receive, or send, the
following pixel sample. There are two critical steps in the design of this
scheme.
• Overlapping in the S/H selection signals must be avoided.
Guard times must be provided that do not rely on the in-
ternal delays that are not controlled precisely.
• Control of the local memories access must be passed to
the I/O interface while acquiring or delivering an image.
Therefore, with the activation of ENIO and the arrival of
the first PixCLK rising edge, a shift register clocked by
SubPixCLK passes a pulse from the leftmost stage, say SR0, to
the rightmost, SR5, permitting the generation of synchronized
edges that will constitute the rising and falling edges of the
S/H control signals. These pulses can be combined to obtain
the desired control signals. The internal generation and
separation of the signal edges prevent uncontrolled delays from altering
the precedence between signals, almost independently of the
clock frequency at which these circuits are operated. At the end
of the count realized by the shift register, a PixReady pulse is
generated, notifying the user that the next voltage sample can
be recorded in the following capacitor of the S/H battery. The
combinational and sequential logic circuits employed to
implement the I/O interface have been designed full-custom, in order to
allow as much integration with the processing array as possible.
Using a library of cells would not have permitted the intricate
routing employed to tailor the I/O control.
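The edge-generation idea can be sketched behaviorally: a single pulse travels along a six-stage shift register, and each control signal is set by one stage and cleared by a later one, so the guard time between signals is fixed by the clock rather than by uncontrolled gate delays. The signal names `sample` and `hold` and the particular stage assignment are hypothetical; only the stages SR0 to SR5 and the final PixReady pulse come from the text.

```python
def pixel_cycle(n_stages=6):
    """Return the control-signal states for each SubPixCLK tick of one pixel."""
    trace = []
    sample = hold = False
    for tick in range(n_stages):
        # One-hot shift register: the pulse sits in stage number `tick`.
        sr = [i == tick for i in range(n_stages)]
        # Assumed decoding: each signal is set by one stage and cleared by a
        # later one, leaving an idle guard tick between the two signals.
        if sr[0]:
            sample = True
        if sr[2]:
            sample = False
        if sr[3]:
            hold = True
        if sr[5]:
            hold = False
        trace.append({"tick": tick, "sample": sample, "hold": hold,
                      "pix_ready": sr[n_stages - 1]})
    return trace

trace = pixel_cycle()
for state in trace:
    print(state)
```

In this sketch the two control signals are never active in the same tick, and PixReady is raised only when the pulse reaches the last stage.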
VI. EXPERIMENTAL RESULTS
A. Prototype Chip Data
A prototype chip has been designed and fabricated in a
standard 0.5 µm CMOS technology with single-poly and
triple-metal layers. Fig. 19 displays a microphotograph of the chip. It
Fig. 25. Wide field erasure effect, represented in 3-D.
contains a central array of 32 × 32 cells of the type described
above. Surrounding the array is a ring of boundary cells,
implementing the contour conditions for the CNN dynamics,
together with the necessary buffers to transmit digital
instructions and analog references to the array. On the lower part of the
chip, the program control and memory blocks can be found. The
last major subsystem is the I/O interface, including S/H batteries,
decoders, counters and different sequential logic. The whole
system fits in 9.27 × 8.45 mm², including the ring of bonding
pads. Without pads, the total area is 8.77 × 7.94 mm²; this
includes the CNN array and the necessary circuit overhead. The
array of CNN cells alone occupies 5.98 × 5.83 mm², which is
roughly 50% of the total area of the chip. The resulting cell
density, excluding circuits outside the array, is 29.24 cells/mm².
In order to handle these data cautiously, it is important to notice
that the area occupied by the cell array scales linearly with the
total number of cells, which is not the case for the overhead
circuitry, which tends to be a smaller fraction of the total chip size
as the number of cells rises. The power consumption of
the whole chip has been estimated at 300 mW. Data I/O rates are
nominally 10 MS/s. The time constant of the fastest layer (fixed
time constant) is designed to be under 100 ns. Table I
summarizes some characteristics and measured features of the chip.
The peak computing power of this chip is 470 GOPS. Here,
OPS means analog arithmetic operations per second. In one time
constant, 100 ns, each CNN core performs 12 multiplications
and 11 additions; thus, with two cores, each cell performs 46
operations in 100 ns. Having 1024 processing
cells, the chip can reach 470 GOPS when running the network
dynamics. If the computing power per unit area (considering
the main array alone) and per unit power are calculated, we
have 6.01 GOPS/mm² and 1.56 GOPS/mW.
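The peak figure follows directly from the operation count given above. A minimal sketch of the arithmetic, using only the numbers stated in the text (variable names are illustrative):

```python
mults_per_core = 12
adds_per_core = 11
cores_per_cell = 2
cells = 32 * 32
time_constant = 100e-9      # s
power_mw = 300.0            # estimated chip power, mW

ops_per_cell = (mults_per_core + adds_per_core) * cores_per_cell   # 46
peak_gops = ops_per_cell * cells / time_constant / 1e9
gops_per_mw = peak_gops / power_mw

print(f"peak = {peak_gops:.0f} GOPS, efficiency = {gops_per_mw:.2f} GOPS/mW")
```

This gives about 471 GOPS and about 1.57 GOPS/mW, matching the quoted 470 GOPS and 1.56 GOPS/mW up to rounding.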
Fig. 26. Spatio–temporal edge detection and deactivation (fast layer).
Fig. 27. Traveling-wave generation (fast layer only).
Fig. 28. Spiral wave (only the fastest layer shown).
B. Electrical Test Results
First of all, the I/O interface has been tested. In Fig. 20(a), it
can be seen how the chip gives back the PixReady pulse a
certain time after the signal PixCLK shows a rising edge. Fig. 20(b)
displays the start of the image acquisition process. Because of
limitations of the test setup, PixCLK has a frequency of 2 MHz
in these tests, but the chip has been designed to operate with a
10-MHz pixel clock. Fig. 21 shows an image consisting of three
vertical stripes of different values (0.6 V, 1.0 V and 1.4 V) being
delivered through the I/O channel. In order to test the
accuracy of the image acquisition and delivery processes, a ramp of
analog values ranging from the minimum to the maximum valid
inputs has been transmitted to the chip, stored in the LAMs, and
finally recovered. The deviation of the recovered samples from
their corresponding inputs, in a 256-level representation, is
shown in Fig. 22 in LSB as the INL of the input acquisition,
storage and recovery. The chip can handle analog data with an
equivalent resolution of 7.5 bits.
C. Retinal Behavior Emulation
Image processing algorithms can be programmed on this
chip by setting the corresponding switch configuration and by
tuning the appropriate interconnection weights (the
programming interface is digital while the internal coding of the weights is
analog). Propagative and wavelike phenomena, similar to those
found in the biological retina, can be observed in this chip
by just setting the proper coupling between cells in the same
or in different layers. For instance, Fig. 23 shows how some
spots in the faster layer (second layer) grow until reaching the
boundaries of the network; these same spots trigger a slower set
of waves in the first layer. These pictures have been generated
with the prototype chip by running the network dynamics,
from the same initial state, during successively longer periods
of time. This permits the reconstruction of the evolution of the
state of the cells during the CNN dynamics.
The wavefronts generated at the slower layers can be
employed to inhibit propagation in the faster layer, thus generating
a trailing edge for the waves in the fast layer. This produces
results similar to the wide field erasure effect observed in the
IPL of the retina [see Fig. 24]. Fig. 25 displays a 3-D plot of this
effect for a different input. Another interesting effect observed
in the OPL of the retina [11] is the detection of
spatio–temporal edges followed by deactivation of the patterns of activity.
This phenomenon has also been programmed in the chip [see
Fig. 26].
D. Active Waves Phenomena
By setting the appropriate interconnection weights, active
wave phenomena (the propagation of waves in an energetically
active medium) can be observed in the chip. Examples are the
triggering of a traveling wave [see Fig. 27] and the generation
of spiral waves [see Fig. 28].
VII. CONCLUSION
The proposed approach is a promising alternative to
conventional digital image processing for applications related
to early vision and low-level focal-plane image processing.
Based on a simple but precise model of the real biological
system, a feasible and efficient implementation of an artificial
vision device has been designed. The peak computing power
of this chip is 470 GXPS, which outperforms its digital counterparts
owing to the fully parallel nature of the processing, which is based
on analogy rather than on simulation. In terms of computing power per
silicon area and power consumption, this chip ranks among
the most powerful devices reported.
In addition to the performance advantages highlighted in the
previous table, the chip presented in this paper is capable of
generating complex spatio-temporal dynamic processes in a
programmable way and of storing intermediate processing results.
ACKNOWLEDGMENT
The authors deeply appreciate the many useful and fruitful
discussions with G. Liñán on chip architecture and circuit
design, with T. Serrano-Gotarredona on the implementation
of programmable current mirrors, and with D. Bálya and P.
Földesy on the experiments.
REFERENCES
[1] D. H. Hubel, Eye, Brain and Vision. New York: W. H. Freeman, 1988.
[2] F. Werblin, “Synaptic connections, receptive fields and patterns of ac-
tivity in the tiger salamander retina,” Investigative Ophthalmology and
Visual Science, vol. 32, no. 3, pp. 459–483, Mar. 1991.
[3] B. Roska and F. S. Werblin, “Vertical interactions across ten parallel,
stacked representations in the mammalian retina,” Nature, vol. 410, pp.
583–587, Mar. 2001.
[4] C. Mead, Analog VLSI and Neural Systems. Reading, MA: Addison-Wesley,
1989.
[5] C. Koch and H. Li, Eds., Vision Chips: Implementing Vision Algorithms
with Analog VLSI Circuits. Los Alamitos, CA: IEEE Computer So-
ciety Press, 1995.
[6] A. Moini, Vision Chips. Boston, MA: Kluwer Academic Publishers,
1999.
[7] D. Bálya, B. Roska, E. Nemeth, T. Roska, and F. S. Werblin, “A qualita-
tive model framework for spatio-temporal effects in vertebrate retina,”
Proc. 2000 IEEE Conf. on Cellular Neural Networks and their Applica-
tions, pp. 165–170, 2000.
[8] J. C. Gealow and C. G. Sodini, “A pixel-parallel image processor using
logic pitch-matched to dynamic memory,” IEEE J. Solid-State Circuits,
vol. 34, no. 6, pp. 831–839, June 1999.
[9] F. Werblin, T. Roska, and L. O. Chua, “The analogic cellular neural net-
work as a bionic eye,” Int. J. Circuit Theory and Applications, vol. 23,
no. 6, pp. 541–69, Nov.–Dec. 1995.
[10] A. Jacobs, T. Roska, and F. S. Werblin, “Methods for constructing phys-
iologically motivated neuromorphic models in CNN’s,” Int. J. Circuit
Theory Appl., vol. 24, no. 3, pp. 315–339, May–June 1996.
[11] C. Rekeczky, B. Roska, E. Nemeth, and F. Werblin, “Neuromorphic
CNN models for spatio-temporal effects measured in the inner and outer
retina of tiger salamander,” Proc. Sixth IEEE International Workshop on
Cellular Neural Networks and their Applications, pp. 15–20, May 2000.
[12] K. Boahen, “A retinomorphic chip with parallel pathways: Encoding
INCREASING, ON, DECREASING, and OFF visual signals,” Analog
Integr. Circuits Signal Processing, vol. 30, no. 2, pp. 121–35, Feb. 2002.
[13] C. Rekeczky, T. Serrano-Gotarredona, T. Roska, and A. Ro-
dríguez-Vázquez, “A stored program 2nd order/3-Layer complex cell
CNN-UM,” Proc. Sixth IEEE International Workshop on Cellular
Neural Networks and their Applications, pp. 219–224, May 2000.
[14] S. Espejo, R. Carmona, R. Domínguez-Castro, and A. Ro-
dríguez-Vázquez, “A VLSI oriented continuous-time CNN model,” Int.
J. Circuit Theory Appl., vol. 24, no. 3, pp. 341–356, May–June 1996.
[15] T. Roska and L. O. Chua, “The CNN universal machine: An analogic
array computer,” IEEE Trans. Circuits Syst.—II, vol. 40, no. 3, pp.
163–173, Mar. 1993.
[16] S. Soclof, Analog Integrated Circuits. Englewood Cliffs, NJ: Prentice-Hall,
1985.
[17] B. Gilbert, “A precise four-quadrant multiplier with subnanosecond re-
sponse,” IEEE J. Solid-State Circuits, vol. 3, no. 4, pp. 365–373, Dec.
1968.
[18] Y. P. Tsividis, “Integrated continuous-time filter design —An overview,”
IEEE J. Solid-State Circuits, vol. 29, no. 3, pp. 166–176, Mar. 1994.
[19] R. Domínguez-Castro, A. Rodríguez-Vázquez, S. Espejo, and R. Car-
mona, “Four-Quadrant one-transistor synapse for high density CNN im-
plementations,” Proc. Fifth IEEE International Workshop on Cellular
Neural Networks and Their Applications, pp. 243–248, Apr. 1998.
[20] A. Rodríguez-Vázquez, E. Roca, M. Delgado-Restituto, S. Espejo, and
R. Domínguez-Castro, “MOST-Based design and scaling of synaptic
interconnections in VLSI analog array processing chips,” J. VLSI Signal
Processing Systems for Signal, Image and Video Technol., vol. 23, pp.
239–266, Nov./Dec. 1999.
[21] Y. Tsividis, Operation and Modeling of the MOS Transistor. New
York: McGraw-Hill, 1987.
[22] K. C. Smith and A. S. Sedra, “The current conveyor —A new circuit
building block,” IEEE Proceedings, vol. 56, pp. 1368–1369, Aug. 1968.
[23] C. Toumazou, J. B. Hughes, and N. C. Battersby, Eds., Switched-Cur-
rents: An Analogue Technique for Digital Technology. London, U.K.:
Peter Peregrinus, 1993.
[24] T. Serrano and B. Linares-Barranco, “The active-input regulated-cas-
code current mirror,” IEEE Trans. Circuits Syst.—I, vol. 41, no. 6, pp.
464–467, June 1994.
[25] R. Carmona, “Analysis and design of CNN-based VLSI hardware for
real-time image processing,” Ph.D. dissertation, Universidad de Sevilla, 2002.
[26] R. C. Jaeger, “Tutorial: Analog data acquisition technology. Part I–dig-
ital-to-analog conversion,” IEEE Micro, vol. 24, no. 3, pp. 20–37, May
1982.
[27] G. Liñán, S. Espejo, R. Domínguez-Castro, and A. Rodríguez-Vázquez,
“ACE4k: An analog I/O 64 × 64 visual microprocessor chip with 7-bit
analog accuracy,” Int. J. Circuit Theory Appl., vol. 30, no. 2–3, pp.
89–116, June 2002.
Ricardo Carmona Galán (M’95) received the Licenciado and
Doctor (Ph.D.) degrees in physics, in the speciality of electronics, both from
the University of Seville, Spain, in 1993 and 2002, respectively.
He was a student at the National Center for Microelectronics at Seville,
funded by IBERDROLA S. A. He was a Research Assistant at the Electronics
Research Laboratory of the Department of Electrical Engineering and Com-
puter Sciences of the University of California, Berkeley, from 1996 to 1998.
He is a member of the Department of Analog Design in the Microelectronics
Institute of Sevilla (CNM-CSIC). Since October 1999, he has been an Assistant
Professor of the Department of Electronics and Electromagnetism at the School
of Engineering of the University of Seville. His main areas of interest are
linear and nonlinear analog and mixed-signal integrated circuits, in particular,
the design and VLSI implementation of cellular neural networks and analog
memory devices for real-time image processing and vision chips.
Dr. Carmona Galán was a co-recipient of the Best Paper Award of 1999 from the
International Journal of Circuit Theory and Applications, and of the 2002 Salvà i
Campillo Award, granted by the Catalonian Association of Telecommunication
Engineers.
Francisco Jiménez-Garrido received the B.S. degree in physics in 1998 and
the B.S. degree in electronic engineering in 2002 from the University of Seville,
Spain. Since 1999, he has been with the Department of Analog Circuit Design
of the Spanish Microelectronics Center (Institute of Microelectronics of Seville,
IMSE), and he is working toward the Ph.D. degree in the Department of
Electronics and Electromagnetism of the University of Seville.
He has research interests in linear and nonlinear analog and mixed-signal
integrated circuits for image processing and communication devices.
Rafael Domínguez-Castro received the five-year degree in electronic physics
in 1987, the M.S. equivalent in microelectronics in 1989, and the Doctor en
Ciencias Físicas degree in 1993, from the University of Seville, Spain.
Since 1987, he has been with the Department of Electronics and Electromag-
netism at the University of Seville, where he is currently a professor of elec-
tronics. He is also a member of the research staff at the Institute of Microelec-
tronics of Seville – Centro Nacional de Microelectrónica (IMSE-CNM-CSIC),
where he is a member of a research group on Analog and Mixed-Signal VLSI.
His research interests are in the design of embedded analog interfaces for
mixed-signal VLSI circuits, the design of CMOS imagers and CMOS focal plane array
processors, and the development of CAD tools for automating the design of analog
building blocks, especially the optimization and automatic sizing of basic
building blocks for integrated circuits.
Dr. Domínguez-Castro has co-received the 1995 Guillemin-Cauer award of
the IEEE Circuits and Systems Society, and the Best Paper Award of the 1995
European Conference on Circuit Theory and Design.
Servando Espejo (M’96) received the Licenciado en Física degree, the M.S.
degree equivalent in microelectronics, and the Doctor en Ciencias Físicas degree
from the University of Seville, Spain, in June 1987, July 1989, and March 1994,
respectively.
He is currently Profesor Titular of Electronics at the Department of Elec-
tronics and Electromagnetism of the University of Seville, and also with the
Department of Analog Circuit Design of the Spanish Microelectronics Center.
From 1989 to 1991, he was an intern at AT&T Bell Laboratories in Murray
Hill, NJ, and an employee of AT&T Microelectronics of Spain. His main areas
of interest are linear and nonlinear analog and mixed-signal integrated circuits,
including neural networks electronic realizations and theory, vision chips, mas-
sively parallel analog array processing systems, chaotic circuits, and communi-
cation devices.
Dr. Espejo has co-received the 1995 Guillemin-Cauer award of the IEEE Cir-
cuits and Systems Society, and the best paper award of the 1995 European Con-
ference on Circuit Theory and Design.
Tamás Roska (M’87–SM’90–F’93) received the Diploma in electrical engi-
neering from the Technical University of Budapest, Hungary, in 1964 and the
Ph.D. and D.Sc. degrees in Hungary in 1973 and 1982, respectively.
Since 1964, he has held various research positions. From 1964 to 1970, he
was with the Measuring Instrument Research Institute, Budapest; between 1970
and 1982, he was with the Research Institute for Telecommunication, Budapest
(serving also as the head of department for Circuits, Systems and Computers);
and since 1982, he has been with the Computer and Automation Institute of the
Hungarian Academy of Sciences, where for 15 years he has been the head of the
Analogic and Neural Computing Research Laboratory. He has taught several courses
at various universities, presently at the Technical University of Budapest, at
the University of California at Berkeley, and very recently at the Pázmány P.
Catholic University in Budapest. He is teaching courses on “Emergent
Computations” and “Cellular Neural Networks.” In 1974, and since 1989, he has been a
Visiting Scholar at the Department of Electrical Engineering and Computer Sci-
ences and the Electronics Research Laboratory, and recently a Visiting Research
Professor at the Vision Research Laboratory of the University of California at
Berkeley. He is presently a Dean of the Faculty of Information Technology at
the Pázmány P. Catholic University, Budapest. His main research areas are
cellular neural networks, nonlinear circuits and systems, neural circuits, visual
computing, and analogic spatial–temporal supercomputing. He has published more
than 200 research papers and four books (partly as a coauthor), and held several
guest seminars at various universities and research institutions in Europe, the
United States, and Japan.
Prof. Roska is a member of several Hungarian and international scientific
societies. Since 1975, he has been a member of the Technical Committee on
Nonlinear Circuits and Systems of the IEEE Circuits and Systems Society. From
1987 to 1989, he was the founding Secretary, and later he served as Chairman, of
the Hungary Section of the IEEE. Recently, he has served twice as Associate
Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, Guest Co-Ed-
itor of special issues on Cellular Neural Networks of the International Journal
of Circuit Theory and Applications (1992, 1996, 1998, and 2000), the IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS (1993 and 1999), and the Journal
of VLSI Signal Processing Systems (1999). He is a member of the Editorial
Board of the International Journal of Circuit Theory and Applications. He is
a member of the Technical Committee on Multimedia and the Technical Com-
mittee on Neural Networks of the IEEE. He received the IEEE Fellow award for
contributions to the qualitative theory of nonlinear circuits and the theory and
design of programmable cellular neural networks. In 1993, he was elected to be
a member of the Academia Europaea (European Academy of Sciences, London)
and the Hungarian Academy of Sciences. For technical innovations, he received
the D. Gabor Award; for establishing a new curriculum in information
technology and for his scientific achievement, he was awarded the A. Szentgyörgyi
Award and the Széchenyi Award, respectively. In 1994, he became an elected
active member of the Academia Scientiarum et Artium Europaea (Salzburg). In
1998, he established and became the first Chair of the Technical Committee on
Cellular Neural Networks and Array Computing of the IEEE Circuits and Sys-
tems Society. In 2000, he received the IEEE Millennium Medal and the Golden
Jubilee Award of the IEEE Circuits and Systems Society.
Csaba Rekeczky received the B.S. degree in electrical engineering and the
Ph.D. degree from the Technical University of Budapest, Hungary, in 1993 and
1999, respectively.
He is with the Computer and Automation Institute of the Hungarian Academy
of Sciences, working at the Analogic and Neural Computing Research Laboratory.
He was a Research Assistant at the Department of Electrical Engineering of
Tokushima University, Japan, from 1994 to 1995. He was a Visiting Scholar at
the Department of Electrical Engineering and Computer Sciences of the University
of California, Berkeley, from 1997 to 1998. His main areas of interest are
cellular neural and nonlinear networks, neuromorphic modeling, and image
processing with parallel nonlinear array processors.
Dr. Rekeczky received the 1995 award for outstanding Ph.D. students at the
Computer and Automation Institute of the Hungarian Academy of Sciences,
and the 1993 award of the Hungarian Scientific Society of Measurement and
Automation for his diploma thesis.
István Petrás, photograph and biography not available at the time of
publication.
Ángel Rodríguez-Vázquez (M’80–SM’95–F’96) is a Professor of Electronics
at the Department of Electronics and Electromagnetism, University of Seville.
He is also a member of the research staff of the Institute of Microelectronics
of Seville – Centro Nacional de Microelectrónica (IMSE-CNM), where he
heads a research group on Analog and Mixed-Signal Integrated Circuits.
His research interests are in the design of analog front-ends for mixed-signal
circuits and systems-on-chip, telecom circuits, CMOS imagers and vision
chips, sensory-processing-actuating systems-on-chip, and bio-inspired
integrated circuits. On these topics, he has published seven books, 36 chapters
in other books, about 100 journal papers, and about 300 conference papers. He
served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND
SYSTEMS—I from 1993 to 1995, as Guest Editor of the IEEE TCAS-I special
issue on “Low-Voltage and Low-Power Analog and Mixed-Signal Circuits
and Systems” (1995), as Guest Editor of the IEEE TCAS-II special issue on
“Advances in Nonlinear Electronic Circuits” (1999), as Guest Editor of the
IEEE TCAS-I special issue on “Bio-Inspired Processors and Cellular Neural
Networks for Vision” (1999), and as chair of the IEEE-CAS Analog Signal
Processing Committee (1996). Currently, he is an Associate Editor for IEEE
TCAS-II and Guest Editor of the IEEE TCAS-I special issue on “Advances
on Analog-to-Digital and Digital-to-Analog Converters.” He is also a member
of the editorial staff of the International Journal of Circuit Theory and
Applications (New York: Wiley) and of Analog Integrated Circuits and Signal
Processing (New York: Kluwer Academic). He was co-recipient of
the 1995 Guillemin-Cauer award of the IEEE Circuits and Systems Society,
the Best Paper Award of the 1995 European Conference on Circuit Theory and
Design, and the 1999 Best Paper Award of the International Journal of Circuit
Theory and Applications. In 1992, he also received the Young Scientist Award
of the Seville Academy of Science. In 2002, he received the VII Premi Salvà
i Campillo al Projecte Més Original. In 1996, he was elected
Fellow of the IEEE for “contributions to the design and applications of
analog/digital nonlinear ICs.”
