Second-order neural core for bioinspired focal-plane dynamic image processing in CMOS by Carmona Galán, Ricardo et al.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 51, NO. 5, MAY 2004 913
Second-Order Neural Core for Bioinspired
Focal-Plane Dynamic Image Processing in CMOS
Ricardo Carmona-Galán, Member, IEEE, Francisco Jiménez-Garrido, Carlos Manuel Domínguez-Mata,
Rafael Domínguez-Castro, Servando Espejo Meana, Istvan Petras, and Ángel Rodríguez-Vázquez, Fellow, IEEE
Abstract—Based on studies of the mammalian retina, a
bioinspired model for mixed-signal array processing has been
implemented on silicon. This model mimics the way in which
images are processed at the front-end of natural visual pathways,
by means of programmable complex spatio-temporal dynamics.
When embedded into a focal-plane processing chip, such a
model allows for online parallel filtering of the captured image;
the outcome of such processing can be used to develop control
feedback actions to adapt the response of photoreceptors to
local image features. Beyond simple resistive grid filtering, it is
possible to program other spatio-temporal processing operators
into the model core, such as nonlinear and anisotropic diffusion,
among others. This paper presents analog and mixed-signal very
large-scale integration building blocks to implement this model,
and illustrates their operation through experimental results taken
from a prototype chip fabricated in a 0.5-μm CMOS technology.
Index Terms—Neural network hardware, nonlinear systems,
parallel processing.
I. INTRODUCTION
PHYSIOLOGICAL and pharmacological studies of the mammalian retina show that this amazing piece of wetware is not a simple phototransducer, but is responsible for very
complex signal processing. The retina operates on the captured
visual stimuli at early stages in the process of vision. Complex
spatio-temporal processing encodes visual information into
a reduced set of channels [1]. The visual information flow
is compressed into a data set of a manageable size, to be
delivered to the brain by the optic nerve. Although the mapping
is retinotopic, it is not the raw image brightness that is sent to
the visual cortex, but a specific set of image features (closely
related with the spatial and temporal characteristics of the
visual stimulus) which are obtained and codified in the retina.
The purpose of this early vision processing is to alleviate the work of the central nervous system. The application of a highly regular computational task onto a large set of simple data (e.g., picture brightness samples) is delegated to the retina, while cortical activity is dedicated to higher level operations on more complex data structures. The massive parallelism of this model inspires a feasible alternative to conventional digital image processing. The limited bandwidth available for transferring signals between the camera array and the processor, and the limited computing speed achievable in a serial, or only modestly parallel, processing architecture, prevent these systems from matching the tight requirements found in real-time image processing.
Manuscript received July 31, 2003; revised January 8, 2004. This work was supported in part by EU Framework Program VI under Grant IST 2001 38097 (LOCUST), in part by the Spanish Ministry of Science and Technology under Grant TIC 2003 09817 C02 01 (VISTA), and in part by the Office of Naval Research under Grant NICOP N000140210884. This paper was recommended by Guest Editor A. Zarandy.
R. Carmona-Galán, F. Jiménez-Garrido, C. M. Domínguez-Mata, R. Domínguez-Castro, S. Espejo Meana, and A. Rodríguez-Vázquez are with the Institute of Microelectronics of Seville, Centro Nacional de Microelectrónica (IMSE-CNM), Universidad de Sevilla, 41012 Seville, Spain (e-mail: rcarmona@imse.cnm.es).
I. Petras is with the Analogic and Neural Computing Laboratory, Hungarian Academy of Sciences, Budapest H-1518, Hungary.
Digital Object Identifier 10.1109/TCSI.2004.827641
We are interested in local monitoring and control of the photosensing devices for contrast enhancement. This capability improves the perceived image by extracting the reflectance information from the acquired luminance matrix [2]. Data bottlenecks, arising mostly in transferring image samples from the camera to the processor and in delivering the appropriate control signals to each photosensor, together with the enormous amount of data to be processed, make this task hardly realizable at a practical frame rate by a conventional digital processing system. Yet, this task is gracefully implemented in the biological retina.
Concurrent processing and sensing eliminate data bottlenecks
in the forward and feedback paths, and massively parallel
processing provides enough computing power. Mixed-signal
very large-scale integration (VLSI) permits the implementation
of massively parallel multidimensional signal processing
without serious area and power penalties. These chips are
called neuromorphic [3] as they mimic the way in which the
layers of neurons in the biological retina realize early vision.
An image acquisition and focal-plane processor chip must
have, at every pixel, a reliable, locally adaptive photosensing
device (the opto-electronic interface) plus the analog and/or
mixed-signal core which realizes signal processing at the
pixel-level. Concerning the distributed processing facilities, the
cellular neural network (CNN) universal machine architecture
[4] has several advantages. It has an analog front-end, which is
compatible with the nature of the signals coming from the pho-
tosensors, it is general-purpose and fully programmable, it has
a distributed memory to store intermediate results, and it has
been proven to realize the type of processing required for sensor
control [5]. In addition, retinal features have been successfully
modeled and simulated within the CNN framework [6].
This paper presents, in the first place, a network model inspired by the layered structure of the mammalian retina. Then, the implementation of a fully programmable second-order neural core to provide active wave computing at the focal plane is shown. By setting the appropriate parameters, such as interaction strengths, time constants, and bias terms, an array of such processing elements can emulate some phenomena observed in the mammalian retina. At the end of the paper, experiments in a 0.5-μm CMOS prototype of 32 × 32 cells, each one containing a second-order neural core, are presented.
1057-7122/04$20.00 © 2004 IEEE
Authorized licensed use limited to: Universidad de Sevilla. Downloaded on April 07,2020 at 14:01:17 UTC from IEEE Xplore.  Restrictions apply. 
Fig. 1. Schematic diagram of the functional architecture of the mammalian retina [8].
II. BIOINSPIRED NETWORK MODEL
A. Sketch of the Mammalian Retina
The retina is a peripheral component of the central nervous
system responsible for acquiring and coding the information
contained in the visual stimuli. Specialized neurons develop a
particular kind of massively parallel processing of raw sensory
information. Visual stimuli trigger patterns of activation in the
layered structure of the retina, which are processed as they advance toward the optic nerve. These patterns of activation are analog waves supported by continuous-time signals, in contrast to the spike-based coding of neural information found elsewhere in the nervous system [7]. The biological motivation for this
peculiarity can be found in the lack of bandwidth offered by
the spike-like neural impulses to handle the vast amount of data
contained in the visual stimuli. Fig. 1 displays a conceptual di-
agram of the functional architecture of the mammalian retina
[8]. In this scheme, light enters through the inner retina after traversing the eye, crosses the transparent layers of cells, and is captured by the photosensors in the outer retina. At the outermost end of the layered structure, the retinal pigment epithelium (RPE) is found. This is a nonneuronal layer of cells that surrounds the outer segments (OS) of the photoreceptors. It is the source for the regeneration of the pigment chromophore after its isomerization by light. The following layer is composed of specialized photoreceptive cells of two types: rods and cones. Rods are more light sensitive and responsible for scotopic vision. Cones are less sensitive, less numerous than rods, and responsible for color vision. Their OS contain stacks of discs with rhodopsin,
the visual pigment. Rods and cones capture light and convert it
into activation signals. Their inner segments (IS) contain the rest
of the cellular organelles. The next visible layer is the outer nu-
clear layer (ONL), which contains the cell bodies of the rods and
cones. The outer plexiform layer (OPL) contains the axons from
the horizontal cells and the dendritic trees of bipolar cells. They
receive synaptic inputs from the rods and cones. Bipolar cells
carry the activation signals across the retinal layers to the ganglion cells that interface the retina with the optic nerve, in a trip of several micrometers [1]. The inner nuclear layer (INL) contains the cell bodies of bipolar, horizontal, and amacrine cells.
The inner plexiform layer (IPL) contains the axons of the bipolar
and amacrine cells, and the dendritic trees of the retinal gan-
glion cells. The ganglion cell layer (GCL) contains the bodies
of the ganglion and displaced amacrine cells. The optic nerve
fiber (ONF) is built from the axons of the retinal ganglion cells.
The ganglion cells convert the continuous activation signals,
proper of the retina, into spike-coded signals which can be trans-
mitted over longer distances by the nervous system. On its way
to the ganglion cells, the information carried by bipolar cells is
affected by the operation of the horizontal and amacrine cells.
They form layers in which activation signals are weighted and averaged in order, first, to bias the photodetectors and, second, to account for inhibition on the vertical pathway. The four main
transformations that take place in this structure are: the pho-
toreceptor gain control, the gain control of the bipolar cells,
the generation of transient activity, and the transmission of transient inhibition [1]. Briefly, captured stimuli are averaged, and the high-gain characteristics of the cones and the bipolar
cells are shifted to adapt to the particular light conditions. These
operations have a local scope and depend on the recent history
of the cells. Once adaptation is achieved, patterns of activity
are formed dynamically by the presence or absence of visual
stimuli. Also, inhibition is generated and transmitted laterally
through the layers of horizontal and amacrine cells. As a result
of these transformations, the patterns of activity reach the layer
of ganglion cells. At this point, the patterns are converted into
pulse-coded signals that are sent to the brain to be interpreted. In
a sense, the layered structure of the retina translates the visual
stimuli into a compressed language which can be understood by
the brain in recreating vision.
B. CNN Analogy of IPLs and OPLs
In the above description, there are some aspects of the retinal
layers that markedly resemble the features of a CNN [9]: the
two-dimensional (2-D) aggregation of continuous signals, the
local connectivity between elementary nonlinear processors,
and the analog weighted interactions between them. Also, the
complete signal pathway in the retina has the topology of a three-dimensional (3-D) network or, more properly, a 2½-D network: a pile of 2-D layers connected vertically. Motivated by these coincidences, a CNN model has been developed which approximates the observed behavior of different parts of the mammalian retina, for instance, the OPL. The OPL is responsible for the generation of the first activation patterns immediately after image capture. It has been characterized by experimental measurements, leading to a model with three different layers [10]. These layers stand for the contribution of photoreceptors, horizontal cells, and bipolar cells. Each of them has the structure of a 2-D CNN, with its own interaction patterns (CNN templates) and its own time constant. Cell dynamics at each layer are supported by a first- or a second-order continuous-time core.
CARMONA-GALÁN et al.: SECOND-ORDER NEURAL CORE 915
Fig. 2. Block diagram of the two coupled CNN layer nodes.
The IPL has also been modeled within the CNN framework. The IPL is responsible for the generation of the retinal output. A simplified model of the IPL has three layers. Two of them represent the influence of the wide-field amacrine cells excited by the input signal, which in this case is the output of the bipolar cells, and a third layer controls the dynamics of the previous layers by means of feedback. As before, the three layers can be seen as 2-D CNNs with their own internal coupling and their own time constant [10].
Because of the relative simplicity of these models, a
programmable CNN chip has been proposed [11]. The pro-
grammable array processor consists of two coupled CNN
layers. Each elementary processor contains the nodes for both
CNN layers. The third layer, supporting analog arithmetic, is
implemented offline by these analog cores, with the help of the
local facilities for analog signal storage. The evolution of the
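coupled CNN nodes of a specific cell (i, j) is described by these coupled differential equations
$$
\begin{aligned}
\tau_1 \frac{dx_{1,ij}}{dt} &= -g(x_{1,ij}) + \sum_{k=-1}^{1}\sum_{l=-1}^{1} a_{11,kl}\, y_{1,i+k,j+l} + a_{12}\, y_{2,ij} + b_{1}\, u_{1,ij} + z_{1,ij}\\
\tau_2 \frac{dx_{2,ij}}{dt} &= -g(x_{2,ij}) + \sum_{k=-1}^{1}\sum_{l=-1}^{1} a_{22,kl}\, y_{2,i+k,j+l} + a_{21}\, y_{1,ij} + b_{2}\, u_{2,ij} + z_{2,ij}
\end{aligned}
\qquad (1)
$$
where the loss term and the activation function are those of the full-signal-range (FSR) CNN model
$$
g(x) = \lim_{m\to\infty}
\begin{cases}
m(x-1)+1, & x > 1\\
x, & |x| \le 1\\
m(x+1)-1, & x < -1
\end{cases}
\qquad (2)
$$
$$
y = f(x) = x \qquad (3)
$$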
Fig. 2 depicts the block diagram of the vertically coupled CNN nodes. Synaptic connections between cells are linear. Each CNN layer incorporates feedback connections, by means of which the output of each cell contributes to the state of its neighbors, weighted by the elements $a_{11,kl}$ or $a_{22,kl}$; a feedforward connection, weighted by $b_1$ or $b_2$, which regulates the contribution of the cell's input; a bias term $z_{1,ij}$ or $z_{2,ij}$, which can be different for each cell; and, finally, coupling connections between both layers, weighted by $a_{12}$ and $a_{21}$. Each layer has its own time constant $\tau_1$ or $\tau_2$. Programming different dynamics in this CNN model is possible by adjusting the template elements and the time constants of the layers. The total number of synapses to be implemented on each cell is 22 (two 3 × 3 feedback templates, two feedforward weights, and two coupling weights), plus the two bias map multipliers, which will be treated as a second input image for each layer.
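A minimal behavioral sketch of the two coupled layers may help make the model concrete. It assumes a 3 × 3 neighborhood, zero-valued cells beyond the array boundary, forward-Euler integration, and the FSR behavior in which the output equals the state and the state is confined to [−1, 1]; the infinite-slope loss term is emulated by clamping. The function names, and the template values used to exercise the code, are hypothetical and are not taken from the chip.

```python
import numpy as np

# Two 3x3 feedback templates (9 + 9), one feedforward weight per layer (2),
# and two interlayer couplings (2): the 22 synapses per cell counted in the text.
SYNAPSES_PER_CELL = 9 + 9 + 2 + 2


def neighborhood_sum(y, a):
    """Sum of a[k+1][l+1] * y[i+k, j+l] over a 3x3 neighborhood.

    Cells outside the array are treated as zero (an assumed boundary condition).
    """
    y = np.asarray(y, dtype=float)
    p = np.pad(y, 1)  # constant (zero) padding around the array
    rows, cols = y.shape
    out = np.zeros((rows, cols))
    for k in range(3):
        for l in range(3):
            out += a[k][l] * p[k:k + rows, l:l + cols]
    return out


def simulate_two_layer_fsr(u1, u2, a11, a22, a12, a21, b1, b2, z1, z2,
                           tau1, tau2, dt, steps):
    """Forward-Euler integration of two coupled FSR CNN layers from zero state.

    Inside the FSR range the loss term is g(x) = x and the output is y = x; the
    infinite-slope walls of g(x) outside [-1, 1] are emulated by clamping.
    """
    u1 = np.asarray(u1, dtype=float)
    u2 = np.asarray(u2, dtype=float)
    x1 = np.zeros(u1.shape)
    x2 = np.zeros(u2.shape)
    for _ in range(steps):
        # Both derivatives are evaluated with the states from the previous step
        dx1 = (-x1 + neighborhood_sum(x1, a11) + a12 * x2 + b1 * u1 + z1) / tau1
        dx2 = (-x2 + neighborhood_sum(x2, a22) + a21 * x1 + b2 * u2 + z2) / tau2
        x1 = np.clip(x1 + dt * dx1, -1.0, 1.0)
        x2 = np.clip(x2 + dt * dx2, -1.0, 1.0)
    return x1, x2
```

Clamping after each Euler step is a common numerical stand-in for the m → ∞ limit of the FSR loss term: it keeps the state bounded without introducing a stiff term into the integration.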
III. SECOND-ORDER CORE IMPLEMENTATION
A. Second-Order Cell Structure
The internal architecture of the basic processing cell pre-
sented here is similar to the structure of the cells in the CNN uni-
versal machine [4]. However, in this case, the prototype cell in-
cludes two different continuous-time CNN layers, as described
in the conceptual diagram of Fig. 2. Together with the two dif-
ferent analog CNN core blocks [Fig. 3(a)], four local analog
memories (LAMs) and four local logic memories (LLMs) are
provided at the pixel-level for the storage of intermediate re-
sults, and a local logic unit (LLU) is built as well for pixel-level
logic operations. The synaptic connections between the analog
processing nodes of the same layer are built around the cell core,
as shown, while interlayer coupling, kept within the pixel scope
in this model, is placed inside the cell (represented by arrows
between the processing layers in the diagram). All the blocks in
the cell communicate via an intracell data bus, which is multiplexed to the array input/output (I/O) interface. Control and cell configuration bits are passed directly from the control unit, located outside the array processor.
Fig. 3. (a) Conceptual diagram of the basic cell. (b) Internal structure of each CNN layer node.
The internal structure of each CNN core is depicted in
Fig. 3(b). Each one receives contributions from the rest of the
processing nodes in the neighborhood which are summed and
integrated in the state capacitor. The two layers differ in that
the first layer has a scalable time constant, controlled by the
appropriate binary code, while the second layer has a fixed time
constant. The evolution of the state variable is also driven by
self-feedback and by the feedforward action of the stored input
and bias patterns. There is a voltage limiter which helps to
implement the limitation on the state variable of the FSR CNN
model. This state variable is transmitted in voltage form to the
synaptic blocks, in the periphery of the cell, where weighted
contributions to the neighbors are generated. There is also a
current memory that will be employed for cancellation of the
offset of the synaptic blocks. Initialization of the state, input
and/or bias voltages is done through a mesh of multiplexing
analog switches which connect to the cell’s internal data bus.
Running complex spatio-temporal dynamics in this network
requires following several initialization and calibration steps.
First of all, the input image and the auxiliary masks and/or patterns are acquired. To this end, the array I/O interface is directed to specific LAM locations on a row-by-row basis. After
that, the analog instruction, i.e., the set of synaptic weights re-
quired for a specific operation, is selected and transmitted to all
the cells in the array. Then, the offset of the critical OPAMPs
is extracted in a calibration step. After that, the time-invariant
offsets of the synaptic blocks are computed and stored in the
current memories. Now the network is almost ready to operate.
The state capacitors and the feedforward synapses are then ini-
tialized by means of the appropriate switch configuration, and
the network evolution is run by closing the feedback loop in each
processing element. Before stopping the network evolution, the
final state is stored in a LAM register for further operation.
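The order of these steps is not arbitrary: the synapse offsets depend on the weight signals, so they can only be memorized after the analog instruction has been broadcast, and the state is initialized only after calibration. The sketch below encodes the sequence as an ordered checklist; the `Step` names and the `CellArrayController` class are hypothetical, introduced only for illustration, and do not describe the actual control unit.

```python
from enum import IntEnum


class Step(IntEnum):
    """Hypothetical labels for the operating sequence described in the text, in order."""
    LOAD_IMAGES = 1       # input image and auxiliary masks/patterns into LAMs, row by row
    LOAD_WEIGHTS = 2      # broadcast the analog instruction (set of synaptic weights)
    CALIBRATE_OTAS = 3    # extract the offsets of the critical amplifiers
    MEMORIZE_OFFSETS = 4  # store the weight-dependent synapse offsets in current memories
    INIT_STATE = 5        # initialize state capacitors and feedforward synapses
    RUN = 6               # close the feedback loops and let the network evolve
    STORE_RESULT = 7      # copy the final state into a LAM register


class CellArrayController:
    """Hypothetical sequencer that rejects steps issued out of order."""

    def __init__(self):
        self.done = []

    def execute(self, step):
        if len(self.done) == len(Step):
            raise RuntimeError("sequence already complete")
        expected = Step(len(self.done) + 1)
        if step != expected:
            raise RuntimeError(f"{step.name} issued before {expected.name}")
        self.done.append(step)

    def reload_weights(self):
        # New weights change the synapse offsets; this sketch simply restarts
        # the sequence from LOAD_WEIGHTS, keeping the loaded images.
        self.done = self.done[:Step.LOAD_WEIGHTS - 1]
```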
B. Single-Transistor Synapse
One of the most important blocks in the cell is the synaptic
block. The synapse is simply a four-quadrant analog multiplier.
Its inputs are the cell state , or input variables, and the
corresponding weight signal , while the output is the cell’s
contribution to a specific neighboring cell. The multiplier is re-
quired to have voltage inputs, which can be easily conveyed to
any high-impedance node by a simple wire, and current output,
which may be easily summed by wiring all current contributions
concurrently to a low-impedance node. Two important facts for
the implementation of the synaptic blocks are, first, that there
is no need to have a strictly linear relation between the weight
signal , and the output current , and second, that the weight
signal does not change during the evolution of the network.
Thus, any deviation depending on is not a gain error, but
an offset error, i.e., an error which can be cancelled by autoze-
roing in a preprocessing calibration step.
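Direct multiplication can be achieved by a MOS transistor operating in the ohmic region. To a first-order approximation, its low-frequency large-signal characteristic is given by (for an n-type device)
$$
I_{DS} = \beta_n\left[(V_{GS} - V_{Tn})\,V_{DS} - \frac{V_{DS}^{2}}{2}\right] \qquad (4)
$$
where $\beta_n = \mu_n C'_{ox} W/L$. A multiplication can be realized with this device as long as it remains in the ohmic region, $V_{DS} < V_{GS} - V_{Tn}$ [13].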
This alternative has several advantages compared with multipliers built with MOS transistors in weak inversion or in strong-inversion saturation [14]. First, it requires a reduced amount of area, because four-quadrant behavior is achieved with one single transistor. Second, it has a better ratio between bias power and signal power, thus leading to higher accuracy at lower power consumption, whereas in the saturation region the information is carried by a small fraction of the actual currents flowing through the devices. Third, the ohmic region shows better mismatch figures than any other region [15].
The one-transistor synapse works as follows. Consider a p-type MOS transistor operating in the ohmic region (Fig. 4). A p-type transistor is selected because the more resistive p-type channel requires smaller currents (hence, lower power consumption) for the same transistor length. Alternatively, for the same current levels, the required p-channel MOS is shorter
than its n-type counterpart. The source-to-drain current of a
PMOS transistor in the ohmic region is given by
(5)




where must be kept fixed in order to use and as
single-ended input voltages, and to sense as the output of the
synapse. For this purpose, we can employ a current conveyor
[16] at the current input node of each cell. The current conveyor
permits current sensing while maintaining a virtual reference at
node . All the synapses contributing to the same cell can be
connected to the same virtual reference. The only requirement is that the impedance at this node must be well below the parallel combination of the output impedances of all the synaptic blocks.
Returning to (5), note that the second term on the right-hand side
of the equation does not depend on ; therefore, node is a
strong candidate to hold the cell state variable voltage. But
must always be positive for the MOS transistor to operate above
the threshold; thus, let be composed of a reference voltage
sufficiently high, and a superposed cell state signal
(7)
In order to achieve four-quadrant multiplication, must be
permitted to go above and below . Let us select as the
reference for the weight signal being
(8)
Equation (5) can then be rewritten as
(9)
which is a four-quadrant multiplier with an offset term which is
time invariant (at least during the transient evolution of the net-
work) and does not depend on the cell state. Therefore, we have
arrived at a four-quadrant multiplier with single-ended voltage
inputs and a current output, with an offset that can be eliminated by a calibration step, with the help of a current memory
(10)
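The mechanism behind (5) to (10) can be checked numerically under explicit assumptions, since the symbols of those equations are not legible in this copy. The sketch assumes that the cell state drives the PMOS gate (consistent with the state capacitor being formed by the synapse gate capacitances, Section III-E), that one channel terminal sits at the virtual reference V_C held by the current conveyor, and that the weight is applied to the other terminal as V_C + v_w. It uses the standard first-order ohmic-region model with β = 3.13 μA/V², V_C = 2.55 V, and a 0.95-V state reference, which is our reading of the voltage budget given later; the 0.75-V threshold magnitude is an assumption. Under these assumptions the current splits into a product term β·x·v_w and a term that depends only on v_w, which a current memory can cancel.

```python
BETA = 3.13e-6  # A/V^2, synapse current factor quoted in Section III-E
V_C = 2.55      # V, virtual reference held by the current conveyor
V_XQ = 0.95     # V, state (gate) reference; our reading of the voltage budget
V_TP = 0.75     # V, PMOS threshold magnitude; assumed ("one threshold, approximately")


def synapse_current(x, v_w):
    """First-order ohmic-region current of the one-transistor synapse.

    Assumed topology: gate at V_XQ + x (cell state), one channel terminal at the
    virtual reference V_C, the other at V_C + v_w (weight). The symmetric form
    beta * (V_a - V_b) * ((V_a + V_b)/2 - V_G - |V_TP|) is valid for either sign
    of the channel voltage while both channel ends stay in strong inversion.
    Returns the current leaving the reference node (A).
    """
    v_g = V_XQ + x
    v_b = V_C + v_w
    return BETA * (V_C - v_b) * ((V_C + v_b) / 2 - v_g - V_TP)


def calibrated_product(x, v_w):
    """Remove the state-independent term by subtracting the current at x = 0."""
    offset = synapse_current(0.0, v_w)  # what the current memory would hold
    return synapse_current(x, v_w) - offset
```

In this idealized model the calibrated output is exactly β·x·v_w, about 0.5 μA at x = v_w = 0.4 V. With the assumed values, the smallest gate-to-channel voltage over the ±400-mV ranges is 2.15 V − 1.35 V = 0.8 V, above the assumed threshold, so both channel ends stay inverted; mobility degradation and the ohmic-region boundaries discussed next are what limit linearity in silicon.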
The limits to this behavior are the upper and lower boundaries of the ohmic region in strong inversion and the degradation of the mobility. The transversal electric field pushes the carriers toward the semiconductor surface, where they suffer scattering that reduces their speed, thus degrading the mobility. This transversal electric field depends on the gate voltage; thus, the first summand
in (10) is no longer linear with . Combining the two limiting
factors
(11)
where is a maximum ef-
fective gate voltage, beyond which the distortion introduced by
mobility degradation exceeds the linearity requirements.
Fig. 4. Multiplier using one single MOS transistor in the ohmic region.
Fig. 5. Current conveyor realization and small-signal equivalent.
For moderate linearity requirements, in a typical CMOS
technology, the right-hand side of (11) becomes approxi-
mately equal to 1 V. If and are assigned the same
voltage ranges, 400 mV around their reference values, then,
400 mV. With these values of ,
and 0.8 V, must be kept 1.6 V below . Thus,
must be high enough to leave room for , but not too
large because the weight signal will progress up to above
. In addition, we have to provide a range for the current
conveyor circuitry to maintain a virtual reference precisely at
, and for the circuits generating the weight voltages, which
will have a limited output swing. If we select 2.55 V,
then, they are 0.75 V above before hitting the power rail at
3.3 V, which means one , approximately. With this value,
results in 0.95 V. Finally, once the voltage ranges are fixed,
a maximum current per synapse is selected for meeting power requirements; in this case, it will be 1.4 μA. With these values, the synapse is dimensioned. In this chip, it will be 2-μm wide and 25.9-μm long.
C. Current Conveyor
The current conveyor, required for creating a virtual reference node at which the synapse outputs can be sensed, is implemented in the circuit of Fig. 5. Any difference between the
voltage at node and the reference is amplified and the
negative feedback corrects the deviation. The input impedance
of this block is very low, which means that changes in the small-
signal input current do not disturb the virtual reference at
node appreciably; thus, . The bias current is re-
quired to ensure that node is always the source of transistor
. At the same time, this circuit permits the injection of a
nearly exact copy of the input current at the state node, whose
voltage range differs from that of the weight signals. The only
drawback of using this circuit is that a voltage offset, , at the input of the differential amplifier (which can be implemented with a simple operational transconductance amplifier (OTA), as it drives a very high impedance node, the gate of ) results in an error of the same amount in the reference voltage implemented at node . Since the main contribution to the offset is random, this error varies from cell to cell across the array, resulting in mismatched synaptic blocks that can degrade performance, e.g., an anisotropic evolution of the network under a symmetrical propagation template.
Fig. 6. Offset calibration mechanism for the critical OTAs.
As we are forced to use small-size
devices, in order to achieve the highest cell-packing density pos-
sible, the random offset can be quite large. In order to avoid
this, an offset calibration mechanism has been implemented at
the critical OTAs (Fig. 6). The input-referred offset voltage
has been taken out of the OTA block symbol. Without the offset
cancellation circuit (the shaded area), at low frequencies, and
considering a negligible output conductance, the output of the
OTA is
(12)
Considering the error cancellation mechanism, when is ON, the inputs are short-circuited , and is connected as a diode, with its source-to-drain current in steady state given by
(13)
After some time, is turned off and, except for a remnant
switching error, the current is memorized by means of the
voltage stored in . Thus, the total current injected into the
load is free of any offset
(14)
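The auto-zeroing in (12) to (14) can be summarized with a behavioral model. It assumes a linear OTA with transconductance g_m and input-referred offset V_os, an ideal memory transistor that holds exactly the current it sank during calibration, and a residual switching error modeled as a fixed current; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AutoZeroedOTA:
    """Behavioral model of an OTA whose offset current is memorized and subtracted."""

    gm: float                   # transconductance, A/V
    v_os: float                 # input-referred offset voltage, V
    switch_error: float = 0.0   # residual current error left when the switch opens, A
    i_mem: float = 0.0          # current held by the memory transistor, A

    def raw_output(self, v_diff: float) -> float:
        """Output current without cancellation: the offset adds to the input."""
        return self.gm * (v_diff + self.v_os)

    def calibrate(self) -> None:
        """Inputs shorted: the diode-connected transistor sinks g_m * V_os, then holds it."""
        self.i_mem = self.raw_output(0.0) + self.switch_error

    def output(self, v_diff: float) -> float:
        """Current delivered to the load once the memorized current is subtracted."""
        return self.raw_output(v_diff) - self.i_mem
```

After calibration the output is g_m·v_diff minus the residual switching error; the offset term itself no longer appears.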
D. Current Memory
The offset term of the synapse current must be removed for
the output current to accurately represent the result of a four-quadrant multiplication. To this end, before the CNN operation, but right after the new weights have been uploaded, all the
synapses are reset to . The resulting current, which is
the sum of the offset currents of all the synapses concurrently
connected to the same node, is memorized. This value will be
subtracted online from the input current during the network evo-
lution, resulting in a one-step cancellation of the errors of all the
synapses. The validity of this method relies on the accuracy of the current memory. For instance, in this chip, the sum of all the contributions will range from 18 to 46 μA. On the other hand, the maximum current signal of the synapse is
μA (15)
which means a total current range of 1 μA. If an equivalent resolution of 8 bits is intended, then, nA. In these conditions, our current memory must be able to distinguish 2 nA from the 46 μA. This represents an equivalent resolution of 14.5 bits. In order to achieve such accuracy levels, a so-called S³I current memory will be employed [17].
Fig. 7. S³I current memory schematics and timing.
As depicted in Fig. 7,
it is composed of three stages, each one containing a switch, a
capacitor and a transistor. At the beginning, while , , and
are ON, the current is divided into , , and , and
(16)
Switches controlled by , , and are successively turned
off. Each time one of these switches turns off, the voltage stored
in its associated capacitor changes, e.g., changes from
to because of charge injection. The other transistors
have to adjust to absorb the error, as the sum of currents
is still forced to be , and thus, and change to
(17)
when turns off. Correspondingly, changes to
(18)
when falls. Finally, is turned off, and ends in
. The final current is
(19)
and substituting here the values of , , and , we find that
(20)
the only error left is that corresponding to the last stage. The
former stages do not contribute to the error in the memorized
current. If the block is designed to store the most significant
bits in the first capacitor, and the less significant bits in the last
one, then, the error in the memorized current can be made quite
small. Consider that the total resolution of the current memory
is . Let us assume that is conducting the most sig-
nificant bits of the current , then, conducts the next ,
and conducts the rest. Thus, for the last stage, an effective
resolution can be defined as
(21)
If the error in the memorized current has to be kept below
0.5 LSB, and , then
(22)
This is the design equation that relates the geometry of transistor , through , to the magnitude of the
storage capacitor, via . Once we have , and it may
be easily derived that
(23)
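The cancellation argument in (16) to (20) can be reproduced with a linearized model of the three-stage memory. The sketch assumes that each stage acts as a transconductance g_k around its operating point, that the input current initially splits in proportion to g_k, and that opening switch k shifts the stored gate voltage by a charge-injection step δ_k; the numbers in the check are hypothetical. Whatever δ₁ and δ₂ are, the memorized current ends up in error only by g₃·δ₃.

```python
def s3i_memorize(i_in, g, delta):
    """Linearized S3I current memory: returns the memorized current (A).

    g[k]     -- small-signal transconductance of stage k (A/V), first stage first
    delta[k] -- gate-voltage step caused by charge injection when switch k opens (V)
    """
    total_g = sum(g)
    # All switches ON: the input current splits among the diode-connected stages
    currents = [i_in * gk / total_g for gk in g]
    for k in range(len(g)):
        # Switch k opens: its held current is disturbed by g_k * delta_k
        currents[k] += g[k] * delta[k]
        # Stages still ON readjust so that the total is again forced to i_in
        on = range(k + 1, len(g))
        g_on = sum(g[j] for j in on)
        if g_on > 0:
            excess = sum(currents) - i_in
            for j in on:
                currents[j] -= excess * g[j] / g_on
    return sum(currents)
```

With g = 100, 10, and 1 μA/V and δ = 5, −4, and 3 mV, a single 100-μA/V stage would be off by 0.5 μA, whereas the three-stage version is off by 3 nA, set by the small last stage alone. This is why the least significant bits are assigned to the last stage.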
One might think that adding more stages to the current
memory will endlessly increase accuracy. However, there is
one factor that has not been addressed yet. As the order of the
memory increases, the currents, which have to be sensed by
the last stages, become smaller. There comes a point when the
leakages from the capacitors of the first stages are of the size
of the current to be memorized by the last stages, thus making
it impossible to reach a steady-state current which corrects the
previous errors. This problem worsens as temperature rises.
For instance, at 70 °C, leakages can introduce changes in the memorized current on the order of 0.2 nA/μs. If the dynamics of the current memory require several microseconds to settle (because of the use of large capacitors and the tiny currents involved), the memorized current will display an error well above the initial estimate.
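The resolution figures of this subsection can be cross-checked with a few lines of arithmetic. The maximum synapse signal is taken as β·(0.4 V)² ≈ 0.5 μA, an assumed form of (15) that reproduces the 1-μA total range stated in the text when both polarities are counted; the leakage rate is read here as 0.2 nA/μs, and the 5-μs settling time is hypothetical.

```python
import math

BETA = 3.13e-6        # A/V^2, synapse current factor
V_RANGE = 0.4         # V, state and weight excursion around their references
I_SUM_MAX = 46e-6     # A, largest sum of synapse offset currents at one cell
I_ERR_MAX = 2e-9      # A, error the memory must resolve (0.5 LSB, from the text)

# Assumed form of (15): largest product term of one synapse, about 0.5 uA
i_sig_max = BETA * V_RANGE * V_RANGE
# Both polarities of a four-quadrant signal give the ~1-uA total range
i_range = 2 * i_sig_max
# LSB for the intended 8-bit equivalent resolution, about 3.9 nA
i_lsb = i_range / 2 ** 8
# Resolution needed to see I_ERR_MAX on top of the largest offset sum
bits_needed = math.log2(I_SUM_MAX / I_ERR_MAX)


def leakage_drift(rate_na_per_us=0.2, settling_us=5.0):
    """Drift (A) of the memorized current during settling, at the quoted 70-degC rate."""
    return rate_na_per_us * 1e-9 * settling_us
```

The 8-bit LSB comes out near 3.9 nA, so half an LSB is about 2 nA, and log2(46 μA / 2 nA) ≈ 14.5 bits, as stated. A 5-μs settling time at the quoted leakage rate already drifts by 1 nA, half of that error budget.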
E. Time-Constant Scaling Block
The time constant of the CNN layer is defined as the ratio between the state capacitor and the transconductance obtained by multiplying the current factor of the synapse, 3.13 μA/V², by the weight signal voltage. This time constant depends on the specific set of templates being implemented in the CNN. The state capacitor is composed of the gate capacitances of the 11 synapses driven by the cell's state. As the gate capacitance per unit area is 3.45 fF/μm² in this technology, this makes a total of 1.97 pF. In the most favorable case, when every neighbor, including the cell itself, contributes the maximum amount of current to the cell state, a parallel stack of 18 synapses, with a transconductance of 22.5 μA/V, is found. This represents a minimum CNN time constant of 87.4 ns.
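The numbers in this paragraph follow from the synapse geometry and current factor given earlier; the sketch below recomputes them. The 0.4-V maximum weight excursion is assumed from the 400-mV range of Section III-B.

```python
# Synapse geometry and process data quoted in the text
W_UM, L_UM = 2.0, 25.9         # synapse width and length, um
COX_FF_PER_UM2 = 3.45          # gate capacitance per unit area, fF/um^2
BETA = 3.13e-6                 # synapse current factor, A/V^2
V_W_MAX = 0.4                  # V, largest weight excursion (assumed from the 400-mV range)
N_STATE_GATES = 11             # synapse gates loading each state node
N_PARALLEL_MAX = 18            # synapses contributing in the most favorable case

# State capacitor: gate capacitance of the 11 synapses driven by the state
c_state = N_STATE_GATES * W_UM * L_UM * COX_FF_PER_UM2 * 1e-15   # F, ~1.97 pF

# Largest aggregate transconductance: 18 synapses at full weight
g_max = N_PARALLEL_MAX * BETA * V_W_MAX                           # A/V, ~22.5 uA/V

# Minimum time constant of the layer
tau_min = c_state / g_max                                         # s

print(f"C = {c_state * 1e12:.2f} pF, g = {g_max * 1e6:.1f} uA/V, tau = {tau_min * 1e9:.1f} ns")
```

The result, about 87.2 ns, agrees closely with the quoted 87.4 ns; the small difference presumably comes from rounding of the published parameters.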
Scaling the time constant of one of the CNN layers involves
either modifying the value of the state capacitor or of the
synapse transconductance. For the first alternative, it would be necessary to implement an adjustable capacitor. A continuously adjustable capacitor does not seem easy to realize. If a capacitor with a discrete set of capacitances is adequate, an area of 16 times  μm² will be required to implement a 1:16 time-constant ratio.
Fig. 8. Alternatives for the placement of the scaling block.
The second alternative, scaling the transconductances of
every synapse contributing to the cell, can be achieved with a
current mirror. Scaling up/down the sum of currents entering
the cell is equivalent to scaling up/down the transconductances
of the synapses, and thus, to scaling down/up the time constant
of the CNN core. A circuit for continuously adjusting the
gain of a mirror can be designed based on the active-input
regulated-cascode current mirror [18]. The major disadvantage
of using this circuit is its strong dependence on the power rail
voltage. The power rail voltage can deviate by more than 1% across a densely packed 32 × 32-cell parallel array processor chip. This would cause a large mismatch in the time constants of the different cells in the layer. An alternative is a binary
programmable current mirror. Its output current is given by
(24)
where the four coefficients are the values (0 or 1) of the control bits. In this case, 4 bits are more than enough to program the required scaling ratios. The mismatch between the time constants of the different cells is now largely attenuated by design.
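Equation (24) did not survive extraction, so the sketch below uses a generic binary-weighted encoding as a stand-in: bit k is assumed to switch in a device 2^k unit sizes wide, giving a code n from 1 to 15 with four bits. Dividing the summed synapse current by n divides the effective transconductance by n and multiplies the time constant by n, as argued above. The encoding and the function names are hypothetical, not the chip's documented control map.

```python
# Hypothetical 4-bit binary-weighted current-mirror code and its effect on
# the layer time constant. The encoding (bit k -> weight 2**k) is an assumed
# stand-in for the lost Eq. (24), not the chip's documented control map.
from typing import Sequence

TAU_FAST_NS = 87.4  # minimum time constant quoted in the text


def mirror_code(bits_lsb_first: Sequence[int]) -> int:
    """Binary weight n = sum(b_k * 2**k) for control bits given LSB first."""
    if any(b not in (0, 1) for b in bits_lsb_first):
        raise ValueError("control bits must be 0 or 1")
    return sum(b << k for k, b in enumerate(bits_lsb_first))


def scaled_tau_ns(bits_lsb_first: Sequence[int], tau0_ns: float = TAU_FAST_NS) -> float:
    """Scaling the summed synapse current down by n scales g_m by 1/n,
    and therefore the time constant C/g_m up by n."""
    n = mirror_code(bits_lsb_first)
    if n == 0:
        raise ValueError("all-zero code switches the mirror off")
    return n * tau0_ns


for bits in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]):
    print(bits, "n =", mirror_code(bits), f"tau = {scaled_tau_ns(bits):.1f} ns")
```

Because n is fixed by ratios of device sizes rather than by a bias that tracks the supply, such a mirror is much less exposed to rail drops across the array than the continuously tuned mirror, consistent with the mismatch argument above.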
A new problem arises related to the placement of the scaling block in the signal path. There are several alternatives. First, the scaling block, the binary-weighted current mirror, can be placed after the offset cancellation memory, as in Fig. 8(a). The problem is that any offset introduced by the scaling block is incorporated into the signal path with no possibility of cancellation. The second alternative [Fig. 8(b)] is to place the scaling block before the offset cancellation memory. This means that the memory will have to operate over a wider range of currents, thus complicating its design and likely degrading its performance. Our choice, depicted in Fig. 8(c), has been to place the
scaling block in the memorization loop. The current memory
will operate on the unscaled version of the input current, and
any offsets associated with the scaling blocks will be sensed
and memorized to be cancelled online during the network
evolution.
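An idealized behavioral model can make the three placements concrete. It is a hypothetical sketch, not a circuit model: the memory is assumed perfect, the synapse offset and the scaler offset are constants in arbitrary current units, and k is the programmed scaling factor. During calibration the memory nulls whatever offset it can see; at run time the signal is added and the remaining output error is reported.

```python
# Idealized model of offset cancellation for the three scaling-block
# placements of Fig. 8. Hypothetical: perfect memory, constant offsets,
# arbitrary current units, k = programmed scaling factor.
from dataclasses import dataclass


@dataclass
class Outcome:
    output_error: float  # actual output minus the ideal k * i_sig
    memorized: float     # current the offset memory has to hold


def run(placement: str, i_sig: float, i_off_syn: float,
        i_off_scaler: float, k: float) -> Outcome:
    if placement == "after":       # Fig. 8(a): memory first, scaler outside the loop
        i_mem = i_off_syn          # the memory never sees the scaler's offset
        out = k * (i_sig + i_off_syn - i_mem) + i_off_scaler
    elif placement == "before":    # Fig. 8(b): scaler first, memory on scaled currents
        i_mem = k * i_off_syn + i_off_scaler
        out = k * (i_sig + i_off_syn) + i_off_scaler - i_mem
    elif placement == "in_loop":   # Fig. 8(c): scaler inside the memorization loop
        # the loop nulls the scaled output, but the memory sits on the
        # unscaled side, so the scaler's offset is referred back through 1/k
        i_mem = i_off_syn + i_off_scaler / k
        out = k * (i_sig + i_off_syn - i_mem) + i_off_scaler
    else:
        raise ValueError(f"unknown placement: {placement}")
    return Outcome(out - k * i_sig, i_mem)


for p in ("after", "before", "in_loop"):
    r = run(p, i_sig=1.0, i_off_syn=0.5, i_off_scaler=0.1, k=4.0)
    print(f"{p:8s} error = {r.output_error:+.3f}  memorized = {r.memorized:.3f}")
```

In this model both (b) and (c) cancel the scaler's offset, but in (b) the memorized current follows k, so the memory must cover whatever range the programmed ratios span; (c) keeps the memory on unscaled currents.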
Fig. 9. Input block with current scaling.
Fig. 10. Simplified schematics of the feedback loop and its small signal equivalent.
The resulting CNN core is shown in Fig. 9 [19]. In this pic-
ture, the voltage reference generated with the current conveyor,
the current mirrors and the memory can be easily identified.
The inverter driving the gates of the transistors of the current
memory is required for stability. Without it, the output node
will diverge from the equilibrium. The operation of this circuit
is as follows. Before running the CNN dynamics, the current offsets of all the synapses are injected into the virtual reference at node . This current is scaled down by a programmable factor by means of the adjustable current mirror formed by and . The arrow over stands for the binary programmability of this device. The value of the scaling factor is
(25)
Then, if all the transistors of the memory are conducting,
that is, if , , and are ON, then, the negative feedback
loop makes conduct the same current as . is also
adjustable so as to make and the current memory work
with the same current ranges as in the input stage. The rest of
the operation has already been described. The current memory
stores successively the remaining most-significant bits of the
input current, plus the errors accumulated. When this is done,
the CNN loop can be closed and the output current represents the scaled sum of the contributions, with the state-independent errors subtracted.
Fig. 11. Microphotograph of the prototype chip.
Fig. 12. Wide field erasure effect, represented in 3-D.
The critical aspects of this circuit are related to the feedback
loop formed by , , , inverting amplifier , and
transistors , when sensing the offset current. During this
process, the output current is zero because the current path to
the state capacitor is open. Once the input current has been es-
tablished, can be considered a bias voltage. First of all, it
must be taken into account that during the three different phases
in which the loop is closed ( , , and ON, OFF and
and ON, and, finally, and OFF and ON) the values of
and change, so the stability conditions must hold for
any possible set of values. Considering the small-signal equiva-
lent circuit for this loop, a three-pole system is found (Fig. 10),
with pole frequencies: , , and
. The nearest pole, at node , will be employed to compensate the loop for stability. As and decrease in the later phases of the current memorization, the loop becomes more stable, because this causes the loop dc gain to decrease and to grow, moving away from and thus increasing the phase margin. Therefore, the worst case occurs when , , and are ON, and the circuit is designed to be stable under these conditions. It is also important that is kept reasonably low; otherwise, it will displace the unity-gain frequency toward the value of the inversion . This means a loss of phase margin and can compromise the loop stability.
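The stability argument can be illustrated with a generic three-pole loop gain $L(s) = A_0 / \prod_k (1 + s/p_k)$. The pole expressions for Fig. 10 were lost in extraction, so the gains and pole positions below are hypothetical values chosen only to show the trends described above: a lower dc gain or a non-dominant pole moving away raises the phase margin, while a pole approaching the unity-gain frequency lowers it.

```python
# Phase margin of a generic three-pole loop L(s) = A0 / prod(1 + s/p_k).
# All gains and pole positions are hypothetical (rad/s); only the trends
# matter, since the actual pole expressions of Fig. 10 are not reproduced.
import math
from typing import Sequence


def loop_gain_mag(w: float, a0: float, poles: Sequence[float]) -> float:
    """|L(jw)| for real left-half-plane poles given as positive frequencies."""
    mag = a0
    for p in poles:
        mag /= math.hypot(1.0, w / p)
    return mag


def phase_margin_deg(a0: float, poles: Sequence[float]) -> float:
    """Find the unity-gain frequency by log-scale bisection, then add up
    the phase lag of every pole at that frequency."""
    if a0 <= 1.0:
        raise ValueError("the loop gain never crosses unity")
    lo, hi = min(poles) * 1e-3, max(poles) * a0 * 1e3
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if loop_gain_mag(mid, a0, poles) > 1.0:
            lo = mid
        else:
            hi = mid
    w_u = math.sqrt(lo * hi)
    lag = sum(math.degrees(math.atan(w_u / p)) for p in poles)
    return 180.0 - lag


base = phase_margin_deg(1e3, [1e4, 1e8, 1e9])           # dominant pole at 1e4
lower_gain = phase_margin_deg(3e2, [1e4, 1e8, 1e9])     # later phase: less dc gain
pole_moved = phase_margin_deg(1e3, [1e4, 3e8, 1e9])     # second pole breaks away
pole_near_wu = phase_margin_deg(1e3, [1e4, 1e8, 2e7])   # a pole close to w_u
for name, pm in [("base", base), ("lower gain", lower_gain),
                 ("pole moved away", pole_moved), ("pole near w_u", pole_near_wu)]:
    print(f"{name:16s} PM = {pm:5.1f} deg")
```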
Fig. 13. Spatio-temporal edge detection and de-activation (fast layer).
Leakage currents can degrade the memory operation, especially as the operating temperature rises. Although the negative feedback drives the circuit toward the correction of the errors, it may be too slow to settle before leakages modify the position of the equilibrium point. Therefore, compensation must be limited to avoid slowing down the loop dynamics excessively.
IV. EXPERIMENTAL RESULTS
A. Prototype Chip Data
A prototype chip has been designed and fabricated in a standard 0.5-μm CMOS technology with single-poly and triple-metal layers. Fig. 11 displays a microphotograph of the chip. It contains a central array of 32 × 32 second-order cells of the type described above (this prototype does not incorporate the adaptive photosensors). Surrounding the array is a ring of boundary cells, implementing the boundary conditions for the CNN dynamics, together with the necessary buffers to transmit digital instructions and analog references to the array. The program control and memory blocks are located in the lower part of the chip. The last major subsystem is the I/O interface, including sample-and-hold banks, decoders, counters, and different sequential logic. The whole system fits in 9.27 × 8.45 mm², including the ring of bonding pads. One single processing element occupies 188 μm × 186 μm. The resulting cell density is 29.24 cells/mm². This figure must be handled with care: the area occupied by the cell array scales linearly with the total number of cells, whereas the overhead circuitry tends to become a smaller fraction of the total chip size as the number of cells rises. The power consumption of the whole chip has been estimated at 300 mW. Data I/O rates are nominally 10 MS/s. The time constant of the fastest layer (fixed time constant) is designed to be under 100 ns. The chip can handle analog data with an equivalent resolution of 7.5 bits (measured). The peak computing power of this chip is 470 GOPS, where OPS means analog arithmetic operations per second. In one time constant, 100 ns, each CNN core performs 12 multiplications and 11 additions. Thus, each cell, with two cores, performs 46 operations in 100 ns. With 1024 processing cells, the chip can reach 470 GOPS when running the network dynamics. The computing power per unit area (considering the main array alone) is 6.01 GOPS/mm², and per unit power it is 1.56 GOPS/mW.
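The peak-throughput arithmetic above can be reproduced directly from the quoted figures: cell count, operations per core per time constant, the 100 ns time constant, and the 300 mW power estimate. The small differences from the quoted 470 GOPS and 1.56 GOPS/mW come from rounding.

```python
# Peak computing power from the figures quoted in the text.
CELLS = 32 * 32            # processing elements in the array
CORES_PER_CELL = 2         # second-order cell: two CNN cores
MULTS_PER_CORE = 12        # multiplications per core per time constant
ADDS_PER_CORE = 11         # additions per core per time constant
TAU_S = 100e-9             # time constant used for the estimate
POWER_MW = 300.0           # estimated chip power consumption

ops_per_cell = CORES_PER_CELL * (MULTS_PER_CORE + ADDS_PER_CORE)
peak_gops = CELLS * ops_per_cell / TAU_S / 1e9
gops_per_mw = peak_gops / POWER_MW

print(f"{ops_per_cell} operations per cell every {TAU_S * 1e9:.0f} ns")
print(f"peak ~ {peak_gops:.0f} GOPS, ~ {gops_per_mw:.2f} GOPS/mW")
```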
B. Retinal Behavior Emulation
Image processing algorithms can be programmed on this chip
by setting the configuration of switches and by tuning the appro-
priate interconnection weights—the programming interface is
digital, while the internal coding of the weights is analog. Propagative and wave-like phenomena, similar to those found in the biological retina, can be observed in this chip by simply setting the proper coupling between cells in the same or in different layers. For instance, the chip can be programmed to propagate spots in the faster layer toward the border of the array. These spots trigger a slower set of waves in the first layer. The wavefronts generated in the slower layer can be employed to inhibit propagation in the faster layer, thus generating a trailing edge for the waves in the fast layer. This produces results similar to the wide field erasure effect observed in the IPL of the retina. Fig. 12 displays a 3-D plot of this effect. These pictures have been generated with the prototype chip by running the network dynamics, from the same initial state, during successively longer periods of time.
This permits the reconstruction of the actual evolution of the
state of the cells during the CNN dynamics. Another interesting
effect observed in the OPL of the retina [10] is the detection of
spatio-temporal edges followed by de-activation of the patterns
of activity. This phenomenon has also been programmed in the
chip (Fig. 13).
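The reconstruction procedure used for Fig. 12 (restart from the same initial state, run for successively longer times, keep only the final state) can be sketched as follows. The `toy_run` function is a hypothetical stand-in for one chip run: a 1-D resistive-grid diffusion integrated with forward Euler, not the chip's second-order templates.

```python
# Reconstructing a trajectory from end-state snapshots, as done for Fig. 12.
# toy_run is a hypothetical stand-in for one chip run (1-D diffusion on a
# resistive grid, forward Euler); it is not the chip's second-order model.
from typing import Callable, List, Sequence

State = List[float]


def toy_run(x0: Sequence[float], t: float, tau: float = 1.0, dt: float = 0.01) -> State:
    """Integrate tau * dx_i/dt = x_{i-1} - 2 x_i + x_{i+1} with zero-flux ends."""
    x = list(x0)
    for _ in range(round(t / dt)):
        left = [x[0]] + x[:-1]    # edge cells see a copy of themselves,
        right = x[1:] + [x[-1]]   # so no current leaves the array
        x = [xi + dt / tau * (l - 2 * xi + r) for xi, l, r in zip(x, left, right)]
    return x


def reconstruct(run: Callable[[Sequence[float], float], State],
                x0: Sequence[float], durations: Sequence[float]) -> List[State]:
    """Restart from the same x0 for every duration and keep only the end state."""
    return [run(x0, t) for t in durations]


x0 = [0.0] * 10 + [1.0] + [0.0] * 10   # a single bright spot in the middle
durations = [0.0, 0.5, 1.0, 2.0]
frames = reconstruct(toy_run, x0, durations)
for t, frame in zip(durations, frames):
    print(f"t = {t:3.1f}: " + " ".join(f"{v:.2f}" for v in frame[8:13]))
```

Restarting from the same initial state trades measurement time for the ability to observe a continuous-time evolution through a readout that only exposes final states.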
V. CONCLUSION
Based on a simple but precise model of the real biological system, a feasible and efficient implementation of an artificial vision device has been designed. Tailored analog building blocks for fully programmable focal-plane image processing are provided. A prototype chip containing a network of 32 × 32 × 2 CNN nodes has been designed, fabricated, and successfully tested in a standard CMOS technology. Different wave-computing algorithms can be implemented in this chip by simply programming the network dynamics with only a few parameters: connection weights, time-constant ratio, bias map, and boundary conditions.
ACKNOWLEDGMENT
The authors deeply appreciate the many useful and fruitful
discussions with G. Liñán related to chip architecture and circuit design, T. Serrano-Gotarredona regarding the implementation of programmable current mirrors, and D. Bálya and P. Földesy related to the experiments.
REFERENCES
[1] B. Roska and F. S. Werblin, “Vertical interactions across ten parallel,
stacked representations in the mammalian retina,” Nature, vol. 410, pp.
583–587, 2001.
[2] V. Brajovic, “A model for reflectance perception in vision,” Proc. SPIE,
vol. 5119, pp. 307–315, 2003.
[3] A. Moini, Vision Chips. Norwell, MA: Kluwer, 1999.
[4] T. Roska and L. O. Chua, “The CNN universal machine: An analogic
array computer,” IEEE Trans. Circuits Syst. II, vol. 40, pp. 163–173,
Mar. 1993.
[5] C. Rekeczky, T. Roska, and A. Ushida, “CNN-based difference-con-
trolled adaptive nonlinear image filters,” Int. J. Circuit Theory Applicat.,
vol. 26, pp. 375–423, 1998.
[6] D. Balya, C. Rekeczky, and T. Roska, “Basic mammalian retinal effects
on the prototype complex cell CNN universal machine,” in Proc. 7th
IEEE Int. Workshop Cellular Neural Networks and their Applications,
vol. 14, Frankfurt, Germany, July 22–24, 2002, pp. 251–258.
[7] F. Werblin, “Synaptic connections, receptive fields and patterns of ac-
tivity in the tiger salamander retina,” Invest. Ophthalm. Visual Sci., vol.
32, no. 3, pp. 459–483, 1991.
[8] H. Wässle and B. B. Boycott, “Functional architecture of the mammalian retina,” Physiol. Rev., vol. 71, no. 2, pp. 447–480, 1991.
[9] F. Werblin, T. Roska, and L. O. Chua, “The analogic cellular neural network as a bionic eye,” Int. J. Circuit Theory Applicat., vol. 23, no. 6, pp. 541–569, 1995.
[10] C. Rekeczky, B. Roska, E. Nemeth, and F. Werblin, “Neuromorphic
CNN models for spatio-temporal effects measured in the inner and outer
retina of tiger salamander,” in Proc. 6th IEEE Int. Workshop Cellular
Neural Networks Applications, Catania, Italy, May 23–25, 2000, pp.
15–20.
[11] C. Rekeczky, T. Serrano-Gotarredona, T. Roska, and A. Ro-
dríguez-Vázquez, “A stored program second order/3-layer complex cell
CNN-UM,” in Proc. 6th IEEE Int. Workshop Cellular Neural Networks
Applications, Catania, Italy, May 23–25, 2000, pp. 219–224.
[12] S. Espejo, R. Carmona, R. Domínguez-Castro, and A. Ro-
dríguez-Vázquez, “A VLSI oriented continuous-time CNN model,” Int.
J. Circuit Theory Applicat., vol. 24, no. 3, pp. 341–356, 1996.
[13] Y. P. Tsividis, “Integrated continuous-time filter design—An overview,”
IEEE J. Solid-State Circuits, vol. 29, pp. 166–176, Mar. 1994.
[14] R. Domínguez-Castro, A. Rodríguez-Vázquez, S. Espejo, and R. Carmona, “Four-quadrant one-transistor synapse for high density CNN implementations,” in Proc. 5th IEEE Int. Workshop Cellular Neural Networks Applications, London, U.K., Apr. 1998, pp. 243–248.
[15] A. Rodríguez-Vázquez, E. Roca, M. Delgado-Restituto, S. Espejo, and
R. Domínguez-Castro, “MOST-based design and scaling of synaptic
interconnections in VLSI analog array processing chips,” J. VLSI Signal Process. Syst. Signal, Image, Video Technol., vol. 23, pp. 239–266, 1999.
[16] K. C. Smith and A. S. Sedra, “The current conveyor—A new circuit building block,” Proc. IEEE, vol. 56, pp. 1368–1369, Aug. 1968.
[17] C. Toumazou, J. B. Hughes, and N. C. Battersby, Eds., Switched-Cur-
rents: An Analogue Technique for Digital Technology. London, U.K.:
Peregrinus, 1993.
[18] T. Serrano and B. Linares-Barranco, “The active-input regulated-cas-
code current mirror,” IEEE Trans. Circuits Syst. I, vol. 41, pp. 464–467,
June 1994.
[19] R. Carmona, “Analysis and design of CNN-based VLSI hardware for
real-time image processing,” Ph.D. dissertation, Dept. Electron. Electromagn., Univ. of Seville, Seville, Spain, 2002.
Ricardo Carmona-Galán (M’99) received the
degrees of Licenciado and Doctor (Ph.D.) in
physics, in the specialty of electronics, from the
University of Seville, Seville, Spain, in 1993 and
2002, respectively.
From 1994 to 1996, he worked at the National
Center for Microelectronics, Seville, Spain, funded
by an IBERDROLA S.A. grant. From July 1996
to June 1998, he was a Research Assistant in the
Electronics Research Laboratory, Department of
Electrical Engineering and Computer Sciences,
University of California at Berkeley. He is currently with the Department
of Analog Design, Institute of Microelectronics of Seville (IMSE), Centro
Nacional de Microelectrónica (CNM-CSIC), Seville, Spain. Since October
1999, he has been an Assistant Professor in the Department of Electronics and
Electromagnetism, School of Engineering, University of Seville, where he
teaches “Circuit Analysis and Synthesis” and “Circuit Synthesis Laboratory”
for the degree of Telecommunication Engineer. His main areas of interest are
linear and nonlinear analog and mixed-signal integrated circuits, in particular,
the design and VLSI implementation of cellular neural networks and analog
memory devices for real-time image processing and vision chips.
Francisco Jiménez-Garrido received the B.S. de-
gree in physics and the B.S. degree in electronic engi-
neering from the University of Seville, Seville, Spain,
in 1998 and 2002, respectively, and is working toward
the Ph.D. degree in the Department of Electronics and
Electromagnetism of the same university.
Since 1999, he has been with the Department of
Analog Circuit Design, Institute of Microelectronics
of Seville, Centro Nacional de Microelectrónica
(IMSE-CNM), University of Seville. He has re-
search interests in linear and nonlinear analog and
mixed-signal integrated circuits for image processing and communication
devices.
Carlos Manuel Domínguez-Matas received the de-
gree in physics in the specialty of electronics from
the University of Seville, Seville, Spain, in 2001, and
is currently working toward the Ph.D. degree in the
field of analog design of image sensors at the same
university.
Since 2001, he has been working in the De-
partment of Analog Circuit Design, Institute of
Microelectronics of Seville, Centro Nacional de
Microelectrónica (IMSE-CNM), University of
Seville. His research interests are in the fields of
design and modeling of analog integrated circuits and mixed-signal test and
design.
Rafael Domínguez-Castro received the degree in
electronic physics, the M.S. degree equivalent in
microelectronics, and the Doctor en ciencias fisicas
degree from the University of Seville, Seville, Spain,
in 1987, 1989, and 1993, respectively.
Since 1987, he has been with the Department
of Electronics and Electromagnetism, University
of Seville, where he is currently a Professor of
Electronics. He is also a Member of the Research
Staff, Institute of Microelectronics of Seville, Centro
Nacional de Microelectrónica (IMSE-CNM-CSIC),
University of Seville, where he is a Member of the Research Group on Analog
and Mixed-Signal VLSI. His research interests are in the design of embedded
analog interfaces for mixed-signal very large-scale integrated circuits, design
of CMOS imagers and CMOS focal-plane array processors, and development of computer-aided design tools for the automation of analog building-block design, especially optimization and automatic sizing of basic building blocks for integrated circuits.
Dr. Domínguez-Castro is the co-recipient of the 1995 Guillemin–Cauer
Award of the IEEE Circuits and Systems Society and the Best Paper award of
the 1995 European Conference on Circuit Theory and Design.
Servando Espejo Meana received the licenciado
en física degree, the M.S. degree equivalent in
microelectronics, and the Doctor en ciencias físicas
degree from the University of Seville, Seville, Spain,
in 1987, 1989, and 1994, respectively.
He is currently a Professor of Electronics in the
Department of Electronics and Electromagnetism,
University of Seville, and also in the Department of
Analog Circuit Design, Institute of Microelectronics
of Seville, Centro Nacional de Microelectrónica
(IMSE-CNM), University of Seville. From 1989 to
1991, he was an Intern at AT&T Bell Laboratories, Murray Hill, NJ, and an
employee of AT&T Microelectronics, Madrid, Spain. His main areas of interest
are linear and nonlinear analog and mixed-signal integrated circuits including
neural networks electronic realizations and theory, vision chips, massively
parallel analog array processing systems, chaotic circuits, and communication
devices.
He is the co-recipient of the 1995 Guillemin–Cauer Award of the IEEE Cir-
cuits and Systems Society and the Best Paper award of the 1995 European Con-
ference on Circuit Theory and Design.
István Petrás received the B.Sc. degree from Bánki
Donát Polytechnic, Budapest, Hungary, and the
M.Sc. degree from the University of Veszprém,
Veszprém, Hungary, in 1994 and 1998, respectively,
both in information engineering, and is working
toward the Ph.D. degree in the Analogical and
Neural Computing Research Laboratory, Computer
and Automation Research Institute, Hungarian
Academy of Sciences, Budapest, Hungary.
From October 2000 to May 2001, he was a Vis-
iting Scholar in the Nonlinear Research Laboratory,
University of California, Berkeley. From May 2002 to June 2002, he worked
in the SISTA-COSIC Laboratory, Department of Electrical Engineering,
Katholieke Universiteit, Leuven, Belgium. His research interests include new
parallel image processing methods that can be implemented on cellular neural
networks; analysis and design of spatio-temporal chaotic pattern generation
and analysis, spatio-temporal nonlinear wave filters for two-dimensional signal
processing tasks, somatosensory system modeling, and ultra-high-speed object
recognition and tracking.
Angel Rodríguez-Vázquez (M’80–SM’95–F’96)
received the Licenciado en física electrónica and the
Ph.D. degrees from the University of Seville, Seville,
Spain, in 1977 and 1983, respectively.
He is a Professor of Electronics in the Department
of Electronics and Electromagnetism, University of
Seville. He is also a Member of the Research Staff
at the Institute of Microelectronics of Seville, Centro
Nacional de Microelectrónica (IMSE-CNM), Seville,
Spain, where he heads a research group on Analog
and Mixed-Signal Integrated Circuits. His research
interests are in the design of analog front-ends for mixed-signal circuits and
systems-on-chip, telecom circuits, CMOS imagers and vision chips, sensory-
processing-actuating systems-on-chip, and bio-inspired integrated circuits. On
these topics, he has published seven books, 36 book chapters in other books,
approximately 100 journal papers, and about 300 conference papers. He is also
a member of the editorial boards of the International Journal of Circuit Theory and Applications and of Analog Integrated Circuits and Signal Processing.
Dr. Rodríguez-Vázquez was co-recipient of the 1995 Guillemin–Cauer
Award of the IEEE Circuits and Systems Society. In 1992, he received the
Young Scientist Award of the Seville Academy of Science. In 1996, he was
elected Fellow of the IEEE for contributions to the design and applications
of analog/digital nonlinear ICs. He served as an Associate Editor of the IEEE
TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: FUNDAMENTAL THEORY AND
APPLICATIONS from 1993 to 1995. Currently, he is an Associate Editor for
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS. He
was a Guest Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I:
REGULAR PAPERS Special Issue on “Advances on Analog-to-Digital and
Digital-to-Analog converters.”