Analog weight buffering strategy for CNN chips by Liñán Cembrano, Gustavo et al.
Analog Weight Buffering Strategy for CNN Chips 
G. Liñán-Cembrano, A. Rodríguez-Vázquez, R. Carmona, S. Espejo and R. Domínguez-Castro
Instituto de Microelectrónica de Sevilla. IMSE-CNM-CSIC
Avda. Reina Mercedes s/n 41012 Sevilla (SPAIN)
Tel.: +34 955056679, Fax: +34 955056686
E-mail: angel@imse.cnm.es
Abstract¹
Large, gray-scale CNN chips employ analog signals to achieve high density in the internal distribution of the template parameters. Whatever design strategies are adopted in the circuitry that implements the weights, accuracy is ultimately limited by the controlling signals. This paper presents a buffering strategy intended to achieve 8-bit equivalent accuracy in the distribution of the internal analog signals, as employed in the chips ACE4k [1], ACE16k [2], and CACE1k [3].
1. Introduction 
Spatial uniformity is a key attribute for correct operation of CNN processors. Such uniformity can be degraded by mismatch among circuits at different array locations and by errors in the signals that program the operation of the cells. These signals are generated at the chip periphery and distributed across the whole array. The problem of distributing electrical signals to a very large array of cells must be carefully analyzed and understood, since it might break up the spatial uniformity required in CNN arrays. The target consists of making two electrical signals "equal" at different, widely separated locations within the array.
The issues to be confronted differ depending upon whether the signals to be made equal are analog (used for carrying parameters) or digital (used for carrying instructions). This paper focuses on the former. Consider that every building block has been designed for 8-bit equivalent accuracy based on the formulation of systematic and random errors [4]. This basically means that each block has been designed by spending the minimum amount of area which ensures this analog accuracy. Consider now the synapses and suppose that the distribution of the weights is not so accurate. Then, what you really obtain is a system which is less accurate than you expected. The operator (the multiplier) provides the required precision but the operand (the weight) does not. In the end, the cell density, which has been penalized by designing more accurate multipliers, is not justified by the performance that you get!
This paper is intended to illustrate the problems of signal distribution to large arrays, and to detail the solutions adopted in ACE4k, ACE16k and CACE1k.

1. This work has been partially funded by DICTAM IST-1999-19007 and CICYT TIC1999-0826.
2. Driving an Array of Low Impedance Nodes 
Let us start our analysis by illustrating the problem of driving an array of low impedance nodes. This choice is not fortuitous, since weight signals are provided to the one-transistor synapses in our chips via one of their diffusion terminals. Hence, the analog buffer providing the weights must also provide the synapse current. Therefore, we must consider the voltage drop across the conductive path which connects the output node of the buffer, somewhere in the programming block, to the diffusion terminal of every synapse.
Consider the case of distributing a voltage level $V_w$ to the array. Let us assume the electrical model in Fig. 1. Here, weight voltages are transmitted column-wise. Assume the very worst case, in which all the synapses draw the maximum current $I_o$. Let $R_s$ be the impedance of the conductive path which connects the same weight terminal in two adjacent cells in a column. Let $R_o$ be the output impedance of the buffer connected to the output of the DAC, and let $R_c$ be the resistance of the segment which connects the column to the horizontal line which distributes the weight. Finally, let us define an additional resistor $R_E$ which accounts for the resistance of the segment of the horizontal bus between two columns. Obviously, the larger the resistance of the path from a cell to the buffer, the larger the error in the transmitted voltage. If we evaluate the voltage drop between $V_w$ and $V_{N,M}$ (with $N = N_{row}$, $M = N_{col}$) as an upper limit for that error, it is found that,

$$V_w - V_{N,M} = I_o \left[ N_{row} N_{col} R_o + N_{row} \frac{N_{col}(N_{col}-1)}{2} R_E + N_{row} R_c + \frac{N_{row}(N_{row}-1)}{2} R_s \right] \tag{1}$$
Fig. 1. Model for the study of the current distribution to the array of low impedance nodes.

0-7803-7761-3/03/$17.00 ©2003 IEEE

Let us now assign numerical values to the constants in eq. 1. First of all we need the synapse's current consumption, which in the case of the ACE4k prototype is about¹ 1.8 µA. In order to evaluate $R_s$ we can use the following expression,

$$R_s = R_{\square,M3} \cdot \frac{L_{cell}}{W_{metal}} \tag{2}$$

where $R_{\square,M3} = 30$ mΩ/sq is the sheet resistance of the third metal layer, $L_{cell} = 102.2$ µm is the cell height, and $W_{metal}$ is the width of the metal line driving the weight to the cell. Since we need to transmit 20 weights and the width of the cell is 120 µm, the maximum width of each metal line² is 6 µm, which after including the separation between lines is reduced to about 5 µm. Using these values in eq. 2 yields,

$$R_s = 0.6\ \Omega \tag{3}$$

Without loss of generality, we make $R_c = 1$ Ω, which corresponds to assuming that this segment is twice as long as the cell height. On the other hand, since $R_o$ is the output impedance of the buffer, we will use a value of about³ 0.1 Ω. Finally, $R_E$ can be estimated as,

$$R_E = R_{\square,M3} \cdot \frac{W_{cell}}{W_E} \tag{4}$$

where $W_{cell}$ is the cell width and $W_E$ is the width of the horizontal metal line. Assuming that we could use a 50 µm wide metal line⁴ yields,

$$R_E \cong 60\ \text{m}\Omega \tag{5}$$

Using these values in eq. 1, and considering that $N_{row} = N_{col} = 64$, yields,

$$V_w - V_{N,M} = 17\ \text{mV} \tag{6}$$

1. This value accounts not only for the signal component but also for the offset term which must be cancelled by the on-cell current memory [1].
2. This implies assuming that the third metal layer will be fully employed for weight routing purposes.
3. The values of $R_o$ and $R_c$ do not have an important influence on the final result.
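The 17 mV figure can be cross-checked numerically. The sketch below is a minimal illustration, not the authors' tool: it walks the resistive ladder of Fig. 1 segment by segment and sums the IR drop of each segment, under the worst-case assumption that every cell draws the same current. The function and parameter names are hypothetical. The per-cell current is taken as 1.8 µA, which is an assumed reading of the value quoted above, and the resistances are those quoted in the text (0.1 Ω buffer output, 60 mΩ bus segment, 1 Ω column tap, 0.6 Ω per-cell segment).

```python
def single_buffer_drop(n_rows, n_cols, i_cell, r_out, r_bus, r_tap, r_seg):
    """Worst-case DC drop (V) from the buffer to the farthest cell.

    Assumes one buffer feeds the whole array through a horizontal bus,
    and that every cell sinks the same current i_cell.
    """
    # The buffer output impedance carries the current of every cell.
    drop = n_rows * n_cols * i_cell * r_out
    # Bus segment k (between columns k and k+1) carries the current of
    # the n_cols - k columns located beyond it.
    for k in range(1, n_cols):
        drop += (n_cols - k) * n_rows * i_cell * r_bus
    # The tap into the last column carries one full column.
    drop += n_rows * i_cell * r_tap
    # Column segment k (between cells k and k+1) carries the current of
    # the n_rows - k cells located beyond it.
    for k in range(1, n_rows):
        drop += (n_rows - k) * i_cell * r_seg
    return drop


# Values quoted in the text (the 1.8 uA reading is an assumption).
ACE4K = dict(n_rows=64, n_cols=64, i_cell=1.8e-6,
             r_out=0.1, r_bus=60e-3, r_tap=1.0, r_seg=0.6)

if __name__ == "__main__":
    print(f"{single_buffer_drop(**ACE4K) * 1e3:.2f} mV")
```

With these inputs the total comes out at about 17 mV, and the horizontal bus accounts for most of it, since each of its segments carries the current of many whole columns.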
Since ACE4k uses a weight signal swing of 800 mV, and the desired accuracy⁵ is 8-bit, the maximum allowable error should be,

$$\max(Err) \le \frac{1}{2}\,\text{LSB} = 1.5625\ \text{mV} \tag{7}$$

However, the obtained accuracy, even allowing the horizontal bus to occupy 1 mm, is only 5.5-bit. This is an important conclusion: even when using device areas in the cell which guarantee 8-bit accuracy, the obtained precision is only 5.5-bit because of transmission degradation. Hence, the increase of area in the analog processing circuitry is not worthwhile. Accuracy is constrained by mechanisms other than mismatch.

4. Notice that since we use 20 weights, the total width of the horizontal weight bus will be about 1 mm, which is a huge value.
5. We require 8-bit equivalent accuracy for every analog block in the chip.
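The two accuracy figures quoted above can be reproduced with a few lines of arithmetic. This is a sketch under a stated assumption: the text does not give the formula behind "5.5-bit", so the code takes the equivalent accuracy to be the base-2 logarithm of the swing divided by the worst-case error, which happens to match the quoted value. The function names are hypothetical.

```python
import math


def half_lsb(swing, bits):
    """Half of one least-significant bit for a given full-scale swing (V)."""
    return swing / 2 ** bits / 2


def equivalent_bits(swing, error):
    """Equivalent accuracy in bits, assuming bits = log2(swing / error)."""
    return math.log2(swing / error)


if __name__ == "__main__":
    print(f"half LSB at 8 bits: {half_lsb(0.8, 8) * 1e3:.4f} mV")
    print(f"equivalent accuracy: {equivalent_bits(0.8, 17e-3):.2f} bits")
```

For an 800 mV swing this gives a 1.5625 mV half-LSB target, while a 17 mV error corresponds to roughly 5.5 bits under the assumed definition.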
3. The Distributed Buffer Strategy 
Let us now examine eq. 1 to find a solution to the problem. One readily sees that the factor determining the transmission error is the large number of cells connected to the buffer. Indeed, most of the error is due to the horizontal metal line driving the weights to the different columns. Here, the influence of the resistance of every segment is multiplied by approximately $N_{col} \times N_{row}$. Hence, a suitable solution consists of minimizing the number of columns connected to the buffer. Such a solution relies on replicating the output branch of the original buffer, properly scaled in order to provide current to a single column, and on connecting such an output branch to every column in the array. Providing those secondary level buffering stages with a high input impedance avoids static voltage drops across the lines which drive them. Therefore, by renaming $R_o$ as the output impedance of each of those output stages, we find¹,

$$V_w - V_{N,M} = I_o \left[ N_{row} R_o + N_{row} R_c + \frac{N_{row}(N_{row}-1)}{2} R_s \right] \tag{8}$$

where the effect of $R_E$ disappears due to the high input impedance of the secondary stages. The evaluation of this expression yields,

$$V_w - V_{N,M} = 3\ \text{mV} \tag{9}$$

which still does not satisfy our requirements.
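To see why per-column buffering alone falls short, the drop in eq. 8 can be split into its three contributions. The following is an illustrative sketch with hypothetical names. It assumes a 1.8 µA cell current, and it borrows the 10 Ω buffer impedance from the value quoted later for the elementary column buffer, since the value used for eq. 9 is not stated. Under these assumptions the total only roughly tracks the quoted 3 mV.

```python
def column_drop_terms(n_rows, i_cell, r_out, r_tap, r_seg):
    """Split the worst-case drop along one buffered column into terms (V)."""
    return {
        # Buffer output impedance carries the whole column current.
        "buffer": n_rows * i_cell * r_out,
        # Tap between buffer and column also carries the whole column.
        "tap": n_rows * i_cell * r_tap,
        # Segment k carries the current of the n_rows - k cells beyond it.
        "wiring": n_rows * (n_rows - 1) / 2 * i_cell * r_seg,
    }


HALF_LSB = 0.8 / 2 ** 8 / 2  # 8-bit target on an 800 mV swing, in volts

if __name__ == "__main__":
    terms = column_drop_terms(64, 1.8e-6, r_out=10.0, r_tap=1.0, r_seg=0.6)
    for name, volts in terms.items():
        print(f"{name:>7}: {volts * 1e3:.3f} mV")
    print("wiring alone exceeds half LSB:", terms["wiring"] > HALF_LSB)
```

The wiring term does not depend on the buffer at all, so with 64 rows per buffer no choice of output impedance can meet the half-LSB target.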
The next step is to further reduce the number of rows driven by each secondary level buffer. Due to the spatial uniformity required of CNN arrays, the only way to do that is by replicating the secondary level buffers also at the top of each column. Now, the effective number of cells driven by each buffer is halved and we get,
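
$$V_w - V_{N/2,M} = I_o \left[ \frac{N_{row}}{2} R_o + \frac{N_{row}}{2} R_c + \frac{N_{row}}{4}\left(\frac{N_{row}}{2}-1\right) R_s \right] \tag{10}$$

The evaluation of this new expression, using 10 Ω as the output impedance of each elementary column buffer, yields,

$$V_w - V_{N/2,M} \approx 1.2\ \text{mV} \tag{11}$$

which satisfies our requirements.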
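The halving step can also be read as a budget on the buffer's output impedance. The sketch below assumes that the per-column expression of eq. 8 applies unchanged to half a column of cells (an assumption about the form of the halved expression), a 1.8 µA cell current, and the half-LSB target of eq. 7. The function name is hypothetical.

```python
def max_buffer_impedance(n_cells, i_cell, r_tap, r_seg, budget):
    """Largest buffer output impedance (ohm) that keeps the worst-case
    drop along n_cells within budget (V); a negative result means the
    budget cannot be met with any buffer."""
    # Equivalent resistance of the column wiring seen by the cell current.
    wiring = n_cells * (n_cells - 1) / 2 * r_seg
    return (budget / i_cell - n_cells * r_tap - wiring) / n_cells


if __name__ == "__main__":
    budget = 0.8 / 2 ** 8 / 2  # half LSB at 8 bits on 800 mV
    for n in (64, 32):
        r = max_buffer_impedance(n, 1.8e-6, 1.0, 0.6, budget)
        print(f"{n} cells per buffer: R_o <= {r:.1f} ohm")
```

Under these assumptions a 64-cell column has no feasible buffer impedance, while a 32-cell half column tolerates up to roughly 17 Ω, which leaves margin for the 10 Ω figure used in the text.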
Fig. 2 shows the block diagram of the distributed buffer topology, while Fig. 3 shows the schematic of the complete buffer. Its first stage consists of a folded-cascode OTA, while the output branch is a modified source follower which uses a negative feedback loop in order to lower the output impedance of the buffer as compared to conventional two-transistor (2-T) source-follower structures. Moreover, the current through transistor $M_b$ does not depend on how much current must be sent to the array, since it is fixed by a current source. Conversely, in the case of the 2-T source follower, the current through the transistor which plays the same role as $M_b$, and which fixes the output impedance of the buffer as $R_o = (g_{mb} \cdot A)^{-1}$, is given by $I_b = I_{bias} - I_{out}$. Finally, the large capacitance² at node Out serves for compensation purposes.

Fig. 2. The new topology of distributed buffers.
Fig. 3. Distributed Buffer Schematic.

1. In this case, $V_{N,M}$ is the voltage at the last cell of each column.
2. $N_{row}/2$ times the input capacitance of the synapses plus the parasitic capacitance of a long line crossing the array.

4. The Effect of Mismatch: Horizontal Bus
The distributed buffer topology bases its functionality on avoiding voltage drops across the horizontal buses which drive the signals to the output stages located at the top and bottom of every column. However, it relies on perfect matching between output branches in different columns. Of course, this will not be true in practice.
Let us now consider the circuit in Fig. 4. Here, each output stage is modelled by one transistor ($M_b$ in Fig. 3) and two current sources $I_{Aj}$, $I_{Cj}$, for $j = 1 \ldots N_{col}$, which account for transistors $M_c$ and $M_a$ in Fig. 3. In addition, we will consider that the current required by the j-th column is $I_{col_j}$. If we introduce mismatching effects we can write,

$$I_{Aj} = I_A + \delta I_{Aj}, \qquad I_{Cj} = I_C + \delta I_{Cj} \tag{12}$$

where $[\delta I_{Aj}, \delta I_{Cj}]$ are due to random fluctuations of the technological parameters. Ideally,

$$I_{col_j} = I_{Aj} - I_{Cj} \tag{13}$$

however, due to mismatch, each output stage introduces an additional current in the output node given by,

$$\delta I_j = \delta I_{Aj} - \delta I_{Cj} \tag{14}$$

Then, it can be demonstrated that the standard deviation of the difference between the output voltages of two branches is given by,

$$\sigma(\Delta V) = R_E \cdot \sigma_I \cdot \big(\ldots\big)_{k=1} \tag{15}$$

where¹,

$$\sigma_I^2 = \sigma_{I_A}^2 + \sigma_{I_C}^2 \tag{16}$$

and $R_E$ comes onto the scene again as the resistance of the segment which connects the outputs of two adjacent output buffers. By using our parameters in eq. 15 we obtain that the required width of the horizontal metal line connecting the output nodes of adjacent output buffers must be²,

$$W_E > 8\ \mu\text{m} \tag{17}$$

Fig. 4. Schematic for illustration of the mismatch effect.

Therefore, this distributed buffer topology solves two problems at the same time: it drives large arrays of low impedance nodes, and it also helps to reduce the area required for routing lines.

1. The values for $\sigma_{I_A}$ and $\sigma_{I_C}$ were obtained from Monte Carlo simulations once the buffer was already designed.
2. In the final layout of the chip we employed a 17 µm wide metal line in order to further reduce this voltage drop.
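As a small numerical aid, the bus-segment resistance of eq. 4 and the variance sum of eq. 16 can be evaluated directly. This is only a sketch: the helper names are hypothetical, the 30 mΩ/sq sheet resistance and 120 µm cell width are the values quoted earlier in the text, and the two sigma values in the usage lines are placeholders, not the Monte Carlo results mentioned in the footnote.

```python
import math


def bus_segment_resistance(r_sheet, w_cell, w_bus):
    """R_E = sheet resistance * (cell width / bus width).

    Both widths must be given in the same unit.
    """
    return r_sheet * w_cell / w_bus


def combined_sigma(sigma_a, sigma_c):
    """Sigma of the difference of two independent current errors,
    i.e. the square root of the sum of the two variances."""
    return math.hypot(sigma_a, sigma_c)


if __name__ == "__main__":
    for width_um in (8, 17):
        r_e = bus_segment_resistance(30e-3, 120.0, width_um)
        print(f"W_E = {width_um} um -> R_E = {r_e * 1e3:.0f} mohm")
    # Placeholder sigmas in amperes, for illustration only.
    print(combined_sigma(3e-6, 4e-6))
```

Widening the line from 8 µm to 17 µm roughly halves the segment resistance, which is consistent with the stated motivation for the wider line in the final layout.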
5. Conclusions 
This paper has addressed the problem of accurately distributing analog signals to very large arrays of low-impedance nodes. In these systems, non-null metal resistivity leads to voltage drops across the lines steering the signals through the network. This is particularly important for weights, as their spatial uniformity is one of the most characteristic properties of the entire system. The presented technique, successfully applied to three new-generation CNN chips, consists of avoiding driving high currents through long resistive paths. Instead, high impedance lines and feedback mechanisms are employed in order to shorten the distance DC currents must travel before reaching the synapse weight terminals. The final topology reduces this distance to half a column of cells, the shortest path which does not require modifying the spatial uniformity of the array, and produces final DC voltage drops of less than 2 mV, even if the place where the signal is generated and the place where it is applied are separated by 0.7 cm.
6. References 
[1] G. Liñán, S. Espejo, R. Domínguez-Castro and A. Rodríguez-Vázquez, "ACE4k: An Analog I/O 64x64 Visual Microprocessor Chip with 7-bit Analog Accuracy". International Journal of Circuit Theory and Applications, Vol. 30, pp. 89-116, March-June 2002.
[2] G. Liñán, S. Espejo, R. Domínguez-Castro and A. Rodríguez-Vázquez, "Architectural and Basic Circuit Considerations for a Flexible 128 x 128 Mixed-Signal SIMD Vision Chip". Analog Integrated Circuits and Signal Processing, Vol. 33, No. 2, pp. 179-190, November 2002.
[3] R. Carmona, F. Jiménez-Garrido, R. Domínguez-Castro, S. Espejo, T. Roska, C. Rekeczky and A. Rodríguez-Vázquez, "A Bio-Inspired 2-Layer Mixed-Signal Flexible Programmable Chip for Early Vision". IEEE Transactions on Neural Networks, (submitted).
[4] A. Rodríguez-Vázquez, G. Liñán, S. Espejo and R. Domínguez-Castro, "Mismatch-Induced Trade-offs and Scalability of Analog Preprocessing Visual Microprocessor". Analog Integrated Circuits and Signal Processing, to appear in 2003.
