Abstract - The design details and test results of a field-programmable analog array (FPAA) prototype chip in 1.2-μm CMOS are presented. The analog array is based on subthreshold circuit techniques and consists of a collection of homogeneous configurable analog blocks (CAB's) and an interconnection network. Interconnections between CAB's and the analog functions to be implemented in each block are defined by a set of configuration bits loaded serially into an on-board shift register by the user. Macromodels are developed for the analog functions in order to simulate various neural network applications on the field-programmable analog array.
I. INTRODUCTION
Field-programmable gate arrays for prototyping digital circuits are now commercially available from several vendors (as in [1]). Conspicuously absent in the literature is a field-programmable analog array (FPAA), and perhaps for good reason: many more challenges must be addressed, such as bandwidth, linearity, signal-to-noise ratio, and frequency response. Noteworthy, however, are several recent commercial products and publications in the literature that offer fixed topologies and programmable "coefficients." For example, most single-chip analog neural network implementations are designed for a fixed network topology even though the synaptic weights are programmable [4], [9], [12], [13]. As pointed out by Sivilotti [14], it would be advantageous to devise an IC strategy whereby various analog networks can be realized, as determined by the user, in some type of reusable generic prototyping medium. The approach taken in this paper is to focus on a small number of specialized analog functions that are realized using subthreshold techniques. Furthermore, differential circuitry is used for noise immunity, and current-mode signaling provides addition and subtraction operations. In addition, memory functions such as integrators, coefficient storage, and sampled analog delay lines are also supported.
As manufacturability and cost consciousness have grown among circuit designers, awareness of the importance of testability has also increased. However, testing analog circuits is a difficult task. The FPAA concept provides an interesting approach to the testing of analog IC's. Since the circuits in every programmable block are identical, and since the input and output of each block are accessible through the routing network, the FPAA provides good controllability and observability, such that each programmable analog function can be tested exhaustively.

Manuscript received May 14, 1991; revised July 16, 1991. This work was supported by an operating grant from NSERC and Micronet.
The authors are with the VLSI Research Group, Department of Electrical Engineering, University of Toronto, Toronto, Canada M5S 1A4.
IEEE Log Number 9103054.
In addressing these challenging requirements, this paper presents a novel FPAA and associated test results. In Section II, we outline various functional requirements for the implementation of neural networks. Section III presents the design of a configurable analog block (CAB). Section IV presents the strategy for interconnecting CAB's. Section V discusses the implementation of the prototype chip and the procedure for loading the coefficients and for configuration of the array. The experimental results for the prototype chip are also presented in that section. The simulation of neural network applications, using macromodels of the circuits discussed in Section II, is discussed in Section VI. Finally, the conclusions are presented in Section VII.
II. FUNCTIONAL REQUIREMENTS
The functions required to implement most neural networks are addition, threshold operation, coefficient multiplication, and signal multiplication (used in high-order neural networks). In networks capable of "learning," the coefficient of the coefficient multiplier is often required to be adjustable. In order to realize these functions in CMOS technology, a subthreshold circuit technique is used, offering the advantage of low power dissipation. Since a neural network usually has many neurons and synaptic weights, the required circuits must be simple to permit several functional blocks to be implemented on the same die. In the following, circuits for realization of the required functions are presented. These circuits utilize both differential current-mode and differential voltage-mode signals. The swing of the voltage-mode signals is kept small (except for the case of the threshold operation) to maximize the speed of operation. Addition among signals is obtained by representing the signals in current mode and then summing the currents at the corresponding nodes.
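The current-mode addition described above can be illustrated with a small behavioral sketch. It is not a circuit model: the class name `DiffCurrent`, its fields, and the example values are hypothetical, and the polarity swap used for subtraction is the mechanism the paper attributes to the switch block in Section IV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffCurrent:
    """A differential current-mode signal carried on two wires (values in nA)."""
    pos: float
    neg: float

    @property
    def value(self) -> float:
        # The signal is the difference between the two wire currents.
        return self.pos - self.neg

    def __add__(self, other: "DiffCurrent") -> "DiffCurrent":
        # Tying wires to a common node sums their currents wire by wire.
        return DiffCurrent(self.pos + other.pos, self.neg + other.neg)

    def inverted(self) -> "DiffCurrent":
        # Swapping the two wires changes the sign of the signal.
        return DiffCurrent(self.neg, self.pos)


a = DiffCurrent(pos=60.0, neg=40.0)   # represents +20 nA
b = DiffCurrent(pos=45.0, neg=55.0)   # represents -10 nA
total = a + b                         # addition at a shared node
difference = a + b.inverted()         # subtraction via polarity swap
```

No arithmetic block is needed for either operation in this picture: both reduce to how the wires are connected.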
The threshold operation is similar to the behavior of an analog comparator. It has two states (low and high) and a gain region, which is required for the transition between the two states. Fig. 1 …

Although the multipliers and the comparators can be cascaded together, direct connection of the multipliers is not possible. Since this structure is usually desired in high-order neural networks, a current buffer is designed for this purpose. The buffer can be obtained by connecting the gate to the drain of M1 in Fig. 1 instead of connecting it to the gate of M2. Since this circuit also utilizes the translinear technique, the transfer characteristic between the input and output current is very linear.

0018-9200/91/1200-1860$01.00 © 1991 IEEE
III. CONFIGURABLE ANALOG BLOCK
A typical [1] field-programmable gate array (FPGA) consists of configurable logic blocks (CLB's) that perform the required logic function and an interconnection network to provide connections between CLB's. The FPAA presented in this paper is modeled on a similar strategy, although there are many issues unique to analog circuits that must be accommodated. The analog functions will be grouped into CAB's and the interconnection network will connect them together. One important observation, however, is that the circuit topologies for different functions are similar. Therefore, different functions can be conveniently obtained by configuring the circuit primitives (Table I(a)) with a few transistors that act as analog switches (Table I(
The design of the CAB is shown in Fig. 3. The configuration of the CAB is determined by a local 3-b shift register. Each bit of the shift register controls various switches and/or multiplexers within the CAB itself. For example, when the shift register contains the bit pattern 001, the CAB is configured as a four-quadrant multiplier with X and Y as inputs and Z as output. Table II shows all the configurations inside the CAB.
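As an illustration of how the 3-b pattern selects a CAB function, the sketch below decodes only the three patterns that the text itself names (001 here, and 100 and 011 in Section V). The remaining patterns are defined in Table II and are deliberately left unspecified. The names `KNOWN_CAB_MODES` and `decode_cab` are hypothetical.

```python
from typing import Dict

# Only the patterns named in the text; the full mapping is given in Table II.
KNOWN_CAB_MODES: Dict[str, str] = {
    "001": "four-quadrant multiplier (inputs X, Y; output Z)",
    "011": "current comparator",
    "100": "coefficient loading",
}


def decode_cab(bits: str) -> str:
    """Map a 3-b configuration pattern to the CAB function it selects."""
    if len(bits) != 3 or set(bits) - {"0", "1"}:
        raise ValueError(f"expected three binary digits, got {bits!r}")
    # Patterns not named in the text are reported as unspecified, not guessed.
    return KNOWN_CAB_MODES.get(bits, "not specified in the text (see Table II)")
```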
Though various specialized types of CAB's could be defined in the same manner, controlling the level of granularity of the CAB is critical to minimizing the number of I/O lines that ultimately must be accommodated by the interconnection network.
IV. INTERCONNECTION NETWORK
Interconnection networks can usually be categorized into two types: crossbar and multistage (hierarchical). The crossbar interconnection network provides full interconnection capability between any two connected elements, a property known as nonblocking. It also provides less delay on data transfer. However, the crossbar approach exhibits an area growth rate of O(N²) for connecting N inputs to N outputs. On the other hand, hierarchical interconnection networks evolve with slower cost growth. Examples of this type of network include Banyan [3] and omega [5]. These networks have a cost growth of O(N log N) at the expense of longer delay between inputs and outputs. For the design of the array, a hierarchical network is appropriate because of the area requirement. The delay of the network is acceptable since the CAB's utilize current-mode and small-swing voltage-mode signals, which minimize the delay. However, the hierarchical networks mentioned above are designed for data transfer or routing between two layers of elements. They do not allow connections between elements in the same layer. Therefore, the elements are not capable of being fully connected. Furthermore, as the number of CAB's is increased, the dimension of the array will be proportional to N × log N and the layout of the array will be rectangular. However, a layout with an aspect ratio of one is usually desired in a VLSI design. Consequently, the networks mentioned above may not be suitable for the design of the array. From a VLSI point of view, the area of the layout is of most concern. Therefore, the interconnection network must have the property of "area universality" [8]. A network that is area universal is a network that, for a given area, can efficiently embed any circuit whose size is only slightly smaller. One such area-universal network is a
fat tree [8] shown in Fig. 4, which is also a hierarchical network. In fact, Leiserson [7] showed that the fat tree was near optimum in terms of the number of elements, required area (or volume), data transfer delay, and growth rate of routing channel size. Based on this network, the FPAA is designed with the CAB's (as the leaves of the fat tree) connected by switch blocks (SB's) in different levels of the fat tree. The switch blocks can be realized by using crossbar switches. However, the switch blocks can be simplified by constraining the number of allowable connections. With this in mind, the switch blocks just above the leaves are designed as shown in Fig. 5. The connections made by this SB will be determined by the contents of a local 10-b shift register, which is specified by the user. The SB also allows for polarity changes of the differential signals. Therefore, addition or subtraction among current-mode signals is possible. The overall structure of the SB's in different levels is designed by repeatedly embedding different netlists of various types of circuits (e.g., a Hopfield network) to the fat-tree network and appropriately locating the ON/OFF switches to fit these systems onto the network.

LEE AND GULAK: CMOS FIELD-PROGRAMMABLE ANALOG ARRAY 1863

Fig. 4. An area-universal fat tree (the dotted box indicated by 9 highlights the prototype implementation).

V. IMPLEMENTATION AND PROGRAMMING OF THE PROTOTYPE DESIGN

In order to demonstrate the feasibility of the circuits, a prototype containing the portion inside the dotted box 9 (Fig. 4) was designed, which consists of two CAB's and an SB. The shift registers of the SB and the shift registers of the CAB's are connected together as a chain. The configuration bits are set by shifting the bits into the chip serially. A control unit is needed to determine the required bit patterns for the configuration cycles as well as the clock phases of the shift registers to control the beginning and end of the different cycles.

The coefficients required by the coefficient multiplier are loaded before the actual configuration of the entire array. The bit pattern 100 for the CAB is dedicated to this operation. During the coefficient loading cycle, the control unit will shift a ONE followed by ZERO's into the array. When the shift register of a particular CAB contains the pattern 100, the global write signal W is set low by the control unit and the required coefficient value will be loaded to this CAB through the global wires G1 and G2 with an external differential voltage source. Before the next ZERO is shifted into the array, W will be reset high. Since only the first bit is a logic ONE, exactly one CAB will be in the coefficient loading mode. Therefore, the coefficients will be loaded into the corresponding CAB's sequentially.
After the coefficient loading cycle, the control unit will shift a specified bit pattern into the chip. This bit pattern will determine the operations of the CAB's and the connections between CAB's. During this cycle, the global write signal W will always be high in order to prevent disturbance of the coefficient values. At the end of this cycle, the clock phases for the shift registers will be kept low and the analog array is now configured and ready to operate.
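The coefficient loading cycle can be sketched as a walking ONE in the serial chain. The following is a behavioral illustration only. The chain order (CAB0, SB, CAB1), the shift direction, the names `CHAIN` and `load_coefficients`, and the coefficient values are all assumptions made for the example; the text states only that the 3-b CAB registers and the 10-b SB register form one chain.

```python
from typing import Dict, List, Tuple

# Hypothetical chain layout: (name, register width in bits), in shift order.
CHAIN: List[Tuple[str, int]] = [("CAB0", 3), ("SB", 10), ("CAB1", 3)]
LOAD_PATTERN = [1, 0, 0]  # CAB pattern dedicated to coefficient loading


def cab_windows(chain: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return the (start, stop) bit positions of each CAB register in the chain."""
    windows: Dict[str, Tuple[int, int]] = {}
    pos = 0
    for name, width in chain:
        if name.startswith("CAB"):
            windows[name] = (pos, pos + width)
        pos += width
    return windows


def load_coefficients(coeffs: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Shift a single ONE followed by ZEROs; log (clock, CAB, value) on each write."""
    total_bits = sum(width for _, width in CHAIN)
    bits = [0] * total_bits
    windows = cab_windows(CHAIN)
    log: List[Tuple[int, str, float]] = []
    for clock in range(1, total_bits + 1):
        incoming = 1 if clock == 1 else 0
        bits = [incoming] + bits[:-1]  # one shift per clock
        for name, (start, stop) in windows.items():
            if bits[start:stop] == LOAD_PATTERN:
                # Here W would be pulsed low and the value driven on G1/G2.
                log.append((clock, name, coeffs[name]))
    return log


writes = load_coefficients({"CAB0": 0.25, "CAB1": -0.5})
```

Under these assumptions CAB0 is written on clock 1 and CAB1 on clock 14. Because only one ONE is ever in the chain, no two CAB's can match the pattern 100 at the same time.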
The experimental prototype is fabricated in 1.2-μm CMOS technology. Some specifications of the CAB and the SB are shown in Table III. The die photo of the prototype chip is shown in Fig. 6. When testing the chip, one of the CAB's was configured as a signal multiplier and the other was configured as a current buffer. Internal interconnections between the CAB's and the external connections from the chip were made through the SB. The configuration bits were shifted into the chip by an auxiliary off-chip control circuit. Since inputs to the signal multiplier require a differential voltage-mode signal, two external variable voltage sources were used for one input to the multiplier. The other differential input to the multiplier was obtained from the output of the current buffer. Since the circuits operate in the subthreshold region, the differential output current is in the nanoampere range. In order to measure current in this range, two electrometers were used. The dc characteristic of the output was measured by varying the two current sources with different fixed voltages at the inputs of the multiplier. The results are shown in Fig. 7. The output differential current was fairly linear with respect to the input differential current. The worst-case percentage error over the -100- to 100-nA range of the input differential current was about 2%. The observed nonlinearity is most likely due to nonlinearities in the buffer. Since the MOSFET's in the buffer may not operate inside the subthreshold region, they may violate the exponential I_ds-V_gs characteristic required for the translinear circuit technique. The output offset current is about 20 nA, which is 3.5% of the maximum output current. The noise of the array was measured by taking samples at a rate of 50 samples/min at the output terminals for an extended period of time (20 min). The root mean square value of the noise was less than 1 nA.

Fig. 7. DC measurement of the four-quadrant Gilbert multiplier.
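The figures quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below derives the number of noise samples and the maximum output current implied by the stated offset percentage, and shows how a root-mean-square value would be computed from a list of samples. The implied maximum is a derived number, not a measured one, and the function name `rms` and the sample list are illustrative.

```python
import math
from typing import Sequence


def rms(samples_na: Sequence[float]) -> float:
    """Root-mean-square value of a sequence of current samples (nA)."""
    return math.sqrt(sum(x * x for x in samples_na) / len(samples_na))


# 50 samples/min taken for 20 min.
n_samples = 50 * 20

# A 20-nA offset stated to be 3.5% of the maximum output current.
offset_na = 20.0
implied_max_output_na = offset_na / 0.035  # about 571 nA

# Illustrative data only: alternating +/-1 nA has an RMS value of 1 nA.
example_rms = rms([1.0, -1.0, 1.0, -1.0])
```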
Another experiment involved the use of the current comparator by configuring one of the CAB's with the bit pattern 011. The connections between the CAB's and the instruments were the same as in the previous experiment. Fig. 8 shows the experimental results, which were compared with the HSPICE simulations. The experimental results had an offset current of about 30 nA.
VI. SIMULATIONS USING MACROMODELS
In this section, the analog operations discussed above are modeled by a set of simple circuit models. These models are then used in the simulation of the neural networks to reduce the computational requirements.
A. Macromodels of the Circuits
The macromodels are developed by first simulating the dc and ac characteristics of the analog operations and then modeling the operations with sets of resistors, capacitors, and dependent sources according to the simulation results derived from the extracted layout. As an illustration, the schematic diagram of the model for the current buffer is shown in Fig. 9. The models of other analog functions can be obtained using the same procedure. Different values of the resistors, capacitors, and dependent sources may be obtained if different processes are used. During the simulation of the circuits, the biasing current is set to 100 nA. The dc and ac characteristics of the models and the actual circuits are well matched. Nevertheless, the settling time of the actual circuit is longer than that of the model, since the impedance of the actual circuit varies as a function of the input differential current while the model has constant impedance over the entire input range. However, the model still provides a lower bound for the settling time. The model of the constant multiplier is the same as that of the signal multiplier with one of the inputs set to a constant differential voltage. Since the loading of the voltages to the analog memories is not involved in the simulations, the model of this operation is omitted. The interconnection switches are modeled by an RC network. The estimated wiring capacitance is also included in the model.
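To give a feel for how an RC model of the interconnection switches contributes delay, the sketch below computes the Elmore delay of an RC ladder. This is a standard first-order estimate and is not claimed to be the authors' method. The resistance and capacitance values are placeholders rather than values from the paper; the four segments mirror the four switches per connection assumed in the simulations of Section VI-B.

```python
from typing import Sequence


def elmore_delay(r_ohms: Sequence[float], c_farads: Sequence[float]) -> float:
    """Elmore delay to the far end of an RC ladder.

    r_ohms[i] is the series resistance of segment i and c_farads[i] is the
    capacitance to ground at the node after it.
    """
    if len(r_ohms) != len(c_farads):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(r_ohms, c_farads):
        upstream_r += r          # total resistance between source and this node
        delay += upstream_r * c  # each capacitor charges through that resistance
    return delay


# Placeholder values: four identical switches of 1 kOhm, each loading 0.1 pF.
R_SWITCH = 1.0e3
C_NODE = 1.0e-13
path_delay = elmore_delay([R_SWITCH] * 4, [C_NODE] * 4)
```

For n identical segments the sum reduces to RC·n(n+1)/2, so four switches in series cost ten single-segment time constants, not four.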
B. Neural Network Simulation Using the Macromodels
Based on the models developed, the multilayer feedforward network and the Hopfield network are simulated to illustrate the application of the FPAA in this area. Fig. 10 shows the schematic diagram of a single-layer neural network realized by the circuits discussed above. Four switches are assumed in each connection between different functions. The weights of the network were obtained by training the network with the delta rule [2]. The input differential voltage values of the multiplier models are set according to the resulting normalized weight values. The input and output patterns used during training are shown in Table IV. The response of the network is correct and is shown in Fig. 11.
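The weight-training step can be sketched as follows. This is a minimal illustration of the delta rule in its linear (Widrow-Hoff) form, followed by a hard threshold at recall. The training patterns are a hypothetical bipolar AND function, since the patterns of Table IV are not reproduced here, and the learning rate, epoch count, and function names are likewise assumptions.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def train_delta_rule(samples: Sequence[Sample], lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Train one linear neuron: w <- w + lr * (target - output) * input."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = bias + sum(w * xi for w, xi in zip(weights, x))
            error = target - output
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def threshold_output(weights: Sequence[float], bias: float,
                     x: Sequence[float]) -> int:
    """Hard-threshold the weighted sum, as the comparator function would."""
    net = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if net >= 0 else -1


# Hypothetical training set: bipolar AND.
PATTERNS: List[Sample] = [
    ((-1.0, -1.0), -1.0),
    ((-1.0, 1.0), -1.0),
    ((1.0, -1.0), -1.0),
    ((1.0, 1.0), 1.0),
]
weights, bias = train_delta_rule(PATTERNS)
```

On the array, the trained weights would then be normalized and applied as the differential input voltages of the multiplier models, as the text describes.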
The behavior of the networks due to the offset errors of the circuit is also studied and simulated with the use of a multilayer network to solve the classic EXCLUSIVE-OR problem. Table V … nA (5% of the maximum input current), incorrect results were produced as shown in Fig. 13. As shown in the simulation, the offset errors of the circuits may affect the performance of the neural networks.

[Figure legend: response of neuron 6; response of neuron 1.]

For the case of the Hopfield network, the patterns are stored at the minimum points of the following energy function:
where o_i is the output of neuron i, w_ij is the weight from i to j, and B_i is the constant input to neuron i.
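In the notation just defined, the Hopfield energy is commonly written as E = -(1/2) Σ_i Σ_j w_ij o_i o_j - Σ_i B_i o_i. That standard form is assumed in the sketch below; the sign convention for the constant-input term may differ from the paper's own equation. The sketch builds a four-neuron network with the outer-product rule, folds small offsets into B_i, and recalls a pattern by asynchronous updates. The offset values, expressed as fractions of the maximum input, and all function names are illustrative.

```python
from typing import List, Sequence


def outer_product_weights(patterns: Sequence[Sequence[int]]) -> List[List[float]]:
    """Outer-product rule with zero self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def energy(w: Sequence[Sequence[float]], b: Sequence[float],
           o: Sequence[int]) -> float:
    """Assumed standard form: -1/2 * sum w_ij o_i o_j - sum B_i o_i."""
    n = len(o)
    quad = sum(w[i][j] * o[i] * o[j] for i in range(n) for j in range(n))
    return -0.5 * quad - sum(b[i] * o[i] for i in range(n))


def recall(w: Sequence[Sequence[float]], b: Sequence[float],
           start: Sequence[int], max_sweeps: int = 10) -> List[int]:
    """Asynchronous updates until a full sweep changes no neuron."""
    o = list(start)
    for _ in range(max_sweeps):
        changed = False
        for j in range(len(o)):
            net = sum(w[i][j] * o[i] for i in range(len(o))) + b[j]
            new_state = 1 if net >= 0 else -1
            if new_state != o[j]:
                o[j] = new_state
                changed = True
        if not changed:
            break
    return o


STORED = [1, -1, 1, -1]
W = outer_product_weights([STORED])
OFFSETS = [0.0375, -0.02, 0.01, -0.0375]      # illustrative, within +/-3.75%
recalled = recall(W, OFFSETS, [1, 1, 1, -1])  # start with one element flipped
```

With zero offsets the stored pattern and its negation have the same energy, which matches the two minimum points described for the four-neuron example.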
The offset errors of the circuits are incorporated into the constant inputs B_i and therefore may affect the minimum points of the above function. An incorrect pattern will be recalled unless the minimum points are widely spread. If a correct pattern is recalled, the offset errors will cause a variation in the settling time. Fig. 14 shows the schematic diagram of a four-neuron Hopfield network. The output of the network with random offset errors in the range of -15 to 15 nA (3.75% of the maximum input current) at the input of each neuron is shown in Fig. 15. The weights of the network are obtained by the outer-product rule [10]. The network has two minimum points, which are (+1, -1, +1, -1) and (-1, +1, -1, +1). Since the two points are widely spread, the resulting pattern is correct, but the settling time varies with the random offset errors.

VII. CONCLUSION

The concept of developing a field-programmable analog array (FPAA) to realize different neural network topologies is proposed. The array is designed using subthreshold circuit techniques to achieve very low power dissipation. The interconnection network is based on an "area-universal" fat tree. Consequently, the layout of the array is compact and area efficient. An experimental prototype design was successfully implemented and tested. The prototype can be dynamically reconfigured to perform various analog functions. A set of macromodels for the circuits is developed. These models are then used in the simulation of various neural networks. Based on the prototype and simulation results, the FPAA concept appears to be a promising candidate for neural network applications.
