Abstract: Circuits of threshold elements (Boolean input, Boolean output neurons) have been shown to be surprisingly powerful. Useful functions such as XOR, ADD, and MULTIPLY can be implemented by such circuits more efficiently than by traditional AND/OR circuits. In view of that, we have designed and built a programmable threshold element. The weights are stored on polysilicon floating gates, providing long-term retention without refresh. The weight value is increased using tunneling and decreased via hot-electron injection. A weight is stored on a single transistor, allowing the development of dense arrays of threshold elements. A 16-input programmable neuron was fabricated in the standard 2 μm double-poly analog process available from MOSIS.
I. INTRODUCTION
In the field of neuromorphic analog VLSI, most research deals with implementing neurons that in some way learn or adapt [8], [11], [12]. That is because it is believed that the power of neural systems comes from their adaptive behavior. In fact, it has been shown that the function performed by a neuron, the sum of weighted inputs followed by a threshold, is by itself (without learning) a powerful building block. For many years, theoretical computer science has studied the power of such neurons in issues related to polynomial versus exponential size circuits and the general problem of NP-completeness. The basic problem is to build Boolean input, Boolean output threshold circuits that compute useful Boolean functions efficiently. Threshold circuits have been shown to be surprisingly powerful [1]. For example, integer division can be implemented by a polynomial-size threshold circuit of constant depth [3], [23]. In other words, to implement a threshold circuit that computes the division of two n-bit integers, one needs polynomially many (in n) threshold elements. On the other hand, using traditional logic circuits composed of AND, OR, and NOT gates requires exponentially many gates. That is also the case with simpler functions such as exclusive-OR and integer addition.
Many results from the theory of threshold circuits could be applied to the implementation of circuits on silicon. These include results such as the relationship between the maximal size allowed for the weights and the power of the resulting element or circuit [6], [9], as well as efficient designs for XOR, ADD, MULTIPLY, and other useful functions; see [13], [14], and [17]. For example, a simple application of the theory led us to the introduction of a multiple threshold element [5]. The latter reduces the area of the layout from $O(n^2)$ to $O(n)$ for certain Boolean functions, in particular symmetric functions such as PARITY.
Our research has three distinct goals.
1) The implementation aspect. To design and implement efficient threshold elements on silicon.
2) The theoretical aspect. To leverage the work done in theoretical computer science in order to design high performance threshold circuits in a systematic way.
3) The programmable aspect. To introduce threshold elements as building blocks in FPGA's.

Implementations of threshold circuits were proposed as early as the 1960's and 1970's [2], [24], [27], and more recently in [14] and [21]. To our knowledge, the theoretical results on threshold circuits have not been linked to any work involving silicon implementations. Programmable neuron-based hardware has recently been proposed [20], [22]. In the implementation section below, we show how those relate to our work. For a short overview of FPGA's see [25].

In Section II, we define the linear threshold element. In Section III, we compare threshold circuits to traditional logic circuits. In Section IV, we discuss the programmable aspect of the design. Section V shows the VLSI implementation and testing results. Finally, Section VI presents the multiple threshold element mentioned above. This element was presented in [5] from the theoretical point of view. It was compared to traditional threshold circuits and (AND, OR, NOT) circuits. Some of the results in [5] are summarized in Section VI, which also presents an implementation of the element on a 2 mm x 2 mm chip in 2 μm technology.

One may argue that even though LT circuits are more powerful, their building blocks are more complex and therefore will require a larger area in the circuit layout. This argument is correct to some extent. However, we hope that the exponential to polynomial decrease in the number of required elements dominates the penalty introduced by an increase in their size. The following section addresses the issue.
IV. PROGRAMMABLE VERSUS HARDWIRED WEIGHTS
One can look at FPGA's as circuits of elements in which the function that each element computes can be programmed; that is, it can be chosen from a set of available functions. In traditional FPGA's that set consists of AND, OR, and NOT. We propose a larger collection of functions, namely the set of Linear Threshold Functions, LT.
All the information about an LT gate is contained in the weights and threshold. We consider two ways of implementing the weights. 1) Hardwired weights are encoded in the width-to-length ratio of a transistor. 2) Programmable weights are stored as nonvolatile charge on a floating gate. Hardwired weights cannot be changed once the circuit has been fabricated, while programmable ones can. Hardwired weights present an interesting problem in terms of automated layout. Some functions, such as the comparison function, require weights ranging from 1 to $2^{n/2}$. AND, OR, and all symmetric functions can be implemented with identical weights. This difference implies that, using hardwired weights, some LT gates are larger than others.
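To make the contrast in weight sizes concrete, the following is a minimal Python sketch, not the layout procedure itself. The function names (`lt_gate`, `comparison_weights`, and so on), the convention that the gate fires when the sum is greater than or equal to the threshold, and the bit ordering are assumptions made for illustration. In this particular encoding the comparison weights reach a magnitude of $2^{n/2-1}$, growing exponentially with the operand width, whereas AND and OR use unit weights only.

```python
def lt_gate(weights, threshold, inputs):
    """Return 1 if the weighted sum of Boolean inputs reaches the threshold, else 0.

    The ">=" convention is an assumption of this sketch.
    """
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def and_gate(inputs):
    # Identical unit weights; the threshold equals the number of inputs.
    return lt_gate([1] * len(inputs), len(inputs), inputs)


def or_gate(inputs):
    # Identical unit weights; a single active input is enough.
    return lt_gate([1] * len(inputs), 1, inputs)


def comparison_weights(k):
    """Weights for deciding a >= b on two k-bit integers (n = 2k inputs).

    Bits are ordered least significant first: a_0..a_{k-1}, then b_0..b_{k-1}.
    """
    return [2 ** i for i in range(k)] + [-(2 ** i) for i in range(k)]


def compare_ge(a_bits, b_bits):
    # The weighted sum equals a - b, so the gate fires exactly when a >= b.
    k = len(a_bits)
    return lt_gate(comparison_weights(k), 0, list(a_bits) + list(b_bits))
```

With hardwired weights, each entry of `comparison_weights(k)` would correspond to a transistor of a different width-to-length ratio, which is why such a gate would occupy more area than an AND or OR gate with the same number of inputs.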
Using programmable weights simplifies the layout and allows one to modify the function that the LT element computes. In the next section we describe the details of the implementation.
V. IMPLEMENTATION AND RESULTS
In [22] the authors have fabricated a neuron-based circuit that implements an arbitrary Boolean function. We implement an arbitrary threshold element (a limited set of Boolean functions). The actual function is selected by modifying the weights. Fig. 4 shows the schematic. The threshold element consists of 16 nFET transistors with common source and drain, one pFET, and two inverters. In the case of the programmable LT element, the 16 transistors are p-base nFET's with an isolated poly layer (floating gate). Also for the programmable case, an additional nonfloating-gate nFET is included. It is used for programming the weights, as explained below.
The 16-input threshold element was fabricated using the standard 2 μm double-poly analog process available from MOSIS.
A. Description of Operation
The input transistors serve as multipliers. The multiplication relies on the fact that the inputs are Boolean: 0 V for a logical 0, and X volts for a logical 1, where X can vary from 1 to 5 V. An input generates current proportional to the corresponding weight. The sum, $\sum_{i=1}^{n} w_i x_i$, comes naturally as we connect all transistors to the same drain and source. The threshold is subtracted using a pFET (Fig. 4). That is another difference with the approach of [21], where a capacitive sum of voltages is used rather than a sum of currents. Finally, two inverters provide hard thresholding, pulling the output to logical 0 or logical 1.
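The chain of operations just described can be summarized in a short behavioral sketch in Python. It is an idealized model under stated assumptions, not a circuit simulation: the function name `lt_element_output`, the arbitrary current units, the mid-rail cutoff used to read an input voltage as Boolean, and the strict comparison at the output are all choices made for illustration.

```python
def lt_element_output(input_volts, weights, threshold, logic_high=5.0):
    """Idealized behavioral model of the current-summing threshold element.

    input_volts: one voltage per input, 0 V for logical 0, logic_high for logical 1.
    weights:     current contributed by each input at logical 1 (arbitrary units).
    threshold:   current subtracted by the pFET (same arbitrary units).
    """
    # Inputs are Boolean, so each transistor contributes its full weight or nothing.
    bits = [1 if v >= logic_high / 2 else 0 for v in input_volts]
    # Common drain and source: the individual currents simply add.
    summed = sum(w * b for w, b in zip(weights, bits))
    # The pFET subtracts the threshold current.
    net = summed - threshold
    # The two inverters restore a clean logic level (assumed strict comparison).
    return 1 if net > 0 else 0
```

For instance, with three unit weights and a threshold of 1.5 the model behaves as a 3-input majority gate: `lt_element_output([5, 5, 0], [1, 1, 1], 1.5)` evaluates to 1, while a single active input gives 0.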
B. Programming the Weights
We store the weights on polysilicon floating gates, using a single transistor per weight, providing long-term retention without refresh. To program in a new function, one modifies the weights via tunneling and hot-electron injection; see [11], [12], and [28] for similar applications of floating gates. There is a single tunneling line per LT element, by means of which one can clear its weights. To program the weights individually, we use hot-electron injection. For example, the weight corresponding to element i and input j (transistor (i,j) in Fig. 6) is addressed by selecting line i (Fig. 4) and input j. In other words, the pins used as inputs during normal operation of the chip are also used to program in the function; no extra pins are needed (except for one select line per element).
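The addressing scheme can be illustrated with a toy Python model. This is a hedged sketch only: the class `FloatingGateArray`, the integer weight levels, and the `erased_level` parameter are hypothetical, and the model assumes that clearing an element through its tunneling line brings all of its weights to a common high level from which injection then lowers each weight separately. Real programming involves analog voltages and timing that are not modeled here.

```python
class FloatingGateArray:
    """Toy model of weight programming; levels are abstract integers, not charges."""

    def __init__(self, n_elements, n_inputs, erased_level=15):
        # erased_level is a hypothetical level assumed to be reached after tunneling.
        self.erased_level = erased_level
        self.weights = [[erased_level] * n_inputs for _ in range(n_elements)]

    def tunnel(self, i):
        # One tunneling line per element: all of its weights are cleared together.
        self.weights[i] = [self.erased_level] * len(self.weights[i])

    def inject(self, i, j, steps=1):
        # Select line i plus input pin j address a single weight; injection lowers it.
        self.weights[i][j] = max(0, self.weights[i][j] - steps)

    def program(self, i, targets):
        # Clear the whole element, then lower each weight to its target level.
        self.tunnel(i)
        for j, target in enumerate(targets):
            self.inject(i, j, self.erased_level - target)
```

Calling `program(0, [3, 0, 15, 7])` on a two-element, four-input array changes only element 0, mirroring the fact that the select line isolates one element while the ordinary input pins pick the weight.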
As shown in [7], an analog memory cell, which is slightly more complex than the single-transistor storage used here, can store up to 14 bits of information, an amount largely sufficient for most practical threshold functions.

The maximum clock rate was found to be 1 MHz. Fig. 9 shows the response; both the input and the output are shown. The output is taken after the first inverter (see Fig. 4). The 16 input pins are all connected to the same input signal. The output signal is attenuated and lags behind the input, which is due in part to the pads used.
C. Measurements and Discussion
The static power dissipation depends on the particular value of the inputs and threshold. By varying them, a power dissipation of the order of 1 mW was observed. The maximal power dissipation occurs at values of X such that the sum $\sum_{i=1}^{n} w_i x_i$ is close to the threshold. At those values we may also get unstable behavior; noise may bring the output to either logical 0 or 1. In general, in such situations the circuit delay is high, since it takes a long time for the output to stabilize. One can avoid this problem by selecting the weights $w_i$ in such a way that the above situation never occurs. That is, for all inputs X, $\left|\sum_{i} w_i x_i\right| > \epsilon$, where the sum is understood to include the threshold term and $\epsilon$ is the margin. For more details on how to set the margin see [4].
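As an illustration of the margin condition, the following Python sketch computes the smallest distance between the weighted sum and the threshold over all Boolean inputs. The function name `margin` and the exhaustive enumeration are assumptions of this sketch; it is practical for a 16-input element (65 536 input vectors) but is not the selection procedure of [4].

```python
from itertools import product


def margin(weights, threshold):
    """Smallest |sum(w_i * x_i) - threshold| over all Boolean input vectors."""
    return min(
        abs(sum(w * x for w, x in zip(weights, bits)) - threshold)
        for bits in product((0, 1), repeat=len(weights))
    )
```

For a 3-input majority gate with unit weights, a threshold of 2 gives a margin of 0, because two active inputs land exactly on the threshold, whereas a threshold of 1.5 gives a margin of 0.5 and keeps every input vector away from the unstable point.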
The above measurements are meant to provide a qualitative characterization of the prototype. In our initial implementation, no steps were taken to optimize parameters such as power dissipation, speed, and noise margin. For example, using larger inverter transistors can increase the speed of the circuit at the expense of power.

A multiple threshold element (LTM) computes an arbitrary (with polynomially many transitions) Boolean function of the weighted sum of its inputs (see Fig. 10).
What is the advantage of LTM with respect to LT? We show that LTM circuits are more amenable to implementation than LT circuits. In particular, the area of the VLSI layout is reduced from $O(n^2)$ in LT circuits to $O(n)$ in LTM circuits, for n-input symmetric Boolean functions.
Definition 2: (LT gate with multiple transitions, LTM)
A function $f$ is in LTM if there exists a set of weights $w_i \in \mathbb{Z}$, $1 \le i \le n$, and a function $h: \mathbb{Z} \to \{0,1\}$ such that
$$f(X) = h\left(\sum_{i=1}^{n} w_i x_i\right)$$
for all $X \in \{0,1\}^n$.
The only constraint on $h$ is that it undergoes polynomially many transitions. A single LTM element can implement the n-input parity function. A single layer of such elements performs multiple addition. Fig. 11 shows the LTM element used to compute bit 3 of the addition of two 4-bit integers. For more details, examples, and proofs of the above claims, refer to [4] and [5].
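The two examples above can be sketched in Python as follows. This is an illustrative model, not the element of Fig. 11: the names `ltm_gate`, `parity`, and `sum_bit` are hypothetical, bits are assumed to be zero-indexed with the least significant bit first, and the weights for the addition example (powers of two on both operands) are one natural choice rather than necessarily the one used in the figure.

```python
def ltm_gate(weights, h, inputs):
    """Apply an arbitrary Boolean function h to the integer weighted sum."""
    return h(sum(w * x for w, x in zip(weights, inputs)))


def parity(inputs):
    # Unit weights; h flips at every increment of the sum, so it has n transitions.
    return ltm_gate([1] * len(inputs), lambda s: s % 2, inputs)


def sum_bit(k, a_bits, b_bits):
    """Bit k (zero-indexed, least significant first) of a + b.

    Only bits 0..k of each operand matter; higher bits add multiples of 2**(k+1).
    """
    weights = [2 ** i for i in range(k + 1)] * 2  # same weights for a and for b
    inputs = list(a_bits[: k + 1]) + list(b_bits[: k + 1])
    # h reads bit k of the weighted sum; it changes value only three times.
    return ltm_gate(weights, lambda s: (s >> k) & 1, inputs)
```

In both cases a single weighted sum is formed and all of the remaining work is done by $h$, which is what distinguishes an LTM element from an LT element with its single transition.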
The theoretical results about LTM can be applied to the VLSI implementation of Boolean functions. The idea of a gate with multiple thresholds came to us as we were looking for an efficient VLSI implementation of symmetric Boolean functions. Even though a single LT gate is not powerful enough to implement any symmetric function, a two-layer LT circuit ($LT_2$) is. The $LT_2$ layout of a symmetric function requires area of $O(n^2)$, while using LTM one needs only area of $O(n)$. Implementing a generalized symmetric function in $LT_2$ requires up to n LT gates in the first layer. Those have the same weights $w_i$ except for the threshold $w_0$. Instead of laying out the same linear sum $\sum_{i=1}^{n} w_i x_i$ n times, we do it once and compare the result to n different thresholds. The resulting circuit corresponds to a single LTM gate. The $LT_2$ layout is redundant: it has n copies of each weight, requiring area of at least $O(n^2)$. On the other hand, LTM performs a single weighted sum, so its area requirement is $O(n)$.
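The sharing of one weighted sum among several thresholds can be sketched in Python as follows. It is a simplified model under stated assumptions: the symmetric function is given as a hypothetical table `truth_by_count` indexed by the number of active inputs, unit weights are used, and `weight_counts` only tallies stored weights as a rough proxy for layout area.

```python
def transitions(truth_by_count):
    """Thresholds at which the symmetric function changes value."""
    return [
        t
        for t in range(1, len(truth_by_count))
        if truth_by_count[t] != truth_by_count[t - 1]
    ]


def symmetric_ltm(truth_by_count, inputs):
    """Evaluate a symmetric function with one shared sum and several thresholds."""
    s = sum(inputs)  # the single weighted sum (unit weights)
    # Each crossed threshold toggles the output once.
    crossed = sum(1 for t in transitions(truth_by_count) if s >= t)
    return truth_by_count[0] ^ (crossed % 2)


def weight_counts(n):
    """Stored weights: n first-layer gates of n weights each versus one shared sum."""
    return {"LT2": n * n, "LTM": n}
```

For the 16-input case, `weight_counts(16)` gives 256 weights for the two-layer layout against 16 for the shared sum, which is the quadratic versus linear gap discussed above.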
One such element was fabricated on a 2 mm x 2 mm chip, using 2 μm technology from MOSIS. It has 16 inputs. The output consists of a 4-bit bus addressing a 4-bit memory cell. The weighted sum is implemented in the neuron MOS fashion, as a capacitive sum of voltages (see [16], [21]), as opposed to the sum of currents used in the layout of the LT gate (Fig. 4).
VII. CONCLUSION
We have fabricated and tested a 16-input programmable linear threshold element using floating gates to store the weights. Such storage requires no refresh and allows the weights to be modified via tunneling and injection. We have fabricated a second chip implementing a 16-input multi-threshold element. A single multi-threshold element can implement XOR and integer addition. It takes advantage of the fact that some useful Boolean functions can be implemented by a two-layer LT circuit in which all elements of the first layer have the same weights. That allows the area to be reduced from $O(n^2)$ to $O(n)$ by implementing the weighted sum only once.
We focused on a qualitative characterization of the prototype, as a proof of concept, rather than a quantitative comparison with traditional digital logic. The theoretical results in threshold logic suggest that the number of elements used in a threshold circuit is significantly smaller than the corresponding number for digital logic circuits, for certain useful Boolean functions, as the number of inputs grows. However, for practical purposes, a thorough, quantitative comparison between the "threshold element" and the "traditional digital logic" element is required.
From the practical point of view, one possible extension of this research is to devise a systematic (maybe automated) way of generating the layout of threshold circuits with hardwired weights. Another direction of research is to incorporate programmable threshold elements as building blocks in FPGA's.
