Circuits of threshold elements (Boolean input, Boolean output neurons) have been shown to be surprisingly powerful. Useful functions such as XOR, ADD and MULTIPLY can be implemented by such circuits more efficiently than by traditional AND/OR circuits. In view of that, we have designed and built a programmable threshold element. The weights are stored on polysilicon floating gates, providing long-term retention without refresh. The weight value is increased using tunneling and decreased via hot electron injection. A weight is stored on a single transistor, allowing the development of dense arrays of threshold elements. A 16-input programmable neuron was fabricated in the standard 2 µm double-poly analog process available from MOSIS. A long-term goal of this research is to incorporate programmable threshold elements as building blocks in Field Programmable Gate Arrays.
Introduction
In the field of neuromorphic analog VLSI, most research deals with implementing neurons that in some way learn or adapt [7], [9], [10]. That is because it is believed that the power of neural systems comes from their adaptive behavior. In fact, it has been shown that the function performed by a neuron (the sum of weighted inputs followed by a threshold) is by itself, without learning, a powerful building block. For many years, theoretical computer science has studied the power of such neurons, in issues related to polynomial versus exponential size circuits and the general problem of NP-completeness. The basic problem is to build Boolean-input, Boolean-output threshold circuits that compute useful Boolean functions efficiently. Threshold circuits have been shown to be surprisingly powerful [1]. For example, integer division can be implemented by a polynomial-size threshold circuit of constant depth [3], [20]. In other words, if one is to implement a threshold circuit to compute the division of two n-bit integers, one needs polynomially many (in n) threshold elements. On the other hand, using traditional logic circuits, composed of AND, OR and NOT gates, requires exponentially many gates. That is also the case with simpler functions such as exclusive-OR and integer addition. Many results from the theory of threshold circuits could be applied to the implementation of circuits on silicon, for example results on the relationship between the maximal size allowed for the weights and the power of the resulting element or circuit [5], [24], and more recently [13], [18]. To our knowledge, the theoretical results on threshold circuits have not been linked to any work involving silicon implementations. Programmable neuron-based hardware has been recently proposed [17], [19]. In the implementation section below, we show how those relate to our work. For a short overview of FPGA's see [22]. In Section 2 we define the linear threshold element.
In Section 3 we compare threshold circuits to traditional logic circuits. In Section 4 we discuss the programmable aspect of the design. Section 5 shows the implementation and results.

Although we could allow the weights, $w_i$, to be real numbers, it is known [16] that for an arbitrary linear threshold function one can use integers and needs at most $O(n \log n)$ bits per weight, where $n$ is the number of inputs.

It outputs 1 only when all inputs are 1, therefore:

Figure 2 shows the diagram for $f$ along with two other Boolean functions that can be realized by a single threshold element. Majority is defined in Example 2 below.
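As an illustrative sketch of a single threshold element, the following Python models a gate that outputs 1 when $w_0 + \sum_i w_i x_i \ge 0$. This sign convention, the function names, and the choice of strict majority are assumptions made for the example, not definitions taken from the paper.

```python
from itertools import product

def lt_gate(weights, w0, x):
    """Linear threshold gate: 1 if w0 + sum(w_i * x_i) >= 0, else 0.

    The ">= 0" convention is an assumption made for this sketch.
    """
    return 1 if w0 + sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

def and_gate(x):
    # Unit weights, threshold n: fires only when all n inputs are 1.
    n = len(x)
    return lt_gate([1] * n, -n, x)

def or_gate(x):
    # Unit weights, threshold 1: fires when at least one input is 1.
    return lt_gate([1] * len(x), -1, x)

def majority_gate(x):
    # Strict majority (assumed definition): fires when more than half
    # of the inputs are 1, i.e. 2 * sum(x) >= n + 1.
    n = len(x)
    return lt_gate([2] * n, -(n + 1), x)

if __name__ == "__main__":
    for x in product([0, 1], repeat=3):
        print(x, and_gate(x), or_gate(x), majority_gate(x))
```

Each of the three functions uses one gate with small integer weights; only the weights and the threshold differ.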
Neural logic versus conventional logic
Why bother using threshold elements, given that any Boolean function can be implemented, in a systematic way, by a circuit of AND, OR and NOT gates (an AON circuit)? The reason is that for some functions, such as exclusive-OR (XOR), the number of elements in the AON circuit grows exponentially with the number of bits in the input. On the other hand, if one uses linear threshold elements, the number of gates is linear in the number of input bits. This is shown in Figure 3 for a 3-bit input. In general, a depth-2 AON circuit computing XOR of $n$ bits requires at least $2^{n-1} + 1$ gates. Using LT, one needs only $n + 1$ gates.
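The $n + 1$ gate count can be checked with a small sketch. The construction below is one standard way to reach that count and is not necessarily the circuit drawn in Figure 3: first-layer gate $k$ fires when at least $k$ inputs are 1, and a single output gate combines them with alternating weights. All function names are hypothetical.

```python
from itertools import product

def lt_gate(weights, w0, x):
    """Linear threshold gate: 1 if w0 + sum(w_i * x_i) >= 0 (assumed convention)."""
    return 1 if w0 + sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

def xor_lt(x):
    """XOR of n bits with n first-layer LT gates and 1 output LT gate."""
    n = len(x)
    # First layer: gate k (k = 1..n) fires when at least k inputs are 1.
    first = [lt_gate([1] * n, -k, x) for k in range(1, n + 1)]
    # Output gate: alternating weights +1, -1, +1, ... and threshold 1.
    # With s ones in x, exactly the first s gates fire, so the
    # alternating sum is 1 for odd s and 0 for even s.
    alternating = [1 if k % 2 == 0 else -1 for k in range(n)]
    return lt_gate(alternating, -1, first)

def gate_counts(n):
    """Gate counts quoted in the text for XOR of n bits."""
    return {"AON_depth2": 2 ** (n - 1) + 1, "LT": n + 1}

if __name__ == "__main__":
    for x in product([0, 1], repeat=3):
        print(x, xor_lt(x))
    print(gate_counts(3))
```

For the 3-bit case this gives 4 LT gates against 5 gates for the depth-2 AON circuit; the gap widens exponentially with $n$.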
Programmable versus hardwired weights
One can look at FPGA's as circuits of elements in which the function that each element computes can be programmed; that is, it can be chosen from a set of available functions. In traditional FPGA's that set consists of AND, OR and NOT. We propose a larger collection of functions, namely the set of Linear Threshold functions, LT.
All the information about an LT gate is contained in the weights and threshold. We consider two ways of implementing the weights. Hardwired weights cannot be changed once the circuit has been fabricated, while programmable ones can. Hardwired weights present an interesting problem in terms of automated layout. Some functions, such as the comparison function, COMP, require weights ranging from 1 to $2^{n/2}$. Figure 4 shows an 8-bit COMP function. AND, OR and all symmetric functions can be implemented with small weights. This difference implies that, using hardwired weights, some LT gates are larger than others.
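To illustrate why COMP needs large weights, here is a minimal sketch of a single LT gate comparing two $m$-bit integers, so that $n = 2m$ inputs. It assumes COMP outputs 1 when $x \ge y$ and that bits are given least significant first; both are assumptions for the example. Under this convention the weight magnitudes run from 1 to $2^{m-1}$, which grows exponentially with $n/2$.

```python
def lt_gate(weights, w0, x):
    """Linear threshold gate: 1 if w0 + sum(w_i * x_i) >= 0 (assumed convention)."""
    return 1 if w0 + sum(w * xi for w, xi in zip(weights, x)) >= 0 else 0

def to_bits(value, m):
    """Little-endian list of the m low bits of value."""
    return [(value >> i) & 1 for i in range(m)]

def comp_weights(m):
    """Weights +2^i for the bits of x and -2^i for the bits of y."""
    return [2 ** i for i in range(m)] + [-(2 ** i) for i in range(m)]

def comp(x_bits, y_bits):
    """1 if the integer x is >= the integer y, using a single LT gate."""
    m = len(x_bits)
    return lt_gate(comp_weights(m), 0, list(x_bits) + list(y_bits))

if __name__ == "__main__":
    print(comp_weights(4))                    # weight magnitudes 1, 2, 4, 8
    print(comp(to_bits(9, 4), to_bits(5, 4)))
```

Hardwiring such a gate means some transistors must be far larger than others, which is the layout problem noted above.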
Using programmable weights simplifies the layout, and allows one to modify the function that the LT element computes. In the next section we describe the details of the implementation.
Implementation and Results
In [19] the authors fabricated a neuron-based circuit that implements an arbitrary Boolean function. We implement an arbitrary threshold element (a limited set of Boolean functions). The actual function is selected by modifying the weights. Figure 5 shows the schematic implementation. A 16-input threshold element was fabricated using the standard 2 µm double-poly analog process available from MOSIS. See Figure 6 for the layout. The 16 inputs are fed to all four gates via metal 2 (purple); this layout allows one to build dense arrays of threshold elements.
We store the weights on polysilicon floating gates, using a single transistor per weight, providing long-term retention without refresh. The multiplication relies on the fact that the inputs are Boolean: 0 volts for a logical 0, and X volts for a logical 1, where X can vary from 1 to 5 volts. An input generates a current proportional to the corresponding weight.
The sum, $\sum_i w_i x_i$, comes naturally as we connect all transistors to the same node. That is another difference from the approach of [18], where a capacitive sum of voltages is used rather than a sum of currents. Finally, two inverters provide hard thresholding, pulling the output to logical 0 or logical 1.
To program a new function, one modifies the weights via tunneling (increasing) and hot electron injection (decreasing); see [9], [10], [23] for similar applications of floating gates. As shown in [6], an analog memory cell, which is slightly more complex than the single-transistor storage used here, can store up to 14 bits of information, an amount largely sufficient for most practical threshold functions.
We tested the linearity of our threshold element by detecting the value of the threshold, $w_0$, at which $w_0 + \sum_i x_i = 0$, while varying the number of 1's in the input vector. A value of 1 volt was used for logical 1. Figure 7 shows the result.
Notice the square-root shape of the data. This illustrates an important point: the voltage one needs to apply in order to get a certain value of $T$ is not linear in $T$. For an nFET operating above or below threshold, the contributions of a single input are, respectively:

[Figure 7 axis label: Number of inputs at 1 Volt]

Such non-linearities result in a large dynamic range.
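For reference, the standard first-order nFET models for the two operating regimes are sketched below. These are textbook forms and are assumptions here, not necessarily the expressions the authors used; the symbols $k$, $V_T$, $I_0$, $\kappa$, $U_T$, $I_1$ and $V_0$ are introduced only for this sketch. The last two lines show how a square-root dependence could arise if the threshold is set by the gate voltage $V_0$ of a transistor operating above threshold and balanced against $N$ identical input currents $I_1$.

```latex
% Above threshold (saturation, square law):
I_{\mathrm{above}} \approx \frac{k}{2}\,(V_g - V_T)^2

% Below threshold (exponential):
I_{\mathrm{below}} \approx I_0 \, e^{\kappa V_g / U_T}

% Assumed balance at the switching point, N inputs at logical 1:
N I_1 = \frac{k}{2}\,(V_0 - V_T)^2
\quad\Longrightarrow\quad
V_0 = V_T + \sqrt{\frac{2 N I_1}{k}}
```

Under these assumptions the required voltage grows as $\sqrt{N}$, which is consistent with the shape noted above, while the exponential regime would compress a wide range of currents into a small voltage range.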
Conclusion
We have fabricated and tested a 16-input programmable linear threshold element using floating gates to store the weights. Such storage requires no refresh and allows the weights to be modified via tunneling and injection. We have also fabricated a second chip implementing a multi-threshold element. A single multi-threshold element can implement XOR and integer addition. It takes advantage of the fact that some useful Boolean functions can be implemented by a 2-layer LT circuit in which all gates of the first layer have the same weights. That allows the area to be reduced from $n^2$ to $n$ by implementing the weighted sum only once. See [4] for further details.
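One possible reading of the multi-threshold idea is sketched below as a behavioral model, not as a description of the fabricated chip. The weighted sum is computed once and compared against several thresholds; the output is the parity of the number of thresholds reached, which is what a second-layer LT gate with alternating weights would compute. The function names, the little-endian bit order, and the particular threshold choices are assumptions for the example.

```python
def multi_threshold(weights, thresholds, x):
    """Shared weighted sum compared against several thresholds.

    Returns the parity of the number of thresholds reached; this models
    a first layer with identical weights followed by an alternating-weight
    output gate (an assumed reading of the multi-threshold element).
    """
    s = sum(w * xi for w, xi in zip(weights, x))  # computed only once
    reached = sum(1 for t in thresholds if s >= t)
    return reached % 2

def xor_mt(x):
    """XOR of n bits: unit weights, thresholds 1..n."""
    n = len(x)
    return multi_threshold([1] * n, range(1, n + 1), x)

def to_bits(value, m):
    """Little-endian list of the m low bits of value."""
    return [(value >> i) & 1 for i in range(m)]

def add_bit(x_bits, y_bits, k):
    """Bit k of x + y from one shared sum and three thresholds."""
    n = len(x_bits)
    # Only bits 0..k of each operand can influence bit k of the sum.
    w = [2 ** i if i <= k else 0 for i in range(n)]
    thresholds = [m * 2 ** k for m in (1, 2, 3)]
    return multi_threshold(w + w, thresholds, list(x_bits) + list(y_bits))

def add(x_bits, y_bits):
    """All n + 1 bits of x + y, little-endian."""
    return [add_bit(x_bits, y_bits, k) for k in range(len(x_bits) + 1)]

if __name__ == "__main__":
    print(xor_mt([1, 0, 1, 1]))
    print(add(to_bits(11, 4), to_bits(6, 4)))
```

For XOR, a plain 2-layer circuit stores $n$ weights in each of $n$ first-layer gates, while the shared sum stores them once, which matches the $n^2$ to $n$ reduction described above.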
From the practical point of view, one possible extension of this research is to devise a systematic (maybe automated) way of generating the layout of threshold circuits with hardwired weights. Another direction of research is to incorporate programmable threshold elements as building blocks in FPGA's.
Acknowledgments
This work was supported in part by the NSF Young Investigator Award CCR-9457811, by the Sloan Research Fellowship, by a grant from the IBM Almaden Research Center, San Jose, California, by the Center for Neuromorphic Systems Engineering as a part of the National Science Foundation Engineering Research Center Program, and by the California Trade and Commerce Agency, Office of Strategic Technology. The authors would like to thank the reviewers for their comments. Special thanks to Vincent Koosh for helping with the testing and analysis of the chip.
