Abstract - This paper reviews silicon implementations of threshold logic gates, covering several decades. It details numerous VLSI implementations including: capacitive (switched capacitor and floating gate with their variations), conductance/current (pseudo-nMOS, output-wired-inverters, and a plethora of solutions evolved from them), as well as many differential solutions. Nanoelectronic implementations (e.g., based on negative resistance devices and on single electron technologies) are briefly mentioned.
I. INTRODUCTION
Research on neural networks (NNs) goes back sixty years. The seminal year for the development of the "science of mind" was 1943, when the first mathematical model of a neuron operating in an all-or-none fashion, the threshold logic (TL) gate (TLG), was introduced [2].
In the last decade, the tremendous impetus of VLSI technology has made neurocomputer design a lively research topic. Research on hardware implementations of NNs, and on TL in particular, has been very active. In this invited review paper we focus only on the many different approaches that have been tried for implementing TL in silicon. The effectiveness of TL as an alternative to modern VLSI design is determined by the availability, cost, and capabilities of the basic building blocks. In this sense, many interesting circuit concepts for developing CMOS-compatible TLGs have been explored. As the number of different proposed solutions reported in the literature is on the order of hundreds, we cannot mention all of them here. Instead, we cover the important types of architectures and present several representative examples. To keep the paper's length reasonable, the early days of TL implementations (i.e., when the technologies were TTL, ECL, I2L, and nMOS), although quite instructive, are not covered here (the interested reader should consult [1]).
II. CMOS SOLUTIONS
Probably the first pure CMOS solution is due to Hampel [48] (Fig. 1(a)). The CMOS devices form a plurality of TLG configurations having majority (MAJ) logic functions with nearly symmetrical switching delay times. MAJ functions are Threshold Logic Functions (TLFs) having identical weights. A MAJ gate can implement arbitrary TLFs by repeating or complementing some of its inputs. The fact that the nMOS and pMOS stacks are alike leads to symmetrical switching delay times. This solution has low power consumption and large noise margins, but larger fan-in gates are slow.
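The statement that a MAJ gate can realize arbitrary TLFs by repeating or complementing inputs can be illustrated with a small behavioral sketch. It assumes integer weights and, as an additional assumption not spelled out in the text, that spare gate inputs may be tied to constant 0 or 1 to position the threshold. All function names are hypothetical and chosen only for illustration; this is not the circuit of [48].

```python
from itertools import product


def threshold_gate(inputs, weights, threshold):
    """Return 1 if the weighted sum of the binary inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def majority(bits):
    """Return 1 if more than half of the bits are 1 (odd fan-in assumed)."""
    assert len(bits) % 2 == 1, "majority gate needs an odd fan-in"
    return int(sum(bits) > len(bits) // 2)


def threshold_via_majority(inputs, weights, threshold):
    """Evaluate an integer-weight TLF with a single majority gate."""
    wired = []
    for w, x in zip(weights, inputs):
        if w < 0:
            # w*x == |w|*(1 - x) - |w|: complement the input and raise the threshold.
            x, w, threshold = 1 - x, -w, threshold - w
        # Repeating an input w times gives it weight w.
        wired += [x] * w
    # Assumption: pad with constants so the majority point lands on the threshold.
    pad = 2 * threshold - len(wired) - 1
    wired += [0] * pad if pad >= 0 else [1] * (-pad)
    return majority(wired)


def agrees_everywhere(weights, threshold):
    """Exhaustively compare both evaluations over all binary input vectors."""
    return all(
        threshold_gate(bits, weights, threshold)
        == threshold_via_majority(bits, weights, threshold)
        for bits in product((0, 1), repeat=len(weights))
    )
```

For example, `agrees_everywhere((2, 1, -1), 2)` compares the two evaluations on all eight input vectors of a three-input TLF with one negative weight. The fan-in of the resulting MAJ gate grows with the sum of the weight magnitudes, which is one way to see why large fan-in gates matter for this style.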
A NULL Convention Logic gate [4] (Fig. 1(b)) receives a plurality of inputs, each having an asserted state and a NULL state. The TLG switches its output to an asserted state when the number of asserted inputs exceeds a threshold number. The TLG switches its output to the NULL state only after all inputs have returned to NULL. This is an asynchronous delay-insensitive design. The gate has low power consumption and large noise margins, and is reasonably fast for small fan-ins.
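The set/hold/reset behavior described above can be sketched as a small state machine. This is a behavioral model only, not the transistor-level gate of [4]; the class name is hypothetical, and the convention that the output is asserted once at least m inputs are asserted is an assumed reading of the threshold.

```python
class HysteresisThresholdGate:
    """Behavioral model of a threshold gate with NULL-convention hysteresis."""

    def __init__(self, m):
        self.m = m        # assumed convention: assert when at least m inputs are asserted
        self.output = 0   # 0 models the NULL state, 1 models the asserted state

    def update(self, inputs):
        """Apply a new vector of 0/1 inputs and return the resulting output."""
        asserted = sum(inputs)
        if asserted >= self.m:
            self.output = 1   # enough asserted inputs: set the output
        elif asserted == 0:
            self.output = 0   # all inputs back to NULL: reset the output
        # In every other case the previous output is held.
        return self.output
```

With `gate = HysteresisThresholdGate(2)`, applying `[1, 1, 0]` sets the output, a following `[1, 0, 0]` leaves it set, and only `[0, 0, 0]` returns it to NULL. The hold state is what makes the gate usable in delay-insensitive designs.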
Another solution is based on pass-transistor logic. It offers an attractive alternative to pure CMOS. A steering circuit, which produces all the TLFs for an n-input logic function, was detailed in [3]. A distinguishing characteristic is that pass-transistor-based implementations depend only on the number of variables, not on their associated weights. This design inherits the problems specific to static pass-transistor logic.
III. CAPACITIVE IMPLEMENTATIONS
In the first capacitive solution (Fig. 2(a)), the main idea was to use switched capacitors, switches, and inverters, and to take advantage of the inherent saturation of the inverters to implement the neuron non-linearity without additional elements [7]. This first approach required a somewhat complex three-phase clock. The principle was also presented in [5]. It quickly evolved into a simpler two-phase clock solution [6], known as the Capacitive Threshold Logic (CTL) gate (Fig. 2(b)). CTL gates have a regular structure and are able to implement large fan-ins, while their main drawbacks are: large delays, large area, DC power consumption, and the threshold value programming mechanism. The reset time grows with the fan-in, and can become large. Propagation delay is logarithmic in the fan-in, and has a strong dependence on the unit capacitor [6].
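The weighted summation that capacitive gates rely on can be sketched with an idealized charge-conservation model: an initially discharged floating node, coupled to each input through a capacitor, settles at the capacitance-weighted average of the input voltages. The clocking, reset phase, and sense amplifier of an actual CTL gate are not modeled, and the function names, the parasitic term, and the default values are assumptions made for illustration.

```python
def floating_node_voltage(input_voltages, capacitances, parasitic=0.0):
    """Voltage on an initially discharged floating node coupled to the inputs.

    The parasitic capacitance is assumed to be connected to ground.
    """
    total = sum(capacitances) + parasitic
    return sum(c * v for c, v in zip(capacitances, input_voltages)) / total


def capacitive_threshold_gate(bits, weights, v_switch, vdd=1.0, c_unit=1.0):
    """Compare the capacitively summed inputs with an inverter switching point."""
    caps = [w * c_unit for w in weights]   # weight w is realized as w unit capacitors
    volts = [vdd * b for b in bits]        # logic 1 drives vdd, logic 0 drives ground
    return int(floating_node_voltage(volts, caps) > v_switch)
```

In this model the voltage difference between two adjacent weighted sums is `vdd * c_unit` divided by the total capacitance, so it shrinks as the fan-in grows. That gives some intuition for why the unit capacitor and the sensing stage weigh so heavily on area, delay, and power.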
The area of the unit capacitor is equivalent to several minimum-sized inverters. Due to the linear operation of the sense amplifier, the power consumption is high. Several solutions for overcoming CTL's limitations are detailed in [43]. One low-power solution is C3L [11] (Fig. 4(a)). Another low-power solution is the Charge Recycling Threshold Logic (CRTL) gate [9] (Fig. 4(b)).
CRTL gates exhibit high speeds and are suitable for high fan-ins, while also having low power consumption. In fact, CRTL gates achieve the highest speed and 15-20% lower power consumption when compared with clocked vMOS [12], C3L [11], and LCTL [23]. Recently, a Self-Timed Threshold Logic (STTL) has been proposed [10] (Fig. 4(c)). That work describes a "capacitor sharing" technique for significantly reducing the occupied area, which can be applied to other vMOS implementations. The self-timing idea comes from asynchronous circuits; eliminating the clock can reduce the power even more.
The second solution is based on inverters with their outputs hard-wired together, and was detailed in 1973 [46] (Fig. 5(a)). Unfortunately, due to the sensitivity to process variations of both the voltage on the common node and the Vth of the output inverter, the output-wired-inverter TLGs are fan-in limited. These TLGs are extremely fast, while exhibiting high power consumption (acceptable when traded off for speed), as well as narrow noise margins. The output-wired-inverter TLGs have been rediscovered several times. In [44] a very fast NOR gate was presented: it is Lerch's construction [46] without the restoring inverter. This gate was used in the MIPS R2010 [44]. Later, Schultz et al. [18] rediscovered Lerch's original construction, and called it "Ganged-CMOS logic" (GCMOS) [47].
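A rough feel for the fan-in limitation of output-wired inverters can be obtained from a deliberately crude model in which every conducting transistor is replaced by a linear resistor. This is an assumption made only for illustration (real devices are strongly non-linear), and the function names and default values are hypothetical.

```python
def common_node_voltage(bits, vdd=1.0, r_p=1.0, r_n=1.0):
    """Common-node voltage of output-wired inverters, linear-resistor model."""
    g_up = sum(1 for b in bits if b == 0) / r_p     # pMOS devices conduct for low inputs
    g_down = sum(1 for b in bits if b == 1) / r_n   # nMOS devices conduct for high inputs
    return vdd * g_up / (g_up + g_down)


def wired_inverter_gate(bits, v_th=0.5, vdd=1.0):
    """Restoring inverter: output is 1 when the common node falls below v_th."""
    return int(common_node_voltage(bits, vdd=vdd) < v_th)


def node_voltage_step(fan_in, vdd=1.0):
    """Voltage change on the common node when one more input goes high."""
    one_high = [1] + [0] * (fan_in - 1)
    two_high = [1, 1] + [0] * (fan_in - 2)
    return common_node_voltage(one_high, vdd) - common_node_voltage(two_high, vdd)
```

With equal resistances the step between adjacent input counts is `vdd / fan_in`, so the margin available to absorb variations of the common-node voltage and of the output inverter's Vth shrinks as inputs are added. Both pull-up and pull-down paths conduct for mixed inputs, which is consistent with the high static power noted above.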
B. Beyond Pseudo-nMOS
A lot of effort has been devoted to reducing the power consumption of large fan-in pseudo-nMOS gates. The other drawback, the reduced noise margins, was left as an open question. The main idea for reducing the DC power was to replace the pMOS load with a more complex load circuit. Such solutions rely on: using asynchronous feedback and/or feedforward, reducing the voltage swings, using a clock signal (dynamic solutions), using controlled current mirrors, or even data-dependent solutions (for details see [43] and Fig. 5(b)).
The βDTE [21] (Fig. 5(c)) reduces the input and the internal node capacitance, making the gate very fast, but tackles neither the power consumption nor the narrow noise margins. An improved β-comparator having higher non-linearity in the threshold zone was presented in [22]. A simpler method for enhancing the noise margins is presented in [19] (Fig. 6(a)). It adds data-dependent non-linear terms to the βDTEs, converting the TLG into a "high order perceptron." The non-linear terms form a Noise Suppression Logic (NSL). For reducing the DC power, a data-dependent Self-Timed Power-Down (STPD) mechanism has also been developed [19] (Fig. 6(b)).
A further solution is shown in Fig. 7(a). At the same time, a generic Latch-Type TL (LCTL) gate was proposed in [23] (Fig. 7(b)). The speed of LCTL gates has been improved in [26]; the resulting gate is called Cross-coupled Inverters with Asymmetrical Loads Threshold Logic (CIALTL), and can be seen in Fig. 8(a). Recently, a number of solutions based on advanced clocked CMOS differential structures have been developed. These implement the pull-down networks with two banks of parallel nMOS transistors, instead of using nMOS complementary logic trees. Examples are: Single-input Current-Sensing Differential Logic (SCSDL) [30], Differential Current-Switch Threshold Logic (DCSTL) [24] (Fig. 8(b)), derived from the Differential Current Switch Logic (DCSL) [29], and Current-Mode Threshold Logic (CMTL) [25].
These TLGs are still sensitive to noise and to mismatch of process parameters, which limit their maximum fan-in. Yield analysis for SCSDL has shown that fan-in < 14 [30]. An enhancement is to implement f with one bank and f' with the other bank, and to include an NSL for both f and f' [31]. The fact that f and f' always have transitions in opposite directions leads to increased speed and better noise margins. This method was demonstrated in conjunction with the Split-level Precharge Differential (SLPD) logic [27] (see [31] and Fig. 9).
A first n-input TLG is described in [32] (Fig. 10(a)). Another n-input TLG was proposed in [33]. It requires one tunnel junction and n + 2 true capacitors (Fig. 10(b)). A MAJ gate using a balanced pair of single electron boxes has also been detailed [34]. A parallel prefix 16-bit adder designed using capacitive input SETs has recently been characterized [35].
A specific logic functionality of a monostable-bistable transition logic element (MOBILE) [36] (Fig. 11(a)) is determined by embedding an input stage, which modifies the peak current of one of the resonant tunneling diodes (RTDs). TLGs implemented with RTDs have been widely studied [37]. Other configurations are possible, but the major advantage comes from the fact that the negative resistance device (NRD) characteristic directly supports a multiple-valued logic style [41], making TL an ideal candidate (Fig. 11(b)).
RTDs are the most mature type of quantum-effect devices. They exhibit negative differential resistance (NDR) at room temperature, and have already been implemented [39]. A prototyping technique based on four MOS-NDR transistors has also been reported [38].
VI. CONCLUSIONS
The present state of the art shows a large variety of TLG implementations coping with their major drawbacks: power dissipation, reduced noise margins, and sensitivity to process variations. It is quite amazing how much effort and ingenuity have been invested, let alone the remarkable diversity of technologies that have been tried. TLGs have benefited from developments in the more general field of differential logic. The other design parameter to consider is the fan-in. The claim that TLGs should have a large fan-in comes from their original goal of mimicking the brain, but theoretical results [42] have shown that small fan-ins can lead to VLSI-optimal solutions. The scarcity of commercial applications is not because TLGs have poor performance. Advanced TLGs can easily compete with Boolean gates (BGs). So why are they not used? The answer has its roots in the design approach, namely the fact that TLGs need full custom design and that there is an acute lack of high-level synthesis tools.
Lastly, because nanocomputing will probably take center stage in the (near) future, TL stands to benefit from it. Both RTDs and SETs appear to hold the most promise as short- to medium-term solutions. The fact that TL is a natural fit for them should help their future development.
