Abstract-This paper proposes a new differential neural inspired gate with improved noise immunity. The charge [...] conjunction with the NSL, enhancing the speed performance of the TLG.
These emerging nano devices have led to many different TLG implementations such as those presented in [5], [9], [10]. The terms perceptron and TLG will be used interchangeably in this paper.
The work of Warren McCulloch and Walter Pitts entitled
where wi is the synaptic weight associated with input xi, θ is the threshold, and Δ is the fan-in. To allay concerns about describing a neuron as a TLG, the threshold logic (TL) model has been tested on a spike train generated by the Hodgkin-Huxley model with a stochastic input [2]. The result was that the TL model correctly predicted nearly 90% of the spikes, justifying the description of a neuron as a TLG.
The operation of charge recycling differential gates involves precharging several nodes of the gate, during which both differential outputs switch to VDD/2 by redistributing the charge. During the evaluation phase the outputs only need to swing up or down by VDD/2, rather than VDD, decreasing the propagation delay and, it is expected, the power consumption.
Charge recycling is achieved by placing a transistor "switch" between the output signal and its complement. The transistor placed between the differential outputs is not conducting during the evaluation phase, allowing the outputs to be driven to their full voltage levels by charging or discharging the load and parasitic capacitances. In the precharge phase the switch closes, short-circuiting the differential outputs, which must be isolated from VDD and GND so that the charge stored on the load and parasitic capacitances is redistributed.
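The charge redistribution described above can be illustrated with a short calculation. The following Python sketch assumes an ideal switch and purely capacitive output nodes that are fully isolated from the supply rails; the function name and the capacitance values are hypothetical, and only the 2.5 V supply is taken from the text.

```python
def shared_voltage(c_out, v_out, c_out_bar, v_out_bar):
    """Voltage both nodes settle to once an ideal switch shorts them together.

    Charge is conserved because the nodes are assumed to be isolated
    from VDD and GND while the switch is closed.
    """
    total_charge = c_out * v_out + c_out_bar * v_out_bar
    return total_charge / (c_out + c_out_bar)


VDD = 2.5          # supply voltage in volts, as stated in the text
C_NODE = 20e-15    # hypothetical node capacitance (20 fF), for illustration only

# Matched capacitances: one output at VDD, its complement at GND.
v_matched = shared_voltage(C_NODE, VDD, C_NODE, 0.0)

# Mismatched capacitances (hypothetical 30 fF vs 20 fF) shift the precharge level.
v_mismatched = shared_voltage(30e-15, VDD, 20e-15, 0.0)

print(f"matched: {v_matched:.3f} V, mismatched: {v_mismatched:.3f} V")
```

With matched capacitances the shared level is VDD/2; the mismatched case suggests why the load and parasitic capacitances on the two outputs should be balanced.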
We list a few gates that employ the charge recycling technique, and highlight the improvements reported.
The other class of differential TLG implementations is the current/conductance category. Two parallel connected banks of nMOS transistors are used for implementing the weighting operation, followed by a CMOS current comparator for the threshold operation.
[...] how NTL works in conjunction with SPDL [11]. SPDL is an improvement on other charge recycling differential logic such as HRDL and CRDL in terms of power dissipation, propagation delay, increased reliability, avoiding metastable states, and large fan-out.
The 3-input charge recycling differential noise immune perceptron shown in Figure 1 has a total of 24 appropriately sized transistors.
It is a special case of the differential gate in that it lacks the traditional cross-coupled pull-up network and thus requires a few additional transistors to produce the differential outputs. The gate consists of the SPDL block, two evaluate blocks, and two NSL blocks. The SPDL part is represented by transistors N0-N5 and P1-P9 (for further details see
The perceptron is initialized when the clock is high, resulting in transistors N4, N5, P6 and P7 being turned ON. This state sets nodes IN and IN-bar to logic 1 while transistors N2, N3, P2, P3, P4 and P5 are turned OFF. Of particular interest during this initialization stage is that transistor N1 conducts, enabling charge recycling between the differential outputs labeled OUT and OUT-bar. Charge recycling makes these outputs switch from VDD or GND to VDD/2. It is also during this initialization stage that the inputs can be changed.
The output function is implemented by the n-network comprising transistors N6, N8 and N10 for one bank and N7, N9 and N11 for the other bank. The NSL devices are N12 and N14 for one TL bank, and N13, N15 and N16 for the other.
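As a behavioural illustration only, the two banks can be viewed as evaluating a threshold function f and its complement f-bar. The Python sketch below models this at the logic level. The weights (4, 1, 1), the threshold of 5, and the function names are hypothetical and are not taken from the gate described here; no transistor-level behaviour (precharge, NSL, sizing) is modelled.

```python
from itertools import product


def tlg(inputs, weights, threshold):
    """Return 1 when the weighted sum of the inputs reaches the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= threshold else 0


def differential_tlg(inputs, weights, threshold):
    """Return the pair (f, f-bar) that a differential gate would present."""
    out = tlg(inputs, weights, threshold)
    return out, 1 - out


WEIGHTS = (4, 1, 1)   # hypothetical weights, for illustration only
THRESHOLD = 5         # hypothetical threshold, for illustration only

# Enumerate all 2^3 input patterns of a 3-input gate.
for pattern in product((0, 1), repeat=3):
    out, out_bar = differential_tlg(pattern, WEIGHTS, THRESHOLD)
    print(pattern, "->", out, out_bar)
```

With these assumed values the gate outputs 1 only when the first input and at least one of the other two inputs are high.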
Transistor N0 provides a path to GND for either bank. The evaluation state involves turning transistors N0 and P1 ON after the inputs settle. Both the n- and p-network devices are sized to ensure that the weights (wi) are represented appropriately, resulting in the TLG functioning properly. Transistors N6-N11 encode the weights associated with the inputs: the W/L of N6 is four times that of N8 and N10. The same is true for the right bank, where the W/L of N11 is four times that of N7 and N9. The NSL blocks (N12 and N14 along with N13 and N15-N16) have to be sized at least as large as N6 (making them larger will always improve the noise margins [14], but might degrade the speed). Table I summarizes the transistor sizes for one possible implementation.
The perceptron has been evaluated for performance in terms of power dissipation and switching delays. The results reported here are based on a layout of the gate done in standard CMOS 0.25 µm technology at 2.5 V. The gate is subjected to a load of eight minimum sized inverters on both differential outputs. Figure 2 depicts traces of the perceptron's critical nodes. In this figure we show results obtained at a frequency of 1000 MHz, while simulations at 100 MHz and 500 MHz have been performed to ensure proper functionality. Different simulations factoring in variations of environmental parameters (temperature and power supply voltage) have also been performed to fully characterize the gate.
The average current values at 100 MHz, 500 MHz, and 1000 MHz (when running continuously) were unexpectedly high: 289 µA, 739 µA, and 1.04 mA, respectively. This can only partly be explained by the load and high temperature we have used.
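To put the reported currents in terms of power, the short Python sketch below multiplies each average current by the 2.5 V supply and divides by the clock frequency to obtain an energy per cycle. It assumes that the reported values are average supply currents and that the first two are in microamperes; the function names are hypothetical.

```python
VDD = 2.5  # supply voltage in volts, as stated in the text

# (clock frequency in Hz, average current in A) as reported above;
# the first two currents are assumed to be in microamperes.
MEASUREMENTS = [
    (100e6, 289e-6),
    (500e6, 739e-6),
    (1000e6, 1.04e-3),
]


def average_power(i_avg, vdd=VDD):
    """Average power drawn from the supply, in watts."""
    return i_avg * vdd


def energy_per_cycle(i_avg, freq, vdd=VDD):
    """Average energy drawn from the supply during one clock period, in joules."""
    return average_power(i_avg, vdd) / freq


for freq, i_avg in MEASUREMENTS:
    p_mw = average_power(i_avg) * 1e3
    e_pj = energy_per_cycle(i_avg, freq) * 1e12
    print(f"{freq / 1e6:6.0f} MHz: {p_mw:.2f} mW, {e_pj:.2f} pJ per cycle")
```

Under these assumptions the average power rises with frequency while the energy per cycle falls, which would be consistent with a sizeable frequency-independent component in the current.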
Removing transistor N1 allows the differential outputs to have a full swing from GND to VDD, and we have taken measurements for such a configuration and determined that the perceptron with charge recycling capability reduces switching delays by 38%. It is much faster to switch the output node from VDD/2 to VDD or from VDD/2 to GND as opposed to having a rail-to-rail switching activity at the output nodes during evaluation. Figure 3 shows simulations of the two configurations, one with charge recycling capability, and one without.
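A first-order estimate helps explain why the reduced swing shortens the switching time. The Python sketch below assumes a constant drive current charging a fixed capacitance, which is a strong simplification of the real gate; the capacitance and current values and the function name are hypothetical, and only the 2.5 V supply is taken from the text.

```python
def swing_time(c_load, delta_v, i_drive):
    """Time for a constant current i_drive to move a capacitance c_load by delta_v."""
    return c_load * delta_v / i_drive


VDD = 2.5          # supply voltage in volts, as stated in the text
C_LOAD = 20e-15    # hypothetical load capacitance (20 fF)
I_DRIVE = 100e-6   # hypothetical constant drive current (100 uA)

t_full = swing_time(C_LOAD, VDD, I_DRIVE)       # rail-to-rail swing
t_half = swing_time(C_LOAD, VDD / 2, I_DRIVE)   # swing starting from VDD/2

reduction = 1 - t_half / t_full
print(f"idealized reduction in switching time: {reduction:.0%}")
```

In this idealized model, halving the swing halves the switching time regardless of the assumed capacitance and current. The reported 38% is smaller, as would be expected once nonlinear device currents and internal node switching are taken into account.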
The average current over all possible input combinations shows that the charge recycling perceptron draws nearly the same current as the one without charge recycling capability. The current values are high and there is only a slim 6% difference in favor of the perceptron with charge recycling capability. This small difference in average current is due to the fact that even when some input patterns result in the output remaining at logic 1 or logic 0, the differential outputs still have to be driven from VDD/2 to either VDD or GND, while the outputs remain constant for the design without charge recycling capability. We note that the current simulations (shown in Figure 3) for the two configurations look very similar. This is due to the fact that dynamic switching occurs at several internal nodes even when the input patterns being evaluated result in no changes at the output nodes.
This work demonstrates that the charge recycling perceptron can be implemented using two banks for f and f-bar in conjunction with NSL blocks. An ideal approach to measuring the efficiency of the perceptron in terms of power dissipation and speed entails comparing the design with purely CMOS, pseudo-NMOS, and Domino implementations. Most of the design examples cited were implemented in older technology nodes, making it difficult to directly compare them in terms of delay and power dissipation. A fair comparison would dictate that the circuits be simulated at the same technology node.
IV. CONCLUDING REMARKS
The present state of the art shows a large variety of differential TLGs, some of which are quite advanced implementations. A novel differential charge recycling TLG with NSL has been proposed. It incorporates two new ideas: using f and f-bar, and adding nonlinear data dependent terms.
For the selected frequencies it shows speed improvements of 34.38% over its counterpart dynamic perceptron without charge recycling. This work has shown that differential logic gates are fast but not necessarily suitable for low power dissipation. The basic disadvantages include the need to have two networks (for out and out-bar), resulting in increased switching activity, and the need to reduce the pull-down stack for improved delays (which unfortunately increases the leakage currents). As far as we know, the best solution for low power differential logic was that presented in [54]. It uses an inverter's short-circuit current to drive critical nodes. We expect that further scaling [55] will accentuate differential logic gates' need for power, as opposed to simpler gates, irrespective of the techniques employed, due to increased leakage and higher switching activity. We conclude that differential structures are seemingly not the best solution for scaled CMOS. Simpler gates with fewer transistors might do better. These could employ techniques such as adaptive body biasing and sub-threshold power supply voltages for enhanced overall performance.
