Abstract: We introduce the class of hysteretic linear-threshold (HLT) logic functions as a novel extension of linear threshold logic, and prove their general applicability for constructing state-holding Boolean functions. We then demonstrate a fusion of HLT logic with the quasi-delay insensitive style of asynchronous circuit design, complete with logical design examples. Future research directions are also identified.
I. INTRODUCTION

Boolean logic is the well-established foundation upon which most current digital systems rest. It is both easily manipulated and easily implemented with networks of transistors on chip; its expressive power lies in its simplicity. However, that expressive power has been proven strictly inferior to other classes of logic, in particular linear-threshold (LT) logic [1]. LT logic promises decreased logical complexity for certain computations, and mathematical differences between LT and Boolean logic may result in certain implementation advantages in silicon.
Another feature common to most current digital systems is a global clock used for sequencing operations and coordinating the holding of state. Clocked circuits can be very simple to design, but generating and distributing the clock itself is a task whose difficulty rises in proportion to the clock's frequency and the amount of circuitry it controls. Asynchronous circuits eliminate this difficulty by completely decentralizing the sequencing task and distributing it among all of the functional modules. Synchronization complexity is thereby limited to the more easily solved regime of the fan-in/fan-out of individual modules. Asynchronous methodologies that include delay insensitivity also permit the designer to focus on verifying the logical correctness of the design rather than on the timing of signals, usually resulting in a shorter design time.
In this paper, we establish the theoretical foundations underlying what we believe to be a novel design style, by incorporating a generalization of LT logic into a certain asynchronous circuit style. We also present a small number of elementary complexity results using our theory, and offer a simple theoretical design example. Future publications will describe designs for more complex systems, details of silicon implementation, and silicon performance results.
After the research described in this paper began, we became aware of work by Theseus Logic, Inc. with certain parallels to our own [2], [3]. In particular, their NULL convention logic (NCL) methodology utilizes threshold gates to detect input arrival and feedback to add hysteresis to those gates. While a "NULL" or reset state is required of any asynchronous circuit to be used more than once [4], our work does not require symbolic completeness via an explicit "Intermediate" state, and does not limit hysteresis to cases where all inputs to an operator must reset between valid states. Our choice of the quasi-delay insensitive (QDI) design methodology lends a different flavor to our circuit derivations as compared to the NCL methodology, but the generality of the theorem presented below does allow individual NCL operators to be expressed in hysteretic LT (HLT) logic.
Throughout this paper, we adopt a convention from complexity literature for describing sets of functions. If $G$ denotes a class of operators (e.g., $G = \mathrm{AON}$ for the class of AND, OR, and NOT circuits), $G_d$ is the set of all functions computable by a depth-$d$ network of such operators.
II. THRESHOLD FUNCTIONS AND HYSTERESIS
A general threshold function can be defined by (1) where is the Heaviside function, , and and are constants. Because can only assume two values, the cardinality of must be 2. To allow cascading of threshold functions, it is customary to stipulate . The sets and are common choices for ; to avoid implementation difficulties later (e.g., negative voltages), we shall assume the latter in this paper; hence, we take and .
is the threshold of , and, without loss of generality, we shall assume in this paper, that it is in . In general, ; although, for our purposes, we can require . Again, with a view toward implementability, we define a directly implementable threshold function as one in which all . Different functional forms of the subfunctions generate different families of threshold functions. The most elementary and well-studied type comprises the LT functions, where . This class of functions forms the basis for conventional neural networks. Our current inquiries are confined to LT logic, although the enhanced power of polynomial threshold logic, in which the are product functions of , shows promise as a future research direction [5] . It has been demonstrated that feedback in LT systems leads to a set of stable states [6] , but using that information for sequential logic design has been relatively clumsy compared to production rules or other Boolean formalisms. To separate state-holding behavior from computational topology, we define a hysteretic threshold (HT) function by (2) where and , and . Once again, we take and . In this context, the sequencing implied by the semicolon models the fact that the feedback is not instantaneous, which is consistent with wire delay or iterative simulation techniques.
Note that threshold functions are a strict subset of HT functions for which . In particular, LT functions are a strict subset of HLT functions.
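The relationship between LT and HLT functions can be illustrated with a minimal sketch. It assumes that (1) takes the usual linear form (the output is true when the weighted input sum reaches the threshold) and that (2) adds one fixed-weight feedback term from the previous output. The names `lt_gate`, `hlt_step`, and the parameter `hysteresis` are hypothetical and are not taken from the equations of this paper.

```python
from typing import Sequence


def lt_gate(x: Sequence[int], w: Sequence[float], threshold: float) -> int:
    """Linear-threshold gate: 1 when the weighted input sum reaches the threshold."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= threshold)


def hlt_step(x: Sequence[int], w: Sequence[float], threshold: float,
             hysteresis: float, y_prev: int) -> int:
    """One evaluation of a hysteretic LT gate (assumed form).

    The previous output y_prev is fed back as an extra input with a
    fixed weight `hysteresis`; the caller supplies y_prev, which models
    the non-instantaneous feedback.
    """
    total = sum(wi * xi for wi, xi in zip(w, x)) + hysteresis * y_prev
    return int(total >= threshold)


# Two-input AND and OR as unit-weight LT gates that differ only in threshold.
inputs = [(a, b) for a in (0, 1) for b in (0, 1)]
and_out = [lt_gate(x, (1, 1), 2) for x in inputs]
or_out = [lt_gate(x, (1, 1), 1) for x in inputs]

# With zero hysteresis, the HLT gate ignores its previous output and
# reduces to the plain LT gate.
reduces_to_lt = all(
    hlt_step(x, (1, 1), 2, 0.0, y) == lt_gate(x, (1, 1), 2)
    for x in inputs for y in (0, 1)
)
```

Under these assumptions, setting the feedback weight to zero recovers an ordinary LT gate, which is the sense in which LT functions sit inside the HLT class.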
We now prove by construction that any HT function, and hence any noninterfering Boolean function (state-holding or not), can be implemented by threshold functions with a fixed feedback configuration. Noninterfering Boolean functions have set and reset production rules that are never simultaneously enabled. For convenience, we also define
Theorem 1 (General Hysteresis Theorem): Any HT function can be implemented by a threshold function with a fixed-weight feedback connection from output to input.
Proof: First, we write HT function as (4) where is a parametric constant. We must show that hysteresis, , and offset, , are well-defined. We observe that (5) and (6) which together yield the constraints that
We can always satisfy this inequality, because and . Next, we define (8) We then have that (9) Logical advantage will be maximized when is centered between the two sums, implying that we should take (10) Finally, we equate in (1) with , which completes the proof.
Corollary: All noninterfering Boolean functions are in HT2.
Proof: We start by writing as production rules in disjunctive normal form (DNF) as (11) The operator being noninterfering implies that all are mutually exclusive with all . Further, the DNF of implies all and are strictly conjunctive (i.e., Boolean AND) in . Now, define
. AND 
III. EXAMPLE APPLICATIONS OF THE GENERAL HYSTERESIS THEOREM
In conventional Boolean logic, AND, OR, and NOT are typically taken to be the simplest combinational operators. In the same spirit, the C-element can be said to be the simplest noncombinational (i.e., state-holding) operator. The C-element is normally defined by (13) It is frequently more convenient to compute the dual of this function because of direct implementability concerns in CMOS. Because the C-element is neither purely conjunctive nor purely disjunctive, it cannot be in AON1 (although the inverting C-element can be implemented with one stage of conventional transistor logic). To show that the normal C-element is in HLT1, consider an -input HLT function with all inputs weighted by unity. Then, from the definition given above, simply set . The General Hysteresis Theorem then gives us a concise implementation in which and . Now, consider the asymmetric C-element given by (14) Boolean can be replaced by a variety of logic functions, making this a template for precharge, return-to-zero logic. Such operators have been used extensively in such contexts as the Caltech MiniMIPS processor [7]. To see that the asymmetric C-element is also in HLT1, set all and , yielding . The General Hysteresis Theorem then gives us that and . Next, we consider the co-asymmetric C-element, given by (15) We prove by contradiction that this function is not in HLT1. Assume there exist and satisfying the HLT definition. Then, we have that (16) and (17) which is a contradiction. Therefore, co-asymmetric C-elements are not in HLT1. However, the corollary to the General Hysteresis Theorem guarantees that they are in HLT2.
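The claim that the symmetric C-element fits in a single HLT operator can be checked exhaustively with a short sketch. It assumes an HLT gate of the form "weighted input sum plus a fixed-weight feedback of the previous output, compared with a threshold". The threshold n - 0.5 and feedback weight n - 1 used here are one workable choice made for illustration, not necessarily the values derived in the text; `hlt_step`, `c_element_reference`, and `c_element_hlt` are hypothetical names.

```python
from itertools import product


def hlt_step(x, w, threshold, hysteresis, y_prev):
    """Assumed HLT form: weighted inputs plus fed-back output versus threshold."""
    total = sum(wi * xi for wi, xi in zip(w, x)) + hysteresis * y_prev
    return int(total >= threshold)


def c_element_reference(x, y_prev):
    """Behavioural C-element: set when all inputs are 1, reset when all are 0."""
    if all(x):
        return 1
    if not any(x):
        return 0
    return y_prev  # otherwise hold state


def c_element_hlt(x, y_prev):
    """n-input C-element as one HLT gate with unit input weights.

    Illustrative parameters: threshold n - 0.5, feedback weight n - 1.
    """
    n = len(x)
    return hlt_step(x, [1] * n, n - 0.5, n - 1, y_prev)


def matches_reference(n):
    """Compare the HLT gate against the reference for every input and state."""
    return all(
        c_element_hlt(x, y) == c_element_reference(x, y)
        for x in product((0, 1), repeat=n)
        for y in (0, 1)
    )


all_match = all(matches_reference(n) for n in range(1, 6))
```

With the output low, only the all-ones input reaches the threshold. With the output high, the feedback term keeps the sum above the threshold until every input has returned to zero.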
We note that the General Hysteresis Theorem is not restricted to operators for which . Such operators are stable in the sense that once they set their output as a function of their inputs, they will not alter the output again as long as the inputs remain constant. If , the possibility exists for the operator to switch its output true and, after the delay implied by the semicolon in (2), immediately back to false again even if the inputs have not changed in the interim. Such unstable operators have negative hysteresis which can either be implemented with a negative hysteretic weight, or with a positive hysteretic weight driven by the inverse of the function's output (we will assume the latter). Unstable operators appear in digital logic in such places as pulse generators for latches [8] , or in neural networks in the form of pulsed neurons [9] .
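The difference between stable and unstable operators can be seen by iterating a single gate with its input held constant. This is a minimal sketch assuming the same HLT form as above (weighted inputs plus a fixed-weight feedback term). For brevity it uses a negative feedback weight directly, rather than the positive weight driven by the inverted output that the text assumes; all names and numeric values are hypothetical.

```python
def hlt_step(x, w, threshold, hysteresis, y_prev):
    """Assumed HLT form: weighted inputs plus fed-back output versus threshold."""
    total = sum(wi * xi for wi, xi in zip(w, x)) + hysteresis * y_prev
    return int(total >= threshold)


def trace(x, w, threshold, hysteresis, y0, steps):
    """Iterate the gate with constant inputs; each step models one feedback delay."""
    ys = [y0]
    for _ in range(steps):
        ys.append(hlt_step(x, w, threshold, hysteresis, ys[-1]))
    return ys


# Positive hysteresis: the output sets once and then stays put.
stable = trace((1,), (1,), 0.5, 1.0, 0, 4)

# Negative hysteresis: the output switches true, then back to false,
# although the input never changes.
unstable = trace((1,), (1,), 0.5, -1.0, 0, 4)
```

In this toy model the unstable gate keeps toggling for as long as the input is held, which is the pulsing behaviour the paragraph describes.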
IV. A BRIEF WORD ABOUT CMOS IMPLEMENTATION
The purpose of this paper is to establish a theoretical foundation for HLT circuits and how they may be used to implement QDI systems. We here briefly touch on the subject of HLT circuit implementation in CMOS. We will present further implementation details and experimental results from test silicon in future publications.
Production rules are readily realized in silicon: each guard corresponds directly to a series-parallel network of FETs (p-type for set guards and n-type for reset guards). Logical correctness is independent of such factors as transistor sizing and permuting transistors in series, although those factors may impact electrical performance. Electrical limitations, such as on the maximum allowable number of FETs in series, may make implementing certain production rules unfeasible, requiring us to decompose them into sets of simpler production rules.
Several different schemes exist to implement LT elements. Proposals to implement such elements with multi-input transformers whose turn ratios constitute input weights have existed for some time [10]. A more suitable method for integrated circuits involves a bank of voltage-controlled current sinks fighting a fixed current source. If all the input voltages switch between the same two logic values, the relative strength of each current sink (e.g., its $W/L$ ratio if implemented with FETs) determines its input weight. The current source can readily be set by multiplying the desired threshold by the current drawn by a unity-sized current sink. If the sinks and source are ideal and have infinite output impedance, the voltage at the summing node will reach whichever power supply rail corresponds to the side with the greater current drive. Level shifting can then be employed to recover the high and low logic voltages as the operator's output. This arrangement is a generalization of the pseudo-NMOS NOR arrangement, and has seen considerable use in multiple-valued logic [11].
An alternate strategy based on capacitive voltage division came to prominence via Shibata and Ohmi's work with neuron-MOS, or floating gate, transistors [12]. Circuit inputs are connected to capacitor top plates, whose common bottom plate drives a CMOS inverter. Assuming that charge on the floating node is fixed, the voltage at the gate is a sum of the input voltages, weighted by the relative capacitor sizes, and offset by the bottom-plate charge divided by the total capacitance. In the ideal case, the inverter compares the weighted sum with its switching threshold and outputs one of the power supply voltages, which can again be level shifted to recover the logic voltages. Unlike the arbitrary threshold current source above, the inverter's switching threshold and initial floating gate charge may be constrained by implementation details (e.g., the inverter's switching threshold is frequently fixed at half the power supply voltage).
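The capacitive summation can be sketched numerically. The example below assumes the standard charge-conservation relation for an ideal floating node, V_fg = (sum of C_i * V_i + Q_fg) / C_total, with parasitic capacitance ignored and an ideal inverter that switches at half the supply. The function names, the 1 fF capacitors, and the 1 V supply are hypothetical values chosen only for illustration.

```python
def floating_gate_voltage(v_in, caps, q_fg=0.0):
    """Voltage of an ideal floating node driven through coupling capacitors.

    v_in : input voltages (V), caps : coupling capacitances (F),
    q_fg : fixed charge on the floating node (C). Parasitics are ignored.
    """
    c_total = sum(caps)
    return (sum(c * v for c, v in zip(caps, v_in)) + q_fg) / c_total


def ideal_inverter(v_gate, vdd=1.0, v_switch=None):
    """Ideal inverter: output is 0 V above the switching point, vdd below it."""
    if v_switch is None:
        v_switch = vdd / 2  # assumption: switching point at half the supply
    return 0.0 if v_gate > v_switch else vdd


caps = (1e-15, 1e-15, 1e-15)  # three equal 1 fF input capacitors (hypothetical)

v_two_high = floating_gate_voltage((1.0, 1.0, 0.0), caps)  # two of three inputs high
v_one_high = floating_gate_voltage((1.0, 0.0, 0.0), caps)  # one of three inputs high

out_two_high = ideal_inverter(v_two_high)  # node above vdd/2, output low
out_one_high = ideal_inverter(v_one_high)  # node below vdd/2, output high
```

With equal capacitors and no stored charge, the stage behaves as an inverted three-input majority gate. A nonzero `q_fg` shifts the effective threshold, which is the offset term described in the paragraph above.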
Neither of these schemes suffers from the major fan-in limitation of conventional CMOS, namely series-chain limits due to high channel resistance and charge sharing. Both, however, are limited by summing-junction noise, which scales with the number of current sinks or capacitors present.
V. QDI SYSTEMS AND HLT LOGIC
A thorough introduction to asynchronous design methodologies is beyond the scope of this paper; several different styles have been described in other publications [13]-[16]. All of these styles arise from various compromises with respect to true delay insensitivity (DI), in which logical sequencing of events is guaranteed regardless of any finite, positive delays in wires and operators. The compromises, in the form of timing assumptions, are necessary, because the class of truly DI systems has been shown to be unacceptably small [4]. The QDI style developed at Caltech makes the minimum timing assumption required to achieve Turing completeness, maintaining very strong theoretical guarantees about proper logical sequencing of events. In order to inherit that strength, we have adapted the QDI style for constructing our own asynchronous HLT circuits.
QDI circuits can be compiled from communicating hardware processes (CHP) notation into production rules by identifying and strengthening syntactic guards [17] . These syntactic guards are equally valid for constructing HLT functions, either directly or by converting production rules into HLT functions through the procedure in the corollary to the General Hysteresis Theorem. The enhanced power of LT logic allows direct implementation of certain syntactic guards, such as the majority and parity functions, that would be quite cumbersome to implement with Boolean logic [18] .
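To make the majority and parity remark concrete, the sketch below builds both from LT gates. Majority needs a single unit-weight gate. Parity uses a well-known depth-2 construction: a first layer of gates detecting "at least k inputs true" for each k, combined with alternating +1/-1 weights. These are textbook constructions shown for illustration, not circuits from this paper. The second layer uses negative weights, so the sketch makes no attempt at direct implementability. All function names are hypothetical.

```python
from itertools import product


def lt_gate(x, w, threshold):
    """Linear-threshold gate: 1 when the weighted input sum reaches the threshold."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= threshold)


def majority(x):
    """Strict majority as one LT gate with unit weights."""
    n = len(x)
    return lt_gate(x, [1] * n, (n + 1) / 2)


def parity(x):
    """Odd parity as a depth-2 LT network.

    Layer 1: gate k fires when at least k inputs are true (k = 1..n).
    Layer 2: alternating weights +1, -1, ... leave a sum of 1 when the
    number of true inputs is odd and 0 when it is even.
    """
    n = len(x)
    layer1 = [lt_gate(x, [1] * n, k) for k in range(1, n + 1)]
    weights2 = [1 if k % 2 == 1 else -1 for k in range(1, n + 1)]
    return lt_gate(layer1, weights2, 1)


# Exhaustive check against direct definitions for small fan-in.
majority_ok = all(
    majority(x) == int(sum(x) > len(x) / 2)
    for n in range(1, 7) for x in product((0, 1), repeat=n)
)
parity_ok = all(
    parity(x) == sum(x) % 2
    for n in range(1, 7) for x in product((0, 1), repeat=n)
)
```

The parity network uses n + 1 gates for n inputs at depth 2.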
An operator whose outputs become valid when all its inputs are valid and neutral when all its inputs become neutral is called weak condition (WC) [19] . WC operators are the most natural way to syntactically compile most QDI programs, because they require no additional circuitry to implement proper sequencing. However, the resulting production rules may contain unacceptably many conjuncts or disjuncts, prompting the designer to decompose the operator into smaller suboperators each with lower fan-in. Such decomposition increases the complexity (and implementation difficulty) of the overall system. HLT implementations may be subject to different complexity constraints, possibly offering the designer an alternative that avoids this decomposition issue.
Additionally, QDI design assumes that state-holding operators are perfect, holding state indefinitely. It is implicitly expected that state-retaining circuitry (like staticizers) will be added to state-holding production rules upon translation into actual transistor networks. In HLT logic, state preservation is an explicit, not implicit, property of the feedback from output to input. HLT circuits should therefore be easier to logically verify against high-level designs, and less prone to designer oversight.
VI. DESIGN EXAMPLE: HALF-BUFFERING AND MODULE
In this section, we illustrate these advantages with a design example, based on the standard QDI approach. For the sake of clarity, we shall omit reset circuitry in this example.
One possible CHP description of a pipeline stage performing the AND function is given by AND (18) where channels , , and contain data signals and and enable signal . One possible handshaking expansion, utilizing the half-buffer reshuffling [20] , is given by (19) Syntactic compilation into production rules yields (20) where . In order to achieve direct implementability, the rules for and must be altered, for instance, as follows: (21) Note that and are state-holding, and will require staticizing when implemented with actual transistors. To implement weak conditioning, the set rules for and contain five conjuncts, which is most naturally implemented with five P-type transistors in series. In cases where this series-chain length is unacceptable, the standard QDI solution (as used in the MiniMIPS) is to switch from WC logic to precharge logic, as follows: (22) This circuit reduces the maximum number of conjuncts in all set rules to two. This reduction is purchased with additional circuitry to detect the neutrality (and validity) of the inputs separately from the operators that generate the outputs. While this transformation allows us to simplify the reset somewhat, the overall circuit now has four extra operators, three of which are state holding, and we will need to add staticizers. Such circuits are considerably harder to design, build, and test than are their weak-condition counterparts. Now, consider the following HLT compilation:
Observe that the signal is logically inverted relative to and . For this paper, we follow standard practice in the LT literature of considering this inversion to be "free."
Like the precharge logic production rules just given, this circuit detects input and output validity/neutrality separately from the actual output generation circuitry. However, the set and reset rules are entirely symmetrical, and the output operators are actually weak-condition. By further exploiting the properties of LT logic and using the fact that the and signals are mutually exclusive true, we arrive at a form which uses only WC operators, analogous to the first weak-condition example (24)
For instance, suppose that we are constrained to set by a floating-gate transistor implementation. Substituting that choice into the General Hysteresis Theorem, we arrive at the following HLT circuit:
There are now only three HLT operators, compared to three conventional operators and two inverters in the weak-condition Boolean case, or seven conventional operators and two inverters in the precharge Boolean case. Recall that both Boolean circuits will also require staticizers for all state-holding nodes, which are unnecessary for the HLT circuits. Also observe the increased symmetry in the HLT set and reset rules relative to those in the Boolean case. All these properties make HLT circuits easier both to design and to debug.
Note that additional constraints may exist for HLT elements, such as an upper limit on the maximum allowable fan-in. Such restrictions would require decomposing large operators into cascades of simpler operators. Researchers have reported examples of successful silicon implementation styles for LT circuits with fan-ins above 16 [21], so circuits like the ones presented in this section should be implementable directly as given.
VII. CONCLUSION
We have presented an alternative to Boolean logic that interfaces cleanly with a particular asynchronous design methodology. The General Hysteresis Theorem actually extends beyond this particular methodology, and could potentially allow HLT circuits into any digital design style meeting its (weak) requirements. Examining the ramifications of such synergy will be a direction for ongoing research.
What we have not yet presented is evidence that HLT functions can be implemented cleanly and efficiently in silicon. We are in the process of testing current-summing and capacitive voltage-summing HLT systems with no more than two CMOS devices between the power supply rails, capable of rapid switching speeds and very low-voltage operation. The results of these investigations will be the subject of future publications.
