Abstract: Research on fine-grained pipelines is one route to high-performance applications. Monostable to bistable (MOBILE) gates are very suitable for implementing gate-level pipelines, which can be achieved without resorting to memory elements. The MOBILE operating principle is implemented by operating two series-connected negative differential resistance devices with a clocked bias. This brief describes and experimentally validates a two-phase clock scheme for such MOBILE-based ultragrained pipelines. Its advantages over other reported interconnection schemes for MOBILE gates, and also over pure CMOS two-phase counterparts, are stated and analyzed. Chains of MOBILE gates have been fabricated, and experimental results showing their correct operation with a two-phase clock scheme are provided. As far as we know, this is the first working MOBILE circuit to have been reported with this interconnection architecture.
I. INTRODUCTION
The design of functional units implementing very fine-grained pipelining for high-performance applications is currently a very active area of research. These solutions do not apply conventional pipeline techniques, which insert flip-flops to shorten signal propagation paths in combinational logic, but instead rely on logic circuit styles that are naturally able to block the propagation of data; that is, evaluation of their inputs can be disabled. The potential of dynamic logic, with its precharge and evaluation phases, for implementing this kind of pipelining was recognized long ago [1], [2]. The operation of Domino logic in a pipelined fashion using an overlapping multiphase clock scheme and without latches between consecutive clock-phases was analyzed in depth in [2]. Variations of this multiphase solution (three to six phases) have been developed by different companies and applied to commercial applications [3], [4].
In these superpipeline logic styles, operating frequency (or throughput) depends both on the number of clock-phases and the number of gate levels per clock-phase. The clock period needs to accommodate all the phases and the duration of each phase is determined by the number of gate levels per clock-phase. Thus, from the point of view of ultrahigh-speed applications, a two-phase scheme with a single gate per clock-phase is very attractive.
Logic styles based on the monostable to bistable (MOBILE) operating principle are very suitable for implementing gate-level pipelines, which can be achieved without resorting to memory elements [5]. Originally, MOBILE gates were operated in a pipelined fashion using a four-phase clock scheme, but two-phase overlapping clocking is also possible. MOBILE gates are implemented by operating two series-connected negative differential resistance (NDR) devices with a switching bias. Different devices, such as resonant tunneling diodes (RTDs), transistors [6], or molecular devices [7], exhibit NDR in their I-V characteristic. Many circuits have been reported that use NDR for different applications (memories, logic, oscillators, analog-to-digital (A/D) converters) and with different goals (high speed, low power, low device count, etc.) [8], [9], and the MOBILE principle has been identified as a key element in crossbar architectures [10]. In particular, MOBILE RTD-CMOS pipelined architectures have been compared with conventional CMOS fine-grained pipelines [11] and shown to be significantly more power efficient. Also, different transistor circuits have been explored that emulate the NDR characteristic [12]-[16]. This brief explores in depth and experimentally validates the two-phase clock scheme for MOBILE-based fine-grained pipelines. These pipelines do not have the functional limitation of the Domino solutions, in which only noninverting blocks can be chained. Also, the well-known race-through failure of two-phase fine-grained Domino pipelines without latches is eliminated.
This brief is organized as follows. In Section II, the MOBILE logic style is described. In Section III, we analyze the two-phase gate-level MOBILE pipelines. In Section IV, we describe the circuits which have been designed and fabricated to experimentally validate the proposed pipeline and show experimental results. Finally, some key conclusions are given in Section V.
II. MOBILE LOGIC STYLE

A. MOBILE Operating Principle
The MOBILE [17] in Fig. 1(b) is an edge-triggered current-controlled gate, which consists of two devices exhibiting NDR in their I-V characteristic [Fig. 1(a)], connected in series and driven by a switching bias voltage, V CK . When V CK is low, both NDRs are in the ON-state and the circuit is monostable. Increasing V CK to an appropriate maximum value ensures that only the device with the lowest peak current switches from the ON-state to the OFF-state. The output is high if the driver NDR switches and low if the load switches. Logic functionality can be achieved if the peak current of one of the NDR devices is controlled by an input. In the configuration of the rising edge-triggered inverter MOBILE shown in Fig. 1(c), the peak current of the driver NDR can be modulated using the external input signal V in . The transistor acts as a switch, so that for a low input, current flows only through NDR D , but for a high input, the effective peak current of the driver is the sum of the peak currents of NDR D and NDR X . The inverter topology can be generalized to implement arbitrary functions by just replacing the single transistor in Fig. 1(c) [or Fig. 1(d)] by a transistor network realizing the target functionality. NDR peak currents are selected such that the value of the output depends on whether the transistor network evaluates to "1" or to "0." Fig. 1(d) shows a falling edge-triggered inverter. Note that the branch implementing the functionality is now in parallel with the load NDR and uses a p-type transistor.
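The peak-current comparison described above can be sketched as a small behavioral model. This is only an idealized illustration: it ignores all ac and parasitic effects, and the function name, argument names, and numeric values are hypothetical and do not come from the fabricated circuits.

```python
def mobile_inverter_output(v_in_high, i_peak_load, i_peak_driver, i_peak_x):
    """Idealized decision of a rising edge-triggered MOBILE inverter.

    All currents are peak currents in arbitrary units (hypothetical values).
    """
    # A high input switches in NDR_X, so its peak current adds to NDR_D.
    i_driver_eff = i_peak_driver + (i_peak_x if v_in_high else 0.0)
    if i_driver_eff == i_peak_load:
        raise ValueError("equal peak currents: the ideal model cannot decide")
    # The device with the lowest peak current switches to the OFF-state.
    # Output is high if the driver switches, low if the load switches.
    return 1 if i_driver_eff < i_peak_load else 0


# Inverter behavior needs I_D < I_L < I_D + I_X (assumed sizing).
print(mobile_inverter_output(False, i_peak_load=1.0, i_peak_driver=0.8, i_peak_x=0.4))
print(mobile_inverter_output(True, i_peak_load=1.0, i_peak_driver=0.8, i_peak_x=0.4))
```

With the assumed sizing, a low input leaves the driver with the lowest peak current (output high), and a high input makes the load the device that switches (output low).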
It is well known that a sufficiently slow rising (or falling) transition of V CK is required for MOBILE operation [18] . Thus, there is a critical rise time for the switching bias below which the gate does not operate correctly. Under that critical rise time, there is at least one input combination for which the gate does not produce the expected logic output. This is due to ac currents associated with internal parasitics and output capacitive loads (fan-out), which are more important for faster clock changes and which somewhat "alter" the ideal MOBILE operating principle based on peak currents comparison. This critical value depends on both circuit (NDR peak currents, fan-out …) and technological parameters. Design must therefore take into account these ac currents to guarantee the desired relationship between load and driver currents for each input combination when V CK approaches 2V p , V p being the peak voltage of the NDRs [see Fig. 1 
Rising (falling) edge-triggered MOBILE logic gates evaluate inputs with the rising (falling) edge of the bias voltage and hold the logic level of the output while the bias voltage is high (low), even though the inputs change (self-latching operation [19] ). The output returns to zero (to one) with the falling (rising) edge of the clock until the next evaluation. The self-latching operation allows the implementation of gate-level pipelined architectures without extra memory elements [5] .
B. MOBILE Interconnections
1) Four Phase Clock Scheme: Originally, cascaded rising edge-triggered MOBILE gates were operated in a pipelined fashion using the four-phase overlapping clocking scheme shown in Fig. 2(a) [5], [20]. Note that there are four steps in the operation of each MOBILE gate: evaluation (clock rises), hold (clock high), reset (clock falls), and wait (clock low). V CK,i is delayed with respect to V CK,i−1 by T /4, T being the clock period. In this way, the i th stage evaluates (rising edge of V CK,i ) while the (i − 1) th stage is in the hold phase (V CK,i−1 high). Four clock signals are enough, since the first phase can be used for the fifth level and so on. In Section II-A, it was stated that there is a critical rise time below which the gate does not operate correctly. This explains the clock shape with equal rise, high, fall, and low times. Thus, for this scheme, the clock cycle is 4T EVAL,MAX (operating frequency is 0.25/T EVAL,MAX ) with T EVAL,MAX being the largest evaluation time required by any gate in the network.
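The T/4 staggering of the four clocks can be sketched as a simple schedule. This is a minimal illustration that assumes time is discretized into quarter periods and stages are indexed from 0; the names below are hypothetical.

```python
PHASES = ("evaluate", "hold", "reset", "wait")


def four_phase_step(stage, quarter):
    """Operation step of 0-indexed `stage` during quarter-period `quarter`.

    Each stage clock lags the previous one by T/4, so stage i is one
    quarter period behind stage i - 1.
    """
    return PHASES[(quarter - stage) % 4]


# Whenever a stage evaluates, the stage driving it is holding its output.
for quarter in range(8):
    for stage in range(1, 6):
        if four_phase_step(stage, quarter) == "evaluate":
            assert four_phase_step(stage - 1, quarter) == "hold"

# The fifth level (stage 4) reuses the clock of the first (stage 0).
print(four_phase_step(4, 3) == four_phase_step(0, 3))
```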
Other simpler clock schemes have been explored to reduce the difficulties of distributing four clock signals with tight constraints on their relative delays and to increase throughput, which is limited by the evaluation time of four gates.
2) Single Phase Clock Scheme: A network of MOBILE-based gates can be operated with a single clocked bias signal if rising edge-triggered gates and falling edge-triggered gates are alternated and latches are added at each stage to eliminate the return-to-reset behavior [21]. However, it has been demonstrated that it is not necessary to eliminate the return-to-reset behavior to ensure correct operation [22]. It is enough to keep the output of each MOBILE stage stable until it has been evaluated by the next one. Thus, each latch is replaced by a static inverter. Fig. 2(b) shows chained MOBILEs implemented with the single-phase architecture. The first and the third gates are rising edge-triggered, whereas the second and the fourth are falling edge-triggered. Note that the role of the intergate elements is to guarantee that one MOBILE has already decided its output before the reset to zero of the previous one reaches its input. The intergate element therefore does not have to be inverting [note that both inverters and buffers have been used in Fig. 2(b)], and this makes logic design more flexible. The clock cycle is 2T EVAL,MAX (operating frequency is 0.5/T EVAL,MAX ) with T EVAL,MAX being the largest evaluation time required by any stage (MOBILE + static logic) in the network.
However, the single-phase solution has two drawbacks. First, negative edge-triggered MOBILEs are used, which requires p-type transistors. This means larger transistors and thus larger parasitic capacitances, which degrade gate speed. Secondly, the only clock skew tolerance mechanism is the delay of the intergate element. Both of these problems are overcome by the two-phase scheme described in Section III.
III. ANALYSIS OF TWO-PHASE MOBILE NETWORKS
One alternative solution is to design networks with only positive edge-triggered MOBILE gates operated with an overlapping two-phase clock scheme, as shown in Fig. 2(c) [23]. Note that intergate elements (inverting or noninverting) can also be added if required by logical considerations (to increase design flexibility) or electrical considerations (for example, efficient handling of large loads).
The throughput (or frequency) behavior of the single-phase scheme is preserved in the sense that it is also determined by the evaluation time of two stages. However, the proposed two-phase scheme has a clear advantage over the single-phase solution: the worst-case evaluation time is larger for the latter, because it uses falling edge-triggered gates with pMOS functional networks. Moreover, the overlapping of the two clocks makes it possible to directly connect two MOBILE gates, since the return to reset of one stage takes place after the evaluation of the next stage, and to increase clock skew tolerance, which is a critical weak point of the single-phase interconnection.
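The frequency relations quoted for the three schemes (0.25/T EVAL,MAX for four phases, 0.5/T EVAL,MAX for one and two phases) can be put side by side in a short sketch. The evaluation times below are hypothetical placeholders, not measured values, and the function and dictionary names are assumptions made for illustration.

```python
# Number of stage evaluations that must fit sequentially in one clock cycle.
EVALS_PER_CYCLE = {"four-phase": 4, "single-phase": 2, "two-phase": 2}


def max_clock_frequency(stage_eval_times_s, evaluations_per_cycle):
    """Upper bound on clock frequency: 1 / (evaluations_per_cycle * T_EVAL,MAX)."""
    t_eval_max = max(stage_eval_times_s)
    return 1.0 / (evaluations_per_cycle * t_eval_max)


# Hypothetical stage evaluation times in seconds. The single-phase chain is
# given slower stages to mimic its falling edge-triggered pMOS gates.
nmos_only = [100e-12, 100e-12, 100e-12, 100e-12]
with_pmos = [100e-12, 130e-12, 100e-12, 130e-12]

for scheme, times in (("four-phase", nmos_only),
                      ("single-phase", with_pmos),
                      ("two-phase", nmos_only)):
    f_max = max_clock_frequency(times, EVALS_PER_CYCLE[scheme])
    print(f"{scheme}: {f_max / 1e9:.2f} GHz")
```

Under these assumed numbers, the two-phase scheme doubles the four-phase bound and also beats the single-phase bound, because only the worst-case stage time differs between the last two.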
In terms of area, the proposed topology eliminates p-type transistors, which not only saves area but also leads to smaller input capacitances and, therefore, smaller NDRs in previous stages. A reduction in area of up to 28% over a single-phase scheme has been obtained in the core of a 4-b CLA designed in an RTD-CMOS hybrid technology, which leads to a more compact solution even when the area occupied by the two-phase clock generation circuitry is included.
A comparison with Domino-based counterparts also produces some interesting results. As mentioned in Section I, Domino-based solutions are functionally limited in that they do not allow the implementation of inverting stages. This limitation is due to the fact that dynamic gates evaluate when their clock is high, and might therefore respond to the precharge of their preceding stages. This will occur if the precharge results in a low to high transition of the preceding stage output, since this high value can discharge the dynamic output node. However, there is no response to high to low transitions in the output of the preceding stages. That is why an odd number of inverting static gates is allowed after each dynamic gate, resulting in noninverting stages. More specifically, the basic domino stage comprises a dynamic gate followed by a static inverter. It has been proposed that modifying the topology of the dynamic gate will avoid this limitation, but at the expense of using at least three clock phases [24]. This limitation does not apply to MOBILE pipelines, thanks to the self-latching property of these gates: during clock high (hold), gate inputs can change without affecting the gate output. In this way, a MOBILE stage does not respond to the reset to zero of its driving stages. Since self-latching holds both for high to low and for low to high input transitions, this is true regardless of the number of inversions (none, odd, or even) between two consecutive MOBILE gates, and so both inverting and noninverting stages are allowed. This leads to greater flexibility in logic design.
Two-phase domino pipelines also require labor-intensive design due to race-through failure. The fact that there is both an upper and a lower limit on the allowed overlap of the two phases represents a timing challenge which complicates design and, in many cases, causes latches to be introduced [25], [26]. The lower limit is determined by the fact that the overlap should be long enough for a stage to evaluate before the precharge of the preceding stages driving it reaches its inputs, even under worst case clock skew conditions. In fact, multiphase overlapped clock schemes are said to be clock skew tolerant because of this overlapping. However, when only two phases are used, evaluation may slip over consecutive stages if the overlap is too long (race-through failure). In Fig. 3, assuming domino stages, stage 2 evaluates when V CK,2 is high, and stage 3 could respond to the stage 2 output if V CK,1 remains high for long enough after V CK,2 goes high. Stage 3 thus evaluates twice in one clock cycle. This is not normal pipelining behavior. Again, the self-latching property of MOBILE logic blocks eliminates the upper limit. Evaluation is associated with the clock edge, so even if V CK,1 is high long after V CK,2 goes high, there is no evaluation because stage 3 does not see a clock edge.
Fig. 3. Block diagram and clock waveforms of a generic two-phase interconnection scheme.
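The race-through argument can be illustrated with a discrete-time toy model that contrasts a level-sensitive (domino-like) stage with an edge-triggered, self-latching (MOBILE-like) stage. Both models are strong simplifications assumed here for illustration: stages are noninverting, delays are ignored, and the waveforms are hypothetical.

```python
def domino_stage(clock, data):
    """Level-sensitive stage: precharged low, may evaluate whenever clock is high."""
    out, q = [], 0
    for ck, d in zip(clock, data):
        if not ck:
            q = 0      # precharge phase resets the (buffered) output
        elif d:
            q = 1      # any high input seen while the clock is high is evaluated
        out.append(q)
    return out


def mobile_stage(clock, data):
    """Edge-triggered stage: samples on the rising edge, holds, returns to zero."""
    out, q, prev_ck = [], 0, 0
    for ck, d in zip(clock, data):
        if not ck:
            q = 0      # reset while the clock is low
        elif not prev_ck:
            q = d      # evaluation happens only on the rising edge
        prev_ck = ck
        out.append(q)
    return out


# Stage 3 clock stays high while stage 2 delivers new data (long overlap).
ck_stage3 = [0, 1, 1, 1, 1, 0]
stage2_out = [0, 0, 0, 1, 1, 1]

print(domino_stage(ck_stage3, stage2_out))  # [0, 0, 0, 1, 1, 0]
print(mobile_stage(ck_stage3, stage2_out))  # [0, 0, 0, 0, 0, 0]
```

In this toy scenario the level-sensitive stage reacts to data that arrives late in its own clock phase, which is the race-through condition, whereas the edge-triggered stage ignores it until its next rising edge.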
IV. EXPERIMENTAL RESULTS
The operation of the two-phase MOBILE gate interconnection scheme has been experimentally validated. As far as we know, this is the first time a working two-phase MOBILE network has been reported.
Two-phase clocked chains of MOBILE inverters have been designed and fabricated. These structures were implemented with MOS-NDR devices (circuits made up of transistors that emulate the NDR I-V characteristic) and the MOBILE gate topology from [14] in a 1.2 V/90 nm CMOS technology. Fig. 4(a) and (b) shows the schematics of the MOS-NDR device used and a MOBILE inverter implemented with it. The design of MOBILE blocks operating at high frequencies is not straightforward. As previously mentioned, it is necessary to consider the ac currents associated with parasitics, which may be large at high frequencies. Design validation thus requires accurate modeling of layout parasitics, and this is why experimental validation is so important. Fig. 4(c) shows the block diagram of one of the fabricated circuits. Each MOBILE stage is an inverter like the one shown in Fig. 4(b). As shown in Fig. 4(c), a two-phase clock generator has also been included. It provides two nonoverlapped clock signals (V CK,1 and V CK,2 ) with the same frequency as the master clock (V CK ). Note that power clocks are avoided, since the clock signal of each MOBILE circuit is applied to the input of a static inverter. The output of this inverter is used as the clock of the MOBILE inverter. In this way, the two overlapped clocks required are generated (inverting the two nonoverlapped signals yields clocks whose high phases overlap) and the constraints on clock rise time are relaxed. The area overhead of the clock generator represents 24% of the total area of the chain. Note that this percentage would be smaller if this circuit were integrated into more complex networks.
The packaged circuit was tested and shown to operate correctly. The critical input for this kind of circuit is a sequence of alternating 0s and 1s, in the sense that it limits the operating frequency. Fig. 4(d) shows the experimental results when such a sequence is applied to the 10-stage pipeline. Waveforms of the master clock, the input (V IN ), and the output (V OUT ) are shown. These were captured using the Agilent DSO6104A oscilloscope. Note that in addition to the package, there are input buffers (for V IN and V CK ) and output buffers and pads not shown in Fig. 4. V CK and V IN are 1.2 V pulse trains at 400 and 200 MHz, respectively. As expected, V OUT is a periodic signal of the same frequency as the input. The 0101…01 sequence is obtained at the output of the pipeline with a latency of five clock cycles, since data is evaluated twice each V CK cycle. Note the different shapes of V IN and V OUT , due to the return to zero behavior of MOBILE. Results are shown at 400 MHz so that the signals are not attenuated by the experimental setup, but correct operation at over 1 GHz, which was the target design frequency considering the available measurement setup, was observed for this circuit and for counterpart circuits incorporating static inverting and noninverting elements between MOBILE gates. Moreover, simulations of the extracted stand-alone pipelined chain in Fig. 4(c) exhibit correct behavior up to 2.2 GHz.
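The latency and input-frequency figures quoted for this experiment follow from simple arithmetic, sketched here under the assumption that one new input bit is applied per master clock cycle and that two stages evaluate per cycle. The function names are hypothetical.

```python
def pipeline_latency_cycles(n_stages, evaluations_per_cycle=2):
    """Latency in master clock cycles when several stages evaluate per cycle."""
    return n_stages / evaluations_per_cycle


def alternating_pattern_frequency(f_clock_hz):
    """Fundamental frequency of a 0101... input with one bit per clock cycle."""
    return f_clock_hz / 2


print(pipeline_latency_cycles(10))            # 10 stages, 2 evaluations per cycle
print(alternating_pattern_frequency(400e6))   # 400 MHz master clock
```

With the values reported above (10 stages, 400 MHz clock), this gives a latency of five clock cycles and a 200 MHz input waveform.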
We fabricated a variation of the previous chain in which the master clock was distributed as shown in the block diagram of Fig. 5(a). The purpose of this experiment is to further validate the operation of the direct connection of MOBILE stages, which is the most critical case in terms of the minimum overlap of the clocks of consecutive stages. In this simple configuration, this overlap is just the delay introduced by one inverter and thus suits our purpose well.
Experimental results for this topology are given in Fig. 5(b), where V CK and V OUT are the clock and the output and V IN is the input. For this experiment, V CK and V IN are clock pulses at 1.4 GHz and 175 MHz (one-eighth of the frequency of V CK ), respectively. Since V IN has a duty cycle of 25%, the output waveform consists of two cycles at high level and six at low level, which is consistent with the expected response to the input. As previously mentioned, attenuation due to the experimental setup was observed at this frequency.
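The expected two-high, six-low output pattern can be checked with a short calculation. This sketch only reproduces the arithmetic on the reported frequencies and duty cycle; the function name is hypothetical.

```python
def cycles_high_low(f_clock_hz, f_in_hz, duty_cycle):
    """Clock cycles spent high and low within one input period."""
    cycles_per_period = round(f_clock_hz / f_in_hz)
    high = round(cycles_per_period * duty_cycle)
    return high, cycles_per_period - high


# 1.4 GHz clock, 175 MHz input (one-eighth of the clock), 25% duty cycle.
print(cycles_high_low(1_400_000_000, 175_000_000, 0.25))
```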
V. CONCLUSION
The operation of two-phase gate-level pipelines based on the MOBILE operating principle has been experimentally validated. As far as we know, this is the first time a working two-phase MOBILE network has been reported. This interconnection scheme has advantages over other previously reported clock schemes for MOBILE logic networks. It is simpler than the conventional four-phase solution and leads to higher operating frequencies, since only two stages, instead of four, sequentially evaluate in one clock cycle. Unlike single-phase schemes, p-type transistors are avoided and clock skew tolerance increases. This is possible due to the self-latching property of MOBILE gates, which makes it possible to take advantage of the overlapping of the two clock-phases to tolerate clock skew, and which also avoids the limitations of conventional CMOS counterparts.
Finally, one additional feature of MOBILE logic blocks is that, unlike other clocked logic styles, they can be operated with sinusoidal clocks. It would therefore be worth exploring their use with energy-recovering techniques.
