This paper exploits the unique in-field controllability of the device polarity of ambipolar carbon nanotube field effect transistors (CNTFETs) to design a technology library with higher expressive power than conventional CMOS libraries. Based on generalized NOR-NAND-AOI-OAI primitives, the proposed library of static ambipolar CNTFET gates efficiently implements XOR functions, provides full-swing outputs, and is extensible to alternate forms with area-performance tradeoffs. Since the design of the gates can be regularized, the ability to functionalize them in-field opens opportunities for novel regular fabrics based on ambipolar CNTFETs. Technology mapping of several multi-level logic benchmarks (including multipliers, adders, and linear circuits) indicates that, on average, it is possible to reduce both the number of gates and the area by ∼38% while also improving performance by 6.9×.
Introduction
Carbon nanotube field effect transistors (CNTFETs) are novel devices that are projected to outperform scaled CMOS technologies [1]. CNTFET-based devices offer high mobility for near-ballistic transport, high carrier velocity for fast switching, and better electrostatic control due to the quasi one-dimensional structure of CNTs [1]. Early CNTFETs suffered from high variability in CNT diameter and alignment, as well as manufacturing defects due to indistinguishable metallic and semiconducting CNTs. Recent progress in addressing the challenges arising from variability and defects, both at the technological level [2, 3] and through defect-tolerant design [4], has showcased the strong potential of CNTFET-based technology for future nanoelectronics applications.
CNTFETs can be fabricated with Ohmic or Schottky contacts, leading to MOSFET-type or Schottky-barrier-type operation, respectively. Schottky-barrier CNTFETs (SB-CNTFETs) are ambipolar, i.e., they conduct both electrons and holes, showing a superposition of p- and n-type behaviors. Ambipolar SB-CNTFETs can be controlled by an additional terminal, called the polarity gate, which sets the p- or n-type device polarity, while the actual gate terminal controls the current flow through the transistor [5]. This novel feature of ambipolar SB-CNTFETs was investigated in [6], where a compact and in-field reconfigurable universal 8-function logic gate was described. Furthermore, the use of ambipolar SB-CNTFETs to implement in-field reconfigurable generalized NOR (GNOR) gates, combining NOR and XOR operations, was described in [7]. It was shown that GNOR gates can be utilized in a two-level programmable logic array (PLA) to save circuit area, to map logic functions into compact and fast Whirlpool PLAs [8], and to realize AND-XOR planes that efficiently map n-bit adders [9].
However, prior work with ambipolar SB-CNTFETs has only demonstrated dynamic logic, where function monotonicity requirements limit the potential of multi-level logic implementations. Furthermore, multi-level logic synthesis that leverages the high expressive power of ambipolar SB-CNTFETs, i.e., the potential to implement more complex logic functions with fewer physical resources, has not been investigated in the literature. Unlike ambipolar SB-CNTFET logic gates that implement XOR operations in a compact form, traditional libraries provide the universal NAND, NOR, and compound AOI/OAI gates but fail to efficiently implement circuits that contain one or more binate operations such as the XOR. This makes them inefficient for circuits such as n-bit adders and parity functions that are efficiently implemented using XOR gates [9].
This research was supported in part by Swiss NSF grant 20021-109450/1 and in part by NSF grant CCF-0701547. The first author thanks Prof. Subhasish Mitra for useful discussions.
This paper exploits the unique in-field controllability of the device polarity of ambipolar SB-CNTFETs to design a family of full-swing static logic gates based upon SB-CNTFETs in a transmission gate configuration. Based on generalized NOR-NAND-AOI-OAI primitives that embed XORs, the family is used to build a technology library with a significantly higher expressive power than conventional CMOS libraries. To the best of our knowledge, this is the first work to describe a library of ambipolar SB-CNTFET gates that can be cascaded and used for the synthesis and mapping of multi-level logic circuits. Logic gates with no more than three SB-CNTFETs each in the pull-up (PU) and pull-down (PD) networks can implement 46 functions, compared to only 7 functions with CMOS logic having the same topology.
This core family of static logic gates forms the basis for compact extensions to a pseudo logic family with transmission gates in the PD network, a static logic family with pass transistors in the PU and PD networks, and a pseudo logic family based only on pass transistors in the PD network. Since the design of the gates can be regularized, the ability to functionalize them in-field opens opportunities for novel regular fabrics based on ambipolar SB-CNTFETs. Technology mapping of several multi-level logic benchmarks (including multipliers, adders, and linear circuits) indicates that, on average, it is possible to reduce both the number of gates and the area by ∼38% while also improving performance by 6.9×. Although not reported here, energy per cycle gains over CMOS are expected to be consistent with the 2.5× reduction reported in the literature [1].
This paper is organized as follows. Section 2 provides background on SB-CNTFETs. Section 3 describes the design of the static and pseudo ambipolar SB-CNTFET logic families. Section 4 describes library implementation, characterization, and the results of synthesis and mapping. Section 5 discusses regular fabrics and directions for future work. Section 6 concludes the paper.
Background and motivation
Although different families of CNTFETs have been demonstrated in the literature, the most important distinction is between MOSFET-type and Schottky-barrier-type CNTFETs [6]. Whereas the first family is characterized by doped CNT channels and Ohmic contacts, the second family uses intrinsic CNT channels that form a Schottky barrier (SB) at the metallic drain and source contacts. SB-CNTFETs are ambipolar, i.e., they conduct both electrons and holes, showing a superposition of p- and n-type behaviors. The SB thickness can be modulated by the fringing gate field at the CNT-to-metal contact, allowing the polarity of ambipolar SB-CNTFETs (CNTFETs henceforth) to be set electrically [5]. Similar ambipolar behavior has also been reported in graphene nanoribbon field-effect transistors, which suggests that the polarity of these novel devices may be electrically controllable as well [10].
Uncontrolled ambipolar behavior, which enables transistor conduction for either gate polarity, is undesirable. However, the ability to control the CNTFET polarity (p- or n-type) in-field by controlling the fringing gate field suggests the innovation of using a second gate, termed the polarity gate throughout this paper, to control the electrical field at the CNT-to-metal junction and to set the device polarity [5]. Thus, CNTFETs can be used to realize in-field programmable ambipolar devices, i.e., devices whose p- or n-type behavior can be programmed in-field using the polarity gate.
Several techniques to manufacture such in-field programmable CNTFETs have been proposed in the literature [5, 7]. A sample device cross-section is shown in Fig. 1.a, with the layout drawn in Fig. 1.b. The gate G in region A turns the device on or off, as the regular gate of a MOSFET does; the polarity gate PG in region B sets the device polarity to p- or n-type. If the polarity gate is set to 0, the device exhibits n-type behavior; the device exhibits p-type behavior if the polarity gate is set to 1. The symbol for the in-field programmable CNTFET is shown in Fig. 1.c, and the configuration of p- and n-type devices is illustrated in Fig. 1.d.
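The polarity-gate behavior described above can be summarized by an idealized switch-level sketch. The function name `cntfet_conducts` and the 0/1 signal encoding are illustrative assumptions rather than part of the paper, and the model ignores all analog effects such as threshold drops.

```python
def cntfet_conducts(g: int, pg: int) -> bool:
    """Idealized switch model of an in-field programmable CNTFET.

    pg = 0 -> device is n-type: it conducts when the gate g is 1.
    pg = 1 -> device is p-type: it conducts when the gate g is 0.
    """
    if pg == 0:
        return g == 1
    return g == 0


# Enumerate the four (PG, G) combinations of the idealized device.
for pg in (0, 1):
    for g in (0, 1):
        polarity = "n-type" if pg == 0 else "p-type"
        print(f"PG={pg} ({polarity}) G={g} conducts={cntfet_conducts(g, pg)}")
```

In this simplified model the device conducts exactly when G XOR PG equals 1, which gives an intuition for why gates built from such devices can embed XOR operations compactly.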
Ambipolar CNTFET logic families
The novel in-field programmability of CNTFETs was investigated in [6], where a compact in-field reconfigurable logic gate that maps eight different logic functions of two inputs using only seven CNTFETs was presented. In [7], the design of a generalized NOR (GNOR) gate was proposed as the core building block to realize in-field PLAs. It has a compact design and a high expressive power by combining both NOR and XOR operations in the output function. For example, the dynamic GNOR gate in Fig. 2 [...] and "evaluate" operations in dynamic logic. However, this logic gate has two major weaknesses. First, it is based on dynamic logic, which is vulnerable to internal signal races. Second, if both signals B and D are equal to 1, then the PD network is formed exclusively by p-type devices, which can pull the output down to ∼|VTp| at most. The output therefore does not provide full swing, and the degradation worsens as stages are cascaded, seriously compromising noise margins.
Transmission gate static logic family
The first innovation proposed in this paper is that, analogous to CMOS gates, full swing can be restored by inserting a PU network that represents the complement of the PD network. However, the potential presence of n-type (p-type) CNTFETs in the PU (PD) network may still result in a degradation of the output signal. In fact, an n-type device in the PU network passes VDD − VTn at most, and a p-type device in the PD network passes VSS + |VTp| at least, causing signal degradation in both cases. To obtain full swing in all configurations, we replace each CNTFET whose polarity is to be set during operation by a transmission gate formed by two CNTFETs controlled (at both the regular gate and the polarity gate) by complementary signals. In a transmission gate, the n- and p-type devices are in parallel, which ensures that one of the two transistors restores the signal level in all cases.
The second innovation proposed in this paper is to extend the GNOR gates to generalized NAND (GNAND) and generalized AOI and OAI (GAOI and GOAI) configurations by considering series-parallel combinations of transmission gates and transistors in the PU/PD paths. Figure 4 illustrates the circuit implementation of all gates that can be obtained using no more than two transmission gates or transistors in series/parallel in the PU/PD networks. The derivation of the transistor aspect ratios (W/L) indicated in the figure is explained in Sec. 4. With no more than three transmission gates and transistors in the PU or PD networks, and with a maximum of three inputs (applied to the gates) and three control inputs (applied to the polarity gates), we obtain the 46 different logic gates listed in Table 1. Even though every transmission gate has two transistors, a topologically uniform comparison between CNTFET- and CMOS-based gates suggests that we consider CMOS gates with three inputs at most, instead of six.
Then, with the same constraints and topology, we obtain only 7 CMOS-based logic gates (F00, F02, F03, F10, F11, F12, and F13), highlighting the higher expressive power of the proposed transmission-gate-based static logic family.
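The transmission-gate construction can be illustrated with a switch-level sketch. It assumes that a GNOR gate with inputs A, C and control inputs B, D computes NOT((A ⊕ B) + (C ⊕ D)) and that the GNAND gate is its dual; this is consistent with the description of combining NOR/NAND with XOR, but the function names and the exact network structure below are illustrative assumptions, not a transcription of Fig. 4.

```python
from itertools import product


def tg_conducts(x: int, c: int) -> bool:
    """Transmission gate made of two complementary-controlled CNTFETs.

    Device 1 is driven by (G, PG) = (x, c), device 2 by (not x, not c).
    Each device conducts when its G XOR PG is 1, so both switch together,
    and they always have opposite polarities (one n-type, one p-type).
    """
    dev1 = bool(x ^ c)
    dev2 = bool((1 - x) ^ (1 - c))
    assert dev1 == dev2  # the pair opens and closes together
    return dev1


def gnor(a: int, b: int, c: int, d: int) -> int:
    """Static GNOR sketch: parallel TGs pull down, complementary series TGs pull up."""
    pull_down = tg_conducts(a, b) or tg_conducts(c, d)
    pull_up = tg_conducts(a, 1 - b) and tg_conducts(c, 1 - d)
    assert pull_up != pull_down  # exactly one network drives the output
    return 1 if pull_up else 0


def gnand(a: int, b: int, c: int, d: int) -> int:
    """Static GNAND sketch: series TGs pull down, complementary parallel TGs pull up."""
    pull_down = tg_conducts(a, b) and tg_conducts(c, d)
    pull_up = tg_conducts(a, 1 - b) or tg_conducts(c, 1 - d)
    assert pull_up != pull_down
    return 1 if pull_up else 0


# Exhaustive check against the assumed Boolean functions.
for a, b, c, d in product((0, 1), repeat=4):
    assert gnor(a, b, c, d) == 1 - ((a ^ b) | (c ^ d))
    assert gnand(a, b, c, d) == 1 - ((a ^ b) & (c ^ d))
```

Tying both control inputs to 0 in this sketch reduces `gnor` to a plain two-input NOR, which illustrates how the generalized gates contain the conventional CMOS-style functions as special cases.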
In this design approach, whenever the function U ⊕ V is implemented with transmission-gate CNTFETs, both polarities of U and V are needed, as illustrated in Fig. 4. By swapping the order in which the signals with different polarities are applied to the transmission gates, it is possible to implement the variants of U ⊕ V in which U, V, or both are complemented.
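As a reminder of what the polarity swaps achieve at the Boolean level, the standard XOR identities are stated below. These are general facts about XOR, not results specific to this paper: complementing one operand yields the XNOR, and complementing both leaves the function unchanged.

```latex
\overline{U} \oplus V \;=\; U \oplus \overline{V} \;=\; \overline{U \oplus V},
\qquad
\overline{U} \oplus \overline{V} \;=\; U \oplus V .
```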
Alternate CNTFET families
In this section, we derive alternate CNTFET families with a lower transistor count from the transmission gate static logic family. In the first approach, the transistor count is reduced by replacing the PU network by a single PU transistor, resulting in a pseudo logic style. The PU CNTFET is weaker than the PD devices in order to allow the output signal to fall sufficiently and meet the noise margin. The gates are expected to be slower because of the weak PU network. Higher static power is also a potential concern. The pseudo logic implementation of the same set of logic functions listed in Table 1 can be derived, as illustrated in Fig. 5.a for F05.
The second approach to reduce the transistor count is to replace all transmission gates by pass transistors, in static or pseudo logic configurations. Figures 5.b and 5.c illustrate the pass transistor implementations of F05, as an example, in static and pseudo logic styles respectively. However, this implies that CNTFETs that are electrically configured as n- or p-type can be located in the PU or PD network, respectively. Since this may degrade the output level, a restoration stage (inverter) is used to restore full swing at the output. The area-delay costs of this approach are assessed in Sec. 4.3.
Simulation results
We designed the logic gates with equal rise and fall times and with an output current equal to that of the unit inverter. Since electron and hole mobilities are equal in CNTs, the on-resistances of p- and n-type CNTFETs are equal. Thus, unlike in CMOS gates, the PU devices in CNTFET gates need not be larger than the PD devices. This yields smaller CNTFET gates compared to the CMOS gates implementing the same function. We verified the correct operation of the designed CNTFET families by simulation with the Stanford CNTFET model for unipolar devices [11], using a lithography pitch of 32 nm. At the time of writing this paper, no SPICE model for controllable ambipolar CNTFETs was available. Ambipolar behavior was modeled by fixing the polarity gate signals, i.e., the device polarities, during simulations, along the lines suggested in [6]. All results are in comparison to the 32 nm technology node for CMOS.
Transmission gate static design
We denote by Rn (Rp) the on-resistance of the n-type (p-type) device. The resistance of a transistor conducting in the weak direction is roughly double its on-resistance [12]. Hence, the resistance of a transmission gate is estimated as Rn ∥ 2Rp if it conducts a low signal, and 2Rn ∥ Rp if it conducts a high signal, where ∥ denotes the parallel combination. Since R = Rn = Rp holds for CNTFETs, the equivalent resistance of the transmission gate is always ∼2R/3. These values were taken into account in sizing the transmission gates. Note that although the decrease of the on-resistance to ∼2R/3 instead of R speeds up the gates, transmission gates with a unit on-resistance have a larger area (2 × 2A/3) than unit transistors (A), which may offset the speed advantages due to the higher input capacitance.
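The resistance and area figures quoted above follow from a short calculation, sketched here under the additional assumption (not stated explicitly in the text) that on-resistance scales inversely with device width and area scales linearly with it. The subscripts "low", "high", and "TG" are labels introduced for this sketch.

```latex
R_{\mathrm{low}}  = R_n \parallel 2R_p = \frac{R \cdot 2R}{R + 2R} = \frac{2R}{3},
\qquad
R_{\mathrm{high}} = 2R_n \parallel R_p = \frac{2R \cdot R}{2R + R} = \frac{2R}{3}.

% Scaling both devices to 2/3 of the unit width restores a unit on-resistance:
R_{\mathrm{TG}} = \frac{2R/3}{2/3} = R,
\qquad
A_{\mathrm{TG}} = 2 \cdot \frac{2A}{3} = \frac{4A}{3}.
```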
Alternate CNTFET families
The pass transistors were sized to achieve equal rise and fall times and to drive as much current as a unit inverter. Since the pass transistors potentially operate as n-type in the PU network or p-type in the PD network, their worst-case on-resistance is 2R. Thus, they were designed to be double the unit size (area = 2A). Despite the reduction in transistor count of the pass transistor family over the transmission gate family, the area cost to achieve unit on-resistance is higher (2A vs. 4A/3). Consequently, transmission gates are preferable to pass transistors in static logic. In pseudo logic, pass transistors may be useful because the logic gates require no inverted inputs, unlike other logic families. We assumed for pseudo logic gates (with either transmission gates or pass transistors) that the PU device is 4× weaker than the PD network, which offers a good compromise between delay and area.
Table 2 summarizes the area and FO4 delay estimates for the library cells. Note that the additional gates obtained by swapping the signal polarities at the transmission gates (Sec. 3.1) have the same area and delay as the gates from which they were derived. We compared the cells to their CMOS counterparts whenever these exist with the same topology and with no more than 3 transistors in each of the PU and PD networks. The area of the logic gates was estimated in a normalized manner as the number of transistors multiplied by their respective aspect ratios (W/L), given that all gates were designed to drive the current of a unit inverter. The FO4 delay was calculated with the switch-level RC delay model [12] and is equal to the delay of a gate driving 4 instances of itself. In this model, the FO4 delay is given by p + 4g, where p is the parasitic (or intrinsic) delay of the logic gate and g is the logical effort [12]. The input capacitances of the polarity gate and the actual gate were assumed to be equal.
Similar to MOSFETs, we also assumed that the gate capacitance of CNTFETs is roughly equal to the drain/source parasitic capacitances. We calculated the FO4 delay on average (for all inputs) and in the worst case (for the slowest input). The FO4 delay was normalized to the delay of a unit inverter τ (defined as the delay of a fanout-of-1 inverter with no parasitic capacitances). This metric is technology-dependent and CNTFETs are roughly 5.1× better than CMOS [1] .
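The FO4 metric described above can be sketched in a few lines. The helper names and the numerical inputs below are illustrative assumptions: the inverter values (p = 1, g = 1) and the NAND2 values (p = 2, g = 4/3) are the usual textbook logical-effort figures for CMOS, not entries from Table 2, and `tau_ps` is a hypothetical technology constant.

```python
def fo4_delay(p: float, g: float) -> float:
    """Normalized FO4 delay in units of tau.

    p: parasitic (intrinsic) delay of the gate.
    g: logical effort of the gate; the electrical fanout is fixed at 4.
    """
    return p + 4.0 * g


def absolute_delay_ps(normalized_delay: float, tau_ps: float) -> float:
    """Convert a tau-normalized delay to picoseconds for a given tau (assumed value)."""
    return normalized_delay * tau_ps


# Textbook unit inverter, assuming gate and drain capacitances are equal.
inverter_fo4 = fo4_delay(p=1.0, g=1.0)
print(inverter_fo4)  # 5.0

# Textbook CMOS NAND2 values, shown only to illustrate the formula.
nand2_fo4 = fo4_delay(p=2.0, g=4.0 / 3.0)
print(round(nand2_fo4, 2))  # 7.33
```

Because the result is expressed in units of τ, comparing two technologies additionally requires their respective τ values, which is where the technology-dependent factor quoted from [1] enters.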
Library characterization
Note that the static transmission gate XNOR gate has a lower FO4 delay than the unit inverter. This is because of the lower parasitic drain capacitance of the transmission gates in the XNOR, when compared to an inverter driving the same output current. Most of the cells designed with static transmission gates present this advantage. Thus, the normalized average FO4 delay of all CNTFET transmission gate static logic gates is comparable to that of all static CMOS gates, even though the CNTFET library implements more complex functions. At the same time, since equally sized p- and n-type CNTFET devices have the same on-resistance, the CNTFET cells are more compact: despite the larger average number of transistors per gate in the CNTFET static library, its average area is slightly smaller than that of the CMOS library (12.3 vs. 12.7). As expected, the CNTFET transmission gate pseudo logic family has a 31% smaller average gate area than its static counterpart (8.5 vs. 12.3); however, it is 33% slower (12 vs. 9). Surprisingly, the CNTFET pass transistor pseudo logic family is less area efficient than its transmission gate counterpart. This confirms the conjecture in Sec. 4.2 that a larger area is needed for pass transistors in order to compensate for the high on-resistance of p-type (n-type) transistors operating in the PD (PU) network. This family is only 7% more compact than the transmission gate static logic family (average area: 11.5 vs. 12.3), while it is 2.7× slower (delay: 24.1 vs. 9). This makes the CNTFET pass transistor family a poor choice for circuit design. All the CNTFET logic families need both polarities of inputs for XOR operations. Consequently, we included an output inverter in every gate, in order to provide both polarities of every output. The average delay and area of the logic families with output inverters are indicated in the penultimate row of Table 2.
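The percentages quoted in this comparison can be reproduced from the average area and delay values given in the text. The dictionary keys and helper names below are labels chosen for this sketch; only the numbers come from the paragraph above.

```python
# Average normalized areas and FO4 delays as quoted in the text.
area = {"static_tg": 12.3, "pseudo_tg": 8.5, "pseudo_pass": 11.5, "cmos": 12.7}
delay = {"static_tg": 9.0, "pseudo_tg": 12.0, "pseudo_pass": 24.1}


def percent_smaller(new: float, ref: float) -> float:
    """Relative reduction of `new` with respect to `ref`, in percent."""
    return 100.0 * (ref - new) / ref


def percent_larger(new: float, ref: float) -> float:
    """Relative increase of `new` with respect to `ref`, in percent."""
    return 100.0 * (new - ref) / ref


pseudo_tg_area_gain = percent_smaller(area["pseudo_tg"], area["static_tg"])
pseudo_tg_slowdown = percent_larger(delay["pseudo_tg"], delay["static_tg"])
pseudo_pass_area_gain = percent_smaller(area["pseudo_pass"], area["static_tg"])
pseudo_pass_slowdown = delay["pseudo_pass"] / delay["static_tg"]

print(round(pseudo_tg_area_gain, 1))    # area saving of pseudo TG vs. static TG, in %
print(round(pseudo_tg_slowdown, 1))     # delay increase of pseudo TG vs. static TG, in %
print(round(pseudo_pass_area_gain, 1))  # area saving of pseudo pass vs. static TG, in %
print(round(pseudo_pass_slowdown, 2))   # delay ratio of pseudo pass vs. static TG
```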
Logic synthesis and mapping results
We used the tool ABC developed at Berkeley [13] for logic synthesis and technology mapping of several benchmark circuits. The circuits were first synthesized using the resyn2rs script, followed by technology mapping using genlib libraries that were compiled for each logic family based on the area-delay values from Table 2. The results for 15 benchmark circuits are summarized in Table 3. In Sec. 3.2 and 4.3, we demonstrated that the transmission gate configuration outperforms the pass transistor configuration in terms of area and delay. We therefore considered only transmission gate implementations in static and pseudo logic, and we compared them to a CMOS library. For each family, the number of gates, the normalized circuit area (to a unit transistor), the logic depth, the normalized delay (to the technology-dependent intrinsic delay τ [1]), and the absolute delay in picoseconds are reported. Whereas both CNTFET families reduce the implementation complexity, the static family is more efficient in terms of speed and the pseudo family is more attractive in terms of area. Of the benchmarks, the circuits that embed XOR operations (the adders, ALUs, error correcting circuits, and the multiplier C6288) return the largest area and speed improvements when implemented in CNTFET technology.
The implementation with both transmission gate CNTFET families requires on average ∼38% fewer gates and 40% fewer logic levels than CMOS. While the static logic CNTFET family saves 37.7% area on average compared to CMOS, the pseudo logic CNTFET family saves 64.5% area on average. The area normalization factor was set to the area of a unit transistor, which is expected to be equal for MOSFETs and ambipolar CNTFETs [6], since the additional polarity gate is buried underneath the channel or defined on top of the actual gate. However, we may expect a negligible area cost due to the contact area of the polarity gate.
The circuits implemented in the static and pseudo CNTFET families are 26.4% and 13.0% faster than the CMOS implementation, respectively, in terms of normalized delay. Delay was normalized to the technology-dependent intrinsic delay τ, and unipolar CNTFETs are expected to be 5.1× faster than CMOS [1]. We assumed the same intrinsic delay for unipolar and ambipolar CNTFETs to calculate the absolute delay of the logic circuits. Figure 6 shows the cumulative benefits of technology and design, which translate into an average speed-up of 6.9× and 5.8× for the static and pseudo CNTFET logic families respectively compared to CMOS. The largest speed-up was calculated for the static CNTFET implementation of multipliers (∼10×) and error correcting circuits (more than 8×). For delay calculations, we considered the worst-case scenario in which every signal, i.e., either input or control signal, needs to charge or discharge an input capacitance equal to a unit drain/source intrinsic capacitance on every switching operation. Consequently, the reported estimates for the delay of the mapped circuits are worst-case values. Even though the delay due to signal routing around ambipolar cells was not considered, its impact is expected to be mitigated by the smaller CNTFET cell layout.
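The way the design-level and technology-level gains combine can be sketched as follows. The sketch assumes that "X% faster" means the normalized delay is reduced by X%, and that the two gains multiply; since the paper reports averages over individual benchmarks, this back-of-the-envelope product is only expected to approximate the reported speed-ups. The names `TECH_SPEEDUP` and `overall_speedup` are introduced for illustration.

```python
# Intrinsic-delay advantage of CNTFETs over CMOS quoted in the text from [1].
TECH_SPEEDUP = 5.1


def overall_speedup(delay_reduction_pct: float, tech: float = TECH_SPEEDUP) -> float:
    """Combine a normalized-delay reduction (in percent) with the technology factor.

    Assumes the design-level gain is 1 / (1 - reduction) and that it
    multiplies with the technology-level gain.
    """
    design_gain = 1.0 / (1.0 - delay_reduction_pct / 100.0)
    return design_gain * tech


static_speedup = overall_speedup(26.4)
pseudo_speedup = overall_speedup(13.0)
print(round(static_speedup, 2))
print(round(pseudo_speedup, 2))
```

Under these assumptions the estimates land close to the reported 6.9× and 5.8× averages; any small residual difference may stem from averaging per-benchmark ratios rather than composing average values.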
Discussion and opportunities
Since the proposed CNTFET logic gates have a higher expressive power than their CMOS counterparts, their regular structure motivates their use to design regular fabrics. A regular fabric is a set of resources (gates, memory, interconnect, ...) laid out in a regular manner, which can be mask- or in-field configured to implement specific logic functions. Several regular gate and logic arrays have been recently proposed to reduce the design risk due to increasing variability in current and future CMOS nodes, e.g., [8, 14, 15, 16].
The baseline architecture of an ambipolar CNTFET regular fabric is depicted in Fig. 7.a. Two types of logic blocks are interleaved. Their respective outputs are routed throughout the circuit by means of an interconnection network, which can be configured with SRAM cells in a similar manner to FPGAs. A detailed view of the two types of logic blocks is presented in Fig. 7.b and 7.c. The main components of the logic blocks are generalized NOR and NAND gates whose circuit implementation with CNTFET technology is presented in Fig. 8. The design takes advantage of their identical physical layout rotated by 180°. Depending on the signals connected to the inputs of the generalized gates, they can be configured in order to implement a large set of cells from the library presented in Sec. 3.
Figure 8: (a) GNOR and (b) GNAND gates for regular fabrics
[...] the other logic families can be derived in a straightforward manner from the static transmission gate family. In-field programmable regular fabrics offer a simple design flow, reconfigurability, and immunity to process variability. The regularity and symmetry allow easy bounding of delays, and if the local routing delay is small enough, then the gates can be designed with dynamic logic with no risk of internal signal races. This yields a more robust dynamic logic, while taking advantage of its lower power and area compared to static logic.
Conclusions
This paper described novel design guidelines for logic gates based on in-field reconfigurable ambipolar CNTFETs. The logic gates embed XOR operations efficiently, offering higher expressive power than equivalent CMOS gates. When used to map several benchmark circuits, the proposed static transmission gate CNTFET library requires on average 37.7% less area than static CMOS, with further reductions possible with pseudo logic CNTFETs. Multipliers, adders and error correcting circuits showed the highest improvement with up to 76.6% savings in area. The lower logic depth and lower intrinsic delay of CNTFETs result in an average circuit speed-up of 6.9×. The regular structure of the proposed gates can be exploited to manufacture regular fabrics, offering enhanced flexibility and robustness within a simple design framework.
