Abstract
We propose a novel nanofabric approach that mixes unconventional nanomanufacturing with the CMOS manufacturing flow and design rules in order to build a reliable nanowire-CMOS fabric, called N³ASIC, with no new manufacturing constraints added. Active devices are formed on a dense, uniform semiconductor nanowire array, and standard area-distributed pins/vias and metal interconnects route the signals in 3D. CMOS design rules are followed. Novel nanowire-based devices are envisioned and characterized using 3D physics-based modeling. The overall N³ASIC fabric design, associated circuits, interconnection approach, and a layer-by-layer assembly sequence for the fabric are introduced. Key system-level metrics such as power, performance, and density for a nanoprocessor built using N³ASICs were evaluated and compared against a functionally equivalent CMOS design synthesized with state-of-the-art CAD tools. We show that the N³ASIC version of the processor is 3X denser and 5X more power efficient than the 16-nm scaled CMOS version, at comparable performance, without adding any new or unknown manufacturing requirements.
Although approaches such as CMOL [7] and FPNI [8] have been proposed to minimize certain manufacturing constraints, some or all of the aforementioned concerns still exist.
In this paper, we propose a novel approach that mixes unconventional nanomanufacturing with conventional CMOS lithography and design rules to build a new class of 3-D integrated nanofabrics without any new manufacturing requirements. A new nanofabric, called N³ASICs (Nanoscale 3-D Application Specific Integrated Circuits), is presented. This fabric combines the high density obtained from unconventional manufacturing with the reliability and overlay precision of conventional CMOS manufacturing.
In N³ASICs, active devices are formed on uniform, aligned semiconductor nanowire arrays, and area-distributed interfaces are used to connect to a 3-D CMOS metal stack for routing. To enable full integration with CMOS, lithographic design rules are also followed in shaping the fabric. Furthermore, a single unconventional manufacturing step to pattern/assemble sublithographic nanostructures is carried out at the beginning, before any lithographic step, and therefore has no overlay requirement. Thus, registration and overlay requirements exist only for the subsequent photolithography steps, which conventional CMOS manufacturing already handles with high precision [9]. This is in direct contrast to proposals such as CMOL [7] and FPNI [8], where an unconventional manufacturing step (e.g., Nano-Imprint Lithography (NIL) [10]) with fairly poor overlay precision [11] is required after conventional lithography steps.
The core concepts of the N³ASIC fabric are introduced. A layer-by-layer assembly sequence shows how the fabric may be realized on a single Silicon-on-Insulator (SOI) wafer. Novel dual-channel Crossed Nanowire Field Effect Transistors (2C-xnwFETs), the active devices in N³ASICs, are extensively characterized using accurate 3-D physics-based simulations calibrated with experimental data. Associated circuit styles and the interconnection approach are described and validated for functionality. A nanoprocessor design is implemented on N³ASICs, and key system-level metrics, including area, power, and performance, are evaluated.
The key contributions of this paper are: (i) N³ASIC, a new hybrid CMOS/nano computational fabric, is described; (ii) extensive device-level characterization of novel 2C-xnwFETs for N³ASIC is shown; (iii) key system-level metrics such as density, performance, and power for N³ASIC are evaluated and compared against an equivalent 16nm CMOS design. We show that N³ASICs have a 3X density and 5X power advantage over end-of-the-line 16nm CMOS with comparable performance, even if all CMOS design rules are obeyed.
The N³ASIC fabric is built on a standard Silicon-on-Insulator (SOI) wafer. It consists of uniform parallel semiconductor nanowire arrays on which logic is implemented. Area-distributed standard pins or vias connect the inputs and outputs of these logic planes to the CMOS routing stack. Metal interconnections between vias achieve arbitrary routing. Supporting peripheral CMOS circuitry can be used for external control and dynamic clocking.
The underlying uniform nanowire array at the bottom layer can be directly patterned on an ultra-thin SOI substrate using approaches such as Nano-Imprint Lithography (NIL) [10] or Superlattice Nanowire Pattern Transfer (SNAP) [12] [13]. For example, SNAP has produced uniform silicon nanowire arrays with widths as small as 7nm and pitches as small as 13nm [14]. All subsequent steps, including the creation of vias, contacts, and metal interconnects, are achieved using conventional lithography while obeying standard design rules.
To enable full and fine-grained integration with CMOS (e.g., not only IO signals but also the inputs/outputs of each nanowire gate) without new manufacturing requirements, lithographic design rules need to be followed. Fig. 2 shows [...] N³ASIC fabric. All requirements for via overhang, metal-via and metal-metal spacing, etc. are followed (e.g., [15] projects the metal pitch [...]). Since lithographic requirements decide the spacing, more sub-lithographically patterned nanowires may be bundled within the same dimension without loss of density. This allows for better contact, performance, and inherent defect resilience, as will be shown in subsequent sections.
Fig. 3 shows a layer-by-layer assembly sequence for N³ASICs. At the bottom of the fabric is a uniform semiconductor nanowire array (Fig. 3A). Metal gates (shown in green) are deposited at certain positions to define 2C-xnwFETs (Fig. 3B) using conventional lithography. A self-aligned ion implantation is then used to create n+/p/n+ structures for enhancement-mode 2C-xnwFETs, similar to conventional CMOS. All device channels are oriented along the same direction and lie on the substrate itself. Power and dynamic control rails are also established to define two separate logic planes. Metal lines and vias may then be laid down for interconnection. Inputs are received through an M1 array (light blue lines), and vias are dropped onto the nanowires to tap the outputs (blue dots) (Fig. 3C). In Fig. 3D, outputs from the left logic plane are cascaded to the inputs of the right plane using M2 (orange lines). This approach can be scaled to large designs with multiple cascaded logic planes. Since a single unconventional patterning step such as SNAP or NIL is carried out prior to any lithography, it has no registration or overlay requirement.
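The ordering argument above (one unconventional step first, everything else lithographic) can be checked mechanically. The sketch below is a hypothetical encoding of the Fig. 3 sequence: the `Step` type, the step names, and the simplified post-CMOS imprint flow used for contrast are illustrative assumptions, not a process description from the source. It only tests whether any unconventional step would have to be overlaid on previously formed features.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    unconventional: bool  # True for SNAP/NIL-style sublithographic patterning


def unconventional_steps_need_no_overlay(flow: List[Step]) -> bool:
    """Return True if every unconventional step precedes all conventional steps.

    An unconventional step that comes first has nothing to align to, so it has
    no overlay requirement; one that follows a conventional step must be
    overlaid on previously formed features.
    """
    seen_conventional = False
    for step in flow:
        if step.unconventional and seen_conventional:
            return False
        if not step.unconventional:
            seen_conventional = True
    return True


# Hypothetical encoding of the Fig. 3 sequence; step names are illustrative.
N3ASIC_FLOW = [
    Step("uniform nanowire array + alignment markers (SNAP or NIL)", True),
    Step("metal gates defining 2C-xnwFETs (lithography)", False),
    Step("self-aligned n+/p/n+ implant", False),
    Step("power and dynamic control rails (lithography)", False),
    Step("M1 input lines and output vias (lithography)", False),
    Step("M2 cascading between logic planes (lithography)", False),
]

# Simplified contrasting order: imprint after CMOS layers, as described for CMOL/FPNI.
POST_CMOS_IMPRINT_FLOW = [
    Step("CMOS devices and interconnect (lithography)", False),
    Step("nano-imprinted nanowire layer (NIL)", True),
]

print(unconventional_steps_need_no_overlay(N3ASIC_FLOW))             # True
print(unconventional_steps_need_no_overlay(POST_CMOS_IMPRINT_FLOW))  # False
```

The check deliberately ignores self-alignment within conventional steps (e.g., the implant aligned to the gates); it only captures the property that distinguishes the N³ASIC ordering from post-CMOS imprint.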
Furthermore, registration of the first lithographic mask against the patterned nanowires can be achieved by transferring alignment markers to the substrate in the same step as the logic nanowires, which ensures that the features are self-aligned. For an approach such as NIL, an arbitrary alignment marker could be created. For SNAP, where it may not be possible to create arbitrary markers as part of the superlattice, Moiré patterns [16] could be used for registration. In addition, the underlying nanowire pattern is uniform, which implies that the first lithographic mask can be offset with some tolerance and no loss of functionality.
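Why a uniform array tolerates mask offset can be illustrated with an idealized 1-D model. In this sketch, assumed for illustration only, vias are treated as points, nanowire k spans k·pitch ± width/2, and every later layer is assumed to be aligned to the first mask; the 7nm/13nm values are the SNAP dimensions quoted earlier, while the via positions are hypothetical. Under these assumptions, a small shift (within ±width/2) or a shift by a whole pitch leaves every via on a nanowire.

```python
from typing import List, Optional

PITCH_NM = 13.0  # SNAP nanowire pitch cited in the text [14]
WIDTH_NM = 7.0   # SNAP nanowire width cited in the text [14]


def contacted_wire(via_center_nm: float, pitch: float = PITCH_NM,
                   width: float = WIDTH_NM) -> Optional[int]:
    """Index of the nanowire under a via centre, or None if it lands in a gap.

    Idealized 1-D model: nanowire k spans [k*pitch - width/2, k*pitch + width/2].
    """
    k = round(via_center_nm / pitch)
    if abs(via_center_nm - k * pitch) <= width / 2:
        return k
    return None


def contact_pattern(via_centers: List[float], mask_offset_nm: float) -> List[bool]:
    """Which vias still land on a nanowire when the whole mask is shifted."""
    return [contacted_wire(x + mask_offset_nm) is not None for x in via_centers]


VIAS = [0.0, 26.0, 52.0, 91.0]  # hypothetical nominal via centres (nm)

print(contact_pattern(VIAS, 0.0))   # [True, True, True, True]
print(contact_pattern(VIAS, 3.0))   # [True, True, True, True]  (within +/- width/2)
print(contact_pattern(VIAS, 13.0))  # [True, True, True, True]  (one full pitch)
print(contact_pattern(VIAS, 5.0))   # [False, False, False, False]
```

Real tolerances also depend on via size and overhang rules, which this point-via model omits.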
This approach is in direct contrast to proposals such as CMOL and FPNI, where the fabric organization requires nonconventional techniques such as imprint lithography to be employed after fabrication of the CMOS layers. The overlay precision of such techniques is fairly poor [11], which implies significant challenges in alignment against previously formed CMOS features and would result in very low or zero yield.
A. Device Structure
The use of standard design rules and lithography for manufacturing determines the device structure and dimensions. Given that channel nanowires can have much smaller dimensions than metal vias, they are bundled into pairs to make better contact and to provide dual-channel FETs.
In this paper, the 2C-xnwFET with the omega-like deposited metal gate shown in Fig. 4 is used. The gate width and the channel length of the device are lithographically defined and are therefore set by the technology node; for this study, devices with 16nm gate lengths were simulated. A high-k dielectric (HfO₂ [17]) was used as the gate oxide material. A gate self-aligned process with etch back can be used to define the oxide structure. Since this is an Omega-gated structure (somewhat similar to multi-gate FETs [18]), good electrostatic control of the gate over the channel can be expected, because gate-to-channel coupling is stronger than in a top-gated structure. Better electrostatic control over the channel gives a higher on/off current ratio. The use of dual channels implies a higher on-current, with potential benefits for system-level performance. Furthermore, the dual-channel structure provides inherent defect resilience against broken nanowires and some types of stuck-off defects, without a density impact. Stuck-off defects are in general very difficult to mask (unlike stuck-on defects, which can be masked fairly easily with redundancy), so this is a good compromise.
B. Circuit Style
N³ASICs use a dynamic circuit style similar to the one employed by NASICs [3]. These dynamic circuit styles are amenable to implementation on regular nanowire arrays without the need for complementary devices or arbitrary sizing or placement, simplifying the manufacturing requirements for N³ASICs. Logic customization is limited to defining the positions of the 2C-xnwFETs on the logic planes. Cascading and noise concerns for dynamic circuits arising from high output impedance are carefully managed through device design and fabric-level sequencing schemes similar to the approaches presented in [19] [20] [21].
One dynamic sequencing scheme for cascading is shown in Fig. 5. In this scheme, successive stages are clocked using different precharge and evaluate signals, with hold phases inserted for correct cascading. During a hold phase, the output node of a given stage is implicitly latched and used for evaluation of the next stage, similar to [19] [21]. Implicit latching means that area-expensive latches or flip-flops requiring complementary devices or local feedback paths are not needed. Fig. 6 shows the top view of a 1-bit full adder circuit built using two N³ASIC logic planes. In this example, a 2-level
III. EVALUATION AND RESULTS
The N³ASIC fabric was extensively evaluated at the device, circuit, and architectural levels. Device I-V and C-V characteristics were extracted from accurate 3-D physics-based simulations. An integrated device-fabric methodology was used to create behavioral models of the devices for a circuit simulator. Circuit-level simulations were carried out to verify functionality. System-level metrics such as power and performance were evaluated for an N³ASIC processor design. The following subsections describe each phase in detail.
A. Device Simulations
Enhancement-mode Dual-Channel Crossed Nanowire FETs (2C-xnwFETs, Fig. 4) were extensively characterized using accurate physics-based 3D simulation of their electrostatics and operation in Synopsys Sentaurus™ [22]. The 2C-xnwFETs employ metal Omega gate structures for tighter electrostatic control. The gate material work function is 4.6 eV. Devices with 16nm channels were simulated, since 16nm is the minimum feature size for lithographically defined gates. The notation N³ASIC-16 denotes an N³ASIC constructed with 16nm CMOS design rules, for which the scale length λ is 8nm. The channels were doped p-type on the order of 10¹⁸ cm⁻³, and the source/drain regions were doped n-type on the order of 10²⁰ cm⁻³. A substrate bias of -3V was assumed to deplete the channel and to adjust device parameters such as the threshold voltage and on/off current ratio for correct cascading. A high-k HfO₂ material is used for the gate oxide, with a thickness of 3nm. Drift-diffusion transport models [23] were used to simulate the 3D devices. Simulations were calibrated to account for interface scattering, surface roughness, and interface trapped charges, as explained in [20].
Drain current vs. drain voltage (I_DS-V_DS), drain current vs. gate voltage (I_DS-V_GS), and parasitic capacitances vs. gate voltage (C-V_GS) were simulated. The on-current (I_ON) and on/off current ratio (I_ON/I_OFF) were extracted. Fig. 7 shows the I_DS-V_DS curves for different V_GS values, and Fig. 8 shows the I_DS-V_GS curves for different V_DS values. These simulations verify inversion-mode behavior for 2C-xnwFETs with a positive threshold voltage. Table 1 shows key device parameters for the N³ASIC-16 2C-xnwFET and also for the NASIC xnwFET described in [20]. Owing to its dual channel, the N³ASIC-16 2C-xnwFET has a higher ON current than the NASIC xnwFET, which lowers intrinsic delay and can improve circuit performance. Also, V_TH > 0.2 V and I_ON/I_OFF > 10⁴ were obtained, implying that the devices meet the circuit requirements for correct functionality and noise [20].
[...] [15]. With the help of behavioral models, HSPICE simulations were carried out to verify functionality and to measure the performance and power of N³ASIC-16. The full adder in Fig. 6 was simulated in HSPICE to verify the expected circuit-level behavior. Fig. 11 shows the output waveforms of the 1-bit full adder simulated in HSPICE with the behavioral model. These simulations verify the functionality of the circuits and adequate noise margins. Note that the data on the output node is latched during the hold phases.
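The two device criteria quoted in the device-characterization discussion (V_TH > 0.2 V, I_ON/I_OFF > 10⁴) and the link between ON current and intrinsic delay can be sketched with the standard first-order estimate τ = C·V_DD/I_ON. The 20.42aF capacitance is the value quoted later for the 2C-xnwFET; the supply voltage, the ON currents, and the idealized 2X current for the dual channel are hypothetical placeholders, since Table 1 values are not reproduced here. Capacitance is held fixed for simplicity, although a real dual-channel device would also change it.

```python
def meets_cascading_requirements(vth_v: float, ion_a: float, ioff_a: float,
                                 min_vth_v: float = 0.2,
                                 min_on_off: float = 1e4) -> bool:
    """Check the two device criteria quoted in the text: V_TH > 0.2 V and I_ON/I_OFF > 1e4."""
    return vth_v > min_vth_v and ion_a / ioff_a > min_on_off


def intrinsic_delay_s(c_load_f: float, vdd_v: float, ion_a: float) -> float:
    """First-order intrinsic delay estimate: tau = C * V_DD / I_ON."""
    return c_load_f * vdd_v / ion_a


C_GATE_F = 20.42e-18        # max 2C-xnwFET input gate capacitance quoted in the text (Fig. 9)
VDD_V = 0.8                 # hypothetical supply voltage
ION_ONE_CHANNEL_A = 10e-6   # hypothetical ON current of a single channel

tau_single = intrinsic_delay_s(C_GATE_F, VDD_V, ION_ONE_CHANNEL_A)
tau_dual = intrinsic_delay_s(C_GATE_F, VDD_V, 2 * ION_ONE_CHANNEL_A)  # idealized 2x current
print(f"single channel: {tau_single:.3e} s, dual channel: {tau_dual:.3e} s")
print(meets_cascading_requirements(0.3, 1e-5, 1e-10))  # True (hypothetical device)
```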
C. System-level Evaluation
For system-level evaluation, WISP-0 [5] [27], a processor incorporating nanopipelining, was chosen. The area of each tile in N³ASIC-16 WISP-0 was calculated based on the design rules and the number of metal tracks. An HSPICE circuit definition of the entire WISP-0 was created with proper interconnects to calculate the power and performance of N³ASIC-16 WISP-0. Key system-level metrics such as area, performance, and power were compared with a functionally equivalent 16nm static CMOS baseline.
The 16nm static CMOS baseline was created using the following methodology. A functional description of WISP-0 was written in Verilog. Using Synopsys Design Compiler and a standard cell library, a gate-level Verilog netlist was created. This was converted to a circuit-level netlist using the nettran utility, with the HSPICE definition of the standard cell library. The MOSFET device dimensions were scaled to the 16nm technology node. The netlist and the PTM 16nm high-performance MOSFET models were used to run circuit-level simulations in Synopsys HSPICE to measure the performance and power of the CMOS design. For area estimation, WISP-0 was synthesized using a 45nm standard cell library, and the area was scaled quadratically down to 16nm.
Fig. 12 shows the density advantage of N³ASICs at various technology nodes. The proposed N³ASIC-16 is 3X denser than 16nm CMOS. The density improvement is due to the regular, dense nanowire logic array at the bottom, the use of a single type of FET, a smaller device footprint, and implicit latching without the need for area-expensive flip-flops. Since CMOS design rules are used for pitch and spacing, the density advantage remains almost constant across the other technology nodes considered.
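The quadratic scaling used for the CMOS area estimate is simply area × (16/45)². A minimal sketch, assuming ideal scaling (real designs rarely shrink perfectly) and a hypothetical 45nm synthesized area:

```python
def scale_area(area: float, from_node_nm: float, to_node_nm: float) -> float:
    """Ideal quadratic area scaling: area shrinks with the square of the feature-size ratio."""
    return area * (to_node_nm / from_node_nm) ** 2


factor = scale_area(1.0, 45, 16)
print(round(factor, 4))                           # 0.1264
area_45_um2 = 1000.0                              # hypothetical synthesized 45nm area
print(round(scale_area(area_45_um2, 45, 16), 1))  # 126.4
```

Because N³ASIC area is also set by CMOS pitch and spacing rules, both designs follow the same quadratic trend, which is consistent with the near-constant density ratio across nodes.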
Power and performance comparisons are shown in Table 2. The performance of N³ASIC-16 is comparable to that of the 16nm CMOS equivalent of WISP-0. These simulations do not consider key optimizations for 2C-xnwFETs, making the comparisons pessimistic. For example, while the PTM models employ strained silicon, no strain was assumed for the 2C-xnwFETs. Better mobility, and hence better performance, could be expected when straining techniques are employed in N³ASICs. A significant reduction in average power of 5.4X was observed for N³ASIC-16. To explain this, experiments were carried out with different circuits and varying numbers of inputs. With the supply voltage and operating frequency held the same, the capacitances were investigated. Since there is no arbitrary sizing in N³ASICs and all 2C-xnwFETs are identical, the maximum input gate capacitance is always 20.42aF (Fig. 9). In the CMOS WISP-0 design, transistors are sized, which increases gate capacitance. The input gate capacitance of a minimum-sized CMOS inverter is 75.14aF, more than 3.5X that of N³ASICs. The largest NMOS device used has a gate capacitance of 135.4aF, and the largest PMOS device has a gate capacitance of [...]
[...] N³ASICs to improve the density and performance of the fabric will be explored. Currently, the density of the fabric is determined by the metal 1 pitch and the via spacing. The density would greatly improve if vias/pins were needed only to connect the inputs and outputs of a tile. This would be possible if nanowires, instead of metal interconnects, were used to route signals within the tile. By reducing the number of vias/pins, more densely packed nanowire arrays could be exploited.
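The capacitance argument in the power comparison above follows from the switching-power relation P = α·C·V_DD²·f: with V_DD and f held equal, the per-gate power ratio reduces to the capacitance ratio. The sketch below uses the gate capacitances quoted in the text; the operating point is hypothetical and cancels out. It compares single gate inputs only and is not meant to reproduce the 5.4X system-level figure, which also depends on gate counts, activity, and interconnect.

```python
# Gate capacitances quoted in the text (aF).
C_N3ASIC_MAX_INPUT_AF = 20.42   # every 2C-xnwFET is identical (Fig. 9)
C_CMOS_MIN_INVERTER_AF = 75.14  # minimum-sized CMOS inverter input
C_CMOS_LARGEST_NMOS_AF = 135.4  # largest NMOS in the CMOS WISP-0


def switching_power_w(c_farads: float, vdd_v: float, freq_hz: float,
                      activity: float = 1.0) -> float:
    """Dynamic switching power P = alpha * C * V_DD^2 * f."""
    return activity * c_farads * vdd_v ** 2 * freq_hz


# With V_DD and f held equal, the per-gate power ratio reduces to the capacitance ratio.
VDD_V, FREQ_HZ = 0.8, 1e9  # hypothetical operating point; cancels in the ratio
p_n3 = switching_power_w(C_N3ASIC_MAX_INPUT_AF * 1e-18, VDD_V, FREQ_HZ)
p_inv = switching_power_w(C_CMOS_MIN_INVERTER_AF * 1e-18, VDD_V, FREQ_HZ)
p_nmos = switching_power_w(C_CMOS_LARGEST_NMOS_AF * 1e-18, VDD_V, FREQ_HZ)

print(round(p_inv / p_n3, 2))   # 3.68
print(round(p_nmos / p_n3, 2))  # 6.63
```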
Another benefit of the 2C-xnwFET is that it provides fault tolerance against stuck-open devices compared to a single-channel device. For example, dual-channel structures are more resilient to broken nanowires, since a single conducting nanowire can still achieve correct functionality. Given that nanowires can have much smaller dimensions than metal vias, devices with more than two channels could also be considered. From a fault-tolerance perspective, more channels would imply better resilience to stuck-off defects. However, because each channel's cross-section would be smaller, scattering effects would increase, degrading device performance. More detailed evaluations will be carried out as part of future work. Ultimately, defect distributions and performance targets will drive device design.
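The resilience argument for dual- and multi-channel devices can be quantified with a simple probability model. Assuming, for illustration only, that each channel nanowire breaks independently with probability p and that one intact channel suffices (as stated above), a device with n channels conducts with probability 1 − pⁿ. The break probability below is hypothetical, and the model ignores the performance loss from thinner channels.

```python
def p_device_conducts(p_break: float, channels: int) -> float:
    """Probability that at least one of `channels` nanowires is intact,
    assuming independent breaks with probability p_break each."""
    if not 0.0 <= p_break <= 1.0 or channels < 1:
        raise ValueError("need 0 <= p_break <= 1 and channels >= 1")
    return 1.0 - p_break ** channels


P_BREAK = 0.05  # hypothetical per-nanowire break probability
for n in (1, 2, 3):
    print(n, round(p_device_conducts(P_BREAK, n), 6))
# 1 0.95
# 2 0.9975
# 3 0.999875
```

Under this model, each added channel multiplies the failure probability by p, which is why diminishing returns from extra channels must be weighed against degraded device performance.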
To improve device performance, techniques such as strain engineering [28] [29] can be applied to increase the mobility of the charge carriers in the channel. Apart from enhancement-mode devices, junctionless devices similar to [29] [30] can be used for N³ASICs, with both manufacturing and performance benefits. In a junctionless FET, a uniform doping profile is used across the channel without the need for n+/p/n+ junctions, which simplifies manufacturing. Device and circuit performance could be expected to improve owing to bulk conduction in junctionless devices, unlike enhancement-mode devices, which exhibit inversion-layer conduction.
To reduce manufacturing costs, a structured N³ASIC can be envisioned, similar to the structured ASIC [31] approach. All nanowire logic planes could be identically sized, with predefined 2C-xnwFET positions. Arbitrary functionality and logic may then be achieved purely through routing customization using custom metal interconnects. This can potentially reduce design time and manufacturing cost, as it reduces the number of masks required.
V. CONCLUSION
N³ASICs, a 3-D integrated nano-CMOS hybrid fabric, was introduced. Integration is fine-grained: each input and output of a nanowire gate can be routed to any CMOS gate. The fabric uses unconventional manufacturing processes in conjunction with CMOS design rules for full 3-D integration without any special manufacturing requirements. A detailed layer-by-layer assembly sequence was presented. Detailed fabric evaluations were carried out at the device, circuit, and system levels. A nanoprocessor implemented using the proposed N³ASIC fabric was shown to be 3X denser than an equivalent CMOS design even when all conservative CMOS design rules are obeyed. At 5X lower power consumption, the N³ASIC fabric achieves the same performance as the CMOS processor, even without device optimizations such as straining that were included in the 16nm CMOS device models. With straining, and by relaxing some of the design rule requirements, significant additional benefits may be possible.
