A field-programmable nanowire interconnect (FPNI) enables a family of hybrid nano/CMOS circuit architectures that generalizes the CMOL (CMOS/molecular hybrid) approach proposed by Strukov and Likharev, allowing for simpler fabrication, more conservative process parameters, and greater flexibility in the choice of nanoscale devices. The FPNI improves on a field-programmable gate array (FPGA) architecture by lifting the configuration bit and associated components out of the semiconductor plane and replacing them in the interconnect with nonvolatile switches, which decreases both the area and power consumption of the circuit. This is an example of a more comprehensive strategy for improving the efficiency of existing semiconductor technology: placing a level of intelligence and configurability in the interconnect can have a profound effect on integrated circuit performance, and can be used to significantly extend Moore's law without having to shrink the transistors. Compilation of standard benchmark circuits onto FPNI chip models shows reduced area (8× to 25×), reduced power, slightly lower clock speeds, and high defect tolerance: an FPNI chip with 20% defective junctions and 20% broken nanowires has an effective yield of 75% with no significant slowdown along the critical path, compared to a defect-free chip. Simulations show that the density and power improvements continue as both CMOS and nano fabrication parameters scale down, although the maximum clock rate decreases due to the high resistance of very small (<10 nm) metallic nanowires.
Overview
Micro/nano hybrid architectures have been proposed which integrate nanowire crossbars with a CMOS chip (figure 1, right) [6, 25, 26, 18, 20]. Each crossbar 'junction' (formed by one nanowire crossing over another, separated by a small distance; see figure 1, left) is generally hypothesized to be an electrically configurable or reconfigurable device, the simplest being an antifuse. Metallic 'pins' on the surface of the chip connect down into CMOS gates and provide contact points for electrically attaching nanowires in the crossbar. The architects of these hybrids must decide: (1) how to split functionality between the CMOS and nano layers; (2) how to interconnect the two layers by appropriate placements of pins and nanowires; and (3) how to deal with the high defect rates and device variability found in the nanowire crossbars. One of the earliest ideas proposed that demultiplexers, implemented in the nanowire crossbars, would allow a small number of pins to control a large number of nanowires [12, 6, 24]. Although progress has been made with this approach [13, 19, 9, 14], demultiplexers are difficult to build without nonlinear devices. Demultiplexers also present architectural challenges since they are often called upon to do double-duty: to configure selected junctions in the crossbars as well as shuttle data between the CMOS and nano layers.
CMOL
A more recent approach, CMOL (CMOS/molecular hybrid) [20, 22], proposes moving the most difficult functions necessary for logic (inversion and gain), along with the demultiplexing, into the CMOS layer, using the nanowires and selected configured junctions only for wired-OR logic and signal routing. By distributing pins uniformly across the surface of the CMOS chip and structuring the crossbars such that each nanowire connects to exactly one pin, CMOL achieves simplicity in junction configuration and maximal signal bandwidth between the CMOS and nano layers. Preliminary studies have shown that CMOL field-programmable gate arrays (FPGAs) might provide circuits about two orders of magnitude denser than conventional CMOS FPGAs and with similar performance [22].
The chief virtues of CMOL are its simplicity, density, and clean separation of configuration and data communication, but it does present some operational and fabrication issues. Because it uses non-complementary, wired-OR logic, keeping static power dissipation within bounds (200 W cm⁻²) requires careful optimization of the closed-junction resistance, pass-transistor resistance, and supply voltage. An initial estimate places the supply voltage, V_dd, at about 0.3 V [22], far lower than any projected by the International Technology Roadmap for Semiconductors (ITRS) through the year 2020 [10]. The configurable junctions are assumed to be extremely nonlinear antifuses in order to implement the wired-OR function, but device variability or insufficient nonlinearity might limit the fan-in to less than desired (a fan-in of 7 was assumed in [22]), decreasing the circuit density. The wired-OR logic also restricts the architecture to nonlinear nanodevices, excluding the use of more linear devices that might be easier to fabricate.
CMOL pins present a fabrication challenge, as shown in the left half of figure 2: the pins are actually 'nanopins,' just a few nanometres in diameter. Half of the nanopins (the blue pins in figure 2) must be taller (by about 8 nm) than the other half (red pins), and they must be surrounded by an insulating layer (shown as grey cladding in the figure). This might be accomplished by projecting the taller pins above the surface of the CMOS layer, which would be flush with the tops of the shorter pins, but the projecting nanopins would severely complicate subsequent circuit processing steps.
Another challenge is registration error or uncertainty in the locations of the nanopins. The protruding blue nanopins of figure 2 are designed to extend through and break the lower level nanowires at regular intervals in order to make contact with the upper level nanowires, but sufficient variation in nanopin location could lead to missing or extra breaks in the nanowires. Although in principle this need not significantly reduce the CMOL connectivity, it does present problems to a compiler attempting to map a circuit onto a CMOL chip: not only must the compiler be aware of defective junctions and broken or shorted wires, it must also have a complete characterization of the actual nanowire lengths and connectivity. The inverter-based structure of CMOL presents a challenge to routing algorithms, which almost always assume routing networks of non-inverting buffers and switches. This structure requires either the development of new routing algorithms that keep track of signal polarity of nets during routing, or a pairing up of inverters into buffers, reducing the density but allowing conventional algorithms to be used.
Figure 2 (caption, partially recovered). The crossbar is slightly rotated so that each nanowire is electrically connected to one pin extending up from the CMOS layer. Electrically configured, nonlinear antifuses (green, bottom panel) allow wired-OR logic to be implemented, with CMOS supplying gain and inversion. The FPNI (right column) places a sparser crossbar on top of CMOS gates and buffers. Nanowires are also rotated so that each one connects to only one pin, but configured junctions (green, bottom panel) are used only for programmable interconnect, with all logic done in CMOS.
One final challenge is the nanowire size (4.5 nm) and pitch (9 nm) chosen for CMOL. These values are far beyond any demonstrated lithographic capabilities, and essentially represent an extrapolation of the ITRS out to the year 2030 [10]. Thus, CMOL as proposed and described is a decade or more away from being realized.
FPNI
Figure 3. Atomic force microscope topographs of a nearly defect-free region (left) and highly defective region (right) in a 34 nm pitch nanowire crossbar [11].
In this paper we present a general hybrid architectural approach named FPNI (field-programmable nanowire interconnect) that trades off some of the speed, density and defect-tolerance of CMOL in exchange for easier fabrication, lower power dissipation, and greater freedom in the selection of nanodevices in the crossbar junctions. The key differences are as follows.
(1) In FPNI architectures, logic is done only in CMOS, routing only in the nanowires (figure 3 shows some nanowire crossbars we have fabricated in our laboratory). This significantly reduces static power dissipation, and allows us to use linear (or approximately linear) antifuses for the nanowire junctions. In addition, the FPNI routing network is buffer-based, not inverter-based, which simplifies the routing.
[...] and 'nanopins' of two different heights on a nonplanar silicon surface [22].
Figure 2 compares the geometry of nanowires, pins and underlying CMOS for the two approaches. The CMOL (left column of figure 2) assumes a sea of inverters regularly connected to pins on the surface of the silicon. The nanowire crossbar on top is rotated slightly to allow each nanowire to contact exactly one pin; the approximately horizontal nanowires connect only to inverter inputs and the vertical only to inverter outputs. Selected junctions (shown in green in the bottom panel) are configured as nonlinear resistors to implement wired-OR logic (in conjunction with a pull-down transistor in the CMOS), with the CMOS inverters providing gain and inversion.
The FPNI (right column of figure 2), on the other hand, assumes a sea of logic gates, buffers and other components in the CMOS layer, and uses the nanowires only for the interconnect. The nanowires include large 'pads' to cover the pins (much larger than the CMOL nanopins), and there is a similar crossbar rotation so that each nanowire connects to only one pin. Selected junctions (green, bottom panel) are configured as resistors to interconnect the underlying logic. As the pin size and alignment error approach zero, the CMOL nanowire layout emerges as a special case of FPNI.
Table 1 (partially recovered). Architectural parameters.
(symbol not recovered): The minimum planar separation between a CMOS pin and a passing nanowire that must be electrically isolated from it.
R_closed: Closed-junction resistance.
Architecture

Nanowires
The nanowire and cell geometries are derived from the first seven architectural parameters listed in table 1. We route the nanowires diagonally (with a slight rotation to maximize their length) in order to maximize routability. The two nanowire 'arms' emanating from the central 'pad' are of equal length to minimize worst-case RC (resistance-capacitance) delays. The core nanowire fabric is derived from the parameters of table 1 using a little geometry and trigonometry.
CMOS
The CMOS layer is divided into an array of square cells, with each cell connected to one input pin (for reading a signal driven from a nanowire) and one output pin (for driving a signal from a gate to a nanowire). A buffer is implemented in a single cell, while logic gates and flipflops require multiple cells. The logic gate used in this architecture is an n-input NAND/AND gate, implemented in n cells. It computes the logical AND of its inputs and drives the true (AND) and complemented (NAND) forms of the result. The motivation for such a nonstandard gate derives from the equal numbers of input and output pins in the cell array. A simple, n-input NAND gate would waste one or more output pins for n ≥ 2; the NAND/AND makes more effective use of the pins and also eliminates the need for inverters. For large n, the true and complemented outputs can be replicated to improve routability and defect tolerance. For all n > 2, at least one output pin per gate is used to drive a logical '1' signal, which is connected to unused gate inputs to prevent them from floating.
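The pin accounting behind the NAND/AND gate can be illustrated with a minimal Python sketch. The function names are hypothetical and not part of the original design, and the count of wasted pins assumes that a plain n-input NAND occupies n cells but drives only one of its n output pins.

```python
def nand_and(inputs):
    """Return (AND, NAND) of the inputs, the two outputs the gate drives."""
    and_out = all(inputs)
    return and_out, not and_out


def wasted_output_pins_plain_nand(n):
    """Assumption: a plain n-input NAND spans n cells (n output pins)
    but needs only one of those pins for its single output."""
    return n - 1


def tie_off_unused(used_inputs, n):
    """Pad unused gate inputs with logical 1 so they neither float
    nor change the value of the AND."""
    return list(used_inputs) + [True] * (n - len(used_inputs))


# A three-input gate used as a two-input gate: the third input is tied to '1'.
and_out, nand_out = nand_and(tie_off_unused([True, False], 3))
```

Tying unused inputs to '1' leaves the AND of the remaining inputs unchanged, which is why a logical '1' output pin is a convenient way to keep them from floating.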
A flipflop is implemented in four cells. The four input pins are all connected to the D input of the flipflop, allowing the compiler to connect to any of the input pins to reach the D input. Two of the four output pins are driven by the Q output of the flipflop; the other two are driven by the complemented Q output.
Primary inputs and outputs are implemented with a pair of cells that together handle one input and one output signal. An input signal is brought in from circuitry external to the cell array and driven out in true and complemented form on the two output pins. An output signal is driven through a nanowire to one of the two input pins and from there delivered to the outside world. These I/O cell pairs occupy the outer cell edges of the cell array.
Logic gates, buffers and flipflops are collected together into a rectangular region known as a 'hypercell,' a structure analogous to a configurable logic block (CLB) in a conventional FPGA. An FPNI chip consists of a rectangular array of identical hypercells, surrounded by a periphery of I/O cell pairs. An example is shown in figure 4.
Configuration
The junction configuration uses the same scheme proposed for use in CMOL [20] . A junction is electrically configured by driving appropriate voltages onto the two nanowires that define it. Configuration lines in the CMOS chip (figure 5), running through each of the cells, provide this capability. During the configuration of a junction, the buffers, gates and flipflops in the cells are disabled; decoders along the edge of the cell array each drive a single configuration line with a programming voltage while grounding the remaining configuration lines. Driving appropriate voltages through the decoders causes two transistors, typically in different cells, to drive two different voltages onto a selected 'output' nanowire and a selected 'input' nanowire. If the two nanowires share a junction, the voltage drop across it can be used to configure the junction's state. For example, a positive voltage drop across an antifuse junction might drive it into a low-impedance state, while a negative voltage drop might return it to a high-impedance state.
Once a circuit has been configured, the configuration lines are driven to turn off the configuration transistors in each cell, and the buffers, gates and flipflops are then enabled to allow programmed circuit operation.
Fabrication
Since the nanoscale electronics will almost by definition be too small to fabricate with any existing generation of photolithography, the likely fabrication approach is imprint lithography. Its alignment capability must match that required for the CMOS circuitry (better alignment would help if available, but is not necessary) in order to achieve registry between the pads and the nanowire connections. It will be possible to begin with a planarized surface on which the pads for both levels of the nanoelectronics are at the same height. A representative process flow is as follows (figure 6).
(1) The first layer of connectors and wires is defined by nanoimprint [11], with the pads of the bottom nanowires aligned over one set of pins on the substrate.
(2) The entire surface of the chip is coated with whatever switching layer or layers are necessary, followed by a thin blanket of a protective material such as Ti [2].
(3) Using standard lithography, a mask layer is deposited with openings over the second set of substrate pins, the materials covering these pins are etched away to expose the pins, and the mask layer is removed.
(4) The second layer of nanowires can be defined and deposited in a manner similar to the first, with the pads aligned over the exposed pins.
(5) An etching process is used to remove all the switching and protective materials that are not directly under the upper nanowires, which electrically isolates parallel sets of nanowires; i.e. the formation of the switching junctions is a self-aligned process.
This process flow will require that the upper nanowires 'climb over' the lower nanowires that they cross, as shown in figure 3. While this may lead to breaking of the top nanowires as the wire width shrinks, it has not been a problem for crossbars with 65 nm or greater half-pitch [2, 11], and we have developed a strategy to mitigate this issue for wires narrower than 65 nm. Afterwards, a layer of dielectric material can be deposited over the nanowire layer, which is then planarized to prepare the system for subsequent processing steps. Alternatively, a planarization process can be inserted between steps 1 and 2 above, if this is considered to be important and appropriate.
Electrical model
Calculating the performance and dynamic power of an application compiled onto an FPNI chip requires an electrical model of the nanowires, junctions and CMOS components. For nanowires we need to know the capacitance and resistance per unit length and the closed-junction resistance, and we must take into account the geometry of the wires and their interconnections through closed junctions. For the CMOS we need to know the intrinsic gate delay.
Nanowire capacitance per unit length is difficult to estimate because of the non-regular nanowire crossbar structure in the FPNI, along with the fact that the top layer nanowires created by nanoimprint are not parallelepipeds, but undulate as they cross over the spaces between nanowires in the layer beneath (figure 3). We begin with Strukov's model for a regular, parallelepiped nanowire crossbar [20], where nanowires within a layer are separated by a distance equal to their width. Given a 3 nm thick switching layer separating the two nanowire layers, a nanowire width of 15 nm, and an insulator between and around all nanowires with a dielectric constant of 3.9 (that of SiO2), that model predicts a capacitance per length of approximately 2.8 pF cm⁻¹. FPNI crossbars are somewhat sparse, though (see figures 2 and 8), reducing the capacitance between layers. In addition, switching layers that we anticipate using to separate nanowire layers typically have a dielectric constant of about 2.5, reducing the interlayer capacitance still further. A passivation layer covering the top nanowire layer need not be SiO2, allowing us to choose a material that protects the nanowire layers while having a somewhat lower dielectric constant, perhaps 3.5 or so. Nanowire pads add additional capacitance, but since their area is quite small compared to that of the nanowires, we neglect their contribution. From these considerations we have estimated the nanowire capacitance at 2.0 pF cm⁻¹.
Nanowire resistance per unit length depends upon the effective resistivity of the nanowire material. We assume here nanowires of copper (the metal specified in the ITRS roadmap), which allows us to estimate the resistivity for wires down to 15 nm by interpolating the ITRS projections. For example, Cu wires with a line width of 15 nm are projected to have an effective resistivity of approximately 8 μΩ cm, so a square Cu nanowire, 15 nm on a side, would have a resistance of about 355 Ω μm⁻¹. Nanowire resistivity, ρ, is difficult to model for very small (<10 nm) wires. Strukov [22] used a common approximation
$$\frac{\rho}{\rho_0} = 1 + 0.75\,(1 - p)\,\frac{\lambda}{d} \qquad (1)$$
with p (the fraction of electrons scattered specularly at the surface) assumed to be 0.67, λ (the mean free path) equal to 40 nm, ρ0 (the bulk resistivity) equal to 2 μΩ cm, and d set to the nanowire width. However, this model is known to underestimate the effective resistivity for small wires [21] and assumes negligible increased resistivity due to scattering at grain boundaries (which is possible for very large grain sizes). For our study we adopted a more conservative model, using Matthiessen's rule to combine the above surface scattering model with the Mayadas-Shatzkes grain boundary scattering model [21], assuming an average grain size equal to the nanowire width (which might require annealing to achieve). We then fitted the resulting model to the ITRS resistivity model, finding a reasonable fit for p = 0.6 and a grain boundary reflectivity coefficient of 0.43. Extrapolation yielded an estimated resistivity of about 24 μΩ cm for 4.5 nm Cu nanowires; the uncertainty of this estimate, though, is quite high.
Closed-junction resistance depends on the materials used to build the nanowire crossbar, but we have experimentally observed as a rule of thumb that it is difficult to configure a closed junction to a resistance less than the sum of the resistances of the nanowires from their junction to their respective drivers.
Our baseline experiments assume a nanowire width of 15 nm and a maximum nanowire length of about 7.11 μm, yielding a maximum nanowire resistance from pad to tip of about 2.5 kΩ. Based on this and experimental work [15] we have chosen a value of 24 kΩ as a reasonably conservative estimate of obtainable closed-junction resistance for 30 nm pitch nanowires; smaller values have been observed repeatedly in experimental investigations of the closed-switch state for switchable junctions [15]. For higher-resistance, 4.5 nm nanowires, the closed-junction resistance cannot be much less than 120 kΩ.
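The baseline resistance figures follow from the usual relation R = ρL/A for a wire of uniform cross-section. The short Python sketch below reproduces that arithmetic for a square wire; the function name is illustrative, and the inputs are the 8 μΩ cm resistivity, 15 nm width and 7.11 μm length quoted in the text.

```python
def resistance_per_um(resistivity_uohm_cm, width_nm):
    """Resistance per micrometre (in ohms) of a square wire: rho / width**2."""
    rho_ohm_m = resistivity_uohm_cm * 1e-8   # 1 micro-ohm cm = 1e-8 ohm m
    area_m2 = (width_nm * 1e-9) ** 2         # square cross-section
    return rho_ohm_m / area_m2 * 1e-6        # ohm per m -> ohm per um


r_per_um = resistance_per_um(8.0, 15.0)      # 15 nm Cu wire
r_pad_to_tip = r_per_um * 7.11               # maximum nanowire length, in um
print(f"{r_per_um:.0f} ohm/um, {r_pad_to_tip / 1e3:.2f} kohm pad to tip")
```

The result agrees with the roughly 355 Ω per micrometre and 2.5 kΩ pad-to-tip values used for the baseline model.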
CMOS gate delay was estimated to be 10 ps by noting the projected n-FET switching time of 0.39 ps for the year 2010 from the ITRS roadmap [10] . Our analysis is not sensitive to this value, though, since circuit timing is strongly dominated by the RC delays of the nanowires. Evaluating the speed of a circuit mapped onto an FPNI fabric requires a detailed calculation of delay through every component and interconnect, and then extracting the critical timing path by searching for the longest delay through a chain of wires from any flipflop output (or primary input) to any flipflop input (or primary output). Ideally one would use a program like SPICE [23] to do this analysis, but this is practical only for small circuits. Instead we use the simpler Elmore delay model [7, 4, 16] to estimate the delay for each path through nanowires and junctions (see figure 7) .
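As a minimal sketch of the Elmore model for a single path, the Python below treats the path as an RC ladder in which each series resistance charges all of the capacitance downstream of it. The lumped three-segment example (driver-side nanowire, closed junction, receiving nanowire) is a crude illustrative assumption, not the distributed model of figure 7; the capacitance of about 1.4 fF per wire is simply 2.0 pF cm⁻¹ times 7.11 μm.

```python
def elmore_delay_chain(resistances, capacitances):
    """Elmore delay (seconds) to the far end of an RC ladder.

    resistances[i] is the series resistance into node i (ohms) and
    capacitances[i] is the capacitance to ground at node i (farads).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream_c = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream_c   # this resistor charges everything beyond it
        downstream_c -= c
    return delay


# Illustrative lumped path: nanowire, closed junction, nanowire.
path_r = [2.5e3, 24e3, 2.5e3]       # ohms
path_c = [1.4e-15, 0.0, 1.4e-15]    # farads
path_delay = elmore_delay_chain(path_r, path_c)
```

Even in this rough form the closed-junction resistance dominates the sum, which is consistent with the observation that circuit timing is governed by the nanowire and junction RC delays rather than by the CMOS gate delay.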
Dynamic power analysis tallies the number of nanowires allocated by our compiler to implement a circuit onto an FPNI target, and computes the dynamic power required to charge and discharge the capacitance of those wires using the formula [5]
$$\text{Dynamic power} = \tfrac{1}{2}\, A\, N\, C\, V_{dd}^{2}\, f \qquad (2)$$
where A is the average 'activity' of a signal, N is the number of allocated nanowires, C is the capacitance of a single nanowire, V_dd is the supply voltage used by the CMOS, and f is the maximum clock frequency determined by timing analysis. We have chosen an activity of 0.1, following Davis [5], and V_dd of 1.0 V from the ITRS roadmap for the year 2010 [10]. Note that we have not computed the static power dissipation in the CMOS gates and flipflops nor the dynamic power needed to drive the CMOS clock tree.
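A minimal Python sketch of the dynamic power estimate, assuming the standard form of one half times A, N, C, the square of V_dd, and f. The wire count, per-wire capacitance and clock frequency in the example are hypothetical placeholders, not results from the paper; only the activity of 0.1 and V_dd of 1.0 V come from the text.

```python
def dynamic_power(activity, n_wires, c_wire, vdd, f_clock):
    """Dynamic power in watts: 0.5 * A * N * C * Vdd**2 * f."""
    return 0.5 * activity * n_wires * c_wire * vdd ** 2 * f_clock


# Hypothetical circuit: 1000 allocated nanowires of 1 fF each at 1 GHz.
example_power = dynamic_power(activity=0.1, n_wires=1000,
                              c_wire=1e-15, vdd=1.0, f_clock=1e9)
```

Because f is taken as the maximum clock rate, a slower architecture dissipates less power for the same allocated capacitance, so comparisons between architectures are best made per clock cycle as well.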
Experiments
Since there is no simple way of analytically comparing the FPNI architecture with CMOL and conventional CMOS FPGAs, we use the time-honoured tradition of modelling and simulation using standard benchmarks. We chose 17 benchmark circuits from the 'FPGA place-and-route challenge' suite [8].
We created two different FPNI architectural models for this study: a conservative model, FPNI 30 nm, that we believe is technologically viable by 2010 for both nanowires and CMOS; and an aggressive model, FPNI 9 nm, that uses the same CMOS and nanowire technology assumptions used by CMOL [22] (most likely aimed at the year 2020) so that we may compare FPNI with CMOL. The CMOS parameters were based on values from the ITRS roadmap for the year 2010 [10] and on discussions with CMOS fabrication engineers. The parameters for both architectures are shown in table 2.
Our initial set of experiments was designed to establish (1) the optimal number of NAND/AND inputs; and (2) the optimal hypercell composition (proportion of gates, buffers and flipflops). To do this we created an ensemble of FPNI chip models meeting the FPNI 30 nm architectural parameters of table 2, but that varied in the number of NAND/AND gate inputs and hypercell composition. We then compiled each of the circuits onto each chip of the ensemble, sizing the FPNI chip in each case to be the smallest hypercell array capable of containing that circuit's logic (although not necessarily its routing). From the results of those experiments, we then fixed our optimal gate size and hypercell composition and continued compiling the benchmarks onto an ensemble to explore
• the performance (power, clock speed, area); and
• the defect tolerance (stuck-open junctions and broken nanowires).
The left side of figure 8 shows a close-up of the nanowires and pins derived from the FPNI 30 nm parameters. The right side shows the result of compiling a small circuit onto a very small FPNI fabric-for clarity, only the active electronic connections are shown. The compiler is described in the appendix.
Logic gate and hypercell
Our first set of experiments explored varying the number of inputs to NAND/AND gates from two to five and varying the relative proportions of gates, buffers and flipflops within a hypercell. The goal was to determine the parameters that would minimize the area while maximizing the clock speed. Since we wish to compare our results with conventional FPGAs, which often use a four-input look-up table (LUT) and flipflop as their basic logic element, we needed to determine the computational power of n-input NAND/AND gates relative to four-input LUTs. Technology mapping experiments showed that this relationship is strongly circuit dependent, but suggested an approximation. Our hypercells were thus constructed with a single flipflop combined with either five two-input gates, four three-input gates, three four-input gates or three five-input gates. This left only the number of buffers in a hypercell to be determined.
For each value of n from 2 to 5, we constructed a set of hypercells with the number of n-input gates and flipflops just described, but with varying numbers of buffers that kept the hypercell compactly rectangular. Every benchmark circuit was compiled onto FPNI chips composed of the smallest possible grid for a given hypercell. For each n, we selected the smallest hypercell (the one with the fewest buffers) that all 17 benchmarks could be compiled onto. To compare the results, we summed up the total area and total critical path for all circuits onto chips made of that hypercell. Because our placer and router are nondeterministic, we did this 25 times and averaged the results, shown in table 3. We were somewhat surprised that neither the area nor the critical path delay was strongly sensitive to the type of logic gate used. But the three-input NAND/AND proved to be the best on both counts, and the smallest three-input hypercell that worked for all circuits was 6 × 7 cells (shown in figure 4 ). That hypercell, containing four three-input gates, one flipflop and 26 buffers, was used for the remainder of our experiments.
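The composition of the chosen hypercell can be checked with simple cell arithmetic, using the cell costs given earlier (one cell per buffer, n cells per n-input gate, four cells per flipflop). The helper name below is illustrative.

```python
def hypercell_cells(n_gates, gate_inputs, n_flipflops, n_buffers):
    """Cells used by a hypercell: n cells per n-input gate,
    four per flipflop and one per buffer."""
    return n_gates * gate_inputs + 4 * n_flipflops + n_buffers


# Four three-input gates, one flipflop and 26 buffers fill a 6 x 7 grid.
chosen = hypercell_cells(n_gates=4, gate_inputs=3, n_flipflops=1, n_buffers=26)
assert chosen == 6 * 7
```

The same function gives the number of cells left over for buffers in any candidate grid, which is how the buffer count can be varied while keeping the hypercell rectangular.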
Performance (power, speed, area)
To determine the FPNI performance, we compiled each circuit 25 times onto each of the two architectures, FPNI 30 nm and FPNI 9 nm (table 2) and averaged the results for each circuit. The dynamic power calculations assumed V dd = 1.0 V, activity = 0.1 and circuits clocked at the maximum rate (equal to 1/critical path delay). The results (table 4) show that FPNI 30 nm requires only about one eighth of the area of a CMOS FPGA (with the same 45 nm node semiconductor technology) while running about 22% slower. CMOL 9 nm is far smaller than FPNI 30 nm, but because of our more conservative resistance model, it is not possible to directly compare the CMOL and FPNI performance. The FPNI 9 nm architecture is only about 4% the size of the CMOS FPGA (though still three times larger than CMOL), but is much slower. The rapidly increasing resistance of shrinking nanowires overwhelms the reduction in total circuit capacitance from the corresponding shortening of nanowires, reducing the clock rate. Note that the reduced power dissipation of the 9 nm architecture is due both to reduced capacitance (due to shorter nanowires) and reduced clock rate. If we normalize the clock rate, the 9 nm architecture dissipates only 57% as much dynamic power per cycle as the 30 nm architecture.
Defect tolerance
Nanowire antifuse crossbars are typically fabricated with all junctions initially in the 'open' or high-impedance state. The most common defect expected in such crossbars is the 'stuck-open' switch, a high-impedance junction that cannot be configured to a low-impedance state. To study how those defects impact the yield and critical path timing, we compiled each of the 17 benchmarks onto 'FPNI 30 nm' chips, varying the 'stuck-open' junction probability over the values 0, 0.1, 0.2, . . . , 0.9. To collect sufficient statistics, the compilation was done 100 times for each (circuit, defect rate) pair, for a total of 17 000 compilations. The results are summarized in table 5. Defect rates of 50% have almost no impact on the yield (99.7%) or critical path timing (an average increase of less than 3%). Even defect rates of 80% have respectable yield (88.5%) and degradation in critical path delay (5%).
We expect broken nanowires to be fairly common. To study their impact, we repeated the previous experiments on the 17 benchmarks with the 'stuck-open' defect rate fixed at 0.2, and varied the broken nanowire defect rate (0.0, 0.1, 0.2, . . . , 0.9). Each (circuit, broken nanowire rate) pair was compiled 100 times, for a total of 17 000 compiles, and the results averaged. For this experiment we defined a nanowire to be a single nanowire 'arm' connected to the 'pad' over the pin. Nanowire arms were selected at random with the given rate to be defective, and were broken at a random position along their length with uniform distribution. The experimental results (table 6) show a yield of about 75% when 20% of the nanowires are broken and 20% of the junctions are stuck open. The critical path delay appears to decrease as the nanowire breakage increases. One might expect this because of reduced nanowire capacitance, but we believe in this case it is merely uncovering instabilities in our routing algorithm.
Note that the yield shown is pessimistic because our placer is not defect aware; a single gate placed such that one of its outputs has two broken (and extremely short) nanowires extending from its pad would cause the entire compilation to fail.
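The defect model used in these experiments can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: the function name and data layout are hypothetical, and it assumes that only the part of a broken arm still attached to its pad remains usable.

```python
import random


def inject_defects(n_junctions, arm_lengths, p_stuck_open, p_broken, rng):
    """Return (stuck_open_flags, usable_arm_lengths) for one simulated chip."""
    stuck_open = [rng.random() < p_stuck_open for _ in range(n_junctions)]
    usable = []
    for length in arm_lengths:
        if rng.random() < p_broken:
            # Break at a uniformly distributed position along the arm;
            # assumption: only the pad-side portion stays usable.
            usable.append(rng.uniform(0.0, length))
        else:
            usable.append(length)
    return stuck_open, usable


# One chip with 20% stuck-open junctions and 20% broken nanowire arms.
rng = random.Random(0)
stuck, arms = inject_defects(1000, [3.5] * 200, 0.2, 0.2, rng)

# Size of each sweep: 17 circuits x 10 defect rates x 100 trials.
compilations = 17 * 10 * 100
```

A defect-aware placer would consult `arms` before placing a gate, avoiding the failure mode described above in which both arms at an output pad are broken close to the pad.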
Discussion
FPNI architectures offer a path for continued shrinking of field programmable logic arrays. Simulation shows that the approach is extremely tolerant of the high defect rates likely to be found in nanoscale structures, and that clock rates need not be sacrificed. The eight-fold density increase for FPNI 30 nm compared to a CMOS-only FPGA for the ITRS 45 nm node is equivalent to leaping ahead on the ITRS roadmap by three generations, or nine years to the ITRS 16 nm node (i.e. 2019). According to our simulation results, FPNI can simultaneously improve three performance issues with respect to CMOS-only FPGAs: circuit density, power, and defect tolerance, without requiring improvements in the transistors themselves. In addition, we used fairly conservative estimates for wire and switch resistances: our estimates for the critical path delays would improve significantly if we assumed larger grain sizes in the wires and best-case experimental measurements for ON-state switches.
Variations in nanowire and junction electrical properties will present challenges in modelling ultimate device performance; it is likely that the power and clock rate will need to be determined empirically for each chip. Device ageing will also need to be addressed: we do not yet know, for example, for how long a configured junction will maintain its state [3]. Perhaps an FPNI chip's configuration would need to be periodically 'refreshed' to continue correct operation.
Compilation presents economic challenges since a manufacturer cannot afford to expend hours of computation on each chip in order to find all defects and then place and route around them. We believe that the extremely high switch redundancy in FPNI, shown in the defect-tolerance experiments, can be exploited to make compilation viable in a reasonable amount of time (in, say, less than 1 min per chip). One approach might involve doing a global place and route on a generic model of a target FPNI chip, being careful to spread the placement out more sparsely than would be needed for a defect-free chip, and then perturbing that mapping as needed during the incremental configuration of the circuit. Algorithms that combine defect characterization with compilation might be most appropriate.
Scaling down both nano and CMOS fabrication dimensions causes dynamic power dissipation to scale down as well, primarily due to reduced wire capacitance from shorter wires. Unfortunately, static power dissipation in CMOS increases as feature sizes decrease, and nanowire resistance increases rapidly as cross-sectional area shrinks, causing RC delays to reduce performance. Circuit designers at the nanoscale will be forced to make trade-offs between clock rate, area, power, and fabrication cost.
Appendix. FPNI compiler
The FPNI compiler maps logic circuits onto FPNI chips. It takes two files as inputs-a circuit file and an FPNI chip description file-and produces a mapping of that circuit onto the chip. The flow through the compiler is linear.
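(1) Model building reads a file containing a set of architectural parameters describing an FPNI chip (table 1) and builds models of the chip (e.g., logic gates, nanowire geometry, nanowire electrical properties, chip area) used by the remaining compiler passes.
(2) Technology mapping converts the circuit into NAND/AND gates, flipflops and bipolar primary inputs, removing inverters.
(3) Clustering packs gates and flipflops into clusters that each fit a single hypercell.
(4) Placement assigns logic clusters to interior hypercells and I/O clusters to the periphery of the chip.
(5) Routing allocates nanowires and junctions to connect the placed clusters.
(6) Timing analysis analyses delay through a successfully placed and routed circuit to determine the maximum clock speed for the circuit.
(7) Power analysis estimates the dynamic power required to drive the allocated nanowires at the maximum clock speed.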
Technology mapping begins by replacing each NAND gate with a NAND/AND gate with the same number of inputs, and replacing each primary input with a bipolar primary input that has both true and complemented outputs. Inverters are removed by replacing the output signal driven by the inverter with the inverted form of the inverter's input signal from the appropriate NAND/AND gate or bipolar primary input. Flipflops and primary outputs are left untouched.
Clustering implements Singh's greedy algorithm [17] with a Rent exponent of 0.667. One logic cluster corresponds to a single logic hypercell, so we add additional code to ensure that the number of gates and flipflops packed into a cluster does not exceed the number available in the hypercell. One I/O cluster is allocated for each primary input and each primary output.
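The inverter-removal step of technology mapping can be sketched as follows. The netlist format, the kind names and the `(driver, inverted)` pair convention are assumptions made for this illustration and do not describe the compiler's actual data structures.

```python
def remove_inverters(netlist):
    """netlist maps a signal name to (kind, [input signal names]).

    Returns a new netlist with no 'INV' entries, in which every input is a
    (driver, inverted) pair. inverted=True selects the driver's complemented
    output: the AND pin of a NAND/AND gate, or the complemented pin of a
    bipolar primary input. Flipflops and outputs keep their kind; only
    their input references are rewritten.
    """
    def resolve(name, inverted=False):
        kind, inputs = netlist[name]
        if kind == "INV":
            # Skip the inverter and flip the polarity of the reference.
            return resolve(inputs[0], not inverted)
        return name, inverted

    renamed = {"NAND": "NAND/AND", "INPUT": "BIPOLAR_INPUT"}
    mapped = {}
    for name, (kind, inputs) in netlist.items():
        if kind == "INV":
            continue
        mapped[name] = (renamed.get(kind, kind), [resolve(i) for i in inputs])
    return mapped


netlist = {
    "a": ("INPUT", []),
    "b": ("INPUT", []),
    "g1": ("NAND", ["a", "b"]),
    "n1": ("INV", ["g1"]),      # n1 = NOT g1, i.e. the AND output of g1
    "g2": ("NAND", ["n1", "a"]),
    "q": ("DFF", ["g2"]),
}
mapped = remove_inverters(netlist)
```

In the example, gate `g2` ends up reading the complemented (AND) output of `g1` directly, so the inverter `n1` consumes no cells.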
Placement uses the simulated annealing algorithm described by Betz [1] . I/O clusters may only be placed on the periphery of the chip in I/O hypercells, and logic clusters may only be placed in the interior logic hypercells. Only one logic cluster may be placed in a logic hypercell, but multiple I/O clusters may be placed in an I/O hypercell if there are sufficient resources to support it (generally the case).
The router implements the timing-driven, directed-search maze algorithm described by Betz [1]. The algorithm requires multiple iterations, with each iteration consisting of routing all nets, recording wire congestion resulting from the attempt, and ripping up the routings. The iterations continue until all nets are successfully routed without overusing any routing resource, up to a maximum of 50 iterations, at which point we declare a circuit to be unroutable. Wire delays are estimated during routing using a simple, linear-delay model; a more sophisticated delay model (such as the Elmore delay of figure 7) would probably lead to faster critical paths, but at the cost of significant additional computational overhead during routing.
Timing analysis is done at the end of each routing iteration. The Elmore delay is evaluated for every allocated wire and switch, and the critical path is extracted by searching for the longest delay through a chain of wires from any flipflop output (or primary input) to any flipflop input (or primary output). This information is used by the router in the subsequent iteration, helping it to preferentially allocate shorter paths to signals along the critical path.
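Critical path extraction amounts to a longest-path search over an acyclic timing graph, since flipflops cut every feedback loop. The sketch below is one way to do this with a memoized depth-first search; the graph representation and node names are hypothetical, and the edge delays stand in for the per-wire and per-switch Elmore delays.

```python
from functools import lru_cache


def critical_path(edges, sources, sinks):
    """edges maps a node to a list of (successor, delay) pairs.

    Returns (delay, path) for the slowest path from any source (flipflop
    output or primary input) to any sink (flipflop input or primary
    output), or None if no sink is reachable.
    """
    sinks = set(sinks)

    @lru_cache(maxsize=None)
    def longest_from(node):
        best = (0.0, (node,)) if node in sinks else None
        for succ, delay in edges.get(node, ()):
            tail = longest_from(succ)
            if tail is not None:
                cand = (delay + tail[0], (node,) + tail[1])
                if best is None or cand[0] > best[0]:
                    best = cand
        return best

    found = [r for r in (longest_from(s) for s in sources) if r is not None]
    return max(found, key=lambda r: r[0]) if found else None


# Delays in picoseconds on a toy timing graph.
edges = {
    "ff1.Q": [("g1", 10.0), ("g2", 5.0)],
    "g1": [("ff2.D", 20.0)],
    "g2": [("g1", 30.0), ("out", 1.0)],
}
delay, path = critical_path(edges, ["ff1.Q"], ["ff2.D", "out"])
max_clock_hz = 1.0 / (delay * 1e-12)
```

The maximum clock rate is the reciprocal of the critical path delay, and the nets on `path` are the ones a timing-driven router would favour with shorter routes in the next iteration.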
Power analysis tallies the number of nanowires allocated by the router and computes the dynamic power required to charge and discharge the capacitance of those wires using equation (2).
