Recent developments suggest both plausible fabrication techniques and viable architectures for building sublithographic Programmable Logic Arrays using molecular-scale wires and switches. Designs at this scale will see much higher defect rates than in conventional lithography. However, these defects need not be an impediment to programmable logic design at this scale. We introduce a strategy for tolerating defective crosspoints and develop a linear-time, greedy algorithm for mapping PLA logic around crosspoint defects. We note that P-term fanin must be bounded to guarantee low-overhead mapping and develop analytical guidelines for bounding fanin. We further quantify analytical and empirical mapping overhead rates. Including fanin bounding, our greedy mapping algorithm maps a large set of benchmark designs with 13% average overhead for random junction defect rates as high as 20%.
Introduction
Recent work shows how to build nanoscale Programmable Logic Arrays (nanoPLAs) using the bottom-up synthesis techniques being developed by physical chemists [1] [2] [3]. With these bottom-up techniques, it is possible to build features (e.g. wires and programmable junctions) without relying on lithography. As such, these techniques provide a path to continue the advance of field-programmable technology beyond the end of the traditional, lithographic roadmap (e.g. [4]).
Nonetheless, nanoscale features, both in the sublithographic and lithographic arenas, come with a new set of challenges. Notably, as devices become smaller, they are constructed from fewer and fewer atoms and molecules. Since individual atoms behave statistically, this means we have higher variance in the shape and makeup of our devices, and a higher likelihood that devices are simply unusable. Designs at this scale must be defect tolerant. This, and other aspects of sublithographic assembly techniques, suggest that all devices we build at these scales will be reconfigurable.
André DeHon, Department of Computer Science, California Institute of Technology, Pasadena, CA 91125, andre@cs.caltech.edu
Hewlett-Packard has recently demonstrated an 8 x 8 crossbar using molecular switches at the crosspoints [5]. In the HP crossbar, they observed that 85% of the crosspoint junctions were programmable (15% were defective). The HP crossbar is an early laboratory prototype, and we expect these defect rates to decrease. Nonetheless, we are unlikely to achieve 100% crosspoint yield at this scale using these kinds of bottom-up, statistical fabrication techniques. If defects are randomly distributed, at a 15% crosspoint defect rate, essentially every row and column in a 100 x 100 crosspoint array will contain a defective junction.
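The claim about the 100 x 100 array can be checked with a short calculation. The sketch below assumes that junction defects are independent and identically distributed, which is one reading of "randomly distributed"; the function name is illustrative and not taken from the paper.

```python
def prob_line_defect_free(p_defect: float, n_junctions: int) -> float:
    """Probability that a row (or column) of n_junctions has no defective
    junction, assuming each junction fails independently with p_defect."""
    return (1.0 - p_defect) ** n_junctions


# 15% defect rate, 100 junctions per row, 100 rows per array
p_clean = prob_line_defect_free(0.15, 100)
expected_clean_rows = 100 * p_clean

print(f"P(row is defect-free)        = {p_clean:.2e}")
print(f"expected defect-free rows    = {expected_clean_rows:.2e}")
```

Under this assumption the probability of a defect-free row is on the order of 1e-7, so discarding every nanowire that touches a defect would leave nothing to use.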
With the techniques in this paper, we show that nanoPLA arrays with a 20% crosspoint defect rate are still usable with modest (9% including fanin bounding) overhead. That is, despite the fact that no rows or columns are free of defective junctions, we can still make use of more than 90% of the nanowires. Snider et al. have also looked at defect tolerant mapping using a similar defect model and shown that a 4-bit microprocessor can tolerate defect rates up to 20% [6].
This defect mapping must be applied on a per-array basis. That is, each nanoPLA will have a unique defect pattern. Since nanoPLAs are a few microns tall and 10-20 microns wide [2], we can easily have millions of these nanoPLAs on a modest die. Consequently, it is important that we minimize the time required to map around defects. To this end, we introduce a linear-time, greedy mapping algorithm for assigning logical P-terms to physical nanowires while avoiding defective junctions in a fabricated nanoPLA.
In the next section, we review the emerging, bottom-up fabrication techniques for nanowires and crosspoints and the architectural building blocks for restoration and nanoscale addressing. We then review the nanoPLA architecture (Section 3). In Section 4, we introduce our defect model. Section 5 formulates the problem and introduces the basic idea for the solution. Section 6 reviews exact algorithms to solve the identified mapping problem and develops our linear-time heuristic algorithms. In Section 7, we analyze the algorithms based on expected case behavior and derive bounds for input fanin (Section 8). Section 9 provides experimental results which ground and confirm the analysis.
Substrate
Nanowires. We can grow nanowires to controlled dimensions on the nanometer scale using seed catalysts to define their diameter. Nanowires with diameters down to 3 nm have been demonstrated [7]. With suitable doping, conduction through nanowires can be controlled by an applied electrical field, as in Field-Effect Transistors [8]. Techniques have been demonstrated to align a set of nanowires into a single orientation, close pack them, and transfer them onto a surface. This step can be repeated and rotated by 90 degrees so that we get multiple layers of nanowires [9].
Programmable Crosspoints. Over the past few years, many technologies have been demonstrated for molecular-scale memories. So far, they all seem to have: (1) resistance which changes significantly between ON and OFF states, (2)
Restoring Crosspoint. Programmable diode crosspoints in a crossbar array give us a programmable OR array (see Section 3 for more detail). Diodes alone do not give us cascadable logic. To achieve restoration, these programmable diode stages can be followed by dedicated, nonprogrammable restoring stages. The restoring stages can also provide selective inversion. DeHon and Wilson describe how to build a nanoscale nonprogrammable restoring stage using a stochastic assembly of nanowires with doping profiles [2] [11].
Lithographic-scale Address Decoder. The pitch of the nanowires can be much smaller than our lithographic patterning. We will be using the crosspoint programmability to configure logic functions into our nanoscale devices. In order to do this, we need a way to selectively place a defined voltage on a single row and column wire in order to set the state of the crosspoint. By constructing nanowires with doping profiles on their ends [11] [12], we can give each nanowire an address (see left end of Figure 1(a)). The dimensions of the address bit control regions can be set to the lithographic pitch so that a set of crossed, lithographic wires can be used to address a single nanowire. Detailed information on this addressing scheme can be found in [12] and [2].
NanoPLA Architecture
NanoPLAs, like conventional PLAs, consist of two programmable NOR planes (Figure 1). The logic array is the programmable part of each NOR plane. Its junctions are the bistable crosspoints described in Section 2. The logic array implements the OR function of its inputs, which is why the outputs of this array are called OR-terms. Each of the connected junctions behaves like a diode, and each OR-term is the wired OR logic of its inputs. The output of each OR-term is pulled down weakly. If any of the inputs is high, then it pulls up the OR-term output (Figure 1(b)).
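The wired-OR behavior described above can be sketched in a few lines. This is an illustrative logical model only: the function name and the list-of-booleans encoding are assumptions, and electrical details such as diode drops and restoration are ignored.

```python
from typing import Sequence


def or_term_output(inputs: Sequence[bool], closed: Sequence[bool]) -> bool:
    """Logical value of one OR-term nanowire.

    inputs[k] is the value on input nanowire k; closed[k] says whether the
    junction between input k and this OR-term is programmed closed.
    The weak pull-down keeps the output low unless some high input is
    connected through a closed (diode-like) junction.
    """
    return any(x and c for x, c in zip(inputs, closed))


# OR function a + c + e over inputs (a, b, c, d, e)
closed = [True, False, True, False, True]

all_low = or_term_output([False] * 5, closed)
only_b_high = or_term_output([False, True, False, False, False], closed)
c_high = or_term_output([False, False, True, False, False], closed)
print(all_low, only_b_high, c_high)
```

Input b is high in the second case, but its junction is open, so it does not pull the OR-term up.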
The two states of the logic array junctions are: 1) connected via a PN junction, 2) disconnected. If an input participates in an OR function, the junction of that input and the OR-term nanowire representing that function will be programmed "closed"; the junc-
Defect Model
In this section we discuss possible defects in the nanoPLAs and the defect model used in this paper.
The two most probable defects that we focus on here are: 1) defects in programmable crosspoints, 2) defects in nanowires. The defective nanowires can be easily detected with the procedure suggested in [2].
The time required to test the nanowires of each array is linear in the code space size of the stochastic address decoder. The defect models are broken nanowires, stuck-at-0 and stuck-at-1.
Defects in programmable crosspoints are due to the structure of the junctions, which is a sandwich of bistable molecules between two layers of nanowires. In each crosspoint there are only a few molecules. For example, nanowires of width 5 nm having a cross-sectional area of 25 nm^2 can hold about 18 molecules [5]. (In [5] they have a different active area size. Therefore here the number of molecules is scaled accordingly.) The programmability of a crosspoint comes from the bistable attribute of the molecules located in the crosspoint area. If there are too few molecules at the crosspoint then the junction may never be able to be programmed "closed", or the "closed" state may have higher resistance than the designed threshold chosen for correct operation and timing of the PLA.
We abstract this into a simple crosspoint defect model. Crosspoints will be in one of two states:
• programmable: crosspoint can be programmed into both a "closed" state and an "open" state.
• non-programmable: crosspoint cannot be programmed into an adequate "closed" state, but can be set into a suitable "open" state.
Crosspoints which cannot be programmed into a suitable "open" state will result in the entire horizontal and vertical nanowires being unusable. We treat these as nanowire defects rather than junction defects. Based on the physical model suggested above and discussion with physical scientists, we expect these defects which "short" horizontal and vertical nanowires to be much less likely and, consequently, believe it is reasonable to treat them as wire defects. Henceforth we assume that the input nanowires of logic arrays are previously assigned to the logical inputs, and that the order of the logical inputs is preserved. This assumption lets us use the same programming process for all [
Problem Statement

Challenge
Logic arrays may contain defective junctions that cannot be programmed closed, as described in Section 4. An OR function can be assigned to a physical OR-term nanowire if and only if each of the ON-inputs of the OR function has a corresponding programmable junction on the physical OR-term nanowire.
If a logic array of a nanoPLA has defective junctions as marked in Figure 4(a), then the OR function a + c + e cannot be assigned to nanowires w1 or w2 because junctions (w1, e), (w2, e) and (w2, e) are nonprogrammable, but it can be assigned to nanowires w3, w4, and w5. Although the nanowires w1 and w2 cannot implement the OR function a + c + e, they are still useful for some other OR functions such as b + d.
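The assignability condition can be expressed as a single bitmask test. The sketch below is a minimal illustration, assuming five inputs a to e encoded as bits; the defect pattern is hypothetical and is not the pattern of Figure 4(a), and the names `mask` and `can_assign` are invented for this example.

```python
INPUTS = "abcde"


def mask(names: str) -> int:
    """Bitmask with one bit set per named input (a -> bit 0, b -> bit 1, ...)."""
    m = 0
    for ch in names:
        m |= 1 << INPUTS.index(ch)
    return m


ALL = mask(INPUTS)


def can_assign(on_inputs: int, programmable: int) -> bool:
    """True iff every ON-input of the OR function has a programmable
    junction on the candidate nanowire."""
    return (on_inputs & ~programmable) == 0


# Hypothetical defect pattern (NOT Figure 4(a)): each entry is the set of
# programmable junctions on that nanowire.
wires = {
    "w1": ALL & ~mask("c"),    # junction at c is non-programmable
    "w2": ALL & ~mask("ae"),   # junctions at a and e are non-programmable
    "w3": ALL & ~mask("b"),    # junction at b is non-programmable
}

f_ace = mask("ace")
f_bd = mask("bd")
for name, prog in wires.items():
    print(name, "a+c+e:", can_assign(f_ace, prog), " b+d:", can_assign(f_bd, prog))
```

In this hypothetical pattern w1 and w2 reject a + c + e but accept b + d, while w3 does the opposite, which mirrors the situation the text describes.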
In spite of having defective junctions in a nanowire, some OR functions can be successfully mapped to that nanowire. The challenge is to find an assignment of the OR functions to the OR-term nanowires. Our key question is: How do we perform this assignment with a small number of spare nanowires and in reasonable running time?
Idea
In each OR function there are always some OFF inputs, i.e. some of the junctions will always be left open. If there is a nanowire with defective junctions only at a subset of those positions, then this defective nanowire can be successfully assigned to the OR function.
Let F be the set of OR functions and W be the set of physical OR-term nanowires. The problem is finding an assignment of OR functions to the nanowires. This problem can be formally stated as finding a bipartite matching from the set F to the set W.
Definition of Bipartite Matching
Formal Problem Statement
$G(F, W, E)$ is a directed bipartite graph. For every
Every matching of size $|F|$ on this bipartite graph is a valid assignment of the OR functions to the OR-term nanowires, because it finds an assignment for all of the OR functions in F. Figure 4(b) shows a bipartite graph $G(F, W, E)$. Set F is the set of OR functions in Figure 2, and set W is the set of nanowires in the nanoPLA of Figure 4(a). Figure 4(
There are heuristic algorithms that, with high probability and small time complexity, find the maximum matching. A general heuristic algorithm is
To make the set E, the condition (1) will be checked for each pair $(f_i, w_j)$.
Exact Algorithm
For now assume that W is large enough so that there exists a maximum matching of size $|F|$. Later, in Section 7, we calculate how large W should be in practice.
There are a number of exact algorithms to solve the maximum bipartite matching problem, such as 
$|F|$.
$|W|$) computing operations.
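For comparison with the greedy approach, an exact maximum matching can be sketched as follows. This is an assumption-laden illustration: it uses the classic augmenting-path method (often called Kuhn's algorithm), which is not necessarily one of the algorithms the paper cites, and it assumes the defect map is already known as one programmable-junction bitmask per nanowire. All names are hypothetical.

```python
from typing import List, Optional


def build_edges(functions: List[int], wires: List[int]) -> List[List[int]]:
    """adj[i] lists every nanowire j whose programmable-junction mask
    covers all ON-inputs of OR function i (checked for each pair)."""
    return [[j for j, w in enumerate(wires) if (f & ~w) == 0]
            for f in functions]


def max_matching(adj: List[List[int]], n_wires: int) -> List[Optional[int]]:
    """Augmenting-path maximum bipartite matching.
    Returns, for each OR function, its nanowire index or None."""
    wire_owner: List[Optional[int]] = [None] * n_wires

    def try_assign(i: int, seen: List[bool]) -> bool:
        for j in adj[i]:
            if seen[j]:
                continue
            seen[j] = True
            # Take a free wire, or evict the owner if it can move elsewhere.
            if wire_owner[j] is None or try_assign(wire_owner[j], seen):
                wire_owner[j] = i
                return True
        return False

    for i in range(len(adj)):
        try_assign(i, [False] * n_wires)

    assignment: List[Optional[int]] = [None] * len(adj)
    for j, i in enumerate(wire_owner):
        if i is not None:
            assignment[i] = j
    return assignment


# Inputs a..e as bits 0..4; hypothetical masks, not taken from the paper.
functions = [0b10101, 0b01010, 0b10101]     # a+c+e, b+d, a+c+e
wires = [0b11011, 0b01110, 0b11101, 0b11111]
adj = build_edges(functions, wires)
assignment = max_matching(adj, len(wires))
print(assignment)
```

Building `adj` alone touches every (function, nanowire) pair, which is the graph-construction cost that the graph-free heuristic is designed to avoid.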
The time complexity of all of these matching algorithms will be dominated by the time complexity of graph construction of Section 6.1. Therefore to shown in Figure 3(a).
We distinguish the different heuristic algorithms by the way they choose the nodes in lines 2 and 4 of Figure 3(a). One way is to choose both f and w randomly. Another way is to choose each of them in increasing order of node degree. A combination of the above is another option. We obtain our best results by choosing the least degree f from F and choosing w randomly.
Here we show how we can eliminate the need to actually build the graph $G(F, W, E)$. There are two points in the algorithm that are dependent on graph G: 1) choosing the $f_i$'s based on their degrees in G, 2) line 5 of Figure 3(a), which checks the matching condition by checking the existence of the edge $(f_i, w_j)$.
To select OR-terms based on least degree, we would need to sort F. Instead of sorting the $f_i$'s of F based on their degree, the nodes can be sorted based on the expected value of their degree. Let $P_J$ be the probability that a junction is programmable, and $c_i$ be the number of ON-inputs in the OR function $f_i$. The probability that $(f_i, w_j) \in E$ is $P_J^{c_i}$. This is the probability that the OR function $f_i$ can be assigned to the nanowire $w_j$. So the expected value of the node degree of $f_i$ is $P_J^{c_i} \cdot |W|$. Ordering F based on the expected value of node degrees is the same as ordering it based on the value of $c_i$. This means there is no need to build the graph for sorting purposes.
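The equivalence between the two orderings can be illustrated numerically. The sketch below assumes $0 < P_J < 1$, so that the expected degree $P_J^{c_i} \cdot |W|$ strictly decreases as $c_i$ grows; "least expected degree first" then becomes "most ON-inputs first". Function names and sample values are illustrative.

```python
from typing import List


def expected_degree(c_i: int, p_j: float, n_wires: int) -> float:
    """Expected number of nanowires that can host an OR function with
    c_i ON-inputs, when each junction is programmable with probability p_j."""
    return (p_j ** c_i) * n_wires


def order_by_on_inputs(on_counts: List[int]) -> List[int]:
    """Indices sorted by decreasing number of ON-inputs (stable)."""
    return sorted(range(len(on_counts)), key=lambda i: -on_counts[i])


def order_by_expected_degree(on_counts: List[int], p_j: float,
                             n_wires: int) -> List[int]:
    """Indices sorted by increasing expected degree (stable)."""
    return sorted(range(len(on_counts)),
                  key=lambda i: expected_degree(on_counts[i], p_j, n_wires))


on_counts = [3, 12, 7, 1, 9]           # hypothetical c_i values
by_count = order_by_on_inputs(on_counts)
by_degree = order_by_expected_degree(on_counts, p_j=0.8, n_wires=100)
print(by_count, by_degree)
```

Sorting on the integer $c_i$ needs no knowledge of the defect map, which is why no graph has to be built for this step.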
To test the condition of line 5 of Figure 3(a), in the case that there is no graph, we need to program and test every single nanowire that is picked to be assigned to each OR function $f_i$. The time complexity of mapping and testing is $O(c_i)$ for each OR function $f_i$. In order to have time complexity of $O(c_i)$ instead of $O(N)$, the $f_i$'s need to be stored efficiently (sparsely). Hence by paying this cost there is no longer a need to build the graph $G(F, W, E)$, and the total time complexity is only due to the algorithm of finding a matching (Figure 3(b)).
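The graph-free greedy mapping can be sketched as below. This is a minimal sketch under stated assumptions, not the paper's Figure 3(b): the hardware program-and-test step is replaced by a hypothetical callback `program_and_test`, nanowires are visited in a single random permutation as a simple stand-in for "choosing w randomly", and each OR function is stored sparsely as a list of its ON-input indices.

```python
import random
from typing import Callable, Dict, List, Optional


def greedy_map(on_inputs: List[List[int]], n_wires: int,
               program_and_test: Callable[[int, List[int]], bool],
               rng: Optional[random.Random] = None) -> Optional[Dict[int, int]]:
    """Assign each OR function to a distinct nanowire, or return None.

    on_inputs[i] is the sparse list of ON-input indices of function i, so
    one program-and-test call costs O(c_i) rather than O(N)."""
    if rng is None:
        rng = random.Random(0)
    # Most ON-inputs first, i.e. least expected degree first.
    order = sorted(range(len(on_inputs)),
                   key=lambda i: len(on_inputs[i]), reverse=True)
    free = list(range(n_wires))
    rng.shuffle(free)                    # random visiting order of nanowires
    assignment: Dict[int, int] = {}
    for i in order:
        for k, w in enumerate(free):
            if program_and_test(w, on_inputs[i]):
                assignment[i] = w
                free.pop(k)              # nanowire is now taken
                break
        else:
            return None                  # no unmatched nanowire fits f_i
    return assignment


# Simulated defect map (hypothetical): (wire, input) pairs that cannot close.
defects = {(0, 2), (1, 0), (1, 4)}


def simulated_test(wire: int, ins: List[int]) -> bool:
    return all((wire, x) not in defects for x in ins)


result = greedy_map([[0, 2, 4], [1, 3]], n_wires=3,
                    program_and_test=simulated_test)
print(result)
```

A list with `pop` keeps the sketch short; a production version would likely use a structure with cheaper removal, and would return the unmatched functions rather than `None` so that spare nanowires can be added.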
Analysis
Running Time Complexity
We first compute the worst-case time complexity. As explained above, line 6 of the algorithm in Figure 3(b) takes $O(c_i)$ program and test operations for OR function $f_i$, and in the worst case it is repeated for each of the $(|W| - i)$ unmatched nanowires, for a total of $O\left(\sum_{i} (|W| - i) \cdot c_i\right)$. It can be written as $O(|F| \cdot |W| \cdot c_M)$, where $c_M$ is the maximum of the $c_i$'s. In Section 8 we show how to bound the size of $c_M$, without scaling $|F|$ by more than a small constant factor. Sorting F in the first line of Figure 3(b) takes $O(|F| \log |F|)$ computing operations. So assuming $|F| = |W| = N$, our greedy algorithm takes $N^2$ program and test operations and $O(N \log N)$ computing operations, while the exact approach takes $N^2$ program and test and $O(N^3)$ computing operations.
On average the number of iterations will be smaller than this. Let $m_i$ be the number of iterations that it takes to find a match for OR function $f_i$. If we want the expected number of matches for $f_i$ among $m_i$ nanowires to be 1, then $m_i \cdot P_J^{c_i} = 1$, i.e. $m_i = P_J^{-c_i}$.
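A quick numeric check shows how fast this quantity grows with fanin. The sketch assumes, as derived earlier, that a nanowire can host $f_i$ with probability $P_J^{c_i}$, so the expected number of matches among $m$ nanowires is $m \cdot P_J^{c_i}$; the function name is illustrative.

```python
def expected_tries(c_i: int, p_j: float) -> float:
    """Number of nanowires m_i for which the expected number of matches,
    m_i * p_j**c_i, equals 1."""
    return p_j ** (-c_i)


# 20% junction defect rate -> p_j = 0.8
for c in (5, 10, 20):
    print(f"c_i = {c:2d}: m_i = {expected_tries(c, 0.8):.1f}")
```

With $P_J = 0.8$ this gives roughly 3, 9 and 87 nanowires for 5, 10 and 20 ON-inputs, which is the growth that motivates bounding the fanin.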
Area Overhead Estimation
Here we compute how large W should be in practice. In the average case, as shown before, if the size of the unmatched set of nanowires when matching the ith OR function is at least $P_J^{-c_i}$, then the expected number of matches in this set is 1. This therefore defines a lower bound on the size of W.
Remember that in our algorithm the set of nanowires that $f_i$ can choose from is of size $(|W| - i)$. Therefore the probability of successfully assigning $f_i$ to a nanowire is $1 - (1 - P_J^{c_i})^{|W| - i + 1}$.
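The effect of spare nanowires on mapping success can be explored numerically. The sketch below rests on two assumptions that are not confirmed by the text: the ith function (counting from 1) is taken to see $|W| - i + 1$ unmatched nanowires, and the overall success probability is approximated as the product of the per-function probabilities, treating them as independent. Names and sample values are hypothetical.

```python
from typing import List


def p_assign(c_i: int, p_j: float, n_candidates: int) -> float:
    """Probability that at least one of n_candidates nanowires can host an
    OR function with c_i ON-inputs (independent junction defects)."""
    return 1.0 - (1.0 - p_j ** c_i) ** n_candidates


def p_map_all(on_counts: List[int], p_j: float, n_wires: int) -> float:
    """Approximate probability that every OR function is mapped, as a
    product of per-function terms (independence is an assumption)."""
    prob = 1.0
    for i, c_i in enumerate(on_counts, start=1):
        prob *= p_assign(c_i, p_j, n_wires - i + 1)
    return prob


on_counts = [8] * 50                    # 50 OR functions, 8 ON-inputs each
tight = p_map_all(on_counts, p_j=0.8, n_wires=50)
spare = p_map_all(on_counts, p_j=0.8, n_wires=60)
print(f"no spares: {tight:.3g}   10 spares: {spare:.3g}")
```

Under these assumptions the last few functions dominate the product, because they see the fewest unmatched nanowires; adding spares mainly raises those final terms.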
Hence the probability of successfully mapping all the
Figure 5(c) shows the average area overhead ratio over all the benchmark set designs. Bounding the fanin scales the number of OR functions by an average factor of 1.11 for a junction defect rate of 0.20 ($P_J = 0.80$). An additional average factor of 1.02 is incurred after physically mapping these OR functions onto nanowires. This brings the total average overhead factor to 1.13 (1.11 x 1.02).
The area overhead of this greedy algorithm is compared with an exact matching algorithm for a 4 x 4 multiplier that is implemented in two logic planes.
The first plane has 697 OR functions and 33 inputs and the other one has 25 OR functions and 697 inputs. In Figure 7(a) the area overhead of each of the planes is plotted for both the greedy and exact algorithms. In Figure 7(b) the ratio of the total area of the exact algorithm over the total area of the greedy algorithm is plotted. This shows that our greedy algorithm is within a few percent of optimal on average for modest fault rates.
Summary
A plausible architecture for nanoPLA design is suggested in [2]. The defect rate of different fabri-
Acknowledgments
