We have calculated the minimum chip area overhead, and hence the bit density reduction, that may be achieved by memory array reconfiguration (bad bit exclusion), combined with error correction code techniques, in prospective terabit-scale hybrid semiconductor/nanodevice memories, as a function of the nanodevice fabrication yield and the micro-to-nano pitch ratio. The results show that by using the best (but hardly practicable) reconfiguration and block size optimization, hybrid memories with a pitch ratio of 10 may overcome purely semiconductor memories in useful bit density if the fraction of bad nanodevices is below ∼15%, while in order to get an order-of-magnitude advantage in density, the number of bad devices has to be decreased to ∼2%. For the simple 'Repair Most' technique of bad bit exclusion, complemented with the Hamming-code error correction, these numbers are close to 2% and 0.1%, respectively. When applied to purely semiconductor memories, the same technique allows us to reduce the chip area 'swelling' to just 40% at as many as 0.1% of bad devices. We have also estimated the power and speed of the hybrid memories and have found that, at a reasonable choice of nanodevice resistance, both the additional power and speed loss due to the nanodevice subsystem may be negligible.
Introduction
The recent spectacular advances in molecular electronics, in particular the demonstration of single-molecule single-electron transistors by several groups [1-5], offer the hope for a practical introduction of hybrid semiconductor/nanodevice circuits, first of all for terabit-scale memory applications [6-8]. In such memories, nanodevices (e.g., single molecules) would be used as single-bit memory cells, while the semiconductor transistor subsystem would perform all the peripheral (input/output, coding/decoding, line driving, and sense amplification) functions that require a relatively small number of devices (scaling as $N^{1/2}$, where N is the memory size in bits).
The first experimental steps toward the implementation of the hybrid memories have already been made; see, for example, [9-11].
The main architectural challenge faced by the hybrid memories is the anticipated substantial fraction of 'bad' nanodevices, limited by both the integration technique (e.g., the chemically-directed molecular self-assembly; see, for example, [12, 13] ), and the vulnerability of nanoscale devices to random charged defects [7] . The main approach to addressing this problem in semiconductor memory technology [14, 15] is reconfiguration, i.e. the replacement of memory array lines (rows or columns) containing bad cells by spare lines. The effectiveness of the replacement depends on how good its algorithm is [15, 16] . The Exhaustive Search approach (trying all possible combinations) finds the best repair solution, though it is not practicable because of the exponentially large execution time. A more acceptable choice is the 'Repair Most' method that allows a simple hardware implementation and an execution time scaling linearly with the number of bits. In this approach, the number of faults in each line of a memory block (matrix) is counted, and the lines having the largest number of defects are replaced with the spare lines.
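The 'Repair Most' idea described above can be illustrated with a minimal Python sketch. It assumes the rows-first, columns-second ordering, and the function name, the defect representation as a set of (row, column) pairs, and the 6 × 6 example block are all hypothetical choices made for illustration. The sketch uses a sort for brevity, so it does not reproduce the linear execution time quoted above.

```python
def repair_most(defects, n_rows, n_cols, spare_rows, spare_cols):
    """Greedy 'Repair Most' sketch: rows first, then columns.

    defects is a set of (row, col) pairs for the bad cells of one block.
    Returns (excluded_rows, excluded_cols, unrepaired_defects).
    """
    # Stage 1: count defects per row and exclude the worst rows.
    row_counts = [0] * n_rows
    for row, _ in defects:
        row_counts[row] += 1
    worst_rows = sorted(range(n_rows), key=lambda r: row_counts[r], reverse=True)
    excluded_rows = set(worst_rows[:spare_rows])
    left = {(r, c) for r, c in defects if r not in excluded_rows}

    # Stage 2: repeat the same counting for columns on what is left.
    col_counts = [0] * n_cols
    for _, col in left:
        col_counts[col] += 1
    worst_cols = sorted(range(n_cols), key=lambda c: col_counts[c], reverse=True)
    excluded_cols = set(worst_cols[:spare_cols])
    left = {(r, c) for r, c in left if c not in excluded_cols}
    return excluded_rows, excluded_cols, left


# Hypothetical 6 x 6 block with one spare row and one spare column.
bad_cells = {(0, 0), (0, 1), (0, 2), (3, 4)}
rows, cols, unrepaired = repair_most(bad_cells, 6, 6, 1, 1)
```

In this example the row holding three defects is excluded first, after which a single spare column is enough to remove the remaining bad cell.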
For a larger fraction of bad bits, better results may be achieved [17] by combining the bad line exclusion with error-correction code (ECC) techniques. In semiconductor memories the ECC is reserved mainly for the suppression of soft rather than hard errors, i.e., for ensuring the memory fault tolerance rather than defect tolerance [15, 18]. However, for the prospective hybrid memories the defect tolerance is expected to be a much more acute problem, and ECC may need to be involved at the initial repair process as well. The technical analysis of this opportunity, carried out in the pioneering work [17], has been limited to the use of just a few spare lines.
The objective of our work is a study of the bad component exclusion techniques extended to an arbitrary number of redundant rows and columns of the matrix blocks, applied both with and without the error correction. We have calculated the chip real estate overhead necessary for the fabrication of spare rows, columns, and auxiliary circuits, as a function of the bad bit fraction q and the ratio R of critical dimensions of the semiconductor transistors and nanodevice components. The results for R = 1 are also presented, since we believe that they are important for the evaluation of scaling prospects of purely semiconductor memories.

Figure 1 shows the assumed general structure of the hybrid memory. It is essentially a matrix of L memory blocks, while each block is in turn a rectangular array of (n + a) × (m + b) memory cells. Here a and b represent redundant resources that are used for bad bit replacement at the initial test and repair stage, so that the final number of used memory cells is L × n × m. The good cells, addressed at each particular time step, form a row with one cell per block. They have the same external word and bit addresses in each block, though due to the internal re-routing during the initial reconfiguration process, the real physical location of the used cell may be different in each block.
Model

Memory structure
In contrast to the memory top structure (figure 1), the block architecture (figure 2) is substantially different from the traditional 'uniform' (semiconductor) memories. At each elementary operation, the block decoders address two vertical and two horizontal lines implemented in the CMOS layers of the integrated circuit, thus selecting a pair of 'relay' CMOS cells. The reason for such doubling is straightforward: the number of nanodevice memory cells may be much higher than the product of the number of CMOS-level wires going in each direction, and hence the usual procedure of selection of one cell (at the crosspoint of one horizontal and one vertical line) does not allow the user to address every nanodevice cell. Figure 2 shows how such addressing may be accomplished using the 'CMOL' circuit concept [7, 19, 20] . In this approach, just as in virtually all other hybrid circuit proposals [6, 8-10, 21, 22] , the nanodevices are formed at each crosspoint of two layers of a 'crossbar' array, consisting of two levels of parallel nanowires. Such parallel nanowire arrays may be fabricated by several advanced patterning technologies, such as nanoimprint [24] or interference lithography [25] , which may provide much better resolution (in future, down to a few nanometres) than the standard photolithography. These novel technologies cannot be used for patterning of arbitrary integrated circuits, in particular because they lack adequate layer alignment accuracy; however, the fabrication of a crossbar array does not require accurate alignment. Such an approach, of course, implies that nanodevice formation at nanowire crosspoints is also accomplished without lithographic patterning. (The chemically directed self-assembly of pre-synthesized molecular devices from solution [12, 13] is probably the most evident, but not the only possible, example of such a process.) 
The difference between the CMOL approach and the earlier suggestions [21, 23] (which seem hardly feasible from the fabrication point of view [20]) is in the interfacing between the nanowires and the underlying CMOS-level wires: in CMOL it is provided by sharp-pointed pins (footnote 2) that are distributed all over the circuit area. Somewhat counter-intuitively, this approach allows individual access to each nanowire even if the half-pitch $F_{\mathrm{nano}}$ of the nanowire crossbar is much less than that ($F_{\mathrm{CMOS}}$) of semiconductor-level wiring. Figures 2(a), (b) ... CMOS of the CMOS relay cell. Since the cell structure should incorporate at least one transistor (see figure 3 and its discussion below), β may be substantially larger than unity.) For example, the selection of relay cells 1 and 2 (figure 2(b)) enables contacts to the nanowires leading to the left one of the two shown nanodevices. Now, if we keep selecting cell 1, and instead of cell 2 select cell 2′ (using the next CMOS wiring row), we contact the nanowires going to the right nanodevice instead. It is easy to understand that this trick allows addressing each nanodevice via a pair of relay cells located at the Manhattan distance $|\Delta x| + |\Delta y| < R + 1$, where $R \equiv 1/\sin\alpha$ is just the ratio of the CMOS wiring pitch $2\beta F_{\mathrm{CMOS}}$ to the nanowiring pitch $2F_{\mathrm{nano}}$.

Footnote 2: The technology for the fabrication of points a few nanometres sharp has already been developed in the context of field-emission array applications; see, for example, [26].
The simplest and the most compact design of the relay cell may be based on just one pass transistor and one pull-down resistor (figure 3(a)). Another, complementary version of the relay cell (figure 3(b)) may have a somewhat larger footprint, but ensures lower power consumption. In any version, the selected relay cell connects the CMOS data line to the corresponding nanowire. Figure 4 shows (schematically) the assumed I-V curve of the nanodevice that is typical, in particular, for the single-electron latching switch [7, 27-29]. In one of the states (binary '0') the device passes almost no current until the applied voltage V reaches a certain threshold value V
With this assumption, the read and write operations are very similar to those in semiconductor memories [14], especially floating-gate nonvolatile memories (NVM [31]). For the READ operation, one of the relay cells (e.g., cell 1 in figure 2(a)) applies voltage $V_{\mathrm{READ}}/2$ to the corresponding word nanowire, while the complementary relay cell (cell 2 in figure 2(a)) connects the selected device, via the bit nanowire, to the CMOS data line biased by voltage $-V_{\mathrm{READ}}/2$, and then to the input of a CMOS sense amplifier. As a result, the selected nanodevice becomes biased by voltage $V_{\mathrm{READ}} > V_C$ (figure 4), so that if this device is in the open state, it passes current $I \approx (V_{\mathrm{READ}} - V_C)/R_{\mathrm{nano}}$ to the sense amplifier. Similarly, for the WRITE 1 and WRITE 0 operations, each of the relay cells applies half of the necessary voltage ($V_{\mathrm{WRITE\,1}}$ or $V_{\mathrm{WRITE\,0}}$; see figure 4) to the corresponding nanowire, so that only one nanodevice, located at the crosspoint of these wires, is switched into the opposite state.
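The half-voltage selection used for READ can be checked with a small Python sketch. The numerical values of the read voltage and threshold below are hypothetical (only the 1 MΩ resistance is borrowed from the Discussion), and two simplifications are assumed: unselected nanowires sit at 0 V, and a device passes no current when biased below the threshold.

```python
def device_bias(word, bit, selected_word, selected_bit, v_read):
    """Voltage across the device at (word, bit) during a READ of the selected cell."""
    v_word = v_read / 2 if word == selected_word else 0.0   # word nanowire potential
    v_bit = -v_read / 2 if bit == selected_bit else 0.0     # bit nanowire potential
    return v_word - v_bit


def read_current(bias, v_c, r_nano, is_open):
    """Current into the sense amplifier, I = (V - V_C) / R_nano for an open device."""
    if is_open and bias > v_c:
        return (bias - v_c) / r_nano
    return 0.0


# Hypothetical operating point: V_READ = 1.0 V, V_C = 0.6 V, R_nano = 1 MOhm.
V_READ, V_C, R_NANO = 1.0, 0.6, 1e6
selected = device_bias(3, 5, 3, 5, V_READ)        # both nanowires selected
half_selected = device_bias(3, 7, 3, 5, V_READ)   # only the word nanowire selected
```

With these numbers the selected device sees the full read voltage, while a half-selected device stays below the threshold, which is the condition for reading a single crosspoint.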
Defect model
The following assumptions have been made about the memory defects:
(i) Defective nanodevice cells are randomly distributed, with probability q, among the block matrices.
(ii) The density of defective interface pins, nanowires, and CMOS components is negligibly small (footnote 5).
(iii) The bad memory cells correspond to 'stuck-on-open' defects and do not affect (i.e. neither short nor interrupt) the nanowires. This assumption seems reasonable at least for molecular self-assembly, where most defects are expected to result from a failure to have a molecular device assembled at a certain nanowire crosspoint.

Footnote 5: Concerning the pins and nanowires, this assumption is made more plausible by the following fact. The hybrid memory structure (figure 2) has an inherent redundancy: nominally, each nanowire is contacted by several pins (spaced by a distance D). This means that even if the nanowires have a limited number of breaks (with the average distance between them much larger than D) and/or a small fraction (well below $F_{\mathrm{nano}}/\beta F_{\mathrm{CMOS}} = 1/R$) of pins do not provide good contacts, these defects may be excluded from the circuit operation without any area overhead.
Memory 'repair' (reconfiguration)
At the initial testing of each memory block, bad nanodevices are detected, and their physical location is used to compile a special table (additional memory) that maps the continuous external address space into the space of physical addresses of relay cells leading to good nanodevices. While the temporal latency of the mapping procedure can be overlapped with block decoding, the area of the mapping table, implemented in the CMOS subsystem, may be substantial and is included in our area calculations (see below).
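A minimal sketch of such a mapping table, in Python, is given here for orientation only. It is simplified to one table per direction that lists the physical lines kept after repair; the function names and the 6-line example are hypothetical, and the translation from a physical line to the addresses of the relay cells serving it is not modelled.

```python
def build_line_map(n_physical, excluded, n_logical):
    """Map logical line numbers 0..n_logical-1 to the good physical lines."""
    good = [line for line in range(n_physical) if line not in excluded]
    if len(good) < n_logical:
        raise ValueError("not enough good lines for the requested block size")
    return good[:n_logical]


# Hypothetical block: 6 physical lines per direction, 4 logical lines.
row_map = build_line_map(6, {1, 4}, 4)   # rows 1 and 4 were excluded at test time
col_map = build_line_map(6, {0}, 4)      # column 0 was excluded at test time


def physical_address(logical_row, logical_col):
    """Translate a continuous external address into a physical location."""
    return row_map[logical_row], col_map[logical_col]
```

The external address space stays continuous, while the lookup silently skips the excluded lines.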
ECC
The ECC circuitry is assumed to be basically the same as in traditional memories, where it usually takes no more than 10% of the chip area; see, for example, [15, 32] . For the most interesting cases such real estate overhead is insignificant, and we will ignore it. In contrast, the area occupied by redundant bits involved in the error correction may be substantial, and we are including it in our calculations (section 4).
Yield calculations
We have calculated the necessary redundant resources for two methods of bad bit exclusion: a simple 'Repair Most' approach and the absolutely best 'Exhaustive Search' method, both with and without the error-correction code enhancement. For that, we first calculate the full memory yield Y as a function of the bad bit probability q = 1− p and other parameters (L, m, n, a, b, R), and then determine the real estate overhead by requiring the yield to be equal to a realistic number.
Repair Most
In some versions of the Repair Most approach the direction of excluded lines is arbitrary. In this study we use a slightly simpler version of this method, in which lines are first excluded in one dimension (e.g., rows) and then in the other (columns). With our assumption of defect independence, the probability to have k bad bits in an initial row of length (m + b) obeys the binomial distribution:

$$P(k) = \binom{m+b}{k}\, q^k (1-q)^{m+b-k}.$$
For all reasonable parameters (see section 5 below), both (m + b) and 1/q are large in comparison with the average number $d_i = \langle k \rangle = q(m + b)$ of defects in the initial row, so that we can use for P(k) the Poisson approximation:

$$P(k) = \frac{d_i^{\,k}}{k!}\, e^{-d_i}. \qquad (1)$$
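The quality of the Poisson approximation is easy to check numerically. The Python sketch below compares the two distributions for one illustrative parameter set; the row length 1705 and the value q = 3 × 10⁻³ are borrowed from later sections purely as an example and are not claimed to belong to the same design point.

```python
from math import comb, exp, factorial


def binomial_pmf(k, length, q):
    """Exact probability of k bad bits in a row of the given length."""
    return comb(length, k) * q ** k * (1.0 - q) ** (length - k)


def poisson_pmf(k, mean):
    """Poisson approximation with the same mean number of defects."""
    return mean ** k * exp(-mean) / factorial(k)


row_length = 1705          # illustrative (m + b)
q = 3e-3                   # illustrative bad bit fraction
d_i = q * row_length       # mean number of defects per initial row

# Largest pointwise difference between the two distributions for k = 0..20.
max_difference = max(
    abs(binomial_pmf(k, row_length, q) - poisson_pmf(k, d_i)) for k in range(21)
)
```

For parameters of this kind the two distributions agree to well within a per cent at every k, which is the sense in which the approximation is used here.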
Now, considering all rows in a block matrix, the statistical distribution of rows having k defects follows the same equation (1) . The exclusion of a rows with the largest number of defects of the initial (n + a) rows is equivalent to dropping a tail, with the area a/(n + a) underneath, of that distribution.
Finding the average number d r of bad bits per row after such an exclusion is easiest if the number is large. In this case, the distribution P(k) may be treated as a continuous one, so that
where x, the maximum number of defects per line after the row exclusion, can be found from the solution to the equation
In the case when d r is comparable with, or equal to, unity, it may be found numerically from the discrete version of equations (2):
where P r (x) is the part of the probability P(x) which reflects the number of retained rows with the threshold number (x) of bad bits. The integer x and the fractional number P r (x) are now defined by the normalization condition similar to equation (2b):
Finally, if x = 0, then in an average array all defective lines will be excluded. To account for deviations from this average, $d_i$ can be expressed as a sum

$$d_i = \sum_j e_j P(e_j),$$

where $e_j = j/(n + a)$ plays the role of a local $d_i$ and $P(e_j)$ is its probability. Indeed, for a large number L of arrays with the same probability q of defective cells, $L_j = L P(e_j)$ arrays will have j defects, which translates to the local $e_j$. In that case $d_r$ can be found as a sum of local $d_r(e_j)$ with the corresponding weights (footnote 6):

$$d_r = \sum_j d_r(e_j) P(e_j).$$
After the row exclusion, the remaining defects are distributed randomly over the columns (no correlation between the defect's vertical and horizontal positions), and can be characterized by the new probability

$$q_r = \frac{d_r}{m+b}.$$

From here, the probability of a column of height n to be perfect is $(1 - q_r)^n$, so that the probability to have an imperfect column is $1 - (1 - q_r)^n$.
The statistics of the number j of defective columns obeys the binomial distribution, and the probability y of finding not more than b defective columns, which can be excluded at the second stage of repair (leaving a completely good sub-block with the final size n × m), can be expressed through this distribution:

$$y = \sum_{j=0}^{b} \binom{m+b}{j} \left[1 - (1-q_r)^n\right]^{j} (1-q_r)^{n(m+b-j)}. \qquad (6)$$

Footnote 6: Practical calculations of $d_r$ from equations (3), (4) are facilitated by the fact that the cumulative Poisson distribution obeys the formula $\sum_{k=0}^{x} d^k e^{-d}/k! = \Gamma(x+1, d)/\Gamma(x+1)$, where the Γs in the numerator and denominator are, respectively, the incomplete and complete gamma functions [33].
This formula gives the final (post-repair) yield of single blocks. The yield of the whole memory consisting of L blocks is simply $Y = y^L$.
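The chain of steps in this subsection can be put together in a short Python sketch. It is one possible reading of the procedure rather than the authors' code: the function names are hypothetical, the tail of the Poisson distribution is dropped in its discrete form, and the correction for deviations from the average array (the x = 0 case discussed above) is deliberately left out, so the result is optimistic when almost all rows are defect-free.

```python
from math import exp, lgamma, log


def mean_defects_per_kept_row(q, n, m, a, b):
    """Average defects per row after dropping the a worst of (n + a) rows."""
    d_i = q * (m + b)              # mean defects per initial row
    keep = n / (n + a)             # fraction of rows that is retained
    mass = 0.0                     # Poisson probability accumulated so far
    weighted = 0.0                 # sum of k * P(k) over retained rows
    p_k = exp(-d_i)                # P(0)
    for k in range(m + b + 1):
        if mass + p_k >= keep:
            # Only part of the rows with exactly k defects is retained.
            weighted += k * (keep - mass)
            break
        mass += p_k
        weighted += k * p_k
        p_k *= d_i / (k + 1)       # P(k + 1) from P(k)
    return weighted / keep


def binomial_cdf(limit, trials, p):
    """P(X <= limit) for X ~ Binomial(trials, p), summed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if limit >= trials else 0.0
    total = 0.0
    for j in range(min(limit, trials) + 1):
        log_term = (lgamma(trials + 1) - lgamma(j + 1) - lgamma(trials - j + 1)
                    + j * log(p) + (trials - j) * log(1.0 - p))
        total += exp(log_term)
    return min(total, 1.0)


def block_yield(q, n, m, a, b):
    """Probability that at most b columns remain defective after row exclusion."""
    d_r = mean_defects_per_kept_row(q, n, m, a, b)
    q_r = d_r / (m + b)                        # bad bit probability after stage 1
    p_bad_column = 1.0 - (1.0 - q_r) ** n      # column of height n is imperfect
    return binomial_cdf(b, m + b, p_bad_column)


def memory_yield(q, n, m, a, b, num_blocks):
    """Yield of a memory of num_blocks independent blocks, Y = y ** L."""
    return block_yield(q, n, m, a, b) ** num_blocks
```

The binomial sum is evaluated through log-gamma functions so that large blocks do not overflow; that is an implementation convenience and not part of the model.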
Exhaustive Search
The Exhaustive Search looks for the absolutely best way of excluding a spare rows and b spare columns. Though the practical hardware implementation of such a repair technique is formidable, an estimate of its potential results is important for the evaluation of quality of practicable algorithms (such as the Repair Most method). In this context, we may restrict our task to finding an upper bound $Y_{\max}$ for the yield after the Exhaustive Search repair. The total number t of ways to choose a subarray n × m from the initial (n + a) × (m + b) array, and the total number h of ways to put d defects into the initial array, are, respectively:

$$t = \binom{n+a}{n}\binom{m+b}{m}, \qquad h = \binom{(n+a)(m+b)}{d}. \qquad (8)$$
Consider the t ×h matrix W shown in figure 5 (a), with elements w i, j equal to unity if the subarray number i (ranging from 1 to t) is defect-free for the defect pattern number j (ranging from 1 to h), and equal to zero otherwise. The real yield y(d) after the Exhaustive Search repair of a block array with d defects may be expressed as
where
because $W_j$ shows whether a particular (j-th) configuration of d defects may be completely excluded by some choice of subarray n × m. An upper bound $y_{\max}(d) \ge y(d)$ can be found by replacing $W_j$ with the larger or equal amount
Note that $y_{\max}$ does not depend on the array geometry (i.e. the particular values of n, m and a, b) but is rather a function of its total area (n + a)(m + b) and the total number s of redundant cells. Figure 6 shows, with dotted lines, typical results for $y_{\max}$ given by equation (11), together with the Repair Most results (equation (6)), while the dashed lines show the yield $y_0 = (1-q)^{mn}$ without repair. One can see that the difference between the two repair methods is not too large, especially if the number of redundant lines is not too high (below, or of the order of, the final memory size) (footnote 8). The difference, however, becomes larger if we use the array reconfiguration together with the ECC technique; see below.

Footnote 8: For example, it is easy to see that at a = b = 1 the Exhaustive Search can eliminate at most one additional defect.
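For very small blocks the Exhaustive Search can be carried out directly, which is useful for checking faster algorithms. The Python sketch below is a brute-force illustration with hypothetical names. It enumerates every choice of excluded rows and uses the observation that, once the rows are fixed, the columns that must be excluded are exactly those still holding a defect. The running time grows combinatorially with the block size, in line with the remark in the Introduction that this approach is not practicable.

```python
from itertools import combinations


def exhaustive_repair(defects, total_rows, spare_rows, spare_cols):
    """Return (excluded_rows, excluded_cols) for a full repair, or None.

    defects is a set of (row, col) pairs in a block with total_rows rows.
    """
    for dropped in combinations(range(total_rows), spare_rows):
        dropped_rows = set(dropped)
        # Columns that still contain a defect after the rows are excluded.
        bad_cols = {c for r, c in defects if r not in dropped_rows}
        if len(bad_cols) <= spare_cols:
            return dropped_rows, bad_cols
    return None


# Hypothetical 3 x 3 blocks (n = m = 2, a = b = 1).
repairable = exhaustive_repair({(0, 0), (0, 1), (1, 2)}, 3, 1, 1)
unrepairable = exhaustive_repair({(0, 0), (1, 1), (2, 2)}, 3, 1, 1)
```

The first pattern is repaired by excluding one row and one column, while three defects on the diagonal cannot be covered by a single spare row and a single spare column.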
Hamming-code ECC alone
We have restricted our analysis to the most popular ECC technique based on SEC-DED (single error correction, double error detection) Hamming codes (c, k) [15, 17, 18, 32, 36, 37], where c and k are the code and the data word lengths, respectively. (The number (c − k) of parity bits should satisfy the condition [37] , i.e. (c − k) 2 + log 2 c). The probability Q that the decoded word will not have an error is the sum of two probabilities: that the initial code word has no errors, and that it has one error. Both probabilities follow the binomial distribution, therefore $Q = (1 - q)^c + cq(1 - q)^{c-1}$. Since the memory consisting of L blocks, with n × m bits each, can accommodate Lnm/c Hamming code words, its effective yield is

$$Y = Q^{Lnm/c}. \qquad (12)$$
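The word-level probability Q and the resulting memory yield are simple enough to evaluate directly. The Python sketch below uses hypothetical function names; the code length c = 137 is the one used later in the paper, while q = 10⁻⁴ is only an illustrative value. It also checks the small-q behaviour: since single errors are corrected, the word failure probability is dominated by double errors and scales roughly as C(c, 2) q².

```python
from math import comb


def word_ok_probability(q, c):
    """Probability that a c-bit code word has at most one bad bit."""
    return (1.0 - q) ** c + c * q * (1.0 - q) ** (c - 1)


def ecc_only_yield(q, c, num_blocks, n, m):
    """Yield of a memory holding num_blocks * n * m / c independent code words."""
    num_words = num_blocks * n * m / c
    return word_ok_probability(q, c) ** num_words


q_example = 1e-4
failure = 1.0 - word_ok_probability(q_example, 137)
leading_term = comb(137, 2) * q_example ** 2   # double-error estimate
```

Because the number of code words in a terabit-scale memory is very large, even a word failure probability of this order drives the total yield towards zero, which is why the ECC is combined with line exclusion in the following subsections.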
Hamming-code ECC on top of the Repair Most exclusion
Since the Repair Most exclusion is performed for each block individually, the defect distribution over c cells used to keep each Hamming code remains random, and we can use the formula similar to equation (12)

$$Y = \left[(1-q_f)^c + c\,q_f (1-q_f)^{c-1}\right]^{Lnm/c}, \qquad (13)$$
where q f is the probability of a single cell to be bad after the whole exclusion (of rows and columns) has been completed. This probability can be calculated from q r exactly as q r itself has been calculated from the initial probability q, i.e. using equations (1)- (5) with the substitutions
reflecting the swapping of rows and columns at the second stage of the repair process.
Exhaustive Search combined with Hamming-code ECC
The Hamming-code error correction could also be performed 'on top of' (after) the Exhaustive Search repair. Since such a repair is hardly practicable, and we are mostly interested in it as the upper bound for the yield, it is more logical to consider an even more perfect strategy in which the bad bit exclusion and error correction are intertwined. In this approach, each possible combination of n × m subarrays, with arbitrarily renumbered rows and columns, in each of c blocks used by the Hamming code, is evaluated for the best error correction outcome, and the best found combination is kept in the mapping table. The number of ways to exclude a rows and b columns from each of c blocks is simply $\left[\binom{n+a}{a}\binom{m+b}{b}\right]^c$. Therefore, the full number t of ways to choose a set of c subarrays with renumbered n rows and m columns (i.e. a certain combination of code words) is
where the subtraction of 1 from c in the last term cancels the counting of cases when all the bits in the Hamming code words are simply exchanged. Let us introduce matrix W similar to that described in section 3.2 above, although now it describes all c blocks, i.e. t is now given by equation (15), while
The upper bound $y_{\max}(d)$ is given by the first part of equation (10); however, the value of $\sum_{j=1}^{h} w_{i,j}$ has to be recalculated.
Let $l \le d$ defects be located in the selected set of subarrays, of the total size nmc. In order to be fixed with a Hamming code, only one defect is allowed in a code word; such a defect can be located in any of c bit positions. Hence, for l defects in l words (with one defect per word), the number of ECC-correctable bit patterns is $c^l$. Now, accounting for all possible positions of l code words inside the set of subarrays (with the total number of code words nm), the number of ECC-correctable patterns is $\binom{nm}{l} c^l$. Note that there is a total of $\binom{nmc}{l}$ defect patterns. Since for all of these correctable patterns the remaining (d − l) defects outside the set of subarrays can be in any bit positions, the total number of perfect or ECC-correctable patterns is
Since this sum does not depend on index i (just as in equation (10) above),
and, similarly to equation (11), the upper bound for yield of the whole memory is
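The counting argument used in this subsection, that l defects spread over nm code words of c bits each form C(nm, l) c^l correctable patterns out of C(nmc, l) possible ones, can be verified by brute force for tiny sizes. The Python sketch below does this with a hypothetical helper; it is a sanity check of the combinatorics only and does not evaluate the yield bound itself.

```python
from itertools import combinations
from math import comb


def count_correctable(num_words, c, l):
    """Count l-defect patterns with at most one defect per c-bit code word."""
    cells = [(word, bit) for word in range(num_words) for bit in range(c)]
    count = 0
    for pattern in combinations(cells, l):
        words_hit = {word for word, _ in pattern}
        if len(words_hit) == l:          # every defect sits in a different word
            count += 1
    return count


# Tiny example: 3 code words of 3 bits each, 2 defects.
brute_force = count_correctable(3, 3, 2)
closed_form = comb(3, 2) * 3 ** 2
all_patterns = comb(3 * 3, 2)
```

In this example 27 of the 36 two-defect patterns are correctable; the remaining 9 place both defects in the same code word.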
Chip area assumptions
We have calculated the hybrid memory chip area necessary to reach a fixed chip yield Y , using the following assumptions:
(i) The total chip area is the sum
while its useful bit capacity is

$$N = \frac{k}{c}\, L\, n\, m. \qquad (22)$$
The factor (k/c) in the last equation implies that the cells keeping the ECC parity bits are considered (together with the cells of excluded rows and columns) as an overhead rather than the useful capacity.
(ii) Memory blocks are symmetric (n = m and a = b). Such simple organization is preferable from the system architecture standpoint, while giving nearly the best density results.
(iii) The block array area is

$$A_{\mathrm{array}} = \max\left[(n+a)^2 (2F_{\mathrm{nano}})^2,\; (n+a)(2\beta F_{\mathrm{CMOS}})^2\right], \qquad (23)$$

where the second term is the total area of 2(n + a) relay cells and pins necessary for contacting (n + a) word and (n + a) bit nanowires. (As figure 2(b) shows, $(2\beta F_{\mathrm{CMOS}})^2$ gives the area of two such cells.) Note that $A_{\mathrm{array}}$ is determined by $A_{\mathrm{cells}}$ if $(n + a) > R^2$ (where $R \equiv \beta F_{\mathrm{CMOS}}/F_{\mathrm{nano}}$); in this case the CMOL interface is $(n + a)/R^2$-fold redundant; see footnote 3. In the opposite case, $(n + a) < R^2$, the array area is determined by that of the interface that allows one to address only a fraction of nanodevices formed on that area.
(iv) For the CMOS relay cell area, we have used an estimate of $5F_{\mathrm{CMOS}}^2$, resulting in $\beta = \sqrt{5/2} \approx 1.6$. We believe this is a fair estimate for the cells shown in figure 3(a) implemented in a style similar to the usual NAND flash memory cell [31, 38]. If this estimate seems too optimistic, note that it does not affect the results in the most realistic case $(n + a) > R^2$; see equation (23).
where the number of bits in the table is given by
This formula reflects the fact that the memory should keep physical addresses of the CMOS relay cells serving both word-line and bit-line nanowires (figure 2), each with n entries.
(vi) Each block needs four cell address decoders (figure 1) for the selection of two relay cells. To keep open the option of nanowire and pin repair (see footnote 3), each decoder should have an output width equal to that of the array, i.e., (n + a)/R wires. Assuming that the circuit is implemented as a tree decoder [38], it requires approximately 2(n + a)/R transistors, and the pitch $2\beta F_{\mathrm{CMOS}}$ of the CMOS wiring (figure 2(b)) should be sufficient to house the decoder output pitch. The other linear size ('depth') of the tree decoder with complementary wires may be estimated as
where the last term reflects the contribution from line drivers and sense amplifiers. This term may be estimated by assuming that the area $d_{\mathrm{drive/sense}} \times 2\beta F_{\mathrm{CMOS}}$ is that of a four-transistor logic gate (e.g., two inverters). The latter area is always close to $320\,(F_{\mathrm{CMOS}})^2$ [39]. Now, since the corner area between the adjacent cell decoders (figure 1) can hardly be used effectively, three contributions to equation (21)
(vii) Finally, the application of the decoder area estimates given above to block decoders and to mapping table decoders shows that their contribution to equation (21) is negligible for all examples we have studied.
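Some of the area assumptions above can be made concrete with a short Python sketch, with all areas expressed in units of F_CMOS². It relies on one interpretation that should be treated as an assumption: the block array area is taken as the larger of the nanodevice cell area, (n + a)²(2F_nano)², and the interface area, (n + a)(2βF_CMOS)², which is consistent with the crossover at (n + a) = R² stated in item (iii). The function names are hypothetical, and the decoder and mapping table areas are not modelled.

```python
from math import sqrt


def beta_from_cell_area(cell_area_f2):
    """Solve (2 * beta)**2 = 2 * cell_area for beta (two relay cells per pitch square)."""
    return sqrt(cell_area_f2 / 2.0)


def array_area_f2(lines, pitch_ratio, beta):
    """Block array area in units of F_CMOS**2; pitch_ratio is F_CMOS / F_nano."""
    cells = lines ** 2 * (2.0 / pitch_ratio) ** 2    # nanodevice crossbar area
    interface = lines * (2.0 * beta) ** 2            # relay cells and pins
    return max(cells, interface)


def useful_bits(num_blocks, n, m, k, c):
    """Useful capacity, counting ECC parity cells as overhead."""
    return num_blocks * n * m * k / c


beta = beta_from_cell_area(5.0)      # relay cell area of 5 F_CMOS**2
R = beta * 10.0                      # pitch ratio for F_CMOS / F_nano = 10
large_block = array_area_f2(1705, 10.0, beta)   # (n + a) well above R**2
small_block = array_area_f2(100, 10.0, beta)    # (n + a) below R**2
```

For F_CMOS/F_nano = 10 the crossover R² is about 250 lines, so blocks of a thousand lines or more are limited by the nanodevice area rather than by the interface.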
Results
Requiring that the total yield Y, calculated as discussed in section 3, is fixed at a certain level, we can use the formulae of section 4 to calculate the chip area A necessary to achieve a certain useful bit capacity N, and hence the area per useful bit, A/N. The last number, normalized to the CMOS half-pitch area,

$$a \equiv \frac{A}{N (F_{\mathrm{CMOS}})^2},$$

is a very convenient figure of merit that depends only on the ratio $F_{\mathrm{CMOS}}/F_{\mathrm{nano}}$ rather than on the absolute parameters of the fabrication technology. Another parameter affecting a is the size n of a single memory block (or, in other words, the number of blocks L at a fixed useful memory capacity N; see equation (22)). Figure 7 shows, by the top orange line, this dependence for a typical case ($F_{\mathrm{CMOS}}/F_{\mathrm{nano}} = 10$, bad bit fraction $q = 3 \times 10^{-3}$) (footnote 11). The plot shows that as the block size n is increased, some overheads are reduced: most importantly, the cell address decoder area that is logarithmic in (n + a); see equation (26). On the other hand, the redundancy overhead starts to grow quickly, since the probability to have a completely good bit row or column without repair drops exponentially, as $(1-q)^{n+a}$. As a result, there always exists an optimum block size (and hence an optimum number L of blocks at fixed N) for which the area per useful bit is minimal. For example, in the case shown in figure 7, the optimal n is 512, and at N = 4 Tb the optimal L is close to 16 million.

Figure 8 shows similar curves (for the total cell area only) for two realistic values of $F_{\mathrm{CMOS}}/F_{\mathrm{nano}}$ (10 and 3.3) and for several values of the single bit yield, while figure 9 shows the results optimized over the block size, as functions of bad bit fraction, for c = 137, k = 128 (footnote 12). The results show that hybrid memories can indeed be denser than purely semiconductor memories, but for that the single-bit yield has to exceed a certain minimum value. For example, in the case $F_{\mathrm{CMOS}}/F_{\mathrm{nano}} = 10$, the hybrid memories can outperform a perfect CMOS memory only if the fraction of bad bits is below ∼15%, even using the Exhaustive Search algorithm of bad bit exclusion, which may require an impracticably long time. For the simple and fast Repair Most algorithm, the bad bit fraction should be reduced to ∼2%. If one wants to obtain an order-of-magnitude density advantage from the transfer to hybrid memories (such a goal seems natural for the introduction of a novel technology), the numbers given above should be reduced to approximately 2% and 0.1%, respectively.

Footnote 11: The dependence of a on the memory capacity N is relatively weak. For our illustrations (figures 7-11) we have selected the values of N that would allow fitting a perfect memory on a 1 cm$^2$ chip. The dependence on the final yield Y (within reasonable limits, e.g., between 10 and 90%) is also weak; see figures 6 and 10, and the numbers at the top horizontal axis of figure 7.

Footnote 12: Our calculations using various Hamming codes with k ≤ 128 for the Repair Most algorithm have shown that (137, 128) is the best code in terms of redundancy usage. For example, the shorter code (12, 8) provides better error correction, but it has to be replicated 16 times to cover 128 data bits. This is why, from the real estate standpoint, it is beneficial to use a longer code and allocate the saved real estate for extra spare lines of memory blocks.

Figure 7. A typical dependence of the total chip area a per useful bit (top curve), and its major components, on the block size, providing the final memory yield Y of 90%. The horizontal line (square points) shows the 'ideal' case when the chip area
Most of the shown components of the total area are specified by equation (21). The redundancy overhead is defined as
2 , while the interface overhead as
is the total number of memory cells. The numbers at the top horizontal axis show the number a of excluded lines necessary to achieve the 90% yield (in parentheses, the 10% yield).
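The quoted number of blocks can be cross-checked with simple arithmetic based on the capacity relation N = (k/c) L n m for square blocks (n = m). The Python sketch below assumes that 4 Tb means 4 × 10¹² bits and shows the result both without ECC overhead and with the (137, 128) code; the function name is hypothetical.

```python
def blocks_needed(useful_bits, n, k=1, c=1):
    """Number of n x n blocks needed for a given useful capacity, L = N c / (k n^2)."""
    return useful_bits * c / (k * n * n)


N = 4e12                                           # 4 Tb, taken as 4 x 10**12 bits
without_ecc = blocks_needed(N, 512)                # no parity overhead
with_ecc = blocks_needed(N, 512, k=128, c=137)     # (137, 128) Hamming code
```

Both values fall between 15 and 17 million blocks, in line with the figure of about 16 million quoted for the optimum at n = 512.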
Uniform (semiconductor) memory
Since the bit yield requirements for the hybrid memories, cited in the previous section, do not look overly optimistic, it makes sense to carry out a similar estimate for a purely semiconductor memory with the traditional architecture (several code words stored in a single row of a memory block).
We assume the same defect model (section 2.2) and the same reconfiguration algorithms (section 3) as for the hybrid memory. For the Repair Most algorithm augmented with Hamming-code error correction, repeating the argumentation of section 3.4, we get
so that the total yield $Y = y^L$ is again expressed by equation (13).
For the ECC-enhanced Exhaustive Search, we repeat the calculation steps leading to equation (19), adjusted for the present case (each code word is located completely in one row). Since all permutations of subarrays and defect patterns are taken within one block, values of t and h are the same as in equation (8). Similarly to section 3.5, the total number of correctable defect patterns for a given subarray is derived by first finding the number of correctable defect patterns provided that the array of the size n × m has exactly l defects (equation (17)) and then summing over the range of l (equation (18)), i.e.
The upper bound on the yield of a whole memory is then given by

Figure 10 shows the defect tolerance potentials of these two techniques, calculated from equations (13) and (31), respectively, for the arrays with m = nc/k (giving $n^2$ useful bits in array) and a = b. Comparing these results to those shown in figure 6, one can clearly see the advantage of the array reconfiguration and Hamming-code error-correction synergy: the total yield is substantially better than in the case when these techniques are applied separately.

Proceeding to the memory area evaluation, we first of all note that each array requires only one row and one column decoder. Next, the uniform memory does not necessarily need a mapping table, since simple nanofuse-nanoshort circuitry [40, 41] enables a simple bypass of defective rows and columns. (The classic redundancy implementations [15] are not acceptable because they are limited to just a few spare lines.) By incorporating the line steering structure of the decoders, the total area of the uniform memory may be calculated as
Here, the array area is just $A_{\mathrm{array}} = (n + a)^2 (2F_{\mathrm{CMOS}})^2$. The decoders' area overhead (again assuming tree decoders) is determined by the products of the array's linear size and decoder depths
This expression implies that the bypass implementation requires one additional lithographic wire for each redundant line. Figure 11 and the top lines in figure 9 show the results of calculations using these formulae. They indicate that for the bit yield typical for present-day memory technologies ($q \sim 10^{-6}$), the area overhead is virtually negligible, because of the possibility to use large block sizes (e.g., 8 K × 8 K bits). However, as the fraction of bad memory cells is increased (as will certainly happen with scaling down their size), the purely semiconductor memories will experience virtually the same 'swelling' as the hybrid memories. For example, for the realistic Repair Most technique, augmented with the Hamming-code error correction, at q = 0.1% the effective cell area is about 40% larger than its physical area, while at q = 1% it becomes almost three times larger. Moreover, figure 9 shows that at the same bit yield, the hybrid memories can provide higher memory density.

Figure 10. Plots similar to those of figure 6 (but both with the Hamming-code error correction), for a purely semiconductor memory.
Figure 11. The plots similar to those of figure 8, for a purely semiconductor memory.
Discussion
To summarize, this study shows a clear potential advantage of hybrid memories in useful density, provided that the nanodevice fabrication technology offers a reasonable yield of single cells. Moreover, some nanodevices (e.g., single-electron latching switches [19, 20, 28]) promise nonvolatile operation¹³. All these prospects would, however, be insubstantial if the addition of the nanowire/nanodevice layer to CMOS chips resulted in unacceptable power dissipation or memory access slowdown. A detailed analysis and optimization of the power-to-speed tradeoff are beyond the scope of the present work, so we will only provide a crude
estimate showing that these challenges may be met by a proper choice of nanodevice resistance. Let us consider, as an example, the case q = 10
and $F_{\mathrm{CMOS}}/F_{\mathrm{nano}} = 10$. According to figure 8(b), the Hamming-code error correction on top of the simple Repair Most reconfiguration allows a 4 Tb memory using $L = 4$ M blocks of 1 K × 1 K bits. (This optimum is achieved at $a = b = 681$, so that $(n + a) \approx 1.7 \times 10^3$.) We have used the FastCap code [43] to calculate the capacitance of a nanowire fragment of length $2F_{\mathrm{nano}}$ for a simple geometric model of the nanodevice layer (figure 2), in which both the width and the thickness of the wire, and both the vertical and the horizontal distances between the nanowires, are all equal to $F_{\mathrm{nano}}$. Such a calculation yields $C/F_{\mathrm{nano}} \approx 0.48 \times 10^{-10}\,\varepsilon$ (F m$^{-1}$), where $\varepsilon$ is the relative dielectric constant of the insulating environment. (This number is in reasonable agreement with numerical and experimental results for isolated nanowires [44].) With $\varepsilon = 3.9$ (corresponding to SiO$_2$) and $F_{\mathrm{nano}} = 3$ nm (a substantially smaller half-pitch would result in unacceptable quantum tunnelling between nanowires), this calculation gives capacitance $C \approx 1.9$ fF for a nanowire passing through the whole block¹⁴. With the open nanodevice resistance $R_{\mathrm{open}} = 1$ MΩ, we find that the nanowire recharging through the resistance (the longest process in the nanodevice subsystem¹⁵) would take just a few nanoseconds, quite comparable with the typical access time of contemporary semiconductor memory chips.
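The capacitance and recharging-time figures above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that the quoted FastCap coefficient, 0.48 × 10⁻¹⁰ ε F m⁻¹, is read as capacitance per unit wire length (this reading reproduces the quoted 1.9 fF), and all function and variable names are hypothetical.

```python
# Back-of-the-envelope check of the nanowire capacitance and RC time.
# Assumption (not stated explicitly in the text): the FastCap coefficient
# 0.48e-10 * eps (F/m) is treated as capacitance per unit wire length.

EPS_R = 3.9                        # relative dielectric constant of SiO2
C_PER_LENGTH = 0.48e-10 * EPS_R    # F/m, under the assumption above
F_NANO = 3e-9                      # nanowire half-pitch, m
N_USEFUL = 1024                    # n: useful lines in a 1 K x 1 K block
A_SPARE = 681                      # a: redundant lines per block
R_OPEN = 1e6                       # open nanodevice resistance, ohm


def wire_length(n=N_USEFUL, a=A_SPARE, f_nano=F_NANO):
    """Length of a nanowire crossing the whole block: 2 * F_nano * (n + a)."""
    return 2 * f_nano * (n + a)


def wire_capacitance(n=N_USEFUL, a=A_SPARE):
    """Capacitance of one nanowire passing through the whole block, in F."""
    return C_PER_LENGTH * wire_length(n, a)


def rc_time(r=R_OPEN):
    """RC constant for recharging the nanowire through one open device, in s."""
    return r * wire_capacitance()


if __name__ == "__main__":
    print(f"wire length = {wire_length() * 1e6:.1f} um")
    print(f"capacitance = {wire_capacitance() * 1e15:.2f} fF")
    print(f"RC constant = {rc_time() * 1e9:.2f} ns")
```

With these inputs the wire length comes out close to 10 µm, the capacitance close to 1.9 fF and the RC constant close to 2 ns, in line with the 'few nanoseconds' estimate.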
¹⁴ Note that the linear size, $2F_{\mathrm{nano}} \times (n + a) \approx 10$ µm, of the 1 Mb block corresponds to useful memory density close to 1 Tb cm$^{-2}$.
¹⁵ The internal switching of a single-electron transistor with such resistance is much shorter [7]. Also, the resistance of a metallic nanowire of this cross-section ($F_{\mathrm{nano}}^2 \approx 10$ nm$^2$) and length ($2F_{\mathrm{nano}} \times (n + a) \approx 10$ µm) is of the order of $10^5$ Ω, i.e. much less than the accepted value of $R_{\mathrm{nano}}$. Note, however, that the use of molecular nanowires would make the access time unacceptably slow.
In order to estimate the power dissipation in the nanodevices for the same value of $R_{\mathrm{open}}$, we should take into account at least three components. The first component, the dynamic power, may be estimated as $P_{\mathrm{dyn}} < 2L^{1/2}CV^2 f$, where the factor $L^{1/2}$ is the upper bound for the number of blocks addressed simultaneously; see figure 1. (In practice, this number may be considerably less than $L^{1/2}$ because of the memory bus width limitations.) Assuming that the read/write voltages $V$ are close to 1 V (a typical value for room-temperature single-electron devices [7]), we get $P_{\mathrm{dyn}} < 5$ mW even at a high access frequency $f$ of 300 MHz. The static power $P_{\mathrm{open}} < 2L^{1/2}V^2/R_{\mathrm{open}}$, which is due to dissipation in simultaneously open nanodevices (one device per block), is equally negligible: with the parameters used above we get $P_{\mathrm{sta}} \approx 4$ mW. More substantial may be the static power dissipation $P_{\mathrm{leak}} < 2L^{1/2}(n + a)(V/2)^2/R_{\mathrm{leak}}$ due to current leakage through 'semi-selected' nanodevices connected to both (bit and word) addressed nanowires, where $R_{\mathrm{leak}}$ is the effective resistance of nanodevices in the closed (e.g., Coulomb blockade) state. Requiring that $P_{\mathrm{leak}} < 0.1$ W, for our case we get the condition $R_{\mathrm{leak}} \gtrsim 10$ MΩ, i.e. $R_{\mathrm{leak}}/R_{\mathrm{open}} \gtrsim 10$, which seems quite realistic, at least for molecular single-electron transistors [1] [2] [3] [4] [5].
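The three power bounds above can be evaluated in the same crude way. The sketch below is only an illustration: it assumes that '4 M blocks' means 4 × 2²⁰ (so that L^(1/2) = 2048), takes C = 1.9 fF from the preceding estimate, and uses hypothetical function names.

```python
import math

# Crude evaluation of the three power bounds.
# Assumptions: L = 4 * 2**20 blocks (binary 'M'), C = 1.9 fF per nanowire.

L_BLOCKS = 4 * 2**20     # number of blocks, L
N_PLUS_A = 1024 + 681    # lines per block side, n + a
C_WIRE = 1.9e-15         # nanowire capacitance, F
V_RW = 1.0               # read/write voltage, V
FREQ = 300e6             # access frequency, Hz
R_OPEN = 1e6             # open nanodevice resistance, ohm


def simultaneous_blocks(l_blocks=L_BLOCKS):
    """Upper bound on the number of blocks addressed at once: L**(1/2)."""
    return math.sqrt(l_blocks)


def dynamic_power_bound():
    """P_dyn < 2 * L**(1/2) * C * V**2 * f, in W."""
    return 2 * simultaneous_blocks() * C_WIRE * V_RW**2 * FREQ


def open_power_bound():
    """P_open < 2 * L**(1/2) * V**2 / R_open, in W."""
    return 2 * simultaneous_blocks() * V_RW**2 / R_OPEN


def leak_power_bound(r_leak):
    """P_leak < 2 * L**(1/2) * (n + a) * (V/2)**2 / R_leak, in W."""
    return 2 * simultaneous_blocks() * N_PLUS_A * (V_RW / 2) ** 2 / r_leak


def min_leak_resistance(p_budget=0.1):
    """Smallest R_leak that keeps the leakage bound within p_budget (W)."""
    return 2 * simultaneous_blocks() * N_PLUS_A * (V_RW / 2) ** 2 / p_budget


if __name__ == "__main__":
    print(f"P_dyn bound  = {dynamic_power_bound() * 1e3:.2f} mW")
    print(f"P_open bound = {open_power_bound() * 1e3:.2f} mW")
    print(f"R_leak floor = {min_leak_resistance(0.1) / 1e6:.1f} Mohm")
```

Under these assumptions the dynamic bound stays below 5 mW and the open-device bound is close to 4 mW. The worst-case leakage formula, taken with the full factor L^(1/2), gives a minimum R_leak somewhat above 10 MΩ, of the same order as the condition quoted in the text.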
To summarize, our results show that a simple combination of array reconfiguration (bad line exclusion) and Hamming-code error-correction techniques may make hybrid CMOS/nanodevice memories tolerant to a substantial fraction of bad nanodevices. However, the fabrication technology still has to reduce this fraction to below ∼0.1% before terabit-scale memory chips, exceeding purely semiconductor memories in bit density by an order of magnitude, become possible. Possibly, this threshold may be moved up by almost an order of magnitude by the development of novel reconfiguration algorithms that would combine the simplicity and speed of the Repair Most exclusion with a higher repair quality (comparable to that of the Exhaustive Search).
