Abstract-This paper lays foundations for an approach to on-chip row/column allocation that exploits certain properties offered by laterally connected networks of simple threshold devices. As a sample application, it is demonstrated how electronic implementations of these networks can be used as the basis for effective memory array repair systems that require little hardware overhead.
INTRODUCTION
The decade of the 1980s saw phenomenal advances in VLSI circuit technology, as shrinking feature width and increasing die size allowed unprecedented levels of integration. These advances, unfortunately, also rendered fabrication processes more susceptible to impurity-related manufacturing defects, even as greater circuit complexity left many subsystems "embedded," where they cannot be observed externally or controlled directly. As prospects for further technological development reach previously unthinkable levels, it is becoming necessary to develop built-in systems which can automatically repair partially faulty integrated circuits. This paper lays theoretical foundations for an approach to on-chip spare allocation in regular VLSI structures such as memories. In high-density DRAM chips, manufacturers commonly include spare rows and columns of memory cells so that rows and columns in which faulty cells are detected can be replaced. Repair by the replacement of rows and columns of memory cells, rather than by replacement of individual cells or entire memory sub-arrays, offers a good compromise between simplicity of reconfiguration hardware and efficient utilization of spare memory. This reconfiguration strategy, however, gives rise to a difficult optimization problem. Given a scattering of faulty cells in a two-dimensional rectangular array, determining whether the array can be repaired using a limited number of spare rows and columns is NP-complete [1].
The problem of efficient spare row and column allocation in redundant memories has been widely studied [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], and a variety of applicable algorithms are present in the literature. To be considered viable for the purpose of on-chip spare allocation, however, an algorithm must satisfy two challenging and mutually conflicting design requirements. First, it must lend itself to easy implementation using very little hardware, incurring a minimal penalty in terms of silicon area overhead, and second, the algorithm must be capable of rapidly generating high-quality solutions to spare allocation problem instances. Previously reported algorithms for spare row and column allocation, most of which were designed for incorporation into dedicated repair systems on integrated circuit production lines, generally fail to meet one or both of these criteria. In contrast to these existing methods of spare allocation, the approach presented in this paper offers the combined advantages of high execution speed, simplicity of implementation, and high rates of successful spare allocation in problem instances for which solutions exist.
... 23, 1993; revised Feb. 8, 1995. For information on obtaining reprints of this article, please send e-mail to: transactions@computer.org, and reference IEEECS Log Number C95137.
Section 2 of this paper begins the theoretical discussion by describing a graph-theoretic model for the problem of array repair through row and column replacement, relating the problem to that of finding a constrained vertex cover in an undirected bipartite graph. Some relevant properties of laterally connected threshold devices are reviewed briefly in Section 3. Section 4 introduces a class of threshold device networks which is proven applicable to a generalized form of the vertex cover problem, and Section 5 concludes the theoretical discussion with an examination of certain convergence issues. Section 6 demonstrates how electronic implementations of the proposed networks can be applied to the problem of spare allocation in embedded memories. Simulation results indicate that these networks can be made to provide consistently feasible solutions to the computationally intractable spare allocation problem.
GRAPH-THEORETIC DEFINITIONS AND MODEL
We begin by defining some common terms from graph theory, regarding the vertex covers of undirected graphs. DEFINITION 1. Given 
The problem of finding a vertex cover whose cardinality is less than some specified constant is known as the vertex cover problem, and a vertex cover which contains the least number of vertices necessary to cover a given graph is considered optimal. For the purposes of this paper it will prove useful to define a new term, the generalized vertex cover (GVC) problem, as follows. An optimal solution to the generalized vertex cover problem is a vertex cover with the property that ... is a minimum, where $|R_i \cap V_c|$ denotes the cardinality of $R_i \cap V_c$. Finally, we define the connection matrix associated with an undirected graph, a construction which will prove useful in later analyses.
DEFINITION 4. Given an undirected graph G = (V, E) in which V has cardinality N, let the connection matrix C associated with G be the N-by-N matrix whose element $c_{ij}$ equals 1 if an edge connects vertices i and j, i.e., if $e(i, j) \in E$, or equals 0 otherwise. It is assumed that the N vertices of G are numbered 1 through N in some arbitrary manner. Note that, by definition, the connection matrix associated with an undirected graph must be symmetric. Furthermore, if no edge in G connects any vertex with itself, then the graph is said to possess no self-loops and the main diagonal of its associated connection matrix consists entirely of 0s.
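The construction in Definition 4 can be illustrated with a short sketch. The function names and the four-vertex example graph below are hypothetical and serve only to show the symmetry and zero-diagonal properties noted above.

```python
def connection_matrix(num_vertices, edges):
    """Return the N-by-N connection matrix of an undirected graph.

    Vertices are numbered 1..N as in the text; `edges` is an iterable of
    (i, j) pairs. Entry c[i-1][j-1] is 1 when an edge joins i and j.
    """
    c = [[0] * num_vertices for _ in range(num_vertices)]
    for i, j in edges:
        c[i - 1][j - 1] = 1
        c[j - 1][i - 1] = 1  # undirected edge, so the matrix is symmetric
    return c


def is_symmetric(c):
    n = len(c)
    return all(c[i][j] == c[j][i] for i in range(n) for j in range(n))


def has_self_loops(c):
    # A self-loop shows up as a nonzero entry on the main diagonal.
    return any(c[i][i] != 0 for i in range(len(c)))


# Hypothetical 4-vertex graph, used only for illustration.
example_edges = [(1, 3), (1, 4), (2, 4)]
C = connection_matrix(4, example_edges)
```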
0018-9340/96$05.00 © 1995 IEEE

An instance of the spare row/column allocation problem consists of a two-dimensional rectangular array of cells, some of which are designated faulty, and two integers providing, respectively, the number of spare rows and the number of spare columns available with which to make repairs. Any subset of the rows and columns in such an array constitutes a spare allocation scheme. In the context of memory stuck-at fault repair, we define a spare allocation scheme to be valid if and only if its constituent rows and columns, taken together, contain every faulty cell in the array at hand; a valid spare allocation scheme is said to be minimal if and only if there does not exist any proper subset of its constituent rows and columns which is itself valid. We define a valid spare allocation scheme to be feasible if and only if the number of designated rows and the number of designated columns each obey their respective upper bounds as set forth under the particular problem instance at hand. Finally, we consider optimal any feasible spare allocation scheme which designates for replacement the minimum total number of rows and columns. Fig. 1 illustrates these concepts. Assuming there exist four spare rows and four spare columns with which to make repairs, the diagram in Fig. 1a represents an instance of the spare allocation problem.

... vertices is constructed such that each row and each column of the array corresponds to exactly one vertex, and such that an edge connects vertices i and j if and only if a faulty cell lies at the intersection of the corresponding row and column, then it is easily proven that valid spare allocation schemes for the faulty array exhibit a one-to-one correspondence with vertex covers of the associated graph.
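The definitions of valid, minimal, and feasible spare allocation schemes can be checked mechanically. The following is a minimal sketch; the function names and the small fault pattern are hypothetical and are not taken from Fig. 1. Faults are given as (row, column) pairs.

```python
def is_valid(faults, rows, cols):
    """Every faulty cell lies in a designated row or a designated column."""
    return all(r in rows or c in cols for r, c in faults)


def is_minimal(faults, rows, cols):
    """Valid, and no proper subset of the scheme is itself valid.

    Adding lines to a valid scheme keeps it valid, so it is enough to
    test the subsets obtained by dropping a single row or column.
    """
    rows, cols = set(rows), set(cols)
    if not is_valid(faults, rows, cols):
        return False
    if any(is_valid(faults, rows - {r}, cols) for r in rows):
        return False
    if any(is_valid(faults, rows, cols - {c}) for c in cols):
        return False
    return True


def is_feasible(faults, rows, cols, spare_rows, spare_cols):
    """Valid, and within the available number of spare rows and columns."""
    return (is_valid(faults, rows, cols)
            and len(rows) <= spare_rows
            and len(cols) <= spare_cols)


# Hypothetical fault pattern, for illustration only.
faults = [(0, 0), (0, 1), (1, 1), (2, 3)]
```

For this pattern, the scheme with row 0 and columns 1 and 3 is valid and minimal, while adding row 1 to it keeps it valid but makes it nonminimal.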
Applying the graph-theoretic definition provided earlier, furthermore, it is clear that an instance of the spare allocation problem is easily represented as a special case of the generalized vertex cover problem, in which D = 2, the subset $R_1$ consists of all vertices corresponding to array rows, $R_2$ consists of all vertices corresponding to array columns, and the constants $C_1$ and $C_2$ are determined by the number of available spare rows and spare columns, respectively. It is worth mentioning that this graph-theoretic model, and hence the results described in this paper, apply only to arrays of two dimensions.
As an example, consider the faulty 1,024-by-1,024 array illustrated in Fig. 2a. Fig. 2b depicts an associated bipartite graph in which vertices 1 through 4 correspond, respectively, to rows 42, 118, 629, and 823 of the array, and vertices 5 through 8 correspond, respectively, to columns 37, 125, 225, and 921. Every vertex cover of this graph can be mapped to a valid spare allocation scheme for the original array.
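The mapping from a fault list to a bipartite graph, and from a vertex cover back to a spare allocation scheme, can be sketched as follows. The row and column numbers match those quoted for Fig. 2, but the individual fault positions are hypothetical, since the figure itself is not reproduced here; the function names are likewise illustrative.

```python
def faults_to_graph(faults):
    """Number faulty rows 1..m and faulty columns m+1..m+n, then list edges."""
    rows = sorted({r for r, _ in faults})
    cols = sorted({c for _, c in faults})
    row_vertex = {r: k + 1 for k, r in enumerate(rows)}
    col_vertex = {c: len(rows) + k + 1 for k, c in enumerate(cols)}
    # One edge per faulty cell, joining its row vertex and its column vertex.
    edges = sorted({(row_vertex[r], col_vertex[c]) for r, c in faults})
    return row_vertex, col_vertex, edges


def cover_to_scheme(cover, row_vertex, col_vertex):
    """Translate a set of covering vertices back into array rows and columns."""
    rows = sorted(r for r, v in row_vertex.items() if v in cover)
    cols = sorted(c for c, v in col_vertex.items() if v in cover)
    return rows, cols


# Hypothetical fault positions on the rows and columns named in the text.
faults = [(42, 37), (42, 125), (118, 125), (629, 225), (823, 921), (823, 37)]
row_vertex, col_vertex, edges = faults_to_graph(faults)
scheme = cover_to_scheme({1, 4, 6, 7}, row_vertex, col_vertex)
```

Only rows and columns that contain at least one fault receive a vertex, so the graph stays small even when the array itself is large.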
RELEVANT PROPERTIES OF LATERALLY CONNECTED THRESHOLD DEVICES
A threshold device network of the form considered throughout this paper consists of highly interconnected simple processing elements, which are numbered 1 through N in some arbitrary manner. The current state $s_i$ of threshold device i at any given time is either 0 or 1, and the current state S of the system is the binary vector determined collectively by the states of all N devices.
Given a network in the current state S, its next state S' is brought about by updating the state of exactly one of its constituent threshold devices. Thus, S and S' are separated by a maximum Hamming distance of one. The next state $s_i'$ of device i is determined by ...

If the matrix W of interconnection weights is symmetric and possesses no nonzero terms on its main diagonal, i.e., if $w_{ij} = w_{ji}$ and $w_{ii} = 0$ for $1 \le i, j \le N$, then the value of the function ...

... threshold devices, and defining $V_c$ as before, we state the following lemma.
THRESHOLD DEVICE NETWORKS FOR GENERALIZED VERTEX COVER PROBLEMS
... $A > B_i(2|L_i| - 1)$ and $B_i \ge 0$ for $1 \le i \le K$.
The primary theoretical contributions of this paper are expressed in three theorems, each of which contributes to the eventual development of methodologies for effective on-chip spare row/column allocation in VLSI arrays. The purpose of this section is to identify and analyze a class of threshold device networks which is proven applicable to the GVC problem. Consider a network of threshold devices whose energy function is of the form

$$E_{MVC}(S) = E_{VC}(S) + \sum_{i=1}^{K} B_i z_i^2 .$$

... 1) some edge in G has neither of its endpoints in $V_c$, and 2) system energy cannot be reduced by rectifying this situation.

We demonstrate that our choice of network parameters ensures conditions 1 and 2 cannot simultaneously be satisfied.

Condition 1 implies that $z_i < |L_i|$ for some i between 1 and K. Suppose that switching one of the remaining $|L_i| - z_i$ threshold devices in subset $L_i$ from state 1 to state 0 will have the effect of including in $V_c$ an endpoint of n edges whose other endpoints did not previously lie in $V_c$. Doing so will prevent 2n nonzero ...

... 2) it does not represent a vertex cover of G which is minimal over the set of vertices corresponding to U.

Establishing the usual correspondence between vertices and ...

Condition 1 and Lemma 2 guarantee that S does in fact represent a vertex cover of G. Condition 2 implies that there exists some vertex i in $V_c$ whose elimination from the set would yield a new set which is also a vertex cover, and Lemma 1 dictates that the value of $E_{VC}$ cannot change as a result. The elimination of vertex i from $V_c$ is effected by switching $s_i$ from 0 to 1, a process which results in a reduction of system energy given by

$$B_j\left[z_j^2 - (z_j - 1)^2\right]$$

assuming that device i is a member of set $L_j$, and that $z_j$ members of this set are in state 0 when H is in state S. As this reduction is a strictly positive value whenever device i is a member of U, it must be true that S is not a minimum of $E_{MVC}$. Hence, S is not a stable state of H.

PROOF OF SUFFICIENCY. Suppose H is currently in some state S which represents a vertex cover that is minimal over the vertices corresponding to U. The next state of H can differ from S in one of three ways. Either
1) the state of some device is switched from 1 to 0, or
2) the state of some device which does not belong to U is switched from 0 to 1, or
3) the state of some device which belongs to U is switched from 0 to 1.
Lemma 1 guarantees that S is a local minimum of $E_{VC}$. Since the sum of the remaining terms in $E_{MVC}$ never decreases as the number of 0s in S increases, option 1 cannot result in a reduction of system energy. Nor can option 2, since any device which is not a member of U must belong to some set $L_i$ for which $B_i = 0$. Because option 3 inevitably yields a new state which does not represent a vertex cover of G, it must lead to an increase in system energy, as made evident in the proof of Lemma 2. We conclude that S is a local minimum of $E_{MVC}$, and hence a stable state of H.
□
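The energy bookkeeping used in the proof can be written out explicitly. The sketch below uses the penalty term $\sum_i B_i z_i^2$, where $z_i$ is the number of devices of subset $L_i$ in state 0. As an assumption made only for illustration, it further supposes that $E_{VC}$ falls by $A$ for every edge that becomes covered; the exact form of $E_{VC}$ is not restated in the surviving text.

```latex
% A device of subset L_i is switched from state 0 to state 1
% (z_i -> z_i - 1) while E_VC is unchanged:
\Delta E_{MVC} = B_i\left[(z_i - 1)^2 - z_i^2\right]
               = -B_i\,(2 z_i - 1) < 0
\quad \text{for } B_i > 0,\; z_i \ge 1 .

% A device of subset L_i is switched from state 1 to state 0
% (z_i -> z_i + 1), covering n >= 1 previously uncovered edges.
% Assumption: E_VC decreases by A n.
\Delta E_{MVC} = -A\,n + B_i\left[(z_i + 1)^2 - z_i^2\right]
               = -A\,n + B_i\,(2 z_i + 1)
               \le -A + B_i\,(2|L_i| - 1) < 0 .
```

The last inequality uses $z_i \le |L_i| - 1$ before the switch together with the parameter condition $A > B_i(2|L_i| - 1)$, which suggests why that condition is what rules out stable states that leave an edge uncovered.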
Clearly, a vertex cover which is nonminimal over some vertex set R can never be an optimal solution to the GVC problem with respect to sets $R_1, R_2, \ldots, R_D$ if R is a subset of $R_1 \cup R_2 \cup \cdots \cup R_D$. The threshold device network of Theorem 1 is naturally applicable to the GVC problem, and a simple method for mapping any specific problem instance onto such a network is apparent. For a graph G with any given choice of disjoint vertex sets, one simply constructs a network whose energy function is of the form of $E_{MVC}$, where each $c_{ij}$ is defined by the connection matrix associated with G, and where some set of threshold devices corresponds to each vertex set over which vertex covers are to be minimized. A positive $B_i$ value is chosen for each such set of threshold devices, while $B_i$ values for any remaining sets are left at 0, and the constant A is chosen as per the specification of Theorem 1. Aspects of the mapping process are illustrated in Fig. 4. Given the graph depicted in Fig. 4a, suppose it is desired to generate vertex covers which are minimal with respect to the set of vertices consisting of vertex 1 and vertex 6. The graph's connection matrix, shown in Fig. 4b, is used to determine each $c_{ij}$ in the network energy functions of Fig. 4c. Function $E_1$ results from the mapping technique described above. Function $E_2$, the product of an alternative mapping process, gives rise to identical stable states and serves to demonstrate that the mapping from a given instance of the GVC problem to a network of threshold devices is not generally unique.
ALGORITHMS FOR NETWORK CONVERGENCE
Given any instance of the GVC problem, it is now possible to construct a network of laterally connected threshold devices whose stable states represent potentially optimal solutions. Nevertheless, the applicability of such a network to spare allocation is limited without some means to ensure that stable states are encountered within a reasonable length of time. This section describes threshold device clocking procedures which, for networks that fulfill the requirements of Theorem 1, guarantee convergence to a stable state in a maximum number of steps.
Theorem 2 describes a procedure which can be preset, in the sense that an inflexible sequence can be established in which devices are to be clocked. It is presented below without proof. It should be noted that a similar scheme was derived in [13] for a less general class of networks.
One drawback to the procedure of Theorem 2 is that it disregards information present in the current state of the network which can be used to inform the choice of which network state to select next. Theorem 3 describes an alternative procedure which makes use of this information.

Section 2 established that the problem of spare row/column allocation can be modeled as a special case of the generalized vertex cover problem. Theorem 1 went on to demonstrate that, given any instance of the GVC problem, a network of laterally connected threshold devices exists which possesses a unique stable state representing each minimal solution to the instance at hand. Furthermore, given a specified network architecture, it is clear that different problem instances can be accommodated by simply modifying the connection matrix upon which the network's energy function is based. It follows from these results that a programmable network of an appropriate size could provide the basis for an optimization system capable of devising spare allocation schemes for any pattern of faults likely to occur in a given array. This section demonstrates how Theorems 2 and 3, when applied to appropriate network architectures, can offer a high probability of encountering stable states which represent feasible spare allocation schemes. Electronic implementations of the resulting systems, though extremely simple, nevertheless offer a high rate of success in solving the computationally intractable spare allocation problem.

By expanding squared terms and relating the result to the general energy function of Section 3, values can be derived for network interconnection weights and bias terms. These values, along with an electronic implementation of the resulting network, are illustrated in Fig. 5.
If the initial state of $H_1$ is fixed to the N-vector consisting entirely of 1s, then the gradient descent procedure of Theorem 3 allocates spare rows and columns according to what, in effect, is a simple heuristic algorithm. Gradient descent allocates the first spare to cover some row or column containing the maximum number of faults. Spares are assigned thereafter in the same manner, with ties between any row and column broken in favor of that set from which fewer spares have already been allocated. The tendency to eliminate the maximum number of faults with each allocation may even be overridden if the number of spares allocated from one set greatly exceeds that of the other. Intuitively, the system will tend to preserve a balance between the number of spare rows and spare columns assigned, thereby preferring stable states which represent feasible spare allocation schemes.
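The heuristic behavior described above can be reproduced in a short sketch. It rests on assumptions that are not confirmed by the surviving text: the energy is taken to be A times the number of uncovered faults plus B times the sum of the squared spare counts for rows and for columns, "gradient descent" is read as flipping the single device that lowers this energy the most, and state 0 marks a row or column to which a spare is assigned. The function names and the fault pattern are hypothetical.

```python
def energy(faults, row_state, col_state, a, b):
    # Assumed energy: A per uncovered fault, plus B * z^2 for each subset,
    # where z counts the devices of that subset in state 0.
    uncovered = sum(1 for r, c in faults
                    if row_state[r] == 1 and col_state[c] == 1)
    z_rows = sum(1 for s in row_state.values() if s == 0)
    z_cols = sum(1 for s in col_state.values() if s == 0)
    return a * uncovered + b * (z_rows ** 2 + z_cols ** 2)


def gradient_descent(faults, b=1):
    rows = sorted({r for r, _ in faults})
    cols = sorted({c for _, c in faults})
    # Choose A just above the bound B * (2|L_i| - 1) for the larger subset.
    a = b * (2 * max(len(rows), len(cols)) - 1) + 1
    row_state = {r: 1 for r in rows}   # all-1s start: no spares assigned
    col_state = {c: 1 for c in cols}
    candidates = ([(row_state, r) for r in rows]
                  + [(col_state, c) for c in cols])
    while True:
        current = energy(faults, row_state, col_state, a, b)
        best, best_delta = None, 0
        for state, key in candidates:
            state[key] ^= 1                      # trial flip
            delta = energy(faults, row_state, col_state, a, b) - current
            state[key] ^= 1                      # undo trial flip
            if delta < best_delta:
                best, best_delta = (state, key), delta
        if best is None:                         # no flip lowers the energy
            break
        state, key = best
        state[key] ^= 1
    return ([r for r in rows if row_state[r] == 0],
            [c for c in cols if col_state[c] == 0])


# Hypothetical compacted fault pattern, for illustration only.
faults = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 3)]
spare_rows_used, spare_cols_used = gradient_descent(faults)
```

In this sketch a flip that covers n faults in a subset already holding z spares changes the energy by -A n + B (2z + 1), so the line with the most uncovered faults is preferred and equal fault counts favor the subset with fewer spares assigned. Exact ties are resolved here by scan order, which is an arbitrary choice.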
Theorem 2 can be used as the basis for an even less complex alternative system. Since the preset updating procedure is unable to exploit any heuristic qualities present in a given energy function, however, it is worthwhile to construct a new energy function whose implied network $H_2$ is as simple as possible. Let us modify network $H_1$ by establishing N threshold device subsets, $L_1$ through $L_N$, rather than the original $L_1$ and $L_2$. The resulting energy function gives rise to the interconnection weights and bias terms detailed in Fig. 6. An electronic implementation of network $H_2$, also shown in Fig. 6, is easily constructed using strictly digital components. Approximating the silicon layout area of an integrated circuit by the number of transistors it comprises, and given an expected maximum fault pattern size of 32 by 32 elements, a network of this kind would require only 0.29 percent of the total area of a 1 Mbit DRAM.

Consideration must now be given to a means of generating initial states for network $H_2$, and to a strategy for clocking its threshold devices. In order to ensure that neither row nor column replacements are preferred, and to increase the likelihood of devising feasible spare allocation schemes, it is reasonable to establish an updating protocol which alternates between threshold devices corresponding to rows and devices corresponding to columns. Initial states may be chosen in a random fashion to allow for thorough exploration of the search space, and repeated iterations performed until the network converges to a state representing a feasible spare allocation scheme.

Fig. 7 illustrates a paradigm for using a threshold device network to provide for on-chip spare allocation in embedded memory arrays. Faulty cells are located by BIST hardware, and the resulting pattern of faults is used to configure the interconnection weight matrix of an appropriate network.
(It is important to note that, typically, fault distributions within large memories can be represented by compacted patterns which are many orders of magnitude smaller.) With programming completed, control hardware resets the threshold devices to some initial state, and begins an updating procedure which brings the network to convergence. The resulting stable state is evaluated to determine whether it represents a feasible solution to the problem instance at hand. If so, a signal is generated which directs reconfiguration hardware to repair the memory as per the suggested spare allocation scheme. Tables 1 and 2 summarize the performance of the heuristic and iterative approaches, respectively, under variations of fault pattern size, number of available spare rows and columns, and average number of faulty elements per pattern. Fault patterns were generated randomly through a process which guarantees repairability.
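The iterative approach, with random initial states, a preset clocking order that alternates between row and column devices, and a feasibility check on each stable state, can be sketched in software as follows. The update rule is an assumption derived from an energy with one singleton subset per device and A > B: a device moves to state 0 (spare assigned) exactly when some fault on its line has its other line in state 1, and to state 1 otherwise. The function names, the retry limit, and the fault pattern are hypothetical.

```python
import random
from itertools import zip_longest


def settle(faults, state, order):
    """Clock devices in a fixed order until a full sweep changes nothing."""
    changed = True
    while changed:
        changed = False
        for kind, k in order:
            if kind == "r":
                exposed = any(state[("c", c)] == 1
                              for r, c in faults if r == k)
            else:
                exposed = any(state[("r", r)] == 1
                              for r, c in faults if c == k)
            new_state = 0 if exposed else 1   # assumed threshold rule
            if state[(kind, k)] != new_state:
                state[(kind, k)] = new_state
                changed = True
    return state


def allocate(faults, spare_rows, spare_cols, max_tries=100, seed=0):
    """Retry from random initial states until a feasible scheme appears."""
    rng = random.Random(seed)
    rows = sorted({r for r, _ in faults})
    cols = sorted({c for _, c in faults})
    row_devs = [("r", r) for r in rows]
    col_devs = [("c", c) for c in cols]
    # Preset order alternating between row devices and column devices.
    order = [d for pair in zip_longest(row_devs, col_devs)
             for d in pair if d is not None]
    for _ in range(max_tries):
        state = {d: rng.randint(0, 1) for d in order}
        settle(faults, state, order)
        used_rows = [r for r in rows if state[("r", r)] == 0]
        used_cols = [c for c in cols if state[("c", c)] == 0]
        if len(used_rows) <= spare_rows and len(used_cols) <= spare_cols:
            return used_rows, used_cols
    return None   # no feasible stable state found within max_tries


# Hypothetical compacted fault pattern, for illustration only.
faults = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (3, 3)]
result = allocate(faults, spare_rows=2, spare_cols=2)
```

Under this rule every stable state is a valid scheme, because an uncovered fault would leave its row device exposed and force it to state 0. Feasibility is not guaranteed by any single run, which is why the sketch retries from fresh random states.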
CONCLUSION
This paper lays foundations for an approach to on-chip spare allocation in rectangular VLSI arrays, demonstrating how properly designed networks of simple threshold devices can be used as the basis for optimization systems which are simple and effective. It should be emphasized that the proposed hardware systems are not as powerful as some existing software-based spare allocation techniques. Such existing techniques, however, were developed for utilization by external repair equipment, and the algorithms employed typically require the full resources of a general-purpose digital computer. Hardware implementations would incur far too much area overhead to be considered viable for the purpose of built-in self-repair. In contrast, the systems developed in this paper achieve near perfect success in devising spare allocation schemes, using hardware whose complexity is negligible when compared with a VLSI array of any substantial size.
Earlier research along these lines [14] succeeded in developing threshold device networks which, intuitively, seemed applicable to the spare allocation problem, but which occasionally failed to yield valid spare allocation schemes in practice. While they do build upon these earlier results, the systems described in this paper include a number of major refinements which make their actual implementation for the purpose of built-in self-repair an immediate practical possibility. First and most importantly, the general network architectures developed are proven to possess stable states which represent only valid spare allocation schemes. Second, certain special instances of these networks can be implemented in a straightforward manner using strictly digital components. It deserves mention also that Theorem 1, as derived herein, is easily extended to the case of threshold device networks which operate in the analog domain [15], raising the possibility of alternative optimization system designs that offer much greater speed of operation than those described here.

